genetics: published articles ahead of print, published on ...90 (wang et al. 2004). expression can...
TRANSCRIPT
1
Title: Ancestry influences the fate of duplicated genes millions of years after polyploidization of 1
clawed frogs (Xenopus). 2
Author: Ben J. Evans 3
Phone 905.525.9140 Ext. 26973 4
Fax: 905.522.6066 5
Email: [email protected] 6
Author address: 7
Center for Environmental Genomics 8
Department of Biology, McMaster University, 9
Life Sciences Building Room 328 10
1280 Main Street West, Hamilton, ON, Canada L8S 4K1 11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under 30
accession nos. AY874308, AY874311, AY874309, AY874310, AY874324, AY874346, AY874352, 31
EF535883-EF535988. 32
Genetics: Published Articles Ahead of Print, published on April 15, 2007 as 10.1534/genetics.106.069690
2
33
Running head: 34
Heterodimers in polyploids 35
Key Words: protein-protein interaction, heterosis, pseudogene 36
Corresponding Author: 37
Ben J. Evans 38
Phone 905.525.9140 Ext. 26973 39
Fax: 905.522.6066 40
Email: [email protected] 41
Corresponding Author Address: 42
Center for Environmental Genomics 43
Department of Biology, McMaster University, 44
Life Sciences Building Room 328 45
1280 Main Street West, Hamilton, ON, Canada L8S 4K1 46
47
48
3
ABSTRACT 48
49
Allopolyploid species form through the fusion of two differentiated genomes and, in the 50
earliest stages of their evolution, essentially all genes in the nucleus are duplicated. Because unique 51
mutations occur in each ancestor prior to allopolyploidization, duplicate genes in these species 52
potentially are not interchangeable, and this could influence their genetic fates. This study explores 53
evolution and expression of a simple duplicated complex – a heterodimer between RAG1 and RAG2 54
in clawed frogs (Xenopus). Results demonstrate that copies of RAG1 degenerated in different 55
polyploid species in a phylogenetically-biased fashion, predominately in one lineage of closely-related 56
paralogs. Surprisingly, as a result of an early deletion of one RAG2 paralog, it appears that in many 57
species RAG1/RAG2 heterodimers are composed of proteins that were encoded by unlinked paralogs. 58
If the tetraploid ancestor of extant species of Xenopus arose through allopolyploidization and if 59
recombination between paralogs was rare, then the genes that encode functional RAG1 and RAG2 60
proteins in many polyploid species were each ultimately inherited from different diploid progenitors. 61
This observation is consistent with the notion that ancestry can influence the fate of duplicate genes 62
millions of years after duplication, and it uncovers a dimension of natural selection in allopolyploid 63
genomes that is distinct from other genetic phenomena associated with polyploidization or segmental 64
duplication. 65
66
67
4
INTRODUCTION 67
68
In allopolyploid species, the complete genome of two species are fused, genes are duplicated, 69
and the level of genetic redundancy that results is determined by divergence of protein coding and 70
regulatory regions of genes in each ancestor, and by epigenetic phenomena after allopolyploidization. 71
In these genomes, natural selection and stochastic processes govern the genetic fate of each paralog – 72
fates that include redundancy, subfunctionalization, neofunctionalization, and gene silencing (FORCE et 73
al. 1999; GU 2003; HUGHES and HUGHES 1993; LYNCH and FORCE 2000; LYNCH et al. 2001; OHNO 74
1970). After allopolyploidization, interactions between the subgenomes derived from each ancestral 75
species include exchange of chromosome segments (MOSCONE et al. 1996; SKALICKÁ et al. 2005), 76
concerted evolution (VOLKOV et al. 1999; WENDEL et al. 1995), recombination (ZWIERZYKOWSKI et 77
al. 1998), and epistasis (JIANG et al. 2000). Restructuring can alter the stoichiometry of protein 78
interactions by modifying regulatory elements or lowering gene copy number. Presumably these 79
events trigger or permit compensatory responses including changed regulation, degeneration, or 80
deletion of superfluous upstream or downstream paralogs. Modulation of the transcriptome can occur 81
on an extraordinarily fine scale: a silenced paralog from one ancestor can be tightly linked to loci in 82
which only the paralog from the other ancestor is silenced, or linked to loci in which paralogs from 83
both ancestors are expressed (LEE and CHEN 2001). Genomic changes such as subfunctionalization, 84
gene deletion, and gene silencing can occur within generations after allopolyploidization, and these 85
changes can be non-stochastic and repeatable, and biased by ancestry (ADAMS et al. 2003; ADAMS et 86
al. 2004; ADAMS and WENDEL 2005; OZKAN et al. 2001; SHAKED et al. 2001; VOLKOV et al. 1999; 87
WANG et al. 2004). Allopolyploidization can also lead to completely novel expression patterns that 88
were not present in either parental species, perhaps as a result of dosage dependent gene regulation 89
(WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 90
2004), and co-adapted proteins that were inherited from one ancestor may function more efficiently 91
with one another than with proteins that were derived from a different ancestor (COMAI 2000). 92
Epigenetic phenomena, such as nucleolar dominance, gene silencing, alteration of cytosine 93
methylation patterns, and activation of mobile elements, can also foment genetic change (LIU and 94
WENDEL 2002; LIU and WENDEL 2003). However, variation exists among allopolyploid species in the 95
extent of genomic rearrangement – genomic modulation is less common, for example, in the early 96
stages of evolution in synthetic cotton allopolyploids (LIU et al. 2001) than in other allopolyploids 97
such as wheat and arabidopsis (LEVY and FELDMAN 2004; LUKENS et al. 2004; MADLUNG et al. 2002). 98
5
99
Allopolyploid clawed frogs and the RAG1/RAG2 heterodimer. African clawed frogs (genera 100
Xenopus and Silurana) offer a promising model with which to examine the impact of ancestry on gene 101
fate in a allopolyploid genome because multiple independent instances of allopolyploidization 102
occurred (EVANS et al. 2005; EVANS et al. 2004b), and the long term effects of this type of genome 103
duplication can be explored with replication. All extant species of clawed frogs in the genus Xenopus 104
(i.e. those species with multiples of 2x = 18 chromosomes) share a common tetraploid ancestor 105
(EVANS et al. 2005; EVANS et al. 2004b). A definitive test of allopolyploid versus autopolyploid origin 106
of this ancestor is not currently possible because no extant Xenopus diploids (2x = 18) are known. 107
However, this ancestor is suspected to have been an allotetraploid because: (a) other polyploid clawed 108
frogs are definitively allopolyploids (EVANS et al. 2005), (b) Xenopus genomes are diploidized and 109
duplicated pairs of chromosomes have visible differences in secondary constrictions (TYMOWSKA 110
1991), (c) allopolyploid individuals can be created in the laboratory by crossing extant species (KOBEL 111
1996), and (d) multiple unlinked loci indicate that the phylogenetic signal of many paralogs is not 112
blended by recombination (CHAIN and EVANS 2006; CHAIN et al. 2007; EVANS et al. 2005). Silurana 113
and Xenopus diversified from one another roughly 53–64 million years ago and tetraploidization in 114
Xenopus occurred roughly 21–41 million years ago (CHAIN and EVANS 2006; EVANS et al. 2004b). 115
RAG1 and RAG2 proteins form a heterodimer that is crucial for the process of somatic 116
rearrangement of DNA known as V(D)J recombination, making possible the extraordinary molecular 117
variation of B and T-cell antigen receptors that is needed to combat pathogen attack. The core region 118
of RAG1 which, when paired with RAG2 is sufficient to carry out V(D)J recombination, spans human 119
residues 386–1011 out of a total of 1040 amino acids (SADOFSKY et al. 1993). The core region of 120
RAG1 was derived from the Transib transposon superfamily, whereas RAG2 and the N-terminal 121
domain of RAG1 probably appeared through other processes (KAPITONOV and JURKA 2005). These 122
genes are tightly linked in jawed vertebrates; a recent build (version 4.1) of the complete genome 123
sequence of the diploid clawed frog S. tropicalis indicates that there is only one copy of RAG1 and 124
one of RAG2, and that each is convergently transcribed and tightly linked by a ~6.5 kB intergenic 125
region. This is consistent with findings in X. laevis that suggest that this tetraploid was derived from 126
the fusion of two diploid genomes, and that each of these diploid genomes carried only one copy of 127
RAG1 (EVANS et al. 2005; GREENHALGH et al. 1993). Thus, in the absence of paralog deletion and 128
degeneration, a null expectation is that the number of RAG1 and RAG2 paralogs corresponds with the 129
ploidy level of a species – diploids should have one copy of each gene, tetraploids should have two, 130
6
octoploids should have four, and dodecaploids should have six (Fig. 1). Each of these paralogs would 131
have two alleles. 132
Contrary to this null expectation, a previous study reported that in different polyploid species 133
of Xenopus, many RAG1 paralogs had become pseudogenes due to stop codons and frameshift 134
insertion/deletions (EVANS et al. 2005). These species of Xenopus were all ultimately derived from a 135
tetraploid ancestor that probably formed through the amalgamation of two diploid genomes by 136
allopolyploidization; this tetraploid ancestor therefore initially had two paralogs of RAG1 and two 137
paralogs of RAG2. If recombination between paralogs was rare or absent after allotetraploidization, 138
then all of the degenerate paralogs detected by Evans et al. (2005) were ultimately derived from only 139
one of these diploid progenitors – diploid “β” – as opposed to being derived from both of the diploid 140
ancestors (“α” and “β”). 141
This pattern of degeneration could be explained by (a) a shared ancestral degeneration of a 142
coding or regulatory region of RAG1 paralog β that was not sequenced by Evans et al. (2005), (b) 143
multiple independent and biased degenerations in different species, or (c) some combination of shared 144
and independent degeneration (EVANS et al. 2005). This study aims to further investigate these 145
possibilities and track the evolutionary history of this heterodimer through multiple episodes of 146
speciation and genome duplication. To this end, (a) new sequences were obtained from most or all 147
upstream coding regions of RAG1 paralogs to test for shared coding-region degeneration, (b) 148
expression analyses were performed on multiple species to test for ancestral silencing of paralogs, and 149
(c) sequences were obtained from paralogs of the linked partner gene RAG2. 150
151
METHODS 152
153
DNA sequencing. Sequence data were obtained from all or almost all RAG1 and RAG2 paralogs 154
from all known species of clawed frog, averaging 2341 base pairs (bp) out of about 3135 bp total per 155
RAG1 paralog and 1123 bp out of 1575 bp total per RAG2 paralog (Suppl. Fig. 1; Genbank Accession 156
Numbers: AY874308, AY874311, AY874309, AY874310, AY874324, AY874346, AY874352, 157
EF535883-EF535988). The sequenced portion of each RAG1 paralog varied but generally covers 158
most or all of the core region of RAG1 (Suppl. Fig. 2). Some paralogs were not detected using a 159
variety of primer combinations, and these may have been deleted (Suppl. Fig. 1). Sequencing of 160
individual paralogs was accomplished through a combination of TA cloning (Invitrogen) and targeted 161
amplification using paralog-specific primers (Suppl. Fig. 1). Allelic clones with the least number of 162
7
autapomorphic mutations were selected for analysis. When both alleles of a paralog were directly 163
sequenced with paralog-specific primers, polymorphic positions were analyzed using IUPAC 164
degenerate nucleotide symbols. Some sequence data from expressed paralogs were obtained from 165
non-vouchered individuals; these data were conatenated with sequences from the corresponding 166
paralog from other individuals that usually were vouchered, as detailed in (EVANS et al. 2005; EVANS 167
et al. 2004b). Following (EVANS et al. 2005), paralogs of RAG1 and RAG2 that are closely related to 168
the linked X. laevis paralogs identified by Greenhalgh et al (1993) are referred to as β paralogs and the 169
others are referred to as α paralogs. In Silurana, tetraploid paralogs closely related to the diploid S. 170
tropicalis are designated α paralogs, and the others are β paralogs. 171
Attempts to amplify and clone the α paralogs of RAG2 failed in all species of Xenopus using a 172
variety of primer combinations (Suppl. Info. 1), even though both paralogs were detected in the 173
Silurana tetraploids. To rigorously test whether RAG2 α paralogs were deleted in clawed frogs as 174
opposed to just not being amplified, a systematic effort was made to co-amplify both paralogs using 175
seven pairwise combinations of three forward and three reverse primers (Suppl. Fig. 1). S. tropicalis 176
was used as a positive control to demonstrate that in addition to Xenopus RAG2 paralog β, these 177
primers can successfully amplify RAG2 in a more distantly related lineage than Xenopus RAG2 178
paralog α. 179
180
Amplification of expressed RAG1 paralogs. To determine which RAG1 paralogs are expressed in 181
various species, cDNA was amplified across an intron in the 5’ untranslated region of the RAG1 182
transcript (GREENHALGH et al. 1993). Negative controls with no DNA and with genomic DNA were 183
performed to insure that only expressed and spliced paralogs were amplified; previously published and 184
new primers were used (Suppl. Fig. 1; GREENHALGH et al. 1993). RNA was extracted using the 185
RNeasy mini kit (Qiagen) and converted to cDNA using the Omniscript RT kit (Qiagen). Individual 186
expressed paralogs were amplified from cDNA generated from different tissues (brain, liver, spleen, 187
testis, and/or bone marrow) from a variety of species (X. laevis, X. gilli, X. borealis, X. muelleri, X. 188
amieti, X. andrei, X. new octoploid, and X. boumbaensis), and then cloned and sequenced. 189
Additionally, cDNA from some tissues was directly sequenced and chromatograms were inspected for 190
paralog-specific single nucleotide polymorphisms. 191
192
Phylogenetic Analyses. Phylogenetic analysis was performed on coding portions of RAG1 and 193
RAG2 in three types of data configurations: (1) each locus was analyzed independently, (2) putatively 194
8
linked paralogs were combined into single taxonomic units, and (3) for Xenopus only, a “synthetic” 195
dataset was constructed in which α and β paralogs of RAG1 and RAG2 that were derived from the 196
same most recent tetraploid ancestor were combined into a single taxonomic unit. This third 197
configuration exploits the redundant phylogenetic information available in co-inherited paralogs. For 198
example, in the third data configuration, tetraploid α and β paralogs were combined, octoploid α1 and 199
β1 paralogs were combined, but octoploid α1 and β2 paralogs were not combined. For the third 200
analysis, S. tropicalis was used as an outgroup to RAG1 α and RAG2 paralogs and no outgroup 201
sequence was used for the portion of the sequences that was comprised of RAG1 β paralogs. 202
MrBayes version 3.1.2 was used for Bayesian phylogenetic analysis, and Bayes factors were 203
used to select a model of evolution as described by Nylander et al. (2004). Seven partitioned models 204
were explored (Suppl. Table 1). These models were compared based on the harmonic mean of the 205
posterior probability of trees sampled after a conservative burnin of one million generations from two 206
independent MCMC runs, each of 2 million generations. A highly parameterized model of evolution 207
was favored for phylogenetic analyses of RAG1 and RAG2 (Suppl. Table 1), and this model was also 208
used for the combined analyses, which used for 5 million generations and the same burnin. Branch 209
support was also evaluated with 2,000 nonparametric bootstrapping replicates, each with a single 210
replication of random taxon addition, a limit of ten million rearrangements per replicate, and the 211
maximum parsimony criterion using PAUP version 4.0b10 (SWOFFORD 2002). Almost all of the well-212
supported relationships from the Bayesian analyses also have nonparametric bootstrap values of over 213
80% (Suppl. Fig. 3) 214
To test hypotheses of fewer gene silencing events in RAG1 than was suggested by 215
phylogenetic analysis, parametric bootstrap tests (GOLDMAN et al. 2000; HUELSENBECK et al. 1996) 216
were performed as in (EVANS et al. 2005). This procedure tests the fit of the data to alternative 217
phylogenetic hypotheses that are different from the consensus tree that was obtained from the Bayesian 218
analysis, and that would be consistent with fewer instances of independent gene degeneration (Suppl. 219
Fig. 4). To maximize the phylogenetic signal of these tests, simulations were performed according to 220
the “synthetic” data configuration using Seq-Gen version 1.3.2 (RAMBAUT and GRASSLY 1997). Perl 221
scripts were written to modify the patchiness the simulations to match the observed data in terms of the 222
quantity and positions of missing data for each taxon. 223
A caveat to the interpretation that autapomorphic degenerations occurred independently is that 224
recurrent substitutions could have erased an ancestral stop codon in one or both descendant paralogs. 225
To explore this possibility, marginal ancestral reconstruction of ancestral character states was 226
9
performed with a general time reversible nucleotide model and a gamma distributed rate heterogeneity 227
parameter using the baseml program of PAML version 3.14 (YANG 1997). Reconstructed sequences 228
were then translated into protein using MacClade version 4.08 (MADDISON and MADDISON 2000) and 229
inspected for stop codons. 230
231
Testing for phylogenetic bias in RAG1 degeneration. To explore whether there is significant bias in 232
gene degeneration of RAG1 with respect to ancestry of each paralog, two approaches were taken. The 233
first approach used a maximum likelihood framework to compare rates of degeneration (δ), using 234
Discrete version 4.0 (PAGEL 1994). This framework tested whether rates of degeneration in RAG1 and 235
RAG2 were significantly different, and also whether rates of degeneration in the RAG1 α and β 236
lineages were significantly different. The rate of resuscitation of degenerate paralogs was set to a 237
negligible value, missing paralogs were coded as degenerate, and the most recent ancestor of expressed 238
paralogs with autapomorphic degenerations was set as nondegenerate. Likelihood ratio tests were used 239
to compare models with two degeneration parameters to models with only one, and these tests were 240
performed with and without a gamma-distributed approximation for rate heterogeneity (γ). To be 241
conservative, modified topologies were used in which the independent degenerations (numbered 1-12 242
in Fig. 2) were each forced to each be a clade. Branch lengths were estimated under the GTR+I+Γ 243
model and imposing a molecular clock using PAUP* (SWOFFORD 2002). 244
A second approach used simulations to estimate the probability that the observed number of 245
“chimerical” heterodimers – those composed of RAG1 and RAG2 proteins from different paralogous 246
lineages – occurred by chance if there were no bias to RAG1 degeneration. It was assumed that the 247
number of observed degenerate paralogs in each species follows a binomial distribution with a species-248
specific probability density for the mean of this distribution that was calculated from the observed 249
data. It was also assumed that at least one RAG1 paralog must remain functional in each species, that 250
no simulated degeneration could occur in species with no observed degeneration, that each non-251
degenerate paralog is expressed at the same intensity, time, and place, and that RAG1/RAG2 252
heterodimers form from paralogs from the same ancestral lineage (α or β) whenever possible. For 253
eight extant or ancestral species with observed independent RAG1 degeneration (see Results), one 254
hundred thousand simulations drew k degeneration events from n-1 paralogs, where n is the total 255
number of non-degenerate paralogs inferred to be present when that species originated. Degeneration 256
was modeled as a delayed transformation phenomenon wherein paralogs degenerated after 257
allopolyploidization and they were also performed in a phylogenetically-independent manner such that 258
10
ancestral degenerations were inherited and not “re-simulated” in descendant species. Chimerical 259
heterodimers were quantified for each simulation based on observed and suspected degenerations and 260
deletions of RAG2 (i.e. degenerations 10, 11, and 12 in Fig. 2). 261
262
Recombination between alleles of different paralogs. Recombination between alleles of different 263
paralogs could blend their phylogenetic signal, and recombination between the intergenic regions of 264
paralogous chromosomes could alter the paralogs synteny (GREENHALGH et al. 1993). To test for 265
evidence of recombination in Xenopus and Silurana sequences, multiple tests were used because their 266
performance varies with the level of divergence, the extent of recombination, and among site rate 267
heterogeneity (POSADA 2002; POSADA and CRANDALL 2001). These tests included the recombination 268
detection program, geneconv, chimera, bootscan, and siscan, as implemented by the Recombination 269
Detection Program (MARTIN and RYBICKI 2000). Details of these methods can be found elsewhere 270
(GIBBS et al. 2000; MARTIN and RYBICKI 2000; MAYNARD SMITH 1992; PADIDAM et al. 1999; POSADA 271
and CRANDALL 2001; SALMINEN et al. 1995). A variety of parameter settings were explored for each 272
method as in (EVANS et al. 2005). 273
274
RESULTS 275
276
New data lead to a re-assessment of the evolutionary history of X. boumbaensis. Evolutionary 277
relationships inferred from RAG1 (this study; EVANS et al. 2005), mitochondrial DNA (EVANS et al. 278
2004b), and RAG2 (Fig. 2, Suppl. Fig. 3) can be synthesized into a reticulate phylogeny (Fig. 3). New 279
data identified an exception to expected relationships among paralogs in an individual herein referred 280
to as X. cf. boumbaensis. Additional sequencing from X. cf. boumbaensis identified a third RAG1 α 281
paralog and two RAG2 paralogs whose relationships are suggestive of dodecaploidy (Fig 1). This 282
suggests that this individual from Younde, Cameroon, was previously incorrectly classified as X. 283
boumbaensis (EVANS et al. 2004a; EVANS et al. 2005), which is an octoploid species (LOUMONT 1983; 284
TYMOWSKA 1991). It appears that X. cf. boumbaensis is a dodecaploid derived from 285
allopolyploidization between an octoploid ancestor of X. boumbaensis and a tetraploid ancestor of X. 286
cf. fraseri 2 (Fig. 3). Data from an X. boumbaensis individual from the type locality of Moloundou, 287
Cameroon, that were not included in EVANS et al. (2005), are consistent with octoploidy (Fig. 2, 3) and 288
lead to a re-evaluation of the evolutionary history of this species. Moreover it appears that X. 289
boumbaensis, X. ameiti, and X. andrei share a common octoploid ancestor, as opposed to each 290
11
originating independently as was previously proposed (EVANS et al. 2005). Thus these genealogies 291
support three rather than five independent allopolyploid origins of most extant octoploids: (1) X. 292
vestitus, (2) X. wittei + X. new octoploid, and (3) X. amieti + X. andrei + X. boumbaensis, but three 293
rather than two independent origins of dodecaploids: (1) X. ruwenzoriensis, (2) X. longipes, and (3) X. 294
cf. boumbaensis (Fig. 3). This new information also changes the number of ancestral species that are 295
predicted but for whom an extant descendant with the same ploidy level is unknown from three 296
diploids, three tetraploids, and one octoploid (EVANS et al. 2005) to three diploids and three tetraploids 297
(Fig. 3). Mitochondrial DNA sequences from X. cf. boumbaensis are almost identical to a X. 298
boumbaensis sample from the type locality (EVANS et al. 2004b), suggesting recent dodecaploidization 299
of this individual. 300
301
Deletion of RAG2 and RAG1 paralogs. Both paralogs of RAG2 were amplified, cloned, and 302
sequenced from genomic DNA in Silurana tetraploids but only one divergent lineage of RAG2 was 303
amplified in Xenopus. Systematic attempts to amplify additional RAG2 paralogs in Xenopus with 304
seven combinations of other primers (Suppl. Fig. 1) were unsuccessful, even though these primers 305
successfully amplified a more divergent RAG2 ortholog in the diploid S. tropicalis. Southern blotting 306
of X. laevis genomic DNA also detected only one RAG2 paralog but two RAG1 paralogs 307
(GREENHALGH et al. 1993). This suggests that one RAG2 paralog was deleted in an early tetraploid 308
ancestor (deletion 10 in Fig. 2), as opposed to the alternatives that gene conversion occurred or that 309
another paralog is present but undetected. All Xenopus species are therefore suspected to have half of 310
the number of RAG2 paralogs than RAG1 paralogs. 311
Another suspected deletion of RAG2 occurred in an ancestral paralog from which paralog β2 312
of X. amieti, X. ruwenzoriensis, X. boumbaensis, and X. longipes, and RAG2 paralog β3 of X. cf. 313
boumbaensis would have descended (deletion 12 in Fig. 2). These predicted paralogs were not 314
detected in amplifications from genomic DNA, even after multiple attempts with different primer 315
combinations (Suppl. Fig. 1). Linked RAG1 paralogs from these species also were not detected 316
(deletion 12 in Fig. 2), suggesting that the deleted region spans both of them. Apart from these putative 317
deletions, the only other observed gene degeneration in RAG2 was in X. andrei RAG2 paralog β2, 318
which experienced a frameshift deletion. 319
320
Independent degeneration of RAG1 paralogs. To test whether the unique degenerations in the 3’ 321
region of RAG1 (EVANS et al. 2005) could have occurred after a shared ancestral degeneration in the 322
12
5’ coding region, RAG1 sequences were obtained from most of the coding region of most or all RAG1 323
paralogs of all known species of clawed frog (Suppl. Fig. 2). Including previously identified 324
degenerate paralogs (EVANS et al. 2005), a total of seventeen degenerate RAG1 β paralogs and two 325
degenerate RAG1 α paralogs were detected in various Xenopus species (Fig. 2, Suppl. Fig. 2). The 326
only shared stop codons or frameshift mutations that were identified were: (1) one frameshift and one 327
stop codon shared by X. pygmaeus paralog β and X. ruwenzoriensis paralog β3, (2) a stop codon 328
shared by X. vestitus paralog β1 and X. longipes paralog β3, and (3) a stop codon shared by the X. 329
ruwenzoriensis paralog β1 and X. vestitus paralog β2. Maximum likelihood reconstructions indicate 330
that the first example is due to shared ancestry, but that the other two evolved independently. 331
Expression of at least one RAG1 paralog was confirmed in heart, brain, liver, testes, spleen, 332
and bone marrow (Table 1). Xenopus laevis and X. gilli each express both of their RAG1 paralogs and 333
neither is degenerate. Likewise, no degeneration was observed at the DNA level in either RAG1 334
paralog of some other tetraploids including X. clivii, X. largeni, and X. muelleri west (Suppl. Fig. 2). 335
Degeneration of many paralogs appears to have occurred independently because (a) the stop 336
codons and frameshift mutations are not shared with other paralogs and because (b) these degenerate 337
paralogs are either still expressed or closely related to other expressed paralogs. X. andrei RAG1 338
paralog β1, X. boumbaensis RAG1 paralog β1, and X. amieti RAG1 paralog α2, for example, are 339
expressed even though each one is degenerate (degenerations 4, 7, and 9, respectively, in Figs. 2 and 340
4). Overall, under the assumptions of no resuscitation of silenced genes, phylogenetic analyses 341
support a minimum of seven independent episodes of degeneration of the Xenopus RAG1 β paralogs 342
and two independent episodes in the Xenopus RAG1 α paralogs (Figs. 2 and 4). 343
The minimum seven instances of gene silencing RAG1 β paralogs, listed by numbers 344
corresponding with Figs. 2 and 4, include: (1) a heterozygous stop codon in X. borealis RAG1 paralog 345
β – expression of this paralog also was not detected in multiple tissues (Table 1), (2) RAG1 paralog β 346
of a tetraploid ancestor of both of the tetraploid ancestors of X. vestitus, (3) RAG1 paralog β3 of X. 347
longipes or its tetraploid ancestor, (4) X. andrei RAG1 paralog β1, (5) X. wittei RAG1 paralog β1, (6) 348
X. amieti RAG1 paralog β1, and (7) X. boumbaensis RAG1 paralog β1. The two examples of 349
degeneration in the RAG1 α paralogs are (8) X. ruwenzoriensis paralog α2 and (9) X. amieti paralog 350
α2 (Fig. 2). Both instances of degeneration of RAG1 paralog α occurred very recently, and potentially 351
after protein-protein interactions between RAG1 paralog α and RAG2 paralog β were already 352
established by pseudogenization of other RAG1 paralogs. 353
13
Nine additional instances of degeneration of RAG1 β paralogs are possible but their 354
independence depends on whether ancestral gene silencing preceded autapomorphic degeneration of 355
the coding regions of various paralogs (Fig. 2). Expression of X. muelleri RAG1 paralog β, for 356
example, was not detected in multiple tissues (Table 1) and expression of this paralog may have been 357
silenced ancestrally prior to the speciation of X. muelleri and X. borealis and coding region 358
degeneration in X. borealis (degeneration #1 in Fig. 2 and 4). Likewise, a lack of detected expression 359
of X. new octoploid RAG1 paralog β2 could be explained by silenced expression of paralog β in one of 360
the tetraploid ancestors of X. vestitus, X. wittei, and X. new octoploid. If this were the case, then 361
degeneration of RAG1 paralog β1 and β2 in X. vestitus, paralog β2 in X. wittei, and paralog β2 in X. 362
new octoploid should be considered a single event (degeneration 2 in Figs. 2 and 4), even though no 363
shared degeneration of the coding region of these paralogs was observed (Suppl. Info. 2). 364
Parametric bootstrap tests strongly reject hypotheses of fewer independent degenerations in the 365
β lineage of RAG1 (P < 0.001). Of note is that the full sequence of X. boumbaensis RAG1 paralog β1 366
was not obtained (Suppl. Fig. 2) so the possibility of shared degeneration with another closely related 367
paralog cannot be completely dismissed. Also of interest is the observation that in X. new octoploid, 368
the intensity of paralog specific polymorphisms on sequence chromatograms suggests that expression 369
of RAG1 paralog α1 was higher than paralog α2 in the brain, whereas the opposite was observed in 370
amplifications of RAG1 from heart cDNA from this species (Table 1), a pattern of expression that is 371
suggestive of subfunctionalization (FORCE et al. 1999). 372
373
Significant bias to RAG1 degeneration. Comparison of alternative parameterizations of a model of 374
stochastic degeneration indicates that the rate of degeneration of the RAG1 β lineage is significantly 375
higher than the RAG1 α lineage with a gamma rate heterogeneity model (P = 0.0274, δRAG1α= 6.5519, 376
δRAG1β= 50.0343, γ RAG1α = 0.0082, γ RAG1β = 0.0042, df = 2) or without one (P = 0.0054, δRAG1α= 4.8927, 377
δRAG1β= 34.4187, df = 1). Additionally, the overall rate of degeneration of RAG1 is significantly 378
higher than the rate of degeneration of RAG2 when modeled with a gamma rate heterogeneity model 379
(P = 0.0431, δRAG1 = 31.2407, δRAG2 = 3.08511, γ RAG1 = 0.0082, γ RAG2 = 0.0096, df = 2) and without one 380
(P = 0.0323, δRAG1 = 7.8978, δRAG2 = 1.8141, df = 1). 381
Because some species may have inherited RAG1 β paralogs that were already degenerate, even 382
if subsequent degeneration of RAG1 were unbiased, chimerical heterodimers would still be expected 383
to occur. However, under the assumptions discussed above and given the observed pattern of deletion 384
14
in RAG2, simulations indicate that the observed number of chimerical heterodimers that resulted from 385
RAG1 degeneration is significantly in excess of expectations if RAG1 degeneration were unbiased (P 386
< 0.0001, Fig. 5). 387
388
Recombination between RAG1 paralogs and between RAG2 paralogs is rare. Using 389
congruence of multiple tests as a criterion for credibility (POSADA 2002; POSADA and CRANDALL 390
2001), there is not convincing evidence of recombination between paralogs of RAG1, including when 391
new data are included, or between paralogs of RAG2 (this study; EVANS et al. 2005). No two tests 392
recovered significant evidence of recombination for the same region of any paralog of RAG1 or RAG2 393
and most tests did not recover any signal of recombination at all. A few putative recombination events 394
that had a parent sequence within the same species were individually identified by various tests but 395
these were deemed not credible based on a visual inspection of the data and a lack of corroboration by 396
other tests (POSADA 2002; POSADA and CRANDALL 2001). 397
Linkage of RAG1 paralog β and RAG2 paralog β has been demonstrated (GREENHALGH et al. 398
1993) and a similar linkage structure is suggested by a putative deletion that spanned RAG1 and 399
RAG2 paralogs of an ancestor of X. amieti, X. ruwenzoriensis, X. boumbaensis, X. cf. boumbaensis, 400
and X. longipes (deletion 12 in Fig. 2). The possibility that recombination occurred between the 401
intergenic region that separates linked alleles of RAG1 and RAG2 is difficult to conclusively rule out 402
because a diploid Xenopus (2x=18), which could confirm the ancestral synteny of RAG1 and RAG2 403
paralogs, is not available. However, phylogenetic congruence among RAG1, RAG2, and mtDNA, and 404
the internal consistency between the α and β genealogies of RAG1 support the contention that 405
recombination between these paralogs is rare (Fig. 2; EVANS et al. 2005; EVANS et al. 2004b). 406
Recombination among paralogous alleles is expected to be rare if diploidization of these allopolyploid 407
genomes occurred soon after their formation, which is typical of allopolyploid cotton, for example 408
(CRONN et al. 1999). 409
410
411
DISCUSSION 412
413
Evolution of a heterodimer in allopolyploid clawed frogs. The crucial process of V(D)J 414
recombination requires the action of a heterodimer that is made up of the RAG1 and RAG2 proteins, 415
which are encoded by tightly linked genes. The genes whose protein products form this heterodimer 416
15
were duplicated, probably by allo- rather than auto-polyploidization, to generate a tetraploid ancestor 417
of all extant species in the genus Xenopus. Because allotetraploids inherit a complete genome from 418
two different diploid species, this tetraploid ancestor initially had two linked sets of RAG1 and RAG2 419
paralogs. Immediately after allopolyploidization, the RAG1/RAG2 heterodimers probably had 420
essentially identical functions but their dosage could have varied as a result of unique mutations that 421
occurred in each of the diploid ancestors. At this point the paralogous expression domains were also 422
probably similar, making possible protein-protein interactions between paralogs inherited from 423
different diploid ancestors. These linked sets of interacting paralogs were then duplicated again when 424
multiple independent episodes of allopolyploidization generated new octoploid and dodecaploid 425
species (Fig. 3; EVANS et al. 2005). 426
Before the tetraploid Xenopus ancestor diversified to give rise to the extant species, it appears 427
that one RAG2 paralog – paralog α – was deleted, leaving only the other RAG2 paralog – paralog β, 428
but still two functional paralogs of RAG1, α and β (Figs. 2, 4). Later, after speciation of this 429
tetraploid ancestor without change in genome size and also after further episodes of 430
allopolyploidization, paralogs of RAG1 in different species independently degenerated, making their 431
copy number similar or equal to RAG2. Surprisingly, until very recently degeneration of RAG1 432
paralogs occurred exclusively in closely related members of paralog lineage β. As a result, many 433
Xenopus species have functional paralogs of RAG1 that are exclusively α paralogs, and their protein 434
products must heterodimerize with proteins encoded by RAG2 β paralogs (Figs. 2, 4). Moreover, in 435
many species it appears that the functional copy of RAG1 is linked to a region where a paralog of 436
RAG2 was deleted and the functional copy of RAG2 is linked a paralog of RAG1 that either 437
degenerated or that was deleted. While the tetraploid species X. laevis and X. gilli still express 438
functional linked β paralogs of RAG1 and RAG2 and also an α paralog of RAG1, in most other 439
Xenopus species the only functional paralogs of RAG1 and RAG2 are unlinked, suggesting that each 440
one was derived from a different diploid ancestor. In contrast, allotetraploid clawed frogs that evolved 441
independently in the genus Silurana (4x = 40) retain both paralogs of RAG1 and of RAG2, all appear 442
functional at the DNA level in the portions of these genes that were sequenced, and both RAG1 443
paralogs are expressed in at least one Silurana tetraploid (S. new tetraploid 1). 444
The rate of degeneration is significantly higher in RAG1 β paralogs than in RAG1 α paralogs 445
and significantly higher in RAG1 than in RAG2. Simulations also indicate that, given the observed 446
pattern of degeneration of RAG2, the probability of unbiased degeneration of RAG1 producing by 447
chance such a high number of chimerical heterodimers derived from unlinked paralogs of RAG1 and 448
16
RAG2 is very low. It is surprising that degeneration of RAG1 paralogs occurred in this manner 449
because an allopolyploid origin of the tetraploid ancestor of Xenopus would suggest that unlinked 450
paralogs share a shorter co-evolutionary history than do the linked ones. 451
Unique characteristics of each subgenome could account for the biased degeneration of RAG1 452
paralog β. Epigenetic phenomena after allopolyploidization could contribute to this bias, for example, 453
if paralog expression were differently affected by asymmetric mobility of transposable elements in 454
each subgenome. Alternatively, genetic explanations for nonrandom degeneration of RAG1 paralogs 455
include (1) “intra-paralog” phenomena, such as differences in dosage of each RAG1 paralog, or (2) 456
“inter-molecular” phenomena involving selection on interactions between the protein products of 457
specific RAG1 paralogs and other molecules. This second genetic explanation includes scenarios 458
involving a disadvantage (negative selection) or an advantage (positive selection) to interactions 459
between specific paralogs of RAG1 and other molecules, which may or may not include proteins 460
encoded by specific paralogs of RAG2. 461
462
Dosage as an explanation for non-random gene silencing of RAG1. One explanation for biased 463
degeneration of RAG1 is that, after allotetraploidization in Xenopus, expression of the RAG1 paralog 464
that was derived from diploid ancestor α was greater than the one that was derived from diploid 465
ancestor β. If this difference in dosage meant that the RAG1 paralog α could operate alone in a 466
polyploid genome whereas RAG1 paralog β could not, then the former would be both necessary and 467
sufficient. Under this scenario, the function of each RAG1 paralog would be identical and 468
interchangeable and the difference driving degeneration of RAG1 lineage β would be one of quantity, 469
not quality, of proteins encoded by RAG1 paralogs. However, because copy number of RAG2 was 470
reduced by deletion in an ancestor before any RAG1 copies degenerated, dosage constraints on RAG1 471
after allopolyploidization would have to have been imposed by other co-factors of RAG1. 472
Haploinsufficient phenotypes result from reduced expression or activity of a heterozygous 473
locus, and these phenotypes could stem from multiple mechanisms including altered enzymatic 474
stoichiometry (VEITIA 2002). The importance of stoichiometric requirements is suggested in yeast in 475
that proteins that form complexes are rarely members of large gene families and underexpression or 476
overexpression of these genes can be deleterious (PAPP et al. 2003). In the same way, ancestral 477
differences in gene dosage could influence allopolyploid phenotypes and ultimately affect the genetic 478
fate of paralogs in an allopolyploid genome. For example, laboratory generated allopolyploids with 479
two Z chromosomes from X. gilli and one W chromosome from X. laevis were mostly male whereas 480
17
individuals with the same W chromosome but one Z chromosome from X. gilli and one Z chromosome 481
from X. muelleri (or X. laevis) were only female (KOBEL 1996). Dosage could also be an important 482
factor in the evolution of dominance. That wildtype alleles are generally dominant over new mutations 483
could be a byproduct of selection for surplus capability that is needed to operate under heterogeneous 484
conditions (CHARLESWORTH 1979; FORSDYKE 1994; KACSER and BURNS 1981; ORR 1991; WRIGHT 485
1934). 486
487
Selection on interactions between specific paralogs of RAG1 and other molecules. If the 488
tetraploid ancestor of Xenopus evolved through allopolyploidization, negative selection on molecular 489
interactions could arise from Dobzhansky-Muller incompatibilities (DOBZHANSKY 1937; MULLER 490
1942). However, Dobzhansky-Muller incompatibilities between paralogs of RAG1 and RAG2 could 491
explain the degeneration of specific RAG1 paralogs only if the paralogs that are currently functional 492
were derived from the same diploid ancestor. But this does not appear to be the case because, based on 493
comparison to linked paralogs in X. laevis (GREENHALGH et al. 1993), the only functional RAG1 and 494
RAG2 paralogs in most species are not in synteny. Thus, similar to the dosage hypothesis, if 495
Dobzhansky-Muller incompatibilities drove degeneration of RAG1 paralog β, they would probably 496
involve an interaction other than that with the protein product of the RAG2 gene. This could include 497
interactions with other protein co-factors of V(D)J recombination or with the recombination signal 498
sequences (RSSs) or the spacer regions between RSSs that flank variable, diversity, and joining 499
segments (RAMSDEN et al. 1994; SAKANO et al. 1979). 500
If the tetraploid ancestor of Xenopus was allopolyploid, positive selection could account for 501
this pattern of gene silencing in at least two ways. First, co-adaptation of paralogs of RAG1 and 502
RAG2 could have occurred in one of the diploid ancestors and then these paralogs could have been 503
unlinked by recombination after allopolyploidization. However, the co-adaptation hypothesis is 504
disfavored for the same reason that Dobzhansky-Muller incompatibilities between RAG1 and RAG2 505
are an unlikely explanation – recombination between alleles of different paralogs appears rare and 506
functional RAG1 and RAG2 paralogs in many species therefore are probably derived from different 507
diploid ancestors. A second possibility is that there is a performance advantage to protein-protein 508
interactions between RAG1 and RAG2 paralogs that are derived from different diploid ancestors. This 509
scenario posits an advantage, or heterosis, to a combination of two evolutionarily naïve proteins over a 510
heterodimer whose constituents have a longer period of coevolutionary history. Heterosis is also 511
18
suggested, for example, in Xenopus polyploids that are resistant to parasites that infect both of their 512
parental taxa (JACKSON and TINSLEY 2003). 513
Explanations for heterosis include overdominance, dominance, or epistasis. The 514
overdominance hypothesis posits that heterozygous loci are more fit than homozygous loci (whether 515
dominant or recessive) (HULL 1945; SHULL 1908), and this could apply to allopolyploids that co-516
express alleles from each parental species. In fact, allopolyploidization provides a way to maintain 517
heterosis associated with heterozygosity because alleles from each parental species are forced to co-518
segregate in disomic allopolyploids. In the current case, however, the overdominance explanation for 519
heterosis does not apply because, while many allopolyploids have a RAG1/RAG2 heterodimer 520
comprised of proteins derived from genes with different ancestry, each paralog that encodes the 521
proteins in these heterodimers is homozygous with respect to their diploid ancestry. The dominance 522
explanation for heterosis posits that if dominant alleles are more fit, hybrids (or allopolyploids) would 523
benefit from the dominant alleles from each parent (DAVENPORT 1908). Dominance can result from 524
differences in dosage, which was discussed earlier, and could also be a consequence of epistasis, so 525
these explanations for heterosis are not mutually exclusive. Epistasis could lead to heterosis in an 526
allopolyploid if there were a performance advantage to interactions between proteins derived from 527
different ancestors. 528
Taken together, these observations suggest that allopolyploid transcriptomes are sculpted by 529
natural selection on each subgenome. Mutational or regulatory differences that accumulated in each 530
ancestor may be advantageous or deleterious, and their paralogs can be preserved or discarded after 531
genome fusion based on their performance and their interactions with molecules from the other 532
subgenome. In some cases, pseudogenization is strongly biased with respect to ancestry over millions 533
of years, and chimerical interactions between proteins from different ancestors may be favored over 534
interactions between proteins that share a longer period of co-evolution. 535
536
ACKNOWLEDGMENTS 537
538
I am grateful to Brian Golding for helpful discussion about this project and for access to computational 539
resources in his laboratory and to two anonymous reviewers for providing comments on this 540
manuscript. This research was supported by the Canadian Foundation for Innovation, the National 541
Science and Engineering Research Council, the Ontario Research and Development Challenge Fund, 542
and McMaster University. 543
19
544
LITERATURE CITED 545
546
ADAMS, K. L., R. CRONN, R. PERCIFELD and J. F. WENDEL, 2003 Genes duplicated by polyploidy 547
show unequal contributions to the transcriptome and organ-specific reciprocal silencing. 548
Proceedings of the National Academy of Sciences 100: 4649. 549
ADAMS, K. L., R. PERCIFELD and J. F. WENDEL, 2004 Organ-specific silencing of duplicated genes in a 550
newly synthesized cotton allotetraploid. Genetics 168: 2217-2226. 551
ADAMS, K. L., and J. F. WENDEL, 2005 Polyploidy and genome evolution in plants. Current Opinion in 552
Plant Biology 8: 135-141. 553
CHAIN, F. J. J., and B. J. EVANS, 2006 Molecular evolution of duplicate genes in Xenopus laevis is 554
consistent with multiple mechanisms for their retained expression. PLoS Genetics 2: e56. 555
CHAIN, F. J. J., D. ILIEVA and B. J. EVANS, 2007 Immediate purifying selection, dosage requirements, 556
and expression divergence following gene duplication by allopolyploidization. Submitted. 557
CHARLESWORTH, B., 1979 Evidence against Fisher's theory of dominance. Nature 278: 848-849. 558
COMAI, L., 2000 Genetic and epigenetic interactions in allopolyploid plants. Plant Molecular Biology 559
43: 387-399. 560
CRONN, R. C., R. L. SMALL and J. F. WENDEL, 1999 Duplicated genes evolve independently after 561
polyploid formation in cotton. Proceedings of the National Academy of Sciences 96: 14406-562
14411. 563
DAVENPORT, C. B., 1908 Degeneration, albinism and inbreeding. Science 28: 454-455. 564
DOBZHANSKY, T., 1937 Genetics and the origin of species. Columbia University Press, New York. 565
EVANS, B. J., D. C. CANNATELLA and D. J. MELNICK, 2004a Understanding the origins of areas of 566
endemism in phylogeographic analyses: a reply to Bridle et al. Evolution 58: 1397-1400. 567
20
EVANS, B. J., D. B. KELLEY, D. J. MELNICK and D. C. CANNATELLA, 2005 Evolution of RAG-1 in 568
polyploid clawed frogs. Molecular Biology and Evolution 22: 1193-1207. 569
EVANS, B. J., D. B. KELLEY, R. C. TINSLEY, D. J. MELNICK and D. C. CANNATELLA, 2004b A 570
mitochondrial DNA phylogeny of clawed frogs: phylogeography on sub-Saharan Africa and 571
implications for polyploid evolution. Molecular Phylogenetics and Evolution 33: 197-213. 572
FORCE, A., M. LYNCH, B. PICKETT, A. AMORES, Y. L. YAN et al., 1999 Preservation of duplicate genes 573
by complementary, degenerative mutations. Genetics 151: 1531-1545. 574
FORSDYKE, D. R., 1994 The heat-shock response and the molecular basis of genetic dominance. 575
Journal of Theoretical Biology 167: 1-5. 576
GIBBS, M. J., J. S. ARMSTRONG and A. J. GIBBS, 2000 Sister-Scanning: a Monte Carlo procedure for 577
assessing signals in recombinant sequences. Bioinformatics 16: 573-582. 578
GOLDMAN, N., J. P. ANDERSON and A. G. RODRIGO, 2000 Likelihood-based tests of topologies in 579
phylogenetics. Systematic Biology 49: 652-670. 580
GREENHALGH, P., C. E. M. OLESEN and L. A. STEINER, 1993 Characterization and expression of 581
Recombination Activating Genes (RAG-1 and RAG-2) in Xenopus laevis. Journal of 582
Immunology 151: 3100-3110. 583
GU, X., 2003 Evolution of duplicate genes versus genetic robustness against null mutations. Trends in 584
Genetics 19: 354-356. 585
HUELSENBECK, J. P., D. M. HILLIS and R. JONES, 1996 Parametric bootstrapping in molecular 586
phylogenetics: applications and performance., pp. 19-45 in Molecular Zoology: Advances, 587
Strategies, and Protocols., edited by J. D. FERRARIS and S. R. PALUMBI. Wiley-Liss, New 588
York. 589
HUGHES, M. K., and A. L. HUGHES, 1993 Evolution of duplicate genes in a tetraploid animal, Xenopus 590
laevis. Molecular Biology and Evolution 10: 1360-1369. 591
21
HULL, F. G., 1945 Recurrent selection and specific combining ability in corn. Journal of the American 592
Society of Agronomy 37: 134-145. 593
JACKSON, J. A., and R. C. TINSLEY, 2003 Parasite infectivity to hybridising host species: a link 594
between hybrid resistance and allopolyploid speciation? International Journal for Parasitology 595
33: 137-144. 596
JIANG, C.-X., P. W. CHEE, X. DRAYE, P. L. MORRELL, C. W. SMITH et al., 2000 Multilocus 597
interactions restrict gene introgression in interspecific populations of polyploid Gossypium 598
(cotton). Evolution 54: 798-814. 599
KACSER, H., and J. A. BURNS, 1981 The molecular basis of dominance. Genetics 97: 639-666. 600
KAPITONOV, V., and J. JURKA, 2005 RAG1 core and V(D)J recombination signal sequences were 601
derived from Transib transposons. PLoS Biology 3: e181. 602
KOBEL, H. R., 1996 Allopolyploid speciation, pp. 391-401 in The Biology of Xenopus, edited by R. C. 603
TINSLEY and H. R. KOBEL. Clarendon Press, Oxford. 604
LEE, H.-S., and Z. J. CHEN, 2001 Protein-coding genes are epigenetically regulated in Arabidopsis 605
polyploids. Proceedings of the National Academy of Sciences 98: 6753-6758. 606
LEVY, A. A., and M. FELDMAN, 2004 Genetic and epigenetic reprogramming of the wheat genome 607
upon allopoolyploidization. Biological Journal of the Linnean Society 82: 607-613. 608
LIU, B., C. L. BRUBAKER, G. MERGEAI, R. C. CRONN and J. F. WENDEL, 2001 Polyploid formation in 609
cotton is not accompanied by rapid genomic changes. Genome 44: 321-330. 610
LIU, B., and J. F. WENDEL, 2002 Non-mendelian phenomena in allopolyploid genome evolution. 611
Current Genomics 3: 489-505. 612
LIU, B., and J. F. WENDEL, 2003 Epigenetic phenomena and the evolution of plant allopolyploids. 613
Molecular Phylogenetics and Evolution 29: 365-379. 614
22
LOUMONT, C., 1983 Deux especes nouvelles de Xenopus du Cameroun (Amphibia, Pipidae). Rev. 615
Suisse Zool. 90: 169-177. 616
LUKENS, L. N., P. A. QUIJADA, J. UDALL, J. C. PIRES, M. E. SCHRANZ et al., 2004 Genome 617
redundancy and plasticity within ancient and recent Brassica crop species. Biological Journal 618
of the Linnean Society 82: 665-674. 619
LYNCH, M., and A. FORCE, 2000 The probability of duplicate gene preservation by 620
subfunctionalization. Genetics 154: 459-473. 621
LYNCH, M., M. O'HELY, B. WALSH and A. FORCE, 2001 The probability of preservation of a newly 622
arisen gene duplicate. Genetics 159: 1789-1804. 623
MADDISON, D. R., and W. P. MADDISON, 2000 MacClade. Sinauer Associates, Sunderland. 624
MADLUNG, A., R. MASUELLI, B. WATSON, S. H. REYNOLDS, J. DAVISON et al., 2002 Remodeling of 625
DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis 626
allotetraploids. Plant Physiology 129: 733-746. 627
MARTIN, D., and E. RYBICKI, 2000 RDP: detection of recombination amongst aligned sequences. 628
Bioinformatics 16: 562-563. 629
MAYNARD SMITH, J., 1992 Analyzing the mosaic structure of genes. Journal of Molecular Evolution 630
34: 126-129. 631
MOSCONE, E. A., M. A. MATZKE and A. J. M. MATZKE, 1996 The use of combined FISH/GISH in 632
conjunction with DAPI counterstaining to identify chromosomes containing transgene inserts 633
in amphidiploid tobacco. Chromosoma 105: 231-236. 634
MULLER, H. J., 1942 Isolating mechanisms, evolution and temperature. Biol. Symp. 6: 71-125. 635
NYLANDER, J. A. A., F. RONQUIST, J. P. HUELSENBECK and J. L. NIEVES-ALDREY, 2004 Bayesian 636
phylogenetic analysis of combined data. Systematic Biology 53: 47-67. 637
OHNO, S., 1970 Evolution by gene duplication. Springer-Verlag, Berlin. 638
23
ORR, H. A., 1991 A test of Fisher's theory of dominance. Proceedings of the National Academy of 639
Sciences 88: 11413-11415. 640
OZKAN, H., A. A. LEVY and M. FELDMAN, 2001 Allopolyploidy-induced rapid genomic evolution of 641
the wheat (Aegilops-Triticum) group. Plant Cell 13: 1735-1747. 642
PADIDAM, M., S. SAWYER and C. M. FAUQUET, 1999 Possible emergence of new geminiviruses by 643
frequent recombination. Virology 265: 218-225. 644
PAGEL, M., 1994 Detecting correlated evolution on phylogenies: a general method for the comparative 645
analysis of discrete characters. Proceedings of the Royal Society of London (B) 255: 37-45. 646
PAPP, B., C. PÁL and L. D. HURST, 2003 Dosage sensitivity and the evolution of gene families in yeast. 647
Nature 424: 194-197. 648
POSADA, D., 2002 Evaluation of methods for detecting recombination from DNA sequences: empirical 649
data. Molecular Biology and Evolution 19: 708-717. 650
POSADA, D., and K. A. CRANDALL, 2001 Evaluation of methods for detecting recombination from 651
DNA sequences: computer simulations. Proceedings of the National Academy of Sciences 98: 652
13757-13762. 653
RAMBAUT, A., and N. C. GRASSLY, 1997 Seq-Gen: an application for the Monte Carlo simulation of 654
DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences 655
13: 235-238. 656
RAMSDEN, D. A., K. BAETZ and G. E. WU, 1994 Conservation of sequence in recombination signal 657
sequence spacers. Nucleic Acids Research 22: 1785-1796. 658
SADOFSKY, M. J., J. E. HESSE, J. F. MCBLANE and M. GELLERT, 1993 Expression and V(D)J 659
recombination activity of mutated RAG-1 proteins. Nucleic Acids Res. 21: 5644-5650. 660
SAKANO, H., K. HUPPI, G. HEINRICH and S. TONEGAWA, 1979 Sequences at the somatic recombination 661
sites of immunoglobulin light-chain genes. Nature 280: 288-294. 662
24
SALMINEN, M. O., J. K. CARR, D. S. BURKE and F. E. MCCUTCHAN, 1995 Identification of breakpoints 663
in intergenotypic recombinants of HIV-1 by bootscanning. AIDS Res. Hum. Retroviruses 11: 664
1423-1425. 665
SHAKED, H., K. KASHKUSH, H. OZKAN, M. FELDMAN and A. A. LEVY, 2001 Sequence elimination and 666
cytosime methylation are rapid and reproducible responses of the genome to wide hybridization 667
and allopolyploidy in wheat. The Plant Cell 13: 1749-1759. 668
SHULL, G. H., 1908 The composition of a field of maize. Report of the American Breeding Association 669
4: 296-301. 670
SKALICKÁ, K., K. Y. LIM, R. MATYASEK, M. MATZKE, A. R. LEITCH et al., 2005 Preferential 671
elimination of repeated DNA sequences from the paternal, Nicotiana tomentosiformis genome 672
donor of a synthetic, allotetraploid tobacco. New Phytologist 166: 291-303. 673
SOLTIS, D. E., P. S. SOLTIS, J. C. PRIRES, A. KOVARIK, J. A. TATE et al., 2004 Recent and recurrent 674
polyploidy in Tragopogon (Asteraceae): cytogenetic, genomic and genetic comparisons. 675
Biological Journal of the Linnean Society 82: 485-501. 676
SWOFFORD, D. L., 2002 Phylogenetic analysis using parsimony (* and other methods). Version 4. 677
Sinauer Associates, Sunderland. 678
TYMOWSKA, J., 1991 Polyploidy and cytogenetic variation in frogs of the genus Xenopus., pp. 259-297 679
in Amphibian cytogenetics and evolution, edited by D. S. GREEN and S. K. SESSIONS. 680
Academic Press., San Diego. 681
VEITIA, R. A., 2002 Exploring the etiology of haploinsufficiency. Bioessays 24: 175-184. 682
VOLKOV, R. A., N. V. BORISJUK, I. I. PANCHUK, D. SCHWEIZER and V. HEMLEBEN, 1999 Elimination 683
and rearrangement of parental rDNA in the allotetraploid Nicotiana tabacum. Molecular 684
Biology and Evolution 16: 311-320. 685
25
WANG, J., L. TIAN, A. MADLUNG, H.-S. LEE, M. CHEN et al., 2004 Stochastic and epigenetic changes 686
of gne expression in Arabidopsis polyploids. Genetics 168: 2217-2226. 687
WENDEL, J. F., A. SCHNABEL and T. SEELANAN, 1995 Bidirectional interlocus concerted evolution 688
following allopolyploid speciation in cotton (Gossypium). Proceedings of the National 689
Academy of Sciences 92: 280-284. 690
WRIGHT, S., 1934 Molecular and evolutionary theories of dominance. American Naturalist 63: 24-53. 691
YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. 692
CABIOS 13: 555-556. 693
ZWIERZYKOWSKI, Z., R. TAYYAR, M. BRUNELL and A. J. LUKASZEWSKI, 1998 Genome recombination 694
in intergeneric hybrids between tetraploid Festuca pratensis and Lolium multiflorum. Journal of 695
Heredity 89: 324-328. 696
697
698
Figure Legends 699
700
Fig. 1. Diploidization of allopolyploid genomes means that recombination between alleles of different 701
paralogs of the same gene is rare. The evolutionary history of linked paralogs in diploidized 702
allopolyploid genomes, therefore, can be inferred by combining information on synteny with 703
information on the evolutionary relationships among paralogs of each gene. The RAG1 and 704
RAG2 genes, for example, are tightly linked. This diagram depicts the evolution of the 705
expected number of paralogs of each of these genes in species with different ploidy levels 706
(tetraploid, octoploid, and dodecaploid) if no deletion or degeneration occurred. An allo-707
tetraploid species inherits a linked set of RAG1 and RAG2 α paralogs from diploid ancestor 1 708
and a linked set of RAG1 and RAG2 β paralogs from diploid ancestor 2. An allo-octoploid 709
species inherits two sets of linked RAG1 and RAG2 paralogs from two different allo-tetraploid 710
ancestors. In these species the α1 and α2 paralogs of RAG1 and RAG2 are derived from 711
diploid ancestor 1 and the β1 and β2 paralogs of RAG1 and RAG2 are derived from diploid 712
26
ancestor 2. An allo-dodecaploid inherits an allo-octoploid and an allo-tetraploid genome and 713
has three α paralogs (α1, α2, α3) derived from diploid ancestor 1 and three β paralogs (β1, β2, 714
β3) derived from diploid ancestor 2. Note that the observed number of functional RAG1 and 715
RAG2 paralogs is less than this expectation in most (RAG1) or all (RAG2) species of Xenopus 716
due to degeneration or deletion. 717
Fig. 2. Combined phylogenetic analysis of RAG1 and RAG2 paralogs. Nodes with less than 95% 718
posterior probability or that were inconsistent between the α and β lineages are collapsed, 719
asterisks indicate RAG1 paralogs for which expression was confirmed, and linked X. laevis 720
RAG1 and RAG2 paralogs (Greenhalgh et al. 1993) are connected by a line. Degenerate 721
paralogs are in red and missing paralogs are in gray. Numbers 1-9 indicate the minimum 722
independent degenerations of RAG1 mentioned in the text, and 10, 11, and 12 refer to 723
degeneration of RAG2 lineage α, degeneration X. andrei RAG2 paralog β2, and a suspected 724
ancestral deletion spanning paralogs of RAG1 and RAG2, respectively. 725
726
Fig. 3. Evolutionary relationships merge when a species evolves by allopolyploidization, and a 727
reticulate phylogeny can be constructed from well-supported nodes in the genealogies of RAG1 728
and RAG2 in Fig. 1. The ploidy level follows each species name including three diploids 729
indicated with daggers whose existence is predicted in the past but for whom an extant 730
descendant with the same ploidy level is not known. Additionally, three tetraploid ancestors 731
(each with 4x=36), also indicated by daggers, are predicted but not known by an extant 732
representative. 733
734
Fig. 4. Evolution of RAG1 and RAG2 in Xenopus. The RAG1 and RAG2 heterodimer is depicted as a 735
scissor and the evolutionary trajectory of this heterodimer is followed through the reticulate 736
phylogeny that is depicted in Fig. 3, including degeneration and deletions of paralogs encoding 737
each protein that are depicted in Fig. 2. Proteins encoded by paralogs inherited from diploid 738
ancestor α are yellow and those encoded by paralogs inherited from diploid ancestor β are blue. 739
Numbered deletion and degeneration events from Fig. 2 are labeled and the ploidy of each 740
ancestor or extant species (2x, 4x, 8x, or 12x) is indicated inside each box. Species and 741
ancestors for which simulations were performed are indicated by a red box. As a result of 742
biased degeneration of RAG1, heterodimers of many extant species must form between 743
proteins that were ultimately inherited from different diploid species (i.e. the scissor are 744
27
composed of a yellow RAG1 protein and a blue RAG2 protein). Single asterisks indicate a 745
minimum of seven independently evolved chimerical heterodimers. Double asterisks indicate 746
caveats that X. cf. fraseri had no observed degeneration of RAG1 paralog β, but that only a 747
small fragment of this paralog was sequenced (Suppl. Fig. 2), and that expression of paralog β 748
was not detected in X. muelleri but no degeneration was observed in the coding region. 749
750
Fig. 5. Simulations of the expected distribution of chimerical heterodimers (those formed between 751
proteins encoded by from paralogs derived from different diploid ancestors) given the observed 752
degeneration and deletion of RAG2 and phylogenetically independent and unbiased deletion of 753
RAG1. Simulations were performed on the eight species or ancestors as indicated in Fig. 4. 754
An asterisk indicates the ten observed chimerical heterodimers in the species for which the 755
simulations were performed, including seven that evolved independently, but not counting the 756
one of the chimerical heterdimers in X. wittei that may have been inherited from one of the 757
tetraploid ancestors that was simulated. The observed number of chimerical heterodimers 758
depart significantly from expectations under no bias (P < 0.0001). 759
760
Suppl. Fig. 1. Genetic samples and primers used in this study. 761
762
Suppl. Fig. 2. (A) Sequenced regions of RAG1. The core region of RAG1 that is required for V(D)J 763
recombination is indicated by a black bar on top. The white bar inside this black bar indicates 764
the region sequenced by Evans et al. 2005. Arrows indicate paralogs whose expression is 765
confirmed or, in the diploid S. tropicalis, assumed. Frameshift mutations and stop codons are 766
indicated by F and S, respectively. Heterozygous degenerations in X. borealis and X. 767
ruwenzoriensis are indicated by an asterisk and shared degenerations are underlined. (B) The 768
sequenced region of RAG2 is indicated by a thick line; essentially the same region was 769
sequences for all RAG2 paralogs in this study. 770
771
Suppl. Fig. 3. Consensus topology from Bayesian analysis of (A) RAG1, (B) RAG2, (C) RAG1 and 772
RAG2 combined, and (D) a synthetic data configuration (see text). A single asterisk indicates 773
branches with at least 95% posterior probability and double asterisks indicate branches that 774
also have bootstrap values of at least 80%. In (B) a bar indicates a putative deletion of RAG2 775
paralog α. Sequences are labeled as in Fig. 1 except in (D) where for tetraploids α and β are 776
28
collapsed into a single taxon, and for octoploids and dodecaploids α1 and β1 are a single taxon 777
and α2 and β2 are a single taxon, and for dodecaploids α3 and β3 are a single taxon. 778
779
Suppl. Fig. 4. Parametric bootstrap tests were performed to evaluate hypotheses of fewer independent 780
degenerations of RAG1. Because X. boumbaensis paralog β1 and X. andrei paralog β1 are 781
both expressed and have only autapomorphic degenerations, it is assumed that they 782
degenerated independently from other paralogs, but the total number of additional independent 783
degenerations depends on their evolutionary relationships with respect to other paralogs 784
because ancestral gene silencing could have preceded the accumulation of autapomorphic 785
degenerations in each paralog. To be conservative, paralog β1 of X. boumbaensis and paralog 786
β1 of X. cf. boumbaensis are assumed to have been silenced in a recent common ancestor, even 787
though insufficient data was collected from these paralogs to test for shared degeneration in the 788
coding region. Parametric bootstrap tests were performed using the “synthetic” configuration of 789
data, so each taxonomic unit includes concatenated data from an α and a β paralog of RAG1 790
and the corresponding β paralog of RAG2. Specifically, lineage A is a single taxonomic unit 791
that includes concatenated X. andrei RAG1 paralog α1 and β1 and RAG2 paralog β1. Clade B 792
includes two taxonomic units; one is made up of X. boumbaensis sequences from RAG1 793
paralog α1 and β1 and RAG2 paralog β1 and the other is concatenated sequences from the 794
same paralogs from X. cf. boumbaensis. Clade C includes ten concatenations: concatenated 795
paralog 1 from X. wittei, X. new octoploid, X. ameiti, X. pygmaeus, X. cf. fraseri 1, X. cf. 796
fraseri 2, X. longipes, and X. ruwenzoriensis, and concatenated paralog 3 from X. longipes and 797
X. ruwenzoriensis. Clade D is Clade C minus the X. longipes concatenated paralog 3. Other 798
taxa are indicated by “O”. All of these hypotheses were strongly rejected (P < 0.0001). 799
800
801
Species (ploidy) Tissue
Expected
paralogs
Observed
expressed
paralogs
Cloned (C) or
directly sequenced
(DS)
S. new tetraploid 1 (4x=40) brain !, " !, " C
testes !, " !, " DS
liver !, " !, " DS
X. laevis (4x=36) bone marrow !, " !, " C, DS
heart !, " ! DS
liver !, " !, " DS
spleen !, " !, " DS
liver !, " !, " DS
X. gilli (4x=36) bone marrow !, " !, " C
X. borealis (4x=36) brain !, " ! DS
liver !, " ! DS
testes !, " ! DS
X. muelleri (4x=36) brain !, " ! DS
testes !, " ! DS
bone marrow !, " ! DS
X. amieti (8x=72) liver !1,!2, "1,"2 !1,!2# C
brain !1,!2, "1,"2 !1 DS
X. andrei (8x=72) bone marrow !1,!2, "1,"2 !1,!2,"1# C, DS
spleen !1,!2, "1,"2 !1 DS
brain !1,!2, "1,"2 !1 DS
X. new octoploid (8x=72) bone marrow !1,!2, "1,"2 !1,!2 C
brain !1,!2, "1,"2 !1,!2(low) DS
heart !1,!2, "1,"2 !1(low),!2 DS
testes !1,!2, "1,"2 !1,!2 C
X. boumbaensis (8x=72) bone marrow !1,!2, "1,"2 !1,!2, "1# C
brain !1,!2, "1,"2 !1,"1# DS
heart !1,!2, "1,"2 "1# DS
Table 1. Gene silencing of RAG1 paralogs in different Xenopus species. Expression was confirmed by
amplifying cDNA across an intron and either cloning (C) or directly sequncing (DS) the PCR product.
Expressed paralogs with an asterisk are degenerate.
RA
G1
RA
G2
RA
G1α
RA
G2α
RA
G1β
RA
G2β
regular speciation and divergence
allopolyploidization
RA
G1α
RA
G2α
RA
G1β
RA
G2β
Dip
loid
an
ce
stor 1
Dip
loid
an
ce
stor 2
Dip
loid
an
ce
stor
Allo
-tetra
plo
id a
nc
esto
r
regular speciation and divergence
allopolyploidization
RA
G1α
2R
AG
2α2
RA
G1β2
RA
G2β2
Allo
-tetra
plo
id a
nc
esto
r 2
RA
G1α
1R
AG
2α1
RA
G1β1
RA
G2β1
Allo
-tetra
plo
id a
nc
esto
r 1
RA
G1α
2R
AG
2α2
RA
G1β2
RA
G2β2
Allo
-oc
top
loid
an
ce
stor
RA
G1α
1R
AG
2α1
RA
G1β1
RA
G2β1
Evans, F
ig. 1
allopolyploidization
RA
G1α
1R
AG
2α1
RA
G1β1
RA
G2β1
RA
G1α
3R
AG
2α3
RA
G1β3
RA
G2β3
Allo
-do
de
ca
plo
id
RA
G1α
2R
AG
2α2
RA
G1β2
RA
G2β2
RA
G1α
RA
G2α
RA
G1β
RA
G2β
Allo
-tetra
plo
id
RA
G1α
2R
AG
2α2
RA
G1β2
RA
G2β2
Allo
-oc
top
loid
RA
G1α
1R
AG
2α1
RA
G1β1
RA
G2β1
ruwenzoriensis β3
gilli α
gilli β
S. tropicalisnew tetraploid 2 αS. epitropicalis αnew tetraploid 1 αS. epitropicalis β
new tetraploid 1β new tetraploid 2 β
largeni α
amieti α1longipes α1
ruwenzoriensis α1cf. boumbaensis α1
pygmaeus αruwenzoriensis α3
cf. fraseri 1 αcf. fraseri 2 α
cf. boumbaensis α2new octoploid α1
wittei α1andrei α1
longipes α3
vestitus α2
amieti α2ruwenzoriensis α2cf. boumbaensis α3
andrei α2longipes α2
new octoploid α2wittei α2
vestitus α1
laevis α
clivii αnew tetraploid 1α
borealis αmuelleri αamieti β1
ruwenzoriensis β1longipes β1
cf. boumbaensis β1cf. fraseri 1 β
cf. boumbaensis β2cf. fraseri 2 βpygmaeus β
new octoploid β1wittei β1andrei β1
longipes β3laevis β
new octoploid β2wittei β2
vestitus β1
vestitus β2andrei β2
largeni βclivii β
new tetraploid 1 βborealis βmuelleri β
amieti β2ruwenzoriensis β2
cf. boumbaensis β3longipes β2
34
2
1
6
98
11
*
*
**
**
*
**
**
**
*
*
*
**
RAG1 RAG2
α
β
α
β
β β
α α
10
Evans, Fig. 2
5
12 12
boumbaensis α2
boumbaensis α1
boumbaensis β1
boumbaensis β2
7
Xenopus
Silura
na
X. gilli (36)
X. new tetraploid (36)X. muelleri (36)X. borealis (36)
S. tropicalis (20)
S. new tetraploid 1 (40)
X. largeni (36)
X. laevis (36)
X. clivii (36)
S. epitropicalis (40)S. new tetraploid 2 (40)
X. cf. boumbaensis (108)
X. wittei (72)
X. ruwenzoriensis (108)
X. longipes (108)
X. amieti (72)
X. andrei (72)
X. vestitus (72)
X. cf. fraseri 2 (36)
X. pygmaeus (36)
X. cf. fraseri 1 (36)
Evans, Fig. 3
X. new octoploid (72)diploid α (18)
diploid β (18)
diploid β (20)
X. boumbaensis (72)
RAG2
X. gilli
X. new tetraploid
X. muelleri **X. borealis
X. largeni
X. laevis
X. clivii
X. cf boumbaensis
X. wittei
X. ruwenzoriensis
X. longipes
X. amieti
X. andrei
X. vestitus
X. cf. fraseri 2 **
X. pygmaeus
X. cf. fraseri 1
X. new octoploid
10 1
2
411
3
7
9
8
5
diploid α
diploid β
Evans, Fig. 4
*
*
*
*
*
*
*
RAG1
6
4x
4x
4x
4x
4x
2x
4x
2x
4x
8x 8x 8x12
4x
4x
4x
4x
4x
8x
4x
4x
8x
8x
8x
12x
12x
12x
Xenopus
X. boumbaensis8x
0
10000
20000
30000
Cou
nt
Number of simulated chimerical heterodimers
0 1 2 3 4 5 6 7 8 9 10 11 *
Evans, Fig. 5
Modelfree
parametersRAG1
likelihood Bayes FactorRAG2
likelihood Bayes Factor
GTR+I+Γ+bf 10 19772.35 6074.23Same GTR, bf, and I for all sites, separate Γ for each codon position 12 19638.90 266.90 6022.41 103.65Same GTR and bf for all sites, separate I+Γ for each codon position 14 19630.65 16.49 6015.37 14.07Same bf and I for all sites, separate GTR+Γ for each codon position 22 19623.83 13.64 6010.25 10.25Same bf for all sites, separate GTR+I+Γ for each codon position 24 19614.16 19.35 6003.93 12.64Same I for all sites, separate GTR+ Γ + bf for each codon position 28 19594.65 39.02 5995.06 17.74
Separate GTR+I+Γ + bf for each codon position 30 19584.55 20.19 5989.77 10.57
Supplementary Table 1. Harmonic mean of post-burnin likelihood under alternative parameterizations of the general time reversible (GTR) model of evolution with base frequencies (bf), a proportion of invariant sites (I) and a gamma distributed rate heterogeneity parameter (Γ) estimated from the data. Free parameters do not include branch-length and topology parameters.
RAG1 RAG2
Supplementary Figure 1: Genetic samples and primers used to amplify RAG1 and RAG2. Genetic Samples: Sequences were obtained from the same individuals as Evans et al. (23) with the following exceptions: (1) for expressed paralogs, at least the first 16 bp were obtained from cDNA from non-vouchered laboratory animals, and (2) in X. wittei and in X. borealis, portions of RAG1 paralogs were sequenced from two different individuals, (3) sequences for RAG1 and RAG2 of Rhinophrynus dorsalis and Hymenochirus sp. were each obtained from different individuals, (4) X. laevis RAG1 paralogs are from different individuals – the β paralog is from Greenhalgh et al. (29), Genbank Accession Number L19324, whereas the α paralog is from the same individual as Evans et al. (23), (5) the outgroup sequence for RAG1 was from Scaphiopus hurteri and for RAG2 was from S. couchi and (6) new sequences from X. new octoploid are included – this is a new species that was recently discovered in the Democratic Republic of the Congo; a species description is in preparation. Primer used to amplify across the RAG1 exon in the 5’ untranslated region: R1.SWEET.M35 (5’ CTGAGCAGTGACAGCYTGACAKCAGG 3’) R1.REV.442 (5’ GATSAGCTCAGGCCAAGAGGTGAC 3’) R1.382.362 (5’ GATAGCTCCGGTTCTGCTGAT 3’) (from Greenhalgh et al. (1993)) Primers used to amplify RAG1 from genomic DNA: R1.begin.for.16 (5’ GAAAATGGAGGTGGCATTGC 3’) R1.for.537 (5’ TGYCAYAATTGCTGGACTATAATGAATC 3’) R1.rev1231 (5’ AGAGCCAGAAGAAAGAGTGTTAAACACACTG 3’) BOR.MUEL.BETA.FOR.1087 (5’ AGGAYCTTAGTTTGGGGTCTGCACCAC 3’) BOUM.BETA1.REV.2772 (5’ TTATTGGGAACACTGATAAC 3’) R1.rev.2583 (5’ CCTTCAAGGATATCTTCTTCCATGTCTTTC 3’) R1.rev.2656 (5’ ACATCTCCCATYCCATCACAGGACTCTTTAACCACCAC 3’) BOUM.FRAS.BOR.MUEL.BETA.REV.2734 (5’ ACGCACAGCTTTCTCTGGAACGGGA 3’) RAG1.BETA.FOR2730 (5’ GCACGGTAGTGGTCCTCCCGTTCCT 3’) RAG1.BETA.FOR2730A (5’ TAGTGGTCCTCCCGTTCCT 3’) RAG1.BETA.FOR2624 (5’ GAAGATATCCTTGAAGGCACAA 3’) R1.FRAS.207765.BETA.REV2716 (5’ ACGCACAGCTTTCTCTGGTTTCTC 3’) R1.ALPHA.FOR1145 (5’ AGGAAARTGAGAAMTCAAATGCMRTTGACAG 3’) R1.ALPHA.REV2734 (5’ ACGTACGGCTTTCTCTGGAACTGCT 3’) R1.REV.2583 (5’ CCTTCAAGGATATCTTCTTCCATGTCTTTC 3’) RAG1.FOR.3175 (5’ GAACYTACAGCGYTATGAGATGTG 3’) RAG1.REV.3952 (5’ GATTGTCTAGATCTACAGTGAACC 3’) From Evans et al. (2005): XENRAG1.FORWARD3 (5’GGATGAGTATCCAGTAGATACAATCTCCAAGAG 3’) XENRAG1.REVERSE2 (5’ TTTCTGGGACATGTGCCAGGGTTTTGTG 3’) RAG1F4 (5’ GCAAGCCTCTCTGNCTGATGC 3’) RAG1R4 (5’ GTTTTTATAGAACTCCCCTAT 3’)
Primers used to amplify RAG2 from genomic DNA: RAG2FOR.1 (5’ ACCTACACAGTTGCTGTGATG 3’) RAG2REV.1423 (5’ TCTTCATCGTCTTCATTGTA 3’). Primers used to amplify some RAG2 paralogs of Silurana species: RAG2.SILALPHA.FOR363 (5’ ACTGGGGTTTTCCTCGTAGATCTT 3’) RAG2.SILALPHA.REV1372 (5’ AGTATCTTCGTCAAAAAGGTTTCCG 3’) Primers unsuccessfully used to try to amplify RAG2 paralog α2 in X. amieti, X. ruwenzoriensis, X. cf. boumbaensis, and X. longipes: RAG2FOR.1 (5’ ACCTACACAGTTGCTGTGATG 3’) RAG2.FOR.109 (5’ AAACGTTCCTGCCCCACTG 3’) RAG2FOR.817 (5’ GATGGACTTTCCTTCCATGTTTC 3’) RAG2.REV.930 (5’ CCCATATCAGCACCAAACC 3’) RAG2REV.1039 (5’ TGGCTATCAGACTCGTATCC 3’) RAG2.REV.1126 (5’ TGTAAACTCTTCAGAGTCTTC 3’)
X. wittei
X. amieti
X. new octoploid
X. longipes
X. cf. boumbaensis
X. cf. fraseri 2
bp0 500 1000 1500 2000 2500 3000
X. muelleri west
X. gilli
X. laevis
X. borealis
X. muelleri
X. cf. fraseri 1
αβX. largeni, X. cliviiαβ
αβ
αβ
αβ
αβ
αβ
αβ
α1
β1α2
β2
X. pygmaeus αβ
X. vestitusα1
β1α2
β2
X. andreiα1
β1α2
β2
α1
β1α2
β2
α1
β1α2
β2
α1
α3α2
β1β2β3
X. ruwenzoriensisα1
α3α2
β1β2β3
α1
α3α2
β1β2β3
αβS. new tetraploid 1
αβS. epitropicalis
S. tropicalis
αβS. new tetraploid 2
core region
Evans, Suppl. Fig. 2
SS
S
S
S
S S
S
S*
S S
SS S S
S
S SS
S
FF*F
F
F
FF
FFF
F
F F F F
FFF FSF F
FF F
F
F
S
S
F
A
bp0 500 1000 1500
B
X. boumbaensisα1α2β1β2
F F
S. hurteriR. dorsalis
0.1
H. sp.
S. tropicalisS. new tetraploid 2 αS. epitropicalis α
S. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β
X. cf. boumbaensis α10.05
X. ruwenzoriensis α1
X. amieti α1X. longipes α1
X. cf. fraseri 1 αX. cf. fraseri 2 α
X. cf. boumbaensis α2
X. pygmaeus αX. ruwenzoriensis α3
X. andrei α1X. new octoploid α1X. wittei α1
X.longipes α3
X. largeni α
X. gilli αX. laevis α
X. amiet α2X. ruwenzoriensis α2X. cf. boumbaensis α3
X. longipes α2X. andrei α2X. vestitus α2
X. vestitus α1X. wittei α2
X. new octoploid α2
X. clivii α
X. borealis αX. muelleri α
X. new tetraploid α
X. longipes β1X. amieti β1
X. cf. boumbaensis β1X. ruwenzoriensis β1
X. andrei β1
X. longipes β3
X. new octoploid β1X. wittei β1
X. ruwenzoriensis β3X. pygmaeus β
X. cf. fraseri 2 βX. cf. fraseri 1 β
X. laevis βX. gilli β
X. clivii β
X. borealis βX. muelleri β
X. largeni β
X. new octoploid β2X. wittei β2
X. vestitus β2
X. vestitus β1
X. new tetraploid β
0.05
S.couchiR. dorsalis
H. sp.
S. tropicalisS. new tetraploid 2 αS. epitropicalis αS. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β
X. cf. boumbaensis α1
X. ruwenzoriensis α1
X. amieti α1X. longipes α1
X. cf. fraseri 1 αX. cf. fraseri 2 αX. cf. boumbaensis α2
X. pygmaeus αX. ruwenzoriensis α3
X. andrei α1
X. new octoploid α1X. wittei α1
X.longipes α3X. largeni α
X. gilli αX. laevis α
X. amiet α2X. ruwenzoriensis α2X. cf. boumbaensis α3
X. longipes α2X. andrei α2X. vestitus α2
X. vestitus α1X. wittei α2X. new octoploid α2
X. clivii α
X. borealis αX. muelleri α
X. new tetraploid α
X. longipes β1X. amieti β1
X. cf. boumbaensis β1
X. ruwenzoriensis β1
X. andrei β1X. longipes β3
X. new octoploid β1X. wittei β1
X. ruwenzoriensis β3X. pygmaeus β
X. cf. fraseri 2 βX. cf. fraseri 1 β
X. laevis βX. gilli β
X. clivii β
X. borealis βX. muelleri β
X. largeni β
X. new octoploid β2X. wittei β2
X. vestitus β2
X. vestitus β1
X. new tetraploid β
X. andrei β2
X. cf. boumbaensis β2
S. tropicalis
X. ruwenzoriensis 1
X. amieti 1
X. longipes 1
X. cf. boumbaensis 1
X. pygmaeusX. ruwenzoriensis 3
X. cf. boumbaensis 2
X. wittei 1X. new octoploid 1
X. longipes 2
X. amieti 2
X. cf. boumbaensis 3X. ruwenzoriensis 2
X. andrei 2
X. longipes 3
X. new octoploid 2X. wittei 2
X. cf. fraseri 2 X. cf. fraseri 1
X. laevis
X. borealis
X. muelleri
X. largeni
X. vestitus 2
X. vestitus 1
X. new tetraploid
X. andrei 1
X. clivii
X. gilli
*
*
0.02
0.1
X. longipes β1X. amieti β1
X. cf. boumbaensis β1X. ruwenzoriensis β1
X. andrei β1X. longipes β3
X. wittei β1
X. ruwenzoriensis β3X. pygmaeus βX. cf. fraseri 2 β
X. cf. fraseri 1 β
X. laevis βX. gilli β
X. clivii β
X. borealis βX. muelleri β
X. largeni β
X. wittei β2
X. vestitus β2
X. vestitus β1
X. new tetraploid β
X. cf. boumbaensis β2
X. andrei β2
S. couchiR. dorsalis
S. tropicalisS. new tetraploid 2 αS. epitropicalis αS. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β
A B
C D
Evans, Suppl. Fig. 3
X. boumbaensis 2
X. boumbaensis 1
**
**
**
**
**
****
**
* **
*
**
****
**
** **
**
**
**
H. sp.
X. new octoploid β2
X. new octoploid β1
X. boumbaensis β1
**
**
**
**
***
**
**
**
**
**
****
**
**
**
***
***
*
0.1X. boumbaensis α1
X. boumbaensis α2
X. boumbaensis β1
*****
****
**
***
**
**
**
**
**
****
****
***
**
*
**
**
**
****
***
*
**
***
**
** ****
**
**
**
**
** *
***
****
***
****
**
0.05
X. boumbaensis α1
X. boumbaensis β1
X. boumbaensis α3
**
**
**
**
**
***
**
*** **
**
**
**
**
**
**
**
****
***
****
****
**
**
*
**
****
********
*
**
**
**
**
**
*
*
**
*
**
**
OA BA
CB
OB
C
OB
CA
A
OA
DB
OB
D
OA
DB
A
Evans, Suppl. Fig. 4