genetics: published articles ahead of print, published on ...90 (wang et al. 2004). expression can...

40
1 Title: Ancestry influences the fate of duplicated genes millions of years after polyploidization of 1 clawed frogs (Xenopus). 2 Author: Ben J. Evans 3 Phone 905.525.9140 Ext. 26973 4 Fax: 905.522.6066 5 Email: [email protected] 6 Author address: 7 Center for Environmental Genomics 8 Department of Biology, McMaster University, 9 Life Sciences Building Room 328 10 1280 Main Street West, Hamilton, ON, Canada L8S 4K1 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under 30 accession nos. AY874308, AY874311, AY874309, AY874310, AY874324, AY874346, AY874352, 31 EF535883-EF535988. 32 Genetics: Published Articles Ahead of Print, published on April 15, 2007 as 10.1534/genetics.106.069690

Upload: others

Post on 02-Mar-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

1

Title: Ancestry influences the fate of duplicated genes millions of years after polyploidization of 1

clawed frogs (Xenopus). 2

Author: Ben J. Evans 3

Phone 905.525.9140 Ext. 26973 4

Fax: 905.522.6066 5

Email: [email protected] 6

Author address: 7

Center for Environmental Genomics 8

Department of Biology, McMaster University, 9

Life Sciences Building Room 328 10

1280 Main Street West, Hamilton, ON, Canada L8S 4K1 11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under 30

accession nos. AY874308, AY874311, AY874309, AY874310, AY874324, AY874346, AY874352, 31

EF535883-EF535988. 32

Genetics: Published Articles Ahead of Print, published on April 15, 2007 as 10.1534/genetics.106.069690

Page 2: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

2

33

Running head: 34

Heterodimers in polyploids 35

Key Words: protein-protein interaction, heterosis, pseudogene 36

Corresponding Author: 37

Ben J. Evans 38

Phone 905.525.9140 Ext. 26973 39

Fax: 905.522.6066 40

Email: [email protected] 41

Corresponding Author Address: 42

Center for Environmental Genomics 43

Department of Biology, McMaster University, 44

Life Sciences Building Room 328 45

1280 Main Street West, Hamilton, ON, Canada L8S 4K1 46

47

48

Page 3: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

3

ABSTRACT 48

49

Allopolyploid species form through the fusion of two differentiated genomes and, in the 50

earliest stages of their evolution, essentially all genes in the nucleus are duplicated. Because unique 51

mutations occur in each ancestor prior to allopolyploidization, duplicate genes in these species 52

potentially are not interchangeable, and this could influence their genetic fates. This study explores 53

evolution and expression of a simple duplicated complex – a heterodimer between RAG1 and RAG2 54

in clawed frogs (Xenopus). Results demonstrate that copies of RAG1 degenerated in different 55

polyploid species in a phylogenetically-biased fashion, predominately in one lineage of closely-related 56

paralogs. Surprisingly, as a result of an early deletion of one RAG2 paralog, it appears that in many 57

species RAG1/RAG2 heterodimers are composed of proteins that were encoded by unlinked paralogs. 58

If the tetraploid ancestor of extant species of Xenopus arose through allopolyploidization and if 59

recombination between paralogs was rare, then the genes that encode functional RAG1 and RAG2 60

proteins in many polyploid species were each ultimately inherited from different diploid progenitors. 61

This observation is consistent with the notion that ancestry can influence the fate of duplicate genes 62

millions of years after duplication, and it uncovers a dimension of natural selection in allopolyploid 63

genomes that is distinct from other genetic phenomena associated with polyploidization or segmental 64

duplication. 65

66

67

Page 4: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

4

INTRODUCTION 67

68

In allopolyploid species, the complete genome of two species are fused, genes are duplicated, 69

and the level of genetic redundancy that results is determined by divergence of protein coding and 70

regulatory regions of genes in each ancestor, and by epigenetic phenomena after allopolyploidization. 71

In these genomes, natural selection and stochastic processes govern the genetic fate of each paralog – 72

fates that include redundancy, subfunctionalization, neofunctionalization, and gene silencing (FORCE et 73

al. 1999; GU 2003; HUGHES and HUGHES 1993; LYNCH and FORCE 2000; LYNCH et al. 2001; OHNO 74

1970). After allopolyploidization, interactions between the subgenomes derived from each ancestral 75

species include exchange of chromosome segments (MOSCONE et al. 1996; SKALICKÁ et al. 2005), 76

concerted evolution (VOLKOV et al. 1999; WENDEL et al. 1995), recombination (ZWIERZYKOWSKI et 77

al. 1998), and epistasis (JIANG et al. 2000). Restructuring can alter the stoichiometry of protein 78

interactions by modifying regulatory elements or lowering gene copy number. Presumably these 79

events trigger or permit compensatory responses including changed regulation, degeneration, or 80

deletion of superfluous upstream or downstream paralogs. Modulation of the transcriptome can occur 81

on an extraordinarily fine scale: a silenced paralog from one ancestor can be tightly linked to loci in 82

which only the paralog from the other ancestor is silenced, or linked to loci in which paralogs from 83

both ancestors are expressed (LEE and CHEN 2001). Genomic changes such as subfunctionalization, 84

gene deletion, and gene silencing can occur within generations after allopolyploidization, and these 85

changes can be non-stochastic and repeatable, and biased by ancestry (ADAMS et al. 2003; ADAMS et 86

al. 2004; ADAMS and WENDEL 2005; OZKAN et al. 2001; SHAKED et al. 2001; VOLKOV et al. 1999; 87

WANG et al. 2004). Allopolyploidization can also lead to completely novel expression patterns that 88

were not present in either parental species, perhaps as a result of dosage dependent gene regulation 89

(WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 90

2004), and co-adapted proteins that were inherited from one ancestor may function more efficiently 91

with one another than with proteins that were derived from a different ancestor (COMAI 2000). 92

Epigenetic phenomena, such as nucleolar dominance, gene silencing, alteration of cytosine 93

methylation patterns, and activation of mobile elements, can also foment genetic change (LIU and 94

WENDEL 2002; LIU and WENDEL 2003). However, variation exists among allopolyploid species in the 95

extent of genomic rearrangement – genomic modulation is less common, for example, in the early 96

stages of evolution in synthetic cotton allopolyploids (LIU et al. 2001) than in other allopolyploids 97

such as wheat and arabidopsis (LEVY and FELDMAN 2004; LUKENS et al. 2004; MADLUNG et al. 2002). 98

Page 5: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

5

99

Allopolyploid clawed frogs and the RAG1/RAG2 heterodimer. African clawed frogs (genera 100

Xenopus and Silurana) offer a promising model with which to examine the impact of ancestry on gene 101

fate in a allopolyploid genome because multiple independent instances of allopolyploidization 102

occurred (EVANS et al. 2005; EVANS et al. 2004b), and the long term effects of this type of genome 103

duplication can be explored with replication. All extant species of clawed frogs in the genus Xenopus 104

(i.e. those species with multiples of 2x = 18 chromosomes) share a common tetraploid ancestor 105

(EVANS et al. 2005; EVANS et al. 2004b). A definitive test of allopolyploid versus autopolyploid origin 106

of this ancestor is not currently possible because no extant Xenopus diploids (2x = 18) are known. 107

However, this ancestor is suspected to have been an allotetraploid because: (a) other polyploid clawed 108

frogs are definitively allopolyploids (EVANS et al. 2005), (b) Xenopus genomes are diploidized and 109

duplicated pairs of chromosomes have visible differences in secondary constrictions (TYMOWSKA 110

1991), (c) allopolyploid individuals can be created in the laboratory by crossing extant species (KOBEL 111

1996), and (d) multiple unlinked loci indicate that the phylogenetic signal of many paralogs is not 112

blended by recombination (CHAIN and EVANS 2006; CHAIN et al. 2007; EVANS et al. 2005). Silurana 113

and Xenopus diversified from one another roughly 53–64 million years ago and tetraploidization in 114

Xenopus occurred roughly 21–41 million years ago (CHAIN and EVANS 2006; EVANS et al. 2004b). 115

RAG1 and RAG2 proteins form a heterodimer that is crucial for the process of somatic 116

rearrangement of DNA known as V(D)J recombination, making possible the extraordinary molecular 117

variation of B and T-cell antigen receptors that is needed to combat pathogen attack. The core region 118

of RAG1 which, when paired with RAG2 is sufficient to carry out V(D)J recombination, spans human 119

residues 386–1011 out of a total of 1040 amino acids (SADOFSKY et al. 1993). The core region of 120

RAG1 was derived from the Transib transposon superfamily, whereas RAG2 and the N-terminal 121

domain of RAG1 probably appeared through other processes (KAPITONOV and JURKA 2005). These 122

genes are tightly linked in jawed vertebrates; a recent build (version 4.1) of the complete genome 123

sequence of the diploid clawed frog S. tropicalis indicates that there is only one copy of RAG1 and 124

one of RAG2, and that each is convergently transcribed and tightly linked by a ~6.5 kB intergenic 125

region. This is consistent with findings in X. laevis that suggest that this tetraploid was derived from 126

the fusion of two diploid genomes, and that each of these diploid genomes carried only one copy of 127

RAG1 (EVANS et al. 2005; GREENHALGH et al. 1993). Thus, in the absence of paralog deletion and 128

degeneration, a null expectation is that the number of RAG1 and RAG2 paralogs corresponds with the 129

ploidy level of a species – diploids should have one copy of each gene, tetraploids should have two, 130

Page 6: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

6

octoploids should have four, and dodecaploids should have six (Fig. 1). Each of these paralogs would 131

have two alleles. 132

Contrary to this null expectation, a previous study reported that in different polyploid species 133

of Xenopus, many RAG1 paralogs had become pseudogenes due to stop codons and frameshift 134

insertion/deletions (EVANS et al. 2005). These species of Xenopus were all ultimately derived from a 135

tetraploid ancestor that probably formed through the amalgamation of two diploid genomes by 136

allopolyploidization; this tetraploid ancestor therefore initially had two paralogs of RAG1 and two 137

paralogs of RAG2. If recombination between paralogs was rare or absent after allotetraploidization, 138

then all of the degenerate paralogs detected by Evans et al. (2005) were ultimately derived from only 139

one of these diploid progenitors – diploid “β” – as opposed to being derived from both of the diploid 140

ancestors (“α” and “β”). 141

This pattern of degeneration could be explained by (a) a shared ancestral degeneration of a 142

coding or regulatory region of RAG1 paralog β that was not sequenced by Evans et al. (2005), (b) 143

multiple independent and biased degenerations in different species, or (c) some combination of shared 144

and independent degeneration (EVANS et al. 2005). This study aims to further investigate these 145

possibilities and track the evolutionary history of this heterodimer through multiple episodes of 146

speciation and genome duplication. To this end, (a) new sequences were obtained from most or all 147

upstream coding regions of RAG1 paralogs to test for shared coding-region degeneration, (b) 148

expression analyses were performed on multiple species to test for ancestral silencing of paralogs, and 149

(c) sequences were obtained from paralogs of the linked partner gene RAG2. 150

151

METHODS 152

153

DNA sequencing. Sequence data were obtained from all or almost all RAG1 and RAG2 paralogs 154

from all known species of clawed frog, averaging 2341 base pairs (bp) out of about 3135 bp total per 155

RAG1 paralog and 1123 bp out of 1575 bp total per RAG2 paralog (Suppl. Fig. 1; Genbank Accession 156

Numbers: AY874308, AY874311, AY874309, AY874310, AY874324, AY874346, AY874352, 157

EF535883-EF535988). The sequenced portion of each RAG1 paralog varied but generally covers 158

most or all of the core region of RAG1 (Suppl. Fig. 2). Some paralogs were not detected using a 159

variety of primer combinations, and these may have been deleted (Suppl. Fig. 1). Sequencing of 160

individual paralogs was accomplished through a combination of TA cloning (Invitrogen) and targeted 161

amplification using paralog-specific primers (Suppl. Fig. 1). Allelic clones with the least number of 162

Page 7: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

7

autapomorphic mutations were selected for analysis. When both alleles of a paralog were directly 163

sequenced with paralog-specific primers, polymorphic positions were analyzed using IUPAC 164

degenerate nucleotide symbols. Some sequence data from expressed paralogs were obtained from 165

non-vouchered individuals; these data were conatenated with sequences from the corresponding 166

paralog from other individuals that usually were vouchered, as detailed in (EVANS et al. 2005; EVANS 167

et al. 2004b). Following (EVANS et al. 2005), paralogs of RAG1 and RAG2 that are closely related to 168

the linked X. laevis paralogs identified by Greenhalgh et al (1993) are referred to as β paralogs and the 169

others are referred to as α paralogs. In Silurana, tetraploid paralogs closely related to the diploid S. 170

tropicalis are designated α paralogs, and the others are β paralogs. 171

Attempts to amplify and clone the α paralogs of RAG2 failed in all species of Xenopus using a 172

variety of primer combinations (Suppl. Info. 1), even though both paralogs were detected in the 173

Silurana tetraploids. To rigorously test whether RAG2 α paralogs were deleted in clawed frogs as 174

opposed to just not being amplified, a systematic effort was made to co-amplify both paralogs using 175

seven pairwise combinations of three forward and three reverse primers (Suppl. Fig. 1). S. tropicalis 176

was used as a positive control to demonstrate that in addition to Xenopus RAG2 paralog β, these 177

primers can successfully amplify RAG2 in a more distantly related lineage than Xenopus RAG2 178

paralog α. 179

180

Amplification of expressed RAG1 paralogs. To determine which RAG1 paralogs are expressed in 181

various species, cDNA was amplified across an intron in the 5’ untranslated region of the RAG1 182

transcript (GREENHALGH et al. 1993). Negative controls with no DNA and with genomic DNA were 183

performed to insure that only expressed and spliced paralogs were amplified; previously published and 184

new primers were used (Suppl. Fig. 1; GREENHALGH et al. 1993). RNA was extracted using the 185

RNeasy mini kit (Qiagen) and converted to cDNA using the Omniscript RT kit (Qiagen). Individual 186

expressed paralogs were amplified from cDNA generated from different tissues (brain, liver, spleen, 187

testis, and/or bone marrow) from a variety of species (X. laevis, X. gilli, X. borealis, X. muelleri, X. 188

amieti, X. andrei, X. new octoploid, and X. boumbaensis), and then cloned and sequenced. 189

Additionally, cDNA from some tissues was directly sequenced and chromatograms were inspected for 190

paralog-specific single nucleotide polymorphisms. 191

192

Phylogenetic Analyses. Phylogenetic analysis was performed on coding portions of RAG1 and 193

RAG2 in three types of data configurations: (1) each locus was analyzed independently, (2) putatively 194

Page 8: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

8

linked paralogs were combined into single taxonomic units, and (3) for Xenopus only, a “synthetic” 195

dataset was constructed in which α and β paralogs of RAG1 and RAG2 that were derived from the 196

same most recent tetraploid ancestor were combined into a single taxonomic unit. This third 197

configuration exploits the redundant phylogenetic information available in co-inherited paralogs. For 198

example, in the third data configuration, tetraploid α and β paralogs were combined, octoploid α1 and 199

β1 paralogs were combined, but octoploid α1 and β2 paralogs were not combined. For the third 200

analysis, S. tropicalis was used as an outgroup to RAG1 α and RAG2 paralogs and no outgroup 201

sequence was used for the portion of the sequences that was comprised of RAG1 β paralogs. 202

MrBayes version 3.1.2 was used for Bayesian phylogenetic analysis, and Bayes factors were 203

used to select a model of evolution as described by Nylander et al. (2004). Seven partitioned models 204

were explored (Suppl. Table 1). These models were compared based on the harmonic mean of the 205

posterior probability of trees sampled after a conservative burnin of one million generations from two 206

independent MCMC runs, each of 2 million generations. A highly parameterized model of evolution 207

was favored for phylogenetic analyses of RAG1 and RAG2 (Suppl. Table 1), and this model was also 208

used for the combined analyses, which used for 5 million generations and the same burnin. Branch 209

support was also evaluated with 2,000 nonparametric bootstrapping replicates, each with a single 210

replication of random taxon addition, a limit of ten million rearrangements per replicate, and the 211

maximum parsimony criterion using PAUP version 4.0b10 (SWOFFORD 2002). Almost all of the well-212

supported relationships from the Bayesian analyses also have nonparametric bootstrap values of over 213

80% (Suppl. Fig. 3) 214

To test hypotheses of fewer gene silencing events in RAG1 than was suggested by 215

phylogenetic analysis, parametric bootstrap tests (GOLDMAN et al. 2000; HUELSENBECK et al. 1996) 216

were performed as in (EVANS et al. 2005). This procedure tests the fit of the data to alternative 217

phylogenetic hypotheses that are different from the consensus tree that was obtained from the Bayesian 218

analysis, and that would be consistent with fewer instances of independent gene degeneration (Suppl. 219

Fig. 4). To maximize the phylogenetic signal of these tests, simulations were performed according to 220

the “synthetic” data configuration using Seq-Gen version 1.3.2 (RAMBAUT and GRASSLY 1997). Perl 221

scripts were written to modify the patchiness the simulations to match the observed data in terms of the 222

quantity and positions of missing data for each taxon. 223

A caveat to the interpretation that autapomorphic degenerations occurred independently is that 224

recurrent substitutions could have erased an ancestral stop codon in one or both descendant paralogs. 225

To explore this possibility, marginal ancestral reconstruction of ancestral character states was 226

Page 9: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

9

performed with a general time reversible nucleotide model and a gamma distributed rate heterogeneity 227

parameter using the baseml program of PAML version 3.14 (YANG 1997). Reconstructed sequences 228

were then translated into protein using MacClade version 4.08 (MADDISON and MADDISON 2000) and 229

inspected for stop codons. 230

231

Testing for phylogenetic bias in RAG1 degeneration. To explore whether there is significant bias in 232

gene degeneration of RAG1 with respect to ancestry of each paralog, two approaches were taken. The 233

first approach used a maximum likelihood framework to compare rates of degeneration (δ), using 234

Discrete version 4.0 (PAGEL 1994). This framework tested whether rates of degeneration in RAG1 and 235

RAG2 were significantly different, and also whether rates of degeneration in the RAG1 α and β 236

lineages were significantly different. The rate of resuscitation of degenerate paralogs was set to a 237

negligible value, missing paralogs were coded as degenerate, and the most recent ancestor of expressed 238

paralogs with autapomorphic degenerations was set as nondegenerate. Likelihood ratio tests were used 239

to compare models with two degeneration parameters to models with only one, and these tests were 240

performed with and without a gamma-distributed approximation for rate heterogeneity (γ). To be 241

conservative, modified topologies were used in which the independent degenerations (numbered 1-12 242

in Fig. 2) were each forced to each be a clade. Branch lengths were estimated under the GTR+I+Γ 243

model and imposing a molecular clock using PAUP* (SWOFFORD 2002). 244

A second approach used simulations to estimate the probability that the observed number of 245

“chimerical” heterodimers – those composed of RAG1 and RAG2 proteins from different paralogous 246

lineages – occurred by chance if there were no bias to RAG1 degeneration. It was assumed that the 247

number of observed degenerate paralogs in each species follows a binomial distribution with a species-248

specific probability density for the mean of this distribution that was calculated from the observed 249

data. It was also assumed that at least one RAG1 paralog must remain functional in each species, that 250

no simulated degeneration could occur in species with no observed degeneration, that each non-251

degenerate paralog is expressed at the same intensity, time, and place, and that RAG1/RAG2 252

heterodimers form from paralogs from the same ancestral lineage (α or β) whenever possible. For 253

eight extant or ancestral species with observed independent RAG1 degeneration (see Results), one 254

hundred thousand simulations drew k degeneration events from n-1 paralogs, where n is the total 255

number of non-degenerate paralogs inferred to be present when that species originated. Degeneration 256

was modeled as a delayed transformation phenomenon wherein paralogs degenerated after 257

allopolyploidization and they were also performed in a phylogenetically-independent manner such that 258

Page 10: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

10

ancestral degenerations were inherited and not “re-simulated” in descendant species. Chimerical 259

heterodimers were quantified for each simulation based on observed and suspected degenerations and 260

deletions of RAG2 (i.e. degenerations 10, 11, and 12 in Fig. 2). 261

262

Recombination between alleles of different paralogs. Recombination between alleles of different 263

paralogs could blend their phylogenetic signal, and recombination between the intergenic regions of 264

paralogous chromosomes could alter the paralogs synteny (GREENHALGH et al. 1993). To test for 265

evidence of recombination in Xenopus and Silurana sequences, multiple tests were used because their 266

performance varies with the level of divergence, the extent of recombination, and among site rate 267

heterogeneity (POSADA 2002; POSADA and CRANDALL 2001). These tests included the recombination 268

detection program, geneconv, chimera, bootscan, and siscan, as implemented by the Recombination 269

Detection Program (MARTIN and RYBICKI 2000). Details of these methods can be found elsewhere 270

(GIBBS et al. 2000; MARTIN and RYBICKI 2000; MAYNARD SMITH 1992; PADIDAM et al. 1999; POSADA 271

and CRANDALL 2001; SALMINEN et al. 1995). A variety of parameter settings were explored for each 272

method as in (EVANS et al. 2005). 273

274

RESULTS 275

276

New data lead to a re-assessment of the evolutionary history of X. boumbaensis. Evolutionary 277

relationships inferred from RAG1 (this study; EVANS et al. 2005), mitochondrial DNA (EVANS et al. 278

2004b), and RAG2 (Fig. 2, Suppl. Fig. 3) can be synthesized into a reticulate phylogeny (Fig. 3). New 279

data identified an exception to expected relationships among paralogs in an individual herein referred 280

to as X. cf. boumbaensis. Additional sequencing from X. cf. boumbaensis identified a third RAG1 α 281

paralog and two RAG2 paralogs whose relationships are suggestive of dodecaploidy (Fig 1). This 282

suggests that this individual from Younde, Cameroon, was previously incorrectly classified as X. 283

boumbaensis (EVANS et al. 2004a; EVANS et al. 2005), which is an octoploid species (LOUMONT 1983; 284

TYMOWSKA 1991). It appears that X. cf. boumbaensis is a dodecaploid derived from 285

allopolyploidization between an octoploid ancestor of X. boumbaensis and a tetraploid ancestor of X. 286

cf. fraseri 2 (Fig. 3). Data from an X. boumbaensis individual from the type locality of Moloundou, 287

Cameroon, that were not included in EVANS et al. (2005), are consistent with octoploidy (Fig. 2, 3) and 288

lead to a re-evaluation of the evolutionary history of this species. Moreover it appears that X. 289

boumbaensis, X. ameiti, and X. andrei share a common octoploid ancestor, as opposed to each 290

Page 11: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

11

originating independently as was previously proposed (EVANS et al. 2005). Thus these genealogies 291

support three rather than five independent allopolyploid origins of most extant octoploids: (1) X. 292

vestitus, (2) X. wittei + X. new octoploid, and (3) X. amieti + X. andrei + X. boumbaensis, but three 293

rather than two independent origins of dodecaploids: (1) X. ruwenzoriensis, (2) X. longipes, and (3) X. 294

cf. boumbaensis (Fig. 3). This new information also changes the number of ancestral species that are 295

predicted but for whom an extant descendant with the same ploidy level is unknown from three 296

diploids, three tetraploids, and one octoploid (EVANS et al. 2005) to three diploids and three tetraploids 297

(Fig. 3). Mitochondrial DNA sequences from X. cf. boumbaensis are almost identical to a X. 298

boumbaensis sample from the type locality (EVANS et al. 2004b), suggesting recent dodecaploidization 299

of this individual. 300

301

Deletion of RAG2 and RAG1 paralogs. Both paralogs of RAG2 were amplified, cloned, and 302

sequenced from genomic DNA in Silurana tetraploids but only one divergent lineage of RAG2 was 303

amplified in Xenopus. Systematic attempts to amplify additional RAG2 paralogs in Xenopus with 304

seven combinations of other primers (Suppl. Fig. 1) were unsuccessful, even though these primers 305

successfully amplified a more divergent RAG2 ortholog in the diploid S. tropicalis. Southern blotting 306

of X. laevis genomic DNA also detected only one RAG2 paralog but two RAG1 paralogs 307

(GREENHALGH et al. 1993). This suggests that one RAG2 paralog was deleted in an early tetraploid 308

ancestor (deletion 10 in Fig. 2), as opposed to the alternatives that gene conversion occurred or that 309

another paralog is present but undetected. All Xenopus species are therefore suspected to have half of 310

the number of RAG2 paralogs than RAG1 paralogs. 311

Another suspected deletion of RAG2 occurred in an ancestral paralog from which paralog β2 312

of X. amieti, X. ruwenzoriensis, X. boumbaensis, and X. longipes, and RAG2 paralog β3 of X. cf. 313

boumbaensis would have descended (deletion 12 in Fig. 2). These predicted paralogs were not 314

detected in amplifications from genomic DNA, even after multiple attempts with different primer 315

combinations (Suppl. Fig. 1). Linked RAG1 paralogs from these species also were not detected 316

(deletion 12 in Fig. 2), suggesting that the deleted region spans both of them. Apart from these putative 317

deletions, the only other observed gene degeneration in RAG2 was in X. andrei RAG2 paralog β2, 318

which experienced a frameshift deletion. 319

320

Independent degeneration of RAG1 paralogs. To test whether the unique degenerations in the 3’ 321

region of RAG1 (EVANS et al. 2005) could have occurred after a shared ancestral degeneration in the 322

Page 12: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

12

5’ coding region, RAG1 sequences were obtained from most of the coding region of most or all RAG1 323

paralogs of all known species of clawed frog (Suppl. Fig. 2). Including previously identified 324

degenerate paralogs (EVANS et al. 2005), a total of seventeen degenerate RAG1 β paralogs and two 325

degenerate RAG1 α paralogs were detected in various Xenopus species (Fig. 2, Suppl. Fig. 2). The 326

only shared stop codons or frameshift mutations that were identified were: (1) one frameshift and one 327

stop codon shared by X. pygmaeus paralog β and X. ruwenzoriensis paralog β3, (2) a stop codon 328

shared by X. vestitus paralog β1 and X. longipes paralog β3, and (3) a stop codon shared by the X. 329

ruwenzoriensis paralog β1 and X. vestitus paralog β2. Maximum likelihood reconstructions indicate 330

that the first example is due to shared ancestry, but that the other two evolved independently. 331

Expression of at least one RAG1 paralog was confirmed in heart, brain, liver, testes, spleen, 332

and bone marrow (Table 1). Xenopus laevis and X. gilli each express both of their RAG1 paralogs and 333

neither is degenerate. Likewise, no degeneration was observed at the DNA level in either RAG1 334

paralog of some other tetraploids including X. clivii, X. largeni, and X. muelleri west (Suppl. Fig. 2). 335

Degeneration of many paralogs appears to have occurred independently because (a) the stop 336

codons and frameshift mutations are not shared with other paralogs and because (b) these degenerate 337

paralogs are either still expressed or closely related to other expressed paralogs. X. andrei RAG1 338

paralog β1, X. boumbaensis RAG1 paralog β1, and X. amieti RAG1 paralog α2, for example, are 339

expressed even though each one is degenerate (degenerations 4, 7, and 9, respectively, in Figs. 2 and 340

4). Overall, under the assumptions of no resuscitation of silenced genes, phylogenetic analyses 341

support a minimum of seven independent episodes of degeneration of the Xenopus RAG1 β paralogs 342

and two independent episodes in the Xenopus RAG1 α paralogs (Figs. 2 and 4). 343

The minimum seven instances of gene silencing RAG1 β paralogs, listed by numbers 344

corresponding with Figs. 2 and 4, include: (1) a heterozygous stop codon in X. borealis RAG1 paralog 345

β – expression of this paralog also was not detected in multiple tissues (Table 1), (2) RAG1 paralog β 346

of a tetraploid ancestor of both of the tetraploid ancestors of X. vestitus, (3) RAG1 paralog β3 of X. 347

longipes or its tetraploid ancestor, (4) X. andrei RAG1 paralog β1, (5) X. wittei RAG1 paralog β1, (6) 348

X. amieti RAG1 paralog β1, and (7) X. boumbaensis RAG1 paralog β1. The two examples of 349

degeneration in the RAG1 α paralogs are (8) X. ruwenzoriensis paralog α2 and (9) X. amieti paralog 350

α2 (Fig. 2). Both instances of degeneration of RAG1 paralog α occurred very recently, and potentially 351

after protein-protein interactions between RAG1 paralog α and RAG2 paralog β were already 352

established by pseudogenization of other RAG1 paralogs. 353

Page 13: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

13

Nine additional instances of degeneration of RAG1 β paralogs are possible but their 354

independence depends on whether ancestral gene silencing preceded autapomorphic degeneration of 355

the coding regions of various paralogs (Fig. 2). Expression of X. muelleri RAG1 paralog β, for 356

example, was not detected in multiple tissues (Table 1) and expression of this paralog may have been 357

silenced ancestrally prior to the speciation of X. muelleri and X. borealis and coding region 358

degeneration in X. borealis (degeneration #1 in Fig. 2 and 4). Likewise, a lack of detected expression 359

of X. new octoploid RAG1 paralog β2 could be explained by silenced expression of paralog β in one of 360

the tetraploid ancestors of X. vestitus, X. wittei, and X. new octoploid. If this were the case, then 361

degeneration of RAG1 paralog β1 and β2 in X. vestitus, paralog β2 in X. wittei, and paralog β2 in X. 362

new octoploid should be considered a single event (degeneration 2 in Figs. 2 and 4), even though no 363

shared degeneration of the coding region of these paralogs was observed (Suppl. Info. 2). 364

Parametric bootstrap tests strongly reject hypotheses of fewer independent degenerations in the 365

β lineage of RAG1 (P < 0.001). Of note is that the full sequence of X. boumbaensis RAG1 paralog β1 366

was not obtained (Suppl. Fig. 2) so the possibility of shared degeneration with another closely related 367

paralog cannot be completely dismissed. Also of interest is the observation that in X. new octoploid, 368

the intensity of paralog specific polymorphisms on sequence chromatograms suggests that expression 369

of RAG1 paralog α1 was higher than paralog α2 in the brain, whereas the opposite was observed in 370

amplifications of RAG1 from heart cDNA from this species (Table 1), a pattern of expression that is 371

suggestive of subfunctionalization (FORCE et al. 1999). 372

373

Significant bias to RAG1 degeneration. Comparison of alternative parameterizations of a model of 374

stochastic degeneration indicates that the rate of degeneration of the RAG1 β lineage is significantly 375

higher than the RAG1 α lineage with a gamma rate heterogeneity model (P = 0.0274, δRAG1α= 6.5519, 376

δRAG1β= 50.0343, γ RAG1α = 0.0082, γ RAG1β = 0.0042, df = 2) or without one (P = 0.0054, δRAG1α= 4.8927, 377

δRAG1β= 34.4187, df = 1). Additionally, the overall rate of degeneration of RAG1 is significantly 378

higher than the rate of degeneration of RAG2 when modeled with a gamma rate heterogeneity model 379

(P = 0.0431, δRAG1 = 31.2407, δRAG2 = 3.08511, γ RAG1 = 0.0082, γ RAG2 = 0.0096, df = 2) and without one 380

(P = 0.0323, δRAG1 = 7.8978, δRAG2 = 1.8141, df = 1). 381

Because some species may have inherited RAG1 β paralogs that were already degenerate, even 382

if subsequent degeneration of RAG1 were unbiased, chimerical heterodimers would still be expected 383

to occur. However, under the assumptions discussed above and given the observed pattern of deletion 384

Page 14: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

14

in RAG2, simulations indicate that the observed number of chimerical heterodimers that resulted from 385

RAG1 degeneration is significantly in excess of expectations if RAG1 degeneration were unbiased (P 386

< 0.0001, Fig. 5). 387

388

Recombination between RAG1 paralogs and between RAG2 paralogs is rare. Using 389

congruence of multiple tests as a criterion for credibility (POSADA 2002; POSADA and CRANDALL 390

2001), there is not convincing evidence of recombination between paralogs of RAG1, including when 391

new data are included, or between paralogs of RAG2 (this study; EVANS et al. 2005). No two tests 392

recovered significant evidence of recombination for the same region of any paralog of RAG1 or RAG2 393

and most tests did not recover any signal of recombination at all. A few putative recombination events 394

that had a parent sequence within the same species were individually identified by various tests but 395

these were deemed not credible based on a visual inspection of the data and a lack of corroboration by 396

other tests (POSADA 2002; POSADA and CRANDALL 2001). 397

Linkage of RAG1 paralog β and RAG2 paralog β has been demonstrated (GREENHALGH et al. 398

1993) and a similar linkage structure is suggested by a putative deletion that spanned RAG1 and 399

RAG2 paralogs of an ancestor of X. amieti, X. ruwenzoriensis, X. boumbaensis, X. cf. boumbaensis, 400

and X. longipes (deletion 12 in Fig. 2). The possibility that recombination occurred between the 401

intergenic region that separates linked alleles of RAG1 and RAG2 is difficult to conclusively rule out 402

because a diploid Xenopus (2x=18), which could confirm the ancestral synteny of RAG1 and RAG2 403

paralogs, is not available. However, phylogenetic congruence among RAG1, RAG2, and mtDNA, and 404

the internal consistency between the α and β genealogies of RAG1 support the contention that 405

recombination between these paralogs is rare (Fig. 2; EVANS et al. 2005; EVANS et al. 2004b). 406

Recombination among paralogous alleles is expected to be rare if diploidization of these allopolyploid 407

genomes occurred soon after their formation, which is typical of allopolyploid cotton, for example 408

(CRONN et al. 1999). 409

410

411

DISCUSSION 412

413

Evolution of a heterodimer in allopolyploid clawed frogs. The crucial process of V(D)J 414

recombination requires the action of a heterodimer that is made up of the RAG1 and RAG2 proteins, 415

which are encoded by tightly linked genes. The genes whose protein products form this heterodimer 416

Page 15: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

15

were duplicated, probably by allo- rather than auto-polyploidization, to generate a tetraploid ancestor 417

of all extant species in the genus Xenopus. Because allotetraploids inherit a complete genome from 418

two different diploid species, this tetraploid ancestor initially had two linked sets of RAG1 and RAG2 419

paralogs. Immediately after allopolyploidization, the RAG1/RAG2 heterodimers probably had 420

essentially identical functions but their dosage could have varied as a result of unique mutations that 421

occurred in each of the diploid ancestors. At this point the paralogous expression domains were also 422

probably similar, making possible protein-protein interactions between paralogs inherited from 423

different diploid ancestors. These linked sets of interacting paralogs were then duplicated again when 424

multiple independent episodes of allopolyploidization generated new octoploid and dodecaploid 425

species (Fig. 3; EVANS et al. 2005). 426

Before the tetraploid Xenopus ancestor diversified to give rise to the extant species, it appears 427

that one RAG2 paralog – paralog α – was deleted, leaving only the other RAG2 paralog – paralog β, 428

but still two functional paralogs of RAG1, α and β (Figs. 2, 4). Later, after speciation of this 429

tetraploid ancestor without change in genome size and also after further episodes of 430

allopolyploidization, paralogs of RAG1 in different species independently degenerated, making their 431

copy number similar or equal to RAG2. Surprisingly, until very recently degeneration of RAG1 432

paralogs occurred exclusively in closely related members of paralog lineage β. As a result, many 433

Xenopus species have functional paralogs of RAG1 that are exclusively α paralogs, and their protein 434

products must heterodimerize with proteins encoded by RAG2 β paralogs (Figs. 2, 4). Moreover, in 435

many species it appears that the functional copy of RAG1 is linked to a region where a paralog of 436

RAG2 was deleted and the functional copy of RAG2 is linked a paralog of RAG1 that either 437

degenerated or that was deleted. While the tetraploid species X. laevis and X. gilli still express 438

functional linked β paralogs of RAG1 and RAG2 and also an α paralog of RAG1, in most other 439

Xenopus species the only functional paralogs of RAG1 and RAG2 are unlinked, suggesting that each 440

one was derived from a different diploid ancestor. In contrast, allotetraploid clawed frogs that evolved 441

independently in the genus Silurana (4x = 40) retain both paralogs of RAG1 and of RAG2, all appear 442

functional at the DNA level in the portions of these genes that were sequenced, and both RAG1 443

paralogs are expressed in at least one Silurana tetraploid (S. new tetraploid 1). 444

The rate of degeneration is significantly higher in RAG1 β paralogs than in RAG1 α paralogs 445

and significantly higher in RAG1 than in RAG2. Simulations also indicate that, given the observed 446

pattern of degeneration of RAG2, the probability of unbiased degeneration of RAG1 producing by 447

chance such a high number of chimerical heterodimers derived from unlinked paralogs of RAG1 and 448

Page 16: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

16

RAG2 is very low. It is surprising that degeneration of RAG1 paralogs occurred in this manner 449

because an allopolyploid origin of the tetraploid ancestor of Xenopus would suggest that unlinked 450

paralogs share a shorter co-evolutionary history than do the linked ones. 451

Unique characteristics of each subgenome could account for the biased degeneration of RAG1 452

paralog β. Epigenetic phenomena after allopolyploidization could contribute to this bias, for example, 453

if paralog expression were differently affected by asymmetric mobility of transposable elements in 454

each subgenome. Alternatively, genetic explanations for nonrandom degeneration of RAG1 paralogs 455

include (1) “intra-paralog” phenomena, such as differences in dosage of each RAG1 paralog, or (2) 456

“inter-molecular” phenomena involving selection on interactions between the protein products of 457

specific RAG1 paralogs and other molecules. This second genetic explanation includes scenarios 458

involving a disadvantage (negative selection) or an advantage (positive selection) to interactions 459

between specific paralogs of RAG1 and other molecules, which may or may not include proteins 460

encoded by specific paralogs of RAG2. 461

462

Dosage as an explanation for non-random gene silencing of RAG1. One explanation for biased 463

degeneration of RAG1 is that, after allotetraploidization in Xenopus, expression of the RAG1 paralog 464

that was derived from diploid ancestor α was greater than the one that was derived from diploid 465

ancestor β. If this difference in dosage meant that the RAG1 paralog α could operate alone in a 466

polyploid genome whereas RAG1 paralog β could not, then the former would be both necessary and 467

sufficient. Under this scenario, the function of each RAG1 paralog would be identical and 468

interchangeable and the difference driving degeneration of RAG1 lineage β would be one of quantity, 469

not quality, of proteins encoded by RAG1 paralogs. However, because copy number of RAG2 was 470

reduced by deletion in an ancestor before any RAG1 copies degenerated, dosage constraints on RAG1 471

after allopolyploidization would have to have been imposed by other co-factors of RAG1. 472

Haploinsufficient phenotypes result from reduced expression or activity of a heterozygous 473

locus, and these phenotypes could stem from multiple mechanisms including altered enzymatic 474

stoichiometry (VEITIA 2002). The importance of stoichiometric requirements is suggested in yeast in 475

that proteins that form complexes are rarely members of large gene families and underexpression or 476

overexpression of these genes can be deleterious (PAPP et al. 2003). In the same way, ancestral 477

differences in gene dosage could influence allopolyploid phenotypes and ultimately affect the genetic 478

fate of paralogs in an allopolyploid genome. For example, laboratory generated allopolyploids with 479

two Z chromosomes from X. gilli and one W chromosome from X. laevis were mostly male whereas 480

Page 17: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

17

individuals with the same W chromosome but one Z chromosome from X. gilli and one Z chromosome 481

from X. muelleri (or X. laevis) were only female (KOBEL 1996). Dosage could also be an important 482

factor in the evolution of dominance. That wildtype alleles are generally dominant over new mutations 483

could be a byproduct of selection for surplus capability that is needed to operate under heterogeneous 484

conditions (CHARLESWORTH 1979; FORSDYKE 1994; KACSER and BURNS 1981; ORR 1991; WRIGHT 485

1934). 486

487

Selection on interactions between specific paralogs of RAG1 and other molecules. If the 488

tetraploid ancestor of Xenopus evolved through allopolyploidization, negative selection on molecular 489

interactions could arise from Dobzhansky-Muller incompatibilities (DOBZHANSKY 1937; MULLER 490

1942). However, Dobzhansky-Muller incompatibilities between paralogs of RAG1 and RAG2 could 491

explain the degeneration of specific RAG1 paralogs only if the paralogs that are currently functional 492

were derived from the same diploid ancestor. But this does not appear to be the case because, based on 493

comparison to linked paralogs in X. laevis (GREENHALGH et al. 1993), the only functional RAG1 and 494

RAG2 paralogs in most species are not in synteny. Thus, similar to the dosage hypothesis, if 495

Dobzhansky-Muller incompatibilities drove degeneration of RAG1 paralog β, they would probably 496

involve an interaction other than that with the protein product of the RAG2 gene. This could include 497

interactions with other protein co-factors of V(D)J recombination or with the recombination signal 498

sequences (RSSs) or the spacer regions between RSSs that flank variable, diversity, and joining 499

segments (RAMSDEN et al. 1994; SAKANO et al. 1979). 500

If the tetraploid ancestor of Xenopus was allopolyploid, positive selection could account for 501

this pattern of gene silencing in at least two ways. First, co-adaptation of paralogs of RAG1 and 502

RAG2 could have occurred in one of the diploid ancestors and then these paralogs could have been 503

unlinked by recombination after allopolyploidization. However, the co-adaptation hypothesis is 504

disfavored for the same reason that Dobzhansky-Muller incompatibilities between RAG1 and RAG2 505

are an unlikely explanation – recombination between alleles of different paralogs appears rare and 506

functional RAG1 and RAG2 paralogs in many species therefore are probably derived from different 507

diploid ancestors. A second possibility is that there is a performance advantage to protein-protein 508

interactions between RAG1 and RAG2 paralogs that are derived from different diploid ancestors. This 509

scenario posits an advantage, or heterosis, to a combination of two evolutionarily naïve proteins over a 510

heterodimer whose constituents have a longer period of coevolutionary history. Heterosis is also 511

Page 18: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

18

suggested, for example, in Xenopus polyploids that are resistant to parasites that infect both of their 512

parental taxa (JACKSON and TINSLEY 2003). 513

Explanations for heterosis include overdominance, dominance, or epistasis. The 514

overdominance hypothesis posits that heterozygous loci are more fit than homozygous loci (whether 515

dominant or recessive) (HULL 1945; SHULL 1908), and this could apply to allopolyploids that co-516

express alleles from each parental species. In fact, allopolyploidization provides a way to maintain 517

heterosis associated with heterozygosity because alleles from each parental species are forced to co-518

segregate in disomic allopolyploids. In the current case, however, the overdominance explanation for 519

heterosis does not apply because, while many allopolyploids have a RAG1/RAG2 heterodimer 520

comprised of proteins derived from genes with different ancestry, each paralog that encodes the 521

proteins in these heterodimers is homozygous with respect to their diploid ancestry. The dominance 522

explanation for heterosis posits that if dominant alleles are more fit, hybrids (or allopolyploids) would 523

benefit from the dominant alleles from each parent (DAVENPORT 1908). Dominance can result from 524

differences in dosage, which was discussed earlier, and could also be a consequence of epistasis, so 525

these explanations for heterosis are not mutually exclusive. Epistasis could lead to heterosis in an 526

allopolyploid if there were a performance advantage to interactions between proteins derived from 527

different ancestors. 528

Taken together, these observations suggest that allopolyploid transcriptomes are sculpted by 529

natural selection on each subgenome. Mutational or regulatory differences that accumulated in each 530

ancestor may be advantageous or deleterious, and their paralogs can be preserved or discarded after 531

genome fusion based on their performance and their interactions with molecules from the other 532

subgenome. In some cases, pseudogenization is strongly biased with respect to ancestry over millions 533

of years, and chimerical interactions between proteins from different ancestors may be favored over 534

interactions between proteins that share a longer period of co-evolution. 535

536

ACKNOWLEDGMENTS 537

538

I am grateful to Brian Golding for helpful discussion about this project and for access to computational 539

resources in his laboratory and to two anonymous reviewers for providing comments on this 540

manuscript. This research was supported by the Canadian Foundation for Innovation, the National 541

Science and Engineering Research Council, the Ontario Research and Development Challenge Fund, 542

and McMaster University. 543

Page 19: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

19

544

LITERATURE CITED 545

546

ADAMS, K. L., R. CRONN, R. PERCIFELD and J. F. WENDEL, 2003 Genes duplicated by polyploidy 547

show unequal contributions to the transcriptome and organ-specific reciprocal silencing. 548

Proceedings of the National Academy of Sciences 100: 4649. 549

ADAMS, K. L., R. PERCIFELD and J. F. WENDEL, 2004 Organ-specific silencing of duplicated genes in a 550

newly synthesized cotton allotetraploid. Genetics 168: 2217-2226. 551

ADAMS, K. L., and J. F. WENDEL, 2005 Polyploidy and genome evolution in plants. Current Opinion in 552

Plant Biology 8: 135-141. 553

CHAIN, F. J. J., and B. J. EVANS, 2006 Molecular evolution of duplicate genes in Xenopus laevis is 554

consistent with multiple mechanisms for their retained expression. PLoS Genetics 2: e56. 555

CHAIN, F. J. J., D. ILIEVA and B. J. EVANS, 2007 Immediate purifying selection, dosage requirements, 556

and expression divergence following gene duplication by allopolyploidization. Submitted. 557

CHARLESWORTH, B., 1979 Evidence against Fisher's theory of dominance. Nature 278: 848-849. 558

COMAI, L., 2000 Genetic and epigenetic interactions in allopolyploid plants. Plant Molecular Biology 559

43: 387-399. 560

CRONN, R. C., R. L. SMALL and J. F. WENDEL, 1999 Duplicated genes evolve independently after 561

polyploid formation in cotton. Proceedings of the National Academy of Sciences 96: 14406-562

14411. 563

DAVENPORT, C. B., 1908 Degeneration, albinism and inbreeding. Science 28: 454-455. 564

DOBZHANSKY, T., 1937 Genetics and the origin of species. Columbia University Press, New York. 565

EVANS, B. J., D. C. CANNATELLA and D. J. MELNICK, 2004a Understanding the origins of areas of 566

endemism in phylogeographic analyses: a reply to Bridle et al. Evolution 58: 1397-1400. 567

Page 20: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

20

EVANS, B. J., D. B. KELLEY, D. J. MELNICK and D. C. CANNATELLA, 2005 Evolution of RAG-1 in 568

polyploid clawed frogs. Molecular Biology and Evolution 22: 1193-1207. 569

EVANS, B. J., D. B. KELLEY, R. C. TINSLEY, D. J. MELNICK and D. C. CANNATELLA, 2004b A 570

mitochondrial DNA phylogeny of clawed frogs: phylogeography on sub-Saharan Africa and 571

implications for polyploid evolution. Molecular Phylogenetics and Evolution 33: 197-213. 572

FORCE, A., M. LYNCH, B. PICKETT, A. AMORES, Y. L. YAN et al., 1999 Preservation of duplicate genes 573

by complementary, degenerative mutations. Genetics 151: 1531-1545. 574

FORSDYKE, D. R., 1994 The heat-shock response and the molecular basis of genetic dominance. 575

Journal of Theoretical Biology 167: 1-5. 576

GIBBS, M. J., J. S. ARMSTRONG and A. J. GIBBS, 2000 Sister-Scanning: a Monte Carlo procedure for 577

assessing signals in recombinant sequences. Bioinformatics 16: 573-582. 578

GOLDMAN, N., J. P. ANDERSON and A. G. RODRIGO, 2000 Likelihood-based tests of topologies in 579

phylogenetics. Systematic Biology 49: 652-670. 580

GREENHALGH, P., C. E. M. OLESEN and L. A. STEINER, 1993 Characterization and expression of 581

Recombination Activating Genes (RAG-1 and RAG-2) in Xenopus laevis. Journal of 582

Immunology 151: 3100-3110. 583

GU, X., 2003 Evolution of duplicate genes versus genetic robustness against null mutations. Trends in 584

Genetics 19: 354-356. 585

HUELSENBECK, J. P., D. M. HILLIS and R. JONES, 1996 Parametric bootstrapping in molecular 586

phylogenetics: applications and performance., pp. 19-45 in Molecular Zoology: Advances, 587

Strategies, and Protocols., edited by J. D. FERRARIS and S. R. PALUMBI. Wiley-Liss, New 588

York. 589

HUGHES, M. K., and A. L. HUGHES, 1993 Evolution of duplicate genes in a tetraploid animal, Xenopus 590

laevis. Molecular Biology and Evolution 10: 1360-1369. 591

Page 21: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

21

HULL, F. G., 1945 Recurrent selection and specific combining ability in corn. Journal of the American 592

Society of Agronomy 37: 134-145. 593

JACKSON, J. A., and R. C. TINSLEY, 2003 Parasite infectivity to hybridising host species: a link 594

between hybrid resistance and allopolyploid speciation? International Journal for Parasitology 595

33: 137-144. 596

JIANG, C.-X., P. W. CHEE, X. DRAYE, P. L. MORRELL, C. W. SMITH et al., 2000 Multilocus 597

interactions restrict gene introgression in interspecific populations of polyploid Gossypium 598

(cotton). Evolution 54: 798-814. 599

KACSER, H., and J. A. BURNS, 1981 The molecular basis of dominance. Genetics 97: 639-666. 600

KAPITONOV, V., and J. JURKA, 2005 RAG1 core and V(D)J recombination signal sequences were 601

derived from Transib transposons. PLoS Biology 3: e181. 602

KOBEL, H. R., 1996 Allopolyploid speciation, pp. 391-401 in The Biology of Xenopus, edited by R. C. 603

TINSLEY and H. R. KOBEL. Clarendon Press, Oxford. 604

LEE, H.-S., and Z. J. CHEN, 2001 Protein-coding genes are epigenetically regulated in Arabidopsis 605

polyploids. Proceedings of the National Academy of Sciences 98: 6753-6758. 606

LEVY, A. A., and M. FELDMAN, 2004 Genetic and epigenetic reprogramming of the wheat genome 607

upon allopoolyploidization. Biological Journal of the Linnean Society 82: 607-613. 608

LIU, B., C. L. BRUBAKER, G. MERGEAI, R. C. CRONN and J. F. WENDEL, 2001 Polyploid formation in 609

cotton is not accompanied by rapid genomic changes. Genome 44: 321-330. 610

LIU, B., and J. F. WENDEL, 2002 Non-mendelian phenomena in allopolyploid genome evolution. 611

Current Genomics 3: 489-505. 612

LIU, B., and J. F. WENDEL, 2003 Epigenetic phenomena and the evolution of plant allopolyploids. 613

Molecular Phylogenetics and Evolution 29: 365-379. 614

Page 22: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

22

LOUMONT, C., 1983 Deux especes nouvelles de Xenopus du Cameroun (Amphibia, Pipidae). Rev. 615

Suisse Zool. 90: 169-177. 616

LUKENS, L. N., P. A. QUIJADA, J. UDALL, J. C. PIRES, M. E. SCHRANZ et al., 2004 Genome 617

redundancy and plasticity within ancient and recent Brassica crop species. Biological Journal 618

of the Linnean Society 82: 665-674. 619

LYNCH, M., and A. FORCE, 2000 The probability of duplicate gene preservation by 620

subfunctionalization. Genetics 154: 459-473. 621

LYNCH, M., M. O'HELY, B. WALSH and A. FORCE, 2001 The probability of preservation of a newly 622

arisen gene duplicate. Genetics 159: 1789-1804. 623

MADDISON, D. R., and W. P. MADDISON, 2000 MacClade. Sinauer Associates, Sunderland. 624

MADLUNG, A., R. MASUELLI, B. WATSON, S. H. REYNOLDS, J. DAVISON et al., 2002 Remodeling of 625

DNA methylation and phenotypic and transcriptional changes in synthetic Arabidopsis 626

allotetraploids. Plant Physiology 129: 733-746. 627

MARTIN, D., and E. RYBICKI, 2000 RDP: detection of recombination amongst aligned sequences. 628

Bioinformatics 16: 562-563. 629

MAYNARD SMITH, J., 1992 Analyzing the mosaic structure of genes. Journal of Molecular Evolution 630

34: 126-129. 631

MOSCONE, E. A., M. A. MATZKE and A. J. M. MATZKE, 1996 The use of combined FISH/GISH in 632

conjunction with DAPI counterstaining to identify chromosomes containing transgene inserts 633

in amphidiploid tobacco. Chromosoma 105: 231-236. 634

MULLER, H. J., 1942 Isolating mechanisms, evolution and temperature. Biol. Symp. 6: 71-125. 635

NYLANDER, J. A. A., F. RONQUIST, J. P. HUELSENBECK and J. L. NIEVES-ALDREY, 2004 Bayesian 636

phylogenetic analysis of combined data. Systematic Biology 53: 47-67. 637

OHNO, S., 1970 Evolution by gene duplication. Springer-Verlag, Berlin. 638

Page 23: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

23

ORR, H. A., 1991 A test of Fisher's theory of dominance. Proceedings of the National Academy of 639

Sciences 88: 11413-11415. 640

OZKAN, H., A. A. LEVY and M. FELDMAN, 2001 Allopolyploidy-induced rapid genomic evolution of 641

the wheat (Aegilops-Triticum) group. Plant Cell 13: 1735-1747. 642

PADIDAM, M., S. SAWYER and C. M. FAUQUET, 1999 Possible emergence of new geminiviruses by 643

frequent recombination. Virology 265: 218-225. 644

PAGEL, M., 1994 Detecting correlated evolution on phylogenies: a general method for the comparative 645

analysis of discrete characters. Proceedings of the Royal Society of London (B) 255: 37-45. 646

PAPP, B., C. PÁL and L. D. HURST, 2003 Dosage sensitivity and the evolution of gene families in yeast. 647

Nature 424: 194-197. 648

POSADA, D., 2002 Evaluation of methods for detecting recombination from DNA sequences: empirical 649

data. Molecular Biology and Evolution 19: 708-717. 650

POSADA, D., and K. A. CRANDALL, 2001 Evaluation of methods for detecting recombination from 651

DNA sequences: computer simulations. Proceedings of the National Academy of Sciences 98: 652

13757-13762. 653

RAMBAUT, A., and N. C. GRASSLY, 1997 Seq-Gen: an application for the Monte Carlo simulation of 654

DNA sequence evolution along phylogenetic trees. Computer Applications in the Biosciences 655

13: 235-238. 656

RAMSDEN, D. A., K. BAETZ and G. E. WU, 1994 Conservation of sequence in recombination signal 657

sequence spacers. Nucleic Acids Research 22: 1785-1796. 658

SADOFSKY, M. J., J. E. HESSE, J. F. MCBLANE and M. GELLERT, 1993 Expression and V(D)J 659

recombination activity of mutated RAG-1 proteins. Nucleic Acids Res. 21: 5644-5650. 660

SAKANO, H., K. HUPPI, G. HEINRICH and S. TONEGAWA, 1979 Sequences at the somatic recombination 661

sites of immunoglobulin light-chain genes. Nature 280: 288-294. 662

Page 24: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

24

SALMINEN, M. O., J. K. CARR, D. S. BURKE and F. E. MCCUTCHAN, 1995 Identification of breakpoints 663

in intergenotypic recombinants of HIV-1 by bootscanning. AIDS Res. Hum. Retroviruses 11: 664

1423-1425. 665

SHAKED, H., K. KASHKUSH, H. OZKAN, M. FELDMAN and A. A. LEVY, 2001 Sequence elimination and 666

cytosime methylation are rapid and reproducible responses of the genome to wide hybridization 667

and allopolyploidy in wheat. The Plant Cell 13: 1749-1759. 668

SHULL, G. H., 1908 The composition of a field of maize. Report of the American Breeding Association 669

4: 296-301. 670

SKALICKÁ, K., K. Y. LIM, R. MATYASEK, M. MATZKE, A. R. LEITCH et al., 2005 Preferential 671

elimination of repeated DNA sequences from the paternal, Nicotiana tomentosiformis genome 672

donor of a synthetic, allotetraploid tobacco. New Phytologist 166: 291-303. 673

SOLTIS, D. E., P. S. SOLTIS, J. C. PRIRES, A. KOVARIK, J. A. TATE et al., 2004 Recent and recurrent 674

polyploidy in Tragopogon (Asteraceae): cytogenetic, genomic and genetic comparisons. 675

Biological Journal of the Linnean Society 82: 485-501. 676

SWOFFORD, D. L., 2002 Phylogenetic analysis using parsimony (* and other methods). Version 4. 677

Sinauer Associates, Sunderland. 678

TYMOWSKA, J., 1991 Polyploidy and cytogenetic variation in frogs of the genus Xenopus., pp. 259-297 679

in Amphibian cytogenetics and evolution, edited by D. S. GREEN and S. K. SESSIONS. 680

Academic Press., San Diego. 681

VEITIA, R. A., 2002 Exploring the etiology of haploinsufficiency. Bioessays 24: 175-184. 682

VOLKOV, R. A., N. V. BORISJUK, I. I. PANCHUK, D. SCHWEIZER and V. HEMLEBEN, 1999 Elimination 683

and rearrangement of parental rDNA in the allotetraploid Nicotiana tabacum. Molecular 684

Biology and Evolution 16: 311-320. 685

Page 25: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

25

WANG, J., L. TIAN, A. MADLUNG, H.-S. LEE, M. CHEN et al., 2004 Stochastic and epigenetic changes 686

of gne expression in Arabidopsis polyploids. Genetics 168: 2217-2226. 687

WENDEL, J. F., A. SCHNABEL and T. SEELANAN, 1995 Bidirectional interlocus concerted evolution 688

following allopolyploid speciation in cotton (Gossypium). Proceedings of the National 689

Academy of Sciences 92: 280-284. 690

WRIGHT, S., 1934 Molecular and evolutionary theories of dominance. American Naturalist 63: 24-53. 691

YANG, Z., 1997 PAML: a program package for phylogenetic analysis by maximum likelihood. 692

CABIOS 13: 555-556. 693

ZWIERZYKOWSKI, Z., R. TAYYAR, M. BRUNELL and A. J. LUKASZEWSKI, 1998 Genome recombination 694

in intergeneric hybrids between tetraploid Festuca pratensis and Lolium multiflorum. Journal of 695

Heredity 89: 324-328. 696

697

698

Figure Legends 699

700

Fig. 1. Diploidization of allopolyploid genomes means that recombination between alleles of different 701

paralogs of the same gene is rare. The evolutionary history of linked paralogs in diploidized 702

allopolyploid genomes, therefore, can be inferred by combining information on synteny with 703

information on the evolutionary relationships among paralogs of each gene. The RAG1 and 704

RAG2 genes, for example, are tightly linked. This diagram depicts the evolution of the 705

expected number of paralogs of each of these genes in species with different ploidy levels 706

(tetraploid, octoploid, and dodecaploid) if no deletion or degeneration occurred. An allo-707

tetraploid species inherits a linked set of RAG1 and RAG2 α paralogs from diploid ancestor 1 708

and a linked set of RAG1 and RAG2 β paralogs from diploid ancestor 2. An allo-octoploid 709

species inherits two sets of linked RAG1 and RAG2 paralogs from two different allo-tetraploid 710

ancestors. In these species the α1 and α2 paralogs of RAG1 and RAG2 are derived from 711

diploid ancestor 1 and the β1 and β2 paralogs of RAG1 and RAG2 are derived from diploid 712

Page 26: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

26

ancestor 2. An allo-dodecaploid inherits an allo-octoploid and an allo-tetraploid genome and 713

has three α paralogs (α1, α2, α3) derived from diploid ancestor 1 and three β paralogs (β1, β2, 714

β3) derived from diploid ancestor 2. Note that the observed number of functional RAG1 and 715

RAG2 paralogs is less than this expectation in most (RAG1) or all (RAG2) species of Xenopus 716

due to degeneration or deletion. 717

Fig. 2. Combined phylogenetic analysis of RAG1 and RAG2 paralogs. Nodes with less than 95% 718

posterior probability or that were inconsistent between the α and β lineages are collapsed, 719

asterisks indicate RAG1 paralogs for which expression was confirmed, and linked X. laevis 720

RAG1 and RAG2 paralogs (Greenhalgh et al. 1993) are connected by a line. Degenerate 721

paralogs are in red and missing paralogs are in gray. Numbers 1-9 indicate the minimum 722

independent degenerations of RAG1 mentioned in the text, and 10, 11, and 12 refer to 723

degeneration of RAG2 lineage α, degeneration X. andrei RAG2 paralog β2, and a suspected 724

ancestral deletion spanning paralogs of RAG1 and RAG2, respectively. 725

726

Fig. 3. Evolutionary relationships merge when a species evolves by allopolyploidization, and a 727

reticulate phylogeny can be constructed from well-supported nodes in the genealogies of RAG1 728

and RAG2 in Fig. 1. The ploidy level follows each species name including three diploids 729

indicated with daggers whose existence is predicted in the past but for whom an extant 730

descendant with the same ploidy level is not known. Additionally, three tetraploid ancestors 731

(each with 4x=36), also indicated by daggers, are predicted but not known by an extant 732

representative. 733

734

Fig. 4. Evolution of RAG1 and RAG2 in Xenopus. The RAG1 and RAG2 heterodimer is depicted as a 735

scissor and the evolutionary trajectory of this heterodimer is followed through the reticulate 736

phylogeny that is depicted in Fig. 3, including degeneration and deletions of paralogs encoding 737

each protein that are depicted in Fig. 2. Proteins encoded by paralogs inherited from diploid 738

ancestor α are yellow and those encoded by paralogs inherited from diploid ancestor β are blue. 739

Numbered deletion and degeneration events from Fig. 2 are labeled and the ploidy of each 740

ancestor or extant species (2x, 4x, 8x, or 12x) is indicated inside each box. Species and 741

ancestors for which simulations were performed are indicated by a red box. As a result of 742

biased degeneration of RAG1, heterodimers of many extant species must form between 743

proteins that were ultimately inherited from different diploid species (i.e. the scissor are 744

Page 27: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

27

composed of a yellow RAG1 protein and a blue RAG2 protein). Single asterisks indicate a 745

minimum of seven independently evolved chimerical heterodimers. Double asterisks indicate 746

caveats that X. cf. fraseri had no observed degeneration of RAG1 paralog β, but that only a 747

small fragment of this paralog was sequenced (Suppl. Fig. 2), and that expression of paralog β 748

was not detected in X. muelleri but no degeneration was observed in the coding region. 749

750

Fig. 5. Simulations of the expected distribution of chimerical heterodimers (those formed between 751

proteins encoded by from paralogs derived from different diploid ancestors) given the observed 752

degeneration and deletion of RAG2 and phylogenetically independent and unbiased deletion of 753

RAG1. Simulations were performed on the eight species or ancestors as indicated in Fig. 4. 754

An asterisk indicates the ten observed chimerical heterodimers in the species for which the 755

simulations were performed, including seven that evolved independently, but not counting the 756

one of the chimerical heterdimers in X. wittei that may have been inherited from one of the 757

tetraploid ancestors that was simulated. The observed number of chimerical heterodimers 758

depart significantly from expectations under no bias (P < 0.0001). 759

760

Suppl. Fig. 1. Genetic samples and primers used in this study. 761

762

Suppl. Fig. 2. (A) Sequenced regions of RAG1. The core region of RAG1 that is required for V(D)J 763

recombination is indicated by a black bar on top. The white bar inside this black bar indicates 764

the region sequenced by Evans et al. 2005. Arrows indicate paralogs whose expression is 765

confirmed or, in the diploid S. tropicalis, assumed. Frameshift mutations and stop codons are 766

indicated by F and S, respectively. Heterozygous degenerations in X. borealis and X. 767

ruwenzoriensis are indicated by an asterisk and shared degenerations are underlined. (B) The 768

sequenced region of RAG2 is indicated by a thick line; essentially the same region was 769

sequences for all RAG2 paralogs in this study. 770

771

Suppl. Fig. 3. Consensus topology from Bayesian analysis of (A) RAG1, (B) RAG2, (C) RAG1 and 772

RAG2 combined, and (D) a synthetic data configuration (see text). A single asterisk indicates 773

branches with at least 95% posterior probability and double asterisks indicate branches that 774

also have bootstrap values of at least 80%. In (B) a bar indicates a putative deletion of RAG2 775

paralog α. Sequences are labeled as in Fig. 1 except in (D) where for tetraploids α and β are 776

Page 28: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

28

collapsed into a single taxon, and for octoploids and dodecaploids α1 and β1 are a single taxon 777

and α2 and β2 are a single taxon, and for dodecaploids α3 and β3 are a single taxon. 778

779

Suppl. Fig. 4. Parametric bootstrap tests were performed to evaluate hypotheses of fewer independent 780

degenerations of RAG1. Because X. boumbaensis paralog β1 and X. andrei paralog β1 are 781

both expressed and have only autapomorphic degenerations, it is assumed that they 782

degenerated independently from other paralogs, but the total number of additional independent 783

degenerations depends on their evolutionary relationships with respect to other paralogs 784

because ancestral gene silencing could have preceded the accumulation of autapomorphic 785

degenerations in each paralog. To be conservative, paralog β1 of X. boumbaensis and paralog 786

β1 of X. cf. boumbaensis are assumed to have been silenced in a recent common ancestor, even 787

though insufficient data was collected from these paralogs to test for shared degeneration in the 788

coding region. Parametric bootstrap tests were performed using the “synthetic” configuration of 789

data, so each taxonomic unit includes concatenated data from an α and a β paralog of RAG1 790

and the corresponding β paralog of RAG2. Specifically, lineage A is a single taxonomic unit 791

that includes concatenated X. andrei RAG1 paralog α1 and β1 and RAG2 paralog β1. Clade B 792

includes two taxonomic units; one is made up of X. boumbaensis sequences from RAG1 793

paralog α1 and β1 and RAG2 paralog β1 and the other is concatenated sequences from the 794

same paralogs from X. cf. boumbaensis. Clade C includes ten concatenations: concatenated 795

paralog 1 from X. wittei, X. new octoploid, X. ameiti, X. pygmaeus, X. cf. fraseri 1, X. cf. 796

fraseri 2, X. longipes, and X. ruwenzoriensis, and concatenated paralog 3 from X. longipes and 797

X. ruwenzoriensis. Clade D is Clade C minus the X. longipes concatenated paralog 3. Other 798

taxa are indicated by “O”. All of these hypotheses were strongly rejected (P < 0.0001). 799

800

801

Page 29: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

Species (ploidy) Tissue

Expected

paralogs

Observed

expressed

paralogs

Cloned (C) or

directly sequenced

(DS)

S. new tetraploid 1 (4x=40) brain !, " !, " C

testes !, " !, " DS

liver !, " !, " DS

X. laevis (4x=36) bone marrow !, " !, " C, DS

heart !, " ! DS

liver !, " !, " DS

spleen !, " !, " DS

liver !, " !, " DS

X. gilli (4x=36) bone marrow !, " !, " C

X. borealis (4x=36) brain !, " ! DS

liver !, " ! DS

testes !, " ! DS

X. muelleri (4x=36) brain !, " ! DS

testes !, " ! DS

bone marrow !, " ! DS

X. amieti (8x=72) liver !1,!2, "1,"2 !1,!2# C

brain !1,!2, "1,"2 !1 DS

X. andrei (8x=72) bone marrow !1,!2, "1,"2 !1,!2,"1# C, DS

spleen !1,!2, "1,"2 !1 DS

brain !1,!2, "1,"2 !1 DS

X. new octoploid (8x=72) bone marrow !1,!2, "1,"2 !1,!2 C

brain !1,!2, "1,"2 !1,!2(low) DS

heart !1,!2, "1,"2 !1(low),!2 DS

testes !1,!2, "1,"2 !1,!2 C

X. boumbaensis (8x=72) bone marrow !1,!2, "1,"2 !1,!2, "1# C

brain !1,!2, "1,"2 !1,"1# DS

heart !1,!2, "1,"2 "1# DS

Table 1. Gene silencing of RAG1 paralogs in different Xenopus species. Expression was confirmed by

amplifying cDNA across an intron and either cloning (C) or directly sequncing (DS) the PCR product.

Expressed paralogs with an asterisk are degenerate.

Page 30: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

RA

G1

RA

G2

RA

G1α

RA

G2α

RA

G1β

RA

G2β

regular speciation and divergence

allopolyploidization

RA

G1α

RA

G2α

RA

G1β

RA

G2β

Dip

loid

an

ce

stor 1

Dip

loid

an

ce

stor 2

Dip

loid

an

ce

stor

Allo

-tetra

plo

id a

nc

esto

r

regular speciation and divergence

allopolyploidization

RA

G1α

2R

AG

2α2

RA

G1β2

RA

G2β2

Allo

-tetra

plo

id a

nc

esto

r 2

RA

G1α

1R

AG

2α1

RA

G1β1

RA

G2β1

Allo

-tetra

plo

id a

nc

esto

r 1

RA

G1α

2R

AG

2α2

RA

G1β2

RA

G2β2

Allo

-oc

top

loid

an

ce

stor

RA

G1α

1R

AG

2α1

RA

G1β1

RA

G2β1

Evans, F

ig. 1

allopolyploidization

RA

G1α

1R

AG

2α1

RA

G1β1

RA

G2β1

RA

G1α

3R

AG

2α3

RA

G1β3

RA

G2β3

Allo

-do

de

ca

plo

id

RA

G1α

2R

AG

2α2

RA

G1β2

RA

G2β2

RA

G1α

RA

G2α

RA

G1β

RA

G2β

Allo

-tetra

plo

id

RA

G1α

2R

AG

2α2

RA

G1β2

RA

G2β2

Allo

-oc

top

loid

RA

G1α

1R

AG

2α1

RA

G1β1

RA

G2β1

Page 31: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

ruwenzoriensis β3

gilli α

gilli β

S. tropicalisnew tetraploid 2 αS. epitropicalis αnew tetraploid 1 αS. epitropicalis β

new tetraploid 1β new tetraploid 2 β

largeni α

amieti α1longipes α1

ruwenzoriensis α1cf. boumbaensis α1

pygmaeus αruwenzoriensis α3

cf. fraseri 1 αcf. fraseri 2 α

cf. boumbaensis α2new octoploid α1

wittei α1andrei α1

longipes α3

vestitus α2

amieti α2ruwenzoriensis α2cf. boumbaensis α3

andrei α2longipes α2

new octoploid α2wittei α2

vestitus α1

laevis α

clivii αnew tetraploid 1α

borealis αmuelleri αamieti β1

ruwenzoriensis β1longipes β1

cf. boumbaensis β1cf. fraseri 1 β

cf. boumbaensis β2cf. fraseri 2 βpygmaeus β

new octoploid β1wittei β1andrei β1

longipes β3laevis β

new octoploid β2wittei β2

vestitus β1

vestitus β2andrei β2

largeni βclivii β

new tetraploid 1 βborealis βmuelleri β

amieti β2ruwenzoriensis β2

cf. boumbaensis β3longipes β2

34

2

1

6

98

11

*

*

**

**

*

**

**

**

*

*

*

**

RAG1 RAG2

α

β

α

β

β β

α α

10

Evans, Fig. 2

5

12 12

boumbaensis α2

boumbaensis α1

boumbaensis β1

boumbaensis β2

7

Page 32: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

Xenopus

Silura

na

X. gilli (36)

X. new tetraploid (36)X. muelleri (36)X. borealis (36)

S. tropicalis (20)

S. new tetraploid 1 (40)

X. largeni (36)

X. laevis (36)

X. clivii (36)

S. epitropicalis (40)S. new tetraploid 2 (40)

X. cf. boumbaensis (108)

X. wittei (72)

X. ruwenzoriensis (108)

X. longipes (108)

X. amieti (72)

X. andrei (72)

X. vestitus (72)

X. cf. fraseri 2 (36)

X. pygmaeus (36)

X. cf. fraseri 1 (36)

Evans, Fig. 3

X. new octoploid (72)diploid α (18)

diploid β (18)

diploid β (20)

X. boumbaensis (72)

Page 33: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

RAG2

X. gilli

X. new tetraploid

X. muelleri **X. borealis

X. largeni

X. laevis

X. clivii

X. cf boumbaensis

X. wittei

X. ruwenzoriensis

X. longipes

X. amieti

X. andrei

X. vestitus

X. cf. fraseri 2 **

X. pygmaeus

X. cf. fraseri 1

X. new octoploid

10 1

2

411

3

7

9

8

5

diploid α

diploid β

Evans, Fig. 4

*

*

*

*

*

*

*

RAG1

6

4x

4x

4x

4x

4x

2x

4x

2x

4x

8x 8x 8x12

4x

4x

4x

4x

4x

8x

4x

4x

8x

8x

8x

12x

12x

12x

Xenopus

X. boumbaensis8x

Page 34: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

0

10000

20000

30000

Cou

nt

Number of simulated chimerical heterodimers

0 1 2 3 4 5 6 7 8 9 10 11 *

Evans, Fig. 5

Page 35: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

Modelfree

parametersRAG1

likelihood Bayes FactorRAG2

likelihood Bayes Factor

GTR+I+Γ+bf 10 19772.35 6074.23Same GTR, bf, and I for all sites, separate Γ for each codon position 12 19638.90 266.90 6022.41 103.65Same GTR and bf for all sites, separate I+Γ for each codon position 14 19630.65 16.49 6015.37 14.07Same bf and I for all sites, separate GTR+Γ for each codon position 22 19623.83 13.64 6010.25 10.25Same bf for all sites, separate GTR+I+Γ for each codon position 24 19614.16 19.35 6003.93 12.64Same I for all sites, separate GTR+ Γ + bf for each codon position 28 19594.65 39.02 5995.06 17.74

Separate GTR+I+Γ + bf for each codon position 30 19584.55 20.19 5989.77 10.57

Supplementary Table 1. Harmonic mean of post-burnin likelihood under alternative parameterizations of the general time reversible (GTR) model of evolution with base frequencies (bf), a proportion of invariant sites (I) and a gamma distributed rate heterogeneity parameter (Γ) estimated from the data. Free parameters do not include branch-length and topology parameters.

RAG1 RAG2

Page 36: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

Supplementary Figure 1: Genetic samples and primers used to amplify RAG1 and RAG2. Genetic Samples: Sequences were obtained from the same individuals as Evans et al. (23) with the following exceptions: (1) for expressed paralogs, at least the first 16 bp were obtained from cDNA from non-vouchered laboratory animals, and (2) in X. wittei and in X. borealis, portions of RAG1 paralogs were sequenced from two different individuals, (3) sequences for RAG1 and RAG2 of Rhinophrynus dorsalis and Hymenochirus sp. were each obtained from different individuals, (4) X. laevis RAG1 paralogs are from different individuals – the β paralog is from Greenhalgh et al. (29), Genbank Accession Number L19324, whereas the α paralog is from the same individual as Evans et al. (23), (5) the outgroup sequence for RAG1 was from Scaphiopus hurteri and for RAG2 was from S. couchi and (6) new sequences from X. new octoploid are included – this is a new species that was recently discovered in the Democratic Republic of the Congo; a species description is in preparation. Primer used to amplify across the RAG1 exon in the 5’ untranslated region: R1.SWEET.M35 (5’ CTGAGCAGTGACAGCYTGACAKCAGG 3’) R1.REV.442 (5’ GATSAGCTCAGGCCAAGAGGTGAC 3’) R1.382.362 (5’ GATAGCTCCGGTTCTGCTGAT 3’) (from Greenhalgh et al. (1993)) Primers used to amplify RAG1 from genomic DNA: R1.begin.for.16 (5’ GAAAATGGAGGTGGCATTGC 3’) R1.for.537 (5’ TGYCAYAATTGCTGGACTATAATGAATC 3’) R1.rev1231 (5’ AGAGCCAGAAGAAAGAGTGTTAAACACACTG 3’) BOR.MUEL.BETA.FOR.1087 (5’ AGGAYCTTAGTTTGGGGTCTGCACCAC 3’) BOUM.BETA1.REV.2772 (5’ TTATTGGGAACACTGATAAC 3’) R1.rev.2583 (5’ CCTTCAAGGATATCTTCTTCCATGTCTTTC 3’) R1.rev.2656 (5’ ACATCTCCCATYCCATCACAGGACTCTTTAACCACCAC 3’) BOUM.FRAS.BOR.MUEL.BETA.REV.2734 (5’ ACGCACAGCTTTCTCTGGAACGGGA 3’) RAG1.BETA.FOR2730 (5’ GCACGGTAGTGGTCCTCCCGTTCCT 3’) RAG1.BETA.FOR2730A (5’ TAGTGGTCCTCCCGTTCCT 3’) RAG1.BETA.FOR2624 (5’ GAAGATATCCTTGAAGGCACAA 3’) R1.FRAS.207765.BETA.REV2716 (5’ ACGCACAGCTTTCTCTGGTTTCTC 3’) R1.ALPHA.FOR1145 (5’ AGGAAARTGAGAAMTCAAATGCMRTTGACAG 3’) R1.ALPHA.REV2734 (5’ ACGTACGGCTTTCTCTGGAACTGCT 3’) R1.REV.2583 (5’ CCTTCAAGGATATCTTCTTCCATGTCTTTC 3’) RAG1.FOR.3175 (5’ GAACYTACAGCGYTATGAGATGTG 3’) RAG1.REV.3952 (5’ GATTGTCTAGATCTACAGTGAACC 3’) From Evans et al. (2005): XENRAG1.FORWARD3 (5’GGATGAGTATCCAGTAGATACAATCTCCAAGAG 3’) XENRAG1.REVERSE2 (5’ TTTCTGGGACATGTGCCAGGGTTTTGTG 3’) RAG1F4 (5’ GCAAGCCTCTCTGNCTGATGC 3’) RAG1R4 (5’ GTTTTTATAGAACTCCCCTAT 3’)

Page 37: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

Primers used to amplify RAG2 from genomic DNA: RAG2FOR.1 (5’ ACCTACACAGTTGCTGTGATG 3’) RAG2REV.1423 (5’ TCTTCATCGTCTTCATTGTA 3’). Primers used to amplify some RAG2 paralogs of Silurana species: RAG2.SILALPHA.FOR363 (5’ ACTGGGGTTTTCCTCGTAGATCTT 3’) RAG2.SILALPHA.REV1372 (5’ AGTATCTTCGTCAAAAAGGTTTCCG 3’) Primers unsuccessfully used to try to amplify RAG2 paralog α2 in X. amieti, X. ruwenzoriensis, X. cf. boumbaensis, and X. longipes: RAG2FOR.1 (5’ ACCTACACAGTTGCTGTGATG 3’) RAG2.FOR.109 (5’ AAACGTTCCTGCCCCACTG 3’) RAG2FOR.817 (5’ GATGGACTTTCCTTCCATGTTTC 3’) RAG2.REV.930 (5’ CCCATATCAGCACCAAACC 3’) RAG2REV.1039 (5’ TGGCTATCAGACTCGTATCC 3’) RAG2.REV.1126 (5’ TGTAAACTCTTCAGAGTCTTC 3’)

Page 38: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

X. wittei

X. amieti

X. new octoploid

X. longipes

X. cf. boumbaensis

X. cf. fraseri 2

bp0 500 1000 1500 2000 2500 3000

X. muelleri west

X. gilli

X. laevis

X. borealis

X. muelleri

X. cf. fraseri 1

αβX. largeni, X. cliviiαβ

αβ

αβ

αβ

αβ

αβ

αβ

α1

β1α2

β2

X. pygmaeus αβ

X. vestitusα1

β1α2

β2

X. andreiα1

β1α2

β2

α1

β1α2

β2

α1

β1α2

β2

α1

α3α2

β1β2β3

X. ruwenzoriensisα1

α3α2

β1β2β3

α1

α3α2

β1β2β3

αβS. new tetraploid 1

αβS. epitropicalis

S. tropicalis

αβS. new tetraploid 2

core region

Evans, Suppl. Fig. 2

SS

S

S

S

S S

S

S*

S S

SS S S

S

S SS

S

FF*F

F

F

FF

FFF

F

F F F F

FFF FSF F

FF F

F

F

S

S

F

A

bp0 500 1000 1500

B

X. boumbaensisα1α2β1β2

F F

Page 39: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

S. hurteriR. dorsalis

0.1

H. sp.

S. tropicalisS. new tetraploid 2 αS. epitropicalis α

S. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β

X. cf. boumbaensis α10.05

X. ruwenzoriensis α1

X. amieti α1X. longipes α1

X. cf. fraseri 1 αX. cf. fraseri 2 α

X. cf. boumbaensis α2

X. pygmaeus αX. ruwenzoriensis α3

X. andrei α1X. new octoploid α1X. wittei α1

X.longipes α3

X. largeni α

X. gilli αX. laevis α

X. amiet α2X. ruwenzoriensis α2X. cf. boumbaensis α3

X. longipes α2X. andrei α2X. vestitus α2

X. vestitus α1X. wittei α2

X. new octoploid α2

X. clivii α

X. borealis αX. muelleri α

X. new tetraploid α

X. longipes β1X. amieti β1

X. cf. boumbaensis β1X. ruwenzoriensis β1

X. andrei β1

X. longipes β3

X. new octoploid β1X. wittei β1

X. ruwenzoriensis β3X. pygmaeus β

X. cf. fraseri 2 βX. cf. fraseri 1 β

X. laevis βX. gilli β

X. clivii β

X. borealis βX. muelleri β

X. largeni β

X. new octoploid β2X. wittei β2

X. vestitus β2

X. vestitus β1

X. new tetraploid β

0.05

S.couchiR. dorsalis

H. sp.

S. tropicalisS. new tetraploid 2 αS. epitropicalis αS. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β

X. cf. boumbaensis α1

X. ruwenzoriensis α1

X. amieti α1X. longipes α1

X. cf. fraseri 1 αX. cf. fraseri 2 αX. cf. boumbaensis α2

X. pygmaeus αX. ruwenzoriensis α3

X. andrei α1

X. new octoploid α1X. wittei α1

X.longipes α3X. largeni α

X. gilli αX. laevis α

X. amiet α2X. ruwenzoriensis α2X. cf. boumbaensis α3

X. longipes α2X. andrei α2X. vestitus α2

X. vestitus α1X. wittei α2X. new octoploid α2

X. clivii α

X. borealis αX. muelleri α

X. new tetraploid α

X. longipes β1X. amieti β1

X. cf. boumbaensis β1

X. ruwenzoriensis β1

X. andrei β1X. longipes β3

X. new octoploid β1X. wittei β1

X. ruwenzoriensis β3X. pygmaeus β

X. cf. fraseri 2 βX. cf. fraseri 1 β

X. laevis βX. gilli β

X. clivii β

X. borealis βX. muelleri β

X. largeni β

X. new octoploid β2X. wittei β2

X. vestitus β2

X. vestitus β1

X. new tetraploid β

X. andrei β2

X. cf. boumbaensis β2

S. tropicalis

X. ruwenzoriensis 1

X. amieti 1

X. longipes 1

X. cf. boumbaensis 1

X. pygmaeusX. ruwenzoriensis 3

X. cf. boumbaensis 2

X. wittei 1X. new octoploid 1

X. longipes 2

X. amieti 2

X. cf. boumbaensis 3X. ruwenzoriensis 2

X. andrei 2

X. longipes 3

X. new octoploid 2X. wittei 2

X. cf. fraseri 2 X. cf. fraseri 1

X. laevis

X. borealis

X. muelleri

X. largeni

X. vestitus 2

X. vestitus 1

X. new tetraploid

X. andrei 1

X. clivii

X. gilli

*

*

0.02

0.1

X. longipes β1X. amieti β1

X. cf. boumbaensis β1X. ruwenzoriensis β1

X. andrei β1X. longipes β3

X. wittei β1

X. ruwenzoriensis β3X. pygmaeus βX. cf. fraseri 2 β

X. cf. fraseri 1 β

X. laevis βX. gilli β

X. clivii β

X. borealis βX. muelleri β

X. largeni β

X. wittei β2

X. vestitus β2

X. vestitus β1

X. new tetraploid β

X. cf. boumbaensis β2

X. andrei β2

S. couchiR. dorsalis

S. tropicalisS. new tetraploid 2 αS. epitropicalis αS. new tetraploid 1 αS. new tetraploid 2 βS. epitropicalis βS. new tetraploid 1 β

A B

C D

Evans, Suppl. Fig. 3

X. boumbaensis 2

X. boumbaensis 1

**

**

**

**

**

****

**

* **

*

**

****

**

** **

**

**

**

H. sp.

X. new octoploid β2

X. new octoploid β1

X. boumbaensis β1

**

**

**

**

***

**

**

**

**

**

****

**

**

**

***

***

*

0.1X. boumbaensis α1

X. boumbaensis α2

X. boumbaensis β1

*****

****

**

***

**

**

**

**

**

****

****

***

**

*

**

**

**

****

***

*

**

***

**

** ****

**

**

**

**

** *

***

****

***

****

**

0.05

X. boumbaensis α1

X. boumbaensis β1

X. boumbaensis α3

**

**

**

**

**

***

**

*** **

**

**

**

**

**

**

**

****

***

****

****

**

**

*

**

****

********

*

**

**

**

**

**

*

*

**

*

**

**

Page 40: Genetics: Published Articles Ahead of Print, published on ...90 (WANG et al. 2004). Expression can be influenced by the direction of the hybrid cross (SOLTIS et al. 91 2004), and co-adapted

OA BA

CB

OB

C

OB

CA

A

OA

DB

OB

D

OA

DB

A

Evans, Suppl. Fig. 4