molecular data from orthonectid worms show they are highly ... · 1 2 title: molecular data from...

24
1 Title: Molecular data from Orthonectid worms show they 2 are highly degenerate members of phylum Annelida not 3 phylum Mesozoa. 4 Authors: Philipp H. Schiffer 1 , Helen E. Robertson 1 , Maximilian J. Telford 1 * 5 6 Affiliations: 1 Centre for Life’s Origins and Evolution, Department of Genetics, Evolution 7 and Environment, University College London, Gower Street, London, WC1E 6BT, UK 8 *Correspondence to: [email protected] 9 10 11 Summary: 12 The Mesozoa are a group of tiny, extremely simple, vermiform endoparasites of various 13 marine animals (Fig. 1). There are two recognised groups within the Mesozoa: the 14 Orthonectida (Fig. 1a,b; with a few hundred cells including a nervous system made up of just 15 10 cells [1]) and the Dicyemids (Fig. 1c; with at most 42 cells [2]). They are 16 classic ’Problematica’ [3] - the name Mesozoa suggests an evolutionary position 17 intermediate between Protozoa and Metazoa (animals) [4] and implies their simplicity is a 18 primitive state, but molecular data have shown they are members of Lophotrochozoa within 19 Bilateria [5-8] which would mean they derive from a more complex ancestor. Their precise 20 phylogenetic affinities remain uncertain, however, and ascertaining this is complicated by the 21 very fast evolution observed in genes from both groups, leading to the common systematic 22 error of Long Branch Attraction (LBA) [9]. Here we use mitochondrial and nuclear gene 23 author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the . https://doi.org/10.1101/235549 doi: bioRxiv preprint

Upload: others

Post on 12-Mar-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

1 Title: Molecular data from Orthonectid worms show they 2

are highly degenerate members of phylum Annelida not 3

phylum Mesozoa. 4

Authors: Philipp H. Schiffer1, Helen E. Robertson1, Maximilian J. Telford1* 5

6

Affiliations: 1 Centre for Life’s Origins and Evolution, Department of Genetics, Evolution 7

and Environment, University College London, Gower Street, London, WC1E 6BT, UK 8

*Correspondence to: [email protected] 9

10

11

Summary: 12

The Mesozoa are a group of tiny, extremely simple, vermiform endoparasites of various 13

marine animals (Fig. 1). There are two recognised groups within the Mesozoa: the 14

Orthonectida (Fig. 1a,b; with a few hundred cells including a nervous system made up of just 15

10 cells [1]) and the Dicyemids (Fig. 1c; with at most 42 cells [2]). They are 16

classic ’Problematica’ [3] - the name Mesozoa suggests an evolutionary position 17

intermediate between Protozoa and Metazoa (animals) [4] and implies their simplicity is a 18

primitive state, but molecular data have shown they are members of Lophotrochozoa within 19

Bilateria [5-8] which would mean they derive from a more complex ancestor. Their precise 20

phylogenetic affinities remain uncertain, however, and ascertaining this is complicated by the 21

very fast evolution observed in genes from both groups, leading to the common systematic 22

error of Long Branch Attraction (LBA) [9]. Here we use mitochondrial and nuclear gene 23

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 2: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

sequence data, and show beyond doubt that both dicyemids and orthonectids are members of 24

the Lophotrochozoa. Carefully addressing the effects of systematic errors due to unequal 25

rates of evolution, we show that the phylum Mesozoa is polyphyletic. While the precise 26

position of dicyemids remains unresolved within Lophotrochozoa, we unequivocally identify 27

orthonectids as members of the phylum Annelida. This result reveals one of the most extreme 28

cases of body plan simplification in the animal kingdom; our finding makes sense of an 29

annelid-like cuticle in orthonectids [1] and suggests the circular muscle cells repeated along 30

their body [10] may be segmental in origin. 31

32

Results 33

Using a new assembly of available genomic and transcriptomic sequence data we identified an 34

almost complete mitochondrial genome from Intoshia linei (2 ribosomal RNAs, 20 transfer 35

RNAs and all protein coding genes apart from atp8) and recovered 9 individual mitochondrial 36

gene containing contigs from Dicyema japonicum and from a second unidentified species 37

(Dicyema sp.; cox1, 2, 3; cob; and nad1, 2, 3, 4, 5). Cob, nad3, nad4, and nad5 had not 38

previously been identified in any Dicyma species. All protostomes studied possess a unique, 39

derived combination of amino acid signatures and conserved deletions in their mitochondrial 40

NAD5 genes. Comparing the NAD5 protein coding regions of Intoshia and Dicyema to those 41

of other Metazoa shows that both share almost all of the conserved protostome signatures [11] 42

(Fig. 2a). This signature is significantly more complex than the two amino acids of the 43

Lox5/DoxC signature from Dicyema previously published [11-13] and shows beyond doubt 44

that both groups are protostomes. 45

It has been suggested that mesozoans are derived from the parasitic neodermatan flatworms. If 46

this were correct mesozoans would be expected to share two changes in mitochondrial genetic 47

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 3: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

code that unite all rhabditophoran flatworms, where the triplet AAA codes for Asparagine (N) 48

rather than the normal Lysine (K) and ATA codes for Isoleucine (I) rather than the usual 49

Methionine (M) [14]. We inferred the mitochondrial genetic codes for Dicyema and Intoshia. 50

Both groups have the standard invertebrate mitochondrial code arguing against a relationship 51

with the parasitic rhabditophoran platyhelminths (table S1). 52

We next aligned the mitochondrial genes of Intoshia and three species of Dicyema with 53

orthologs from a diversity of other Metazoans and concatenated these to produce a matrix of 54

2,969 reliably aligned amino acids from 69 species. Phylogenetic analyses of this 55

comparatively small data set is not expected to be as reliable as a much larger set of nuclear 56

genes and aspects of the topology and observed branch lengths suggest it was affected by LBA 57

(Fig. 2b). To reduce the effects of LBA on the inference of the affinities of the mesozoans we 58

removed the taxa with the longest branches and considered the position of the dicyemids and 59

orthonectid separately (as both are very long branched). We were unable to resolve the position 60

of the dicyemids (although they are clearly lophotrochozoans), but found some support for 61

placing the orthonectid Intoshia linei with the annelids (Fig. 2c and figures S1, 2). Intoshia 62

linei has a unique mitochondrial gene order although the order of the genes nad1, nad6, and 63

cob match that seen in the Lophotrochozoan ground plan and the early branching annelid 64

Owenia (Fig. 2d). 65

We next assembled a data set of 469 orthologous genes, 227,187 reliably aligned amino acids, 66

from 45 species of animals including Intoshia linei and two species of Dicyema. After 67

removing positions in the concatenated alignment with less than 50% occupancy we had an 68

alignment length of 190,027 amino acids and average completeness of ~68%. Intoshia linei 69

was 65% complete, while Dicyema japonicum and Dicyema sp. were 77% and 43% complete 70

respectively (table S2). We conducted a bayesian phylogenetic analyses of these data with the 71

site heterogeneous CAT+G4 model in Phylobayes [15]. To provide an additional, conservative 72

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 4: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

estimate of clade support and to enable further analyses in a practical time frame, we also used 73

jackknife subsampling. For each jackknife analysis we took 50 random subsamples of 30,000 74

amino acids each and ran 2,000 cycles (phylobayes CAT+G4) per sample. All 50 subsamples 75

were summarised into a single tree with the first 1800 trees from each excluded as ‘burnin’ 76

[16]. 77

We observed strong support for a clade of Lophotrochozoa (excluding Rotifers) including both 78

dicyemids and the orthonectid (Bayesian Posterior Probability (PP) = 1.0; Jackknife Proportion 79

(JP) = 0.97) (Fig. 3a). The dicyemids and orthonectids were not each other’s closest relatives; 80

the position of the dicyemids within the Lophotrochozoa was not resolved; they were not the 81

sister group of the platyhelminths nor of the gastrotrichs in our analysis. The position of the 82

orthonectid Intoshia, in contrast, was resolved as being within the clade of annelids (Fig. 2a PP 83

= 0.97; JP = 0.74). 84

We next asked whether there was any effect from long branched dicyemids on the strength of 85

support for inclusion of Intoshia within the Annelida - Intoshia also being a long-branched 86

taxon. Repeating our jackknife analyses with dicyemids excluded increased the support for 87

Intoshia as an annelid from JP = 0.74 to JP = 0.86 (Fig. 3b) showing that when the expected 88

LBA between Dicyema spp and Intoshia is prevented, there is stronger support for including 89

the orthonectid in Annelida. An equivalent analysis omitting Intoshia did not help to resolve 90

the position of dicyemids (figure S3). 91

To test further the support for Intoshia being a member of Annelida, we reasoned that an 92

analysis restricted to genes showing the strongest signal supporting monophyletic Annelida 93

should give stronger support to Intoshia within Annelida but only if it is indeed a member of 94

the clade; if not, support should decrease when using this subset of genes. We first removed all 95

mesozoan sequences from each individual gene alignment and reconstructed a tree for each 96

gene. We ranked these gene trees according to the proportion of all annelids present in a given 97

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 5: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

gene data set that were observed united in a clade. We concatenated the genes (now including 98

mesozoans) from strongest supporters of monophyletic Annelida to weakest. We repeated our 99

jackknife analyses using the best quarter of genes. An analysis of the genes that most strongly 100

support monophyletic Annelida results in an increase support for inclusion of Intoshia within 101

Annelida from JP = 0.74 to JP = 0.94 (Fig. 3c). 102

Our results suggest that recent findings of a close relationship between Intoshia and Dicyema 103

and the linking of both these taxa to rapidly evolving gastrotrichs and platyhelminths [7,8] is 104

due to long branch attraction. To test this prediction we exaggerated the expected effects of 105

LBA on our own data set by using less well fitting models. We first conducted cross validation 106

comparing the site heterogeneous CAT+G4 model we have used to the site homogenous 107

LG+G4 and show that LG+G4 is a significantly less good fit to our data (CAT+G4 is better 108

than LG+G4: ΔlnL = 9787 +/- 249.265). We used the less well fitting LG+G4 model to 109

reanalyse the jackknife replicates of a data set including our four most complete annelids. We 110

observed a topology clearly influenced by LBA in which long branched taxa including 111

flatworms, annelids, rotifers and nematodes were grouped. We also observed within this ‘LBA 112

assemblage’ the two longest branched clades, dicyemids and the orthonectid as each other’s 113

closest relatives. As a further test we reanalysed the published data set [8] which had linked 114

orthonectid and dicyemid with platyhelminths and gastrotrichs. When we removed the most 115

obvious source of LBA - the long branched dicyemid - we found that the orthonectid Intoshia 116

was, as expected, found not with platyhelminths or gastrotrichs but with the two annelids 117

present in this data set, again providing evidence of the effects of long branch attraction (Fig 118

4). 119

Discussion 120

We have analysed the first, almost complete mitochondrial genome sequence of an orthonectid 121

mesozoan and added to the known mitochondrial genes of Dicyemida to provide two powerful 122

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 6: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

rare genomic changes. Our analyses of mitochondrial NAD5 gene sequences show 123

unequivocally that both Dicyemida and Orthonectida are members of the protostomes and the 124

absence of rhabditophoran flatworm mitochondrial genetic code changes rejects existing ideas 125

that either group might be derived from parasitic flatworms. Both groups show unusually high 126

rates of evolution and this required steps to test for and avoid the possible effects of long branch 127

attraction, not least between the orthonectids and dicyemids. 128

Our mitochondrial data set and our large, taxonomically broad set of nuclear genes with a low 129

percentage of missing data, analysed with well fitting, site heterogeneous models of sequence 130

evolution, do not support the close relationship between orthonectids and dicyemids. 131

Orthonectids are annelids and not members of the Mesozoa and the phylum Mesozoa sensu 132

lato is an unnatural polyphyletic assemblage. We were unable to place the dicyemids more 133

precisely and they may be considered a phylum in their own right. Experiments manipulating 134

the expected effects of LBA strongly suggest previous phylogenies were affected by this 135

important source of systematic error. Finding the orthonectids and dicyemids not closely 136

associated demonstrates a remarkable instance of convergent evolution in two unrelated, 137

miniaturised parasites. 138

The finding that the orthonectid Intoshia is a member of the Annelida shows that it has evolved 139

its extraordinary simplicity by drastic simplification from a much more complex annelid 140

common ancestor. Our phylogenetic analyses could not more precisely place Intoshia within 141

the annelids, however, a short stretch of mitochondrial genes (nad1, nad6, cob) that are found 142

in the same order as in the lophotrochozoan ancestor and in the early branching annelid Owenia 143

fusiformis but not in the pleistoannelid ground plan argues for a position outside of the 144

Pleistoannelida [17] (Fig 2d). Possible evidence of an ancestral segmented body plan is still 145

apparent in the series of circular muscles regularly spaced along the antero-posterior axis of 146

Intoshia (Fig 1b), along with similarly repeated bands of cilia (Fig 1 and ref [18]). Further 147

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 7: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

analysis of the genome, embryology and morphology of Intoshia or other orthonectids are 148

predicted to show additional clues as to their cryptic annelidan ancestry. 149

Methods 150 151

Genome and transcriptome assemblies. 152

We downloaded genomic (Intoshia linei: SRR4418796, SRR4418797) and transcriptomic 153

(Dicyema sp.: SRR827581; Dicyema japonicum: DRR057371) data from the NCBI Short 154

Read Archives and DDBJ, and used Trimmomatic [19] to clean residual adapter sequences 155

from the sequencing reads and to remove low quality bases. We used the clc assembly cell 156

(clcBIO/Qiagen; v.5.0) to re-assemble the I. linei genome and the Trinity pipeline [20] 157

(v.2.3.2) to assemble the Dicyema. sp. and D. japonicum transcriptomes using default 158

settings. We additionally assembled transcriptomes for Phascolopsis gouldii, 159

Spiochaetopterus sp., Arenicola marina, Sabella pavonina, Magelona pitelkai, 160

Pharyngocirrus tridentiger and Bonellia viridis from SRA datasets (SRR1654498, 161

SRR1224605, SRR2005653, SRR2005708, SRR2015609, SRR2016714, SRR2017645) 162

using the same approach. 163

164

Identifying mitochondrial genome fragments. 165

Using mitochondrial gene protein coding sequences from flatworms as queries [21] we used 166

tblastn [22] and blastp to search for Dicyema sp. and D. japonicum mitochondrial fragments 167

in the Trinity RNA-Seq assemblies, and screened the I. linei genome re-assembly in a similar 168

way. Positively identified ORFs were then blasted against NCBI nr to detect possible 169

contamination from host species in the RNA-Seq data. For each Dicyema sp. gene-bearing 170

contig, we also found additional contigs which had strongly matching blast hits to Octopus or 171

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 8: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

other cephalopods (or in some cases to the gastropod mollusc Aplysia) and we discarded 172

these as likely contaminations. 173

174

Annotating mitochondrial genomes. 175

Using blast we identified a 14.2kb mitochondrial contig in the assembled I. linei genome, 176

which we annotated using MITOS [23]. The location of protein-coding genes were manually 177

verified from MITOS prediction, and inferred to start from the first in-frame start codon 178

(ATN, GTG, TTG, or GTT). The C-terminal of the protein-coding genes was inferred to be 179

the first in-frame stop codon (TAA, TAG or TGA). We aligned the Intoshia and Dicyema 180

NAD5 genes with those from 5 protostomes, 4 deuterostomes, and 2 non-bilaterian species in 181

the Geneious software to visualise Protostome specific signatures in the sequence. 182

183

Mitochondrial Phylogenetics 184

We grouped the mesozoan mitochondrial protein coding genes with their orthologs from 65 185

other species selected to cover the diversity of the Metazoa including diploblasts, 186

deuterostomes and ecdysozoans but with an emphasis on the diversity of Lophotrochozoa. 187

We aligned each set of orthologs using Muscle [24] v3.8.31 using default parameters and 188

trimmed these alignments to exclude unreliably aligned positions using TrimAl [25] (version 189

1.2 rev 59 using default settings). Finally, we concatenated the trimmed alignments of all 190

genes into a supermatrix of 2969 positions. We inferred a phylogeny with phylobayes (4.1b) 191

under the CAT+G4 model. We ran 10 independent chains for 10,000 cycles each. We 192

summarised all ten chains (bpcomp) discarding the first 8,000 trees from each as burnin. We 193

reconstructed additional mitochondrial phylogenies omitting (i) the long branching flatworm 194

species, (ii) all long branch taxa and also Intoshia, and (iii) long branch taxa and the Dicyema 195

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 9: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

species. Here and elsewhere we visualised and edited phylogenetic trees with FigTree 196

(v1.4.3; http://tree.bio.ed.ac.uk/software/figtree/). 197

198

Nuclear gene orthology determination 199

We chose to add the mesozoan data to sets of orthologous genes that were previously 200

successfully used to infer lophotrochozoan phylogeny [26,27]. We first used Orthofinder [28] 201

(v.1.0.8) to calculate orthologous relationships between the genes predicted for I. linei in the 202

recent genome paper [8] and our Dicyema sp. gene predictions. To ensure robustness of the 203

analysis we included several outgroup species (Supplementary table 2) In particular, as we 204

were concerned about potential contamination by the hosts of the parasitic Dicyema we 205

included the Octopus bimaculoides proteome. Since the published phylogenomic studies 206

included few annelid species we added our own Trinity assemblies of several additional 207

species (see above). We then extracted all orthologous groups containing the Octopus and the 208

two mesozoan taxa from the Orthofinder output and inserted these sequences into the original 209

alignments. This resulted in 590 orthologous groups. With the aid of OMA [29] and custom 210

Perl scripts we filtered these groups to contain single copy orthologs of all species. We re-211

aligned each set of orthologs using clustal-omega [30]; we removed unreliably aligned 212

positions from each alignment using TrimAl; finally we constructed individual gene trees 213

from these trimmed alignments using phyml [31] (v20160207). Using Python code and the 214

ETE3 toolkit we checked each tree for instances where sequences from Octopus and Dicyema 215

sp. were each other’s closest relatives (suggesting the sequence is an Octopus contaminant) 216

and removed the 5 alignments where the trees had this topology from our set. We 217

concatenated all single trimmed alignments of 45 taxa into a supermatrix of 227,646 218

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 10: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

positions. We used a custom script to eliminate all positions in the alignment with less than 219

50% occupancy. 220

221

Nuclear Gene Phylogenomic analyses 222

Using the mpi version of phylobayes (in v.1.7) run over four independent chains for 5000 223

cycles and discarding the first 4500 trees as burnin we reconstructed a phylogeny using this 224

alignment under both the CAT+G4 model of molecular evolution. To provide a conservative 225

measure of clade support and to test different data samples in a reasonable time we also 226

reconstructed trees using 50 jackknife sub-samples of 30,000 positions each from the 227

supermatrix. We used phylobayes 4.1c with the aid of the gnu-parallel command line tool 228

[32] and the UCL HPC cluster. We used the CAT+G4 model, and also compared results 229

from LG+G4. We ran phylobayes for 2000 cycles per jackknife sample which consistently 230

resulted in a plateauing of the likelihood score. We summarised all 50 of these phylobayes 231

analyses per model (using bpcomp) discarding the first 1800 sampled trees per jackknife as 232

burnin. We also tested the effect of different species compositions in our dataset by 233

performing phylobayes jackknife sampling with different subsets of taxa. 234

235

Cross validation 236

We compared the fit of CAT+G4 and LG+G4 models to our data using cross validation as 237

described in the phylobayes user manual. We ran 10 replicates and for each replicate we used 238

a randomly selected 30,000 positions of the data as a training set and 10,000 randomly 239

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 11: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

selected positions as the test set. Log likelihood scores were averaged over the ten replicates 240

using the sumcv command. 241

242

Ranking genes according to support for monophyletic Annelida. 243

We first removed all Intoshia and Dicyema sequences from each individual gene alignment. 244

For each individual gene, we reconstructed a tree from the aligned protein coding sequences 245

using Ninja [33]. Each tree was parsed using a custom script to find the proportion of 246

annelids in the data set present in the largest clade of annelids found. The tree was given a 247

score which was calculated as the number of annelids in the largest clade/total number of 248

annelids on the tree. Trees with larger monophyletic annelid clades scored highest. The 249

genes were then concatenated in order of their score. We took the first 25% of positions from 250

this concatenation (those genes with the strongest signal supporting monophyletic annelids) 251

and analysed jackknife replicates as before. 252

253 Acknowledgements: 254

255 The authors are grateful to Prof George Slyusarev (Saint Petersburg State University, Russia) 256

for providing images of Intoshia linei and for sharing an unpublished book chapter and to Dr 257

Tsai-Ming Lu and colleagues (Okinawa Institute of Science and Technology, Japan) for 258

sharing data. The authors also thank Fraser Simpson (UCL) for help in editing Figure 1b. The 259

research was funded by ERC grant (ERC-2012-AdG 322790) to MJT. Alignments have been 260

deposited with Zenodo (XXX) and phylogenetic trees are available through treebase.org 261

(XXX). 262

263

264

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 12: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

Author Contributions 265

Conceived the study: MJT. Planned the study: MJT and PHS. Assembled the data sets: PHS. 266

Analysed the data: PHS and MJT. Drafted the manuscript: MJT and PHS. Analysed 267

mitochondrial data: HER. 268

269

Declaration of Interests: 270

The authors declare no competing financial interests. 271 272 273 Figure Legends 274 275 Fig. 1: The mesozoans Intoshia variabili and Dicyema typus 276 277 A. Differential Interference contrast micrograph of an Intoshia variabili female showing 278 repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University, 279 Russia). 280 B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals repeated 281 set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.). 282 C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman 283 L.H. The Invertebrates: Protozoa through Ctenophora McGraw-Hill, New York 1940(19). 284 Anterior to right in all images. 285 286 Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on 287 mitochondrial gene sequences. 288 289 A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes, 290 and outgroups, highlighting derived substitutions and amino acid deletions shared by the 291 orthonectids, dicyemids, and other protostomes. 292 B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and 293 dicyemids inside Lophotrochozoa, but the unlikely assemblage of Intoshia linei and 294 flatworms with annelids suggest this is affected by systematic error. 295 C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema 296 gives some support for a position of Intoshia within Annelida.D. Order of the Intoshia nad1, 297 nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia 298 fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]). 299 300

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 13: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

Fig. 3: Analyses of the phylogenetic positions of Dicyema and Intoshia based on nuclear 301 gene sequences. 302

A. A bayesian phylogeny reconstructed from 190,027 aligned amino acid positions analysed 303 under the CAT+G4 model. Support values are from bayesian posterior probabilities (PP) and 304 from 50 jackknifed sub-samples of 30,000 residues (JP support values in brackets). Both 305 analyses reveal Mesozoa to be polyphyletic and place Intoshia linei in Annelida (see Supp 306 Fig 4a for support values). 307

B. A repeat of the jackknife analysis omitting the long-branching Dicyema species eliminates 308 the potential for LBA between Intoshia and Dicyema. This leads to an increase in the support 309 for a position of Intoshia within Annelida from JP 0.74 to JP 0.86 JP. (Only lophotrochozoan 310 part of the tree shown, see Supplementary Fig 4c for full tree). 311

C. Bayesian jackknife using CAT+G4 model using the best quarter of genes supporting 312 monophyletic annelids leads to increased support for Intoshia within Annelida to JP 0.94 313 even with the inclusion of the Dicyema species. (Only lophotrochozoan part of the tree 314 shown, see Supplementary Fig 4d for full tree). 315

316

Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans 317 supports annelid affinity for Intoshia. 318

Repeating the analyses on a previously published data set [8] excluding the long branching 319 Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA 320 on the original analysis. Support values are bayesian posterior probabilities (PP). 321

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 14: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

A B

C

Fig. 1: The mesozoans Intoshia variabili and Dicyema typus A. Differential Interference contrast micrograph of an Intoshia variabili female showing repeated bands of ciliated cells. Picture G. Slyusarev (St Petersburg State University, Russia). B. Confocal image of a phalloidin stained female specimen of Intoshia linei reveals repeated set of circular muscles. Picture G. Slyusarev (St Petersburg State Univ.). C. Rhombogen stage of a dicyemid (Dicyema typus from the Octopus) adapted from Hyman L.H. The Invertebrates: Protozoa through Ctenophora McGraw-Hill, New York 1940(19). Anterior to right in all images.

author/funder. All rights reserved. N

o reuse allowed w

ithout permission.

The copyright holder for this preprint (w

hich was not peer-review

ed) is the.

https://doi.org/10.1101/235549doi:

bioRxiv preprint

Page 15: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

A

B C

0.1

Nematostella_sp

Aplysia_californica

Doliolum_nationalisBranchiostoma_floridae

Acropora_tenuis

Siphonodentalium_lobatum

Lampetra_fluviatilis

Asymmetron_inferum

Loligo_bleekeri

Fasciola_hepaticaTaenia_crassiceps

Sepioteuthis_lessoniana

Tribolium_castaneum

Biomphalaria_tenagophila

Sepia_officinalis

Locusta_migratoria

Decipisagitta_decipiens

Limulus_polyphemus

Schistosoma_mansoni

Xenoturbella_bocki

Sepia_esculenta

Bugula_neritina

Terebratulina_retusa

Pupa_strigosa

Microstomum_lineare

Echinococcus_multilocularis

Marphysa_sanguinea

Orbinia_latreillii

Sagitta_inflata

Dicyema_japonicum

Saccoglossus_kowalevskii

Strongyloides_stercoralis

Escarpia_spicata

Urechis_caupo

Trichinella_spiralis

Lineus_alborostratusNautilus_macromphalus

Steinernema_carpocapsae

Mytilus_trossulus

Spadella_cephaloptera

Thais_clavigera

Sipunculus_nudus

Echinococcus_shiquicus

Intoshia_linei

Ornithorhynchus_anatinus

Priapulus_caudatus

Perionyx_excavatus

Strongylocentrotus_pallidus

Myxine_glutinosaSalmo_salar

Amynthas_jiriensis

Homo_sapiens

Balanoglossus_carnosus

Aurelia_aurita

Octopus_vulgaris

Trichoplax_adhaerens

Octopus_ocellatus

Laqueus_rubellus

Ciona_intestinalis

Dosidicus_gigas

Myzostoma_seymourcollegiorum

Clymenella_torquata

Flustrellidra_hispida

Dicyema_spDicyema_misakiense

Schmidtea_mediterranea

Lumbricus_terrestris

Daphnia_pulex

Platynereis_dumerilii

0.331

0.57

0.41

0.3

0.68

0.87

1

1

0.980.96

0.62

0.1

Aurelia_auritaStrongylocentrotus_pallidus

Dosidicus_gigas

Flustrellidra_hispidaBugula_neritina

Limulus_polyphemus

Biomphalaria_tenagophila

Asymmetron_inferumTribolium_castaneum

Decipisagitta_decipiens

Octopus_vulgaris

Branchiostoma_floridae

Xenoturbella_bocki

Priapulus_caudatus

Thais_clavigera

Amynthas_jiriensis

Ornithorhynchus_anatinus

Orbinia_latreillii

Daphnia_pulexLocusta_migratoria

Doliolum_nationalis

Octopus_ocellatus

Salmo_salar

Sagitta_inflata

Saccoglossus_kowalevskii

Mytilus_trossulus

Myxine_glutinosa

Homo_sapiens

Myzostoma_seymourcollegiorum

Ciona_intestinalis

Intoshia_linei

Trichoplax_adhaerens

Loligo_bleekeri

Terebratulina_retusa

Spadella_cephaloptera

Urechis_caupo

Lampetra_fluviatilis

Aplysia_californica

Siphonodentalium_lobatum

Sepia_esculenta

Laqueus_rubellus

Platynereis_dumerilii

Sepioteuthis_lessoniana

Nautilus_macromphalus

Nematostella_sp

Escarpia_spicataPerionyx_excavatus

Marphysa_sanguinea

Balanoglossus_carnosus

Acropora_tenuis

Clymenella_torquata

Sipunculus_nudus

Lineus_alborostratus

Pupa_strigosa

Sepia_officinalis

Lumbricus_terrestris

0.80.69

0.39

0.94

0.92

1

0.83

1

0.741

Annelida

DicyemaIntoshiaOctopusPhonorisTaeniaBrugiaDrosophilaXenopusPongoXenoturbellaHydraParamecium

180 190 260 270 280 290

Fig. 2: Analyses of the phylogenetic positions of Dicyema and Intoshia based on mitochondrial gene sequences. A. Alignment of the mitochondrial NAD5 gene from selected protostomes, deuterostomes, and outgroups, highlighting derived substitutions and amino

acid deletions shared by the orthonectids, dicyemids, and other protostomes. B. A mitochondrial bayesian phylogeny based on 2969 positions places orthonectids and dicyemids inside Lophotrochozoa, but the unlikely assemblage

of Intoshia linei and flatworms with annelids suggest this is affected by systematic error. C. Mitochondrial bayesian phylogeny omitting the long branching taxa including Dicyema gives some support for a position of Intoshia within Annelida. D. Order of the Intoshia nad1, nad6, and cob mitochondrial genes in comparison to the early branching annelid Owenia fusiformis, the pleistoannelid ground plan and the lophotrochozoan ground plan (see ref [17]).

-nad4 -cox1 nad1 nad6 cob nad3 rrnL

nad5 nad2 nad1 nad6 cob nad4L nad4

cox2 atp8 cox3 nad6 cob atp6 nad5

rrnS rrnL nad1 nad6 cob nad4L nad4D

Pleistoannelida

Intoshia linei

Owenia fusiformis

Lophotrochozoa

author/funder. All rights reserved. N

o reuse allowed w

ithout permission.

The copyright holder for this preprint (w

hich was not peer-review

ed) is the.

https://doi.org/10.1101/235549doi:

bioRxiv preprint

Page 16: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

A

AnnelidaB

0.5

Maritigrella_crozieri

Macrodasys_sp

Cerebratulus_sp

Macrostomum_lignano

Tubulanus_polymorphus

Sabella_pavonina

Helobdella_robustaIntoshia_linei

Cephalothrix_linearis

Pharyngocirrus_tridentiger

Lymnea_stagnalis

Lepidodermella_squamata

Octopus_bimaculoides

Golfingia_vulgaris

Adineta_ricciaeMesodasys_laticaudatus

Neomenia_megatrapezata

Schistosoma_mansoni

Bonellia_viridisAlvinella_pompejana

Macracanthorhynchus_hirudinaceus

Dicyema_japonicum

Lottia_gigantea

Magelona_pitelkai

Monocelis_sp

Stenostomum_sp

Capitella_tellata

Dicyema_sp

Dendrocoelum_lacteum

C

0.5

Lepidodermella_squamata

Helobdella_robusta

Magelona_pitelkai

Intoshia_linei

Tubulanus_polymorphus

Stenostomum_sp

Macracanthorhynchus_hirudinaceus

Alvinella_pompejana

Macrodasys_sp

Monocelis_sp

Sabella_pavonina

Adineta_ricciae

Bonellia_viridis

Macrostomum_lignano

Lottia_gigantea

Octopus_bimaculoides

Golfingia_vulgaris

Lymnea_stagnalis

Dendrocoelum_lacteum

Neomenia_megatrapezata

Cephalothrix_linearis

Capitella_tellata

Schistosoma_mansoni

Pharyngocirrus_tridentiger

Maritigrella_crozieri

Cerebratulus_sp

Mesodasys_laticaudatus

0.86 0.94

Annelida0.5 0.5

0.1

Pediculus_humanus

Lymnea_stagnalis

Macrostomum_lignano

Monocelis_sp

Alvinella_pompejana

Tribolium_castaneum

Patiria_miniata

Macracanthorhynchus_hirudinaceus

Tubulanus_polymorphus

Schistosoma_mansoni

Helobdella_robusta

Adineta_ricciae

Strongylocentrotus_purpuratus

Dendrocoelum_lacteum

Homo_sapiens

Hydra_vulgaris

Trichinella_spiralis

Rhabdopleura_sp

Nematostella_vectensis

Capitella_tellata

Priapulus_caudatus

Neomenia_megatrapezata

Maritigrella_crozieri

Magelona_pitelkai

Cephalothrix_linearis

Sycon_coactum

Lepidodermella_squamata

Trichoplax_adhaerens

Dicyema_sp

Amphimedon_queenslandica

Stenostomum_sp

Bonellia_viridis

Macrodasys_sp

Lottia_gigantea

Intoshia_linei

Saccoglossus_kowalevskii

Branchiostoma_floridae

Cerebratulus_sp

Pharyngocirrus_tridentiger

Daphnia_pulex

Mesodasys_laticaudatus

Golfingia_vulgaris

Octopus_bimaculoides

Sabella_pavonina

Dicyema_japonicum

0.1

0.97 (0.74)

1 (0.97)

Fig. 3: Analyses of the phylogenetic positions of Dicyema and Intoshia based on nuclear gene sequences. A. A bayesian phylogeny reconstructed from 190,027 aligned amino acid positions analysed under the CAT+G4 model. Support values are from bayesian posterior probabilities (PP) and from 50 jackknifed sub-samples of 30,000 residues (JP support values in brackets). Both analyses reveal Mesozoa to be polyphyletic and place Intoshia linei in Annelida (see Supp Fig 4a for support values). B. A repeat of the jackknife analysis omitting the long-branching Dicyema species eliminates the potential for LBA between Intoshia and Dicyema. This leads to an increase in the support for a position of Intoshia within Annelida from JP 0.74 to JP 0.86 JP. (Only lophotrochozoan part of the tree shown, see Supplementary Fig 4c for full tree). C. Bayesian jackknife using CAT+G4 model using the best quarter of genes supporting monophyletic annelids leads to increased support for Intoshia within Annelida to JP 0.94 even with the inclusion of the Dicyema species. (Only lophotrochozoan part of the tree shown, see Supplementary Fig 4d for full tree).

author/funder. All rights reserved. N

o reuse allowed w

ithout permission.

The copyright holder for this preprint (w

hich was not peer-review

ed) is the.

https://doi.org/10.1101/235549doi:

bioRxiv preprint

Page 17: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

Fig. 4: Reanalysis of a published data set addressing potential LBA between mesozoans supports annelid affinity for Intoshia.

Repeating the analyses on a previously published data set [7] excluding the long branching Dicyema leads to Intoshia being placed with the annelids, showing the likely effect of LBA on the original analysis. Support values are bayesian posterior probabilities (PP).

Annelida

0.1

Daphnia_pulex

Bothrioplana_semperi

Lepidodermella_squamata

Helobdella_robusta

Homo_sapiens

Stenostomum_leucops

Schistosoma_mansoni

Mesodasys_laticaudatus

Geocentrophora_applanata

Macrodasys_sp

Stenostomum_sthenum

Octopus_bimaculatus

Prostheceraeus_vittatus

Branchiostoma_floridae

Monocelis_fusca

Tribolium_castaneum

Austrognathia_sp

Ciona_intestinalis

Drosophila_melanogaster

Brachionus_koreanusMacracanthorhynchus_hirudinaceus

Gnathostomula_paradoxa

Lottia_gigantea

Microstomum_lineare

Diuronotus_aspetos

Intoshia_linei

Limnognathia_maerski

Capitella_teleta0.980.93

1

1

1

1

1

1

1

11

1

1

0.98

1

1

1

1

1

1

0.99

0.94

0.911

0.96

1

author/funder. All rights reserved. N

o reuse allowed w

ithout permission.

The copyright holder for this preprint (w

hich was not peer-review

ed) is the.

https://doi.org/10.1101/235549doi:

bioRxiv preprint

Page 18: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

Supplementary Tables 322 323

Supplementary Table 1 324 Predicted correspondence of nucleotide triplets to amino acids in Intoshia and three Dicyema 325 species. For each triplet, the amino acid corresponding to the triplet in the standard 326 invertebrate mitochondrial code is shown, the number of observations of the triplet to 327 prediction is based on, the predicted amino acid and its score and finally the second highest 328 scoring amino acid prediction. The triplets AAA and ATA are highlighted in green and likely 329 errors highlighted in blue. Likely errors are mostly associated with very low numbers of 330 observed GC rich triplets in these very AT rich mitochondrial genomes. 331

332

Supplementary Table 2 333 List of species used in the final phylogenetic analysis, data sources, and representation in the 334 final alignment. 335

336 337

Supplementary Figures: 338 339

Figure S1. Related to Figure 2. 340 Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set 341 omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 342 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees 343 were discarded as burnin. 344

345

Figure S2. Related to Figure 2. 346

Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set 347 omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 348 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as 349 burnin. 350

351

Figure S3. Related to Figure 3. 352

A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary 353 to the improvement in placing I. linei observed when excluding the Dicyema species, the 354 exclusion of I. linei does not lead to a better resolution of the Dicyema species’ position. This 355 can be seen as further evidence for the non-affiliation of orthonectids and dicyemids and the 356 correct inference that orthonectids are part of Annelida. 357

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 19: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

Figure S4. Related to Figure 3. 358 A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 359 phylogeny based on the full alignment of 190,027 amino acid positions. 360 361 B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 362 amino acid positions each independently analysed for 2000 cycles under the CAT+G4 model 363 in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the 364 analysis of the full dataset I. linei is found within the annelids and phylum Mesozoa is found 365 as an unnatural assemblage. 366 367 C. Cladogram corresponding to Fig 3b showing all support values. 368 369 D. Cladogram corresponding to Fig 3c showing all support values. 370

371 References: 372

373 [1] Slyusarev GS, Starunov VV. The structure of the muscular and nervous systems of 374

the female Intoshia linei (Orthonectida). Org Divers Evol 2015;16:65–71. 375 [2] Furuya H, Hochberg FG, Tsuneki K. Cell number and cellular composition in 376

infusoriform larvae of dicyemid mesozoans (Phylum Dicyemida). Zool Sci 377 2004;21:877–89. 378

[3] Nielsen C. Animal Evolution. Oxford University Press; 2011. 379 [4] Dodson EO. A note on the systematic position of the Mesozoa. Syst Zoo 1956;5:37. 380 [5] Suzuki TG, Ogino K, Tsuneki K, Furuya H. Phylogenetic analysis of dicyemid 381

mesozoans (phylum Dicyemida) from innexin amino acid sequences: Dicyemids are 382 not related to Platyhelminthes. J Parasitol 2010;96:614–25. 383

[6] Hanelt B, Van Schyndel D, Adema CM, Lewis LA, Loker ES. The phylogenetic 384 position of Rhopalura ophiocomae (Orthonectida) based on 18S ribosomal DNA 385 sequence analysis. Mol Biol Evol 1996;13:1187–91. 386

[7] Lu T-M, Kanda M, Satoh N, Furuya H. The phylogenetic position of dicyemid 387 mesozoans offers insights into spiralian evolution. Zool Letts 2017;3:419. 388

[8] Mikhailov KV, Slyusarev GS, Nikitin MA, Logacheva MD, Penin AA, Aleoshin VV, et 389 al. The genome of Intoshia linei affirms orthonectids as highly simplified spiralians. 390 Curr Biol 2016;26:1768–74. 391

[9] Philippe H, Brinkmann H, Copley RR, Moroz LL, Nakano H, Poustka AJ, et al. 392 Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 393 2011;470:255–8. 394

[10] Slyusarev GS. Fine structure and development of the cuticle of Intoshia variabili 395 (Orthonectida). Acta Zool 2000;81:1–8. 396

[11] Telford MJ, Copley RR. Improving animal phylogenies with genomic data. Trends 397 Genet 2011;27:186–95. 398

[12] Telford MJ. Turning Hox “signatures” into synapomorphies. Evol Dev 2000;2:360–4. 399 [13] Kobayashi M, Furuya H, Holland PW. Dicyemids are higher animals. Nature 400

1999;401:762–2. 401 [14] Telford MJ, Herniou EA, Russell RB, Littlewood DT. Changes in mitochondrial 402

genetic codes as phylogenetic characters: two examples from the flatworms. P Natl 403 Acad Sci Usa 2000;97:11359–64. 404

[15] Lartillot N, Blanquart S, Lepage T. PhyloBayes 3.3 a Bayesian software for 405 phylogenetic reconstruction and molecular dating using mixture models. 2012. 406

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 20: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

[16] Simion P, Philippe H, Baurain D, Jager M, Richter DJ, Di Franco A, et al. A large 407 and consistent phylogenomic dataset supports sponges as the sister group to all 408 other animals. Curr Biol 2017;27:958–67. 409

[17] Weigert A, Golombek A, Gerth M, Schwarz F, Struck TH, Bleidorn C. Evolution of 410 mitochondrial gene order in Annelida. Mol Phyl Evol 2016;94:196–206. 411

[18] Slyusarev GS, Kristensen RM. Fine structure of the ciliated cells and ciliary rootlets 412 of Intoshia variabili (Orthonectida). Zoomorphology 2002;122:33–9. 413

[19] Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina 414 sequence data. Bioinformatics 2014;30:2114–20. 415

[20] Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, et al. De 416 novo transcript sequence reconstruction from RNA-seq using the Trinity platform for 417 reference generation and analysis. Nat Protoc 2013;8:1494–512. 418

[21] Robertson HE, Lapraz F, Egger B, Telford MJ, Schiffer PH. The mitochondrial 419 genomes of the acoelomorph worms Paratomella rubra, Isodiametra pulchra and 420 Archaphanostoma ylvae. Sci Rep 2017;7:1847. 421

[22] Altschul SF, Gish W, Miller W, Myers EW. Basic local alignment search tool. Journal 422 of Mol Biol 1990;215:403–10. 423

[23] Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, et al. MITOS: 424 improved de novo metazoan mitochondrial genome annotation. Mol Phyl Evol 425 2013;69:313–9. 426

[24] Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and 427 space complexity. BMC Bioinformatics 2004;5:113. 428

[25] Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. TrimAl: a tool for automated 429 alignment trimming in large-scale phylogenetic analyses. Bioinformatics 430 2009;25:1972–3. 431

[26] Egger B, Lapraz F, Tomiczek B, Müller S, Dessimoz C, Girstmair J, et al. A 432 transcriptomic-phylogenomic analysis of the evolutionary relationships of flatworms. 433 Curr Biol 2015;25:1347–53. 434

[27] Struck TH, Wey-Fabrizius AR, Golombek A, Hering L, Weigert A, Bleidorn C, et al. 435 Platyzoan paraphyly based on phylogenomic data supports a noncoelomate 436 ancestry of spiralia. Mol Biol Evol 2014;31:1833–49. 437

[28] Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome 438 comparisons dramatically improves orthogroup inference accuracy. Genome Biol 439 2015;16:E9–13. 440

[29] Altenhoff AM, Škunca N, Glover N, Train C-M, Sueki A, Piližota I, et al. The OMA 441 orthology database in 2015: function predictions, better plant support, synteny view 442 and other improvements. Nucleic Acids Res 2015;43:D240–9. 443

[30] Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable 444 generation of high-quality protein multiple sequence alignments using Clustal 445 Omega. Mol Syst Biol 2011;7:1–6. 446

[31] Morrison DA. Increasing the efficiency of searches for the maximum likelihood tree 447 in a phylogenetic analysis of up to 150 nucleotide sequences. Systematic Biol 448 2007;56:988–1010. 449

[32] Tange O. Gnu parallel-the command-line power tool. The USENIX Magazine; 2011. 450 [33] Wheeler TJ. Large-Scale Neighbor-Joining with NINJA. Algorithms in Bioinformatics, 451

vol. 5724, Berlin, Heidelberg: Springer Berlin Heidelberg; 2009, pp. 375–89. 452

~50µm

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 21: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

0.1

Saccoglossus_kowalevskii

Lumbricus_terrestrisPerionyx_excavatusAmynthas_jiriensis

Strongylocentrotus_pallidus

Thais_clavigera

Marphysa_sanguinea

Lineus_alborostratus

Ciona_intestinalis

Siphonodentalium_lobatum

Xenoturbella_bocki

Spadella_cephaloptera

Homo_sapiens

Octopus_vulgaris

Daphnia_pulex

Myzostoma_seymourcollegiorum

Lampetra_fluviatilis

Biomphalaria_tenagophilaPupa_strigosa

Terebratulina_retusa

Loligo_bleekeri

Flustrellidra_hispida

Escarpia_spicata

Tribolium_castaneum

Aplysia_californica

Platynereis_dumerilii

Sagitta_inflata

Myxine_glutinosa

Asymmetron_inferum

Locusta_migratoria

Dicyema_japonicum

Ornithorhynchus_anatinus

Mytilus_trossulus

Octopus_ocellatus

Dosidicus_gigas

Nautilus_macromphalus

Branchiostoma_floridae

Intoshia_linei

Priapulus_caudatus

Laqueus_rubellus

Clymenella_torquata

Balanoglossus_carnosus

Limulus_polyphemus

Urechis_caupo

Dicyema_sp

Sepia_esculentaSepia_officinalis

Doliolum_nationalis

Decipisagitta_decipiens

Orbinia_latreillii

Trichoplax_adhaerens

Dicyema_misakiense

Acropora_tenuis

Salmo_salar

Bugula_neritina

Nematostella_sp

Sepioteuthis_lessoniana

Aurelia_aurita

Sipunculus_nudus

0.74

1

0.69

1

0.99

1

0.63

0.64

0.92

1

1

0.87

1

0.87

10.94

0.32

0.47

0.99

0.97

0.94

0.96

0.57

1

0.38

0.96

0.74

1

0.42

0.39

0.431

0.89

0.99

0.94

0.39

0.41

0.44

0.97

0.99

1

0.94

0.55

1

0.63

0.83

0.32

0.990.97

0.58

0.85

0.78

1

0.73

0.98

0.1

Saccoglossus_kowalevskii

Lumbricus_terrestrisPerionyx_excavatusAmynthas_jiriensis

Strongylocentrotus_pallidus

Thais_clavigera

Marphysa_sanguinea

Lineus_alborostratus

Ciona_intestinalis

Siphonodentalium_lobatum

Xenoturbella_bocki

Spadella_cephaloptera

Homo_sapiens

Octopus_vulgaris

Daphnia_pulex

Myzostoma_seymourcollegiorum

Lampetra_fluviatilis

Biomphalaria_tenagophilaPupa_strigosa

Terebratulina_retusa

Loligo_bleekeri

Flustrellidra_hispida

Escarpia_spicata

Tribolium_castaneum

Aplysia_californica

Platynereis_dumerilii

Sagitta_inflata

Myxine_glutinosa

Asymmetron_inferum

Locusta_migratoria

Dicyema_japonicum

Ornithorhynchus_anatinus

Mytilus_trossulus

Octopus_ocellatus

Dosidicus_gigas

Nautilus_macromphalus

Branchiostoma_floridae

Intoshia_linei

Priapulus_caudatus

Laqueus_rubellus

Clymenella_torquata

Balanoglossus_carnosus

Limulus_polyphemus

Urechis_caupo

Dicyema_sp

Sepia_esculentaSepia_officinalis

Doliolum_nationalis

Decipisagitta_decipiens

Orbinia_latreillii

Trichoplax_adhaerens

Dicyema_misakiense

Acropora_tenuis

Salmo_salar

Bugula_neritina

Nematostella_sp

Sepioteuthis_lessoniana

Aurelia_aurita

Sipunculus_nudus

Figure S1:Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting the long-branching flatworm species. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.

Figure S1. Related to Figure 2.

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 22: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

0.1

Salmo_salar

Dosidicus_gigas

Perionyx_excavatus

Doliolum_nationalis

Sagitta_inflata

Sepia_officinalis

Nematostella_sp

Sepioteuthis_lessoniana

Decipisagitta_decipiens

Amynthas_jiriensis

Acropora_tenuis

Homo_sapiensAsymmetron_inferum

Loligo_bleekeri

Aurelia_aurita

Priapulus_caudatus

Pupa_strigosa

Trichoplax_adhaerens

Clymenella_torquata

Ornithorhynchus_anatinus

Thais_clavigera

Ciona_intestinalis

Xenoturbella_bocki

Sepia_esculenta

Dicyema_misakiense

Sipunculus_nudus

Urechis_caupo

Octopus_vulgaris

Strongylocentrotus_pallidus

Myxine_glutinosa

Branchiostoma_floridae

Biomphalaria_tenagophila

Saccoglossus_kowalevskii

Lampetra_fluviatilis

Dicyema_japonicum

Siphonodentalium_lobatum

Platynereis_dumeriliiMarphysa_sanguinea

Spadella_cephaloptera

Bugula_neritina

Nautilus_macromphalus

Dicyema_sp

Mytilus_trossulus

Myzostoma_seymourcollegiorum

Terebratulina_retusaAplysia_californica

Tribolium_castaneum

Flustrellidra_hispida

Limulus_polyphemus

Octopus_ocellatus

Laqueus_rubellus

Balanoglossus_carnosus

Locusta_migratoriaDaphnia_pulex

Orbinia_latreillii

Lumbricus_terrestris

Lineus_alborostratus

Escarpia_spicata

1

0.990.66

0.970.92

0.81

0.761

0.97

1

0.99

0.57

0.91

0.98

0.71

0.56

0.93

1

0.89 0.99

0.72

0.96

1

0.56

0.59

1

1

0.98

0.95

0.88

0.97

1

0.85

0.8

0.64

0.96

0.91

0.99

1

0.97

0.68

1

1

0.761

1

0.74

0.98

0.1

Salmo_salar

Dosidicus_gigas

Perionyx_excavatus

Doliolum_nationalis

Sagitta_inflata

Sepia_officinalis

Nematostella_sp

Sepioteuthis_lessoniana

Decipisagitta_decipiens

Amynthas_jiriensis

Acropora_tenuis

Homo_sapiensAsymmetron_inferum

Loligo_bleekeri

Aurelia_aurita

Priapulus_caudatus

Pupa_strigosa

Trichoplax_adhaerens

Clymenella_torquata

Ornithorhynchus_anatinus

Thais_clavigera

Ciona_intestinalis

Xenoturbella_bocki

Sepia_esculenta

Dicyema_misakiense

Sipunculus_nudus

Urechis_caupo

Octopus_vulgaris

Strongylocentrotus_pallidus

Myxine_glutinosa

Branchiostoma_floridae

Biomphalaria_tenagophila

Saccoglossus_kowalevskii

Lampetra_fluviatilis

Dicyema_japonicum

Siphonodentalium_lobatum

Platynereis_dumeriliiMarphysa_sanguinea

Spadella_cephaloptera

Bugula_neritina

Nautilus_macromphalus

Dicyema_sp

Mytilus_trossulus

Myzostoma_seymourcollegiorum

Terebratulina_retusaAplysia_californica

Tribolium_castaneum

Flustrellidra_hispida

Limulus_polyphemus

Octopus_ocellatus

Laqueus_rubellus

Balanoglossus_carnosus

Locusta_migratoriaDaphnia_pulex

Orbinia_latreillii

Lumbricus_terrestris

Lineus_alborostratus

Escarpia_spicata

Figure S2:

Phylogram and corresponding cladogram of a Bayesian analysis of our mitochondrial data set omitting Intoshia linei. Phylobayes CAT+G4 model was run in 10 independent runs for 10,000 cycles each on an alignment with 2969 positions and 8000 trees were discarded as burnin.

Figure S2. Related to Figure 2.

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 23: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

0.1

Octopus_bimaculoides

Nematostella_vectensis

Maritigrella_crozieri

Pharyngocirrus_tridentiger

Priapulus_caudatusTrichinella_spiralis

Dicyema_japonicum

Bonellia_viridis

Branchiostoma_floridae

Tubulanus_polymorphusMacrostomum_lignano

Schistosoma_mansoni

Magelona_pitelkai

Adineta_ricciae

Homo_sapiens

Amphimedon_queenslandica

Saccoglossus_kowalevskii

Cephalothrix_linearis

Stenostomum_sp

Cerebratulus_sp

Dendrocoelum_lacteum

Helobdella_robusta

Sycon_coactum

Hydra_vulgaris

Mesodasys_laticaudatus

Dicyema_spLottia_gigantea

Trichoplax_adhaerens

Strongylocentrotus_purpuratus

Monocelis_sp

Sabella_pavonina

Alvinella_pompejana

Macrodasys_sp

Lymnea_stagnalis

Rhabdopleura_spPatiria_miniata

Capitella_tellata

Golfingia_vulgaris

Neomenia_megatrapezata

Tribolium_castaneum

Macracanthorhynchus_hirudinaceusPediculus_humanusDaphnia_pulex

0.93

0.94

1

0.9

0.96

0.980.98

1

0.97

1

1

0.79

0.87

1

1

0.98

1

1

0.98

1

1

1

1

0.53

0.98

10.99

0.84

0.82

1

1

1

1

0.51

1

0.1

Octopus_bimaculoides

Nematostella_vectensis

Maritigrella_crozieri

Pharyngocirrus_tridentiger

Priapulus_caudatusTrichinella_spiralis

Dicyema_japonicum

Bonellia_viridis

Branchiostoma_floridae

Tubulanus_polymorphusMacrostomum_lignano

Schistosoma_mansoni

Magelona_pitelkai

Adineta_ricciae

Homo_sapiens

Amphimedon_queenslandica

Saccoglossus_kowalevskii

Cephalothrix_linearis

Stenostomum_sp

Cerebratulus_sp

Dendrocoelum_lacteum

Helobdella_robusta

Sycon_coactum

Hydra_vulgaris

Mesodasys_laticaudatus

Dicyema_spLottia_gigantea

Trichoplax_adhaerens

Strongylocentrotus_purpuratus

Monocelis_sp

Sabella_pavonina

Alvinella_pompejana

Macrodasys_sp

Lymnea_stagnalis

Rhabdopleura_spPatiria_miniata

Capitella_tellata

Golfingia_vulgaris

Neomenia_megatrapezata

Tribolium_castaneum

Macracanthorhynchus_hirudinaceusPediculus_humanusDaphnia_pulex

Figure S3:

A phylogram based on our analysis of the jackknifed dataset omitting Intoshia linei. Contrary to the improvement in placing I. linei observed when excluding the Dicyema species, the exclusion of I. linei does not lead to a better resolution of the Dicyema species’ position. This can be seen as further evidence for the non-affiliation of orthonectids and dicyemids and the correct inference that orthonectids are part of Annelida.

Figure S3. Related to Figure 3.

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint

Page 24: Molecular data from Orthonectid worms show they are highly ... · 1 2 Title: Molecular data from Orthonectid worms show they 3 are highly degenerate members of phylum Annelida not

0.1

Strongylocentrotus_purpuratusPatiria_miniata

Adineta_ricciae

Macrodasys_sp

Macrostomum_lignano

Pharyngocirrus_tridentiger

Daphnia_pulex

Branchiostoma_floridae

Golfingia_vulgaris

Lottia_gigantea

Hydra_vulgaris

Lymnea_stagnalis

Neomenia_megatrapezata

Macracanthorhynchus_hirudinaceus

Bonellia_viridis

Stenostomum_sp

Schistosoma_mansoni

Homo_sapiens

Sycon_coactum

Magelona_pitelkai

Cerebratulus_sp

Dicyema_sp

Capitella_tellata

Alvinella_pompejana

Nematostella_vectensis

Tribolium_castaneum

Mesodasys_laticaudatus

Lepidodermella_squamata

Sabella_pavonina

Rhabdopleura_sp

Dicyema_japonicum

Trichinella_spiralis

Intoshia_linei

Dendrocoelum_lacteum

Trichoplax_adhaerens

Helobdella_robusta

Amphimedon_queenslandica

Saccoglossus_kowalevskii

Tubulanus_polymorphusCephalothrix_linearis

Octopus_bimaculoides

Pediculus_humanus

Maritigrella_crozieriMonocelis_sp

Priapulus_caudatus

0.82

1

1

1

0.93

0.86

0.96

0.99

0.5

1

0.98

1

0.97

1

1

1

1

1

0.92

1

0.98

0.75

0.88

0.96

1

1

0.74

0.6

1

1

0.94

0.990.77

0.98

BA

0.1

Pediculus_humanus

Lymnea_stagnalis

Macrostomum_lignano

Monocelis_sp

Alvinella_pompejana

Tribolium_castaneum

Patiria_miniata

Macracanthorhynchus_hirudinaceus

Tubulanus_polymorphus

Schistosoma_mansoni

Helobdella_robusta

Adineta_ricciae

Strongylocentrotus_purpuratus

Dendrocoelum_lacteum

Homo_sapiens

Hydra_vulgaris

Trichinella_spiralis

Rhabdopleura_sp

Nematostella_vectensis

Capitella_tellata

Priapulus_caudatus

Neomenia_megatrapezata

Maritigrella_crozieri

Magelona_pitelkai

Cephalothrix_linearis

Sycon_coactum

Lepidodermella_squamata

Trichoplax_adhaerens

Dicyema_sp

Amphimedon_queenslandica

Stenostomum_sp

Bonellia_viridis

Macrodasys_sp

Lottia_gigantea

Intoshia_linei

Saccoglossus_kowalevskii

Branchiostoma_floridae

Cerebratulus_sp

Pharyngocirrus_tridentiger

Daphnia_pulex

Mesodasys_laticaudatus

Golfingia_vulgaris

Octopus_bimaculoides

Sabella_pavonina

Dicyema_japonicum

1

1

1

1

1

0.75

1

0.75

1

1

0.75

1

1

1

1

1

1

1

1

1

1

0.58

0.75

1

1

1

0.75

1

1

1

1

0.75

1

1

1

1

1

0.97

1

1

Figure S4: A. Cladogram corresponding to Fig 3a showing all PP support values for the CAT+G4 phylogeny based on the full alignment of 190,027 amino acid positions. B. A cladogram including JP support values based on 50 jackknife subsamples of 30,000 amino acid positions each independently analysed for 2000 cycles under

the CAT+G4 model in phylobayes and summarised with the bpcomp command setting 1800 as burnin. As in the analysis of the full dataset I. linei is found within the annelids and phylum Mesozoa is found as an unnatural assemblage.

C. Cladogram corresponding to Fig 3b showing all support values. D. Cladogram corresponding to Fig 3c showing all support values.

0.1

Sycon_coactum

Tribolium_castaneum

Macracanthorhynchus_hirudinaceus

Macrostomum_lignano

Hydra_vulgaris

Amphimedon_queenslandica

Helobdella_robusta

Pediculus_humanus

Cerebratulus_sp

Lymnea_stagnalis

Nematostella_vectensis

Macrodasys_sp

Pharyngocirrus_tridentiger

Dendrocoelum_lacteum

Priapulus_caudatus

Daphnia_pulex

Monocelis_spSchistosoma_mansoni

Lepidodermella_squamata

Saccoglossus_kowalevskii

Trichinella_spiralis

Stenostomum_sp

Golfingia_vulgaris

Neomenia_megatrapezata

Maritigrella_crozieri

Sabella_pavonina

Magelona_pitelkai

Strongylocentrotus_purpuratus

Cephalothrix_linearis

Octopus_bimaculoides

Alvinella_pompejana

Patiria_miniata

Adineta_ricciae

Trichoplax_adhaerens

Rhabdopleura_sp

Tubulanus_polymorphus

Branchiostoma_floridae

Lottia_gigantea

Capitella_tellata

Mesodasys_laticaudatus

Intoshia_linei

Homo_sapiens

Bonellia_viridis

0.9

0.99

0.81

0.97

0.86

1

0.95

1

1

0.91

1

0.511

1

0.97

0.55

0.72

1

0.65

0.8

1

0.86

0.99

0.921

1

0.99

0.97

1

0.99

1

1

1

1

C

0.1

Dicyema_sp

Alvinella_pompejana

Dicyema_japonicum

Amphimedon_queenslandica

Tubulanus_polymorphus

Tribolium_castaneum

Mesodasys_laticaudatus

Strongylocentrotus_purpuratus

Cerebratulus_sp

Cephalothrix_linearis

Capitella_tellata

Hydra_vulgaris

Rhabdopleura_sp

Patiria_miniata

Macracanthorhynchus_hirudinaceus

Priapulus_caudatus

Pharyngocirrus_tridentiger

Daphnia_pulex

Homo_sapiens

Adineta_ricciae

Monocelis_sp

Helobdella_robusta

Sabella_pavonina

Saccoglossus_kowalevskii

Stenostomum_sp

Sycon_coactum

Branchiostoma_floridae

Golfingia_vulgaris

Neomenia_megatrapezata

Macrostomum_lignano

Schistosoma_mansoni

Intoshia_linei

Trichinella_spiralis

Macrodasys_sp

Octopus_bimaculoidesLymnea_stagnalis

Dendrocoelum_lacteum

Lottia_gigantea

Pediculus_humanus

Bonellia_viridis

Trichoplax_adhaerens

Lepidodermella_squamata

Nematostella_vectensis

Maritigrella_crozieri

Magelona_pitelkai

1

1

0.97

1

1

0.59

0.54

0.68

0.54

1

1

0.96

0.66

0.99

0.67

0.99

1

0.99

1

0.88

1

1

1

0.52

0.97

0.94

0.8

0.99

1

1

0.94

10.82

1

1

0.99

1

1

0.99

1

0.70.43

D

Figure S4. Related to Figure 3.

author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not peer-reviewed) is the. https://doi.org/10.1101/235549doi: bioRxiv preprint