Articles
https://doi.org/10.1038/s41559-020-1239-x
Phylogenetic analyses with systematic taxon sampling show that mitochondria branch within Alphaproteobacteria
Lu Fan 1,2,3,7 ✉, Dingfeng Wu 4,7, Vadim Goremykin 5,7, Jing Xiao 4, Yanbing Xu 4, Sriram Garg 6, Chuanlun Zhang 1,3, William F. Martin 6 ✉ and Ruixin Zhu 4 ✉
1Shenzhen Key Laboratory of Marine Archaea Geo-Omics, Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China. 2Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology (SUSTech), Shenzhen, China. 3Southern Marine Science and Engineering Guangdong Laboratory, Guangzhou, China. 4Department of Bioinformatics, Putuo People’s Hospital, Tongji University, Shanghai, China. 5Research and Innovation Centre, Fondazione Edmund Mach, San Michele all’Adige, Italy. 6Institute of Molecular Evolution, Heinrich Heine University, Düsseldorf, Germany. 7These authors contributed equally: Lu Fan, Dingfeng Wu, Vadim Goremykin. ✉e-mail: [email protected]; [email protected]; [email protected]
SUPPLEMENTARY INFORMATION
In the format provided by the authors and unedited.
Nature Ecology & Evolution | www.nature.com/natecolevol

Supplementary Information
This file contains Supplementary Notes 1–9, Supplementary Methods, Supplementary Figures 1–60, and Supplementary References.

Table of Contents
Supplementary Notes
Supplementary Methods
Supplementary Figures
Supplementary References

Supplementary Notes

Supplementary Note 1
Difficulties in resolving the phylogenetic relationship between mitochondria and extant alphaproteobacterial lineages (see a detailed review in reference 5). (1) Considerable phylogenetic divergence and metabolic variety within Alphaproteobacteria1; (2) faint historical signals left behind by the very ancient event of mitochondrial origin2; (3) the limited number of marker genes shared between mitochondria and Alphaproteobacteria owing to extensive gene loss in the former3; (4) taxonomic bias in datasets towards clinically or agriculturally important alphaproteobacterial members4; and (5) strong phylogenetic artefacts, such as long-branch attraction (LBA) and compositional heterogeneity, that associate mitochondria with fast-evolving alphaproteobacterial lineages.

Supplementary Note 2
Heterogeneity mitigation methods. As datasets in studies on the phylogeny of mitochondria and Alphaproteobacteria suffer heavily from compositional heterogeneity and LBA, various approaches to mitigate non-historical signals have been adopted, but the drawbacks of these methods are rarely examined. Among them, protein recoding causes signal loss and artificial mutation saturation6. Nucleus-encoded mitochondrial genes have had to adapt to new rules of expression and regulation in the nuclear system and may therefore have undergone more intensive site substitution than mitochondrion-encoded genes. Thus, the reliability of using nucleus-encoded mitochondrial genes in phylogenetic analyses of mitochondria needs further justification7,8. The idea of excluding potentially model-violating sites to improve phylogenetic prediction was introduced over two decades ago9,10 but has been opposed by some researchers (see the review by Shepherd and Klaere11). The concern is that, despite carrying non-historical signals, these sites may contain useful information. Nonetheless, various versions of site exclusion have been applied in phylogenetic studies of mitochondria and Alphaproteobacteria, based on either evolutionary rate12,13 or amino acid composition14,15,16. However, conflicting results have been reported when different site-exclusion metrics were used17.

Supplementary Note 3
Explanations and verifications of the topological shift of trees observed in Martijn et al. (2018). In the study of Martijn et al.16, it was observed that when a certain number of sites in the alignment matrix were excluded by using either a Stuart-score based stationary trimming method or a χ²-score based method, the tree topology shifted from supporting Rickettsiales-sister to Alphaproteobacteria-sister for mitochondria. At least two mechanisms may explain this observation: (1) Mitochondria originated independently of extant Alphaproteobacteria. The affinity between mitochondria and Rickettsiales in phylogenetic trees is the result of convergent evolution in amino acid composition rather than common evolutionary history. Site-exclusion approaches filter out this non-historical signal by removing the most heterogeneous sites in datasets and restore the true phylogenetic position of mitochondria; and (2) historical signals between mitochondria and some alphaproteobacterial groups exist but are weak as a result of divergent evolution over billions of years. Site-exclusion approaches employed by Martijn et al. removed most of the sites containing these signals and broke the true evolutionary connection between mitochondria and Alphaproteobacteria. The long branch of mitochondria in the phylogenetic tree is then attracted by the long branch leading towards the outgroup taxa containing other proteobacteria, resulting in an Alphaproteobacteria-sister topology. However, since no close relatives of the entire Alphaproteobacteria clade are known, it is impossible to validate the LBA effect in this tree by replacing the distant outgroup with a much closer alternative (but see the placement of mitochondria in Alpha IIb with Alpha IIa as a short-branched outgroup in this study).
Martijn et al. tested the potential LBA effect of outgroup taxa by conducting three analyses16. As they acknowledged, the outgroup removal analysis neither proved nor disproved the hypothesis of LBA by the outgroup. Secondly, the parametric simulation approach they conducted generated ten trees, of which eight recovered a Rickettsiales-sister topology with weak node support (below 80%) (Supplementary Figure 20 in their publication). As suggested by the IQ-TREE manuscript17, ultrafast bootstrap values below 95% are unreliable. Therefore, the result they obtained may actually suggest that a trustworthy Rickettsiales-sister topology could not be recovered and that the LBA effect of the outgroup might have contributed to this. Lastly, the authors declared that random sequence replacements showed that the long branch of the outgroup did not attract any of the random sequences. However, this is not the case. In the ten trees they provided as Supplementary Data, 8, 0, 3, 8, 5, 0, 0, 6, 0 and 3 of the 10 random sequences, respectively, are attracted by the outgroup. Therefore, the authors failed to exclude the possible artificial attraction of mitochondria by the outgroup after site exclusion.
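The per-tree counts quoted above can be tallied directly. The short Python sketch below only sums the numbers given in the text; the variable names are illustrative and are not taken from Martijn et al. or from the Supplementary Software.

```python
# Number of the 10 random sequences attracted by the outgroup in each of
# the ten trees, copied from the counts listed in the text.
attracted_per_tree = [8, 0, 3, 8, 5, 0, 0, 6, 0, 3]
sequences_per_tree = 10

total_attracted = sum(attracted_per_tree)
total_sequences = sequences_per_tree * len(attracted_per_tree)
trees_with_attraction = sum(1 for n in attracted_per_tree if n > 0)

print(f"{total_attracted} of {total_sequences} random sequences attracted")
print(f"{trees_with_attraction} of {len(attracted_per_tree)} trees affected")
```

Under this tally, 33 of the 100 random sequences group with the outgroup, and six of the ten trees contain at least one such case.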

Supplementary Note 4
Notable inconsistencies in data reproducibility in the study of Martijn et al.16. In addition to the misinterpretation of results by Martijn et al. described in Supplementary Note 3, we here report two cases of data irreproducibility in Martijn et al. (2018). First, the Bayesian consensus tree we obtained from the untreated ‘24-alphamitoCOGs’ dataset (Supplementary Figure 1) is fundamentally different from the one shown in Supplementary Figure 9 in Martijn et al. (2018), although we used the same dataset and tree construction parameters as they did. Specifically, in our tree, fast-evolving taxonomic groups do not form a monophyletic group. Instead, some taxa such as Pelagibacter were placed with slow-evolving alphaproteobacteria. Mitochondria and Rickettsiales formed a monophyletic clade, which is adjacent to other alphaproteobacterial groups. Thus, our result shows the notable capacity of the CAT model to deal with phylogenetic artefacts.
Second, the results of posterior predictive tests on the Bayesian inference of the site-excluded dataset based on Stuart’s test (this tree is the cornerstone supporting the Alphaproteobacteria-sister conclusion for mitochondria by Martijn et al.; Supplementary Table 2) are not comparable to those reported by Martijn et al. (Supplementary Figure 11 and Supplementary Table 3 in reference 16). Specifically, they reported a much better model fit according to the maximum squared heterogeneity (Z-score 0.5 to 0.6 in their study, but around 3.09 in this study; P-value 0.25 to 0.27 in their study, but around 0.01 in this study) and the mean squared heterogeneity (Z-score -2.5 to -2.3 in their study, but around 2.67 in this study; P-value 0.98 to 0.99 in their study, but around 0.004 in this study). We are not sure why such inconsistent results were obtained in the two studies, since (1) we used the same dataset (provided directly by Martijn) and tree reconstruction protocol; and (2) both studies obtained almost the same consensus tree topology (Supplementary Figure 2 in this study and Supplementary Figure 11 in their study). The only difference that might explain these two cases of inconsistency is that different versions of PhyloBayes MPI were used in the two studies (version 1.8 here and 1.7a in their study).

Supplementary Note 5
Sequence heterogeneity and taxon selection of the ‘Modified18’ dataset. In the ‘Modified18’ dataset, GC-rich mitochondrial sequences were selected to markedly reduce the heterogeneity in FYMINK/GARP ratio between mitochondria and slowly evolving alphaproteobacteria (Supplementary Figure 24). Rickettsiales taxa with higher genomic G+C content than those in the ‘24-alphamitoCOGs’ dataset were also selected. The rationale behind this selection is that the phylogenetic position of Rickettsiales relative to the backbone taxa of alphaproteobacteria is difficult to resolve (see Supplementary Note 7). Rickettsiales with low G+C content may be artificially attracted by other fast-evolving species in the tree, leading to the loss of their weak phylogenetic connection to the backbone taxa.
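For readers unfamiliar with the metric, the FYMINK/GARP ratio compares the frequency of amino acids encoded by AT-rich codons (F, Y, M, I, N, K) with that of amino acids encoded by GC-rich codons (G, A, R, P). A minimal Python sketch follows; the function name and the toy sequences are hypothetical, and this is not the code used to produce Supplementary Figure 24.

```python
def fymink_garp_ratio(protein: str) -> float:
    """Ratio of F/Y/M/I/N/K residues to G/A/R/P residues in a sequence.

    Gap and ambiguity characters are ignored because they belong to
    neither residue set.
    """
    seq = protein.upper()
    fymink = sum(seq.count(aa) for aa in "FYMINK")
    garp = sum(seq.count(aa) for aa in "GARP")
    if garp == 0:
        raise ValueError("sequence contains no G, A, R or P residues")
    return fymink / garp


# Toy sequences (hypothetical): one AT-biased and one GC-biased protein.
at_biased = "MKNIFYKKNILGA"
gc_biased = "MGARPAAGRPLKA"
print(fymink_garp_ratio(at_biased))
print(fymink_garp_ratio(gc_biased))
```

A high ratio indicates an AT-driven amino acid composition of the kind typical for many mitochondria and Rickettsiales; selecting GC-rich representatives lowers it towards the values of slowly evolving taxa.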
It should be noted that the introduced GC-rich mitochondria were all from higher plants. While this may compromise the representativeness of the data, it was noticed as early as the 1980s by Carl Woese et al. that the mitochondrial sequences of higher plants may have diverged from bacterial sequences to only a small extent18-20, possibly as a result of a low mutation rate in genes maintained by DNA repair mechanisms21. In view of this, plant mitochondria are as suitable as those of jakobids for phylogenetic analysis with bacterial sequences in terms of branch length. Indeed, branch lengths of plant mitochondria in trees based on the ‘Modified18’ dataset are comparable to those of single-celled eukaryotes based on the ‘Modified24’ dataset, as shown in this study (Supplementary Figures 31–36).

Supplementary Note 6
The four backbone clades. Group GT (Supplementary Table 3) is equivalent to Geminicoccaceae in Muñoz-Gómez et al. (2019)15. Alpha I comprises core alphaproteobacterial orders including Kordiimonadales, Sphingomonadales, Rhizobiales, Caulobacterales, Parvularculales and Rhodobacterales. The grouping of these lineages is consistent with the findings of Muñoz-Gómez et al. and others1,15,22. Alpha II comprises three isolates belonging to Rhodospirillaceae and several marine alphaproteobacterial metagenome-assembled genomes (MAGs). The grouping of these lineages was observed by Williams et al. and others1,15,22. Alpha III comprises Kiloniellaceae, SAR116, Acetobacteraceae, Azospirillaceae and some taxa classified in the polyphyletic Rhodospirillaceae. This result is similar to the finding of Muñoz-Gómez et al.15.
Notably, the separation of these four groups was exactly recovered by Martijn et al. in their untreated ‘24-alphamitoCOGs’ dataset (Supplementary Figures 9 and 10 in Martijn et al. (2018)), but not in their stationary-trimmed dataset (Fig. 4a and Supplementary Figures 11 and 12 in Martijn et al. (2018)), which they claimed to support their ‘mito-out’ result. This again suggests that site exclusion may result in abnormal tree topology even for slow-evolving species.

Supplementary Note 7
Phylogenetic positions of fast-evolving alphaproteobacterial lineages. Holosporales was previously considered a subclade of Rickettsiales based on phylogenetic analysis and the fact that members of both groups are obligate endosymbionts23-25. However, other studies suggested that the phylogenetic affinity between Holosporales and other Rickettsiales families is the result of an artefact16,23,26-28. In the very recent study by Muñoz-Gómez et al., where amino acid bias was corrected by site exclusion, Holosporales was suggested to have a derived position within the Rhodospirillales, possibly close to Azospirillaceae15. Here, Holosporales are sister to the entire Alpha III (Fig. 2cd). Our results support that Holosporales are independent of Rickettsiales but may be close to taxa in Alpha III. Recent studies suggested that the grouping of Pelagibacterales, alpha proteobacterium HIMB59 and Rickettsiales, as reported by many earlier studies, is the result of a compositional bias artefact15,16. Using site-excluded datasets, it was suggested that Pelagibacterales should be placed after the common ancestor of Sphingomonadales (belonging to Alpha Ia here) but before the divergence of Rhodobacterales, Caulobacterales and Rhizobiales (belonging to Alpha Ib here)15. Our result is consistent with this (Fig. 2ef). Moreover, the exact phylogenetic position of alpha proteobacterium HIMB59 could not be resolved on its own (Fig. 2gh). When mitochondria were present, it was placed in the Alpha IIb clade based on the ‘Modified18’ dataset (Fig. 3f). Rickettsiales appearing as sister to all other alphaproteobacteria has been reported in some artefact-attenuated studies15, while conflicting results were recovered in others16, suggesting that its relationship with slow-evolving alphaproteobacteria is currently difficult to resolve. We found that Rickettsiales were placed within Alpha II, as the sister of MarineAlpha9 Bin5, based on the ‘Modified18’ dataset (Fig. 2i).

Supplementary Note 8
Evidence that the tree topology in Fig. 4 is not the result of artefacts. First, taxon-exclusion analyses clearly demonstrate the phylogenetic connections of fast-evolving alphaproteobacterial lineages, including Rickettsiales and fast-evolving MAGs (FEMAGs), to the slow-evolving taxa MarineAlpha9 Bin5 and MarineAlpha11 in the absence of possible influence from non-historical signals (Fig. 2ij). Secondly, mitochondria and these fast-evolving taxa do not form a single clade separated from the backbone clades as a result of LBA, as is seen in Supplementary Figures 9 and 10 in Martijn et al. (2018). Instead, they were placed with slow-evolving taxa within Alpha IIb (Fig. 3kl and 4). Lastly, taxon-reduction and site-exclusion examinations of the datasets ‘Modified24-AlphaII’, ‘Modified24-AlphaII-MoreTaxa’ and ‘Modified18-AlphaII’, as shown in Fig. 1d, suggest that their tree topology is robust.

Supplementary Note 9
Metabolic analysis. Reconstruction of the metabolism of the bacterial ancestor of mitochondria is essential to syntrophy-based models of eukaryogenesis29-31. Proteome studies of the alphaproteobacterial sister of mitochondria might provide hints to the metabolic nature of the last common ancestor of mitochondria and Alphaproteobacteria, particularly for those proteins that have ultimately been lost in extant mitochondria or nuclei31,32. However, as evidenced by our previous studies33,34, extensive horizontal gene transfer events have taken place in the 1.8 billion years of evolutionary history of extant alphaproteobacteria since eukaryogenesis. Consequently, few proteins in extant alphaproteobacteria other than those used for phylogeny here (the 24 marker proteins including riboproteins) were acquired through vertical transfer from the last common ancestor of mitochondria and Alphaproteobacteria. We therefore think that approaches inferring the very ancient event of eukaryogenesis from proteomes of extant Alphaproteobacteria, without robust justification of each protein’s evolutionary pathway, are prone to systematic errors.
Nevertheless, we here test two hallmark metabolic characters of the bacterial ancestor of mitochondria predicted by the ‘hydrogen hypothesis’30, namely aerobic and anaerobic oxidation of pyruvate and hydrogen production, by annotating proteomes of taxa in the ‘24-alphamitoCOGs’ dataset. Pyruvate dehydrogenase E1 component alpha subunit (PdhA), involved in aerobic pyruvate oxidation, is found in almost all the alphaproteobacteria but in only one species in the outgroup (Supplementary Table 7). In contrast, one of the pyruvate ferredoxin oxidoreductases, PorA, is detected only in Magnetospirillum magneticum, while the other two, OorA and IorA, are found generally in the Alpha II subclade including FEMAG I, some Alpha I and Alpha II taxa and two outgroup taxa. The result suggests ubiquitous aerobic pyruvate oxidation but sporadically distributed anaerobic pyruvate oxidation in extant alphaproteobacteria. NiFe hydrogenase HoxF is encoded only by M. magneticum and two outgroup taxa, probably because most taxa studied here are better adapted to aerobic environments.
To investigate the possible evolutionary pathways of these enzymes, phylogenetic trees were reconstructed for PdhA, OorA and IorA (Supplementary Figures 59 and 60). In general, all the trees have low resolution, as shown by nearly half of the nodes having unstable support (ultrafast bootstrap values < 95%), likely caused by the short sequence length of each protein. In addition, possible systematic errors such as compositional heterogeneity and LBA may bias the results. Nevertheless, in branches with stable support, we observe evidence both of local vertical transfer, as shown by small monophyletic subclades (e.g. PdhA in Rickettsiales, FEMAG II and Alpha Ia, respectively; Supplementary Figure 59), and of lateral transfer, as demonstrated by polyphyletic subclades. Overall, it is likely that the three enzymes representing aerobic/anaerobic pyruvate oxidation were present in the common ancestor of all extant alphaproteobacteria and experienced differential loss, duplication and sporadic lateral exchange with other bacteria. However, owing to the low resolution of the trees, it is impossible to trace the basal nodes of Alpha II taxa. No evidence of pyruvate oxidation or hydrogen production in the common ancestor of mitochondria and Alpha II taxa can be obtained from these very limited data.

Supplementary Methods

The site-exclusion method based on the score of Bowker’s test. Compared to Stuart’s test, Bowker’s test of symmetry was reported to be more comprehensive and sufficient for assessing compliance with the symmetry, reversibility and homogeneity assumptions of time-reversible models35,36. We used Bowker’s test of symmetry to produce subsets of the ‘24-alphamitoCOGs’ dataset meeting increasingly stringent p-value-based thresholds (>0.01, >0.1, >0.3, >0.4 and >0.5, respectively). Bowker’s test has long been used as an overall test for symmetry36. The test assesses symmetry in an r × r contingency table with the ij-th cell containing the observed frequency $n_{ij}$. The null hypothesis of symmetry is $H_0: n_{ij} = n_{ji}$, $i \neq j$, $i, j = 1, \ldots, r$, and the test value is computed by using equation (1).
$$ \chi^2 = \sum_{i<j} \frac{(n_{ij} - n_{ji})^2}{n_{ij} + n_{ji}} \qquad (1) $$
The test statistic follows a χ² distribution with the number of degrees of freedom equal to the number of comparisons ($n_{ij}$ vs $n_{ji}$) made.
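As an illustration of the test described above, the following Python sketch computes Bowker's statistic, its degrees of freedom and the p-value for one pair of aligned sequences. It is a minimal example using only the standard library and is not the Supplementary Software; the function names, the set of characters treated as gaps and the closed-form evaluation of the χ² tail probability are assumptions made for this example.

```python
import math
from collections import Counter


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square distribution (integer df >= 1).

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    with y = stat / 2, starting from Q(1/2, y) = erfc(sqrt(y)) for odd df
    or Q(1, y) = exp(-y) for even df.
    """
    if stat <= 0:
        return 1.0
    y = stat / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        # Each term is evaluated in log space to avoid overflow.
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def bowker_test(seq_a: str, seq_b: str, gap_chars: str = "-?X"):
    """Bowker's test of symmetry for two aligned sequences.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # n[(i, j)]: number of sites with state i in seq_a and state j in seq_b.
    n = Counter(
        (i, j) for i, j in zip(seq_a, seq_b)
        if i not in gap_chars and j not in gap_chars
    )
    stat, df = 0.0, 0
    seen = set()
    for i, j in list(n):
        if i == j or (j, i) in seen:
            continue
        seen.add((i, j))
        n_ij, n_ji = n[(i, j)], n[(j, i)]
        stat += (n_ij - n_ji) ** 2 / (n_ij + n_ji)
        df += 1  # one comparison made for this unordered pair of states
    p_value = chi2_sf(stat, df) if df else 1.0
    return stat, df, p_value


print(bowker_test("AAAACCCC", "ACCCCCCA"))
```

In the toy pair, three sites show A in the first sequence against C in the second and one site shows the reverse, so a single comparison is made and the statistic is (3 - 1)² / (3 + 1) = 1.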
The scoring function (SF) utilized for the symmetry-based alignment trimming employed here is a sum of absolute values of natural logarithms of the p-values of Bowker's tests, each raised to a certain power (15 as the default value). SF can be computed as a mean over the values in the upper or lower triangular part of a square matrix whose rows and columns represent taxa, populated with $|\ln p|^x$ values for Bowker’s tests among these taxa, as shown in equation (2).
$$ SF = \frac{2}{h(h-1)} \sum_{a=1}^{h-1} \sum_{b=a+1}^{h} \left| \ln p_{ab} \right|^{x} \qquad (2) $$
wherein h is the number of taxa in the multiple sequence alignment (msa), and $p_{ab}$ is the p-value for the sequences a and b.
The script that performs symmetry-based trimming (available as Supplementary Software) deletes a site in an alignment, computes an SF value and restores the original alignment. The operation is performed for every alignment site. Then, the site whose removal results in the lowest SF value is deleted irreversibly. The procedure is repeated for each shortened alignment subset until the lowest p-value for a pairwise Bowker’s test in the trimmed dataset exceeds a certain p-value-based threshold.
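The greedy procedure described above can be sketched as follows. This is a brute-force illustration under stated assumptions, not the script distributed as Supplementary Software: the function names are hypothetical, SF is taken as the mean of |ln p|^x over all taxon pairs, and the pairwise test is passed in as a callable (`pair_pvalue`) so that any implementation of Bowker's test can be plugged in.

```python
import math
from itertools import combinations


def scoring_function(pvalues, x=15):
    """Mean of |ln p|**x over all pairwise p-values (assumed form of SF)."""
    # Clamp p to avoid log(0) for extremely small p-values.
    return sum(abs(math.log(max(p, 1e-300))) ** x for p in pvalues) / len(pvalues)


def pairwise_pvalues(seqs, columns, pair_pvalue):
    """p-values for every pair of taxa, restricted to the given columns."""
    sub = ["".join(s[c] for c in columns) for s in seqs]
    return [pair_pvalue(a, b) for a, b in combinations(sub, 2)]


def greedy_trim(seqs, pair_pvalue, threshold, x=15):
    """Return the indices of the alignment columns kept after trimming."""
    kept = list(range(len(seqs[0])))
    while len(kept) > 1 and min(pairwise_pvalues(seqs, kept, pair_pvalue)) <= threshold:
        # Tentatively delete each site in turn and score what remains;
        # the site whose removal gives the lowest SF is deleted for good.
        worst_site = min(
            kept,
            key=lambda c: scoring_function(
                pairwise_pvalues(seqs, [k for k in kept if k != c], pair_pvalue), x
            ),
        )
        kept.remove(worst_site)
    return kept
```

Each trimming step re-evaluates every pairwise test once per remaining site, so the cost grows quickly with alignment length; a production implementation would presumably update the contingency tables incrementally rather than recompute them.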
Exponentiation in formula 2 leads to a sooner recovery of trimmed subsets. The exponentiation disproportionately increases the addend values in formula 2 ($|\ln p_{ab}|^x$) for smaller p-values. For instance, the default addend in formula 2 for a p-value of 0.5 is 0.004 and the addend for a p-value of 0.005 is 72789633288. Thus, when there is a disparity among individual p-values in the data, which is the case when the method is needed, the exponentiation increases the relative contribution of the lowest p-values to the SF value. At each trimming step, the heuristic algorithm identifies a site whose removal is likely to improve the worst (lowest) p-values. The script outputs a trimmed subset when the lowest p-value exceeds the threshold value. The suggested exponentiation, causing preferential improvement of the worst p-values at each site-stripping step, is able to deliver a result when fewer positions are removed. The default exponent value (x = 15) was determined experimentally.
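The two addends quoted above can be reproduced with a few lines of Python. The helper name `addend` is illustrative only.

```python
import math


def addend(p: float, x: int = 15) -> float:
    """Contribution of a single pairwise p-value to SF: |ln p| ** x."""
    return abs(math.log(p)) ** x


print(addend(0.5))    # small contribution from an unproblematic pair
print(addend(0.005))  # very large contribution from a strongly asymmetric pair
```

With the default exponent, the two contributions differ by about 13 orders of magnitude, which is why the lowest p-values dominate the score.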

Functional annotation and phylogenetic reconstruction of the pyruvate oxidation and hydrogen generation enzymes. A subset of taxa in the ‘24-alphamitoCOGs’ dataset with high genome quality was selected based on validation with CheckM (v1.0.11)37 followed by two filtering criteria: (1) for each composite bin, the original bin with the highest genome completeness was selected; and (2) only bins/genomes with completeness > 80% were kept. Proteomes of the selected 65 genomes/bins were searched against the KEGG database by using KofamKOALA ‘-E 0.01’ (v1.2.0)38. Hits to the entries K00161, K00169, K00174, K00179 and K18005 were counted for each taxon (Supplementary Table 7). Proteins annotated as K00161, K00174 and K00179 were each aligned by using MUSCLE (v3.8)39 and then trimmed by using trimAl ‘-gappyout’ (v1.4)40. Maximum-likelihood trees for the trimmed alignments were reconstructed by using IQ-TREE (v1.6.12)17 with the ‘LG+C60+F’ model.
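The two filtering criteria can be expressed compactly. The sketch below assumes that completeness estimates have already been parsed into simple records; the `Bin` class, its field names and the example values are hypothetical and do not reflect the CheckM output format or the actual genomes used.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str            # identifier of the genome or original bin
    composite: str       # composite bin it belongs to ("" for a stand-alone genome)
    completeness: float  # estimated genome completeness, in percent


def select_high_quality(bins, min_completeness=80.0):
    """Apply criteria (1) and (2) from the text to a list of Bin records."""
    best = {}
    for b in bins:
        key = b.composite or b.name
        # (1) keep the most complete original bin of each composite bin
        if key not in best or b.completeness > best[key].completeness:
            best[key] = b
    # (2) keep only bins/genomes with completeness above the cut-off
    return [b for b in best.values() if b.completeness > min_completeness]


# Hypothetical example records.
example = [
    Bin("binA_1", "binA", 75.0),
    Bin("binA_2", "binA", 92.5),
    Bin("genomeB", "", 99.0),
    Bin("binC_1", "binC", 60.0),
]
selected = [b.name for b in select_high_quality(example)]
print(selected)
```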

Supplementary Figures

Supplementary Figure 1 | Bayesian phylogenetic tree of the untreated ‘24-alphamitoCOGs’ dataset. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 2 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Stuart’s test score. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 3 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 5% sites excluded based on the χ²-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 4 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 10% sites excluded based on the χ²-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 5 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 20% sites excluded based on the χ²-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 6 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 40% sites excluded based on the χ²-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 7 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 60% sites excluded based on the χ²-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 8 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 5% sites excluded based on the evolving-rate score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 9 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 10% sites excluded based on the evolving-rate score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 10 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 20% sites excluded based on the evolving-rate score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 11 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 40% sites excluded based on the evolving-rate score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 12 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 60% sites excluded based on the evolving-rate score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 13 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 5% sites excluded based on the ɀ-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 14 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 10% sites excluded based on the ɀ-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 15 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 20% sites excluded based on the ɀ-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 16 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 40% sites excluded based on the ɀ-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 17 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset with 60% sites excluded based on the ɀ-score method. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 18 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Bowker’s test score method with P-value > 0.01. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 19 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Bowker’s test score method with P-value > 0.1. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 20 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Bowker’s test score method with P-value > 0.3. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 21 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Bowker’s test score method with P-value > 0.4. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 22 | Bayesian phylogenetic tree of the ‘24-alphamitoCOGs’ dataset treated by site exclusion based on the Bowker’s test score method with P-value > 0.5. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria.

Supplementary Figure 23 | Relationships between alignment sites, the phylogenetic position of mitochondria and model fit (max heterogeneity across taxa test) based on different datasets, site-exclusion and taxon-selection approaches. Bayesian inference with the CAT+GTR model was conducted. The x axis shows the number of sites in each dataset used for phylogenetic inference. The y axis shows the Z-scores of the ‘max heterogeneity across taxa’ posterior predictive test. Numbers beside markers show node support values (posterior probability support values) of the consensus trees. Values ⩾95 are in bold. Strings in parentheses show the closest relatives of mitochondria in the tree. R, Rickettsiales. T, Tistrella mobilis. F2, FEMAG II. AII, Alpha II. AIII, Alpha III. GT, Geminicoccus roseus and T. mobilis. MA9, MarineAlpha9. mito-in, mitochondria branch within Alphaproteobacteria. mito-out, mitochondria branch outside Alphaproteobacteria. mito-in AlphaIIb, mitochondria branch within the AlphaIIb clade of Alphaproteobacteria. a, site-exclusion methods applied to the ‘24-alphamitoCOGs’ dataset in Martijn et al. (2018). Trees are shown in Supplementary Figures 1–22. b, χ²-score based site-exclusion and taxon-reduction methods applied to the subsets of the ‘Modified24’ and the ‘Modified18’ datasets, respectively, containing only the backbone, Rickettsiales and mitochondrial sequences. Trees are shown in Supplementary Figures 35 and 37–42. c, χ²-score based site-exclusion and taxon-reduction methods applied to the subsets of the ‘Modified24’ and the ‘Modified18’ datasets, respectively, containing only the backbone, FEMAGs and mitochondrial sequences. Trees are shown in Supplementary Figures 36 and 42–48. d, χ²-score based site exclusion applied to the subsets of the ‘Modified24-AlphaII’, the ‘Modified24-AlphaII-MoreTaxa’ and the ‘Modified18-AlphaII’ datasets, respectively. Trees are shown in Fig. 4 and Supplementary Figures 50–58.

Supplementary Figure 24 | GC content and amino acid compositional heterogeneity among alphaproteobacterial lineages and mitochondria. Dots represent taxa. Lineages are colored according to Fig. 3, except that empty dots represent Beta-, Gammaproteobacteria and Magnetococcales. a, taxa in the ‘Modified24’ dataset. b, taxa in the ‘Modified18’ dataset.
33
385
386
Supplementary Figure 25 | Bayesian phylogenetic tree of the slowly-evolving backbone taxa of Alphaproteobacteria. The tree is rooted to the outgroup. Posterior 387 probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset. 388 389
Supplementary Figure 26 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria and Holosporales. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 27 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria and Pelagibacterales. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 28 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria and alpha proteobacterium HIMB59. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 29 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria and Rickettsiales. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 30 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria and FEMAGs. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 31 | Bayesian phylogenetic tree of the slowly-evolving backbone taxa of Alphaproteobacteria and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 32 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Holosporales and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 33 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Pelagibacterales and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 34 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, alpha proteobacterium HIMB59 and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 35 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 36 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 37 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after exclusion of 5% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 38 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after exclusion of 10% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 39 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after exclusion of 20% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 40 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after exclusion of 40% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 41 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after exclusion of 60% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 42 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, Rickettsiales and mitochondria after taxon reduction. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 43 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after exclusion of 5% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 44 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after exclusion of 10% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 45 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after exclusion of 20% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 46 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after exclusion of 40% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 47 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after exclusion of 60% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 48 | Bayesian phylogenetic tree of the backbone taxa of Alphaproteobacteria, FEMAGs and mitochondria after taxon reduction. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup. Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24’ dataset. b, based on the ‘Modified18’ dataset.
Supplementary Figure 49 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 50 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 51 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 5% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 52 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 10% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 53 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 20% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 54 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 40% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 55 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 60% of sites. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, based on the ‘Modified24-AlphaII’ dataset. b, based on the ‘Modified18-AlphaII’ dataset.
Supplementary Figure 56 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 5–10% of sites, based on the ‘Modified24-AlphaII-MoreTaxa’ dataset. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, 5% of sites excluded. b, 10% of sites excluded.
Supplementary Figure 57 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 20–40% of sites, based on the ‘Modified24-AlphaII-MoreTaxa’ dataset. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria. a, 20% of sites excluded. b, 40% of sites excluded.
Supplementary Figure 58 | Bayesian phylogenetic tree of the Alpha II subclade of Alphaproteobacteria and mitochondria after exclusion of 60% of sites, based on the ‘Modified24-AlphaII-MoreTaxa’ dataset. The χ²-score method for site exclusion was applied. The tree is rooted to the outgroup (Alpha IIa). Posterior probability support values at nodes are shown. mito, mitochondria.
Supplementary Figure 59 | Maximum-likelihood phylogenetic tree of pyruvate dehydrogenase E1 component alpha subunit (PdhA) found in the taxa of the ‘24-alphamitoCOGs’ dataset. The tree is rooted to the midpoint. Node values show the ultra-fast bootstraps based on 1,000 iterations. Taxa are assigned to subclades named according to Supplementary Table 3.
Supplementary Figure 60 | Maximum-likelihood phylogenetic tree of pyruvate ferredoxin oxidoreductase found in the taxa of the ‘24-alphamitoCOGs’ dataset. The trees are rooted to the midpoints. Node values show the ultra-fast bootstraps based on 1,000 iterations. Taxa are assigned to subclades named according to Supplementary Table 3. a, 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha (OorA). b, indolepyruvate ferredoxin oxidoreductase subunit alpha (IorA).
Supplementary References

1. Ettema, T. J. & Andersson, S. G. The alpha-proteobacteria: the Darwin finches of the bacterial world. Biol Lett 5, 429-432 (2009).
2. Betts, H. C., Puttick, M. N., Clark, J. W., Williams, T. A., et al. Integrated genomic and fossil evidence illuminates life's early evolution and eukaryote origin. Nat Ecol Evol (2018).
3. Karnkowska, A., Vacek, V., Zubáčová, Z., Treitli, S. C., et al. A eukaryote without a mitochondrial organelle. Curr Biol 26, 1274-1284 (2016).
4. Brindefalk, B., Ettema, T. J., Viklund, J., Thollesson, M. & Andersson, S. G. A phylometagenomic exploration of oceanic alphaproteobacteria reveals mitochondrial relatives unrelated to the SAR11 clade. PLoS One 6, e24457 (2011).
5. Roger, A. J., Muñoz-Gómez, S. A. & Kamikawa, R. The origin and diversification of mitochondria. Curr Biol 27, R1177-R1192 (2017).
6. Philippe, H., Brinkmann, H., Lavrov, D. V., Littlewood, D. T., et al. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol 9, e1000602 (2011).
7. Derelle, R. & Lang, B. F. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol Biol Evol 29, 1277-1289 (2012).
8. Adams, K. L., Song, K., Roessler, P. G., Nugent, J. M., et al. Intracellular gene transfer in action: dual transcription and multiple silencings of nuclear and mitochondrial cox2 genes in legumes. Proc Natl Acad Sci U S A 96, 13863-13868 (1999).
9. Hansmann, S. & Martin, W. Phylogeny of 33 ribosomal and six other proteins encoded in an ancient gene cluster that is conserved across prokaryotic genomes: influence of excluding poorly alignable sites from analysis. Int J Syst Evol Microbiol 50 Pt 4, 1655-1663 (2000).
10. Goremykin, V. V., Hansmann, S. & Martin, W. F. Evolutionary analysis of 58 proteins encoded in six completely sequenced chloroplast genomes: revised molecular estimates of two seed plant divergence times. Plant Systematics and Evolution 206, 337-351 (1997).
11. Shepherd, D. A. & Klaere, S. How well does your phylogenetic model fit your data? Syst Biol 68, 157-167 (2019).
12. Esser, C., Ahmadinejad, N., Wiegand, C., Rotte, C., et al. A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes. Mol Biol Evol 21, 1643-1660 (2004).
13. Fitzpatrick, D. A., Creevey, C. J. & McInerney, J. O. Genome phylogenies indicate a meaningful alpha-proteobacterial phylogeny and support a grouping of the mitochondria with the Rickettsiales. Mol Biol Evol 23, 74-85 (2006).
14. Viklund, J., Ettema, T. J. & Andersson, S. G. Independent genome reduction and phylogenetic reclassification of the oceanic SAR11 clade. Mol Biol Evol 29, 599-615 (2012).
15. Muñoz-Gómez, S. A., Hess, S., Burger, G., Lang, B. F., et al. An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins. Elife 8, (2019).
16. Martijn, J., Vosseberg, J., Guy, L., Offre, P. & Ettema, T. J. G. Deep mitochondrial origin outside the sampled alphaproteobacteria. Nature 557, 101-105 (2018).
17. Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol 32, 268-274 (2015).
18. Woese, C. R. in The Evolution of Prokaryotes, eds. Schleifer, K.-H & Stackebrandt, E. (Academic, London). (1985).
19. Yang, D., Oyaizu, Y., Oyaizu, H., Olsen, G. J. & Woese, C. R. Mitochondrial origins. Proc Natl Acad Sci U S A 82, 4443-4447 (1985).
20. Palmer, J. D. & Herbon, L. A. Plant mitochondrial DNA evolves rapidly in structure, but slowly in sequence. J Mol Evol 28, 87-97 (1988).
21. Christensen, A. C. Plant mitochondrial genome evolution can be explained by DNA repair mechanisms. Genome Biol Evol 5, 1079-1086 (2013).
22. Williams, K. P., Sobral, B. W. & Dickerman, A. W. A robust species tree for the alphaproteobacteria. J Bacteriol 189, 4578-4586 (2007).
23. Szokoli, F., Castelli, M., Sabaneyeva, E., Schrallhammer, M., et al. Disentangling the taxonomy of Rickettsiales and description of two novel symbionts ("Candidatus Bealeia paramacronuclearis" and "Candidatus Fokinia cryptica") sharing the cytoplasm of the ciliate protist Paramecium biaurelia. Appl Environ Microbiol 82, 7236-7247 (2016).
24. Vannini, C., Ferrantini, F., Schleifer, K. H., Ludwig, W., et al. "Candidatus anadelfobacter veles" and "Candidatus cyrtobacter comes," two new rickettsiales species hosted by the protist ciliate Euplotes harpa (Ciliophora, Spirotrichea). Appl Environ Microbiol 76, 4047-4054 (2010).
25. Martijn, J., Schulz, F., Zaremba-Niedzwiedzka, K., Viklund, J., et al. Single-cell genomics of a rare environmental alphaproteobacterium provides unique insights into Rickettsiaceae evolution. ISME J 9, 2373-2385 (2015).
26. Wang, Z. & Wu, M. An integrated phylogenomic approach toward pinpointing the origin of mitochondria. Sci Rep 5, 7949 (2015).
27. Georgiades, K., Madoui, M. A., Le, P., Robert, C. & Raoult, D. Phylogenomic analysis of Odyssella thessalonicensis fortifies the common origin of Rickettsiales, Pelagibacter ubique and Reclimonas americana mitochondrion. PLoS One 6, e24857 (2011).
28. Ferla, M. P., Thrash, J. C., Giovannoni, S. J. & Patrick, W. M. New rRNA gene-based phylogenies of the Alphaproteobacteria provide perspective on major groups, mitochondrial ancestry and phylogenetic instability. PLoS One 8, e83383 (2013).
29. Imachi, H., Nobu, M. K., Nakahara, N., Morono, Y., et al. Isolation of an archaeon at the prokaryote-eukaryote interface. Nature 577, 519-525 (2020).
30. Martin, W. & Müller, M. The hydrogen hypothesis for the first eukaryote. Nature 392, 37-41 (1998).
31. Gabaldón, T. & Huynen, M. A. Reconstruction of the proto-mitochondrial metabolism. Science 301, 609 (2003).
32. Wang, Z. & Wu, M. Phylogenomic reconstruction indicates mitochondrial ancestor was an energy parasite. PLoS One 9, e110685 (2014).
33. Martin, W. Mosaic bacterial chromosomes: a challenge en route to a tree of genomes. Bioessays 21, 99-104 (1999).
34. Ku, C., Nelson-Sathi, S., Roettger, M., Sousa, F. L., et al. Endosymbiotic origin and differential loss of eukaryotic genes. Nature 524, 427-432 (2015).
35. Jermiin, L. S., Jayaswal, V., Ababneh, F. M. & Robinson, J. Identifying optimal models of evolution. Methods Mol Biol 1525, 379-420 (2017).
36. Bowker, A. H. A test for symmetry in contingency tables. J Am Stat Assoc 43, 572-574 (1948).
37. Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P. & Tyson, G. W. CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25, 1043-1055 (2015).
38. Aramaki, T., Blanc-Mathieu, R., Endo, H., Ohkubo, K., et al. KofamKOALA: KEGG ortholog assignment based on profile HMM and adaptive score threshold. Bioinformatics (2019). DOI: 10.1093/bioinformatics/btz859
39. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792-1797 (2004).
40. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972-1973 (2009).