Report
Evolutionary History of the Hymenoptera
Highlights
• The most comprehensive dataset ever compiled for inferring the phylogeny of Hymenoptera
• A major radiation of primarily ectophytic sawflies (Eusymphyta) is hypothesized
• A major radiation of parasitoid wasps (Parasitoida) is identified
• The phylogenetic origins of wasp-waisted wasps, stinging wasps, and bees are resolved
Peters et al., 2017, Current Biology 27, 1013–1018
April 3, 2017 © 2017 Elsevier Ltd.
http://dx.doi.org/10.1016/j.cub.2017.01.027
Authors
Ralph S. Peters, Lars Krogmann,
Christoph Mayer, ..., Jes Rust,
Bernhard Misof, Oliver Niehuis
[email protected] (R.S.P.), [email protected] (O.N.)
In Brief
Peters et al. infer a time-calibrated and
statistically solid phylogenetic tree of the
mega-diverse insect order Hymenoptera
(sawflies, wasps, ants, and bees) from the
analysis of phylogenomic data. This
sheds new light on the early history of this
intriguing group, as well as on the origins
and radiation of parasitoids, stinging
wasps, and bees.
Current Biology
Report
Evolutionary History of the Hymenoptera
Ralph S. Peters,1,24,25,* Lars Krogmann,2 Christoph Mayer,3 Alexander Donath,3 Simon Gunkel,4 Karen Meusemann,3,5,6
Alexey Kozlov,7 Lars Podsiadlowski,8 Malte Petersen,3 Robert Lanfear,9,10 Patricia A. Diez,11 John Heraty,12
Karl M. Kjer,13 Seraina Klopfstein,14 Rudolf Meier,15 Carlo Polidori,16 Thomas Schmitt,17 Shanlin Liu,18,19,20 Xin Zhou,21,22
Torsten Wappler,4 Jes Rust,4 Bernhard Misof,3 and Oliver Niehuis3,5,23,24,*
1Center of Taxonomy and Evolutionary Research, Arthropoda Department, Zoologisches Forschungsmuseum Alexander Koenig, 53113 Bonn, Germany
2Entomologie, Staatliches Museum für Naturkunde Stuttgart, 70191 Stuttgart, Germany
3Center for Molecular Biodiversity Research, Zoologisches Forschungsmuseum Alexander Koenig, 53113 Bonn, Germany
4Steinmann Institut für Geologie, Mineralogie und Paläontologie, 53115 Bonn, Germany
5Department of Evolutionary Biology and Ecology, Institute for Biology I (Zoology), University of Freiburg, 79104 Freiburg (Brsg.), Germany
6Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Acton, ACT 2601, Australia
7Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg, Germany
8Institute of Evolutionary Biology and Ecology, University of Bonn, 53121 Bonn, Germany
9Ecology, Evolution and Genetics, Research School of Biology, Australian National University, Canberra, ACT 2601, Australia
10School of Biological Sciences, Macquarie University, Sydney, NSW 2109, Australia
11Centro de Investigaciones y Transferencia de Catamarca, CITCA-CONICET/UNCA, 4700 Catamarca, Argentina
12Department of Entomology, University of California, Riverside, Riverside, CA 92521, USA
13Department of Biological Sciences, Rutgers University, Newark, NJ 07102, USA
14Naturhistorisches Museum der Burgergemeinde Bern, 3005 Bern, Switzerland
15Department of Biological Sciences and Lee Kong Chian Natural History Museum, National University of Singapore, Singapore 117543, Singapore
16Instituto de Ciencias Ambientales (ICAM), Universidad de Castilla-La Mancha, 45071 Toledo, Spain
17Department of Animal Ecology and Tropical Biology, University of Würzburg, 97074 Würzburg, Germany
18China National GeneBank-Shenzhen, BGI-Shenzhen, Shenzhen, Guangdong Province, 518083, People’s Republic of China
19BGI-Shenzhen, Shenzhen, Guangdong Province, 518083, People’s Republic of China
20Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Øster Voldgade 5–7, 1350 Copenhagen, Denmark
21Beijing Advanced Innovation Center for Food Nutrition and Human Health, China Agricultural University, Beijing 100193, People’s Republic of China
22Department of Entomology, China Agricultural University, Beijing 100193, People’s Republic of China
23School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
24These authors contributed equally
25Lead Contact
*Correspondence: [email protected] (R.S.P.), [email protected] (O.N.)
http://dx.doi.org/10.1016/j.cub.2017.01.027
SUMMARY
Hymenoptera (sawflies, wasps, ants, and bees) are one of four mega-diverse insect orders,
comprising more than 153,000 described and possibly up to one million undescribed extant
species [1, 2]. As parasitoids, predators, and pollinators, Hymenoptera play a fundamental
role in virtually all terrestrial ecosystems and are of substantial economic importance
[1, 3]. To understand the diversification and key evolutionary transitions of Hymenoptera,
most notably from phytophagy to parasitoidism and predation (and vice versa) and from
solitary to eusocial life, we inferred the phylogeny and divergence times of all major
lineages of Hymenoptera by analyzing 3,256 protein-coding genes in 173 insect species.
Our analyses suggest that extant Hymenoptera started to diversify around 281 million years
ago (mya). The primarily ectophytophagous sawflies are found to be monophyletic. The
species-rich lineages of parasitoid wasps constitute a monophyletic group as well. The
little-known, species-poor Trigonaloidea are identified as the sister group of the stinging
wasps (Aculeata). Finally, we located the evolutionary root of bees within the apoid wasp
family “Crabronidae.” Our results reveal that the extant sawfly diversity is largely the
result of a previously unrecognized major radiation of phytophagous Hymenoptera that did
not lead to wood-dwelling and parasitoidism. They also confirm that all primarily
parasitoid wasps are descendants of a single endophytic parasitoid ancestor that lived
around 247 mya. Our findings provide the basis for a natural classification of Hymenoptera
and allow for future comparative analyses of Hymenoptera, including their genomes,
morphology, venoms, and parasitoid and eusocial life styles.
RESULTS AND DISCUSSION
We sequenced whole-body transcriptomes of 167 species of
Hymenoptera and selected outgroups and supplemented our
dataset with sequenced and annotated genomes of five hyme-
nopterans and a beetle (for details, see Supplemental Experi-
mental Procedures and Data S1A–S1D). Our study includes 54
families of Hymenoptera, representing all major superfamilies.
The phylogenetic inferences are based on the analysis of 1.5
million amino acid and 3.0 million nucleotide positions, respec-
tively, derived from 3,256 single-copy protein-coding genes
(Data S1E) and inferred by using a combination of domain-,
gene-, and codon position-based data partition schemes to
improve the fitting of the applied substitution models. Consid-
ering the taxonomic and molecular sampling, this is the most
comprehensive dataset ever generated for investigating phylo-
genetic relationships within Hymenoptera or any other insect
group. The dataset was furthermore used to estimate divergence
times with an independent-rates as well as with a correlated-
rates molecular clock approach (Data S1H) and a validated
set of 14 fossils (Data S1F).
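As a quick consistency check on the figures quoted above, the following sketch tallies the taxon sampling and the alignment lengths. The counts of 134, 29, and 4 transcriptomes and the 1,505,514 amino acid sites are taken from the Experimental Procedures and the legend of Figure 1. The assumption that each amino acid site contributes exactly two nucleotide positions (first and second codon positions, third excluded) is ours and only approximates how the nucleotide supermatrix was built.

```python
# Taxon sampling as reported in the text (transcriptomes plus official gene sets).
new_transcriptomes = 134        # Hymenoptera sequenced for this study
published_hymenoptera = 29      # previously published transcriptomes
published_outgroups = 4         # Neuropteroidea transcriptomes
genomes_hymenoptera = 5         # official gene sets of Hymenoptera
genomes_beetle = 1              # Tribolium castaneum

transcriptomes = new_transcriptomes + published_hymenoptera + published_outgroups
species_total = transcriptomes + genomes_hymenoptera + genomes_beetle

# Alignment length. Assumption (not stated in the paper): every retained amino
# acid site maps to one codon, of which only positions 1 and 2 are kept.
amino_acid_sites = 1_505_514
nucleotide_sites = 2 * amino_acid_sites

print(transcriptomes)    # 167
print(species_total)     # 173
print(nucleotide_sites)  # 3011028, i.e. about 3.0 million
```

Under these assumptions the totals match the 167 transcriptomes, 173 species, and roughly 3.0 million nucleotide positions reported in the text.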
The inferred phylogenetic relationships and divergence time
estimates were used to assess where in the phylogeny of Hyme-
noptera, when in their geological history, and how often major
evolutionary transitions took place. Specifically, we studied the
switch from feeding on plants to feeding on an insect host (para-
sitoidism), the formation of a wasp waist, the evolution of a
venomous stinger to subdue mobile hosts, the evolution of eusociality,
and the switch from hunting prey to collecting pollen.
These evolutionary transitions are partially reflected by the historic
classification of Hymenoptera: sawflies (“Symphyta”) are
those Hymenoptera that lack the wasp waist that characterizes
all remaining Hymenoptera (Apocrita), “Parasitica” encompasses
the primarily parasitoid Apocrita that lack a stinger, and
Aculeata comprises the stinging wasps, ants, and bees (Anthophila)
[1]. Yet, how many major lineages each of these groups
encompasses has been controversial for decades [4–11].
The results of our phylogenomic study received strong sup-
port in all analyses, unless stated otherwise, and alter previous
ideas regarding the evolutionary history of Hymenoptera (Fig-
ure 1B; for full results and detailed experimental procedures,
see Figure S1, Supplemental Experimental Procedures, and
additional figures deposited at Mendeley Data, http://dx.doi.
org/10.17632/s5j2f62z3d.2). According to our analyses, extant
Hymenoptera started to diversify between the Carboniferous
and the Triassic (95% confidence interval [CI]: 329–239 million
years ago [mya]; mean: 281 mya; node 1 [n.1] in Figure 1B),
with the oldest currently known Hymenoptera fossils being
from the Triassic, ~224 million years old [8]. Previous studies
suggested this divergence to have occurred between the sawfly
lineage Xyeloidea and the remaining Hymenoptera [5, 7–11],
whereas our analysis identified a much more inclusive clade of
sawflies (Eusymphyta; n.2) that also contains Pamphilioidea
and Tenthredinoidea as closest relatives of all remaining Hyme-
noptera (Unicalcarida). These superfamilies had been thought to
form a paraphyletic grade [5, 7, 9, 11]. Instead, they represent an
unexpected and previously unrecognized major radiation of pri-
marily ectophytophagous insects that comprises more than
7,000 described species [1]. We estimate the first diversification
of the extant eusymphytan lineages to have occurred 276–157
mya (mean 212 mya). Note that Eusymphyta were corroborated
as the sister group of all remaining Hymenoptera when addition-
ally scrutinizing the analyzed molecular data for conflicting
phylogenetic signal (Supplemental Experimental Procedures).
Given the novelty and importance of our finding, we anticipate
that it will significantly influence future research on Hymenoptera
relationships, and we encourage researchers to further assess
this particular phylogenetic hypothesis in future studies, for
example by extending the taxon sampling within Eusymphyta
and the outgroup.
A clade Eusymphyta representing the extant sister lineage of
all remaining Hymenoptera (Unicalcarida) has profound conse-
quences for inferring ground-plan characters of Hymenoptera.
For example, Hymenoptera were previously thought to have
been ancestrally ectophytophagous, based on the assumption
that eusymphytans form a paraphyletic assemblage. Consid-
ering that the sister group of Hymenoptera (Aparaglossata)
was ancestrally likely predacious [12], the inferred relationship
between Eusymphyta and Unicalcarida implies that the most
recent common ancestor of Hymenoptera could have been
ecto- or endophytophagous. A sister group relationship between
Eusymphyta and Unicalcarida furthermore implies that the
remarkable ability of male Hymenoptera to restore diploidy in
their muscle cells was already present in the last common
ancestor of all Hymenoptera (with a secondary loss in Xyelidae),
or that this feature evolved at least twice (in Unicalcarida and
Tenthredinoidea) [13]. Finally, the unexpected finding that the
turnip sawfly, Athalia rosae (Tenthredinoidea), whose genome
has recently been sequenced by the i5K initiative [14], is a repre-
sentative of the sister lineage of all remaining Hymenoptera will
improve our understanding of the genetic composition of the
most recent common ancestor of Hymenoptera: genomic fea-
tures shared between the turnip sawfly and species of Unicalcar-
ida with sequenced genomes (e.g., Nasonia parasitoid wasps,
ants, bees) were likely inherited from their common ancestor.
In agreement with earlier studies [9, 10], we found a single
origin of the endophytic sawfly lineages (i.e., Cephoidea, Orus-
soidea, Siricoidea, and Xiphydrioidea; n.3), which form a para-
phyletic grade, in which Orussoidea (parasitoid woodwasps)
represent the closest relatives of Apocrita (n.4). Morphological
data have suggested a sister group relationship of Orussoidea
and Apocrita (Vespina) [6, 15], but results from analyzing molec-
ular data have been inconsistent [7, 9]. Our analyses provide
strong support for the monophyly of Vespina and of Apocrita
(n.5) and imply that the bulk of primarily parasitoid wasps are de-
scendants of a single endophytic parasitoid ancestor that lived in
the Permian or in the Triassic (CI: 289–211 mya; mean: 247 mya).
Contrary to earlier hypotheses of sawfly relationships (see [10]),
we identified Cephoidea, and not Siricoidea and/or Xiphydrioi-
dea, as the closest extant relatives of Vespina (n.6), a result
only recently suggested [7].
The evolution of the wasp waist, a constriction between the
first and the second abdominal segment greatly improving
the maneuverability of the abdomen’s rear section, including
the ovipositor, was a major innovation in the evolution of Hymenoptera
that undoubtedly contributed to the rapid diversification
of Apocrita (n.5) [6]. Our analysis is the first to persuasively
demonstrate that the most diverse parasitoid wasp lineages
(i.e., Ceraphronoidea, Ichneumonoidea, and Proctotrupomorpha)
constitute a natural group (Parasitoida; n.7) whose astonishing
radiation was likely triggered by further optimization of
the parasitoid lifestyle and related traits (e.g., endoparasitoidism,
miniaturization), which allowed for successfully attacking a
variety of new hosts. We estimate the beginning of the group’s
radiation at 266–195 mya (mean: 228 mya), only a few million
years after Parasitoida separated from the remaining Apocrita
(CI: 276–203 mya; mean: 236 mya). The early radiation of Parasitoida
thus falls within a time period when the parasitoids’ major
host lineages (e.g., Hemiptera, Holometabola) also started to
diversify [16].
[Figure 1 image not reproduced. Panel A shows Tenthredo livida L. (Tenthredinoidea), Orussus abietinus (Scop.) (Orussoidea), Stephanus serrator (Fabr.) (Stephanoidea), Torymus bedeguaris (L.) (Chalcidoidea), Vespula germanica (Fabr.) (Vespoidea), Camponotus maculatus (Fabr.) (Formicoidea), Ampulex compressa (Fabr.) (Apoidea), and Apis mellifera L. (Apoidea). Panel B shows the dated tree on a time axis from 300 to 0 mya (Permian to Quaternary), with a legend for bootstrap support (0–50%, 51–75%, 76–90%, 91–99%, 100%) and the key events parasitoidism, wasp waist, stinger, eusociality, and pollen collecting marked on the tree.]
Figure 1. Evolutionary History of the Hymenoptera
(A) Representatives of sawflies, wasps, ants, and bees. Scale bars represent 5 mm.
(B) Phylogenetic relationships and divergence time estimates of Hymenoptera. Key evolutionary events are indicated at the respective clades (note that only the
major eusocial lineages are considered). The tree was inferred under the maximum-likelihood optimality criterion, analyzing 1,505,514 amino acid sites and
applying a combination of protein domain- and gene-specific substitution models. Divergence times were estimated with an independent-rates molecular clock
approach and considering 14 validated fossils. Triangular branches cover multiple species (number of species in parentheses) whose relationships are shown in
detail in Figure S1. Nodes with circled numbers are referred to in the main text.
We identified the enigmatic Trigonaloidea as the closest
extant relatives of Aculeata with strong node support (n.8), a hy-
pothesis only recently put forth [7, 9]. Evanioidea, which had also
been discussed as a possible sister group of Aculeata [5, 10, 17,
18], cluster with Stephanoidea (n.9). Node support for this rela-
tionship is low, however, and it needs to be investigated further
in future studies that include additional types of characters and
samples of Megalyroidea, a lineage that we were unable to
sequence. Note that in contrast to Aculeata, the Evanioidea, Ste-
phanoidea, and Trigonaloidea have all remained species-poor.
The identification of the closest relatives of Aculeata will be
important for better understanding which traits (e.g., venoms)
fostered the diversification of the stinging wasps.
Our analysis sheds new light on the phylogeny of Aculeata
(n.10), whose early diversification occurred 224–160 mya
(mean: 190 mya). Chrysidoids are confirmed as the sister group
of all remaining Aculeata [19]. We corroborate the artificial nature
of the former superfamily “Vespoidea” (i.e., all Aculeata except
Apoidea and Chrysidoidea) [5], which comprises four major
lineages that are paraphyletic with respect to Apoidea [20]. The
potter, honey, and social wasps (Vespoidea sensu Pilgrim et al.
[20]: Vespidae; n.11) were identified as the sister lineage of all
remaining non-chrysidoid Aculeata. However, the phylogenetic
position of the species-poor Rhopalosomatidae (Vespoidea
sensu Pilgrim et al. [20]), an aculeate wasp family that we were
unable to sequence and possible sister lineage of Vespidae, re-
mains controversial [9, 10, 20]. The inferred phylogenetic rela-
tionships within Vespidae suggest two independent origins of
eusociality, a previously fiercely contested hypothesis [21, 22].
In agreement with an earlier phylogenomic study [23], we in-
ferred ants (Formicoidea) as being the closest extant relatives
of Apoidea (n.12) in all of our analyses, except when applying a
Bayesian approach, which suggested ants plus scoliid wasps
(Scolioidea, possibly including also the family Bradynobaenidae
[20], which we were unable to sequence) as being sister to Apo-
idea (figure deposited at Mendeley Data, http://dx.doi.org/10.
17632/s5j2f62z3d.2). We estimate the last common ancestor
of ants and Apoidea to have lived in the Jurassic or the Creta-
ceous (CI: 192–136 mya; mean: 162 mya).
We located the phylogenetic origin of bees (Anthophila) within
the apoid wasp family “Crabronidae” (n.13), which our study
shows to be an artificial construct comprising five major line-
ages. The crabronid wasp lineage in our study most closely
related to bees is the species-poor tribe Psenini. This result sub-
stantiates the idea that the switch from a predatory to a herbi-
vorous lifestyle was a key to the tremendous diversification of
bees [24]. We estimate the origin of bees to have been in the
Cretaceous (CIs: 147–93 mya; means: 124 and 111 mya), a result
that is consistent with a close temporal link between the diversi-
fications of bees and angiosperms [24]. Melittid bees were iden-
tified as the sister lineage of all remaining Anthophila (n.14),
which implies that short-tongued bees do not represent a natural
group. In contrast, we confirmed long-tongued bees (i.e., Apidae
and Megachilidae) to constitute a natural entity (n.15) [24]. We
also found the eusocial apid bee lineages to be monophyletic,
corroborating the hypothesis that eusociality has evolved
once, not twice, in corbiculate (pollen basket) bees (n.16) [25].
Our study confirms the power of phylogenomic approaches
for deciphering difficult-to-resolve arthropod phylogenetic rela-
tionships [12, 16, 26, 27] by yielding well-supported answers to
some of the most pressing questions regarding the evolutionary
history of the sawflies, wasps, ants, and bees. We provide strong
evidence for understanding the phylogenetic relationships
among all major lineages of Hymenoptera, and we were able
to date the individual divergence events, both paramount for de-
ciphering the tempo and mode of diversification of ecologically,
economically, sociobiologically, and/or pharmaceutically rele-
vant traits of interest (e.g., gene repertoires, haplodiploidy and
sex determination, eusociality, chemosensation, and venoms).
Finally, our study offers the basis for establishing a natural clas-
sification of the insect order Hymenoptera.
EXPERIMENTAL PROCEDURES
We sequenced the transcriptomes of 134 species of Hymenoptera using Illu-
mina HiSeq 2000 sequencing technology (Data S1A–S1C). We complemented
our dataset by including previously published transcriptomes of 29 Hymeno-
ptera and four Neuropteroidea [16, 28]. Finally, we considered the official
gene sets of five Hymenoptera and the flour beetle Tribolium castaneum
(Data S1D). All paired-end reads were assembled with SOAPdenovo-Trans-
31kmer (version 1.01) [29], the assembled transcripts were filtered for possible
contaminants, and the raw reads and filtered assemblies were submitted to
the NCBI SRA and TSA archives. We searched the assemblies with the soft-
ware Orthograph (version beta4) [28] for transcripts of 3,260 protein-coding
genes that the OrthoDB v7 database [30] suggested to be single-copy in
Hymenoptera and Neuropteroidea (outgroup) by applying the best reciprocal
hit criterion. Orthologous transcripts were aligned with MAFFT (version
7.017) [31] at the translational (amino acid) level. All multiple sequence align-
ments (MSAs) were quality assessed and, if necessary, improved and masked
using the procedure outlined by Misof et al. [16]. The resulting MSAs were
concatenated to a supermatrix that we simultaneously partitioned based
on a combination of Pfam protein domains and genes [16]. The phylogenetic
information content of each partition was assessed with MARE (version
0.1.2-rc) [32], and all uninformative partitions were removed. We subsequently
used PartitionFinder (developer versions 2.0.0-pre2, 2.0.0-pre9, and 2.0.0-
pre10) [33] to simultaneously infer a partition scheme and proper amino acid
substitution models for analyzing each partition with the rcluster algorithm.
We applied the same partition scheme when analyzing the corresponding
supermatrix at the transcriptional (nucleotide) level, except that we modeled
the first and second codon position of each partition separately (note that
we excluded the hypervariable third codon position from our analyses). Phylo-
genetic trees were reconstructed with ExaML (versions 3.0.15 and 3.0.17) [34],
conducting 50 independent tree searches per supermatrix. Node support was
inferred with the bootstrap method [35]. Decisive datasets were used for
testing the possible impact of missing data at the partition level on the inferred
phylogenetic tree [36], and four-cluster likelihood mapping was used for
assessing the phylogenetic signal for alternative phylogenetic relationships
[37]. Permutation tests allowed assessing the impact of heterogeneous amino
acid sequence composition, non-stationarity of substitution processes, and
non-random distribution of missing data on the inferred phylogenetic tree
[16]. We additionally conducted phylogenetic inferences in a Bayesian frame-
work, using ExaBayes [38] with its default settings, enabling automatic substi-
tution model detection and applying the same data partitioning scheme that
we used in analyses under the maximum-likelihood optimality criterion. We
analyzed three independent runs with four coupled Markov chain Monte Carlo
chains and 200,000 generations each. The consense tool (part of the
ExaBayes software package) was used to obtain a consensus tree based on
the extended majority rule method (MRE), discarding the first 25% of the
sampled topologies as burn-in. Divergence times were calibrated using 14 fossils
(Data S1F), selected following best-practice recommendations [39] and
representing extant lineages distributed across the entire Hymenoptera Tree
of Life. Divergence times were estimated with mcmctree in conjunction with
codeml (both part of the PAML software package, version 4.9) [40]. We
analyzed a subset of the amino acid and of the nucleotide supermatrix, both
comprising only sites that had amino acids or nucleotides present in at least
95% of the species, both with an independent-rates model and with a corre-
lated-rates model (Figure 1B; Data S1H) and sampling parameters previously
assessed for convergence of results.
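Two of the steps described above lend themselves to a compact illustration: the best reciprocal hit criterion used for orthology assignment, and the restriction of the dating matrices to sites with data in at least 95% of the species. The sketch below is a toy illustration under stated assumptions, not the code of Orthograph or of the analysis pipeline used in this study. The function names, the score dictionaries, and the treatment of '-', '?', and 'X' as missing data are hypothetical.

```python
from typing import Dict, List, Tuple

MISSING = set("-?X")  # assumption: symbols treated as missing data


def best_reciprocal_hits(
    forward: Dict[str, Dict[str, float]],
    reverse: Dict[str, Dict[str, float]],
) -> List[Tuple[str, str]]:
    """Return (gene, transcript) pairs that are each other's best-scoring hit.

    forward[gene][transcript] and reverse[transcript][gene] hold similarity
    scores (higher is better) from the two search directions.
    """
    pairs = []
    for gene, hits in forward.items():
        if not hits:
            continue
        best_transcript = max(hits, key=hits.get)
        back_hits = reverse.get(best_transcript, {})
        # Accept the pair only if the reverse search points back to the gene.
        if back_hits and max(back_hits, key=back_hits.get) == gene:
            pairs.append((gene, best_transcript))
    return pairs


def filter_sites_by_occupancy(
    alignment: Dict[str, str], min_fraction: float = 0.95
) -> Dict[str, str]:
    """Keep only alignment columns with data in at least min_fraction of taxa."""
    taxa = list(alignment)
    if not taxa:
        return {}
    length = len(alignment[taxa[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    keep = [
        col
        for col in range(length)
        if sum(alignment[t][col] not in MISSING for t in taxa)
        >= min_fraction * len(taxa)
    ]
    return {t: "".join(alignment[t][c] for c in keep) for t in taxa}


# Toy data: gene g2 also hits t1 best, but t1 points back to g1 only.
forward = {"g1": {"t1": 90.0, "t2": 50.0}, "g2": {"t1": 80.0, "t2": 40.0}}
reverse = {"t1": {"g1": 95.0, "g2": 70.0}, "t2": {"g2": 60.0}}
orthologs = best_reciprocal_hits(forward, reverse)

# Toy alignment of four taxa; columns 2 and 3 have data in only half of them.
toy = {"A": "MK-L", "B": "MKQL", "C": "M-QL", "D": "M--L"}
filtered = filter_sites_by_occupancy(toy, min_fraction=0.75)
```

In the toy data only g1 and t1 select each other, so only that pair is retained, and the occupancy filter keeps the first and last alignment columns. A real analysis would operate on search-tool output and on the full supermatrix rather than on in-memory dictionaries.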
Data Resources
Data reported in this paper have been published in Mendeley Data and
are available at http://dx.doi.org/10.17632/trbj94zm2n.2 (inferred matrices
and statistics) and http://dx.doi.org/10.17632/s5j2f62z3d.2 (figures). All
sequencing data are available at NCBI via the Umbrella BioProject accession
number NCBI: PRJNA183205 (“The 1KITE project: evolution of insects”).
SUPPLEMENTAL INFORMATION
Supplemental Information includes one figure, Supplemental Experimental
Procedures, and one dataset and can be found with this article online at
http://dx.doi.org/10.1016/j.cub.2017.01.027.
AUTHOR CONTRIBUTIONS
B.M., L.K., O.N., and R.S.P. conceived the study. C.P., J.H., K.M., K.M.K.,
L.K., O.N., P.D., R.M., R.S.P., S.K., and T.S. collected or provided samples.
A.D., K.M., L.P., O.N., R.S.P., S.L., and X.Z. sequenced, assembled, and pro-
cessed the transcriptomes. A.K., C.M., K.M., M.P., O.N., R.L., and R.S.P.
phylogenetically analyzed the transcriptomes. J.R., L.K., O.N., R.S.P., S.G.,
and T.W. are responsible for the dating of the inferred phylogeny. All authors
contributed to the writing of the manuscript, with L.K., O.N., and R.S.P. taking
the lead.
ACKNOWLEDGMENTS
The presented data are the result of the collaborative efforts of the 1KITE con-
sortium. The sequencing and assembly of the 1KITE transcriptomes were
funded by BGI through support to the China National GeneBank. We thank
S. Blank, A. Dorchin, J. Gusenleitner, V. Mauss, C. Schmid-Egger, and M.
Schwarz for help with identification of samples and R. Allemand, E. Altenhofer,
E. van den Berghe, A. Blanke, J. de Boer, J. Chille, A. Dorchin, T. Eltz, M.
Fierke, R. Glatz, K. Kantner, M. Kivan, K. Kraaijeveld, S. Leonhardt, M. Neu-
mann, M. Niehuis, G. Reder, K. Riede, M. Shaw, N. Schiff, K.-H. Schmalz, J.
Schmidt, P. Schule, K. Schutte, J. Steidle, N. Szucsich, D. Tagu, and D. Yeates
for providing valuable samples. We are grateful to S. Brown, D. Gilbert, J. Lie-
big, and R. Waterhouse for providing information required for the transcript
orthology prediction. We thank V. Achter, S. Bank, D. Bartel, A. Bohm,
H. Escalona, O. Hlinka, T. Pauli, S. Simon, A. Stamatakis, V. Winkelmann,
the Cologne High Efficient Operating Platform for Science (CHEOPS) at the
Regionales Rechenzentrum Köln (RRZ), and the CSIRO HPC Dell PowerEdge
M620 Linux Cluster Systems for computing time and/or bioinformatic support.
We furthermore acknowledge the Gauss Centre for Supercomputing e. V. for
funding computing time on the GCS Supercomputer SuperMUC at the Leibniz
Supercomputing Centre (LRZ). We thank A. Stamatakis for implementing the
four-cluster likelihood quartet mapping feature in ExaML. We acknowledge
the Amt für Umwelt, Verbraucherschutz und Lokale Agenda of Bonn, Hessen
Forst, the Israeli Nature and National Parks Protection Authority, the Mercantour
National Park Service, and the Struktur- und Genehmigungsbehörde
Süd and the Struktur- und Genehmigungsdirektion Nord (both Rhineland
Palatinate) for granting permission to collect samples. K.M. thanks O. Hlinka
(IM&T), D. Yeates (CSIRO), and the Schlinger Endowment to the CSIRO
National Research Collections Australia for support. H. Goulet and the Natural
History Museum (London) kindly granted permission to use published draw-
ings as blueprints for illustrating our main figure. U. Vaartjes kindly helped
with illustrating the main figure. A.D., B.M., C.M., J.R., L.P., M.P., O.N.,
R.S.P., and S.G. were supported by the Leibniz Graduate School for Genomic
Biodiversity Research. C.P. was supported by the Universidad de Castilla-La
Mancha and the European Social Fund (ESF). O.N. and T.S. acknowledge
the German Research Foundation (DFG) for supporting parts of this study
(NI 1387/1-1; SCHM 2645/2-1).
Received: September 24, 2016
Revised: December 13, 2016
Accepted: January 16, 2017
Published: March 23, 2017
REFERENCES
1. Grimaldi, D.A., and Engel, M.S. (2005). Evolution of the Insects (Cambridge
University Press).
2. Aguiar, A.P., Deans, A.R., Engel, M.S., Forshage, M., Huber, J.T.,
Jennings, J.T., Johnson, N.F., Lelej, A.S., Longino, J.T., Lohrmann, V.,
et al. (2013). Order Hymenoptera. Zootaxa 3703, 51–62.
3. Quicke, D.L.J. (1997). Parasitic Wasps (Chapman & Hall).
4. Dowton, M., and Austin, A.D. (1994). Molecular phylogeny of the insect or-
der Hymenoptera: apocritan relationships. Proc. Natl. Acad. Sci. USA 91,
9911–9915.
5. Sharkey, M.J. (2007). Phylogeny and classification of Hymenoptera.
Zootaxa 1668, 521–548.
6. Vilhelmsen, L., Miko, I., and Krogmann, L. (2010). Beyond the wasp-waist:
structural diversity and phylogenetic significance of the mesosoma in
apocritan wasps (Insecta: Hymenoptera). Zool. J. Linn. Soc. 159, 22–194.
7. Heraty, J., Ronquist, F., Carpenter, J.M., Hawks, D., Schulmeister, S.,
Dowling, A.P., Murray, D., Munro, J., Wheeler, W.C., Schiff, N., and
Sharkey, M. (2011). Evolution of the hymenopteran megaradiation. Mol.
Phylogenet. Evol. 60, 73–88.
8. Ronquist, F., Klopfstein, S., Vilhelmsen, L., Schulmeister, S., Murray, D.L.,
and Rasnitsyn, A.P. (2012). A total-evidence approach to dating with fos-
sils, applied to the early radiation of the Hymenoptera. Syst. Biol. 61,
973–999.
9. Sharkey, M.J., Carpenter, J.M., Vilhelmsen, L., Heraty, J., Liljeblad, J.,
Dowling, A.P.G., Schulmeister, S., Murray, D., Deans, A.R., Ronquist, F.,
et al. (2012). Phylogenetic relationships among superfamilies of
Hymenoptera. Cladistics 28, 80–112.
10. Klopfstein, S., Vilhelmsen, L., Heraty, J.M., Sharkey, M., and Ronquist, F.
(2013). The hymenopteran tree of life: evidence from protein-coding genes
and objectively aligned ribosomal data. PLoS ONE 8, e69344.
11. Malm, T., and Nyman, T. (2015). Phylogeny of the symphytan grade of
Hymenoptera: new pieces into the old jigsaw (fly) puzzle. Cladistics 31,
1–17.
12. Peters, R.S., Meusemann, K., Petersen, M., Mayer, C., Wilbrandt, J.,
Ziesmann, T., Donath, A., Kjer, K.M., Aspöck, U., Aspöck, H., et al.
(2014). The evolutionary history of holometabolous insects inferred from
transcriptome-based phylogeny and comprehensive morphological
data. BMC Evol. Biol. 14, 52.
13. Aron, S., de Menten, L., Van Bockstaele, D.R., Blank, S.M., and Roisin, Y.
(2005). When hymenopteran males reinvented diploidy. Curr. Biol. 15,
824–827.
14. i5K Consortium (2013). The i5K Initiative: advancing arthropod genomics
for knowledge, human health, agriculture, and the environment.
J. Hered. 104, 595–600.
15. Rasnitsyn, A.P., and Quicke, D.L.J. (2002). History of Insects (Kluwer
Academic Publishers).
16. Misof, B., Liu, S., Meusemann, K., Peters, R.S., Donath, A., Mayer, C.,
Frandsen, P.B., Ware, J., Flouri, T., Beutel, R.G., et al. (2014).
Phylogenomics resolves the timing and pattern of insect evolution.
Science 346, 763–767.
17. Peters, R.S., Meyer, B., Krogmann, L., Borner, J., Meusemann, K.,
Schütte, K., Niehuis, O., and Misof, B. (2011). The taming of an impossible
child: a standardized all-in approach to the phylogeny of Hymenoptera
using public database sequences. BMC Biol. 9, 55.
18. Zimmermann, D., and Vilhelmsen, L. (2016). The sister group of Aculeata
(Hymenoptera) – evidence from internal head anatomy, with emphasis
on the tentorium. Arthropod Syst. Phylogeny 74, 195–218.
19. Brothers, D.J. (1999). Phylogeny and evolution of wasps, ants and bees
(Hymenoptera, Chrysidoidea, Vespoidea and Apoidea). Zool. Scr. 28,
233–249.
20. Pilgrim, E.F., Von Dohlen, C.D., and Pitts, J.P. (2008). Molecular
phylogenetics of Vespoidea indicate paraphyly of the superfamily and novel
relationships of its component families and subfamilies. Zool. Scr. 37,
539–560.
21. Hines, H.M., Hunt, J.H., O’Connor, T.K., Gillespie, J.J., and Cameron, S.A.
(2007). Multigene phylogeny reveals eusociality evolved twice in vespid
wasps. Proc. Natl. Acad. Sci. USA 104, 3295–3299.
22. Pickett, K.M., and Carpenter, J.M. (2010). Simultaneous analysis and the
origin of eusociality in the Vespidae (Insecta: Hymenoptera). Arthropod
Syst. Phylogeny 68, 3–33.
23. Johnson, B.R., Borowiec, M.L., Chiu, J.C., Lee, E.K., Atallah, J., and Ward,
P.S. (2013). Phylogenomics resolves evolutionary relationships among
ants, bees, and wasps. Curr. Biol. 23, 2058–2062.
24. Cardinal, S., and Danforth, B.N. (2013). Bees diversified in the age of
eudicots. Proc. Biol. Sci. 280, 20122686.
25. Romiguier, J., Cameron, S.A., Woodard, S.H., Fischman, B.J., Keller, L.,
and Praz, C.J. (2016). Phylogenomics controlling for base compositional
bias reveals a single origin of eusociality in corbiculate bees. Mol. Biol.
Evol. 33, 670–678.
26. Garrison, N.L., Rodriguez, J., Agnarsson, I., Coddington, J.A., Griswold,
C.E., Hamilton, C.A., Hedin, M., Kocot, K.M., Ledford, J.M., and Bond,
J.E. (2016). Spider phylogenomics: untangling the spider tree of life.
PeerJ 4, e1719.
27. Fernández, R., Edgecombe, G.D., and Giribet, G. (2016). Exploring
phylogenetic relationships within Myriapoda and the effects of matrix
composition and occupancy on phylogenomic reconstruction. Syst. Biol. 65,
871–889.
28. Petersen, M., Meusemann, K., Donath, A., Dowling, D., Liu, S., Peters,
R.S., Podsiadlowski, L., Vasilikopoulos, A., Zhou, X., Misof, B., and
Niehuis, O. (2017). Orthograph: a versatile tool for mapping coding
nucleotide sequences to clusters of orthologous genes. BMC
Bioinformatics 18, 111.
29. Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., Liu, S., Huang, W., He, G.,
Gu, S., Li, S., et al. (2014). SOAPdenovo-Trans: de novo transcriptome
assembly with short RNA-Seq reads. Bioinformatics 30, 1660–1666.
30. Waterhouse, R.M., Tegenfeldt, F., Li, J., Zdobnov, E.M., and Kriventseva,
E.V. (2013). OrthoDB: a hierarchical catalog of animal, fungal and bacterial
orthologs. Nucleic Acids Res. 41, D358–D365.
31. Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment
software version 7: improvements in performance and usability. Mol. Biol.
Evol. 30, 772–780.
32. Misof, B., Meyer, B., von Reumont, B.M., Kück, P., Misof, K., and
Meusemann, K. (2013). Selecting informative subsets of sparse
supermatrices increases the chance to find correct trees. BMC
Bioinformatics 14, 348.
33. Lanfear, R., Calcott, B., Kainer, D., Mayer, C., and Stamatakis, A. (2014).
Selecting optimal partitioning schemes for phylogenomic datasets. BMC
Evol. Biol. 14, 82.
34. Kozlov, A.M., Aberer, A.J., and Stamatakis, A. (2015). ExaML version 3: a
tool for phylogenomic analyses on supercomputers. Bioinformatics 31,
2577–2579.
35. Pattengale, N.D., Alipour, M., Bininda-Emonds, O.R., Moret, B.M., and
Stamatakis, A. (2010). How many bootstrap replicates are necessary?
J. Comput. Biol. 17, 337–354.
36. Dell’Ampio, E., Meusemann, K., Szucsich, N.U., Peters, R.S., Meyer, B.,
Borner, J., Petersen, M., Aberer, A.J., Stamatakis, A., Walzl, M.G., et al.
(2014). Decisive data sets in phylogenomics: lessons from studies on
the phylogenetic relationships of primarily wingless insects. Mol. Biol.
Evol. 31, 239–249.
37. Strimmer, K., and von Haeseler, A. (1997). Likelihood-mapping: a simple
method to visualize phylogenetic content of a sequence alignment.
Proc. Natl. Acad. Sci. USA 94, 6815–6819.
38. Aberer, A.J., Kobert, K., and Stamatakis, A. (2014). ExaBayes: massively
parallel bayesian tree inference for the whole-genome era. Mol. Biol.
Evol. 31, 2553–2556.
39. Parham, J.F., Donoghue, P.C.J., Bell, C.J., Calway, T.D., Head, J.J.,
Holroyd, P.A., Inoue, J.G., Irmis, R.B., Joyce, W.G., Ksepka, D.T., et al.
(2012). Best practices for justifying fossil calibrations. Syst. Biol. 61,
346–359.
40. Yang, Z. (2007). PAML 4: phylogenetic analysis by maximum likelihood.
Mol. Biol. Evol. 24, 1586–1591.
Current Biology, Volume 27
Supplemental Information
Evolutionary History of the Hymenoptera
Ralph S. Peters, Lars Krogmann, Christoph Mayer, Alexander Donath, Simon Gunkel, Karen Meusemann, Alexey Kozlov, Lars Podsiadlowski, Malte Petersen, Robert Lanfear, Patricia A. Diez, John Heraty, Karl M. Kjer, Seraina Klopfstein, Rudolf Meier, Carlo Polidori, Thomas Schmitt, Shanlin Liu, Xin Zhou, Torsten Wappler, Jes Rust, Bernhard Misof, and Oliver Niehuis
Supplemental Figure
Figure S1. Inferred Ultrametric and Time-Calibrated Tree with Node Numbers, Related to Figure 1. The phylogenetic tree is identical to the one shown in Figure 1, but additionally shows the estimated divergence times in million years ago (mya; black numbers behind nodes) and 95% confidence intervals (green bars). The consecutively numbered nodes (blue numbers before nodes) refer to Data S1H, in which all estimated divergence times and 95% confidence intervals are listed.
[Figure S1 graphic not reproduced. The image shows the tip labels (species names), node numbers, divergence-time estimates (mya), and clade labels for Hymenoptera, Apocrita, Aculeata, Apoidea, and the sampled superfamilies and families.]
Supplemental Experimental Procedures
1. Sampling of transcriptomes
We sampled RNA of 134 species of Hymenoptera for transcriptome sequencing (Data S1A). Whenever possible, we deposited voucher specimens (paragenophore or syngenophore) [S1] at the Zoological Research Museum Alexander Koenig (Bonn, Germany). Collected samples were ground and preserved in RNAlater (Qiagen, Hilden, Germany) and stored at +4 °C or -80 °C until further processing. One sample (Ceraphron sp.) was first stored in 99% ethanol for < 10 min. The sample was subsequently air-dried and ground in RNAlater. In all species, we sampled the RNA from the entire body of adult specimens. Detailed information (e. g., number of specimens and their sex) is summarized in Data S1A. Of one species (Dendrocerus carpenteri), we sequenced two different samples to ensure good data representation for this important taxon. Please note that we included in our study the raw reads of 33 already sequenced whole body transcriptomes (29 Hymenoptera and four outgroup species; marked with asterisks in Data S1A) published by us in two preceding investigations [S2, S3]. The raw reads of the nine extra samples were processed exactly as the other samples. Our sampling of transcriptomes thus comprised 164 samples of Hymenoptera, referring to 163 different species, and four outgroup taxa.
2. Transcriptome sequencing
RNA extraction, next generation sequencing (NGS) library preparation and sequencing of the prepared libraries on Illumina HiSeq 2000 sequencers (Illumina, San Diego, CA, USA) followed the protocol given by Misof et al. [S2]. However, in five species (i. e., Aleiodes testaceus, Ceraphron sp., Cosmocomoidea morrilli, Dolichurus corniculus, Inostemma sp.), from which a total RNA yield < 3 µg was obtained, we constructed the NGS library using the TruSeq mRNA Library Prep Kit (Illumina). We sheared the purified mRNA into fragments of 160–170 bp in length using divalent cations at 98 °C.
Fragment sizes and concentrations were determined with the aid of an Agilent Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, USA) and a StepOnePlus Real-Time PCR thermocycler (Applied Biosystems, Waltham, MA, USA). All NGS libraries were paired-end sequenced on a HiSeq 2000 (Illumina) with 150 bp (all libraries except those prepared using the TruSeq kit) and 90 bp (libraries prepared with the TruSeq kit) read length. Per library, we collected approximately 2.5 Gb of raw data.
3. De novo assembly of transcriptomes
Transcript raw reads were assembled using the assembler SOAPdenovo-Trans-31kmer (version 1.01) [S4]. All raw reads were quality checked and trimmed. Specifically, we discarded (i) reads with adapter contamination (minimum length of the alignment: 15 bp; at most 3 mismatches), (ii) reads that included > 10 Ns, and (iii) reads that included > 50 base pairs of low quality (i. e., Phred quality score: 2, ASCII 66 "B", Illumina 1.5+ Phred+64). All remaining reads were used for de novo assembly. For details on the assembly process, see Xie et al. [S4] and Misof et al. [S2]. In one step of the assembly, our settings differed from those of Misof et al. [S2]. In the contig forming step, linear k-mers (i. e., k-mers with a single out-degree) were merged to form edges and different edges were linked by arcs. Arcs with an abundance < 5% of the total out-degrees or < 2% of the total in-degrees were excluded. Subsequently, edges with an average abundance >= 3 were reported as contigs.
4. Identification and removal of contaminating sequences
All transcriptome assemblies were first searched for vector and linker/adapter contamination using a local installation of VecScreen (http://www.ncbi.nlm.nih.gov/tools/vecscreen/) and the UniVec database build 7.1 (http://www.ncbi.nlm.nih.gov/tools/vecscreen/univec/). We removed both terminal and internal contaminations. The removal of internal contaminations resulted in a split of contigs/scaffolds.
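Why removing an internal contamination splits a contig while removing a terminal one merely shortens it can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name `split_contig` and the representation of contaminated regions as half-open (start, end) coordinate pairs are assumptions made only for this illustration.

```python
def split_contig(seq: str, contaminated: list[tuple[int, int]]) -> list[str]:
    """Cut half-open [start, end) intervals out of seq; return what remains.

    Hypothetical helper: the interval format is an assumption of this sketch,
    not the output format of any particular screening tool.
    """
    pieces = []
    pos = 0  # first position not yet assigned to a piece or a removed region
    for start, end in sorted(contaminated):
        # Clamp the interval to the part of the contig not yet processed.
        start, end = max(start, pos), min(end, len(seq))
        if start > pos:
            pieces.append(seq[pos:start])  # clean stretch before the hit
        pos = max(pos, end)
    if pos < len(seq):
        pieces.append(seq[pos:])  # clean stretch after the last hit
    return pieces
```

An internal hit such as `[(4, 8)]` in a 12 bp contig yields two pieces, whereas a terminal hit such as `[(0, 4)]` yields a single shortened piece.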
We next searched the assembled transcriptomes for cross-library contamination, which can occur when pooling single index-tagged NGS libraries on the same Illumina NGS sequencer lane, using the search strategy outlined by Mayer et al. [S5]. Specifically, we compared each transcriptome assembly with all other assemblies sequenced in the context of the 1KITE project using BLASTN of the BLAST+ (version 2.2.29) program suite [S6]. We explicitly refrained from restricting the contamination search to only those transcriptomes that were sequenced on the same lane, in order to also detect contamination that may have occurred in pre-sequencing steps. In those instances in which BLASTN identified transcripts that shared a nucleotide sequence identity of at least 98% over a length of at least 180 bp, we used the coverage depth to
determine which transcript is likely the original sequence (by being more abundant) and which one likely represents the contamination (by being less abundant). We used as coverage depth of a given transcript the average k-mer coverage statistic provided by the assembly software SOAPdenovo-Trans-31kmer [S4]. Having identified transcripts sharing a high sequence identity, we applied the following procedure: (i) If two transcripts differed more than 2-fold in their relative coverage, we removed the transcript with the lower relative coverage from the corresponding assembly. (ii) If the coverage of the two transcripts in question differed 2-fold or less, we conservatively removed both of them from the two corresponding assemblies. This procedure helped with removing putative foreign contaminations (e. g., from third party libraries sequenced on the same lane, but not present in our analyses). It also meant that in case of multiple highly similar sequences, we retained only the single transcript with the highest relative coverage, given that its coverage was more than 2-fold higher than the coverage of the second-best matching transcript. In samples sequenced on two specific Illumina lanes, we detected contaminations originating from the moth Plutella xylostella (Insecta: Lepidoptera), a species that was not included in our dataset. We therefore decided to specifically search the 1KITE transcript assemblies generated from sequencing these two lanes for sequences of this moth. We retrieved all protein-coding sequences of the P. xylostella official gene set (version 1.0) from the diamondback moth genome project database (http://iae.fafu.edu.cn/DBM/) and searched the coding sequences with BLASTN against the respective 1KITE transcriptome assemblies. Hits that exhibited 98% sequence identity over a length of at least 180 bp were removed.
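The two-step decision rule for a pair of highly similar transcripts can be written down compactly. The following Python sketch is only an illustration of rules (i) and (ii) as stated above, not the script used in the study. The function names are hypothetical, and the coverage values are assumed to be already expressed on a comparable (relative) scale.

```python
def is_suspicious_hit(identity_pct: float, aln_len: int,
                      min_identity: float = 98.0, min_len: int = 180) -> bool:
    """True if a BLASTN hit meets the identity and length thresholds."""
    return identity_pct >= min_identity and aln_len >= min_len


def transcripts_to_remove(cov_a: float, cov_b: float,
                          fold: float = 2.0) -> set[str]:
    """Return which of two near-identical transcripts ("a", "b") to drop.

    Rule (i): coverage differs more than `fold`-fold -> drop the rarer one.
    Rule (ii): otherwise -> conservatively drop both.
    """
    if cov_a <= 0 or cov_b <= 0:
        raise ValueError("coverage values must be positive")
    high, low = max(cov_a, cov_b), min(cov_a, cov_b)
    if high / low > fold:
        return {"a"} if cov_a < cov_b else {"b"}
    return {"a", "b"}
```

Note that a ratio of exactly 2 falls under rule (ii), so both transcripts are removed in that case.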
We additionally removed transcripts from the assembled transcriptomes that NCBI identified as possibly foreign contaminations when submitting the assemblies to the NCBI Transcriptome Shotgun Assembly (TSA) database (see below). Information on how many transcripts were removed from each transcript library is summarized in Data S1B.
We removed per assembled transcript library between 14,258 (0.05% of the unfiltered assembled transcriptome) and 9,925,188 (26.99% of the unfiltered assembled transcriptome) nucleotides, as they likely or possibly represented contamination (Data S1B). However, the comparatively high number of removed nucleotides in some samples (i. e., congeneric species in the genera Eucera and Tetraloniella) is more likely explained by the chosen conservative approach for identifying possible contaminants than by actual excessive instances of contamination. Please note that we found a relatively small number of single-copy (target) genes in the species of these two genera. Since the two genera are deeply nested within Anthophila (bees), the small number of identified genes (Data S1E) is, however, insignificant for inferring the phylogenetic relationships of the major lineages of Hymenoptera.
5. Identification of orthologous transcripts of single-copy protein-coding genes
We used the program Orthograph (version beta4) [S3] to search the transcript libraries for transcripts that are orthologous to a set of target genes. Specifically, we selected target genes that the OrthoDB v7 database [S7] suggested to occur in single-copy across Hymenoptera and Coleoptera (beetles). Note that we included the latter to enable outgroup comparisons (i. e., rooting of the inferred tree). Specifically, target genes were selected by searching the OrthoDB v7 database [S7] for genes that are single copy (copy number = 1) in the sequenced and annotated genome of each of the following six reference taxa: Acromyrmex echinatior (official gene set, OGS, 3.8) [S8], Apis mellifera (OGS 3.2) [S9, S10], Camponotus floridanus (OGS 3.3) [S11], Harpegnathos saltator (OGS 3.3) [S11], Nasonia vitripennis (OGS 2) [S12], and Tribolium castaneum (OGS 3.0) [S13]. The hierarchical level for clustering orthologous genes in the OrthoDB query was set to the node “Holometabola” (= Endopterygota).
The six reference taxa were selected because (i) their genomes are well sequenced and annotated, (ii) their official gene sets were publicly released and freely available (in respect of the Ft. Lauderdale agreement) when we started our analyses, and (iii) they covered all those major target lineages from which genomes of species had been sequenced. We additionally demanded that the target genes have not been recorded as being duplicated (but they can be missing) in other species of Hymenoptera. In addition to the six reference species, we exploited the publicly available gene orthology information stored in OrthoDB v7 [S7] for other species when inferring a set of target genes (ortholog set). Our custom query (kindly conducted and provided by Robert Waterhouse on August 11, 2014) explicitly demanded that single-copy genes of the above six reference taxa must not occur in multi-copy (i. e., their copy number must be <= 1) in any of the following additional species: Atta cephalotes (OGS 1.2), Apis florea (OGS 1.1), Bombus impatiens (OGS 1.2), Bombus terrestris (OGS 1.3), Cephus cinctus (OGS from April 2013), Dufourea novaeangliae (OGS 1.1), Eufriesea mexicana (OGS 1.1), Habropoda laboriosa (OGS 1.2), Lasioglossum albipes (OGS 5.4), Linepithema humile (OGS 1.2), Melipona quadrifasciata (OGS 1.1), Megachile rotundata (OGS 1.1), Pogonomyrmex barbatus (OGS 1.2),
and Solenopsis invicta (OGS 2.2.3). Note that we intentionally did not restrict the copy number to 1 in order to cope with the problem that official gene sets of poorly sequenced and annotated genomes typically comprise a relatively low number of genes. We acknowledge that assembly and annotation artefacts can also result in artificially inflated copy numbers. We think, however, that our query represents a reasonable compromise between being able to search for as many promising target genes as possible in the 1KITE transcriptomes (even if these genes actually got secondarily lost in some lineages) and making sure that genes that indeed occur in multiple copies are omitted.
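The selection rule for target genes (copy number exactly 1 in each reference taxon, copy number at most 1 in every additional species) can be expressed as a small predicate. The Python sketch below is an illustration only, not the OrthoDB query that was actually run. The species codes, the dictionary layout, and the function name `is_target_gene` are hypothetical.

```python
# Hypothetical short codes for the six reference taxa named in the text.
REFERENCE_TAXA = ("AECH", "AMEL", "CFLO", "HSAL", "NVIT", "TCAS")


def is_target_gene(copies: dict[str, int],
                   reference: tuple[str, ...] = REFERENCE_TAXA) -> bool:
    """Decide whether an ortholog group qualifies as a target gene.

    `copies` maps a species code to the gene copy number in that species;
    a species absent from the mapping counts as copy number 0.
    """
    # Each reference taxon must carry exactly one copy.
    if any(copies.get(species, 0) != 1 for species in reference):
        return False
    # Additional species may lack the gene but must not have several copies.
    return all(n <= 1 for species, n in copies.items()
               if species not in reference)
```

Allowing a copy number of 0 in the additional species mirrors the rationale given in the text: genes missing from poorly annotated gene sets should not disqualify an otherwise promising target gene.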
Orthograph requires the user to provide a tabulator-delimited file containing the information which genes in the reference species are orthologous. We obtained this information from Robert Waterhouse, who extracted it from OrthoDB v7 [S7]. Orthograph also requires the official gene sets (i. e., the predicted coding and the predicted amino acid sequences) to the sequenced genomes of these reference species if one intends to conduct analyses also on the nucleotide level. These were obtained from the sources specified in Data S1D.
Since the OrthoDB v7 file that stores information about which genes are orthologous in the six reference taxa also included information about all other taxa that were part of the OrthoDB query, we first removed with a custom Perl script from the tabulator-delimited OrthoDB v7 output file all entries that did not refer to the above six reference species. After this filtering, the file contained 3,275 entries per species, thus 19,650 entries in total. Given that the entries in the gene identifier column of this file need to perfectly correspond with the header information of the corresponding genes in the official gene sets, we re-formatted the headers in the OGS files with custom Perl scripts. In this context, we discovered that the OGS 2.0 of N. vitripennis is partially corrupt. Specifically, we found multiple entries for the following genes and proteins (the number of copies is given in parentheses): Nasvi2EG037307t1 (2), Nasvi2EG037281t1 (4), Nasvi2EG037267t1 (3), Nasvi2EG037334t1 (2), Nasvi2EG037012t1 (2), Nasvi2EG037232t1 (2), Nasvi2EG037311t1 (3), Nasvi2EG037344t1 (2), Nasvi2EG037308t1 (3), Nasvi2EG037337t1 (4), Nasvi2EG037324t1 (2), Nasvi2EG037295t1 (2), Nasvi2EG037340t1 (2), Nasvi2EG037310t1 (3), Nasvi2EG037309t1 (2), Nasvi2EG037306t1 (3), Nasvi2EG037268t1 (2), Nasvi2EG037352t1 (3), Nasvi2EG037335t1 (3), Nasvi2EG037349t1 (2), Nasvi2EG037338t1 (2), Nasvi2EG037315t1 (3), Nasvi2EG037181t1 (2), Nasvi2EG037181t2 (2), Nasvi2EG037320t1 (5), Nasvi2EG037351t1 (2), Nasvi2EG037317t1 (2), Nasvi2EG037342t1 (3), Nasvi2EG037322t1 (4), Nasvi2EG037294t1 (3), Nasvi2EG037343t1 (5), Nasvi2EG037339t1 (2), Nasvi2EG037231t1 (3), Nasvi2EG037313t1 (2), Nasvi2EG026831t1 (2), Nasvi2EG037316t1 (4), Nasvi2EG037325t1 (2), Nasvi2EG037230t1 (3), Nasvi2EG037326t1 (2).
Since it remained unclear why different genes in the Nasonia OGS 2.0 had the same identifier (Don Gilbert, personal communication; August 11, 2014), we removed these genes from the official gene set. A total of 15 of them (EOG7JTJMP, EOG783ZJ4, EOG7W4C2X, EOG78Q60C, EOG74RCGQ, EOG7B3CCR, EOG7W1GSS, EOG7B0H4C, EOG700M13, EOG7K459C, EOG77MMBJ, EOG72GBX3, EOG7DG821, EOG78Q602, EOG764JRC) referred to genes in our OrthoDB v7 output files. We conservatively removed the corresponding entries to all six reference taxa from the OrthoDB v7 file. The edited OrthoDB v7 output file subsequently contained 3,260 entries per species, 19,560 entries in total.
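The bookkeeping in this step is simple enough to check by hand: 3,275 entries for each of six species give 19,650 entries, and removing 15 ortholog groups leaves 3,260 entries per species, or 19,560 in total. The Python sketch below shows how duplicated identifiers of the kind found in the Nasonia gene set could be detected and how the entry counts follow. It is an illustration only; the original work used custom Perl scripts, and the function names and example identifiers here are hypothetical.

```python
from collections import Counter


def duplicated_ids(headers: list[str]) -> dict[str, int]:
    """Return every identifier that occurs more than once, with its count."""
    counts = Counter(headers)
    return {gene_id: n for gene_id, n in counts.items() if n > 1}


def total_entries(per_species: int, n_species: int = 6) -> int:
    """Number of table entries when every species has one entry per gene."""
    return per_species * n_species


# Entry counts before and after dropping the 15 affected ortholog groups.
before = total_entries(3275)
after = total_entries(3275 - 15)
```

With the values from the text, `before` is 19,650 and `after` is 19,560.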
The occurrence of isoforms in a dataset can compromise the search for orthologous transcripts, since the Orthograph search (see below) of a potential target transcript (at the transcriptional level) against all proteins in an official gene set can result in finding an isoform of a protein different from the isoform (of the same protein) that was initially used to search the transcript library (at the transcriptional level). The best reciprocal hit criterion, on which Orthograph relies (see below), would then not be fulfilled. We therefore decided to screen the official gene sets for the occurrence of isoforms (and alternative transcripts on the coding sequence level) in the 3,260 target genes and retained only the isoform (and alternative transcript on the coding sequence level) that OrthoDB v7 determined to be orthologous. Finally, we removed terminal stop symbols (*) in the protein files and applied the script ‘make-ogs-corresponding.pl’ (part of the Orthograph software package; [S3]) to validate (and if necessary enforce) correspondence of the amino acid sequences of proteins and their coding sequence at the nucleotide level in the OGS files of a given species. The resulting ortholog set comprised 3,260 protein-coding genes.
These 3,260 single-copy protein-coding target genes were searched in the transcript libraries with Orthograph, which rests on a graph-based approach and uses the best-reciprocal hit (BRH) criterion to extend clusters of known orthologs with transcript sequences [S3]. Its algorithm maps transcript sequences
to the globally best matching ortholog group (OG). For this purpose, it creates a profile hidden Markov model (pHMM) from the amino acid sequences of each target gene (i. e., ortholog group (OG), containing the orthologous amino acid sequences of the six reference species). This pHMM is used to search (at the translational level) for ortholog candidates in transcript libraries. Each candidate transcript sequence is extracted and used as query in a protein BLAST search against a database of all amino acid sequences from all the reference OGS. The results from both the pHMM search and the protein BLAST search are stored in a database for evaluation and orthology delineation. The orthology delineation procedure pulls all pHMM search results from the database and sorts them by descending bit score. For each pHMM hit transcript, the corresponding BLAST result is checked whether the best hit sequence belongs to the OG that the pHMM is based on. If this is the case, the BRH criterion is fulfilled and the OG is extended with the candidate transcript. Subsequently, Exonerate [S14] is employed to infer the open reading frame (ORF) in the transcript sequence by calculating a pairwise alignment with a reference amino acid sequence (i. e., the amino acid sequence of the reference species that was retrieved in the reciprocal search as best hit; see above). This step also produces corresponding sequences on the transcriptional level and corrects frame shift errors. If possible, the ORF is extended beyond the pHMM alignment region, retaining the orthologous region.
We used Orthograph (version beta4) [S3] to search for the 3,260 target genes. Orthograph depends on the following bioinformatics software packages, of which we used the hereafter specified version numbers: BLAST+ (version 2.2.26) [S6], Exonerate (version 2.2) [S14], and MAFFT (version 7.123) [S15]. All potential orthologous transcripts found by a given pHMM were sorted by their degree of similarity to the pHMM and then reciprocally searched against the proteins of the official gene sets (from six different species; Data S1D) combined. We applied a non-strict reciprocal search. Thus, the BRH criterion was fulfilled if the first reciprocal hit was a protein that was also part of the pHMM, irrespective of the species to which the protein belonged. If the BRH criterion could not be established for a given transcript, all transcripts with a smaller degree of similarity to a given pHMM were discarded (soft-threshold = 0). The minimum length of transcripts (on the translational level) was set to 30 amino acids. We allowed frame shift-corrected transcriptional extension of each transcript beyond the region of the transcript for which the BRH criterion was established as long as this region was not longer than the region for which the BRH was established (see above). All other Orthograph parameters were left at the default values. When summarizing the results, we removed terminal stop codons and masked internal stop codons with X and NNN, respectively, in all amino acid and corresponding nucleotide sequences.
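The non-strict reciprocal search with a soft threshold of 0 can be sketched in a few lines. The Python code below is one reading of the description above and is not Orthograph's implementation. The function name, the data structures, and the protein and transcript identifiers are hypothetical, and the sketch assumes that candidates accepted before the first failure are all retained.

```python
def brh_accepted(hmm_hits: list[tuple[str, float]],
                 best_reciprocal_hit: dict[str, str],
                 og_members: set[str]) -> list[str]:
    """Return transcripts that fulfil the best-reciprocal-hit criterion.

    hmm_hits:            (transcript id, pHMM bit score) pairs for one OG.
    best_reciprocal_hit: transcript id -> id of its top BLAST hit among all
                         reference proteins (all species combined).
    og_members:          ids of the reference proteins that built the pHMM.
    """
    accepted = []
    # Walk through the candidates from most to least similar to the pHMM.
    for transcript, _score in sorted(hmm_hits, key=lambda h: h[1], reverse=True):
        # Non-strict: any member of the OG counts, whatever its species.
        if best_reciprocal_hit.get(transcript) in og_members:
            accepted.append(transcript)
        else:
            # Soft threshold 0: the first failure discards all weaker hits.
            break
    return accepted
```

In this reading, a lower-scoring candidate is discarded once a higher-scoring one fails the reciprocal check, even if the lower-scoring candidate would itself have passed.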
The search for transcripts orthologous to the 3,260 target genes in all 168 analyzed transcript libraries recovered, on average, transcripts of 2,403 target genes (median: 2,450; minimum: 1,456; maximum: 2,760; Data S1E). Species with particularly few target gene hits were: Eucera plumigera (1,456), Ampulex compressa (1,540), Ampulex fasciata (1,542), Eucera syriaca (1,560), Tetraloniella sp. (1,662), and Acantholyda hieroglyphica (1,668). Overall, 3,256 of the 3,260 target genes were represented in the 168 transcriptomes.
6. Inference of multiple sequence alignments
We constructed multiple sequence alignments (MSA) with MAFFT (version 7.017) [S15], using the L-INS-i algorithm. We subsequently checked the resulting MSAs on the translational level and, if necessary, refined the MSAs as outlined by Misof et al. [S2]. We inferred the MSA of each OG also on the transcriptional (nucleotide) level using a modified version of the software Pal2Nal [S16] (version 14; see Misof et al. [S2] for details on the software modification). All MSAs have been deposited in and are available from Mendeley Data (doi: 10.17632/trbj94zm2n.2).
Assessing the quality of the inferred Multiple Sequence Alignments (MSAs) revealed putatively misaligned amino acid transcripts (hereafter referred to as outliers) in 256 of the 3,256 investigated ortholog groups (OGs). Attempts to refine the alignment of the 619 outlier sequences succeeded in 101 instances (i. e., sequences). The remaining 518 sequences, referring to a total of 230 OGs, remained outliers and were consequently removed from the MSAs. We also removed the corresponding nucleotide sequences from the respective nucleotide MSAs.
7. Protein domain identification
To facilitate a protein domain-based sequence data partitioning scheme, we applied the procedure described by Misof et al. [S2]. Protein domains and protein domain clans (i. e., evolutionarily related protein domains) [S17] were identified (on the amino acid level) in each MSA by using pHMMs from the Pfam database (release 27.0) [S18] and the software pfam_scan.pl (version 1.5, release date: October 15, 2013) [S18] in conjunction with HMMER (version 3.1b2) [S19]. For additional details, see Misof et al. [S2].
Searching for protein domains in the refined amino acid sequence alignments of the 3,256 single-copy genes assigned 32.0% of the alignment sites to domains of Pfam-A and 6.0% of the alignment sites to domains of Pfam-B. A total of 62.0% of the alignment sites remained unannotated. Based on the domain identification results, we split the 3,256 multiple sequence alignments and rearranged their sites into 5,922 different data blocks. Of these, 312 belong to Pfam-A domains and were pooled according to their clan annotation [S17]. After pooling domains without clan annotation but having the same name, we obtained 1,243 data blocks belonging to different Pfam-A domains and 1,111 belonging to different Pfam-B domains. Finally, we pooled unannotated regions (voids) according to their gene origin. This resulted in 3,256 data blocks corresponding to the void regions of the 3,256 analyzed genes.
8. Multiple sequence alignment masking
We used a modified version of Aliscore (version 1.2) [S20–S22] to identify blocks of putative alignment ambiguities or randomized MSA sections in the MSA of each OG. Aliscore was executed with its default sliding window size, requesting the maximum number of pairwise sequence comparisons, and with the option -e (i. e., indicating gappy amino acid sequence data). Blocks of putative alignment ambiguities or randomized MSA sections were subsequently removed from each domain data block at the amino acid and corresponding nucleotide level using custom scripts, and the masked data blocks were concatenated to supermatrices containing protein domain- and gene-based data blocks (the latter comprising regions not annotated as protein domain). Gap symbols (-) at the beginning and at the end of the resulting MSAs were replaced with 'X' (translational level) and 'N' (transcriptional level), respectively.
9. Removal of phylogenetically non-informative data blocks
We assessed the phylogenetic signal of each data block at the translational level with the software MARE (version 0.1.2-rc) [S23]. Data blocks whose phylogenetic information was zero were removed from the supermatrix on the translational and on the transcriptional level using custom Perl scripts. Note that the phylogenetic information content of all retained data blocks also differed from zero when being assessed on the transcriptional level.
After (i) removing ambiguous alignment sites in each data block resulting from the steps outlined above, (ii) deleting data blocks that contained no phylogenetic information (792 in total), and (iii) concatenating all remaining data blocks, the resulting supermatrices consisted of 1,505,722 amino acid and 3,011,444 nucleotide sites (1st and 2nd codon position only), respectively. Each of the two supermatrices contained information from 3,256 different single-copy protein-coding genes and comprised 4,818 data blocks (301 clan data blocks from pooled Pfam-A protein domains with clan association, 1,103 data blocks from pooled Pfam-A protein domains without clan association, 778 data blocks from Pfam-B protein domains, 2,636 data blocks from void regions).
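One small step of the pipeline summarized above, the replacement of leading and trailing gap symbols with 'X' or 'N' (section 8), can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name mask_terminal_gaps and the example sequences are hypothetical, and the custom scripts actually used in the study are not reproduced here.

```python
def mask_terminal_gaps(seq: str, fill: str) -> str:
    """Replace leading and trailing '-' of an aligned sequence with `fill`.

    Internal gaps are left untouched. `fill` would be 'X' for amino acid
    data and 'N' for nucleotide data (hypothetical helper, for illustration).
    """
    without_lead = seq.lstrip("-")
    n_lead = len(seq) - len(without_lead)
    core = without_lead.rstrip("-")
    n_trail = len(without_lead) - len(core)
    return fill * n_lead + core + fill * n_trail


# Made-up aligned sequences: one amino acid, one nucleotide.
print(mask_terminal_gaps("--MK-LV---", "X"))
print(mask_terminal_gaps("------ATGAAA---CTGGTT---------", "N"))
```

A sequence consisting only of gaps is turned entirely into the fill symbol, which is consistent with such sequences being treated as missing data in the downstream analyses.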
The amino acid supermatrix was further processed for testing specific phylogenetic hypotheses via analysis of decisive datasets (i. e., when demanding the presence of at least one species from a given taxonomic group in each data block of the supermatrix). The statistics of the three inferred datasets were as follows: ‘backbone dataset’ (800,393 amino acid sites, comprising 2,142 data blocks), ‘Aculeata dataset’ (709,055 amino acid sites, comprising 1,712 data blocks), ‘Apoidea dataset’ (527,548 amino acid sites, comprising 841 data blocks).
All supermatrices specified above as well as the data block information files have been deposited at Mendeley Data (doi: 10.17632/trbj94zm2n.2).
10. Partitioning and substitution model selection
We used PartitionFinder (developer versions 2.0.0-pre2, 2.0.0-pre9, and 2.0.0-pre10) [S24] to identify combinations of data blocks that can be modeled with the same substitution model and which therefore might better be combined into partitions. The corrected Akaike information criterion (AICc) [S25] was used to assess whether or not to combine data blocks into partitions. PartitionFinder was run with the following settings: branchlengths [linked], models [LG+G, LG+G+F, WAG+G, WAG+G+F, BLOSUM62+G, BLOSUM62+G+F, DCMUT+G, DCMUT+G+F, JTT+G, JTT+G+F, LG4X], model_selection [aicc], search [rcluster], weights [rate = 1.0, base = 1.0, model = 0.0, alpha = 1.0], rcluster-percent [100.0], rcluster-max [10,000]. PartitionFinder was started with the following additional command line arguments: '--raxml --all-states --min-subset-size 50'. We applied the same partition scheme that we inferred with PartitionFinder from analyzing protein domain-based data blocks at the translational (amino acid) level when analyzing the data at the transcriptional (nucleotide) level. However, we discarded the 3rd codon position, as it was saturated and typically exhibits lineage-specific compositional biases [S26, S27]. We furthermore modeled the nucleotide substitutions of each codon position within a given partition separately. The applied nucleotide substitution model was GTR+G.
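To make the merging criterion concrete, the following Python sketch computes the AICc [S25] and compares two hypothetical schemes. It is not PartitionFinder code. The function names, the log-likelihoods, the parameter counts, and the use of the number of alignment sites as the sample size are all assumptions made for illustration.

```python
def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Corrected Akaike information criterion.

    AICc = -2 lnL + 2k + 2k(k + 1) / (n - k - 1), with k free parameters
    and sample size n (assumed here to be the number of alignment sites).
    """
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_sites - n_params - 1)


def prefer_merged(aicc_separate: float, aicc_merged: float) -> bool:
    """Merging two data blocks is favoured when it lowers the AICc."""
    return aicc_merged < aicc_separate


# Made-up numbers: merging costs some likelihood but saves 20 parameters.
separate = aicc(log_likelihood=-10000.0, n_params=40, n_sites=500)
merged = aicc(log_likelihood=-10010.0, n_params=20, n_sites=500)
print(prefer_merged(separate, merged))
```

In this toy comparison the merged scheme is preferred, because the penalty saved on parameters outweighs the small loss in log-likelihood.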
PartitionFinder suggested merging data blocks into partitions in each of the supermatrices outlined above. Specifically, PartitionFinder suggested analyzing the primary supermatrix (i. e., the non-decisive dataset) at the amino acid level by considering 2,058 partitions. We applied the same partition scheme when analyzing the corresponding supermatrix containing nucleotide sequences, except that we modeled the 1st and 2nd codon position of each partition separately (the 3rd codon position was not considered in our analyses). Accordingly, the primary supermatrix comprised 4,116 partitions at the nucleotide level. PartitionFinder suggested considering the following numbers of partitions for testing specific phylogenetic hypotheses via analyses of decisive datasets: 1,025 (‘backbone dataset’), 853 (‘Aculeata dataset’), and 461 (‘Apoidea dataset’). The final datasets and files containing the inferred partition schemes and model selection results are deposited at Mendeley Data (doi: 10.17632/trbj94zm2n.2).
11. Phylogenetic tree inference and bootstrap analysis
We applied the maximum likelihood optimality criterion as implemented in ExaML (versions 3.0.15 and 3.0.17) [S28] for the phylogenetic inferences. We selected the tree with the best log-likelihood score found in 50 independent tree searches (25 randomized stepwise addition parsimony starting trees, 25 completely random starting trees) per dataset. All starting trees were inferred with RAxML (version 8.0.26) [S29]. We applied the GTR substitution model when analyzing the nucleotide sequence data and applied the partition-specific substitution models suggested by PartitionFinder when analyzing the amino acid sequences in ExaML. Rate heterogeneity was approximated with a gamma distribution, using the median and four discrete rate categories. Node support was estimated via non-parametric bootstrapping [S30]. Bootstrap replicate alignments and random start trees were generated with RAxML and a custom shell script. Subsequently, ExaML was used to infer one ML tree per bootstrap replicate, applying the original partitioning scheme suggested by PartitionFinder for the respective supermatrix. In order to determine the minimum number of replicates needed for reliable estimation of bootstrap support values, we used the “autoMRE” bootstrap convergence criterion [S31], as implemented in RAxML, with the default threshold setting of 0.03.
In brief, this method works as follows: first, all replicate trees are randomly divided into two subsets of equal size; then, a majority rule extended (MRE) consensus tree for each subset is built; finally, the relative weighted Robinson-Foulds distance (WRF) between the consensus trees is computed. Convergence is reached if WRF is below the threshold in at least 99% of 1,000 tree set permutations. Convergence was evaluated after analyzing every batch of 50 bootstrap replicates. All inferred trees were rooted at the branch that connects Hymenoptera with the outgroups.
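The convergence test described above can be sketched in Python as follows. This is a simplified illustration, not the RAxML implementation, and it rests on several assumptions. Each tree is represented only by its set of non-trivial bipartitions. A plain majority-rule consensus (splits in more than 50% of the trees) stands in for the extended majority-rule consensus. The weighted Robinson-Foulds distance is normalized by 2(n - 3) for n taxa. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, FrozenSet, List, Set

Split = FrozenSet[str]  # one side of a bipartition, as a set of taxon names
Tree = Set[Split]       # a tree reduced to its non-trivial bipartitions


def consensus_support(trees: List[Tree]) -> Dict[Split, float]:
    """Majority-rule consensus: splits in > 50% of trees, with their frequency."""
    counts = Counter(split for tree in trees for split in tree)
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > 0.5}


def relative_wrf(a: Dict[Split, float], b: Dict[Split, float], n_taxa: int) -> float:
    """Support-weighted RF distance, divided by its assumed maximum 2(n - 3)."""
    splits = set(a) | set(b)
    wrf = sum(abs(a.get(s, 0.0) - b.get(s, 0.0)) for s in splits)
    return wrf / (2 * (n_taxa - 3))


def has_converged(trees: List[Tree], n_taxa: int, threshold: float = 0.03,
                  permutations: int = 1000, required: float = 0.99,
                  seed: int = 0) -> bool:
    """True if the relative WRF is below `threshold` in >= `required` of permutations."""
    rng = random.Random(seed)
    pool = list(trees)
    half = len(pool) // 2
    below = 0
    for _ in range(permutations):
        rng.shuffle(pool)  # random division into two equally sized subsets
        first = consensus_support(pool[:half])
        second = consensus_support(pool[half:2 * half])
        if relative_wrf(first, second, n_taxa) < threshold:
            below += 1
    return below / permutations >= required


# Toy input: 50 identical five-taxon replicate trees trivially converge.
replicates = [{frozenset({"A", "B"}), frozenset({"D", "E"})} for _ in range(50)]
print(has_converged(replicates, n_taxa=5))
```

In practice the test would be repeated after each additional batch of replicates, as described above, until the criterion is met.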
The majority rule extended (MRE) bootstrap convergence criterion [S31] indicated that 50 bootstrap replicates are sufficient to accurately assess node support irrespective of whether we analyzed the primary supermatrix containing amino acid sequences or the primary supermatrix containing the corresponding nucleotide sequences. All node support values provided in the present investigation are consequently based on 50 non-parametric bootstrap replicates each.
Please note that RAxML and ExaML remove completely undetermined sites (i. e., sites containing only “-” and/or “X” and/or “N”) prior to the phylogenetic inference. The total size of the supermatrices in each of the phylogenetic inferences was accordingly 1,505,514 sites (primary supermatrix, amino acids), 3,011,099 sites (primary supermatrix, nucleotides), 800,252 sites (“backbone dataset”, amino acids), 708,924 sites (“Aculeata dataset”, amino acids), and 527,471 sites (“Apoidea dataset”, amino acids).
We also used ExaBayes [S32] to conduct phylogenetic inferences in a Bayesian framework, again analyzing the primary amino acid supermatrix (see above). We completed three independent ExaBayes runs with four coupled MCMC chains each. After 200,000 generations, we assessed convergence of the results by computing the average standard deviation of split frequencies (ASDSF criterion). We obtained an ASDSF value of 2.51%, which is generally considered acceptable convergence according to the ExaBayes manual [http://sco.h-its.org/exelixis/web/software/exabayes/manual/index.html#sec-7-3]. Finally, we used the consense tool (part of the ExaBayes software package) to obtain an MRE consensus tree, discarding the first 25% of the sampled topologies. Please note that ExaBayes does not support the LG4X amino acid substitution model, which PartitionFinder suggested applying to several of the inferred data partitions. We therefore decided to use the automatic model selection strategy implemented in ExaBayes (default setting) on the PartitionFinder-inferred partitions.
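The ASDSF statistic used above as a convergence check can be illustrated with a short Python sketch. It is not the ExaBayes implementation. The function name, the use of the population standard deviation, and the rule of ignoring splits whose frequency stays below 5% in every run are assumptions made for this example.

```python
from statistics import pstdev
from typing import Dict, Hashable, List


def asdsf(runs: List[Dict[Hashable, float]], min_freq: float = 0.05) -> float:
    """Average standard deviation of split frequencies across independent runs.

    `runs` holds one dict per run, mapping a split to its sampled frequency
    (0..1). Splits missing from a run count as frequency 0. Splits that never
    reach `min_freq` in any run are ignored (assumed cut-off).
    """
    splits = set().union(*runs)
    kept = [s for s in splits if max(r.get(s, 0.0) for r in runs) >= min_freq]
    if not kept:
        return 0.0
    return sum(pstdev([r.get(s, 0.0) for r in runs]) for s in kept) / len(kept)


# Made-up split frequencies from three runs.
run_a = {"AB|CDE": 0.98, "ABC|DE": 0.61}
run_b = {"AB|CDE": 0.97, "ABC|DE": 0.58}
run_c = {"AB|CDE": 0.99, "ABC|DE": 0.64}
print(round(100 * asdsf([run_a, run_b, run_c]), 2), "%")
```

Values close to zero indicate that the independent runs sampled the same splits at similar frequencies.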
The topology inferred with ExaBayes proved to be identical to the one inferred from analyzing the primary amino acid supermatrix under the maximum likelihood optimality criterion (Figure 1b), except for the phylogenetic position of Stephanoidea and Scoliidae. Stephanoidea were found with low statistical support as sister group of Evanioidea+(Trigonaloidea+Aculeata), while the maximum likelihood analysis inferred (also with low node support) Stephanoidea as sister group of Evanioidea. Note that we found noteworthy phylogenetic conflict when analyzing the dataset using Four-cluster Likelihood Quartet Mapping analyses in respect of the phylogenetic position of Stephanoidea (see also discussion in section 14). Finally, while the maximum likelihood analyses suggested ants as the sister group of Apoidea, the Bayesian analyses inferred a lineage comprising both ants and Scoliidae as the sister group of Apoidea.
The results from the phylogenetic tree inference and bootstrap analysis have been deposited at Mendeley Data (doi: 10.17632/trbj94zm2n.2).
12. Screen for rogue taxa
We tested for the presence of rogue taxa with RogueNaRok (version 1.0) [S33] when analyzing the amino acid and the nucleotide supermatrix using five distinct settings: (i) bipartition support drawn onto the best ML tree (referred to as best), (ii) majority rule consensus (50% threshold, referred to as mr), (iii) greedily refined extended majority rule consensus (referred to as mre), (iv) strict consensus (100% threshold, referred to as strict), (v) the 75% threshold consensus, for which the criterion for pruning rogue taxa is to increase the number of edges that have at least 75% bootstrap support (referred to as t75). We used a dropset size of 1 when using the settings best and mre, a dropset size of 1+2 when using the setting mr and a dropset size of 1+4 when using the settings strict and t75.
We identified a single species that exhibited rogue taxon behavior: Nitela sp., a representative of the polyphyletic apoid wasp family “Crabronidae”. However, Nitela sp. was identified as a rogue taxon only when analyzing the amino acid primary supermatrix and only under one specific rogue taxon identification setting: 75% threshold consensus with dropset sizes 1+4. Given the fact that Nitela is deeply nested in a subordinated monophyletic lineage of apoid wasps and that its position within this lineage was not of primary interest in the context of the present investigation, we refrained from excluding Nitela from our phylogenetic analyses. The results from the rogue taxon screen have been deposited at Mendeley Data (doi: 10.17632/trbj94zm2n.2).
13. Testing specific phylogenetic hypotheses via analysis of decisive datasets
Following Dell’Ampio et al. [S34] and Misof et al. [S2], we generated decisive datasets, which contained at least one species from a given taxonomic group in each partition of the analyzed supermatrices, to assess whether a bias due to a possible lack of taxonomic sequence representation at the partition level has an impact on the inferred tree topology. We generated three different decisive datasets: (i) one for addressing the major lineages along the Hymenoptera tree (‘backbone dataset’), (ii) one for addressing the phylogenetic relationships of the major lineages of Vespoidea and Apoidea (‘Aculeata dataset’), and (iii) one for addressing the phylogenetic relationships within Apoidea specifically (‘Apoidea dataset’). Specific information on which groups we distinguished and which species are part of a given group has been deposited at Mendeley Data (doi: 10.17632/s5j2f62z3d.2). Given that the generation of decisive datasets resulted in some partitions being excluded from subsequent analyses, we inferred new partition schemes and corresponding partition-specific substitution models with PartitionFinder for each decisive dataset.
Please note that in the phylogenetic analyses of the decisive datasets we considered only amino acid sequences. We conducted ten tree searches with ExaML (versions 3.0.15 and 3.0.17) with random starting trees per decisive dataset.
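The selection rule behind a decisive dataset, retaining only those partitions in which every taxonomic group of interest is represented by at least one species, can be sketched in Python as follows. The function name, the group definitions, and the partition names are hypothetical and serve only to illustrate the filter. The group compositions actually used are those deposited at Mendeley Data.

```python
from typing import Dict, List, Set


def decisive_partitions(presence: Dict[str, Set[str]],
                        groups: Dict[str, Set[str]]) -> List[str]:
    """Return the partitions in which every group has >= 1 species with data.

    presence: partition name -> species that have sequence data in it.
    groups:   group name -> species assigned to that group.
    """
    return [
        name
        for name, species in presence.items()
        if all(species & members for members in groups.values())
    ]


# Made-up example with two groups and three partitions.
groups = {
    "Apoidea": {"Apis mellifera", "Ampulex compressa"},
    "Formicidae": {"Camponotus floridanus"},
}
presence = {
    "p1": {"Apis mellifera", "Camponotus floridanus"},
    "p2": {"Apis mellifera", "Ampulex compressa"},
    "p3": {"Ampulex compressa", "Camponotus floridanus"},
}
print(decisive_partitions(presence, groups))
```

Partition p2 is dropped in this example because it contains no representative of the second group, which is why new partition schemes have to be inferred for each decisive dataset.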
Phylogenetic analysis of all three decisive datasets (i. e., ‘backbone dataset’, ‘Aculeata dataset’, ‘Apoidea dataset’) resulted in topologies identical to the one inferred from analyzing the primary supermatrix (i. e., the non-decisive dataset) at the amino acid level when considering only those major lineages whose phylogenetic relationships to each other were meant to be assessed by the respective decisive dataset. Since the decisive datasets are restricted to partitions with a required taxon coverage, the results indicate that the distribution of missing data did not produce a significant phylogenetic signal, which makes a missing data bias in the inferred phylogenetic relationships unlikely. The results from analyzing the decisive datasets have been deposited at Mendeley Data (doi: 10.17632/s5j2f62z3d.2).
14. Four-cluster Likelihood Quartet Mapping (FcLM)
We applied FcLM to assess the phylogenetic support for conflicting hypotheses [S2, S34, S35, S36], focusing on five major phylogenetic hypotheses (see below). We used the PTHREADS implementation of RAxML (versions 8.2.6 and 8.2.8) [S29] to infer the likelihood scores of quartets, using the partition scheme and substitution models inferred before when analyzing the complete supermatrix at the translational (amino acid) level. We assessed the following phylogenetic hypotheses: (i) Eusymphyta represent the sister group of Unicalcarida. This hypothesis was assessed by testing the phylogenetic position of Xyeloidea (iA) relative to the outgroup, to Pamphilioidea+Tenthredinoidea and to Unicalcarida, (iB) relative to the outgroup, to Tenthredinoidea and to Pamphilioidea+Unicalcarida, and (iC) relative to the outgroup, to Pamphilioidea and to Tenthredinoidea+Unicalcarida. Please note that we did not consider representatives of Apocrita in the three tests. (ii) Cephoidea represent the sister group of Vespina; (iii) Trigonaloidea represent the sister group of Aculeata; (iv) Stephanoidea represent the sister group of Evanioidea; (v) Formicidae represent the sister group of Apoidea. Specific information about which species were part of a specific group is given in Data S1G.
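How quartet likelihood scores translate into support proportions can be sketched in Python. The sketch follows one common convention of likelihood mapping: the three log-likelihoods of a quartet are converted into weights that sum to one, and the quartet is assigned to the nearest of seven attractor points (three resolved topologies, three partly resolved areas, one star-like area). The function names, region labels, and input values are hypothetical, and the sketch is not the RAxML-based workflow used in the study.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

# Attractor points of the likelihood-mapping triangle (assumed convention).
ATTRACTORS: Dict[str, Tuple[float, float, float]] = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1|T2": (0.5, 0.5, 0.0),
    "T1|T3": (0.5, 0.0, 0.5),
    "T2|T3": (0.0, 0.5, 0.5),
    "star": (1 / 3, 1 / 3, 1 / 3),
}


def posterior_weights(log_likelihoods: Sequence[float]) -> Tuple[float, ...]:
    """Convert three log-likelihoods into weights p_i = L_i / (L_1 + L_2 + L_3)."""
    best = max(log_likelihoods)  # subtract the maximum for numerical stability
    rel = [math.exp(value - best) for value in log_likelihoods]
    total = sum(rel)
    return tuple(r / total for r in rel)


def assign_region(weights: Sequence[float]) -> str:
    """Assign a weight vector to the closest attractor (squared Euclidean distance)."""
    def sq_dist(name: str) -> float:
        return sum((w - a) ** 2 for w, a in zip(weights, ATTRACTORS[name]))
    return min(ATTRACTORS, key=sq_dist)


def region_proportions(quartets: Iterable[Sequence[float]]) -> Dict[str, float]:
    """Proportion of quartets falling into each of the seven regions."""
    counts = Counter(assign_region(posterior_weights(q)) for q in quartets)
    total = sum(counts.values())
    return {name: counts[name] / total for name in ATTRACTORS}


# Made-up log-likelihoods for four quartets (topologies T1, T2, T3).
toy = [(-100.0, -120.0, -120.0), (-100.0, -120.0, -120.0),
       (-100.0, -100.0, -130.0), (-100.0, -100.0, -100.0)]
print(region_proportions(toy))
```

The proportions for the three corner regions correspond to the kind of percentage support values reported for competing topologies.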
The results from testing major phylogenetic hypotheses (i. e., Eusymphyta are the sister group of Unicalcarida, Cephoidea are the sister group of Vespina, Trigonaloidea are the sister group of Aculeata, Stephanoidea are the sister group of Evanioidea, Formicidae are the sister group of Apoidea) via Four-cluster Likelihood Mapping (FcLM) as well as all analyzed amino acid data matrices have been deposited along with partition information files at Mendeley Data (doi: 10.17632/s5j2f62z3d.2 and 10.17632/trbj94zm2n.2).
Given that we found in all tree inferences strong support for monophyletic Eusymphyta, a result incompatible with the classical hypothesis of Xyeloidea being the sister group of all remaining Hymenoptera (supported by morphological data and data on muscle cell ploidy levels) [S37, S38], we additionally assessed the support for our and for alternative phylogenetic hypotheses regarding the phylogenetic position of Xyeloidea via FcLM. FcLM of the original amino acid sequence data consistently revealed the lowest phylogenetic signal (0–3%) for a sister group relationship of Xyeloidea and the remaining Hymenoptera, irrespective of the phylogenetic position of Pamphilioidea (please note that FcLM does not assume any specific phylogenetic relationships among the species of a given group, e. g., the outgroup). While it cannot be excluded that consideration of additional outgroup lineages (e. g., non-holometabolous insects) could result in a change of the topology, doing so would require design of a very different ortholog set than the one used in the present investigation, comprising significantly fewer genes and consequently likely containing substantially less phylogenetic signal.
The results from FcLM of the original amino acid sequence data revealed no phylogenetic signal for topologies that are incompatible with the hypothesis of Cephoidea being the sister group of Vespina and with the hypothesis of Trigonaloidea being the sister group of Aculeata. However, FcLM revealed phylogenetic signal in the dataset that is in conflict with the other two hypotheses. Specifically, besides strong support for a sister group relationship of Formicidae and Apoidea (66%), FcLM also indicated noteworthy support for a sister group relationship of Scoliidae and Apoidea (32%). This signal has the potential to mislead phylogenetic analyses, in particular when considering only a small taxonomic sampling. Authors should keep this in mind when inferring a sister group relationship of scoliid and apoid wasps in future analyses. A sister group relationship of Scoliidae and Formicidae, as it was inferred in the Bayesian analyses, was supported by only 2% of the quartets. Considering all evidence, a sister group relationship of Formicidae and Apoidea currently appears to be the most likely phylogenetic scenario. FcLM also indicated strong support for a sister group relationship of Stephanoidea and Trigonaloidea+Aculeata (88%) and some support for a sister group relationship of Stephanoidea and Parasitoida (11%). In fact, the phylogenetic hypothesis suggested by the simultaneous phylogenetic analysis of all taxa (Figure 1; i. e., a sister group relationship of Stephanoidea and Evanioidea) is supported by only 1% of the evaluated quartets. This sister group relationship also received only comparatively low bootstrap support (i. e., 90%), considering the size of the analyzed data matrix (Figure 1), and it was not found in the Bayesian analyses either (Fig. S4). We consequently consider the question of the phylogenetic position of Stephanoidea relative to Parasitoida, Evanioidea, and Trigonaloidea+Aculeata as still open.
15. Testing for non-phylogenetic signal biases via permutation tests
To assess the possible impact of a heterogeneous composition of amino acid sequences, of a non-stationarity of substitution processes and of a non-random distribution of missing data [S39, S40] on our phylogenetic inferences, we adopted the permutation test strategy suggested by Misof et al. [S2]. For more information on the applied permutation schemes and their explanatory power, see Misof et al. [S2]. The phylogenetic signal for the conflicting hypotheses in the permuted supermatrices was assessed using FcLM (see above). FcLM was conducted on the translational level and using the same partition scheme as before, but using the WAG substitution model across all partitions. Permutation tests were done with RAxML (versions 8.2.6 and 8.2.8) [S29] and ExaML (version 3.0.17) [S28].
The results from testing major phylogenetic hypotheses (i. e., Eusymphyta are the sister group of Unicalcarida, Cephoidea are the sister group of Vespina, Trigonaloidea are the sister group of Aculeata, Stephanoidea are the sister group of Evanioidea, Formicidae are the sister group of Apoidea) via permutation tests in conjunction with FcLM have been deposited at Mendeley Data (doi: 10.17632/s5j2f62z3d.2). In none of the permutation tests did any of the three possible topologies receive more than 18% support when we assessed the phylogenetic support for the different topologies in the permuted datasets via FcLM. The only exception were the results from testing the position of Xyeloidea, in which some topologies received up to 41% support. However, artificial phylogenetic signal supported almost exclusively topologies incompatible with monophyletic Eusymphyta, whereas artificial support for Eusymphyta remained low. We accordingly interpret the impact of artificial phylogenetic signal on the inferred phylogeny shown in Figure 1 as insignificant.
The results from the various permutation tests have been deposited at Mendeley Data (doi: 10.17632/s5j2f62z3d.2).
16. Divergence time estimation
We time-calibrated the inferred phylogenetic tree using information on 14 carefully selected fossils. We followed the best-practice recommendations put forth by Parham et al. [S41] for justifying fossil calibrations. We considered the following fossils to calibrate the inferred phylogeny, with the lineage each fossil was assigned to given in parentheses: Triassoxyela foveolata (Hymenoptera) [S42], Libanophron astarte (Ceraphronoidea) [S43], Rhetinorhyssalus morticinus (Braconidae) [S44], Protimaspis costalis (Cynipoidea) [S45], Archaeopelecinus jinzhouensis (Pelecinidae) [S46], an undescribed species (Chalcidoidea) deposited in the Natural History Museum of the Lebanese University (Faculty of Sciences II, Fanar, Lebanon, female, Coll. No. 874A; the specimen represents an undescribed family that can be placed in Chalcidoidea based on an unambiguous synapomorphic character, i. e., presence of multiporous plate sensillae on antennae), Protoparevania lourothi (Evaniidae) [S47], Lancepyris opertus (Chrysidoidea) [S48], Architiphia rasnitsyni (Tiphiidae) [S49], Paleogenia wahisi (Pepsinae) [S50], Apodolichurus sphaerocephalus (Ampulicidae) [S51], Palaeomacropis eocenicus (Macropidinae) [S52], Paleohabropoda oudardi (Anthophorini) [S53], Bombus (Bombus) randeckensis (Bombini) [S54]. Detailed information about the fossils (e. g., characters considered as autapomorphies of specific lineages, their estimated minimum geological age) is given in Data S1F. If multiple fossils of a given group were available for calibration, we selected the geologically oldest one. Please note that we considered Paleogenia wahisi (Pepsinae), deposited in Baltic Amber, despite the well-known problem of accurately estimating the geological age of Baltic Amber. However, the currently widely accepted conservative minimum age of Baltic Amber (i. e., 34 mya) [S55] proved to be informative for improving calibration of the phylogenetic tree. We used mcmctree, in conjunction with codeml (both part of the PAML software package, version 4.9) [S56] to estimate node ages.
We applied the approximate likelihood method [S57] by generating a single Hessian matrix with codeml using standard parameters and the JTT model from a condensed version of our supermatrix of amino acids. Specifically, the supermatrix contained only sites at which at least 95% of the samples had a non-ambiguous amino acid symbol. The condensation of the supermatrix became necessary to overcome computational limitations when estimating node ages resulting from the large size of the dataset. Please note that we tested whether or not our data allowed for the generation and subsequent concatenation of partition-specific Hessian matrices. This approach would in principle allow different models to be applied to different partitions. However, we found that most partitions contained at least one taxon with no amino acid sequence information, rendering this approach unfeasible. Prior to the node age analyses with fossil calibration data, we started various test runs, using an arbitrary calibration and the Hessian matrix, to assess whether our sampling parameters would likely lead to convergence. These tests revealed that a burn-in value of 40,000 and a number of samples of 400,000 typically led to convergence. We introduced the fossil calibrations as soft minima using default settings (i. e., as truncated Cauchy distributions with an offset of 0.1, a scale parameter of 1 and a left tail probability of 0.025) [S58]. The root age was specified at 412 mya in the control file, providing a very conservative estimate [S2]. We ran four dating analyses, using the independent-rates clock and standard parameters, apart from the burn-in and sample size values, which we set to the values found in the above test runs, and using the Hessian matrix generated from the non-partitioned dataset. All four runs provided very similar node age estimates, with two of the runs also agreeing in the boundaries of the 95% confidence intervals. The other two runs suggested slightly larger confidence limits at a small number of nodes, mostly close to the root of the tree. It is likely that in these two runs the MCMC chain took longer to converge to the final node ages and that thus a larger number of outliers was sampled. Since two of the runs agreed in their node age estimates and errors, we consider the results from these two runs to very likely represent the more realistic node ages and confidence intervals (shown in Figures 1 and S1 and specified in Data S1H). In addition, we ran the same dataset with the same settings again twice, this time using the correlated-rates clock. Finally, we used a condensed version of our supermatrix of nucleotides, analogous to the amino acid set including only sites at which at least 95% of the samples had a non-ambiguous base symbol, and ran this, with the same settings as outlined above for the amino acid set, twice with the independent and the correlated-rates clock, respectively. As the substitution model, we specified HKY85. The node ages and confidence intervals are shown in Data S1H.
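The condensation of the supermatrices to sites at which at least 95% of the samples carry a non-ambiguous symbol can be sketched in Python as follows. The function name, the choice of symbols treated as ambiguous, and the toy alignment are assumptions for illustration. The scripts used to prepare the actual dating datasets are not shown here.

```python
from typing import Dict, Set


def condense(alignment: Dict[str, str], ambiguous: Set[str],
             min_coverage: float = 0.95) -> Dict[str, str]:
    """Keep only alignment columns with enough non-ambiguous symbols.

    alignment:    sample name -> aligned sequence (all of equal length).
    ambiguous:    symbols counted as missing or ambiguous, e.g. {'-', 'X', '?'}
                  for amino acids or {'-', 'N', '?'} for nucleotides (assumed).
    min_coverage: minimum fraction of samples with a non-ambiguous symbol.
    """
    seqs = list(alignment.values())
    n_samples = len(seqs)
    n_sites = len(seqs[0])
    keep = [
        i for i in range(n_sites)
        if sum(s[i] not in ambiguous for s in seqs) / n_samples >= min_coverage
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


# Toy amino acid alignment; a 75% threshold is used because there are only 4 samples.
toy = {"sp1": "MK-L", "sp2": "MKXL", "sp3": "M-AL", "sp4": "MKAL"}
print(condense(toy, ambiguous={"-", "X", "?"}, min_coverage=0.75))
```

With the default threshold of 0.95 and the 168 samples analyzed here, a column would be retained only if at most 8 samples carry an ambiguous symbol at that site.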
The two replicate runs for these additional analyses converged (Figure deposited at Mendeley Data, doi: 10.17632/s5j2f62z3d.2). In general, the correlated-rates runs and the runs using the nucleotide supermatrix delivered results that show older ages for the included nodes, with the effect of using correlated rates being slightly lower. However, we deem the correlated-rates clock less suitable for the data and taxon sampling of our study, following [S59]. While correlated-rates models have been advocated in past studies (e. g., [S60]), the basic autocorrelation method implemented in MCMCtree can result in unrealistically high or low rates, particularly if the number of taxa included in the study is high and if the root age is old (also see [S61, S62]).
Supplemental References
S1. Pleijel, F., Jondelius, U., Norlinder, E., Nygren, A., Oxelman, B., Schander, C., Sundberg, P., and Thollesson, M. (2008). Phylogenies without roots? A plea for the use of vouchers in molecular phylogenetic studies. Mol. Phylogenet. Evol. 48, 369–371.
S2. Misof, B., Liu, S., Meusemann, K., Peters, R.S., Donath, A., Mayer, C., Frandsen, P.B., Ware, J., Flouri, T., Beutel, R.G., et al. (2014). Phylogenomics resolves the timing and pattern of insect evolution. Science 346, 763–767.
S3. Petersen, M., Meusemann, K., Donath, A., Dowling, D., Shanlin, L., Peters, R.S., Podsiadlowski, L., Vasilikopoulos, A., Zhou, X., Misof, B., et al. (2017). Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes. BMC Bioinformatics 18, 111.
S4. Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., Liu, S., Huang, W., He, G., Gu, S., Li, S., et al. (2014). SOAPdenovo-Trans: de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30, 1660–1666.
S5. Mayer, C., Sann, M., Donath, A., Meixner, M., Podsiadlowski, L., Peters, R.S., Petersen, M., Meusemann, K., Liere, K., Wägele, J.-W., et al. (2016). BaitFisher: a software package for multispecies target DNA enrichment probe design. Mol. Biol. Evol. 33, 1875–1886.
S6. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009). BLAST+: architecture and applications. BMC Bioinformatics 10, 421.
S7. Waterhouse, R.M., Tegenfeldt, F., Li, J., Zdobnov, E.M., and Kriventseva, E.V. (2013). OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs. Nucleic Acids Res. 41, D358–D365.
S8. Nygaard, S., Zhang, G., Schiøtt, M., Li, C., Wurm, Y., Hu, H., Zhou, J., Ji, L., Qiu, F., Rasmussen, M., et al. (2011). The genome of the leaf-cutting ant Acromyrmex echinatior suggests key adaptations to advanced social life and fungus farming. Genome Res. 21, 1339–1348.
S9. The Honeybee Genome Sequencing Consortium (2006). Insights into social insects from the genome of the honeybee Apis mellifera. Nature 443, 931–949.
S10. Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., de Graaf, D.C., Debyser, G., Deng, J., Devreese, B., et al. (2014). Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics 15, 86.
S11. Bonasio, R., Zhang, G., Ye, C., Mutti, N.S., Fang, X., Qin, N., Donahue, G., Yang, P., Li, Q., Li, C., et al. (2010). Genomic comparison of the ants Camponotus floridanus and Harpegnathos saltator. Science 329, 1068–1071.
S12. Werren, J.H., Richards, S., Desjardins, C.A., Niehuis, O., Gadau, J., Colbourne, J.K., and The Nasonia Genome Working Group (2010). Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science 327, 343–348.
S13. Tribolium Genome Sequencing Consortium (2008). The genome of the model beetle and pest Tribolium castaneum. Nature 452, 949–955.
S14. Slater, G.S., and Birney, E. (2005). Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31.
S15. Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780.
S16. Suyama, M., Torrents, D., and Bork, P. (2006). PAL2NAL: Robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612.
S17. Finn, R.D., Mistry, J., Schuster-Böckler, B., Griffiths-Jones, S., Hollich, V., Lassmann, T., Moxon, S., Marshall, M., Khanna, A., Durbin, R., et al. (2006). Pfam: clans, web tools and services. Nucleic Acids Res. 34, D247–D251.
S18. Finn, R.D., Bateman, A., Clements, J., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Heger, A., Hetherington, K., Holm, L., Mistry, J., et al. (2014). Pfam: the protein families database. Nucleic Acids Res. 42, D222–D230.
S19. Eddy, S.R. (2011). Accelerated profile HMM searches. PLOS Comput. Biol. 7, e1002195.
S20. Meusemann, K., von Reumont, B.M., Simon, S., Roeding, F., Strauss, S., Kück, P., Ebersberger, I., Walzl, M., Pass, G., Breuers, S., et al. (2010). A phylogenomic approach to resolve the arthropod tree of life. Mol. Biol. Evol. 27, 2451–2464.
S21. Li, B., Lopes, J.S., Foster, P.G., Embley, T.M., and Cox, C.J. (2014). Compositional biases among synonymous substitutions cause conflict between gene and protein trees for plastid origins. Mol. Biol. Evol. 31, 1697–1709.
S22. Kück, P., Meusemann, K., Dambach, J., Thormann, B., von Reumont, B.M., Wägele, J.W., and Misof, B. (2010). Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Front. Zool. 7, 10.
S23. Misof, B., Meyer, B., von Reumont, B.M., Kück, P., Misof, K., and Meusemann, K. (2013). Selecting informative subsets of sparse supermatrices increases the chance to find correct trees. BMC Bioinformatics 14, 348.
S24. Lanfear, R., Calcott, B., Kainer, D., Mayer, C., and Stamatakis, A. (2014). Selecting optimal partitioning schemes for phylogenomic datasets. BMC Evol. Biol. 14, 82.
S25. Hurvich, C.M., and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika 76, 297–307.
S26. Akashi, H., and Kliman, R.M. (1998). Mutation pressure, natural selection, and the evolution of base composition in Drosophila. In Mutation and Evolution, R. Woodruff, J. Thompson, A. Eyre-Walker, eds. (New York: Springer), pp. 49–60.
S27. Li, B., Lopes, J.S., Foster, P.G., Embley, T.M., and Cox, C.J. (2014). Compositional biases among synonymous substitutions cause conflict between gene and protein trees for plastid origins. Mol. Biol. Evol. 31, 1697–1709.
S28. Kozlov, A.M., Aberer, A.J., and Stamatakis, A. (2015). ExaML version 3: a tool for phylogenomic analyses on supercomputers. Bioinformatics 31, 2577–2579.
S29. Stamatakis, A. (2006). RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.
S30. Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.
S31. Pattengale, N.D., Alipour, M., Bininda-Emonds, O.R., Moret, B.M., and Stamatakis, A. (2010). How many bootstrap replicates are necessary? J. Comput. Biol. 17, 337–354.
S32. Aberer, A.J., Kobert, K., and Stamatakis, A. (2014). ExaBayes: massively parallel Bayesian tree inference for the whole-genome era. Mol. Biol. Evol. 31, 2553–2556.
S33. Aberer, A.J., Krompass, D., and Stamatakis, A. (2013). Pruning rogue taxa improves phylogenetic accuracy: an efficient algorithm and webservice. Syst. Biol. 62, 162–166.
S34. Dell’Ampio, E., Meusemann, K., Szucsich, N.U., Peters, R.S., Meyer, B., Borner, J., Petersen, M., Aberer, A.J., Stamatakis, A., Walzl, M.G., et al. (2014). Decisive data sets in phylogenomics: Lessons from studies on the phylogenetic relationships of primarily wingless insects. Mol. Biol. Evol. 31, 239–249.
S35. Peters, R.S., Meusemann, K., Petersen, M., Mayer, C., Wilbrandt, J., Ziesmann, T., Donath, A., Kjer, K.M., Aspöck, U., Aspöck, H., et al. (2014). The evolutionary history of holometabolous insects inferred from transcriptome-based phylogeny and comprehensive morphological data. BMC Evol. Biol. 14, 52.
S36. Strimmer, K., and von Haeseler, A. (1997). Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment. Proc. Natl. Acad. Sci. U.S.A. 94, 6815–6819.
S37. Aron, S., de Menten, L., Van Bockstaele, D.R., Blank, S.M., and Roisin, Y. (2005). When hymenopteran males reinvented diploidy. Curr. Biol. 15, 824–827.
S38. Schulmeister, S. (2003). Simultaneous analysis of basal Hymenoptera (Insecta): introducing robust-choice sensitivity analysis. Biol. J. Linn. Soc. 79, 245–275.
S39. Heath, T.A., Zwickl, D.J., Kim, J., and Hillis, D.M. (2008). Taxon sampling affects inferences of macroevolutionary processes from phylogenetic trees. Syst. Biol. 57, 160–166.
S40. Betancur-R., R., Li, C., Munroe, T.A., Ballesteros, J.A., and Ortí, G. (2013). Addressing gene tree discordance and non-stationarity to resolve a multi-locus phylogeny of the flatfishes (Teleostei: Pleuronectiformes). Syst. Biol. 62, 763–785.
S41. Parham, J.F., Donoghue, P.C.J., Bell, C.J., Calway, T.D., Head, J.J., Holroyd, P.A., Inoue, J.G., Irmis, R.B., Joyce, W.G., Ksepka, D.T., et al. (2012). Best practices for justifying fossil calibrations. Syst. Biol. 61, 346–359.
S42. Rasnitsyn, A.P. (1964). New Triassic Hymenoptera of the Middle Asia. Paleontol. J. 1, 88–89.
S43. Engel, M.S., and Grimaldi, D.A. (2009). Diversity and phylogeny of the Mesozoic wasp family Stigmaphronidae (Hymenoptera: Ceraphronoidea). Denisia 26, 53–68.
S44. Engel, M.S. (2016). Notes on Cretaceous amber Braconidae (Hymenoptera), with description of two new genera. Novit. Paleoentomol. 15, 1–7.
S45. Liu, Z., Engel, M.S., and Grimaldi, D. (2007). Phylogeny and geological history of the cynipoid wasps (Hymenoptera: Cynipoidea). Am. Mus. Novit. 3583, 1–48.
S46. Shih, C.K., Liu, C.X., and Ren, D. (2009). The earliest fossil record of pelecinid wasps (Insecta: Hymenoptera: Proctotrupoidea: Pelecinidae) from Inner Mongolia, China. Ann. Entomol. Soc. Am. 102, 20–38.
S47. Deans, A.R., Basibuyuk, H.H., Azar, D., and Nel, A. (2004). Descriptions of two new Early Cretaceous (Hauterivian) ensign wasp genera (Hymenoptera: Evaniidae) from Lebanese amber. Cretaceous Res. 25, 509–516.
S48. Azevedo, C.O., and Azar, D. (2012). A new fossil subfamily of Bethylidae (Hymenoptera) from the Early Cretaceous Lebanese amber and its phylogenetic position. Zoologia 29, 210–218.
S49. Darling, D.C., and Sharkey, M.J. (1990). Order Hymenoptera. In Insects from the Santana Formation, Lower Cretaceous, of Brazil, D. Grimaldi, ed. (New York: Bull. Am. Mus. Nat. Hist. 195), pp. 123–153.
S50. Rodriguez, J., Waichert, C., von Dohlen, C., Poinar Jr, G., and Pitts, J. (2016). Eocene and not Cretaceous origin of spider wasps (Pompilidae): fossil evidence from amber. Acta Palaeontol. Pol. 61, 89–96.
S51. Antropov, A. (2000). Digger wasps (Hymenoptera, Sphecidae) in Burmese amber. Bull. Nat. Hist. Mus., Ser. Geol. 56, 59–78.
S52. Michez, D., Nel, A., Menier, J.-J., and Rasmont, P. (2007). The oldest fossil of a melittid bee (Hymenoptera: Apiformes) from the early Eocene of Oise (France). Zool. J. Linn. Soc. 150, 701–709.
S53. Michez, D., De Meulemeester, T., Rasmont, P., Nel, A., and Patiny, S. (2009). New fossil evidence of the early diversification of bees: Paleohabropoda oudardi from the French Paleocene (Hymenoptera, Apidae, Anthophorini). Zool. Scripta 38, 171–181.
S54. Wappler, T., De Meulemeester, T., Aytekin, A.M., Michez, D., and Engel, M.S. (2012). Geometric morphometric analysis of a new Miocene bumble bee from the Randeck Maar of southwestern Germany (Hymenoptera: Apidae). Syst. Entomol. 37, 784–792.
S55. Dlussky, G.M., and Rasnitsyn, A.P. (2009). Ants (Insecta: Vespida: Formicidae) in the Upper Eocene Amber of Central and Eastern Europe. Paleontol. J. 43, 1024–1042.
S56. Yang, Z. (2007). PAML 4: a program package for phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.
S57. dos Reis, M., and Yang, Z. (2011). Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times. Mol. Biol. Evol. 28, 2161–2172.
S58. Inoue, J., Donoghue, P.C.J., and Yang, Z. (2010). The impact of the representation of fossil calibrations on Bayesian estimation of species divergence times. Syst. Biol. 59, 74–89.
S59. Linder, M., Britton, T., and Sennblad, B. (2011). Evaluation of Bayesian models of substitution rate evolution – parental guidance versus mutual independence. Syst. Biol. 60, 329–342.
S60. Lepage, T., Bryant, D., Philippe, H., and Lartillot, N. (2007). A general comparison of relaxed molecular clock models. Mol. Biol. Evol. 24, 2669–2680.
S61. Ho, S.Y.W., and Duchêne, S. (2014). Molecular-clock models for estimating evolutionary rates and timescales. Mol. Ecol. 23, 5947–5965.
S62. dos Reis, M., Donoghue, P.C.J., and Yang, Z. (2016). Bayesian molecular clock dating of species divergences in the genomics era. Nat. Rev. Genet. 17, 71–80.