draft - university of toronto t-space · draft 6 level and were left as bacidina arnoldiana...
TRANSCRIPT
Draft
Barcoding lichen-forming fungi using 454 pyrosequencing is
challenged by artifactual and biological sequence variation
Journal: Genome
Manuscript ID gen-2015-0189.R2
Manuscript Type: Article
Date Submitted by the Author: 29-Apr-2016
Complete List of Authors: Mark, Kristiina; Swiss Federal Research Institute WSL; University of Tartu, Institute of Ecology and Earth Sciences Cornejo, Carolina; Swiss Federal Research Institute WSL Keller, Christine; Swiss Federal Research Institute WSL Flück, Daniela; Swiss Federal Research Institute WSL Scheidegger, Christoph; Swiss Federal Research Institute WSL
Keyword: 454 pyrosequencing, Internal Transcribed Spacer, Lichenized fungi, Intragenomic variation, DNA barcoding
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
1
Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by
artifactual and biological sequence variation
Kristiina Mark1, 2*, Carolina Cornejo1, Christine Keller1, Daniela Flück1, Christoph
Scheidegger1
1Biodiversity and Conservation Biology, Swiss Federal Research Institute WSL,
Switzerland
2Institute of Botany and Ecology, University of Tartu, Estonia
* Corresponding author: Kristiina Mark; e-mail: [email protected]
Page 1 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
2
Abstract
Although lichens (lichen-forming fungi) play an important role in the ecological integrity
of many vulnerable landscapes, only a minority of lichen-forming fungi have been
barcoded out of the currently accepted ~18,000 species. Regular Sanger sequencing can
be problematic when analyzing lichens since saprophytic, endophytic, and parasitic fungi
live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput,
long-read 454 pyrosequencing in a GS FLX+ System was tested to barcode the fungal
partner of 100 epiphytic lichen species from Switzerland using fungal-specific primers
when amplifying the full Internal Transcribed Spacer region (ITS). The present study
shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen
fungus was successfully sequenced for all samples except one. Alignment solutions such
as BLAST were found to be largely adequate for the generated long reads. In addition,
the NCBI nucleotide database—currently the most complete database for lichen-forming
fungi—can be used as a reference database when identifying common species, since the
majority of analyzed lichens were identified correctly to the species or at least to the
genus level. However, several issues were encountered, including a high sequencing error
rate, multiple ITS versions in a genome (incomplete concerted evolution), and in some
samples the presence of mixed lichen-forming fungi (possible lichen chimeras).
Keywords: 454 pyrosequencing, DNA barcoding, Intragenomic variation, Internal
transcribed spacer, Lichenized fungi
Page 2 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
3
Introduction
Lichens are intimate and long-term symbiotic associations consisting of a heterotrophic
fungal partner—also called the mycobiont—and photosynthetic algae or cyanobacteria—
also called the photobiont (Nash III 2008). In a systematic context, lichens are named
after the fungal partner, and according to most recent estimates, about 18% of all fungal
species are lichen-forming (Feuerer & Hawksworth 2007). While lichens include many
bio-indicators for monitoring environmental quality—including air pollution and
ecological integrity of forest landscapes (Nimis et al. 2002)—accurate identification of
lichenized fungal species remains challenging (Lumbsch and Leavitt 2011). DNA-based
specimen identification to a species level is useful in a system of well-circumscribed taxa
and a high-quality reference database (Seifert 2009, Begerow et al. 2010). Recently, the
internal transcribed spacer region (ITS) of the nuclear ribosomal RNA cistron was
proposed as the primary fungal barcode marker (Schoch et al. 2012). However, only a
minority of lichen-forming fungi have been barcoded from the estimated 17,500 species
(Feuerer & Hawksworth 2007). For example, the largest and most typical order of lichen-
forming fungi (Lecanorales) consists of about 5,700 species, but only about 29% of them
have publicly available ITS sequences in sequence databases such as the National
Institute of Health’s (NIH) genetic sequence database, GenBank
(http://www.ncbi.nlm.nih.gov/genbank/; accessed 28 September 2015), the Barcode of
Life Data System (BOLD; www.boldsystems.org; Ratnasingham and Hebert 2007), and
UNITE (https://unite.ut.ee/; accessed 24 November 2015; Abarenkov et al. 2010).
Page 3 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
4
Barcoding lichens using Sanger sequencing has been successful in some groups of
lichens (Kelly et al. 2011, Divakar et al. 2015), especially with foliose and fruticose
lichens; however, in crustose lichens—that constitute by far the vast majority of
lichenized species (Bergamini et al. 2005)—it often proves quite challenging (Flück
2012). Sampling difficulties occur where other very similar lichen species live mixed or
close by, and many saprophytic, endophytic, and parasitic fungi also live intimately
admixed with the lichen mycobiont, making the application of Sanger sequencing
insufficient in many cases (Flück 2012, Orock et al. 2012). A limited number of studies
are known to have successfully applied pyrosequencing to recover the identity of a lichen,
when Sanger sequencing failed to produce usable results (Hodkinson and Lendemer 2013,
Lücking et al. 2014a).
Recent advancements in pyrosequencing methods now allow the amplification of
fragments up to 1,000 basepairs (bp) in the GS FLX+ system of Roche/454
pyrosequencing. However, 454 pyrosequencing is notorious for its high indel (short
insertions and deletions) error rate in homopolymeric regions (three or more identical
nucleotites) and carry-forward-incomplete-extension (CAFIE) errors (Margulies et al.
2005, Huse et al. 2007, Gilles et al. 2011, Lücking et al. 2014b). In addition to artifactual
sequence variation, biological sequence variation—such as intragenomic and/or intra-
mycelial (i.e. allelic heterozygosity) variation of this multicopy gene—may be possible
(Wörheide et al. 2004, Simon and Weiß 2008, Linder et al. 2013).
Page 4 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
5
The aim of our study was to test whether lichen mycobiont can be successfully identified
to a single species using DNA barcoding through pyrosequencing the ITS marker of 100
species of both foliose-fruticose and crustose lichens, using fungal-specific markers in the
high-throughput 454 sequencing in the GS FLX+ system, while elucidating and
quantifying artifactual and biological sequence variation.
Materials and Methods
Taxon sampling and morphology-based identification
One hundred lichen specimens, including 52 crustose species and 48 macrolichens
(foliose and fruticose thalli), were collected from different locations in Switzerland in
2011 and 2014. The full list of specimens with voucher information is given in Table 1.
Additional specimen data—such as the collection GPS coordinates, substrate, specimen
photographs, the primary ITS barcodes—are available though a public BOLD dataset
DOI: dx.doi.org/10.5883/DS-LICODE. All project data can also be accessed though the
PlutoF cloud database under a project “Barcoding Swiss lichens using 454
pyrosequencing” (https://plutof.ut.ee/#/study/view/31890). Specimen identifications of
the collected material were based on morphological and chemical characters, following
the nomenclature of Clerc and Truong (2012). Lichen chemistry was examined with
standardized thin layer chromatography (TLC) using solvent systems A, B, and C
(Culberson and Ammann 1979, White and James 1985). Sorediate Bacidia and Lecanora
species without apothecia, samples #6, #7, and #42, could not be identified to the species
Page 5 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
6
level and were left as Bacidina arnoldiana aggregate (aggr.) and Lecanora strobilina
aggr., respectively. Specimen LC-088, determined as Lecidella sp., includes an
undescribed chemotype for Lecidella (arthothelin, 2,4-dichloronorlichexanthone), and
BC-47-1 Lecanora sp. morphologically resembles Haematomma ochroleucum or Biatora
flavopunctata, with atranorin, usnic acid, and zeorin identified as secondary substances.
Specimens with poorly expressed diagnostic characters were re-checked in light of
barcoding data, and the determination of five specimens was corrected (LC-023 Lecanora
cf. umbrina ⇒ Lecania cf. cyrtella, BC-027-1 Haematomma aff. ochroleucum or Biatora
flavopunctata ⇒ Lecanora sp., BC-155-5 Lecidea nylanderi ⇒ Loxospora elatina, BC-
161-5 Lecanora farinaria/expersa ⇒ Loxospora elatina, KM-03-02 Usnea florida ⇒
Usnea intermedia). All specimens are stored at the WSL Swiss Federal Research Institute
at –20°C.
Molecular methods
About 3–5 mg of visually uncontaminated lichen thallus of each specimen was sampled
for molecular analyses. In sampling, vegetative thallus was preferred, but in some
apotheciate crustose species with no visible vegetative thallus, 1–2 apothecia were used
for DNA extraction. Frozen lichen samples were lyophilized and disrupted with a
stainless steel bead in a Retsch MM2000 mill (Düsseldorf, Germany) for two min at 30
Hz. The full genomic DNA was extracted using the Qiagen DNEasy Plant Mini Kit
(QIAGEN, Hilden, Germany) following the manufacturer’s Plant Tissue Mini Protocol
and diluted in 50µL elution buffer. A two-step Polymerase Chain Reaction (PCR)
approach was used for unidirectional amplicon sequencing. First, the full ITS was
Page 6 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
7
amplified via PCR by using short fungal-specific primers, ITS1F (Gardes and Bruns
1993) and ITS4 (White et al. 1990). Each PCR reaction (25µl) consisted of 2× KAPA
HiFi HotStart ReadyMix PCR mix (KAPA Biosystems), 5 µM of each of the forward and
reverse primer, and 1 µL of template DNA. Thermal cycle conditions included the initial
denaturation at 95°C for 2 min; 35 cycles of denaturation for 20 sec at 98°C, primer
annealing for 15 sec at 57°C, and extension for 20 sec at 72°C; and final extension for 3
min at 72°C. The PCR products were visualized on 1.5% agarose gel and purified via
Exo-SAP (Fermentas) treatment. In the second PCR step, the products were re-amplified
with full-length fusion primers designed by Microsynth AG (Balgach, Switzerland). The
fusion primers contained the GS FLX Titanium A- and B-adapters of the kit Lib-L
(Roche Diagnostics), the Multiplex Identifier (MID) sequence, four base library key
sequence (TCAG), and the same sample-specific primers used in the first step. The MID
tags were used only on the forward, A-adapter for unidirectional sequencing.
Amplifications were conducted in 50 µL reaction volumes containing 39 µL of
mastermix with reagents 5× KAPA HiFi Buffer, 10 mM KAPA dNTP Mix, 2U/100µL
KAPA HiFi HotStart DNA Polymerase (KAPA HiFi HotStart PCR Kit, KAPA
Biosystems), 4 µM of each of the forward and reverse primer, and 1 µL of template DNA.
The cycle conditions were: initial denaturation at 95°C for 3 min; 12 cycles of
denaturation at 98°C for 20 sec, primer annealing at 56°C for 30 sec, and elongation at
72°C for 30 sec; and a final elongation step at 72°C for 5 min. The aliquots were then
purified via AMPure Beads XP (Beckman Coulter, Brea, CA, USA). Concentrations of
purified PCR products were quantified by fluorometry using the Quant-iT™ PicoGreen®
dsDNA Assay Kit (Molecular Probes, Eugene, OR, USA). Probes were pooled in four
Page 7 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
8
equimolar pools (25 each), purified in 1.5% preparative gel, and run on 4/16 of a
sequencing plate using Titanium FLX+ reagents on a GS Roche Sequencer (454
technology, Roche Diagnostics). Pre-sequencing steps (starting from the second PCR),
sequencing, raw data processing using #3 pipeline preconfigured by Roche for long
amplicons, and sorting the resulting reads into samples based on their MID’s
(demultiplexing) were all carried out by Mircosynth AG.
Data processing and analyses
We followed a traditional and rather conservative approach when sorting and clustering
sequences. Reads with a non-matching forward primer, that were shorter than 500 bp,
included more than two ambiguous sites, and/or with a mean quality score lower than 25
were disregarded using the programs Cutadapt v1.7 (Martin 2011) and PRINSEQ-lite
v0.20.4 (Schmieder and Edwards 2011). The quality-sorted reads were screened for
chimeras using the ‘uchime_ref’ command in the sequence analysis tool USEARCH
v8.0.1623 (Edgar et al. 2011), searching against the UNITE/INSDC reference database of
fungal ITS sequences (Nilsson et al. 2015) available at http://unite.ut.ee/repository.php
(accessed 26 July 2015). The sequence data were not denoised, and singleton sequences
were included to better understand the artifactual and biological sequence variation. The
remaining reads for each sample were sorted by length, then clustered with USEARCH at
a 95% similarity threshold using the ‘centroids’ function in the UCLUST algorithm
(Edgar 2010). This approach assumes that the longest read of a cluster is the most
appropriate and assures that the longest fragment of the full ITS marker will be used for
downstream analyses. The less stringent clustering was opted for our data to compensate
Page 8 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
9
for a high sequence error rate in the 454 system and to better reflect biological units,
based on preliminary analyses where the suggested 97% threshold (see Blaalid et al.
2013) was applied. The resulting centroid sequences were compared against the NCBI
nucleotide database (Coordinators 2013) using the ‘blastn’ algorithm to obtain their
initial taxonomic affiliations.
Even though the Roche GS FLX+ system is capable of sequencing fragments up to 1000
bp long, the average read length remains between 500-600 bp (Erguner et al. 2015). The
quality of the barcodes becomes an issue towards the end of the read as the base-call
quality drops, and the further away from the start, the more ambiguous the base calling
becomes (Fig. 1). The full ITS region in fungi has an average length of 500 bp in
ascomycetes (Porter and Golding 2011). Based on the available sequences of the studied
fungi, the barcoded ITS region is usually more than 550 bp long, and depending on the
presence and length of the group I intron, it can be more than 1,000 bp long.
Consequently, ITS barcodes of more than 500-600 bp get fewer full-length target reads
with low-quality sequence ends. Furthermore, various sequencing and PCR errors, such
as homopolymer indel errors, CAFIE errors, and chimeras, are frequent and not always
detected using the quality scores (reviewed in, for example, Margulies et al. 2005, Huse
et al. 2007, Gilles et al. 2011, Lücking et al. 2014b).
To obtain a more reliable species reference sequence, the reads of the target taxon that
were concordant with the quality parameters specified above were aligned using the
MAFFT v7.017 automatic algorithm (Katoh and Standley 2013) in the Geneious v7.1.6
platform (Kearse et al. 2012). To gather the relevant sequences from a pool, we used the
Page 9 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
10
cluster identifications from a previous step. Sequences within the target species (the
expected mycobiont of the observed and determined lichen species) cluster(s)—identified
as the taxon and/or closely related taxon (for closely related species groups) using the
GenBank BLAST function—were gathered and aligned together with reference
sequences from GenBank. Clear outliers in the alignment were removed, the primer
binding sites were trimmed, and the remaining region—including the end of ribosomal
RNA gene 18S, ITS1, 5.8S rRNA gene, ITS2, the beginning of the 28S rRNA gene, and
in some species, also the group I intron at the end of the 18S gene—was designated as the
barcode region for the species. Nucleotide diversity was estimated for alignments with
removed reference sequences using DnaSP v5.10.1 (Librado and Rozas 2009). The
consensus sequence of this region was assigned as the barcode for this species. The
barcode and the representative centroid sequence—the longest read of the biggest cluster
of the taxon—were aligned using the same alignment function in MAFFT, to estimate
sequence similarity and the quantity of indels and nucleotide differences. Using the
consensus sequence approach allowed us to ignore much of the random noise within
sequences, but it is dependent on the accuracy of a given alignment. Alternatively to
MAFFT, we applied the PaPaRa alignment (Berger and Stamatakis 2011) to better
account for erroneous insertions. Finally, however, PaPaRa alignments were not used due
to multiple reasons, discussed and exemplified in the online supplementary Data S1.
We tested the identification of the generated barcodes using the NCBI nucleotide
database, currently the most complete database for lichen-forming fungi. The barcodes
were compared against the database using the ‘megaBLAST’ function in GenBank
Page 10 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
11
(http://www.ncbi.nlm.nih.gov/genbank/; Accessed 11 January 2015; Madden 2002). To
evaluate the success of specimen identification using DNA barcoding we only counted
the identity of the best match in GenBank BLAST, ignoring the sequence similarity since
no singe meaningful threshold can be applied over the species set, and for many of the
taxa, too little or no information is known for accurate threshold estimation. We
compared the morphological species determination with the best match (= with the
highest bit score) from the barcode sequence megaBLAST search and recorded for each
specimen if the identification via DNA barcoding resulted in concordant results at the
species or the genus level (= correct species / = correct genus but different species or
species not present in the database), divergent (= different genus and species), or neutral
results (= unidentified or uncultured fungus). In cases where the hypothesized identity
differed from the GenBank match, or the similarity between the GenBank sequences and
our sequence was less than 97% (following Blaalid et al. 2013), a phylogeny-based
identification was used. To achieve this, available closely-related representative ITS
sequences of the same genus or species group (depending on the known information of
the species phylogeny) were aligned with the barcode using the program MAFFT v7
(Katoh and Standley 2013), where the G-INS-i alignment algorithm (Katoh and Toh
2008) with ‘1PAM / K=2’ scoring matrix was set. The alignments of ITS sequences were
analyzed using a maximum likelihood (ML) criterion in the program RAxML v7.3.1
(Stamatakis 2006) where the evolutionary model was set to GTRGAMMA, and node
support was assessed using 1,000 "fastbootstrap" replicates (Stamatakis et al. 2008).
Page 11 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
12
Where distinct nucleotide variation—possibly representing intragenomic variation of ITS
sequences or a mixture of individuals of the same or closely related species—was noted,
the target species alignments were studied carefully, and different ITS version
frequencies were counted. In cases where (a) less-dominant version(s) of ITS was/were
higher in frequency than 10% of the target reads, consensus barcode(s) for the less-
dominant versions were also generated. The most frequent barcode version was assigned
as the primary barcode for the specimen. The different barcode versions were aligned
using the MAFFT v7.017 automatic algorithm (Katoh and Standley 2013), and sequence
variation, nucleotide diversity, and distances (p-distances, with indels treated as missing
data with pairwise deletion) were estimated using DnaSP v5.10.1 (Librado and Rozas
2009) and MEGA5 (Tamura et al. 2011). More distant barcodes of a sample—possibly
resulting from a mixture of closely related species—were aligned with closely related
representative ITS sequences (for the species list and references see Table S1 in online
supplementary material), and phylogenetic trees were constructed using the ML approach
executed with RAxML v7.3.1 as described above. In some samples, unalignable, up to
ten-bp-long regions were detected. Since these regions were usually within or after GC-
rich areas, we suspected a problematic PCR accompanied by CAFIE errors, rather than
genuine mutations (Dutton et al. 1993). A known issue for GC-rich islands is knot
formations in the RNA fold. The RNA secondary structure for such problematic regions
was estimated using RNAstructure web servers (available at:
http://rna.urmc.rochester.edu/RNAstructureWeb; accessed 18 November 2015) utilizing
the default parameters (Bellaousov et al. 2013).
Page 12 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
13
Results
The sequencing of 100 lichens resulted in 128,449 reads. The number of reads per sample
varied between 207 and 5,473, with an average of 1,285. A general overview of
sequencing reads is given in Fig. 1. Through quality sorting and chimera checking, on
average 21% of the reads were removed, varying from 9% to 57% among samples.
Clustering of reads at 95% resulted in an average of 88 clusters per sample (minimum –
mean±standard deviation – maximum; 5 – 88±57 – 231). Despite conservative quality
sorting, high nucleotide variation per sample was present in the data. While a high
proportion of variation resulted from contaminant fungi within samples, significant
nucleotide variation within the target taxon is also present (Table 2). The mean number of
clusters per target species was 8 (0 – 8±7 – 32), and the average nucleotide diversity
within a target was 0.01128±0.01862. The pairwise identity between the consensus
sequence (barcode) and centroid sequence of the biggest cluster was on average 98.5%,
with 9.65 as the mean number of differences per sequence pair. Comparing centroid
sequences with consensus barcodes revealed a mean difference of 1.46%. About 90% of
the differences were insertions (69%) and deletions (21%) that were especially prominent
towards the end of the sequences, while the rest were nucleotide substitutions or CAFIE
errors.
From one hundred samples, we were able to recover the fungal ITS sequence of the
lichen mycobiont for all sequenced species except one. Fuscidea arboricola sequences
Page 13 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
14
were not recovered in its sequence pool. Full barcodes, including the end of 18S rRNA,
ITS1, 5.8S rRNA, ITS2, and the beginning 28S rRNA, were generated for all species,
except for the barcode of Anaptychia crinalis, which remained incomplete, with a partly
missing ITS1 sequence due to two long group I introns towards the end of 18S rDNA
(altogether 512 bp). To recover the end of the barcode, chimeric sequences without the
introns were used. Introns at the end of 18S rDNA were identified in 37 target taxa
(Table 1). The length of the intron varied from 60 to 594 bp. The primary barcode
sequences have been deposited in the NCBI and BOLD databases under the accession
codes as indicated in the Table 1. All project data—specimen voucher info, photographs,
the complete list of the barcode sequences (including the alternative versions), alignment
matrices, phylogenetic trees—can be accessed though the PlutoF cloud database under a
project “Barcoding Swiss lichens using 454 pyrosequencing”
(https://plutof.ut.ee/#/study/view/31890).
The barcodes from the targeted lichenized fungi were, in most cases, morphologically
and molecularly identified to the same species (n=69) or at least to the same genus (n=18)
using the NCBI nucleotide database as reference (Fig. 2; for details see online
supplementary material Table S2). Nine of the species correctly identified using
GenBank BLAST showed less than 97% similarity, which could indicate wider ITS
genetic variation of the species or even the presence of cryptic species. ITS sequences of
14 of the studied species were previously not available in GenBank, this being one of the
reasons for imprecise identifications. For eight species, the best match in GenBank was
Page 14 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
15
an “uncultured fungus”, originating from environmental metabarcoding studies where
morphology-based identifications and taxonomic assignments are often not performed.
Nine samples were identified to the same genus but to a different species than that based
on morphology, and one sample to a different genus based on a high identity score
(≥97%). These species were: Anaptychia crinalis (identified as Anaptychia ciliaris,
98.0%), Bryoria capillaris (identified as Bryoria fuscescens, 98.8%), Lecanora impudens
(identified as Lecanora allophana, 99.2%), Loxospora elatina (identified as Loxospora
ochrophaea, 99.7%), Parmelia sulcata (identified as Myelochroa auruleanta, 99.7%),
Parmotrema perlatum (identified as Parmotrema crinitum, 98.7%), and Usnea barbata
and U. substerilis (identified as Usnea intermedia/rigida, >98.0%). We found that these
misidentifications had three main causes – (a) labelling or identification mistakes in the
NCBI nucleotide database, as in the case of Parmelia sulcata; (b) the incomplete
reference database with missing species or partial sequences, as in the case of Anaptychia
crinalis; (c) biological reasons, such as low genetic variation in the ITS region (e.g.,
Bryoria capillaris, Parmotrema perlatum, Usnea barbata, and U. substerilis).
Besides the expected mycobiont, many other fungi were recovered within our samples.
For 22 samples, the most-sequenced fungus was not the expected mycobiont but another
fungus instead. Many of these fungi were identified as lichen-associated (facultative
parasites/lichenicolous) or plant-associated (epi-or endophytes). However, the lichen-
associated fungal diversity within our sequenced lichens is not in the scope of the present
study and will not be discussed further.
Page 15 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
16
We studied the nucleotide variation of samples where (a) less frequent version(s) of ITS
constituted more than 10% of the expected mycobiont reads. Multiple barcodes per
species were generated for 22 samples (Table 3; for the full list of barcodes of such
samples refer to the online supplementary Table S3). The number of generated barcodes,
and distances between them, varied considerably from species to species. Barcode
versions with only indels as differences showed zero distance (Buellia arborea, Cladonia
digitata, Hypotrachyna laevigata). However, since the indels in these species were
present in the non-coding regions (group I intron, ITS1, or ITS2), they cannot be
completely ruled out as genuine mutations. In Lecanora subcarpinea, Lepraria lobificans,
and Menegazzia terebrata, a poorly alignable, up to 10-bp-long region was detected in
ITS1. These regions were positioned within or at the end of sequence regions with high G
+ C content (GC%), where formations of strong knots in RNA folds are likely. In
Lecanora subcaripinea, a region of six ambiguous bases occurs at the end of a 38-bp
region of 79.5% GC content. In Lepraria lobificans, five ambiguous bases occur at the
end of a 44-bp region with a GC% of 82.6%. In Menegazzia terebrata, in a 60-bp region,
with a GC% of 86.0%, an ambiguous region of nine bases occurs within the sequencing
reads. RNA secondary structure analyses revealed that the sequences in these areas form
a secondary structure knot with probabilities of >95% for Lecanora, >99% for Lepraria,
and >99% for Menegazzia. Online supplementary Figure S1 shows, by way of example,
the ITS1 RNA secondary structure of the primary barcode version and ambiguous bases
of this region in Menegazzia terebrata barcodes.
Page 16 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
17
The dominant ITS version of Physica adscendens is highly similar to sequences of this
species (99%; online supplementary material Table S2), while the second, less-frequent
version includes a high number of mostly unidirectional transitional mutations and is of
low similarity to Physcia adscendens sequences (87%). However, no better alignment
with any other sequence in GenBank was found. Altogether, 92 transitional mutations are
present among the two ITS versions (4 in 18S, 32 in the group I intron, 22 in ITS1, 9 in
5.8S, 18 in ITS2, and 7 in 28S). These mutations are almost exclusively unidirectional,
being mainly G⇒A and C⇒T.
Besides artificial nucleotide variation caused by PCR/sequencing errors, nucleotide
variation can occur for biological reasons, such as: (1) incomplete concerted evolution
within the ITS and (2) intraspecific variation. For 13 species, distinct nucleotide variation
could not be explained by PCR or CAFIE errors only (Table 3). By comparing the
variation within the reads of the mycobiont with available Sanger sequences from
GenBank of the same species (where available), we also found the same variable bases in
Sanger sequences in eight species. Therefore, it seems probable that intragenomic and/or
intra-mycelial (i.e. allelic heterozygosity) variation is present at least in the following
samples: #8 Bryoria capillaris, #55 Lepraria rigidula, #64 Melanohalea exasperata, #70
Parmelia sulcata, #79 Physcia adscendens, #95 Usnea intermedia, and #96 Usnea
lapponica. For eight species, such comparisons could not be made due to a lack of Sanger
sequences. Sequence variation possibly resulting from intragenomic or intra-mycelial
variation were additionally detected in #6 Bacidina arnoldiana, #12 Chaenotheca cf.
stemonea, #19 Fellhanera bouteillei, #30 Lecania cf. cyrtella, #50 Lecidella scabra, and
Page 17 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
18
#51 Lecidella sp., of which the first three are seemingly also accompanied with
intraspecific variation (Fig. 3.-5.). The number of barcodes, however, might not
necessarily represent the true intragenomic variation of a species. For example, seven
variable nucleotide positions were found in Usnea intermedia with no clear segregation
into distinct ITS types. In ITS1, three mutation positions (G⇔C, T⇔C, T⇔C) generated
three types, while in ITS2, four mutation positions (T⇔C, G<⇔A, A⇔C, T⇔A)
generated four types, of which three are dominant. These types were combined with each,
generating seven different barcode versions. Incomplete concerted evolution within ITS
is therefore possible. However, inflation of ITS types should also be considered due to
recombination or chimera formation between different ITS1 and ITS2 versions. The more
similar the parents, the harder it is to detect and identify chimeras. Therefore, it could just
as well be possible that fewer than seven different ITS versions are natural, while others
are artificial formations. The number of barcodes can also be inflated by systematic
CAFIE errors, which we have tried to account for when interpreting the reasons of base
variation of some species (e.g. #69 Parmelia saxatilis). For the full list of the alternative
barcodes and their hypothetical origin refer to the online supplementary Table S3.
The highest number of variable sites, as well as the greatest genetic distance between
barcodes, occurred in samples where multiple distinct lineages, probably representing
different species, were identified in the pool. These samples were crustose species
Bacidina arnoldiana aggr., Chaenotheca cf. stemonea, and Fellhanera bouteillei.
In Bacidina arnoldiana aggr. (sample #6), five different ITS versions from two
closely related Bacidina arnoldiana clades were identified (Fig. 4). The morphologically
Page 18 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
19
homogenous, yet variable species B. arnoldiana is therefore polyphyletic, comprising in
itself multiple lineages and possibly other taxa with unresolved taxonomy. The five ITS
versions from Bacidina arnoldiana belong in two major clusters, ‘B. arnoldiana 1’ and
‘B. arnoldiana 3’, and group together with other B. arnoldiana sequences. ‘B. arnoldiana
2’ includes only GenBank data and is more closely related to ‘B. arnoldiana 3’. Four ITS
versions cluster within ‘B. arnoldiana 1’, of which two are dominant, A1 and B1. These
are also almost equally present in the sequence pool (A1 in 65 reads, 42.6%; B1 in 45
reads, 41.7%). The subversions A1 and B1 have a pairwise identity of only 93.8%, and
based on genetic distances, clade ‘B. arnoldiana 1’ could therefore include cryptic
species. However, no final conclusions can be made of our limited sample size and data
based on ITS markers only.
In Chaenotheca cf. stemonea (sample #12), sequences from two Chaenotheca
“species” were present (Fig. 5) – C. cf. stemonea and C. trichialis-xyloxena group. The
first includes four highly similar (>98%) barcodes differing in multiple mutations and in
intron presence/absence towards the end of the 18S rDNA. The Chaenotheca trichialis-
xylogena group includes six barcodes that cluster further into three groups: versions B
together with a subgroup of C. trichialis, version A1 clusters within C. xyloxena, and
version A2 with a second subgroup of C. trichialis at the base of the clade. The distances
within subversions B are low (pairwise identity >98%), while between subversions B, A1,
and A2, distances are larger (pairwise identities between 96–97%).
Fellhanera bouteillei (sample #19) includes three ITS versions, of which two
highly similar types (identity 99%), A1 and A2, are dominant, while the third, less-
frequent version B is only present in three sequences. The less-dominant version is,
Page 19 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
20
however, significant as it represents the clade of Fellhanera beouteillei s.str. together
with sequences available in GenBank (Fig. 6). The dominant types A1 and A2 cluster
distantly outside of Fellhanera bouteillei and could represent a different species (pairwise
identity between barcode versions A1 and B is 89%).
Discussion
Barcoding lichens
In the scope of this study, we ‘barcoded’ 99 out of 100 lichen mycobiont species using
454 pyrosequencing. Good success in generating barcodes shows the high potential of
this method, especially concerning crustose lichens, which can be difficult to sequence
with the Sanger technique. To our knowledge, this is the first attempt to apply
pyrosequencing to lichen barcoding to such an extent.
The success of DNA-based identification depends greatly on the reference database, both
in the quality and quantity of the barcodes included. Purely DNA-based identification of
lichens—without considering morphological and chemical characters—is far from reach,
especially regarding crustose lichens and samples from poorly-investigated regions due to
missing references and unknown diversity (Orock et al. 2012). Within the present study,
we generated ITS barcodes for 12 described species whose sequences were previously
not available in GenBank and for two morphologically newly characterized specimens
(#41 Lecanora sp., #51 Lecidella sp.). Multiple species showed low intraspecific ITS
Page 20 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
21
sequence similarity to other available sequences in GenBank (e.g. #10 Bunodophoron
melanocarpum, #12 Chaenotheca cf. stemonea, #19 Fellhanera bouteillei, #30 Lecania cf.
cyrtella, #66 Micarea cinerea) or a high similarity to a different taxon (e.g. #8 Bryoria
capillaris, #13−16 Cladonia species, #36 Lecanora impudens, #47 Lecidella cf.
elaeochroma, #47 Lecidella cf. leprothalla, #58/59 Loxospora elatina, #74 Parmotrema
perlatum, #91/92 Usnea barbata; refer to the Table S2 and to the project in PlutoF for
further information). Before specimen identification via DNA barcoding can be
confidently applied, taxonomic studies in these groups are needed to confirm the species
monophyly or, in cases of non-monophyly, propose segregation of a taxon into multiple
species. However, our results also suggest that the NCBI nucleotide database, currently
the most complete database for lichen-forming fungi, could be used as a reference
database to identify common species, as the majority of the analyzed lichens were
identified correctly to the species or, at least, to the genus level.
When generating barcodes, we applied a rather conservative approach by generating
consensus sequences from target taxon alignments, rather than using centroid sequences.
In most cases, centroid reads were sufficiently accurate for correct specimen
identification to a species level. However, when producing a barcode database, even a
few errors can cause systemic mistakes in future identifications. Even though it is more
time consuming, this approach helps to better identify and avoid mistakes, such as those
caused by sequencing, undetected chimeras, and multiple ITS copies. However, the
propagation of errors cannot be avoided completely; they can originate directly from PCR
Page 21 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
22
amplification or be hindered by systematic sequencing errors, independent from data
processing methods.
Reasons for nucleotide variation
The source of nucleotide variation within a species can be the PCR, the sequencing, or be
of biological origin. While the first two sources generate artifactual variation, biological
origins represent genuine mutations.
We identified three main sources of artifactual variation in our samples.
(1) Chimeras are hybrid products of PCR formed by different template sequences.
Undetected chimeras lead to overestimated richness, artificial entities, and incorrect
taxonomic identifications. It has been shown that chimera formation is especially strong
in regions with alternating patterns between conserved and less-conserved regions, such
as 16S rRNA genes (Shin et al. 2014). In the ITS barcodes—including conserved 18S,
5.8S, and 28S rRNA exons, alternating with variable ITS1 and ITS2 regions—frequent
chimeras with switched ITS1 or ITS2 regions were observed (data not shown). Reads
with group I intron towards the end of 18S rRNA gene often included chimeras where the
intron had been excluded. Longer templates require longer extension times, which can
contribute to chimera formation (Haas et al. 2011, Shin et al. 2014). Chimera detection is
reasonably applicable in a closed system, where recombining gene variants are known.
While the vast majority of fungal diversity is still unknown, chimeras should be
considered as a real issue in diversity analyses where multiple templates are present in the
pool (Sharifian 2010).
Page 22 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
23
(2) High content of the nucleotides guanine and cytosine can impede
amplification and result in low-quality sequencing with ambiguous bases. This difficulty
is associated with poor denaturation of regions with high GC concentration, while
template secondary structure hinders the access of PCR primers and enzyme work at
elongation (Dutton et al. 1993). Even though high-temperature denaturation (98°C) and
high-fidelity heat-stable polymerase were used, the formation of strong knots in RNA
structure that impede correct amplification cannot be ruled out and might have affected
the ambiguous regions in the species Lecanora subcarpinea, Lepraria lobificans, and
Menegazzia terebrata.
(3) 454 pyrosequencing is notorious for its high indel error rate in homopolymeric
regions and carry-forward-incomplete-extension (CAFIE) errors (Margulies et al. 2005,
Huse et al. 2007, Gilles et al. 2011). Incomplete extension creates shorter homopolymers
due to insufficient dNTPs. Carry-forward errors are the result of incomplete dNTP
flushing, which then causes a nucleotide from the end of the homoploymer to be read
some bases later (e.g. a true sequence AAAATCG, can be read as AAATCGA). A
comparison between the consensus barcodes and a representative read (the centroid
sequence of the biggest cluster) showed a mean difference of 1.46%. About 90% of the
errors were indels. These were most prominent in homopolymeric regions and were
especially frequent towards the end of the sequence, while the rest were nucleotide
substitutions. Considering that in closely related species complexes, less than 2%
variation could already result in a different identification, the necessity to reliably
separate artificial from genuine variation becomes clear.
Page 23 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
24
Nucleotide variation in the ITS can represent genuine mutations when a species is
characterized by incomplete concerted evolution (Feliner and Rosselló 2007). We have
highlighted 12 species in Table 3 where detected nucleotide variation could represent
intragenomic variation within ITS, or alternatively, allelic heterozygosity caused by
differing nuclei within a single mycelium (Huang et al. 2010, Hyde et al. 2013). The
latter is more likely when only two primary allele variants in approximatly equal ratios
are observed (Linder et al. 2013). Two more or less equally represented ITS paralogs
were found in #1 Alectoria sarmentosa, #6 Bacidina arnoldiana, #8 Bryoria capillaris,
#43 Lecanora subcarpinea, and #70 Parmelia sulcata, while one dominant ITS version
together with relatively rare haplotypes was found in #5 Bacidia vermifera, #12
Chaenotheca cf. stemonea, #19 Fellhanera bouteillei, #30 Lecania cf. cyrtella, #50
Lecidella scabra, #51 Lecidella sp., #55 Lepraria rigidula, #64 Melanohalea exasperata,
#95 Usnea intermedia, and #96 Usnea lapponica. The latter list includes representatives
from a closely related species group in the genus Usnea (sect. Usnea; Mark et al. 2016),
where low genetic variation and possible rapid radiation of species have been
characterized (Truong et al. 2013, Kraichak et al. 2015,). However, four other Usnea
specimens from this closely related group were also included in our study, where such
nucleotide variation was not detected (#91/92 U. barbata, #94 U. intermedia, and #97 U.
substerilis). Whether or not intragenomic variation can be detected via pyrosequencing
likely also depends on the random effects of PCR and the proportions of different ITS
copies in a genome.
Xu et al. (2009) demonstrated a high level of intra-individual polymorphism in rDNA,
including multiple functional genes, putative pseudo genes, and recombinants. They also
Page 24 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
25
found a systematic bias in the efficiency with which different sequences were amplified.
In Physcia adscendens, a second ITS version with unidirectional transitions from
cytocine to thymine and adenine to guanine was detected. The second ITS version of this
species could represent a rare divergent ITS allele or a pseudogene, propagated to a
higher frequency due to a PCR bias. Ribosomal DNA pseudogenes are characterized by
low stability in the predicted secondary structure, a higher substitution rate, and
deamination-driven stubstitutions (Buckler et al. 1997). A large multigene family of 5S
rRNA, including pseudogenes, has also been identified in other filamentous fungi
(Margolin et al. 1998, Rooney and Ward 2005). A duplication event of the ribosomal
DNA cistron could be a plausible explanation for the detected nucleotide variation in
Physcia adscendens and is in concordance with the findings of Lücking et al. (2012).
Intraspecific variation, resulting from possible lichen chimeras or microcosms—where
diverse communities live mixed and alongside—is illustrated by the phylogenies for three
crustose species. For example, in a single sample of Bacidina, constituting a minute
quantity of lichen soredia, four distinct lineages of Bacidina were identified in a visually
inseparable community. In these species, the problem of distinguishing between intra-
and interspecific variation arises. This variation within a specimen cannot be detected in
Sanger sequencing environmental samples, and this could be one of the reasons for the
taxonomic mess in the Bacidina arnoldiana aggregate, as also illustrated in Figure 4.
Further studies with more data are needed to investigate such phenomena.
Page 25 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
26
Whatever the reason, nucleotide variation detected through 454 pyrosequencing must be
interpreted with caution, both with respect to the acquired barcode database as well as
answering biological questions. Using other, less error-prone squencing platforms and/or
combining a large number of reads per taxon could help to avoid random sequencing
errors. Sanger references can aid in distinguishing between CAFIE errors and biological
sequence variation. However, Sanger sequencing may remain insufficient in some cases,
as it is unable to detect intragenomic sequence variation (e.g. to investigate species and
genome evolution) and fails to give useable results in a mixture of templates.
Acknowledgements
We thank Jean-Claude Walser from ETH Zurich Genetic Diversity Centre (GDC) for
help with pyrosequencing data analyses and Microsynth AG for their collaboration. We
also wish to thank the editor and the reviewers for their useful comments, and Curtis
Gautschi for linguistic corrections. This study was funded by the Rectors’ Conference of
the Swiss Universities (CRUS) through the SCIEX-NMSch Scientific Exchange
Programme (project SCIEX-LiCode), FOEN – the Federal Office for the Environment
(grant to Silvia Stofer and CS), and SwissBOL (grant to CS).
References
Abarenkov, K., Henrik Nilsson, R., Larsson, K.Ä., Alexander, I.J., Eberhardt, U., Erland,
S., Høiland, K., Kjøller, R., Larsson, E., and Pennanen, T. 2010. The UNITE
Page 26 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
27
database for molecular identification of fungi – recent updates and future
perspectives. New Phytol., 186: 281–285. Available at: https://unite.ut.ee/;
accessed 24 November 2015.
Begerow, D., Nilsson, H., Unterseher, M., and Maier, W. 2010. Current state and
perspectives of fungal DNA barcoding and rapid identification procedures. Appl.
Microbiol. Biotechnol., 87: 99–108.
Bellaousov, S., Reuter, J.S., Seetin, M.G., and Mathews, D.H. 2013. RNAstructure: web
servers for RNA secondary structure prediction and analysis. Nucleic Acids Res.,
41: W471–W474.
Bergamini, A., Scheidegger, C., Stofer, S., Carvalho, P., Davey, S., Dietrich, M., Dubs,
F., Farkas, E., Groner, U., Kärkkäinen, K., Keller, C., Lökös, L., Lommi, S.,
Máguas, C., Mitchell, R., Pinho, P., Rico, V.J., Aragón, G., Truscott, A.-M.,
Wolseley, P., and Watt, A. 2005. Performance of macrolichens and lichen genera
as indicators of lichen species richness and composition. Conserv. Biol., 19:
1051–1062.
Berger, S.A. and Stamatakis, A. 2011. Aligning short reads to reference alignments and
trees. Bioinformatics 27.15: 2068−2075.
Blaalid, R., Kumar, S., Nilsson, R.H., Abarenkov, K., Kirk, P.M., Kauserud, H. 2013.
ITS1 versus ITS2 as DNA metabarcodes for fungi. Mol. Ecol. Resour., 13: 218–
224.
Buckler, E.S., Ippolito, A., and Holtsford, T.P. 1997. The evolution of ribosomal DNA
divergent paralogues and phylogenetic implications. Genetics, 145: 821–832.
Page 27 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
28
Clerc, P. and Truong, C. 2012. Catalogue des lichens de Suisse. Version 2.0, 11.06.2012.
Available at: http://www.ville-ge.ch/musinfo/bd/cjb/cataloguelichen.
Coordinators, N.R. 2013. Database resources of the National Center for Biotechnology
Information. Nucleic Acids Res., 41: D8–D20. Available at:
http://www.ncbi.nlm.nih.gov/genbank/; Accessed 28 September 2015.
Culberson, C.F. and Ammann, K. 1979. Standardmethode zur
Dünnschichtchromatographie von Flechtensubstanzen. Herzogia, 5: 1–24.
Divakar, P.K., Leavitt, S.D., Molina, M.C., Del-Prado, R., Lumbsch, H.T., and Crespo, A.
2016. A DNA barcoding approach for identification of hidden diversity in
Parmeliaceae (Ascomycota): Parmelia sensu stricto as a case study. Bot. J. Linn.
Soc., 180.1: 21–29.
Dutton, C.M., Paynton, C., and Sommer, S.S. 1993. General method for amplifying
regions of very high G + C content. Nucleic Acids Res., 21: 2953–2954.
Edgar, R.C. 2010. Search and clustering orders of magnitude faster than BLAST.
Bioinformatics, 26: 2460–2461.
Edgar, R.C., Haas, B.J., Clemente, J.C., Quince, C., and Knight, R. 2011. UCHIME
improves sensitivity and speed of chimera detection. Bioinformatics, 27: 2194–
2200.
Erguner, B., Ustek, D., and Sagiroglu, M.S. 2015. Performance comparison of Next
Generation sequencing platforms. In 37th Annual International Conference of the
IEEE, 2015, Italy: Engineering in Medicine and Biology Society (EMBC): 6453–
6456 .
Page 28 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
29
Feliner, G.N. and Rosselló, J.A. 2007. Better the devil you know? Guidelines for
insightful utilization of nrDNA ITS in species-level evolutionary studies in plants.
Mol. Phylogen. Evol., 44: 911–919.
Flück, D. 2012. DNA barcoding of Swiss epiphytic crustose lichens – a feasibility study.
M.Sc. thesis. Academic Deparment, University of Bern, Bern, Switzerland.
Gardes, M. and Bruns, T.D. 1993. ITS primers with enhanced specificity for
basidiomycetes – application to the identification of mycorrhizae and rusts. Mol.
Ecol., 2: 113–118.
Gilles, A., Meglécz, E., Pech, N., Ferreira, S., Malausa, T., and Martin, J.-F. 2011.
Accuracy and quality assessment of 454 GS-FLX Titanium pyrosequencing.
BMC Genomics, 12: 245.
Haas, B.J., Gevers, D., Earl, A.M., Feldgarden, M., Ward, D.V., Giannoukos, G., Ciulla,
D., Tabbaa, D., Highlander, S.K., and Sodergren, E. 2011. Chimeric 16S rRNA
sequence formation and detection in Sanger and 454-pyrosequenced PCR
amplicons. Genome Res., 21: 494–504.
Hodkinson, B. and Lendemer, J. 2013. Next-generation sequencing reveals sterile
crustose lichen phylogeny. Mycosphere, 4: 1028–1039.
Honegger, R. 1996. Mycobionts. In Lichen Biology. Edited by T.H. Nash III, Cambridge
University Press, Cambridge, UK.
Huang, C., J. Xu, W. Gao, Q. Chen, H. Wang, and J. Zhang. 2010. A reason for overlap
peaks in direct sequencing of rRNA gene ITS in Pleurotus nebrodensis. FEMS
Microbiol. Lett. 305:14–17.
Page 29 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
30
Huse, S.M., Huber, J.A., Morrison, H.G., Sogin, M.L., and Welch, D.M. 2007. Accuracy
and quality of massively parallel DNA pyrosequencing. Genome Biol., 8: R143.
Hyde, K.D., Udayanga, D., Manamgoda, D.S., Tedersoo, L., Larsson, E., Abarenkov, K.
Bertrand, Y.J.K., Oxelman, B., Hartmann, M., Kauserud, H., Ryberg, M.,
Kristiansson, E., Nilsson, H.R. 2013. Incorporating molecular data in fungal
systematics: a guide for aspiring researchers. Current Research in Environmental
and Applied Mycology, 3: 1–32.
Katoh, K. and Standley, D.M. 2013. MAFFT multiple sequence alignment software
version 7: improvements in performance and usability. Mol. Biol. Evol., 30: 772–
780.
Katoh, K. and Toh, H. 2008. Recent developments in the MAFFT multiple sequence
alignment program. Briefings Bioinf., 9: 286–298.
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S.,
Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Mentjies, P., and
Drummond, A. 2012. Geneious Basic: an integrated and extendable desktop
software platform for the organization and analysis of sequence data.
Bioinformatics, 28: 1647–1649.
Kelly, L.J., Hollingsworth, P.M., Coppins, B.J., Ellis, C.J., Harrold, P., Tosh, J., and Yahr,
R. 2011. DNA barcoding of lichenized fungi demonstrates high identification
success in a floristic context. New Phytol., 191: 288–300.
Kraichak, E., Divakar, P.K., Crespo, A., Leavitt, S.D., Nelsen, M.P., Lücking, R., and
Lumbsch, H.T. 2015. A tale of two hyper-diversities: diversification dynamics of
the two largest families of lichenized fungi. Sci. Rep., 5: 10028.
Page 30 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
31
Librado, P. and Rozas, J. 2009. DnaSP v5: a software for comprehensive analysis of
DNA polymorphism data. Bioinformatics, 25: 1451–1452.
Lindner, D.L., Carlsen, T., Nilsson, R.H., Davey, M., Schumacher, T. & Kauserud, H.
2013. Employing 454 amplicon pyrosequencing to reveal intragenomic
divergence in the internal transcribed spacer rDNA region in fungi. Ecology and
Evolution, 3.6: 1751–1764.
Lüking, R., Kalb, K., and Essene, A. 2012. The power of ITS: using megaphylogenies of
barcoding genes to reveal inconsistencies in taxonomic identifications of genbank
submissions. In The 7th IAL symposium: Lichens: from genome to ecosystems in
a changing world, January 2012, Bangkok, Thailand. Book of Abstracts, 3B-1-O2.
Lücking, R., Dal-Forno, M., Sikaroodi, M., Gillevet, P.M., Bungartz, F., Moncada, B.,
Yánez-Ayabaca, A., Chaves, J.L., Coca, L.F., and Lawrey, J.D. 2014a. A single
macrolichen constitutes hundreds of unrecognized species. PNAS, 111: 11091–
11096.
Lücking, R., Lawrey, J.D., Gillevet, P.M., Sikaroodi, M., Dal-Forno, M., and Berger, S.A.
2014b. Multiple ITS haplotypes in the genome of the lichenized basidiomycete
Cora inversa (Hygrophoraceae): fact or artifact? J. Mol. Evol., 78: 148–162.
Lumbsch, H.T. and Leavitt, S.D. 2011. Goodbye morphology? A paradigm shift in the
delimitation of species in lichenized fungi. Fungal Diversity, 50: 59–72.
Madden, T. 2002. The BLAST sequence analysis tool. In The NCBI Handbook. Edited
by J. McEntyre and J. Ostell, National Center for Biotechnology Information,
Bethesda, MD, USA.
Page 31 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
32
Margolin, B.S., Garrett-Engele, P.W., Stevens, J.N., Fritz, D.Y., Garrett-Engele, C.,
Metzenberg, R.L., and Selker, E.U. 1998. A methylated Neurospora 5S rRNA
pseudogene contains a transposable element inactivated by repeat-induced point
mutation. Genetics, 149: 1787–1797.
Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka,
J., Braverman, M.S., Chen, Y.-J., and Chen, Z. 2005. Genome sequencing in
microfabricated high-density picolitre reactors. Nature, 437: 376–380.
Mark, K., Saag, L., Leavitt, S.D., Will-Wolf, S., Nelsen, M.P., Tõrra, T., Saag, A.,
Randlane, T., and Lumbsch, H.T. 2016. Evaluation of traditionally circumscribed
species in the lichen-forming genus Usnea, section Usnea (Parmeliaceae,
Ascomycota) using a six-locus dataset. Organisms Diversity & Evolution, In press.
doi: 10.1007/s13127-016-0273-7.
Martin, M. 2011. Cutadapt removes adapter sequences from high-throughput sequencing
reads. EMBnet. journal, 17: 10–12.
Nash III, T.H. 2008. Lichen Biology. Cambridge University Press, Cambridge, UK.
Nilsson, R.H., Tedersoo, L., Ryberg, M., Kristiansson, E., Hartmann, M., Unterseher, M.,
M. Porter, T., Bengtsson-Palme, J., M. Walker, D., and de Sousa, F. 2015. A
comprehensive, automatically updated fungal ITS sequence dataset for reference-
based chimera control in environmental sequencing efforts. Microbes Environ.,
30: 145–150. Available at: http://unite.ut.ee/repository.php; accessed 26 July 2015.
Nimis, P.L., Scheidegger, C., and Wolseley, P.A. 2002. Monitoring with Lichens –
Monitoring Lichens. Springer Netherlands, The Netherlands.
Page 32 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
33
Orock, E.A., Leavitt, S.D., Fonge, B.A., St Clair, L.L., and Lumbsch, H.T. 2012. DNA-
based identification of lichen-forming fungi: can publicly available sequence
databases aid in lichen diversity inventories of Mount Cameroon (West Africa)?
The Lichenologist, 44: 833–839.
Porter, T.M., Golding, G.B. 2011. Are similarity- or phylogeny-based methods more
appropriate for classifying internal transcribed spacer (ITS) metagenomic
amplicons? New Phytol., 192: 775–782.
Ratnasingham, S. and Hebert, P.D. 2007. BOLD: The Barcode of Life Data System. Mol.
Ecol. Notes, 7: 355–364. Available at: www.boldsystems.org; accessed 24
November 2015.
Rooney, A.P. and Ward, T.J. 2005. Evolution of a large ribosomal RNA multigene family
in filamentous fungi: birth and death of a concerted evolution paradigm. Proc.
Natl. Acad. Sci. U. S. A., 102: 5084–5089.
Schmieder, R. and Edwards, R. 2011. Quality control and preprocessing of metagenomic
datasets. Bioinformatics, 27: 863–864.
Schoch, C.L., Seifert, K.A., Huhndorf, S., Robert, V., Spouge, J.L., Levesque, C.A.,
Chen, W., and Consortium, F.B. 2012. Nuclear ribosomal internal transcribed
spacer (ITS) region as a universal DNA barcode marker for Fungi. Proc. Natl.
Acad. Sci. U. S. A., 109: 6241–6246.
Seifert, K.A. 2009. Progress towards DNA barcoding of fungi. Mol. Ecol. Resour., 9: 83–
89.
Sharifian, H. 2010. Errors induced during PCR amplification. M.Sc. thesis. Academic
Deparment, Swiss Federal Institute of Technology, Zurich, Switzerland.
Page 33 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
34
Shin, S., Lee, T.K., Han, J.M., and Park, J. 2014. Regional effects on chimera formation
in 454 pyrosequenced amplicons from a mock community. J. Microbiol. (Seoul,
Repub. Korea), 52: 566–573.
Simon, U.K. and Weiß, M. 2008. Intragenomic variation of fungal ribosomal genes is
higher than previously thought. Mol. Biol. Evol., 25: 2251–2254.
Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic
analyses with thousands of taxa and mixed models. Bioinformatics, 22: 2688–
2690.
Stamatakis, A., Hoover, P., and Rougemont, J. 2008. A rapid bootstrap algorithm for the
RAxML Web servers. Syst. Biol., 57: 758–771.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. 2011.
MEGA5: molecular evolutionary genetics analysis using maximum likelihood,
evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol., 28:
2731–2739.
Truong, C., Divakar, P.K., Yahr, R., Crespo, A., and Clerc, P. 2013. Testing the use of
ITS rDNA and protein-coding genes in the generic and species delimitation of the
lichen genus Usnea (Parmeliaceae, Ascomycota). Mol. Phylogen. Evol., 68: 357–
372.
Velmala, S., Myllys, L., Goward, T., Holien, H., and Halonen, P. 2014. Taxonomy of
Bryoria section Implexae (Parmeliaceae, Lecanoromycetes) in North America and
Europe, based on chemical, morphological and molecular data. Ann. Bot. Fenn.,
51: 345–371.
Page 34 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
35
White, F. and James, P.W. 1985. A new guide to microchemical techniques for the
identification of lichen substances. British Lichen Society Bulletin, 57: 1–41.
White, T.J., Bruns, T., Lee, S., and Taylor, J.W. 1990. Amplification and direct
sequencing of fungal ribosomal RNA genes for phylogenetics. In PCR Protocols:
A Guide to Methods and Applications. Edited by M.A. Innis, D.H. Gelfand, J.J.
Sninsky, and T.J. White, Academic Press, New York. pp. 315–322.
Wörheide, G., Nichols, S.A., and Goldberg, J. 2004. Intragenomic variation of the rDNA
internal transcribed spacers in sponges (Phylum Porifera): implications for
phylogenetic studies. Mol. Phylogen. Evol., 33: 816–830.
Xu, J., Zhang, Q., Xu, X., Wang, Z., and Qi, J. 2009. Intragenomic variability and
pseudogenes of ribosomal DNA in stone flounder Kareius bicoloratus. Mol.
Phylogen. Evol., 52: 157–166.
Page 35 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
36
Table 1 Studied species with specimen voucher info, the primary barcode length for the sequenced ITS region, the position and length of an intron in
the barcode (if present), and the primary barcode accession numbers in the NCBI and BOLD databases.
# Target taxon Collection data (site, date, ID)
Barcode
length (bp)
Group I intron
length (bp) and
position GenBank # BOLD #
1 Alectoria sarmentosa (Ach.) Ach. CH-Bern; 06.02.2014; LC-064 579 KX132948 LIFU039-16
2 Anaptychia crinalis (Schleich. ex Schaer.)
Vězda
CH-Graubünden; 18.02.2014; LC-071 1067 512 (18-322;
332-538)
KX132954 LIFU045-16
3 Arthrosporum populorum A. Massal. CH-Valais; 06.02.2014; LC-046 586 KX132986 LIFU078-16
4 Bacidia rubella (Hoffm.) A. Massal. CH-Valais; 06.02.2014; LC-041 774 216 (26-241) KX132984 LIFU076-16
5 Bacidia vermifera (Nyl.) Th. Fr. CH-Bern; 04.07.2012; RE-515-19 578 KX132992 LIFU084-16
6 Bacidina arnoldiana aggr.* CH-Lucerne; 30.03.2011; BC-038-1 583 KX132958 LIFU049-16
7 Bacidina arnoldiana aggr.* CH-Zurich; 15.07.2011; BC-132-1 795 213 (26-238) KX132972 LIFU063-16
8 Bryoria capillaris (Ach.) Brodo & D. Hawksw. CH-Bern; 06.02.2014; LC-058 803 223 (26-248) KX132945 LIFU036-16
9 Buellia arborea Coppins & Tønsberg CH-Obwalden; 13.08.2011; BC-154-2 584 KX132975 LIFU066-16
10 Bunodophoron melanocarpum (Sw.) Wedin CH-Bern; 06.02.2014; LC-065 584 KX132949 LIFU040-16
11 Cetrelia monachorum (Zahlbr.) W. L. Culb. &
C. F. Culb.
CH-Lucerne; 22.02.2014; KM-03-07 577 KX132924 LIFU015-16
12 Chaenotheca cf. stemonea (Ach.) Müll. Arg. CH-Neuchâtel; 13.05.2011; BC-087-3 571 KX133006 LIFU098-16
13 Cladonia chlorophaea (Flörke ex Sommerf.)
Spreng.
CH-Zurich; 01.02.2014; KM-01-8 872 227 (26-252) KX132914 LIFU005-16
14 Cladonia coniocraea (Flörke) Spreng. CH-Bern; 06.02.2014; LC-068 868 226 (26-251) KX132951 LIFU042-16
Page 36 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
37
15 Cladonia digitata (L.) Hoffm. CH-Bern; 06.02.2014; LC-067 650 KX132950 LIFU041-16
16 Cladonia squamosa (Scop.) Hoffm. CH-Bern; 06.02.2014; LC-069 873 227 (26-252) KX132952 LIFU043-16
17 Evernia divaricata (L.) Ach. CH-Bern; 06.02.2014; LC-057 585 KX132944 LIFU035-16
18 Evernia prunastri (L.) Ach. CH-Graubünden; 18.02.2014; LC-070 578 KX132953 LIFU044-16
19 Fellhanera bouteillei (Desm.) Vězda CH-Lucerne; 03.2014; LC-118 609 KX132990 LIFU082-16
20 Flavoparmelia caperata (L.) Hale CH-Zurich; 15.02.2014; KM-02-02 1176 594 (26-619) KX132916 LIFU007-16
21 Frutidella pullata (Norman) Schmull CH-Bern; 17.05.2011; BC-115-2 828 232 (26-257) KX132970 LIFU061-16
22 Fuscidea arboricola Coppins & Tønsberg CH-Vaud; 10.03.2007; BC-185-5 n.a. KX132962 LIFU053-16
23 Fuscidea pusilla Tønsberg CH-St. Gallen; 20.04.2011; BC-063-2 542 1LIFU072-16
24 Hyperphyscia adglutinata (Flörke) H.
Mayrhofer & Poelt
CH-St. Gallen; 11.04.2011; BC-055-3 795 228 (26-253) KX132959 LIFU050-16
25 Hypocenomyce scalaris (Ach. ex Lilj.) M.
Choisy
CH-Valais; 06.02.2014; LC-016 564 KX132982 LIFU074-16
26 Hypogymnia farinacea Zopf CH-Bern; 06.02.2014; LC-062 580 KX132947 LIFU038-16
27 Hypogymnia physodes (L.) Nyl. CH-Valais; 06.02.2014; LC-039 810 228 (26-253) KX132937 LIFU028-16
28 Hypogymnia tubulosa (Schaer.) Hav. CH-Jura; 01.03.2014; LC-107 797 220 (26-245) KX132956 LIFU047-16
29 Hypotrachyna laevigata (Sm.) Hale CH-Lucerne; 22.02.2014; KM-03-05 582 KX132922 LIFU013-16
30 Lecania cf. cyrtella (Ach.) Th. Fr. CH-Valais; 06.02.2014; LC-023 801 223 (26-248) KX132983 LIFU075-16
31 Lecanora albella (Pers.) Ach. CH-Bern; 17.05.2011; BC-118-1 590 KX133002 LIFU094-16
32 Lecanora allophana f. sorediata Nyl. CH-Valais; 07.05.2011; BC-72-3 813 223 (26-248) KX133001 LIFU093-16
33 Lecanora argentata/subrugosa CH-Bern; 17.05.2011; BC-118-2 598 KX133003 LIFU095-16
34 Lecanora carpinea (L.) Vain. CH-Valais; 16.04.2011; BC-58-3 840 248 (26-273) KX132999 LIFU091-16
Page 37 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
38
35 Lecanora horiza (Ach.) Röhl. CH-St. Gallen; 11.04.2011; BC-54-3 591 KX132998 LIFU090-16
36 Lecanora impudens Degel. CH-St. Gallen; 11.04.2011; BC-49-1 813 223 (26-248) KX132996 LIFU088-16
37 Lecanora muralis (Schreb.) Rabenh. CH-Valais; 06.02.2014; LC-045 590 KX132985 LIFU077-16
38 Lecanora praesistens Nyl. CH-Bern; 20.05.2013; MG-076-49 833 231 (26-256) KX132991 LIFU083-16
39 Lecanora pulicaris (Pers.) Ach. CH-Jura; 01.03.2014; LC-101 831 226 (26-251) KX132989 LIFU081-16
40 Lecanora saligna (Schrad.) Zahlbr. CH-Glarus; 11.04.2011; BC-47-1 597 KX132995 LIFU087-16
41 Lecanora sp.* CH-Bern; 22.05.2011; BC-027-1 796 206 (26-231) KX133005 LIFU097-16
42 Lecanora strobilina aggr.* CH-Glarus; 11.04.2011; BC-52-1 590 KX132997 LIFU089-16
43 Lecanora subcarpinea Szatala CH-Bern; 01.03.2014; LC-090 595 KX132988 LIFU080-16
44 Lecanora varia (Hoffm.) Ach. CH-Zurich; 16.04.2011; BC-60-5 808 218 (26-243) KX133000 LIFU092-16
45 Lecidella albida Hafellner CH-St. Gallen; 20.04.2011; BC-067-3 590 KX132987 LIFU079-16
46 Lecidella cf. elaeochroma (Ach.) M. Choisy CH-Graubünden; 18.02.2014; LC-072 592 KX132966 LIFU057-16
47 Lecidella cf. leprothalla (Zahlbr.) Knoph &
Leuckert
CH-Neuchâtel; 13.05.2011; BC-077-2 599 KX132965 LIFU056-16
48 Lecidella flavosorediata (Vězda) Hertel &
Leuckert
CH-Valais; 08.05.2011; BC-074-14 599 KX132994 LIFU086-16
49 Lecidella flavosorediata (Vězda) Hertel &
Leuckert
CH-Bern; 01.03.2014; LC-091 597 KX132960 LIFU051-16
50 Lecidella scabra (Taylor) Hertel & Leuckert CH-St. Gallen; 11.04.2011; BC-055-4 595 KX132993 LIFU085-16
51 Lecidella sp.* CH-Bern; 01.03.2014; LC-088 596 KX132978 LIFU069-16
52 Lepraria elobata Tønsberg CH-Obwalden; 13.08.2011; BC-158-2 604 KX132979 LIFU070-16
53 Lepraria jackii Tønsberg CH-Obwalden; 13.08.2011; BC-156-3 601 KX132974 LIFU065-16
Page 38 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
39
54 Lepraria lobificans Nyl. CH-Zurich; 18.07.2011; BC-133-3 564 KX132961 LIFU052-16
55 Lepraria rigidula (B. de Lesd.) Tønsberg CH-Valais; 16.04.2011; BC-056-2 598 KX132973 LIFU064-16
56 Lepraria vouauxii (Hue) R. C. Harris CH-Zurich; 18.07.2011; BC-133-2 601 KX132927 LIFU018-16
57 Letharia vulpina (L.) Hue CH-Valais; 06.02.2014; LC-006 581 KX132976 LIFU067-16
58 Loxospora elatina (Ach.) A. Massal. CH-Obwalden; 13.08.2011; BC-155-5 564 KX133004 LIFU096-16
59 Loxospora elatina (Ach.) A. Massal. CH-Obwalden; 19.08.2011; BC-161-5 564 KX132969 LIFU060-16
60 Megalaria pulverea (Borrer) Hafellner & E.
Schreiner
CH-Bern; 17.05.2011; BC-113-3 569 KX132913 LIFU004-16
61 Melanelixia fuliginosa subsp. glabratula
(Lamy) J. R. Laundon
CH-Zurich; 01.02.2014; KM-01-6 807 229 (26-254) KX132939 LIFU030-16
62 Melanelixia glabra (Schaer.) O. Blanco, A.
Crespo, Divakar, Essl., D. Hawksw. & Lumbsch
CH-Valais; 06.02.2014; LC-043 580 KX132938 LIFU029-16
63 Melanelixia subargentifera (Nyl.) O. Blanco, A.
Crespo, Divakar, Essl., D. Hawksw. & Lumbsch
CH-Valais; 06.02.2014; LC-042 582 KX132935 LIFU026-16
64 Melanohalea exasperata (De Not) O. Blanco,
A. Crespo, Divakar, Essl., D. Hawksw. &
Lumbsch
CH-Valais; 06.02.2014; LC-028 801 220 (26-245) KX132943 LIFU034-16
65 Menegazzia terebrata (Hoffm.) A. Massal. CH-Bern; 06.02.2014; LC-054 583 KX132981 LIFU073-16
66 Micarea cinerea (Schaer.) Hedl. CH-Lucerne; 22.02.2014; KM-03-03 571 KX132957 LIFU048-16
67 Micarea xanthonica Coppins & Tønsberg CH-St. Gallen; 15.03.2011; BC-015-6 755 192 (18-209) KX132971 LIFU062-16
68 Mycoblastus sanguinarius (L.) Norman CH-Bern; 17.05.2011; BC-119-4 596 KX132946 LIFU037-16
69 Parmelia saxatilis (L.) Ach. CH-Bern; 06.02.2014; LC-060 583 KX132926 LIFU017-16
Page 39 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
40
70 Parmelia sulcata Taylor CH-Valais; 06.02.2014; LC-004 593 KX132912 LIFU003-16
71 Parmelina tiliacea (Hoffm.) Hale CH-Zurich; 01.02.2014; KM-01-4A 578 KX132918 LIFU009-16
72 Parmeliopsis ambigua (Wulfen) Nyl. CH-Valais; 06.02.2014; LC-002 581 KX132923 LIFU014-16
73 Parmotrema crinitum (Ach.) M. Choisy CH-Lucerne; 22.02.2014; KM-03-06 609 KX132915 LIFU006-16
74 Parmotrema perlatum (Huds.) M. Choisy CH-Zurich; 14.02.2014; KM-02-01 610 KX132941 LIFU032-16
75 Phaeophyscia ciliata (Hoffm.) Moberg CH-Valais; 06.02.2014; LC-050 998 418 (18-224;
233-443)
KX132942 LIFU033-16
76 Phaeophyscia orbicularis (Neck.) Moberg CH-Valais; 06.02.2014; LC-051 992 418 (18-224;
233-443)
KX132940 LIFU031-16
77 Phaeophyscia poeltii (Frey) Clauzade & Cl.
Roux
CH-Valais; 06.02.2014; LC-049 801 221 (26-246) KX132963 LIFU054-16
78 Phlyctis argena (Spreng.) Flot. CH-St. Gallen; 20.04.2011; BC-065-1 924 351 (18-368) KX132910 LIFU001-16
79 Physcia adscendens (Fr.) H. Olivier CH-Zurich; 01.02.2014; KM-01-2B 784 222 (26-247) KX132934 LIFU025-16
80 Physcia aipolia (Ehrh. ex Humb.) Fürnr. CH-Valais; 06.02.2014; LC-026 783 216 (26-241) KX132933 LIFU024-16
81 Physconia distorta (With.) J. R. Laundon CH-Valais; 06.02.2014; LC-021 972 402 (18-217;
226-427)
KX132967 LIFU058-16
82 Placynthiella dasaea (Stirt.) Tønsberg CH-Neuchâtel; 13.05.2011; BC-084-1 900 335 (26-360) KX132911 LIFU002-16
83 Pleurosticta acetabulum (Neck.) Elix &
Lumbsch
CH-Zurich; 01.02.2014; KM-01-3A 577 KX132925 LIFU016-16
84 Pseudevernia furfuracea (L.) Zopf CH-Valais; 06.02.2014; LC-001 574 KX132917 LIFU008-16
85 Punctelia jeckeri (Roum.) Kalb CH-Zurich; 15.02.2014; KM-02-03 901 336 (26-361) KX132977 LIFU068-16
86 Pycnora sorophora (Vain.) Hafellner CH-Obwalden; 13.08.2011; BC-156-1 565 KX132955 LIFU046-16
Page 40 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
41
Note: The morphology and chemistry of the undescribed taxa marked with * are discussed in text. 1No sequence, specimen data only.
87 Ramalina pollinaria (Westr.) Ach. CH-Jura; 01.03.2014; LC-093 770 215 (26-240) KX133007 LIFU099-16
88 Scoliciosporum umbrinum (Ach.) Arnold CH-Valais; 06.02.2014; LC-024 598 KX133008 LIFU100-16
89 Scoliciosporum umbrinum (Ach.) Arnold CH-Bern; 16.05.2013; MG-114-3C 598 KX132936 LIFU027-16
90 Tuckermannopsis chlorophylla (Willd.) Hale CH-Valais; 06.02.2014; LC-032 560 KX132929 LIFU020-16
91 Usnea barbata (L.) Weber ex F. H. Wigg. CH-Valais; 06.02.2014; LC-011 578 KX132932 LIFU023-16
92 Usnea barbata (L.) Weber ex F. H. Wigg. CH-Valais; 06.02.2014; LC-019 578 KX132921 LIFU012-16
93 Usnea ceratina Ach. CH-Lucerne; 22.02.2014; KM-03-04 578 KX132919 LIFU010-16
94 Usnea intermedia (A. Massal.) Jatta CH-Lucerne; 22.02.2014; KM-03-01 578 KX132920 LIFU011-16
95 Usnea intermedia (A. Massal.) Jatta CH-Lucerne; 22.02.2014; KM-03-02 578 KX132930 LIFU021-16
96 Usnea lapponica Vain. CH-Valais; 06.02.2014; LC-013 579 KX132928 LIFU019-16
97 Usnea substerilis Motyka CH-Valais; 06.02.2014; LC-009 577 KX132968 LIFU059-16
98 Violella fucata (Stirt.) T. Sprib. CH-Bern; 17.05.2011; BC-109-4 834 235 (26-260) KX132980 LIFU071-16
99 Violella fucata (Stirt.) T. Sprib. CH-Obwalden; 13.08.2011; BC-159-4 592 KX132931 LIFU022-16
100 Vulpicida pinastri (Scop.) J.-E. Mattsson & M.
J. Lai
CH-Valais; 06.02.2014; LC-015 792 216 (26-241) KX132948 LIFU039-16
Page 41 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
42
Table 2 Pyrosequencing results with nucleotide variation in the studied samples and comparison of consensus barcodes with representative sequencing
reads for each sample.
Comparison between consensus barcode and
representative centroid
# Species Reads Clusters1
Reads of target
taxon (%)
Target
clusters1
Nucleotide
diversity ππππ
Pairwise
identity (%)
No of base
differences (%) TS / TV / Ind2
1 Alectoria sarmentosa 582 28 463 (79.6) 1 0.00385 97.2 14 (2.4) 1 / 3 / 10
2 Anaptychia crinalis 342 70 52 (15.2) 3 0 98.6 9 (1.4) 0 / 0 / 9
3 Arthrosporum populorum 2937 78 2499 (85.1) 24 0.00433 98.0 12 (2.0) 0 / 0 / 12
4 Bacidia rubella 2194 38 2167 (98.8) 24 0.00643 98.6 11 (1.4) 0 / 0 / 11
5 Bacidia vermifera 4052 195 1813 (44.7) 24 0.00865 97.6 14 (2.4) 0 / 0 / 14
6 Bacidina arnoldiana aggr. 1715 231 116 (6.8) 8 0.03161 98.6 8 (1.4) 0 / 1 / 7
7 Bacidina arnoldiana aggr. 1406 185 100 (7.1) 4 0.01275 99.0 8 (1.0) 0 / 0 / 8
8 Bryoria capillaris 636 45 424 (66.7) 4 0.00811 98.5 12 (0.5) 2 / 2 / 8
9 Buellia arborea 891 98 281 (31.5) 8 0.00902 97.6 14 (2.4) 0 / 0 / 14
10 Bunodophoron melanocarpum 128 19 82 (64.1) 3 0.02226 98.0 12 (2.0) 0 / 0 / 12
11 Cetrelia monachorum 1903 93 1644 (86.4) 32 0.00487 97.6 14 (2.4) 0 / 1 / 13
12 Chaenotheca cf. stemonea 542 94 156 (28.8) 8 0.08548 99.3 4 (0.7) 2 / 0 / 2
13 Cladonia chlorophaea 1159 131 485 (41.8) 1 0.00099 98.9 10 (1.1) 0 / 0 / 10
14 Cladonia coniocraea 425 52 241 (56.7) 1 0.00274 98.9 10 (1.2) 0 / 0 / 10
15 Cladonia digitata 718 69 582 (81.1) 6 0.00195 97.5 16 (2.5) 0 / 2 / 14
Page 42 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
43
16 Cladonia squamosa 291 82 159 (54.6) 1 0.0022 98.9 10 (1.1) 1 / 0 / 9
17 Evernia divaricata 472 48 366 (77.5) 5 0.0014 99.2 5 (0.8) 0 / 0 / 5
18 Evernia prunastri 382 55 112 (29.3) 8 0.00118 99.0 6 (1.0) 0 / 0 / 6
19 Fellhanera bouteillei 422 78 159 (37.7) 12 0.02398 99.7 2 (0.3) 0 / 0 / 2
20 Flavoparmelia caperata 553 109 42 (7.6) 9 0.00442 97.9 19 (2.1) 0 / 1 / 18
21 Frutidella pullata 841 161 31 (3.7) 3 0.00283 98.9 9 (1.1) 1 / 0 / 8
22 Fuscidea arboricola 1369 180 0 0 - - - - / - / -
23 Fuscidea pusilla 1568 165 411 (26.2) 3 0.00409 99.6 2 (0.4) 1 / 0 / 1
24 Hyperphyscia adglutinata 1433 218 198 (13.8) 5 0.00389 99.5 4 (0.5) 0 / 0 / 4
25 Hypocenomyce scalaris 1446 34 1349 (93.3) 8 0.00467 98.8 7 (1.2) 1 / 2 / 4
26 Hypogymnia farinacea 752 33 621 (82.6) 4 0.00097 97.0 18 (3.0) 0 / 1 / 17
27 Hypogymnia physodes 806 57 615 (76.3) 2 0.00352 97.7 19 (2.3) 2 / 1 / 16
28 Hypogymnia tubulosa 716 43 532 (74.4) 2 0.00219 98.0 16 (2.0) 0 / 0 / 16
29 Hypotrachyna laevigata 1137 92 867 (76.3) 16 0.00145 99.1 5 (0.9) 0 / 0 / 5
30 Lecania cf. cyrtella 728 78 413 (56.7) 7 0.02063 98.0 16 (2.0) 0 / 0 / 16
31 Lecanora albella 267 11 12 (4.5) 3 0.11882 77.7 118 (19.9)* 26 / 47 / 45
32 Lecanora allophana f. sorediata 1717 19 1397 (81.4) 3 0.00643 99.5 4 (0.5) 0 / 0 / 4
33 Lecanora argentata/subrugosa 625 24 553 (88.5) 13 0.01542 98.8 7 (1.2) 0 / 0 / 7
34 Lecanora carpinea 224 6 187 (83.5) 2 0.02157 99.5 4 (0.5) 0 / 0 / 4
35 Lecanora horiza 963 9 963 (100) 9 0.0019 98.5 9 (1.5) 1 / 0 / 8
36 Lecanora impudens 221 14 186 (84.2) 7 0.00799 99.4 5 (0.6) 1 / 1 / 3
Page 43 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
44
37 Lecanora muralis 1231 97 737 (59.9) 22 0.00641 99.3 4 (0.7) 0 / 0 / 4
38 Lecanora praesistens 773 19 127 (16.4) 1 0.00532 99.1 7 (0.9) 0 / 0 / 7
39 Lecanora pulicaris 379 10 347 (91.6) 2 0.01427 99.0 8 (1.0) 0 / 0 / 8
40 Lecanora saligna 298 29 209 (70.1) 12 0.00692 99.0 6 (1.0) 0 / 0 / 6
41 Lecanora sp. 868 124 104 (12) 4 0.03276 99.0 7 (1.0) 3 / 0 / 4
42 Lecanora strobilina aggr. 269 11 247 (91.9) 3 0.01246 98.5 9 (1.5) 0 / 0 / 9
43 Lecanora subcarpinea 1551 14 1390 (89.6) 7 0.00754 98.0 12 (2.0) 2 / 5 / 5
44 Lecanora varia 430 5 404 (94) 2 0.00386 98.9 9 (1.1) 2 / 0 / 7
45 Lecidella albida 1096 163 247 (22.5) 4 0.01403 97.7 14 (2.3) 0 / 1 / 13
46 Lecidella cf. elaeochroma 1215 58 1152 (72.7) 8 0.00403 99.2 5 (0.8) 1 / 0 / 4
47 Lecidella cf. leprothalla 1585 133 254 (20.9) 5 0.00127 99.3 4 (0.7) 3 / 1 / 0
48 Lecidella flavosorediata 634 59 644 (69.1) 4 0.0462 98.7 8 (1.3) 0 / 0 / 8
49 Lecidella flavosorediata 932 76 214 (33.8) 2 0.00199 99.3 4 (0.7) 0 / 0 / 4
50 Lecidella scabra 2186 185 337 (15.4) 13 0.00193 97.7 14 (2.3) 2 / 0 / 12
51 Lecidella sp. 331 39 160 (48.3) 14 0.01223 99.7 2 (0.3) 0 / 0 / 2
52 Lepraria elobata 1134 194 197 (17.4) 22 0.0049 98.2 11 (1.8) 1 / 0 / 10
53 Lepraria jackii 960 166 161 (16.8) 18 0.00537 99.2 5 (0.8) 0 / 0 / 5
54 Lepraria lobificans 1021 212 349 (34.2) 16 0.00497 99.3 4 (0.7) 1 / 0 / 3
55 Lepraria rigidula 704 184 144 (20.5) 13 0.01333 99.2 5 (0.8) 0 / 0 / 5
56 Lepraria vouauxii 744 90 312 (41.9) 7 0.00822 98.8 7 (1.2) 0 / 0 / 7
57 Letharia vulpina 621 19 574 (92.4) 7 0.0011 98.1 11 (1.9) 0 / 0 / 11
Page 44 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
45
58 Loxospora elatina 722 90 291 (40.3) 13 0.01002 98.9 6 (1.1) 0 / 0 / 6
59 Loxospora elatina 3077 60 2713 (88.2) 23 0.00785 99.1 5 (0.9) 0 / 0 / 5
60 Megalaria pulverea 1516 198 290 (19.1) 9 0.01208 99.6 2 (0.4) 0 / 0 / 2
61 Melanelixia fuliginosa subsp.
glabratula
975 163 294 (30.2) 4 0.00344 97.7 19 (2.3) 0 / 0 / 19
62 Melanelixia glabra 890 96 476 (53.5) 10 0.00457 98.3 10 (1.7) 0 / 0 / 10
63 Melanelixia subargentifera 698 58 478 (68.5) 3 0.00111 99.3 4 (0.7) 0 / 0 / 4
64 Melanohalea exasperata 891 85 330 (37) 3 0.00361 99.0 8 (1.0) 1 / 0 / 7
65 Menegazzia terebrata 926 94 672 (72.6) 20 0.00207 97.1 17 (2.9) 1 / 6 / 10
66 Micarea cinerea 422 32 341 (80.8) 4 0.00999 99.1 5 (0.9) 0 / 0 / 5
67 Micarea xanthonica 711 87 373 (52.5) 7 0.00728 99.3 5 (0.7) 0 / 0 / 5
68 Mycoblastus sanguinarius 824 102 222 (26.9) 3 0.00246 98.8 6 (1.0) 0 / 0 / 6
69 Parmelia saxatilis 362 58 241 (66.6) 1 0.00159 96.9 18 (3.1) 0 / 3 / 15
70 Parmelia sulcata 753 143 400 (53.1) 21 0.00604 98.0 11 (1.8) 0 / 0 / 11
71 Parmelina tiliacea 1391 133 1031 (74.1) 18 0.00203 99.0 6 (1.0) 1 / 0 / 5
72 Parmeliopsis ambigua 1401 89 878 (62.7) 7 0.00294 98.1 11 (2.1) 1 / 0 / 10
73 Parmotrema crinitum 1284 83 1146 (89.3) 15 0.0023 98.4 10 (1.6) 0 / 1 / 9
74 Parmotrema perlatum 1981 112 1234 (62.3) 17 0.00143 98.2 11 (1.8) 0 / 0 / 11
75 Phaeophyscia ciliata 365 86 90 (24.7) 3 0.00115 98.2 17 (1.8) 0 / 0 / 17
76 Phaeophyscia orbicularis 407 92 83 (20.4) 2 0.01161 99.4 5 (0.6) 0 / 0 / 5
77 Phaeophyscia poeltii 756 71 423 (56) 3 0.00104 97.9 17 (2.1) 0 / 0 / 17
Page 45 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
46
78 Phlyctis argena 1078 127 246 (22.8) 3 0 99.1 8 (0.9) 0 / 0 / 8
79 Physcia adscendens 962 111 474 (49.3) 29 0.04417 98.5 12 (1.5) 0 / 0 / 12
80 Physcia aipolia 612 73 377 (61.6) 4 0.002 98.6 11 (1.4) 0 / 0 / 11
81 Physconia distorta 1081 182 183 (16.9) 3 0.04149 98.2 17 (1.8) 0 / 0 / 17
82 Placynthiella dasaea 1089 133 179 (16.4) 5 0.00495 99.1 8 (0.9) 1 / 0 / 7
83 Pleurosticta acetabulum 585 134 20 (3.4) 14 0.05287 99.1 5 (0.9) 1 / 2 / 2
84 Pseudevernia furfuracea 1017 27 879 (86.4) 9 0.0017 95.7 25 (4.3)* 9 / 7 / 9
85 Punctelia jeckeri 499 79 107 (21.4) 2 0.07591 99.4 5 (0.6) 0 / 0 / 5
86 Pycnora sorophora 984 112 351 (35.7) 8 0.01008 98.9 6 (1.1) 0 / 0 / 6
87 Ramalina pollinaria 705 22 654 (92.8) 9 0.00525 98.5 12 (1.5) 1 / 0 / 11
88 Scoliciosporum umbrinum 1244 101 8 (1.4) 5 0.05371 99.5 4 (0.7) 0 / 1 / 3
89 Scoliciosporum umbrinum 592 201 148 (11.9) 3 0.01479 99.0 6 (1.0) 0 / 0 / 6
90 Tuckermannopsis chlorophylla 347 72 146 (42.1) 5 0.0035 97.6 14 (2.4) 0 / 0 / 14
91 Usnea barbata 1498 39 1269 (84.7) 10 0.00258 97.8 13 (2.2) 0 / 1 / 12
92 Usnea barbata 2202 51 2101 (95.4) 11 0.00209 96.8 19 (3.2) 0 / 4 / 15
93 Usnea ceratina 2627 69 2518 (95.9) 14 0.00213 97.0 18 (3.0) 0 / 1 / 17
94 Usnea intermedia 2152 59 2009 (93.4) 2 0.00355 97.6 14 (2.4) 0 / 4 / 10
95 Usnea intermedia 1536 70 1287 (83.8) 5 0.00466 97.8 13 (2.2) 0 / 0 / 13
96 Usnea lapponica 1585 50 1463 (92.3) 14 0.00311 96.3 22 (3.7) 0 / 1 / 21
97 Usnea substerilis 868 38 700 (80.6) 7 0.00205 97.8 11 (1.9) 0 / 1 / 10
98 Violella fucata 830 93 134 (16.1) 4 0.02164 98.6 12 (1.4) 0 / 0 / 12
Page 46 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
47
99 Violella fucata 1780 122 646 (36.3) 7 0.00704 99.7 2 (0.3) 0 / 0 / 2
100 Vulpicida pinastri 874 93 387 (44.3) 9 0.00159 98.9 9 (1.1) 1 / 0 / 8
Note: 1Clustering conducted at 95% using ‘centroids’ function in USEARCH; 2quantity of transitions (TS), transversions (TV), and indels (Ind) between the generated consensus
sequence (barcode) and representative centroid. * denotes chimeric representative centroid.
Page 47 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
48
Table 3 List of samples where minor barcode version(s) constituted more than 10% of the target reads, with characteristics of nucleotide variation
between the barcodes and comparison with available Sanger sequences of the species (where possible, otherwise marked with “—“).
Quantity of base differences TS/TV/Ind2
#
Species
No of
Barcodes
Alignment
(bp)
Variable
sites
Nucleotide
diversity ππππ
Max
distance 18S Intron ITS1 5.8S ITS2 28S
Same
mutations
in Sanger
1 Alectoria sarmentosa 8 579 8 0.00795 0.016 0/0/0 n.a. 0/4/1 0/0/0 1/4/2 0/0/0 0
6 Bacidina arnoldiana aggr. 1 5 799 66 0.05519 0.087 0/0/0 n.a. 7/1/1 0/0/0 7/0/1 0/0/0 —
8 Bryoria capillaris 4 803 6 0.00498 0.008 0/0/0 0/1/0 2/1/0 0/0/0 2/0/0 0/0/0 3
9 Buellia arborea 2 586 0 0 0 0/0/0 n.a. 0/0/1 0/0/0 0/0/0 0/0/0 —
12 Chaenotheca cf. stemonea 1 10 647 121 0.11473 0.253 0/0/0 0/0/03 2/0/0 1/0/1 3/1/3 0/0/0 —
15 Cladonia digitata 4 650 0 0 0 0/0/0 n.a. 0/0/0 0/0/0 0/0/3 0/0/0 0
19 Fellhanera bouteillei 1 3 610 57 0.06421 0.102 0/0/0 n.a. 3/1/0 0/0/0 1/1/0 0/0/0 —
29 Hypotrahyna laevigata 2 583 0 0 0 0/0/0 n.a. 0/0/0 0/0/0 0/0/1 0/0/0 —
30 Lecania cf. cyrtella 2 802 25 0.03145 0.032 0/0/0 6/5/1 7/3/0 1/0/1 3/0/2 0/0/3 —
43 Lecanora subcarpinea 18 597 7 0.00445 0.01 0/0/0 n.a. 1/3/7 0/0/0 3/0/0 0/0/0 0
50 Lecidella scabra 20 837 10 0.00987 0.024 0/0/0 0/0/03 7/3/1 0/0/0 3/1/1 0/0/0 —
51 Lecidella sp. 9 597 13 0.01063 0.017 0/0/0 n.a. 8/1/1 0/0/0 3/1/0 0/0/0 —
54 Lepraria lobificans 6 565 4 0.00239 0.011 0/0/0 n.a. 0/6/3 0/0/0 0/0/1 0/0/0 1
Page 48 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
49
55 Lepraria rigidula 6 601 13 0.00836 0.022 0/0/0 n.a. 5/2/1 0/0/0 3/3/2 0/0/0 12
64 Melanohalea exasperata 2 806 6 0.00752 0.008 0/0/0 0/0/1 3/1/1 0/0/0 2/0/3 0/0/1 5
65 Menegazzia terebrata 16 592 16 0.00836 0.035 0/0/0 n.a. 1/18/0 3/6/6 0/0/3 0/0/0 0
69 Parmelia saxatilis 2 586 3 0.00515 0.005 0/0/0 n.a. 0/0/0 0/0/1 0/3/3 0/0/0 0
70 Parmelia sulcata 2 593 3 0.00506 0.005 0/0/0 n.a. 2/0/0 0/0/0 1/0/0 0/0/0 2
79 Physcia adscendens 2 790 92 0.11735 0.137 4/0/0 32/0/0 22/0/0 9/0/0 18/0/4 7/0/2 3
86 Pycnora sorophora 2 565 4 0.00708 0.007 0/0/0 n.a. 0/0/0 0/0/0 0/4/0 0/0/0 0
95 Usnea intermedia 9 578 7 0.00596 0.01 0/0/0 n.a. 1/2/0 0/0/0 2/2/0 0/0/0 4
96 Usnea lapponica 6 579 8 0.00783 0.014 0/0/0 n.a. 2/2/0 0/0/0 3/1/0 0/0/0 4
Note: 1In presence of intraspecific variation, nucleotide variation within one of the “species” was compared: Bacidina arnoldiana subversions A1 & A2, Chaenotheca cf. stemonea
subversions A1, A2, B & C, Fellhanera bouteillei subversions A1 & A2 (for reference see Figures 3.–5.); 2quantity of transitions (TS), transversions (TV), and indels (Ind), divided
into exons and introns; 3variation in intron presence. Nucleotide variation of specimens in bold more likely represents genuine mutations, while in others is probably due to
PCR/sequencing errors.
Page 49 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
50
Figure 1 General overview of pyrosequencing reads, with (A) number of reads per
sample, (B) average read length, and (C) read quality distribution over read length.
Figure 2 Comparison of morphological species determination and the results from
sequence megaBLAST search in the NCBI nucleotide database for the studied 100
species. Only the best match accession identy was considered, and the sequence
similarity results were ignored. Diagram proportions are indicated for concordant
results at the species or the genus level (= correct species / = correct genus but
different species or species not present), divergent (= different genus and species), or
neutral results (= unidentified or uncultured fungus). For one species, no sequences
were recovered within the quality-sorted reads; this is noted as “no target”.
Figure 3 Maximum likelihood ITS gene tree of available Bacidina arnoldiana aggr.
sequences, with extended outgroup of closely related Bacidia species. Barcodes from
Bacidina arnoldiana aggr. (sample #6) are indicated with a shaded background. The
number of reads per barcode is given in brackets at the end of the name. Branches
leading to strongly supported nodes (≥70%) are marked in bold. The scale bar shows
the number of substitutions per site.
Figure 4 Maximum likelihood ITS gene tree of available Chaenotheca sequences,
including the barcodes from Chaenotheca cf. stemonea (sample #12, all with shaded
background). The number of reads per barcode is given in brackets at the end of the
name. Branches leading to strongly supported nodes (≥70%) are marked in bold. The
scale bar shows the number of substitutions per site.
Page 50 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
51
Figure 5 Maximum likelihood ITS gene tree of available Fellhanera sequences with
extended outgroup of closely related species, including the barcodes from Fellhanera
boutellei (sample #19; all with shaded background). The number of reads per barcode
is given in brackets at the end of the name. Branches leading to strongly supported
nodes (≥70%) are marked in bold. The scale bar shows the number of substitutions
per site.
Figure captions for supplementarty data:
Figure S1 ITS1 rDNA secondary structure of the dominant ITS type of Menegazzia
terebrata (sample #65) together with probabilities of fold formation. The red circle on
the RNA structure indicates the region of ambiguous bases. An alignment of the
alternate barcodes in this region is given below.
Page 51 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
146x118mm (300 x 300 DPI)
Page 52 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
64x48mm (300 x 300 DPI)
Page 53 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
224x202mm (300 x 300 DPI)
Page 54 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
314x328mm (300 x 300 DPI)
Page 55 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
275x276mm (300 x 300 DPI)
Page 56 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Mark et al. “Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by
artifactual and biological sequence variation”
PaPaRa alignments
Our priority in this paper was to present (hopefully) accurate barcodes for the sequenced
samples. Using the consensus sequence approach allowed us to ignore much of the
random noise within sequences, but it depends on the accuracy of a given alignment.
Carry-forward-incomplete-extension (CAFIE) errors are the result of incomplete dNTP
flushing, which then causes a nucleotide from the end of a homoploymer to be read some
bases later (e.g. a true sequence AAAATCG can be read as AAATCGA). Sequences with
this error cannot be removed through the quality filters, as it appears as a true base call;
when occurring repeatedly, this can be interpreted in an alignment as genuine
intragenomic variation. Alternatively to MAFFT, we applied the PaPaRa alignment1 that
should better account for erroneous insertions.
PaPaRa alignments with default parameters were conducted for 85 datasets, where
enough closely related reference sequences could be acquired from a public database
(GenBank). Reference trees for each reference alignment were produced with RAxML.
For 14 samples, there were fewer than four (minimum for RAxML tree construction)
reference sequences (i.e. sequences of approx. >90% similarity) available, and PaPaRa
alignments could not be performed.
We compared the consensus sequences from the PaPaRa alignments with the barcodes
generated from MAFFT alignments. In this comparison we found that for the majority of
analysed samples the consensus sequences from the two alignments were identical
(n=42). In 38 samples, one or more differences between the consensus sequences were
found. These differences resulted from the interpretation of homopolymer and/or
alignment errors.
In general, PaPaRa alignments were comparable to MAFFT alignments, but in regions
with high noise rate (i.e. read ends), PaPaRa produced more consistent alignments.
However, consistent alignments were produced in PaPaRa only when reference
sequences were as long as the alignment. In other words, for a good PaPaRa alignment,
reference sequences have to cover the full length of the matrix. PaPaRa fails to align
regions that are not covered by references; therefore, for five alignments a reasonable
consensus could not be produced by PaPaRa, and comparisons with the barcode could not
be made.
Additionally, PaPaRa follows rather strictly the reference alignment and the reference
tree. Its algorithm more accurately inserted gaps in noisy regions, but with true CAFIE
1 Berger, S.A., and Stamatakis, A. 2011. Aligning short reads to reference alignments and
trees. Bioinformatics 27.15: 2068−2075.
Page 57 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
errors, it produced a misleading alignment that we have illustrated in the figure below. In
the illustration, three clear CAFIE errors can be seen, and instead of making gap-rich
columns, an evolutionarily less likely alignment is produced by PaPaRa.
Fig. An alignment example of sequences with CAFIE errors (data from #69 Parmelia
saxatilis). PaPaRa follows strictly the gaps produced by the reference sequences, while
the MAFFT alignment minimizes the number of mutations.
It seems that using PaPaRa does not provide a benefit for our data sufficient to choose it
over the MAFFT alignment. The main reasons are: (1) when using PaPaRa, we lost
already almost 30% of the barcodes from missing or inappropriate reference sequences,
and the alignments of many others were far from perfect due to this reason. (2) The
original idea for choosing pyrosequencing over Sanger sequencing for barcoding lichen
fungi was to be able to produce barcodes for species that are more difficult to sequence
using the traditional way. If we need Sanger sequences of these samples as references for
alignment, then the original benefit of this approach is lost. (3) And finally, PaPaRa does
not seem to detect CAFIE errors in our data more efficiently. By contrast, it rather strictly
follows the gaps of references and suggests evolutionarily less likely mutations.
However, in other cases, where many good-quality reference sequences are available, it
could prove to be a better option than MAFFT.
REFERENCES
PAPARA
alignment
MAFFT
alignment
* * * * * *
Page 58 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
DraftT
A
C
CGAGGG
10
A
GG
GG
C
TTT
G
20
CGCTCC
C
GGG
30
GGCTC
CG G C C
40
C C C
A
A
C
T
C
T
T
50
C
AA
CC
C G T T G
60
CG
TA
C
C
T
AC C
70
T T C G
T
TG C
TT
80
TGGCGGGC
C
C
90
C
GGGG
CT C
G
CCCC
CGCCGGC
110
T
TT
T
GG
GC
CG
120
GC
GA
GC
GCCC
130
GCCGGAG
G
A C
140
T CT A
C
C A T T
C
150
T A TT
C
TGT
C
A
160
GT
GATGTC
C
G
170
AGT
ENERGY = 16.5
Probability >= 99%99% > Probability >= 95%95% > Probability >= 90%90% > Probability >= 80%80% > Probability >= 70%70% > Probability >= 60%60% > Probability >= 50%50% > Probability
Conse
nsus
Identi
ty
Page 59 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
The species and references of additional ITS sequences used for Bacidina (Fig. 5), Chaenotheca (Fig.
6), and Fellhanera (Fig. 7) phylogeny reconstructions.
# Species Voucher info GenBank # Reference
1 Bacidia auerswaldii Johansson 20 (UPS) AF282122 Ekman 2001
2 Bacidia bagliettoana Ekman 3137 (BG) AF282123 Ekman 2001
3 Bacidia caligans Johansson 21 (UPS) AF282096 Ekman 2001
4 Bacidia circumspecta Ekman L1330 (LD) AF282124 Ekman 2001
5 Bacidia medialis Ekman L1193 (LD) AF282102 Ekman 2001
6 Bacidia sp. DF223 (WSL) KX098339 Flück 2012
7 Bacidia sp. DF72 (WSL) KX098340 Flück 2012
8 Bacidia sp. DF80 (WSL) KX098341 Flück 2012
9 Bacidia subincompta Ekman 3413 (BG) AF282125 Ekman 2001
10 Bacidia subincompta DF231 (WSL) KX098342 Flück 2012
11 Bacidia vermifera Johansson 1619 (BG) AF282109 Ekman 2001
12 Bacidia vermifera E:DNA:EDNA09-01506 FR799130 Kelly et al. 2011
13 Bacidia vermifera E:DNA:EDNA09-02101 FR799131 Kelly et al. 2011
14 Bacidina adastra GPN 4961 JN972442 Czarnota and Guzow-
Krzemińska 2012
15 Bacidina arnoldiana Ekman 3157 (BG) AF282093 Ekman 2001
16 Bacidina arnoldiana AFTOL-ID 1845 HQ650650 Schmull et al. 2011
17 Bacidina arnoldiana GPN 5301 JN972439 Czarnota and Guzow-
Krzemińska 2012
18 Bacidina arnoldiana GPN 4895 JN972440 Czarnota and Guzow-
Krzemińska 2012
19 Bacidina arnoldiana GPN 5384 JN972441 Czarnota and Guzow-
Krzemińska 2012
20 Bacidina arnoldiana DF207 (WSL) KX098343 Flück 2012
21 Bacidina arnoldiana DF210 (WSL) KX098344 Flück 2012
22 Bacidina arnoldiana DF211a (WSL) KX098345 Flück 2012
23 Bacidina arnoldiana DF212 (WSL) KX098346 Flück 2012
24 Bacidina arnoldiana DF230 (WSL) KX098347 Flück 2012
25 Bacidina arnoldiana DF89 (WSL) KX098348 Flück 2012
26 Bacidina chloroticula Toensberg 18642 (BG) AF282098 Ekman 2001
27 Bacidina delicata 1996 Fritz (BG) AF282097 Ekman 2001
28 Bacidina delicata LG DNA 369 JQ796854 Sérusiaux et al. 2012
29 Bacidina egenula Ekman 3003 (BG) AF282095 Ekman 2001
30 Bacidina flavoleprosa GPN 5329 JN972443 Czarnota and Guzow-
Krzemińska 2012
31 Bacidina inundata Ekman 3187 (BG) AF282094 Ekman 2001
32 Bacidina neosquamulosa GPN 4888 JN972444 Czarnota and Guzow-
Krzemińska 2012
33 Bacidina neosquamulosa LG DNA 490 JQ796855 Sérusiaux et al. 2012
34 Bacidina phacodes Ekman 3414 (BG) AF282100 Ekman 2001
35 Bacidina sulphurella GPN 5967 JN972446 Czarnota and Guzow-
Krzemińska 2012
36 Bacidina sulphurella GPN 7246 JN972447 Czarnota and Guzow-
Krzemińska 2012
37 Bapalmuia palmularis Lucking 16003 (BG) AY756457 Andersen, unpubl.
38 Byssolecania variabilis Lucking 16033b (BG) AY756458 Andersen, unpubl.
39 Byssoloma leucoblepharum Ekman 3502 (BG) AY756459 Andersen, unpubl.
40 Byssoloma marginatum Tonsberg 27125 (BG) AY756460 Andersen, unpubl.
Page 60 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
41 Byssoloma subdiscordans Tonsberg 25968 (BG) AY756461 Andersen, unpubl.
42 Calopadia foliicola Lucking 16011 (BG) AY756462 Andersen, unpubl.
43 Chaenotheca brachypoda Tibell 17062 (UPS) AF297962 Tibell 2001b
44 Chaenotheca brachypoda Tibell 22193 (UPS) AF297963 Tibell 2001b
45 Chaenotheca brunneola Tibell 22202 (UPS) AF297964 Tibell 2001b
46 Chaenotheca cf. stemonea
Tibell 22113
Tibell 22113 (UPS) AF298135 Tibell 2001b
47 Chaenotheca chlorella Tibell 22187 (UPS) AF297966 Tibell 2001b
48 Chaenotheca chlorella Tibell 22372 (UPS) AF445356 Tibell and Beck 2001
49 Chaenotheca chrysocephala Tibell 22162 (UPS) AF298120 Tibell 2001b
50 Chaenotheca chrysocephala Tibell 21799 (UPS) AF298121 Tibell 2001b
51 Chaenotheca cinerea Jonsson & Nordin (UPS) AF298122 Tibell 2001b
52 Chaenotheca cinerea Tibell 22374 (UPS) AF421201 Tibell 2002
53 Chaenotheca deludens Tibell 16575 AF408678 Tibell 2001a
54 Chaenotheca ferruginea Tibell 22276 (UPS) AF298123 Tibell 2001b
55 Chaenotheca ferruginea DF82 (WSL) KX098349 Flück 2012
56 Chaenotheca furfuracea Thor 15698 AF298124 Tibell 2001b
57 Chaenotheca furfuracea Tibell 22364 AF445357 Tibell and Beck 2001
58 Chaenotheca furfuracea Wedin 6366 (UPS) JX000101 Prieto et al. 2013
59 Chaenotheca furfuracea DF251 (WSL) KX098350 Flück 2012
60 Chaenotheca furfuracea DF252 (WSL) KX098351 Flück 2012
61 Chaenotheca gracilenta Wedin 7022 (S) JX000100 Prieto et al. 2013
62 Chaenotheca gracillima Tibell 17943 (UPS) AF298126 Tibell 2001b
63 Chaenotheca gracillima Tibell 17052 (UPS) AF298127 Tibell 2001b
64 Chaenotheca gracillima Tibell 17614 (UPS) AF408679 Tibell 2001a
65 Chaenotheca gracillima Tibell 22112 (UPS) AF408680 Tibell 2001a
66 Chaenotheca gracillima Tibell 18931 (UPS) AF408681 Tibell 2001a
67 Chaenotheca gracillima Tibell 16725 (UPS) AF408682 Tibell 2001a
68 Chaenotheca hispidula Tibell 21900 (UPS) AF298128 Tibell 2001b
69 Chaenotheca hygrophila Thor 15612 AF298129 Tibell 2001b
70 Chaenotheca laevigata Tibell 22176 (UPS) AF298130 Tibell 2001b
71 Chaenotheca laevigata Tibell 21998b (UPS) AF298131 Tibell 2001b
72 Chaenotheca nitidula Koffman 222 (UPS) AF492386 Tibell and Koffman 2002
73 Chaenotheca nitidula Koffman 170 (UPS) AF492387 Tibell and Koffman 2002
74 Chaenotheca nitidula Tibell 21490 (UPS) AF492388 Tibell and Koffman 2002
75 Chaenotheca phaeocephala Tibell 16885 (UPS) AF298132 Tibell 2001b
76 Chaenotheca phaeocephala Tibell 22201 (UPS) AF298133 Tibell 2001b
77 Chaenotheca phaeocephala Hermansson 8290 (UPS) AF445358 Tibell and Beck 2001
78 Chaenotheca phaeocephala Hermansson 8428 (UPS) AF445359 Tibell and Beck 2001
79 Chaenotheca phaeocephala Tibell 21819 (UPS) AF445360 Tibell and Beck 2001
80 Chaenotheca phaeocephala Tibell 22291 (UPS) AF446045 Tibell and Beck 2001
81 Chaenotheca sphaerocephala Tibell 21939 (UPS) AF298134 Tibell 2001b
82 Chaenotheca stemonea Tibell 22191 (UPS) AF408683 Tibell 2001a
83 Chaenotheca subroscida Tibell 22150 (UPS) AF298136 Tibell 2001b
84 Chaenotheca subroscida Tibell 22161 (UPS) AF298137 Tibell 2001b
85 Chaenotheca subroscida Tibell 22167 (UPS) AF446047 Tibell and Beck 2001
86 Chaenotheca subroscida Tibell 22365 (UPS) AF446048 Tibell and Beck 2001
87 Chaenotheca trichialis Tibell 16878 (UPS) AF298139 Tibell 2001b
88 Chaenotheca trichialis Tibell 22355 AF410674 Tibell 2001a
89 Chaenotheca trichialis Tibell 22300 (UPS) AF421203 Tibell 2002
Page 61 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Andersen, H.L. Phylogeny and classification of Micarea. Unpubl.
Czarnota, P. and Guzow-Krzemińska, B. 2012. ITS rDNA data confirm a delimitation of Bacidina arnoldiana and B.
sulphurella and support a description of a new species within the genus Bacidina. The Lichenologist, 44: 743–755.
Ekman, S. 2001. Molecular phylogeny of the Bacidiaceae (Lecanorales, lichenized Ascomycota). Mycol Res, 105: 783–
797.
Flück, D. 2012. DNA barcoding of Swiss epiphytic crustose lichens – a feasibility study. M.Sc. thesis. Academic
Deparment, University of Bern, Bern, Switzerland.
Grube, M. 2001. A simple method to prepare foliicolous lichens for anatomical and molecular studies. The Lichenologist,
33: 547–550.
Kelly, L.J., Hollingsworth, P.M., Coppins, B.J., Ellis, C.J., Harrold, P., Tosh, J., and Yahr, R. 2011. DNA barcoding of
lichenized fungi demonstrates high identification success in a floristic context. New Phytol., 191: 288–300.
Prieto, M., Baloch, E., Tehler, A., and Wedin, M. 2013. Mazaedium evolution in the Ascomycota (Fungi) and the
classification of mazaediate groups of formerly unclear relationship. Cladistics, 29: 296–308.
Schmull, M., Miadlikowska, J., Pelzer, M., Stocker-Wörgötter, E., Hofstetter, V., Fraker, E., Hodkinson, B.P., Reeb, V.,
Kukwa, M., and Lumbsch, H.T. 2011. Phylogenetic affiliations of members of the heterogeneous lichen-forming
fungi of the genus Lecidea sensu Zahlbruckner (Lecanoromycetes, Ascomycota). Mycologia, 103: 983–1003.
Sérusiaux, E., van den Boom, P.P., Brand, M.A., Coppins, B.J., and Magain, N. 2012. Lecania falcata, a new species from
Spain, the Canary Islands and the Azores, close to Lecania chlorotiza. The Lichenologist, 44: 577–590.
Tibell, L. 2001a. Cybebe gracilenta in an ITS/5.8 S rDNA based phylogeny belongs to Chaenotheca (Coniocybaceae,
lichenized Ascomycetes). The Lichenologist, 33: 519–525.
Tibell, L. 2001b. Photobiont association and molecular phylogeny of the lichen genus Chaenotheca. The Bryologist, 104:
191–198.
Tibell, L. 2002. Morphological variation and ITS phylogeny of Chaenotheca trichialis and C. xyloxena (Coniocybaceae,
lichenized ascomycetes). Ann. Bot. Fenn., 39: 73–80.
Tibell, L. and Beck, A. 2001. Morphological variation, photobiont association and ITS phylogeny of Chaenotheca
phaeocephala and C. subroscida (Coniocybaceae, lichenized ascomycetes). Nord. J. Bot., 21: 651–660.
Tibell, L. and Koffman, A. 2002. Chaenotheca nitidula, a new species of calicioid lichen from northeastern North America.
The Bryologist, 105: 353-357.
90 Chaenotheca trichialis Tibell 22090 (UPS) AF421205 Tibell 2002
91 Chaenotheca trichialis Selva 8274a (UMFK) AF421206 Tibell 2002
92 Chaenotheca trichialis Prieto 3028 (S) JX000102 Prieto et al. 2013
93 Chaenotheca xyloxena Tibell 18431 (UPS) AF298138 Tibell 2001b
94 Chaenotheca xyloxena Tibell 22188 (UPS) AF298140 Tibell 2001b
95 Chaenotheca xyloxena Tibell 16673 (UPS) AF421208 Tibell 2002
96 Chaenotheca xyloxena Tibell 16605 (UPS) AF421209 Tibell 2002
97 Chaenotheca xyloxena Tibell 22361 (UPS) AF421211 Tibell 2002
98 Chaenotheca xyloxena Tibell 22329 (UPS) AF421212 Tibell 2002
99 Chaenotheca xyloxena Selva 7753 (UMFK) AF421213 Tibell 2002
100 Fellhanera bouteillei AF414858 Grube 2001
101 Fellhanera bouteillei Ekman 3417 (BG) AY756463 Andersen, unpubl.
102 Fellhanera subtilis Tonsberg 28199 (BG) AY756464 Andersen, unpubl.
103 Fellhanera viridisorediata Tonsberg 27375 (BG) AY756465 Andersen, unpubl.
104 Fellhaneropsis myrtillicola Tonsberg 25311 (BG) AY756466 Andersen, unpubl.
105 Lasioloma arachnoideum Lucking 16005 (BG) AY756467 Andersen, unpubl.
106 Micarea clavopycnidiata Tonsberg 27215 (BG) AY756473 Andersen, unpubl.
107 Micarea prasinella McCune 25337 (BG) AY756491 Andersen, unpubl.
108 Sporopodium antoninianum Lucking 16002d (BG) AY756498 Andersen, unpubl.
109 Szczawinskia tsugae Tonsberg 30044 (BG) AY756499 Andersen, unpubl.
Page 62 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
GenBank megablastBLAST best hit results for sequenced species’ primary barcodes (queries on 11
January 2015).
GenBank BLAST result
# Barcode Organism
Accession
No
Max
score
Query
coverage (%)
Identity
(%)
1 Alectoria sarmentosa versA Uncultured Alectoria JN397372 990 96 99
2 Anaptychia crinalis Anaptychia ciliaris KJ027714 571 56 96
3 Arthrosporum populorum Arthrosporum populorum AF282106 911 85 99
4 Bacidia rubella Bacidia rubella EU266078 1110 82 98
5 Bacidia vermifera Bacidia vermifera FR799128 856 89 96
6 Bacidina arnoldiana aggr. versA1 Bacidina neosquamulosa JN972444 817 100 92
7 Bacidina arnoldiana aggr. Bacidina neosquamulosa JN972444 965 70 98
8 Bryoria capillaris versA Bryoria fuscescens DQ534452 1456 100 99
9 Buellia arborea versA Buellia penichra AF540503 785 85 95
10 Bunodophoron melanocarpum Sphaerophorus globosus DQ534485 684 100 88
11 Cetrelia monachorum Cetrelia olivetorum HQ671305 989 98 98
12 Chaenotheca cf. stemonea versC Chaenotheca cf.
stemonea 'Tibell 22113'
AF298135 737 84 92
13 Cladonia chlorophaea Cladonia sp. KC-2014 KM245923 1434 99 97
14 Cladonia coniocraea Cladonia gracilis JN863236 1550 100 99
15 Cladonia digitata versA Cladonia borealis JN863286 1081 96 98
16 Cladonia squamosa Cladonia cenotea FN868596 1447 98 97
17 Evernia divaricata Evernia divaricata AF297741 979 93 99
18 Evernia prunastri Evernia prunastri HQ650611 1037 97 99
19 Fellhanera bouteillei versA1 Fellhanera bouteillei AF414858 773 100 90
20 Flavoparmelia caperata Flavoparmelia sp. Hur
CH080070
HQ671302 2002 93 99
21 Frutidella pullata Lecidea pullata HQ650669 1367 90 99
22 Fuscidea arboricola No barcode
23 Fuscidea pusilla Lasallia papulosa HQ650603 388 72 85
24 Hyperphyscia adglutinata Hyperphyscia adglutinata AF224361 880 63 98
25 Hypocenomyce scalaris Hypocenomyce scalaris HQ650632 974 97 99
26 Hypogymnia farinacea Hypogymnia tubulosa HQ725075 915 94 97
27 Hypogymnia physodes Uncultured ascomycete AM901734 1474 100 99
28 Hypogymnia tubulosa Uncultured ascomycete AM901734 1048 100 90
29 Hypotrachyna laevigata versA Hypotrachyna laevigata AY611074 928 86 99
30 Lecania cf. cyrtella versA Uncultured fungus clone KC965610 808 69 93
31 Lecanora albella Lecanora albella AY541241 907 92 97
32 Lecanora allophana f. sorediata Lecanora allophana AF070031 909 62 99
33 Lecanora argentata/subrugosa Lecanora subrugosa AY398711 1003 92 99
34 Lecanora carpinea Lecanora carpinea FR799199 961 63 99
35 Lecanora horiza Lecanora horiza AY541252 959 92 98
36 Lecanora impudens Lecanora allophana AF070031 922 62 99
37 Lecanora muralis Lecanora muralis FJ497040 939 100 95
38 Lecanora praesistens Uncultured Rhizoplaca
clone 19A
EF095283 660 95 82
39 Lecanora pulicaris Uncultured fungus clone JN904337 861 58 99
40 Lecanora saligna Lecanora saligna AF189716 942 85 99
41 Lecanora sp. Uncultured fungus clone JN905599 841 62 97
Page 63 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
42 Lecanora strobilina aggr. Lecanora symmicta AF070024 848 85 97
43 Lecanora subcarpinea versA1 Lecanora subcarpinea AY541269 974 92 99
44 Lecanora varia Lecanora varia AF070027 918 62 99
45 Lecidella scabra versO Uncultured fungus clone KC965645 1029 97 99
46 Lecidella albida Rhizoplaca macleanii JX036146 462 69 90
47 Lecidella cf. elaeochroma Uncultured fungus clone KC965645 953 97 96
48 Lecidella cf. leprothalla Uncultured fungus clone KC966001 944 100 95
49 Lecidella flavosorediata Uncultured fungus clone KC966001 931 100 95
50 Lecidella flavosorediata Uncultured fungus clone KC966001 944 100 95
51 Lecidella sp. versA Uncultured fungus clone KC966001 959 100 96
52 Lepraria elobata Lepraria elobata EF619545 1098 100 99
53 Lepraria jackii Lepraria jackii EF619567 1110 100 100
54 Lepraria lobificans versA Lepraria lobificans HQ650623 891 97 96
55 Lepraria rigidula versA Lepraria rigidula EF619561 1072 100 99
56 Lepraria vouauxii Lepraria membranacea KF682452 880 100 93
57 Letharia vulpina Letharia vulpina AF228470 1050 100 99
58 Loxospora elatina Loxospora ochrophaea HQ650641 1031 100 99
59 Loxospora elatina Loxospora ochrophaea HQ650641 1031 100 99
60 Megalaria pulverea Megalaria pulverea FR799227 933 89 99
61 Melanelixia fuliginosa subsp.
glabratula
Melanelixia fuliginosa EU034667 1391 93 100
62 Melanelixia glabra Melanelixia glabra EU761216 924 86 99
63 Melanelixia subargentifera Melanelixia albertana GU994560 922 95 97
64 Melanohalea exasperata versA Melanohalea exasperata AY611083 867 62 98
65 Menegazzia terebrata versC4 Menegazzia terebrata GU994567 990 93 99
66 Micarea cinerea Micarea alabastrites AY756469 632 88 89
67 Micarea xanthonica Ascomycota sp. CH-
Nl2LBM
FJ008690 523 71 85
68 Mycoblastus sanguinarius Mycoblastus
sanguinarius
JF744920 994 90 99
69 Parmelia saxatilis versA Parmelia saxatilis EU034668 922 96 97
70 Parmelia sulcata versA Myelochroa aurulenta HQ671303 1075 99 99
71 Parmelina tiliacea Parmelina tiliacea JX466454 1057 99 99
72 Parmeliopsis ambigua Uncultured Ascomycota FR682210 996 93 99
73 Parmotrema crinitum Parmotrema crinitum AY586565 977 87 99
74 Parmotrema perlatum Parmotrema crinitum AY586565 944 87 99
75 Phaeophyscia ciliata Phaeophyscia ciliata AF224447 1696 95 99
76 Phaeophyscia orbicularis Phaeophyscia orbicularis JQ301694 1683 94 99
77 Phaeophyscia poeltii Phaeophyscia denigrata EU670226 1053 98 91
78 Phlyctis argena Phlyctis argena FR799264 902 55 99
79 Physcia adscendens versA Physcia adscendens FJ497044 1437 100 99
80 Physcia aipolia Physcia caesia DQ534476 1266 100 96
81 Physconia distorta Physconia distorta EU266086 1365 77 99
82 Placynthiella dasaea Placynthiella uliginosa HQ650633 1127 98 90
83 Pleurosticta acetabulum Pleurosticta acetabulum AF058034 1011 96 99
84 Pseudevernia furfuracea Pseudevernia furfuracea GU300779 959 98 98
85 Punctelia jeckeri Punctelia jeckeri AF494385 1531 93 99
86 Pycnora sorophora versA Pycnora sorophora KF360404 821 79 99
Page 64 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
DraftNote: ITS sequences of species previously not represented in GenBank are marked in bold.
87 Ramalina pollinaria Ramalina pollinaria JF923612 1339 100 98
88 Scoliciosporum umbrinum Uncultured fungus clone
S335
FJ820822 1105 100 100
89 Scoliciosporum umbrinum Uncultured fungus clone
S335
FJ820822 905 100 94
90 Tuckermannopsis chlorophylla Tuckermannopsis
chlorophylla
AF192413 994 97 99
91 Usnea barbata Usnea intermedia JN086313 987 93 99
92 Usnea barbata Usnea intermedia JN086313 970 93 99
93 Usnea ceratina Usnea ceratina AJ457142 977 92 99
94 Usnea intermedia Usnea intermedia JN086313 987 93 99
95 Usnea intermedia versA Usnea intermedia JN086313 970 93 99
96 Usnea lapponica versA Usnea lapponica JN086316 992 93 99
97 Usnea substerilis Usnea wasmuthii JN086334 952 93 99
98 Violella fucata Violella fucata JN009733 1458 95 99
99 Violella fucata Mycoblastus fucatus JF744968 922 85 99
100 Vulpicida pinastri Vulpicida pinastri AF072239 1347 96 98
Page 65 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
List of specimens with multiple barcodes generated in this study. Primary barcodes representing the specimens in BOLD and available in
GenBank are marked in bold (refer to the Table 1 for GB and BOLD codes). The hypothetical source for the alternative barcodes is marked
in the Notes column (data accessible in PlutoF database: https://plutof.ut.ee/#/study/view/31890).
Specimen
Code 454 Barcodes
No of
reads Notes Taxon
LC-064 Alectoria_sarmentosa_versA 43 primary barcode Alectoria sarmentosa
Alectoria_sarmentosa_versB 42 alternate barcode version: intragenomic variation Alectoria sarmentosa
Alectoria_sarmentosa_versC 5 alternate barcode version: intragenomic variation Alectoria sarmentosa
Alectoria_sarmentosa_versD 10 alternate barcode version: intragenomic variation Alectoria sarmentosa
Alectoria_sarmentosa_versE 41 alternate barcode version: CAFIE errors Alectoria sarmentosa
Alectoria_sarmentosa_versF 51 alternate barcode version: CAFIE errors Alectoria sarmentosa
Alectoria_sarmentosa_versG 6 alternate barcode version: CAFIE errors Alectoria sarmentosa
Alectoria_sarmentosa_versH 12 alternate barcode version: CAFIE errors Alectoria sarmentosa
RE-515-19 Bacidia_sp1_versA 20 alternate barcode version: intraspecific and intragenomic variation Bacidia rubella
Bacidia_sp1_versB 7 alternate barcode version: intraspecific and intragenomic variation Bacidia rubella
Bacidia_sp2 14 alternate barcode version: intraspecific and intragenomic variation Biatora beckhausii
Bacidia_vermifera_versA 1763 primary barcode 1Bacidia vermifera
Bacidia_vermifera_versB 50 alternate barcode version: intragenomic variation Bacidia vermifera
BC-038-1 Bacidina_arnoldiana_versA1 46 primary barcode 1Bacidina arnoldiana sp1
Bacidina_arnoldiana_versA2 6 alternate barcode version: intraspecific and intragenomic variation Bacidina arnoldiana sp1
Bacidina_arnoldiana_versB1 45 alternate barcode version: intraspecific and intragenomic variation Bacidina arnoldiana sp1
Bacidina_arnoldiana_versB2 3 alternate barcode version: intraspecific and intragenomic variation Bacidina arnoldiana sp1
Bacidina_arnoldiana_versC 8 alternate barcode version: intraspecific and intragenomic variation 1Bacidina arnoldiana sp3
LC-058 Bryoria_capillaris_versA 123 primary barcode Bryoria capillaris
Bryoria_capillaris_versB 26 alternate barcode version: intragenomic variation Bryoria capillaris
Bryoria_capillaris_versC 39 alternate barcode version: intragenomic variation Bryoria capillaris
Bryoria_capillaris_versD 139 alternate barcode version: intragenomic variation Bryoria capillaris
BC-154-2 Buellia_arborea_versA 225 primary barcode Buellia arborea
Buellia_arborea_versB 48 alternate barcode version: CAFIE errors (?) Buellia arborea
BC-087-3 Chaenotheca_cf_stemonea_versA1 15 alternate barcode version: intragenomic variation 2Chaenotheca cf. stemonea
Chaenotheca_cf_stemonea_versA2 9 alternate barcode version: intragenomic variation Chaenotheca cf. stemonea
Chaenotheca_cf_stemonea_versB 7 alternate barcode version: intragenomic variation Chaenotheca cf. stemonea
Chaenotheca_cf_stemonea_versC 75 primary barcode Chaenotheca cf. stemonea
Chaenotheca_sp_versA1 8 alternate barcode version: intraspecific and intragenomic variation 2Chaenotheca xyloxena/trichialis
Chaenotheca_sp_versA2 9 alternate barcode version: intraspecific and intragenomic variation Chaenotheca xyloxena/trichialis
Page 66 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Chaenotheca_sp_versB1 3 alternate barcode version: CAFIE errors (?) Chaenotheca xyloxena/trichialis
Chaenotheca_sp_versB2 9 alternate barcode version: CAFIE errors (?) Chaenotheca xyloxena/trichialis
Chaenotheca_sp_versB3 10 alternate barcode version: intraspecific and intragenomic variation Chaenotheca xyloxena/trichialis
Chaenotheca_sp_versB4 10 alternate barcode version: intraspecific and intragenomic variation Chaenotheca xyloxena/trichialis
LC-067 Cladonia_digitata_versA 167 primary barcode Cladonia digitata
Cladonia_digitata_versB 60 alternate barcode version: intragenomic variation/sequencing errors Cladonia digitata
Cladonia_digitata_versC 56 alternate barcode version: intragenomic variation/sequencing errors Cladonia digitata
Cladonia_digitata_versD 46 alternate barcode version: intragenomic variation/sequencing errors Cladonia digitata
LC-118 Fellhanera_bouteillei_versA1 106 primary barcode 3Fellhanera bouteillei s.lat
Fellhanera_bouteillei_versA2 20 alternate barcode version: intragenomic variation Fellhanera bouteillei s.lat
Fellhanera_bouteillei_versB 3 alternate barcode version: intraspecific variation 3Fellhanera bouteillei s.str.
KM-03-05 Hypotrachyna_laevigata_versA 496 primary barcode Hypotrachyna laevigata
Hypotrachyna_laevigata_versB 120 alternate barcode version: CAFIE/homopolymer errors Hypotrachyna laevigata
Hypotrachyna_laevigata_versC 241 alternate barcode version: CAFIE/homopolymer errors Hypotrachyna laevigata
LC-023 Lecania_cf_cyrtella_versA 292 primary barcode Lecania cf. cyrtella
Lecania_cf_cyrtella_versB 89 alternate barcode version: intragenomic variation Lecania cf. cyrtella
LC-090 Lecanora_subcarpinea_versA1 474 primary barcode Lecanora subcarpinea
Lecanora_subcarpinea_versA2 424 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versA3 168 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versA4 109 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versB1 6 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versB2 10 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versB3 36 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versC1 2 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versC2 2 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versC3 1 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versD1 2 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versD2 1 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versD3 1 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versD4 2 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versE1 1 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versE2 1 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versE3 7 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
Lecanora_subcarpinea_versE4 4 alternate barcode version: CAFIE errors+intragenomic variation (?) Lecanora subcarpinea
BC-055-4 Lecidella_scabra_versA 5 alternate barcode version: intragenomic variation Lecidella scabra
Page 67 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Lecidella_scabra_versB 3 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versC 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versD 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versE 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versF 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versG 4 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versH 3 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versI 16 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versJ 2 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versK 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versL 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versM 12 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versN 22 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versO 202 primary barcode Lecidella scabra
Lecidella_scabra_versP 2 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versQ 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versR 1 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versS 6 alternate barcode version: intragenomic variation Lecidella scabra
Lecidella_scabra_versT 37 alternate barcode version: intragenomic variation Lecidella scabra
LC-088 Lecidella_sp_versA 138 primary barcode Lecidella sp.
Lecidella_sp_versB 9 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versC 5 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versD 2 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versE 2 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versF 2 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versG 1 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versH 1 alternate barcode version: intragenomic variation Lecidella sp.
Lecidella_sp_versI 1 alternate barcode version: intragenomic variation Lecidella sp.
BC-133-3 Lepraria_lobificans_versA 135 primary barcode Lepraria lobificans
Lepraria_lobificans_versB 107 alternate barcode version: CAFIE errors Lepraria lobificans
Lepraria_lobificans_versC 20 alternate barcode version: CAFIE errors Lepraria lobificans
Lepraria_lobificans_versD 20 alternate barcode version: CAFIE errors Lepraria lobificans
Lepraria_lobificans_versE 42 alternate barcode version: CAFIE errors Lepraria lobificans
Lepraria_lobificans_versF 25 alternate barcode version: CAFIE errors Lepraria lobificans
BC-056-2 Lepraria_rigidula_versA 100 primary barcode Lepraria rigidula
Page 68 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Lepraria_rigidula_versB 21 alternate barcode version: intragenomic variation Lepraria rigidula
Lepraria_rigidula_versC 5 alternate barcode version: intragenomic variation Lepraria rigidula
Lepraria_rigidula_versD 5 alternate barcode version: intragenomic variation Lepraria rigidula
Lepraria_rigidula_versE 8 alternate barcode version: intragenomic variation Lepraria rigidula
Lepraria_rigidula_versF 5 alternate barcode version: intragenomic variation Lepraria rigidula
LC-028 Melanohalea_exasperata_versA 270 primary barcode Melanohalea exasperata
Melanohalea_exasperata_versB 60 alternate barcode version: intragenomic variation Melanohalea exasperata
LC-054 Menegazzia_terebrata_versA1 56 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA2 90 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA3 23 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA4 46 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA5 9 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA6 30 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versA7 44 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versB1 49 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versB2 26 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versB3 19 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versC1 39 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versC2 23 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versC3 11 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versC4 93 primary barcode Menegazzia terebrata
Menegazzia_terebrata_versD 70 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
Menegazzia_terebrata_versE 48 alternate barcode version: PCR/sequencing errors Menegazzia terebrata
LC-060 Parmelia_saxatilis_versA 125 primary barcode Parmelia saxatilis
Parmelia_saxatilis_versB 116 alternate barcode version: CAFIE errors Parmelia saxatilis
LC-004 Parmelia_sulcata_versA 212 primary barcode Parmelia sulcata
Parmelia_sulcata_versB 188 alternate barcode version: intragenomic variation Parmelia sulcata
KM-01-2B Physcia_adscendens_versA 337 primary barcode Physcia adscendens
Physcia_adscendens_versB 118 alternate barcode version: pseudogene (?) Physcia adscendens
BC-156-1 Pycnora_sorophora_versA 296 primary barcode Pycnora sorophora
Pycnora_sorophora_versB 41 alternate barcode version: CAFIE errors (?) Pycnora sorophora
KM-03-02 Usnea_intermedia_versA 1101 primary barcode Usnea intermedia
Usnea_intermedia_versB1 9 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB2 16 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB3 6 alternate barcode version: intragenomic variation Usnea intermedia
Page 69 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome
Draft
Usnea_intermedia_versB4 7 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB5 30 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB6 66 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB7 10 alternate barcode version: intragenomic variation Usnea intermedia
Usnea_intermedia_versB8 20 alternate barcode version: intragenomic variation Usnea intermedia
LC-013 Usnea_lapponica_versA 1264 primary barcode Usnea lapponica
Usnea_lapponica_versB1 92 alternate barcode version: intragenomic variation Usnea lapponica
Usnea_lapponica_versB2 24 alternate barcode version: intragenomic variation Usnea lapponica
Usnea_lapponica_versB3 16 alternate barcode version: intragenomic variation Usnea lapponica
Usnea_lapponica_versB4 10 alternate barcode version: intragenomic variation Usnea lapponica
Usnea_lapponica_versB5 24 alternate barcode version: intragenomic variation Usnea lapponica 1For reference see Fig. 3;
2 see Fig. 4;
3 see Fig. 5.
Page 70 of 70
https://mc06.manuscriptcentral.com/genome-pubs
Genome