linnea guldbrand - diva1038999/fulltext02.pdf · skolan fÖr bioteknologi. . 1 ... pcr primers were...

Degree project in Biotechnology, second cycle, 30 credits, Stockholm, Sweden 2016

Development of a massive parallel sequencing method for population genetics, for the sequencing of 1,000 dog mitochondrial genomes per Miseq run, based on nested and multiplexed PCR amplification and PCR-incorporated dual-index identification barcodes

Linnea Guldbrand

KTH School of Biotechnology

Development of a massive parallel sequencing method for

population genetics, for the sequencing of 1,000 dog mitochondrial genomes per Miseq run, based on nested

and multiplexed PCR amplification and PCR-incorporated dual-index identification barcodes

Linnea Guldbrand

Master Thesis at the School of Biotechnology, KTH Royal Institute of Technology,

Department of Gene Technology, SciLifeLabs

Supervisor and Examiner: Peter Savolainen

Abstract

The geographical origin of the domestic dog has not yet been conclusively established. The

mitochondrion, being matrilineally inherited and prone to a greater rate of mutation compared

to nuclear DNA, is of great significance in genetic evolutionary studies and as such, complete

sequencing of the mitochondrial genome of a great number of individuals would provide

important data for the furthering of such studies.

This project aims to design a method of sequencing the entire mitochondrial genome of the

domestic dog for a large number of individuals in parallel on the Illumina MiSeq sequencing

platform, using several sets of PCR primers to generate barcoded and sequencing-ready

libraries of predetermined fragments for each individual.

PCR primers were constructed both for initial long-range products and for shorter fragments,

suitable for sequencing and containing partial sequencing adaptors, covering the entire

mitochondrial chromosome. Additionally, primers containing barcode indices and the final

required sequencing constructs were designed. The viability of the primers and of different

PCR parameters was investigated and verified on agarose gels and a Bioanalyzer, and a set of

samples was taken through cleaning, barcoding, and sequencing.

Results indicate a promising method, in which all primers successfully generate product and

both cleaning and sequencing appear essentially successful, but the relative amounts of

product obtained from each primer pair, and subsequently the number of reads obtained in

sequencing, vary significantly with the initial set-up. Subsequent experiments, performed

after the closing of the practical part of the project, have shown that compensating for this

uneven amplification by using significantly unequal primer concentrations greatly serves to

alleviate these issues.

Abstract
Introduction
  Previous Findings on the Geographical Origins of the Domestic Dog
  The Mitochondrion, in Biology and in Forensics
  The Illumina, Dual Indexed, Paired End Sequencing Method
  Existing Methods for Whole mtDNA Sequencing
Aim of Project
Materials and Methods
  Primer Layout and Design
  PCR Reactions
Results
  Primer Layout and Design
  PCR Reactions, Viability of Primers and Multiplex Set Ups
  Barcoding PCR
  Concentration Measurements and Cleaning
  Initial Sequencing
  Results Obtained Post-Project
Discussion
  Primer Layout and Design
  PCR Procedures
  Indexing, Cleaning, and Sequencing
Conclusions
References

Introduction

Previous Findings on the Geographical Origins of the Domestic Dog

That the domestic dog has its evolutionary origin in the wolf has long been known and

accepted as fact, based on both genetic evidence and on archaeological findings, as well as on

behavioural and physical traits [1]. However, the precise circumstances, the historical time

point, and the geographical location of the original domestication event, or events, are less

clear. Several evolutionary genetic studies have been performed, using various methods and

sample materials, with varying results. Simply put, these studies attempt to identify the most

likely common ancestor of the domestic dog as the one whose genetic material can account

for the diversity of all others, while also comparing them to wild wolf populations and

estimating the timeframe for the domestication event via the rate at which mutations are

believed to accumulate. Variously, such studies have placed the geographical origin of the

domestic dog in places as disparate as Europe, South East Asia, and the Middle East.

A 2002 study [1] of a stretch of 582 base pairs from the so-called control region of the

mitochondrial DNA from 654 dogs, representing dog populations worldwide, indicated an

East Asian origin for the domestic dog, based on a comparatively higher degree of

phylogenetic variation in dogs from this area. These findings were corroborated by the

analysis of 14 437 base pairs from the Y chromosome (fragmented sequences from

incomplete sequencing of a male dog's DNA, assigned to the Y chromosome through

comparison to female dog DNA as well as to human Y chromosome sequences [21]) from

151 dogs worldwide [2] as well as by a further study of the complete mitochondrial DNA

from 169 individuals, combined with the 582 control region base pairs from 1576 individuals,

both placing the geographical origin of the domestic dog in South Eastern Asia, south of the

Yangtze River, China, less than 16 300 years ago [3]. Both studies also show that this region

of South East China, south of the Yangtze River, is the region in which the genetic diversity is

the greatest, and is the only region where almost all haplotypes of both the mtDNA and the Y

chromosome can be found simultaneously. Additionally, analysis of the mtDNA of Native

American dog breeds, when compared to East Asian and European dogs, as well as

Pre-Columbian samples, shows low levels of European mtDNA [4], indicating a more ancient,

Asian, origin of the Native American dog breeds. Similarly, mtDNA analysis of the

Australian Dingo and Polynesian domestic dogs indicates an origin in mainland South East

Asia [5].

On the other hand, a study of the mitochondrial DNA from 18 ancient canids [6] indicated a

closer relationship with either ancient or modern European canids for all modern dogs the

world over. The study did, admittedly, lack ancient canid samples from both the Middle East

and China, two other major candidates for the geographical origin of the domestic dog.

Furthermore, genome-wide SNP (Single Nucleotide Polymorphism) analysis of over 48,000

SNPs in 912 dogs as well as 225 grey wolves has indicated a Middle Eastern origin for the

domestic dog, based on the significantly larger genetic variation found in breeds from this

region [7].

The Mitochondrion, in Biology and in Forensics

The mitochondrion is an organelle present in eukaryotic cells, whose role is to perform

oxidative metabolism, providing energy for the cell. The origin of the mitochondrion is

assumed to be the engulfment of a purple bacterium by an ancient eukaryotic ancestor,

resulting in an endosymbiotic relationship whereby both the eukaryotic host cell and the

bacterial symbiont benefit. Certain genetic material has since migrated from the original

bacterium into the nuclear DNA of the host cell to the degree that the modern mitochondrion

can no longer survive independently, but does retain certain vital genes in its own,

mitochondrial DNA, the mtDNA. Mitochondria also still, like bacteria, reproduce through division,

rather than through being disassembled and reassembled, as is the case with all other

organelles, apart from the chloroplasts of plants, which have similar origins to the

mitochondrion. [8]

Mitochondrial DNA usually consists of one essential double-stranded, circular

chromosome, present in multiple copies, although single-stranded and linear chromosomes also exist. The

mitochondrial DNA encodes components necessary for protein production and certain

enzymes required for aerobic metabolism, but many components of the mitochondrion are

encoded by nuclear DNA and are transported into the mitochondrion. Multiple

mitochondria are present in any given cell, and their genetic make-up is not necessarily

homogeneous. [8]

Mitochondria are inherited maternally. In meiosis in females, mitochondria are evenly

segregated between the two new cells and in the resulting embryo, the mitochondria present

in the fertilised ovum will divide and produce all the mitochondria in the new organism. Due

to this manner of inheritance, mitochondrial DNA can be used in forensics, to trace familial

relations on the maternal side (e.g. mother and child, as well as siblings who share a mother,

but not a father), rule out suspects based on crime scene DNA, and on a larger time scale,

trace the evolution of a species. Mitochondrial DNA is more suited to such analyses for two

main reasons. Firstly, while each cell only contains one complete set of nuclear DNA,

mitochondrial DNA is present in multiple copies per cell, thereby making it significantly

more abundant than nuclear DNA, somewhat circumventing the common issue of limited

sample material. Secondly, specifically in mammals, mutations accumulate at a much higher

rate in mitochondrial DNA than in nuclear DNA, on average 10⁻⁸ mutations per nucleotide and

year, meaning that evolutionary differences can be visible on a comparatively shorter timescale.

[8]
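As a rough worked example of what such a rate implies, the sketch below multiplies the rate by genome length and elapsed time. It assumes that the rate quoted above is read as 10⁻⁸ mutations per nucleotide per year, and it borrows the 16727 bp reference genome length and the 16 300 year upper bound that are mentioned elsewhere in this thesis. The function name is illustrative only.

```python
def expected_mutations(rate_per_site_per_year, genome_length_bp, years):
    """Expected number of mutations accumulated along a single lineage."""
    return rate_per_site_per_year * genome_length_bp * years

# Assumed inputs: 1e-8 per nucleotide per year, 16727 bp, 16300 years.
n = expected_mutations(1e-8, 16727, 16300)
print(round(n, 2))  # 2.73
```

Under these assumptions only a handful of mutations are expected per complete mitochondrial genome over the relevant timescale, which illustrates why sequencing the whole molecule, rather than a short stretch of it, is informative.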

The Illumina, Dual Indexed, Paired End Sequencing Method

The Illumina sequencing method is a Sequencing-By-Synthesis (SBS) method,

also known as Solexa sequencing, utilising fluorescently labelled nucleotides to track base incorporation. In its

basic iteration, DNA is sheared into randomly sized fragments, to the ends of which a forward

and a reverse adaptor sequence are ligated. These adaptor-ligated fragments are then

attached, as single strands and at random positions, to the surface of the flow cell, where each individual fragment

is amplified into a cluster of multiple copies of the same fragment, using so-called bridge

amplification. This means that the non-attached adaptor sequence of any given fragment

anneals to its complementary adaptor on the surface of the flow cell, which then acts as a

primer, allowing amplification into an arch-shaped double-stranded structure where each of

the two strands is attached to the flow cell at one end. A denaturation step separates the two

strands of the ‘bridge’, and the process is repeated, until a sufficiently dense cluster of single

stranded DNA fragments has been formed.

Figure 1: Basic overview of Illumina sequencing, using random fragmentation and adaptor ligation [9]

After the clusters have been formed, the sequencing commences. Bases are determined

through the use of fluorescently labelled nucleotides, each of the four bases fluorescing at a

different wavelength. The labelled nucleotides are also reversible terminators, meaning that

the fluorescent label blocks more than one nucleotide from being incorporated at a time, but

that after the base has been determined, this label is enzymatically cleaved off, in preparation

for the next cycle, allowing the next nucleotide to be incorporated. For the first cycle of the

sequencing, all four fluorescently labelled nucleotides are added to the flow cell at once,

together with primers specific to the adaptor sequences and DNA polymerase, and one

nucleotide is incorporated in the first position of each strand, in each cluster. A laser is then

used to excite the fluorescent label on the nucleotide, and its identity is recorded for each

cluster. The label is then cleaved off, and all remaining reagents are washed away. For all

subsequent cycles, all four labelled nucleotides and DNA polymerase are added, one base is

incorporated in each strand in each cluster, laser excitation and image recording are performed,

and the label is cleaved off. Remaining reagents are then washed away in preparation for the next

cycle. [9] [10]
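The cycle described above can be sketched as a toy model in Python. This is a deliberate simplification and not a description of the instrument software: it treats a single error-free strand, ignores strand orientation, and the function and variable names are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(template, cycles):
    """Toy model: exactly one labelled base is incorporated per cycle.

    Returns the bases that would be recorded from the fluorescent signal,
    i.e. the complement of the template, one base per cycle.
    """
    read = []
    for position in range(min(cycles, len(template))):
        # The reversible terminator allows only one base to be added.
        incorporated = COMPLEMENT[template[position]]
        # Imaging step: the label identifies the incorporated base.
        read.append(incorporated)
        # Cleaving the label unblocks the strand for the next cycle.
    return "".join(read)

read = sequence_by_synthesis("ACGGTCA", cycles=5)
print(read)  # TGCCA
```

The number of cycles, not the length of the template, determines how many bases are read, which is why the read length is a property of the run rather than of the fragment.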

Figure 2: Sequencing-by-Synthesis using Illumina sequencing, incorporating one base at a time and detecting each base by its fluorescent label [9]

This sequencing method provides sequencing data from all fragments applied to the flow cell,

but has the downside of not being able to differentiate between the origins of the different

fragments, as well as, depending on the size of the fragments, not being able to obtain full

sequences, due to limitations imposed by the inherent read length of the sequencing method.

A way to address the former is to employ the Illumina Single- or Dual-Indexed Sequencing

method, both based on the Paired End Sequencing method. The latter can be achieved by

ensuring that all fragments used in the sequencing are below the maximum read length for the

particular sequencing method.

The Dual-Indexed Paired End Sequencing method employs several modifications to the

original adaptor construct and the sequencing procedure in order to distinguish between the

origins of different sequenced fragments. Instead of a simple adaptor on each end of the

fragment to be sequenced, the adaptor sequences are composed of several different

components each. These complex adaptors are shown in Figure 3, in a step-by-step depiction

of the sequencing process. Upstream of the DNA insert to be sequenced, the components of

the construct are the P5 adaptor, one of the slide-attaching sequences, which is also

complementary to the i5 Index Sequencing Primer, followed by the i5 Index, and the

sequence complementary to the Read 1 Primer, which initiates the sequencing from one end

of the DNA Insert. Downstream, the DNA insert is followed by a stretch of bases that is

complementary to both the i7 Index Sequencing Primer and to the Read 2 Primer, the i7

Index, and finally the P7 adaptor sequence that also attaches to the surface of the flow cell.

Each of the two index sequences is composed of 8 bases.

The sequencing includes four different primers, sequencing the DNA insert from both ends as

well as the two indices. First, Read Primer 1 is annealed and the DNA insert is sequenced from

the P5 end of the construct. The Read 1 product is then removed. Secondly, the i7 Index

Sequencing Primer is used to sequence the i7 Index, after which the index product is

removed. Then, the P5 adaptor is annealed to its corresponding adaptor, grafted to the surface

of the flow cell, which is used as the primer for the i5 Index. The i5 Index product is removed

and the full complementary strand is generated and the original strand is removed. Lastly,

Read Primer 2 is used to sequence the DNA insert from the P7 end. [11]
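To illustrate how the two index reads make it possible to trace each insert back to its sample, the following minimal sketch assigns reads to samples by exact lookup of the (i7, i5) index pair. The index sequences, sample names, and function name are hypothetical placeholders, and real demultiplexing software typically also tolerates sequencing errors in the indices, which this sketch does not.

```python
from collections import defaultdict

# Hypothetical 8-base indices; the actual index sequences are not given here.
SAMPLE_SHEET = {
    ("ATCACGAT", "TTAGGCAA"): "dog_001",
    ("ATCACGAT", "ACTTGACC"): "dog_002",
    ("CGATGTTA", "TTAGGCAA"): "dog_003",
}

def demultiplex(reads, sample_sheet):
    """Group insert reads by their (i7, i5) index pair."""
    by_sample = defaultdict(list)
    for i7, i5, insert in reads:
        sample = sample_sheet.get((i7, i5), "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)

reads = [
    ("ATCACGAT", "TTAGGCAA", "ACGT"),
    ("CGATGTTA", "TTAGGCAA", "GGCA"),
    ("ATCACGAT", "ACTTGACC", "TTAG"),
    ("CGATGTTA", "ACTTGACC", "CCCC"),  # index pair not on the sample sheet
]
result = demultiplex(reads, SAMPLE_SHEET)
```

Because a sample is identified by the combination of two indices, the number of distinguishable samples is the product of the numbers of i7 and i5 indices used, rather than their sum.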

Figure 3: Schematic overview of the Dual-Indexed Paired End sequencing method, showing the order and the orientation of the primers involved [11]

Existing Methods for Whole mtDNA Sequencing

There are existing methods for sequencing the entire mitochondrial DNA of several

individuals in parallel, using different set ups. One such is the PTS (Parallel Tagged

Sequencing) method on the 454 sequencing platform [23, 24], using single-stranded, self-

hybridising barcodes to tag samples prior to pooling and sequencing. Samples are barcoded

separately and then pooled and prepared for sequencing. The barcodes are 6 bp long and

allow for 72 samples to be sequenced in parallel. Another method is the PCR-product capture

method [25], using fragments from a reference individual, fixed to beads, in order to retrieve

and enrich mtDNA fragments from complex DNA mixtures. Long range PCR is used to

produce two PCR fragments that cover the entire mtDNA, and these are then sonicated into

15-800 bp fragments, which are biotinylated and immobilized on streptavidin-coated beads.

The beads are then used to extract mtDNA fragments from sheared DNA mixtures, by

hybridisation, and the fragments can then be eluted, amplified, and sequenced, after separately

barcoding each library and preparing them for sequencing.

Aim of Project

The aim of this project was to design and implement a method for the sequencing of the

canine mitochondrial genome, for the purpose of producing data for phylogeographical

analysis of the geographical origin of the domestic dog, for which large numbers of samples

are necessary. The strategy employed was the introduction of barcodes during preparatory

PCR in order to enable multiplexed sequencing, on the Illumina MiSeq, of 1152 individuals in

parallel. Ultimately, the samples intended for use are saliva samples stored on FTA cards

(Whatman).

In contrast to existing methods, the focus of this project lies on a high degree of

parallelisation, which requires reducing the workload and streamlining the procedures,

and on the specificity of the amplified fragments, in size and location, to guarantee

coverage of the entire mtDNA in fragments that can be fully sequenced by the

Illumina MiSeq sequencing platform. The PTS method, being on the 454 sequencing platform

and only providing a 72-plex, is therefore not suitable. Neither is the PCR-product capture

method, both due to the fact that the intended sample material for the project is immobilised

on FTA cards, and because it requires one library per individual to be prepared all the way to

sequencing separately, which is both labour and cost intensive.

In order to enable these high degrees of parallelisation, it is important that the read numbers

obtained from each fragment are as even as possible. This is to ensure that all fragments are

sequenced with a sufficiently high redundancy to provide reliable output data.

Materials and Methods

Primer Layout and Design

Primers were designed using the NCBI Primer BLAST tool [15], which can be used to

generate primers according to a set of user-specified parameters, using the canine

mtDNA reference genome [13] as the template.

The goal was to generate primers that would allow for the sequencing of the entire canine

mitochondrial genome in fragments of a size that would be fully covered by the MiSeq

sequencing platform. The MiSeq can cover at most 600 bp per fragment, which

influences the number of primer pairs that are needed. These primers would, apart from the

sequence-specific component, contain parts of the adaptor constructs necessary for MiSeq

sequencing, to which barcoding primers, containing the rest of the necessary adaptors, can

later anneal.

In addition to these primers, a set of primers to be used for initial amplification of longer

fragments was desired. The purpose of these long-range primers is to limit the use of the

original samples, to avoid depleting them, as well as to create a type of nested PCR [26] for the

sequencing-specific fragments, reducing the likelihood of unspecific products being generated

by limiting the available unrelated template.

PCR Reactions

PCR reactions were carried out using either TagTaq, obtained from the Alba Nova University

Center [22], or PlatinumTaq, produced by Invitrogen, both of which are DNA polymerases. The TagTaq was used for the inner primers, due to its

availability and lower cost, as its lower processivity was deemed sufficient for the shorter

inner fragments, while PlatinumTaq was required to fully amplify the longer outer fragments.

Originally, TagTaq was intended for both the inner and the outer primers, but after attempting

to amplify the outer fragments using the TagTaq, in multiple reaction set-ups, and failing to

obtain product, possibly due to the outer fragments being too long for the TagTaq enzyme to

successfully amplify, PlatinumTaq was employed instead.

The TagTaq-based reaction mixture consisted of 2.5 µl “P” (10x polymerase buffer, final

concentrations 50 mM KCl, 2 mM MgCl2, 10 mM TrisHCl pH 8.5, 0.1% v/v Tween), 2.5 µl

“C” (10x dNTP mix, containing 2 mM of each dNTP in water, final concentration 0.2 mM), 1

µl Forward primer (0.2 µM final concentration), 1 µl Reverse primer (0.2 µM final

concentration), 1 µl template, and 17 µl nuclease-free H2O, to a final volume of 25 µl per

reaction. Initially, 0.1 µl TagTaq was used per reaction, according to suggestions from the

providers of the enzyme, but this was later increased to 0.2 µl per reaction due to the low

yield.

For PlatinumTaq-based reaction mixtures, used for amplifying the outer fragments, volumes

were adapted from the information sheet provided by Invitrogen [16] and consisted of 5 µl 10x

PCR Buffer without MgCl2, 5 µl dNTP mixture (2 mM of each dNTP), 1.5 µl MgCl2 (50

mM), 2 µl Forward primer (0.2 µM final concentration), 2 µl Reverse primer (0.2 µM final

concentration), 1 µl template, and 33.5 µl nuclease-free H2O, to a final volume of 50 µl per

reaction. A volume of 0.2 µl PlatinumTaq, 5U/µl, per reaction was used throughout the

experiments.
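The volumes and concentrations listed for the two reaction mixtures can be cross-checked with the standard dilution relation, final concentration = stock concentration × added volume / total volume. The sketch below is only a consistency check under that assumption; the dictionary keys and function name are illustrative, and the small enzyme volumes are left out of the totals, as they are in the text.

```python
def final_concentration(stock, volume_ul, total_ul):
    """Dilution: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul

# Volumes in µl, as listed in the text (enzyme volume excluded).
tagtaq = {"buffer": 2.5, "dNTP": 2.5, "fwd": 1, "rev": 1,
          "template": 1, "water": 17}
platinum = {"buffer": 5, "dNTP": 5, "MgCl2": 1.5, "fwd": 2, "rev": 2,
            "template": 1, "water": 33.5}

tagtaq_total = sum(tagtaq.values())      # 25 µl
platinum_total = sum(platinum.values())  # 50 µl

# dNTP stock is 2 mM of each dNTP; MgCl2 stock is 50 mM.
dntp_tagtaq = final_concentration(2.0, tagtaq["dNTP"], tagtaq_total)
dntp_platinum = final_concentration(2.0, platinum["dNTP"], platinum_total)
mgcl2_platinum = final_concentration(50.0, platinum["MgCl2"], platinum_total)

print(tagtaq_total, platinum_total)
print(dntp_tagtaq, dntp_platinum, mgcl2_platinum)
```

Both mixtures sum to their stated final volumes, and both give 0.2 mM of each dNTP. Running the same relation backwards, 1 µl of primer giving 0.2 µM in 25 µl implies a primer stock of 5 µM; this is an inference from the stated figures, not a value given in the text.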

For both TagTaq-based and PlatinumTaq-based reactions, multiplexes were initially

prepared with equal amounts of each of the necessary forward and reverse primers, and

the volume of H2O was lowered accordingly. In later experiments, in attempts to obtain

comparable levels of each product in these multiplexes, the concentrations of the primers

included in each multiplex were varied, increasing the concentration of those primers that

failed to yield product in relation to those that did.

The PCR reactions were tested with several different annealing temperatures, extension

times, and numbers of cycles. The initial set-up for the inner primers was 1.5 minutes of initial

denaturation at 94°C, followed by 30 cycles of 30 seconds of annealing at 46°C and 2 minutes

of extension at 72°C, a final extension for 10 minutes at 72°C and ending in a Hold at 4°C.

The number of cycles was later increased to 40, and both 49°C and 52°C as annealing

temperatures were evaluated.
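To give a sense of what the increase from 30 to 40 cycles means in practice, the programmed hold times can be summed. The sketch below uses only the steps listed above, so it is a lower bound on the run time: temperature ramping, the final hold at 4°C, and any step not listed in the text are not included. The function name is illustrative.

```python
def program_minutes(initial_denaturation, cycles, step_minutes, final_extension):
    """Sum of programmed hold times; ramping and the 4°C hold are ignored."""
    return initial_denaturation + cycles * sum(step_minutes) + final_extension

# Steps per cycle as listed in the text: 0.5 min annealing, 2 min extension.
steps = [0.5, 2.0]
initial_run = program_minutes(1.5, 30, steps, 10)
extended_run = program_minutes(1.5, 40, steps, 10)
print(initial_run, extended_run)  # 86.5 111.5
```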

The PCR reaction parameters for the outer primers were initially the same as for the inner

primers, with the exception of the annealing temperature being set to 50°C. This was later

adjusted to evaluate both 5 and 10 minutes of extension time, as well as different numbers of

cycles.

The annealing temperatures were chosen by manually calculating the optimal annealing

temperature for each primer, using only the sequence-specific part of the inner primers and

the entirety of the outer primers, adding 2°C for an adenine or a thymine and 4°C for a

guanine or a cytosine, together with estimations of melting temperatures from the primer

generating tool, and choosing a temperature that was believed to be sufficiently low to allow

all primers to anneal successfully.
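The manual calculation described above, 2°C per adenine or thymine and 4°C per guanine or cytosine, can be written as a short function. The primer sequences below are hypothetical examples and not primers from this project, and taking the lowest value in a set is only one possible way of choosing a temperature low enough for all primers to anneal.

```python
def rule_of_thumb_tm(primer):
    """Melting temperature estimate: 2°C per A/T plus 4°C per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

# Hypothetical sequence-specific primer parts (not the thesis primers).
primers = ["ACGTTGCAAGCTTAGGCATC", "ATTATACGATTAGCAATTAC"]
estimates = [rule_of_thumb_tm(p) for p in primers]
lowest = min(estimates)
print(estimates, lowest)
```

For the inner primers, only the sequence-specific part would be passed to such a function, since the adaptor part does not anneal to the template in the first cycles.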

The success of the PCR reactions was evaluated by running aliquots of the reaction mixture on

1% agarose gels, pre-stained with GelRed (Biotium). In the case of multiplex reactions,

singleplex reaction mixtures for the primers participating in the multiplex were prepared and

dilutions of the multiplex reaction mixtures were used as template for the singleplexes. The

products of these singleplexes were then checked on gels, on the assumption that if and only if

the multiplex had been successful would the singleplex be successful with regard to that

specific primer.

Results

Primer Layout and Design

The primers required for the project were subject to a number of criteria set by the intended

sequencing platform, the parameters of adjacent primers, as well as the nature of the mtDNA

itself.

The external criterion set by Illumina Paired End sequencing on the MiSeq is stated as a

maximum of 550 bases per primer-amplified segment, including the primer sequences, for

sufficient coverage of the entire segment. This includes an overlap of 50 bp at the centre for

better coverage of the ends of the reads. This is due to the fact that, as an ensemble

sequencing-by-synthesis (SBS) method, the read length when sequencing on the MiSeq is

limited by the reliability of the synchronous incorporation of the correct base to each strand in

the cluster. In each step of the sequencing, the correct base has to be incorporated exactly

once and be measured accurately, followed by the removal of the extension-blocking agent,

allowing the next base to be incorporated and measured. As the sequencing proceeds, errors

are eventually introduced, wherein bases fail to be properly incorporated in certain strands,

leading to portions of the cluster lagging behind the others, giving 'false' signals. As these

errors accumulate, the signal-to-noise ratio will decrease, ultimately to the point where bases

can no longer be accurately detected. The number of bases into the sequencing where this

threshold is reached dictates the read length of the method in question. [12]

It is, however, possible to use segments of sizes approaching 600 bp by utilising the Illumina

stitching algorithm, which combines two reads that overlap by at least 10 bp into a single read, using consensus

and quality data from the two reads, allowing the use of larger inserts. The upper limit for the

size of a DNA insert, including primer sequences, was thus set to 590 bp. Subtracting the

length of the primer sequences from the inserts leaves approximately 550 bp sequenced in

each insert, as primers are ideally around 20 bp long. With this average fragment length, it

was estimated that 32 fragments would be needed in order to fully cover the 16727 bp

reference genome, with a reasonable margin for overlaps and difficult-to-align stretches of

DNA. [13]

Sequencing 32 individual 550 bp long sequences would yield a total of 17600 bp, leaving a

margin of 873 bases when compared to the 16727 bp of the reference genome. Spread out

over 32 fragments, this allows a variation of about 27 bp per fragment, providing a certain degree

of freedom when aligning the primers. Finally, in order to fully cover the mitochondrial

genome, the fragments cannot average lower than 523 bp (563 bp with the primers included).

32 primer pairs is also a desirable number from a practical design point of view, as sets of 32

fit evenly on 96-well plates as well as in multiples of eight, corresponding to the width of

common laboratory equipment.
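The fragment budget can be recomputed from the stated inputs. The sketch below assumes exactly 20 bp per primer, which the text gives only as an approximate ideal; the constant names are illustrative.

```python
import math

GENOME_BP = 16727     # length of the reference genome
INSERT_MAX_BP = 590   # upper limit per insert, primers included
PRIMER_BP = 20        # assumed primer length
FRAGMENTS = 32

sequenced_per_fragment = INSERT_MAX_BP - 2 * PRIMER_BP   # 550
total_capacity = FRAGMENTS * sequenced_per_fragment      # 17600
margin = total_capacity - GENOME_BP                      # 873
margin_per_fragment = margin // FRAGMENTS                # 27
min_average = math.ceil(GENOME_BP / FRAGMENTS)           # 523
min_average_with_primers = min_average + 2 * PRIMER_BP   # 563

print(total_capacity, margin, margin_per_fragment)
print(min_average, min_average_with_primers)
```

The same lines can be rerun with a different number of fragments or a different insert limit to see how much freedom remains for primer placement.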

These 32 primer pairs must then be laid out in an interconnecting fashion, where each forward

primer must be placed slightly upstream of the reverse primer of the previous pair, relative to

the leading strand, so that every base is sequenced independently.

Figure 4: Schematic representation of the overlapping orientation of primers, highlighting how all parts of the template are covered by amplified fragments in an interlocking fashion. Template DNA represented by the wide yellow line, the primers by red and orange arrows (alternating colours purely for visual clarity) and the amplified fragments in corresponding colours below the template.
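The interlocking requirement can be expressed as a simple check on primer coordinates. The sketch below is an illustration only: the `PrimerPair` type and the example coordinates are hypothetical, positions are given on the leading strand, and the wrap-around of the circular genome is not handled.

```python
from typing import List, NamedTuple


class PrimerPair(NamedTuple):
    """Leading-strand coordinates (inclusive) of one primer pair."""
    fwd_start: int
    fwd_end: int
    rev_start: int
    rev_end: int


def interlocks(prev: PrimerPair, nxt: PrimerPair) -> bool:
    # The next forward primer must end before the previous reverse primer
    # starts, so the bases under the previous reverse primer are sequenced
    # as insert in the next fragment.
    return nxt.fwd_end < prev.rev_start


def tiling_is_interlocked(pairs: List[PrimerPair]) -> bool:
    # Check every consecutive pair of primer pairs along the template.
    return all(interlocks(a, b) for a, b in zip(pairs, pairs[1:]))


# Hypothetical coordinates, for illustration only.
good = [PrimerPair(1, 20, 571, 590), PrimerPair(540, 559, 1110, 1129)]
bad = [PrimerPair(1, 20, 571, 590), PrimerPair(560, 579, 1130, 1149)]
print(tiling_is_interlocked(good), tiling_is_interlocked(bad))
```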

Another limiting factor for the placement of the primers is the repeating region of the mtDNA

inside which the primers cannot reliably be placed. This is due to the fact that the repeating

region, as indicated by its name, consists of multiple repetitions of the same DNA motif,

meaning that a primer that is complementary to a site in this region is also complementary to a

large number of sites, upstream and downstream of the intended annealing site, at every place

where this motif repeats itself. In the domestic canine mtDNA, this repeating region alternates

between two almost identical 10 bp segments, only differing in one position. This region

covers bases 16131 through 16430 of the reference genome, but can vary greatly in size

between individuals due to differing numbers of repeats of the two 10 bp motifs. [14]
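The problem with placing a primer inside the repeating region can be illustrated by counting candidate annealing sites. In the sketch below the two 10 bp motifs, the flanking sequence and the strict alternation of motifs are placeholders chosen for illustration, not sequences taken from the reference genome; only the 300 bp length (bases 16131 through 16430) comes from the text.

```python
# Placeholder motifs differing in one position (illustrative, not from [13]).
MOTIF_A = "GTACACGTAC"
MOTIF_B = "GTACACGTGC"
# Placeholder unique sequence upstream of the repeat array.
LEFT_FLANK = "ATGCCTAGGCTTAACCGGATTCAGT"

# 15 alternating motif pairs give a 300 bp repeat array.
repeat_array = (MOTIF_A + MOTIF_B) * 15
template = LEFT_FLANK + repeat_array


def count_sites(seq: str, site: str) -> int:
    """Count occurrences of site in seq, including overlapping ones."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


primer_in_repeat = MOTIF_A + MOTIF_B   # 20-mer lying inside the repeats
primer_in_flank = LEFT_FLANK[:20]      # 20-mer lying in unique sequence

print(len(repeat_array))
print(count_sites(template, primer_in_repeat))
print(count_sites(template, primer_in_flank))
```

Under these assumptions the 20-mer inside the repeats matches many positions, whereas the one in the flank matches a single position, which is why primers are placed outside the region.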

The option of not including the repeating region was considered, as the size differences may

mean that longer repeating regions would not be completely sequenced by the Illumina Paired

End sequencing method, and shorter ones would be sequenced to redundancy, but possibly

without the means to tell to what extent, as represented in figure 5.

Figure 5: Schematic representation of the different possible results when sequencing the repeating region. Due to its varying size between individuals, coverage will vary, and due to its repeating nature, conclusive alignments cannot be guaranteed.

It was decided to attempt to sequence the repeat region to the highest extent possible, as full

coverage of the rest of the mtDNA appeared to be achievable with the remaining 31 primer

pairs, meaning that no information would be lost from trying to sequence the repeat region as

well. Including the repeat region as an amplified segment would also ensure that the bases

immediately preceding and following it would actually be included in the sequencing,

something that could otherwise not be achieved, as primers cannot reliably be aligned inside

the repeat region.

With this in mind, the primers were aligned, starting from the primer pair upstream of the

repeating region, placing the reverse primer as close to the repeating region as possible,

followed by the pair covering the repeating region, ensuring enough room after the repeating

region to align the forward primer of the next primer pair.

The primers were designed using the NCBI Primer BLAST tool [15]. The Canis familiaris

reference mitochondrion genome entry [13] was used as the template to which the primers

were to be aligned. The PCR product size was set to a maximum of 590 bases and a minimum

of 540 bases, to ensure coverage of the whole mtDNA sequence. Remaining parameters were

subject to dynamic modifications depending on the ease or, rather, difficulty with which

primers could be aligned. TM was desired to be between 52 and 60 degrees Celsius, with an

optimal temperature of 56 degrees. The allowed difference in TM between the primers in a

pair was initially set at 3 degrees, but was subject to increases in cases where primers could

otherwise not be aligned.

The initial advanced settings were for a primer size between 17 and 23 bases with 20 as an

optimum, a GC-clamp of 2, maximum poly-X sequences of 4, and maximum 3’ GC content

of 3. GC content was desired to be between 40 and 60% and due to the nature of the

mitochondrial DNA, ‘Avoid low complexity regions for primer selection’ was unchecked.
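As an illustration of what the initial sequence-based settings demand of a candidate primer, the following sketch checks a sequence against them. It is not the Primer-BLAST implementation: it assumes the usual Primer3 readings of the parameters (GC clamp as consecutive G/C bases at the 3' end, 3' GC content as the number of G/C among the last five bases, poly-X as the longest single-base run), leaves out the melting temperature, and uses made-up example sequences.

```python
from typing import List


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def check_primer(seq: str) -> List[str]:
    """Return the names of the initial settings that the primer violates."""
    seq = seq.upper()
    if not seq:
        return ["length"]
    problems = []
    if not 17 <= len(seq) <= 23:
        problems.append("length")
    gc_percent = 100 * sum(base in "GC" for base in seq) / len(seq)
    if not 40 <= gc_percent <= 60:
        problems.append("gc_content")
    if any(base not in "GC" for base in seq[-2:]):
        problems.append("gc_clamp")      # needs two G/C at the 3' end
    if longest_run(seq) > 4:
        problems.append("poly_x")        # single-base run longer than 4
    if sum(base in "GC" for base in seq[-5:]) > 3:
        problems.append("end_gc")        # more than 3 G/C in last five bases
    return problems


# Made-up sequences for illustration only.
print(check_primer("ATGCATCAGCATCAGTATGC"))
print(check_primer("ATTTTTAGCATCAGTATGAT"))
```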

Primers were then generated by specifying a stretch of approximately 50 bases within which

the forward primer was allowed to align. The starting point of the first stretch was dictated by

the end of the repeating region, while all subsequent alignment areas were instead dictated by

the location of the reverse primer in the previous primer pair, i.e. in relation to the leading

strand, each forward primer had to end before the reverse primer of the preceding pair

‘started’.
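The placement rule can be sketched as two small coordinate calculations. The function names and the example position are hypothetical; coordinates are 1-based on the leading strand, and the 50-base window and the 540 to 590 base product size are the values given in the text.

```python
from typing import Tuple


def forward_window(prev_rev_start: int, window: int = 50) -> Tuple[int, int]:
    """Stretch in which the next forward primer may lie.

    The forward primer has to end before the reverse primer of the
    preceding pair starts, so the window ends one base upstream of it.
    """
    return prev_rev_start - window, prev_rev_start - 1


def reverse_end_range(fwd_start: int, min_product: int = 540,
                      max_product: int = 590) -> Tuple[int, int]:
    """Range of positions where the product (reverse primer) may end."""
    return fwd_start + min_product - 1, fwd_start + max_product - 1


# Hypothetical example: the previous reverse primer starts at base 1000.
print(forward_window(1000))
print(reverse_end_range(950))
```

Widening the `window` argument corresponds to the first relaxation step described in the following paragraph, where the stretch allotted to the forward primer is extended.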

Due to the structure of the mtDNA and the rigidity of where the next primers had to be

aligned, in relation to the preceding pairs, it was often difficult to align primers according to

the above-mentioned ‘optimal’ parameters, which necessitated that the conditions were made

less stringent, on a primer-by-primer basis. Initially, the stretch of bases allotted to the

alignment of the forward primer would be extended, in the hopes of finding a primer without

having to lower the other requirements placed on the primer. Failing this, as moving the

primer too far back would in the end compromise the possibility of covering the entire

mtDNA in the chosen number of primers, the remaining parameters were in turn made less

stringent. The decision on what parameter to change was aided by the error message given by

the Primer-BLAST tool upon failure to generate a primer pair, which would list the reasons

for the failure, e.g. TM difference too high, too long poly-X sequence, or lack of GC clamp.

Decisions were also made by observing the surrounding sequence manually, and thereby

deciding whether or not certain changes were appropriate. For each primer pair, the changes to

the parameters that were deemed to cause the least impactful changes to the overall structure

of the primers were chosen.

To limit the use of template DNA, which is available in limited amounts, primers that would

amplify longer parts of the mtDNA were required. These would then be used as templates for

the aforementioned 32 primer pairs, also creating a sort of nested PCR [26], which reduces

the likelihood of generating unspecific PCR products. It also serves the purpose of generating

template that is in solution, not bound to FTA cards.

The 32 primer pairs will from here on be referred to as ‘Inner Primers’ and these new,

analogously dubbed ‘Outer Primers’ were aligned in much the same manner as the inner

primers, interlocking with each other, but also taking care not to overlap with the alignment

sequences of the inner primers.

Figure 6: Schematic representation of the interlocking design of the outer primers. Template DNA represented by the wide yellow line, the primers by blue and green arrows (alternating colours purely for visual clarity) and the amplified fragments in corresponding colours below the template.

Figure 7: Schematic representation of how the outer primers fully cover a set of four inner primers. Template DNA represented by the wide yellow line, the inner primers by red and orange arrows and outer primers by blue and green arrows (alternating colours purely for visual clarity) and the amplified fragments in corresponding colours below the template.

The outer primers were designed to cover four inner primers each, resulting in eight outer

primer pairs, each amplifying around 2200 bp long sequences. These were to serve as both a

way of amplifying the original template, which is available in limited amounts, and as a way

to create a nested PCR, reducing the complexity in subsequent PCR reactions.
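The relationship between inner and outer primer pairs can be written down as a simple index mapping. The sketch assumes that the primers are numbered consecutively along the template, which agrees with the later use of outer primer 3 as template for inner primers 9-12; the function names are illustrative.

```python
from typing import List

N_INNER = 32
INNER_PER_OUTER = 4
INNER_SEQUENCED_BP = 550    # approximate sequenced length per inner fragment


def outer_for_inner(inner: int) -> int:
    """Number of the outer primer pair that spans a given inner primer pair."""
    if not 1 <= inner <= N_INNER:
        raise ValueError("inner primer number must be between 1 and 32")
    return (inner - 1) // INNER_PER_OUTER + 1


def inners_for_outer(outer: int) -> List[int]:
    """Numbers of the inner primer pairs covered by one outer primer pair."""
    first = INNER_PER_OUTER * (outer - 1) + 1
    return list(range(first, first + INNER_PER_OUTER))


n_outer = N_INNER // INNER_PER_OUTER
approx_outer_bp = INNER_PER_OUTER * INNER_SEQUENCED_BP
print(n_outer, approx_outer_bp)
print(inners_for_outer(3), outer_for_inner(12))
```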

As detailed previously, in order to utilise the Illumina Dual-Indexed Paired End sequencing

protocol, a number of additional specific sequences need to be present in the primers. The

basic Illumina sequencing relies on random fragmentation of sample DNA, followed by

ligation of specific adaptors to the fragments, which enable bridge amplification of the

fragments on the sample slide, as well as containing the primer alignment sequence for the

sequencing-by-synthesis steps.

As this project endeavours to sequence the entire mtDNA of thousands of individuals in

specific, predetermined PCR-amplified segments, this random fragmentation approach to

creating the DNA inserts to which adaptors are ligated is not appropriate, as it would require

separate libraries for each individual and involves increased labour and cost, as well as

removing the specificity of using primers to ensure full coverage. Instead, the Read Primer

parts of the adaptor sequences are added single-strandedly to the 5’ end of the forward and

reverse inner primers as handles, making these increase in size significantly. In order to

complete the sequencing-enabling structures for Dual-Indexed Paired End Sequencing, an

additional PCR step is required. This step will be used to introduce the outermost adaptor

sequences that allow attachment to the flow cell, P5 and P7, as well as the two index sequences, i5

and i7.

Figure 8: Schematic overview of the two PCR steps that complete the sequencing construct. The top step uses specific inner primers (shown in dark grey) with attached partial adaptor sequences containing read primer complementary sequences (shown in yellow and light blue). The second step adds the outer adaptor sequences (shown in red and dark blue) and the indices (shown in light and dark green) by completing the previously added adaptors.

The final construct, shown above in figure 8, consists, from left to right, of the P5 flow cell

attachment sequence, the i5 index barcode, the Read 1 Primer complementary region, the

forward insert specific primer, the DNA insert, the reverse insert specific primer, the i7 index

complementary region (which doubles as the Read 2 complementary region when read in the

other direction), the i7 index barcode, and the P7 flow cell attachment sequence. The

difference in structure between the default ligated adaptor and this PCR-generated construct

lies in the forward and reverse insert specific primers, which enable the sequencing of

specific, predetermined parts of the sample DNA, but from a sequencing standpoint, these are

merely treated as a part of the DNA insert, and do not influence the sequencing itself in any

way.
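The order of the elements in the final construct can be summarised as a small data structure. The sketch only models the layout described above; the element labels and the index names are placeholders, and no actual adaptor or index sequences are included.

```python
from typing import List

# Product of the inner-primer PCR: insert flanked by the specific primers
# and the read primer handles attached to their 5' ends.
INNER_PRODUCT = [
    "read1_primer_site",
    "forward_primer",
    "insert",
    "reverse_primer",
    "read2_primer_site",
]


def add_indices(product: List[str], i5: str, i7: str) -> List[str]:
    """Model the indexing PCR: add P5 + i5 on one side, i7 + P7 on the other."""
    return ["P5", f"i5:{i5}", *product, f"i7:{i7}", "P7"]


# Placeholder index names, for illustration only.
construct = add_indices(INNER_PRODUCT, "i5_01", "i7_01")
print(" | ".join(construct))
```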

Using 32 inner fragments per individual, and sequencing 1152 individuals in parallel, given

the total read output of the MiSeq v3 sequencing kit [27], an equal distribution of reads

between the fragments would, in an ideal situation, provide a redundancy of 600 reads per

fragment and individual.
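The read budget behind this estimate can be made explicit. The sketch below does not use the kit specification from [27]; it only works backwards from the figures in the text to the total number of reads that a redundancy of 600 implies, and the function name is illustrative.

```python
FRAGMENTS_PER_INDIVIDUAL = 32
INDIVIDUALS = 1152
READS_PER_FRAGMENT = 600    # ideal redundancy stated in the text

# Number of distinct fragment-individual combinations in one run.
amplicons = FRAGMENTS_PER_INDIVIDUAL * INDIVIDUALS
# Total reads implied by the stated redundancy under equal distribution.
implied_total_reads = amplicons * READS_PER_FRAGMENT


def reads_per_fragment(total_reads: int) -> int:
    """Ideal redundancy for a given read output, assuming equal distribution."""
    return total_reads // amplicons


print(amplicons, implied_total_reads)
```

Any uneven distribution of reads between fragments or individuals lowers the redundancy of the weakest fragments below this ideal value.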

PCR Reactions, Viability of Primers and Multiplex Set Ups

Initially, the first eight inner primers, ordered from Eurofins, were tried in singleplex, using 0.1 µl TagTaq

and 30 cycles.

Figure 9: First attempt at amplifying the first 8 inner primers (0.1 µl TagTaq, 30 cycles) flanked by two DNA ladders (Low Range, 3% TopVisionAgarose #RO491 25-700 bp). Ladders are smeary and dissimilar, and exposure is high in an attempt to visualize potential product.

Bands were very weak and smeary. As this was true for the ladders too, as well as for other

gels run by others in the lab at the same point in time, part of the fault may, in this case, lie in

the gel bath itself.

Next, the same primers were tried again, resulting in a gel with much sharper bands but still

very weak product bands, showing only for primers 1-3 (Figure 10), but submitting the

remaining reaction mixtures to a subsequent extra 12 cycles showed more clear results

(Figure 11). The gel after the additional 12 cycles shows bands of the expected size for

primers 1-3, as well as multiple bands of lower sizes, presumed to be various primer

dimers. These results prompted the decision to increase the number of cycles for the inner

primers to 40. Low processivity of the enzyme was suspected.

Figure 10: Second attempt at amplifying the first 8 inner primers (0.1 µl TagTaq, 30 cycles), flanked by two DNA ladders (Low Range, 3% TopVisionAgarose #RO491 25-700 bp, and M, 1% TopVision LE GQ Agarose #RO491 250-10000 bp). Ladders are clearer but product is very weak, faintly visible for primers 1 through 3.

Figure 11: Second attempt on first 8 inner primers after 12 additional PCR cycles. Primers 1 through 3 are clearly visible. Samples flanked by two DNA ladders as before (Low Range, 3% TopVisionAgarose #RO491 25-700 bp, and M, 1% TopVision LE GQ Agarose #RO491 250-10000 bp). The two rightmost lanes before the M ladder are primers 1 and 2 from a different sample compared to the first eight lanes.

Reactions for the same eight primers were then run using twice the amount of enzyme, 0.2 µl,

for 40 cycles, and 10 times as much enzyme, 1 µl, but remaining at 30 cycles. The 0.2 µl, 40

cycle run showed product of the expected size for all primers except primer 8, while the 1.0

µl, 30 cycle run did not yield any product. Whether the latter was caused by a laboratory

mistake or a result of imbalances between the reaction components due to the increase in

enzyme concentration, or simply still too few amplification cycles was not further

investigated.

Figure 12: Inner primers 1 through 8 amplified with 0.2 µl TagTaq and 40 cycles, in duplicate, M ladder. All primers except primer 8 appear clearly.

Figure 13: Inner primers 1 through 8 amplified with 1.0 µl TagTaq and 30 cycles, in duplicate, M ladder. No primers visible, possibly due to human error.

Deciding to proceed with 40 cycles and 0.2 µl enzyme per reaction for inner primers in

singleplex, different annealing temperatures were investigated. Both 49°C and 52°C were

tried, using the now established parameters, both yielding product for all primers apart from

primer 8.

Figure 14: Inner primers 1 through 8 amplified with 0.2 µl TagTaq and 40 cycles, 49°C annealing temperature to the left and 52°C annealing temperature to the right, Low Range ladder. All primers except primer 8 appear clearly in both sets.

Inner primers were also tried in multiplex, initially in quadruplexes of primers 1-4 and 5-8, 50

cycles, yielding vague primer dimer products. At the same time, the eight outer primers were

run for the first time, using TagTaq, 50°C annealing temperature and 40 cycles, but no

product was obtained. Figure 15 below shows these results, using inner primer 1 as a positive

control.

Figure 15: Attempt at amplifying the 8 outer primers using 0.2 µl TagTaq, 50°C annealing temperature and 40 cycles, in duplicate, with inner primer 1 as positive control, M ladder. The two rightmost lanes contain inner primer multiplex attempts, primers 1-4 and 5-8, 50 cycles. No outer primer product visible, and only unspecific product visible for the inner primer multiplexes.

The outer primers and the multiplex attempts were both retried using 5 minutes and 10

minutes extension time, again failing to yield the desired products.

Figure 16: Attempt at amplifying the 8 outer primers using 0.2 µl TagTaq, 50°C annealing temperature and 40 cycles, using 5 minutes extension time (left) and 10 minutes (right), with inner primer 1 as positive control, M ladder. The 4 rightmost lanes contain inner primer multiplex attempts, primers 1-4 and 5-8, 50 cycles, in duplicate. Again, no outer primer product visible, and only unspecific product visible for the inner primer multiplexes.

In order to rule out human error, all outer primers were re-suspended from stock solution and

the PCR reactions were re-run at 40 cycles and 10 minutes extension time. As yet again no

product was obtained, it was suspected that TagTaq lacked the processivity required to

adequately amplify the longer outer primer fragments, and PlatinumTaq was tried instead,

using 45 cycles and 10 minutes extension time. This set up yielded clear product for all eight

outer primers. It was subsequently concluded that TagTaq did indeed lack the necessary

processivity to reliably produce the longer, outer primer fragments, and PlatinumTaq was

employed for all outer primer reactions from this point onwards.

Figure 17: Outer primers amplified with PlatinumTaq, 45 cycles, 10 minutes extension time, inner primer one as positive control, M ladder. All outer primers visible.

Subsequently, quadruplexes of the outer primers were set up, 1-4 and 5-8, using 0.2 µl

PlatinumTaq, 45 cycles, and 10 minutes extension time, and singleplexes of each of the eight

primers were run for 15 cycles, using 1 µl of a 1:500 dilution of the corresponding quadruplex as

template. These secondary singleplexes yielded product in outer primers 3,

4, 5, and 8.

In order to further investigate the possibility of quadruplexing the outer primers, quadruplexes

composed of odd- and even-numbered outer primers were set up, as well as quadruplexes of outer

primers 1, 2, 6, and 7 and of 3, 4, 5, and 8. As before, these multiplexes were verified by

singleplex reactions based on the multiplex product, run on gels. The former combinations

showed clear product for outer primers 3, 4, and 5, and weak bands for 7 and 8. The latter was

similar, and showed primers 3, 5, and 8 relatively clearly, and 4 weakly.

Figure 19: Singleplexes of outer primers, from 1 µl 1:500 dilutions of Odd and Even combination (i.e. 1-3-5-7 and 2-4-6-8) quadruplex template, 15 cycles, M ladder. Outer primers 3, 4, and 5 clearly visible, primers 7 and 8 very faintly, and 1, 2, and 6 seemingly not amplified.

Figure 20: Singleplexes of outer primers, from 1 µl 1:500 dilutions of 1-2-6-7 and 3-4-5-8 quadruplex template, 15 cycles, M ladder. Primers 3, 5, and 8 were visible, and primer 4 was faintly visible.

Figure 18: Singleplexes of outer primers, from 1 µl 1:500 dilutions of 1-4 and 5-8 quadruplex template, 15 cycles, M ladder. Outer primers 3, 4, 5, and 8 clearly visible, 1, 2, 6, and 7 seemingly not amplified.

These results led to the decision to try the outer primers in duplexes, one set up with primers

1+2, 3+4, 5+6, and 7+8, and one set up with primers 1+3, 2+4, 5+7, and 6+8. The duplex

reaction mixtures were then used as templates for the corresponding singleplexes, using 1 µl

1:20 dilutions and run for 15 cycles. These set ups consistently yielded product for primers 3,

4, 5, and 8, similarly to the earlier quadruplexes, but primers 1 and 2 showed weak

amplification when paired together, as did primer 7 when paired with primer 5.

Figure 21: Singleplexes of outer primers from duplex set-ups (1+2, 3+4, 5+6, and 7+8), M ladder (one blank lane between the ladder and the first primer). Primers 1 through 5, and primer 8 visible, primers 3 through 5 more strongly.

Figure 22: Singleplexes of outer primers from duplex set-ups (1+3, 2+4, 5+7, and 6+8), M ladder (one blank lane between the ladder and the first primer). Primers 3 through 5, and primers 7 and 8 visible, primers 3, 5, and 8 more strongly, primer 7 very faint.
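The outcome of the multiplex verifications reported so far can be tallied per outer primer. The sets below are transcribed from the text and figure captions for Figures 18 through 22, counting a primer as visible whether its band was clear or faint; the dictionary keys are descriptive labels only.

```python
from collections import Counter

# Outer primers with a visible band (clear or faint) in each verification.
visible = {
    "quadruplex 1-4 / 5-8": {3, 4, 5, 8},
    "quadruplex odd / even": {3, 4, 5, 7, 8},
    "quadruplex 1-2-6-7 / 3-4-5-8": {3, 4, 5, 8},
    "duplex 1+2, 3+4, 5+6, 7+8": {1, 2, 3, 4, 5, 8},
    "duplex 1+3, 2+4, 5+7, 6+8": {3, 4, 5, 7, 8},
}

# Count in how many set-ups each outer primer was seen.
tally = Counter(primer for seen in visible.values() for primer in seen)
always = sorted(p for p in range(1, 9) if tally[p] == len(visible))
never = sorted(p for p in range(1, 9) if tally[p] == 0)

for primer in range(1, 9):
    print(primer, tally[primer], "of", len(visible))
print("always visible:", always)
print("never visible:", never)
```

Seen this way, outer primers 3, 4, 5, and 8 amplify in every set-up, while primer 6 does not appear in any of them, which motivates the choice of primers 5 and 6 for the parameter tests that follow.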

Outer primers 5 and 6, one that had consistently worked and one that did not appear to work,

were subsequently chosen for testing other parameters of the PCR. Singleplexes of primer 5

and 6 were run at 20 cycles and at 30 cycles, with 5 minutes and 10 minutes of extension

time, i.e. four different PCR set ups for each primer. A duplex of outer primers 5 and 6 was

also run at the same parameters. The gels showed 20 cycles to be too few to properly amplify

the segments, and the longer extension time appeared to increase yield. At 10 minutes

extension time, outer primer 5 amplified to a higher extent than primer 6. The singleplexes

performed from the duplexes showed amplification of only primer 5.

Figure 23: Parameter tests for outer primers, using outer primers 5 and 6. From left to right, 20 cycles with 5 minutes extension time, 20 cycles with 10 minutes extension time, 30 cycles with 5 minutes extension time, and 30 cycles with 10 minutes extension time for both primer 5 and 6. 20 cycles did not yield product at any extension time, and the longer extension time yielded more product, especially for primer 5.

Figure 24: Singleplex amplification of outer primers 5 and 6 from duplex template (image cropped from a larger gel containing other samples). Only showing result of 30 cycle runs, ostensibly only yielding product for outer primer 5.

After the initial attempts at running the first eight inner primers, the remaining 24 inner

primers were ordered. Due to the issues with getting inner primer 8 to yield product, and

based on advice regarding primer purchase (personal communication with Afshin Ahmadian,

Associate Professor, School of Biotechnology, Royal Institute of Technology, KTH) the new

primers were ordered from Biolegio. Primer 8 was redesigned, and both the new and the old

version were ordered, along with primer 1, for comparison to the Eurofins primers, together

with inner primers 9-32.

Firstly, primer 1 from both Eurofins and Biolegio was run in triplicate, as were both the

original and the new version of primer 8, both from Biolegio and also in triplicate. The

reactions were run as before, at 46°C annealing temperature and for 40 cycles. The results

were comparable for both versions of inner primer 1 and indicated product from the

re-synthesis of the original inner primer 8, but not from the new version.

Figure 25: Comparison of inner primer 1 from Eurofins and from Biolegio, and of the old and new design of inner primer 8, both from Biolegio, in triplicate. Primer 1 worked comparably well from both manufacturers, and the old design of primer 8 from Biolegio appeared to work, while the new design did not.

Next, inner primers 9 through 32 were tested in duplicate, according to the same parameters.

Due to the unexpected result from the two versions of inner primer 8, these were also re-run and are

included on the gel showing inner primers 25 through 32. The re-run did support the previous

evidence in showing that the re-synthesis of the original primer 8 worked, while the redesign

did not. The majority of inner primers 9-32 showed product, and the ones that did not or

appeared only weakly were re-run.

Figure 26: Singleplexes of inner primers 9-16, in duplicate (one set after the other, with one empty lane in-between the sets), M ladder.

Figure 27: Singleplexes of inner primers 17-24, in duplicate (one set after the other, with one empty lane in-between the sets), M ladder.

Figure 28: Singleplexes of inner primers 25-32, in duplicate (one set after the other), M ladder. Additionally, to the left of the ladder, the old and new design of inner 8, again showing product from the old design.

The primers to be re-run in duplicate were 9, 16, 19, 20, 22, 23, 24 and 25. The results

obtained were largely inconclusive, being inconsistent between duplicates, at best showing

fairly weak bands, at worst appearing almost fully blank, and overall showing a lot of

unspecific product.

Figure 29: Singleplexes in duplicate of the inner primers between 9 and 32 that did not appear to yield product in the initial singleplexes. From left to right, in pairs, 9, 16, 19, 20, 22, 23, 24, and 25, M ladder.

Next, the viability of TagTaq compared to PlatinumTaq for the amplification of inner primers

was assessed, at the same time investigating how well the inner primers amplify from a

previously amplified outer primer segment, by setting up singleplexes of inner primers 9-12

using template from singleplex amplification of outer primer 3. The outer PCR was run at

50°C annealing temperature, 30 cycles, and 5 minutes extension time. 1 µl 1:20 dilution of the

outer primer product was used as template for the inner singleplexes. These were run at 46°C

annealing temperature, 40 cycles. All steps were performed in duplicate, i.e. two singleplex

reactions of outer primer 3 were used for duplicates of both the TagTaq and the PlatinumTaq,

totalling four singleplexes of each inner primer. All singleplexes were successful, with

PlatinumTaq showing much more strongly, and the two sets of TagTaq clearly differing in

intensity.

Figure 30: Duplicate sets of inner primers 9 through 12 in singleplex from previous amplification of outer primer 3, comparing TagTaq (left) to PlatinumTaq (right), M ladder. The PlatinumTaq amplified inner primers show more strongly, and there is a marked difference between the two TagTaq sets, despite having been amplified under the same conditions.

Similarly, quadruplexes of inner primers 9-12, one using TagTaq and one using PlatinumTaq,

were set up, still using the amplified outer primer 3, 1:20 dilution, as template. Reactions

were run at 46°C annealing temperature, 30 cycles. Secondary singleplexes for verification

were performed with PlatinumTaq for all reactions, on 1:20 dilutions of the quadruplex mixtures.

Results were similar between the two enzymes; inner primers 9, 11, and 12 were successfully

amplified, while primer 10 was very weak.

Figure 31: Singleplexes from quadruplexes of inner primers 9 through 12. All singleplex reactions were performed with PlatinumTaq, while one quadruplex was performed with TagTaq (left) and one with PlatinumTaq (right), M ladder.

Due to the apparent failure of certain outer primers in quadruplex reactions, outer primers

were re-suspended from stock and run in singleplex, as before, in order to investigate whether

degradation of the primers due to freeze-thaw cycles was the cause of the amplification failure.

Primers 3 through 8 showed clearly, primer 2 was weaker, and primer 1 was very faint.

Figure 32: Outer primers in singleplex, re-suspended from stock, M ladder.

Diluting all outer primer product (apart from primer 1, being significantly weaker) at a ratio

of 1:20, singleplexes of all inner primers were run from their corresponding outer primer, for

40 cycles, with 46°C annealing temperature. Results showed amplification of inner primers

corresponding to each of the outer primers, but not of all inner primers; some inner

primers that had previously been successfully amplified failed to yield product.

Figure 33: Singleplexes of inner primers 1 through 16 from outer primer singleplexes 1 through 4, M ladder.

Figure 34: Singleplexes of inner primers 17 through 32 from outer primer singleplexes 5 through 8, M ladder.

The resuspended outer primers were then tried in quadruplex; outer primers 1, 3, 5, and 7 in

one quadruplex and outer primers 2, 4, 6, and 8 in the other. They were run in duplicates of

both 20 and 30 cycles, all using 50°C annealing temperature and 5 minutes extension time. 1:20 dilutions of

the quadruplex reaction mixtures were used for singleplex verification and were run for 15

cycles. The gels showed successful amplification of primers 3, 4, 5, and 8 in both the 20 cycle

and the 30 cycle quadruplexes, and in the latter, outer primer 7 was also visible.

Figure 35: Duplicates of outer primer singleplexes from outer primer quadruplexes (1+3+5+7 and 2+4+6+8, i.e. Odd and Even), M ladder. Quadruplex run for 20 cycles, primers 3, 4, 5, and 8 faintly visible.

Figure 36: Duplicates of outer primer singleplexes from outer primer quadruplexes (1+3+5+7 and 2+4+6+8, i.e. Odd and Even), M ladder. Quadruplex run for 30 cycles, primers 3, 4, 5, 7 and 8 visible, number 7 more faintly.

It was then decided to attempt singleplex inner primer amplification, using the 30 cycle outer

primer quadruplex reaction mixture as template, at 1:100 dilution. The singleplexes were run

for 40 cycles. Although product was only expected for inner primers corresponding to outer

primers 3, 4, 5, 7, and 8, gels showed amplification of inner primers from all outer primers,

including those that appeared not to have worked in quadruplex. Only two inner primers

appeared to not have yielded product.

Figure 37: Singleplexes of inner primers 1 through 16, from outer primer quadruplex template (even and odd outer primer combinations), M ladder. Most inner primers visible, irrespective of whether or not the corresponding outer primer appeared to have yielded product.

Figure 38: Singleplexes of inner primers 17 through 32, from outer primer quadruplex template (even and odd outer primer combinations), M ladder. Most inner primers visible, irrespective of whether or not the corresponding outer primer appeared to have yielded product.

To verify this unexpected result, the entire experiment was run again, starting from the

quadruplex of the outer primers, with similar results.

Figure 39: Singleplexes of inner primers 1 through 16, from re-run of outer primer quadruplex template (even and odd outer primer combinations), M ladder. Again, almost all inner primers are clearly visible.

Figure 40: Singleplexes of inner primers 17 through 32, from re-run of outer primer quadruplex template (even and odd outer primer combinations), M ladder. Again, almost all inner primers are clearly visible.

Barcoding PCR

Following these results, the decision was made to proceed towards sequencing. For two

different DNA samples, PCR1 was run in quadruplexes of odd and even numbered outer

primers, 40 cycles, and PCR2 was run in quadruplexes, duplexes and singleplexes according

to the pattern in figure 41, each for 25 cycles. A further four DNA samples were prepared in

the same manner, but for these, PCR2 was only run on quadruplexes.

Figure 41: Duplex and quadruplex set-ups for all 32 inner primers, with corresponding names (Q1-8 for the quadruplexes and D1-8 and D17-24 for the duplexes).

Products from PCR2 were pooled per individual and multiplexing set-up, creating 32-plexes of inner fragments, and a 1:100 dilution of each pool was used as template for the barcode-introducing PCR3. This reaction was run for 15 cycles, with an annealing temperature of 58 °C and an extension time of 5 minutes. Both TagTaq and PlatinumTaq were employed, according to Table 1 below.

 

| Samples | Singleplex TagTaq | Singleplex PlatinumTaq | Duplex TagTaq | Duplex PlatinumTaq | Quadruplex TagTaq | Quadruplex PlatinumTaq |
|---|---|---|---|---|---|---|
| IR119 | X | X | X | X | X | X |
| IR126 | X | X | X | X | X | X |
| IR85 | - | - | - | - | X | - |
| IR92 | - | - | - | - | X | - |
| IR114 | - | - | - | - | X | - |
| IR127 | - | - | - | - | X | - |

Table 1: Table showing the different combinations of samples, enzymes, and multiplexing variants used in PCR3 for the introduction of the barcode indices.

Concentration Measurements and Cleaning

Concentration measurements were performed on all 32-plexes after PCR3, using the Qubit dsDNA HS Assay Kit (Invitrogen, Life Technologies), followed by a cleaning step on an MBS (Magnetic Bead Separation) machine in order to remove fragments shorter than those meant for sequencing, such as loose primers and primer dimer constructs. This first concentration measurement served both as a quick way of verifying product from PCR3 and as a means of estimating the relative concentration of actual product by comparison with the cleaned samples. The Illumina CA Purification protocol [17] was used, diluting 20 µl of PCR3 product to 50 µl with elution buffer (EB). A concentration of 14% PEG was used in the precipitation buffer in order to achieve an appropriate size cut-off [17] [18]. The parameters entered into the Magnatrix OS were 50 µl sample volume, 20 µl magnetic beads, 100 µl Precipitation Buffer, 25 µl EB, and 10 minutes binding time, resulting in input volumes of 50 µl sample, 95 µl EB, 125 µl 14% PEG, 220 µl 80% EtOH, and 25 µl beads.

After MBS cleaning, a second Qubit concentration measurement was performed on all samples in order to estimate the actual product concentration, on which the pooling of samples for sequencing was subsequently based. The samples were also run on a Bioanalyzer (Agilent Technologies, 1000 kit) for a visual verification of the success of the cleaning step. The desired products are expected to be in the range of 600-700 bp, since the base fragment is around 550 bp, to which large adaptor sequences have been added.

Figure 42: Bioanalyzer results of all samples apart from the singleplex PlatinumTaq set ups, which were on a separate Bioanalyzer run. All samples are successfully cleaned, showing no short, unspecific products, and those run with PlatinumTaq clearly showing a peak at the expected size range of 600-700, while peaks are very small for those run with TagTaq.

Both assays indicated higher yields of specific product from the sample set-ups run with

PlatinumTaq than from those that were performed with TagTaq, and product above the

detection cut-off for all samples except one (Table 2). In the case of the four additional

individual samples, these were pooled into one sample prior to the second cleaning step.

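| Sample | Before cleaning, in assay [ng/ml] | Before cleaning, in sample [µg/ml] | After cleaning, in assay [ng/ml] | After cleaning, in sample [µg/ml] |
|---|---|---|---|---|
| IR119ST | 20.6 | 4.12 | 1.32 | 0.263 |
| IR119SP | 50.2 | 10.0 | 23.9 | 4.78 |
| IR119DT | 21.0 | 4.20 | 1.06 | 0.212 |
| IR119DP | 70.8 | 14.2 | 25.6 | 5.12 |
| IR119QT | 14.7 | 2.94 | <0.5* | - |
| IR119QP | 48.7 | 9.74 | 21.6 | 4.33 |
| IR126ST | 20.0 | 4.0 | 1.61 | 0.322 |
| IR126SP | 36.3 | 7.27 | 22.3 | 4.45 |
| IR126DT | 18.8 | 3.77 | 1.24 | 0.248 |
| IR126DP | 35.6 | 7.12 | 19.9 | 3.99 |
| IR126QT | 23.2 | 4.64 | 1.42 | 0.284 |
| IR126QP | 37.6 | 7.51 | 18.0 | 3.59 |
| IR85QT | 17.6 | 3.52 | (pooled) | (pooled) |
| IR92QT | 11.6 | 2.33 | (pooled) | (pooled) |
| IR114QT | 14.5 | 2.89 | (pooled) | (pooled) |
| IR127QT | 22.7 | 4.53 | (pooled) | (pooled) |
| Pool of IR85QT, IR92QT, IR114QT, IR127QT | - | - | 1.15 | 0.230 |

Table 2: Overview of the amount of PCR product in the different samples before and after cleaning using the Illumina CA Purification Protocol on the MBS [17].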
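The relationship between the "in assay" and "in sample" columns of Table 2 can be illustrated with a short calculation. The sketch below assumes a 200-fold dilution of the sample in the Qubit assay (for instance 1 µl of sample in a 200 µl assay volume); this factor is inferred from the ratio of the tabulated values and is not stated in the text, and the function names are purely illustrative.

```python
# Assumed dilution of sample in the Qubit assay (inferred from Table 2, not stated in the text).
DILUTION_FACTOR = 200


def sample_conc_ug_per_ml(assay_ng_per_ml, dilution=DILUTION_FACTOR):
    """Convert an in-assay reading [ng/ml] to the concentration in the sample [µg/ml]."""
    return assay_ng_per_ml * dilution / 1000.0


def recovery_percent(before_ug_per_ml, after_ug_per_ml):
    """Share of the DNA measured before cleaning that is still measured afterwards, in percent."""
    return 100.0 * after_ug_per_ml / before_ug_per_ml


# Values for IR119SP (PlatinumTaq) and IR119ST (TagTaq), copied from Table 2.
print(sample_conc_ug_per_ml(50.2))
print(recovery_percent(10.0, 4.78))
print(recovery_percent(4.12, 0.263))
```

Under this assumption, roughly 48% of the DNA measured in the PlatinumTaq singleplex of IR119 remains after cleaning, against roughly 6% for the corresponding TagTaq set-up, which is in line with the higher yield of specific product reported for PlatinumTaq.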

Initial Sequencing

All 32-plexes were then pooled for sequencing on the MiSeq, using the MiSeq Reagent Kit V2, 300 cycles (Illumina), i.e. paired-end sequencing of 150 bases from each end. After demultiplexing, the results were analysed; they are shown in abbreviated form in Tables 3 and 4, and Table 5 provides the colour coding key used to highlight the different magnitudes of reads. The numbers of reads were generally lower than anticipated and the results were not completely conclusive, but they nevertheless showed clear trends in successful amplification and sequencing. The main findings were that sample set-ups where PlatinumTaq had been used for PCR3 had overall generated larger numbers of reads than those performed with TagTaq, and that the inner fragments corresponding to outer fragments 1, 2, and 6 (inner fragments 1 through 8, and 21 through 24) had yielded far fewer reads than the remaining ones. The latter trend was particularly noticeable among the first 8 inner fragments, with only two out of 16 deviating from the pattern, while the pattern for fragments 21 through 24 was not as pervasive.

| Fragment | 85 | 92 | 114 | 127 | 119DP | 119DT | 119QP | 119QT | 119SP | 119ST |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 392 | 332 | 1 | 10 | 1191 | 861 | 3763 | 691 | 1551 | 1382 |
| 2 | 4 | 1 | 3 | 1 | 2190 | 1892 | 18 | 8 | 360 | 182 |
| 3 | 21 | 1 | 3 | 1 | 172 | 101 | 27 | 8 | 711 | 900 |
| 4 | 0 | 6 | 5 | 4 | 1565 | 410 | 240 | 9 | 49258 | 3885 |
| 5 | 13 | 9 | 7 | 9 | 411 | 199 | 297 | 16 | 6943 | 2992 |
| 6 | 1 | 2 | 0 | 0 | 159 | 90 | 57 | 8 | 26382 | 2707 |
| 7 | 7 | 6 | 2 | 5 | 34767 | 6293 | 361 | 15 | 63126 | 6568 |
| 8 | 1 | 0 | 0 | 0 | 211 | 69 | 97 | 3 | 37699 | 1718 |
| 9 | 2125 | 1157 | 1146 | 1158 | 133270 | 12310 | 28138 | 1132 | 397 | 515 |
| 10 | 2136 | 1071 | 1538 | 1943 | 116761 | 31809 | 14731 | 1677 | 9387 | 13220 |
| 11 | 1683 | 1017 | 1725 | 1931 | 168542 | 17748 | 8367 | 351 | 38699 | 5969 |
| 12 | 2161 | 1215 | 266 | 5836 | 126426 | 24588 | 9050 | 629 | 57564 | 10430 |
| 13 | 967 | 406 | 90 | 531 | 104270 | 7243 | 51632 | 819 | 31637 | 2396 |
| 14 | 435 | 144 | 114 | 394 | 104642 | 6709 | 63538 | 1390 | 72882 | 3827 |
| 15 | 502 | 144 | 58 | 122 | 29198 | 8296 | 1073 | 335 | 74150 | 7058 |
| 16 | 192 | 72 | 13 | 3011 | 75913 | 9907 | 19411 | 973 | 82509 | 16184 |
| 17 | 2326 | 1409 | 2517 | 2885 | 59213 | 9038 | 270002 | 6629 | 706 | 1394 |
| 18 | 3349 | 2002 | 2754 | 2442 | 105531 | 10036 | 316883 | 9134 | 73157 | 5234 |
| 19 | 3322 | 1830 | 1564 | 1680 | 136755 | 12378 | 437807 | 11373 | 110441 | 7505 |
| 20 | 3386 | 1705 | 52 | 561 | 126721 | 9141 | 390608 | 8600 | 99596 | 5159 |
| 21 | 1477 | 612 | 13 | 130 | 1427 | 2868 | 4268 | 528 | 139331 | 10548 |
| 22 | 10 | 2 | 8 | 6 | 2243 | 993 | 3006 | 133 | 71586 | 12202 |
| 23 | 10 | 5 | 1 | 5 | 734 | 591 | 548 | 95 | 70299 | 6355 |
| 24 | 2 | 1 | 2 | 9 | 8859 | 6700 | 2139 | 107 | 40107 | 19496 |
| 25 | 424 | 193 | 260 | 231 | 4194 | 1639 | 19755 | 1307 | 73767 | 6168 |
| 26 | 395 | 199 | 42 | 42 | 8142 | 3634 | 25041 | 1888 | 155734 | 18440 |
| 27 | 188 | 86 | 128 | 72 | 6825 | 1478 | 33161 | 749 | 32533 | 3727 |
| 28 | 74 | 36 | 19 | 48 | 1698 | 574 | 6172 | 237 | 221553 | 12900 |
| 29 | 5145 | 2708 | 4085 | 4350 | 101017 | 9688 | 257360 | 7500 | 198695 | 19424 |
| 30 | 8141 | 4205 | 2521 | 4991 | 130392 | 19357 | 274886 | 9148 | 906 | 4091 |
| 31 | 2772 | 1551 | 1587 | 1965 | 71028 | 13996 | 179512 | 5725 | 41943 | 3070 |
| 32 | 4579 | 2926 | 743 | 753 | 66604 | 29799 | 228790 | 7520 | 78606 | 12442 |

Table 3: Table showing the number of reads obtained for each fragment from the additionally prepared TagTaq quadruplexes 85, 92, 114, 127, and all six TagTaq and PlatinumTaq set-ups for sample 119. The colour of the background, in a scale from red to yellow to green, serves to highlight the differences in the number of reads obtained for each fragment, with one colour for each order of magnitude, with the exception of dark red, signifying a frequency of 0, detailed in table 5.
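The grouping of low-yielding fragments by outer fragment can be checked directly against Table 3. The following is a minimal sketch, assuming that outer fragment k contains inner fragments 4k-3 through 4k (as implied by the statement that outer fragments 1, 2, and 6 correspond to inner fragments 1 through 8 and 21 through 24); the function names are illustrative, and only the 119QT column is used.

```python
# Reads per inner fragment (1-32) for set-up 119QT, copied from Table 3.
READS_119QT = [
    691, 8, 8, 9, 16, 8, 15, 3,
    1132, 1677, 351, 629, 819, 1390, 335, 973,
    6629, 9134, 11373, 8600, 528, 133, 95, 107,
    1307, 1888, 749, 237, 7500, 9148, 5725, 7520,
]


def outer_fragment(inner):
    """Outer fragment (1-8) assumed to contain inner fragment `inner` (1-32)."""
    return (inner - 1) // 4 + 1


def reads_per_outer(reads):
    """Sum the reads of the four inner fragments belonging to each outer fragment."""
    totals = {outer: 0 for outer in range(1, 9)}
    for inner, n in enumerate(reads, start=1):
        totals[outer_fragment(inner)] += n
    return totals


totals = reads_per_outer(READS_119QT)
# The three outer fragments with the fewest reads, in ascending numerical order.
lowest_three = sorted(sorted(totals, key=totals.get)[:3])
print(totals)
print(lowest_three)
```

For this set-up, the three outer fragments with the fewest summed reads are 1, 2, and 6, matching the trend described in the text.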

| Fragment | 126DP | 126DT | 126QP | 126QT | 126SP | 126ST |
|---|---|---|---|---|---|---|
| 1 | 74 | 87 | 8 | 200 | 497 | 447 |
| 2 | 76 | 35 | 15 | 63 | 69 | 96 |
| 3 | 34 | 16 | 19 | 71 | 401 | 568 |
| 4 | 5336 | 651 | 984 | 496 | 191 | 202 |
| 5 | 120 | 152 | 183 | 265 | 111 | 22 |
| 6 | 87 | 18 | 53 | 67 | 435 | 479 |
| 7 | 56622 | 3217 | 855 | 154 | 3583 | 4222 |
| 8 | 271 | 70 | 109 | 62 | 20365 | 1590 |
| 9 | 121469 | 12941 | 5479 | 3597 | 395 | 161 |
| 10 | 199811 | 24356 | 10993 | 9149 | 483 | 114 |
| 11 | 147953 | 23749 | 13682 | 7657 | 81658 | 10882 |
| 12 | 311537 | 59062 | 71196 | 31221 | 82312 | 11919 |
| 13 | 245150 | 20750 | 40067 | 14957 | 35829 | 2558 |
| 14 | 152221 | 13621 | 64102 | 10398 | 106776 | 8142 |
| 15 | 68266 | 12994 | 1838 | 18021 | 80154 | 6856 |
| 16 | 124980 | 11135 | 87470 | 8458 | 80434 | 16867 |
| 17 | 471 | 4529 | 226507 | 34521 | 58965 | 5959 |
| 18 | 133148 | 14492 | 255859 | 45269 | 19840 | 2061 |
| 19 | 3427 | 1402 | 183958 | 28863 | 100110 | 9264 |
| 20 | 1083 | 181 | 85691 | 9098 | 96191 | 10227 |
| 21 | 2996 | 970 | 4545 | 11022 | 167713 | 16844 |
| 22 | 1451 | 510 | 2753 | 2892 | 134284 | 19105 |
| 23 | 381 | 561 | 188 | 1925 | 32671 | 4575 |
| 24 | 23872 | 2099 | 14601 | 1511 | 69398 | 12497 |
| 25 | 372 | 694 | 1991 | 7647 | 43532 | 7101 |
| 26 | 10983 | 1363 | 19320 | 3233 | 129972 | 15561 |
| 27 | 1008 | 943 | 75301 | 16132 | 132979 | 9330 |
| 28 | 832 | 285 | 38164 | 12357 | 117058 | 10674 |
| 29 | 157249 | 19281 | 223762 | 46346 | 250894 | 30530 |
| 30 | 132277 | 26932 | 258884 | 78112 | 118590 | 12603 |
| 31 | 128804 | 27254 | 212690 | 39436 | 86354 | 6759 |
| 32 | 22988 | 12172 | 13393 | 33537 | 94675 | 39913 |

Table 4: Table showing the number of reads obtained for each fragment from all six TagTaq and PlatinumTaq set-ups for sample 126. The colour of the background, in a scale from red to yellow to green, serves to highlight the differences in the number of reads obtained for each fragment, with one colour for each order of magnitude, detailed in table 5.

| Colour Coding Key | Number of reads |
|---|---|
| | n = 0 |
| | 1 < n < 10 |
| | 10 < n < 100 |
| | 100 < n < 1'000 |
| | 1'000 < n < 10'000 |
| | 10'000 < n < 100'000 |
| | 100'000 < n < 1'000'000 |

Table 5: Table showing the colours used to label the reads, and their corresponding number of reads.

Results Obtained Post-Project

Subsequent PCR and MiSeq runs, performed by the research group during the writing of the report, have yielded additional insight into the workings of the primers, PCR set-ups, and sequencing. A heavily modified set of relative concentrations for both inner and outer primers, shown in tables 6 and 7, was found to result in a very even level of reads in sequencing, shown in table 8, with a maximum 6.1-fold difference between the fragments with the highest and lowest number of reads when counting the two high outliers (4.2-fold if discounting the outliers).

| Outer Primer | Concentration in PCR1 [µM] |
|---|---|
| 1 | 0.5 |
| 2 | 0.4 |
| 3 | 0.08 |
| 4 | 0.04 |
| 5 | 0.04 |
| 6 | 0.4 |
| 7 | 0.15 |
| 8 | 0.035 |

Table 6: Table showing the modified concentrations of the outer primers. Rather than equal concentrations for each pair of primers, the concentrations now vary significantly.

 

Concentrations of Primers in Modified Inner Quadruplexes

| Q1 | c [µM] | Q2 | c [µM] | Q3 | c [µM] | Q4 | c [µM] |
|---|---|---|---|---|---|---|---|
| 1 | 0.07 | 2 | 0.1 | 3 | 0.2 | 4 | 0.15 |
| 9 | 0.1 | 10 | 0.1 | 11 | 0.15 | 12 | 0.08 |
| 17 | 0.1 | 18 | 0.1 | 19 | 0.18 | 20 | 0.1 |
| 25 | 0.07 | 26 | 0.12 | 27 | 0.07 | 28 | 0.15 |

| Q5* | c [µM] | Q6 | c [µM] | Q7* | c [µM] | Q8 | c [µM] |
|---|---|---|---|---|---|---|---|
| 7 | 0.1 | 6 | 0.25 | 5 | 0.4 | 8 | 0.2 |
| 13 | 0.3 | 14 | 0.5 | 15 | 0.8 | 16 | 0.25 |
| 21 | 0.2 | 22 | 0.3 | 23 | 0.6 | 24 | 0.15 |
| 29 | 0.1 | 30 | 0.35 | | | 32 | 0.3 |
| 31 | 0.2 | | | | | | |

Table 7: Table showing the modified quadruplexes and related concentrations of the inner primers. Rather than equal concentrations for each pair of primers, the concentrations vary significantly, and the primers included in quadruplexes 5 and 7 have been changed: inner primers 7 and 31 have been moved to Q5* and primer 5 to Q7*, making them a quintuplex and a triplex, respectively.
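The modified layout in Table 7 can be written down as a small data structure, which makes it easy to confirm that every inner primer pair is still used exactly once after the regrouping. This is only an illustrative sketch; the dictionary and function names are not part of the original protocol.

```python
# Modified inner multiplexes from Table 7:
# {multiplex name: {inner primer pair: concentration in µM}}.
MODIFIED_INNER = {
    "Q1": {1: 0.07, 9: 0.1, 17: 0.1, 25: 0.07},
    "Q2": {2: 0.1, 10: 0.1, 18: 0.1, 26: 0.12},
    "Q3": {3: 0.2, 11: 0.15, 19: 0.18, 27: 0.07},
    "Q4": {4: 0.15, 12: 0.08, 20: 0.1, 28: 0.15},
    "Q5*": {7: 0.1, 13: 0.3, 21: 0.2, 29: 0.1, 31: 0.2},
    "Q6": {6: 0.25, 14: 0.5, 22: 0.3, 30: 0.35},
    "Q7*": {5: 0.4, 15: 0.8, 23: 0.6},
    "Q8": {8: 0.2, 16: 0.25, 24: 0.15, 32: 0.3},
}


def covers_all_primers(layout, n_primers=32):
    """Return True if each primer pair 1..n_primers occurs in exactly one multiplex."""
    seen = [primer for primers in layout.values() for primer in primers]
    return sorted(seen) == list(range(1, n_primers + 1))


# Number of primer pairs in each multiplex.
sizes = {name: len(primers) for name, primers in MODIFIED_INNER.items()}
print(covers_all_primers(MODIFIED_INNER))
print(sizes)
```

The check confirms that all 32 primer pairs are present, with Q5* holding five pairs and Q7* three.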

   

 

 

 

Sample C1V2.7

| Fragment | % | Reads |
|---|---|---|
| 1 | 2.32 % | 24103 |
| 2 | 1.39 % | 14484 |
| 3 | 3.85 % | 40023 |
| 4 | 5.32 % | 55317 |
| 5 | 2.54 % | 26366 |
| 6 | 4.03 % | 41859 |
| 7 | 2.44 % | 25403 |
| 8 | 1.08 % | 11207 |
| 9 | 3.66 % | 37995 |
| 10 | 4.68 % | 48609 |
| 11 | 4.67 % | 48503 |
| 12 | 1.92 % | 19993 |
| 13 | 1.87 % | 19398 |
| 14 | 1.43 % | 14910 |
| 15 | 1.34 % | 13876 |
| 16 | 2.06 % | 21401 |
| 17 | 3.46 % | 35993 |
| 18 | 4.50 % | 46735 |
| 19 | 4.37 % | 45458 |
| 20 | 6.58 % | 68381 |
| 21 | 3.15 % | 32769 |
| 22 | 1.55 % | 16121 |
| 23 | 1.74 % | 18135 |
| 24 | 2.92 % | 30357 |
| 25 | 2.19 % | 22806 |
| 26 | 4.40 % | 45703 |
| 27 | 3.61 % | 37564 |
| 28 | 4.11 % | 42677 |
| 29 | 4.19 % | 43558 |
| 30 | 2.90 % | 30190 |
| 31 | 2.89 % | 30011 |
| 32 | 2.83 % | 29465 |

Table 8: Table showing the result of a MiSeq sequencing run on samples generated using the modified concentrations shown in tables 6 and 7. As before, the colour key in table 5 was used to label the numbers of reads, and the percentage of total reads is here graded in blue for clarity.
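The evenness figure quoted for this run can be reproduced from the read counts in Table 8. The sketch below computes the ratio between the highest and lowest read count and the share of total reads per fragment; the function names are illustrative.

```python
# Reads per fragment (1-32) for sample C1V2.7, copied from Table 8.
READS = [
    24103, 14484, 40023, 55317, 26366, 41859, 25403, 11207,
    37995, 48609, 48503, 19993, 19398, 14910, 13876, 21401,
    35993, 46735, 45458, 68381, 32769, 16121, 18135, 30357,
    22806, 45703, 37564, 42677, 43558, 30190, 30011, 29465,
]


def fold_difference(reads):
    """Ratio between the highest and the lowest read count."""
    return max(reads) / min(reads)


def percent_of_total(reads):
    """Share of the total number of reads for each fragment, in percent."""
    total = sum(reads)
    return [100.0 * n / total for n in reads]


print(round(fold_difference(READS), 1))
print([round(p, 2) for p in percent_of_total(READS)])
```

With all fragments included, the ratio between fragment 20 (highest) and fragment 8 (lowest) rounds to the 6.1-fold difference stated in the text.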

Discussion

Primer Layout and Design

The first, and one of the more challenging, parts of the project was the initial primer design. Compounded by the lack of previous experience in primer design and by the constraints imposed by the template DNA itself, the process took significant time, effort, and compromise to finalise. The first attempt at aligning 32 pairs of primers so that they covered the entirety of the 16727 bp mtDNA failed to reach all the way around, and the second attempt therefore required further compromises and careful choices between possible primer pairs at every step. That is not to say that the final set of primers is necessarily inferior in quality; rather, different choices early on in the primer selection process, informed by the shortcomings of the previous attempt, allowed for a tighter alignment of primer pairs, thereby covering more ground. It was quite evident, when performing the second primer alignment, that actively choosing a few of the longer-reaching primer pairs early on shifted the entire alignment into a more favourable frame, where most forward inner primers could be placed closer to the preceding reverse inner primer than in the previous attempt. This ultimately led to more freedom in aligning problematic primer pairs, because more pairs overall met or exceeded the average required length.

Most inner primer pairs have proven to successfully amplify their intended target of the expected size, with a few significant exceptions, including the failure of the originally synthesised primer pair 8, as well as the fact that the product of primer pair 2 occasionally appeared much lower on the gels than expected, suggesting that a shorter sequence than intended had been amplified. This is, however, to be expected, as inner primer pair 2 covers the repeating region, which varies in size between individuals. Apart from these specific events, primers often resulted in shorter non-specific products, suspected to be primer dimers, sometimes in worryingly large quantities in comparison to the specific product, and there was a significant inconsistency in the successful amplification of individual sequences. Certain primer pairs were more prone to this latter behaviour than others, but there was, in the end, evidence that each primer pair had functioned, although never all 32 on a single occasion. The suspected primer dimer products were also very successfully removed by the cleaning step employed later in the process.

The 32 inner primer set up does seem to be successful, albeit after several modifications to

relative concentrations.

There is some concern regarding the amplification of the control region, which is a very informative region of the mtDNA that contains a large number of single nucleotide polymorphisms. It would be preferable to cover this region in one single amplified segment, but currently it spans the last third of the segment amplified by primer pair 32, continues through segments 1 and 2, and ends at the very beginning of segment 3. It is, however, not possible to fully cover the control region in a single amplified segment using the chosen sequencing platform, as it measures 1270 bp in the reference genome [19], which is far beyond the reach of the 600 bp covered by the Illumina Paired End Sequencing protocol. Even the hypervariable region 1 (HV1) alone, which contains the majority of these informative sites and measures 582 bp in itself [20], would have exceeded the upper limit of 590 bp once primers for its specific amplification had been added.
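The length argument above amounts to a simple sum. As a rough sketch, assuming primers of 20 nucleotides flanking the region on each side (the primer length is a hypothetical value chosen for illustration, not taken from the thesis), the check can be written as:

```python
MAX_AMPLICON_BP = 590      # upper limit stated in the text
CONTROL_REGION_BP = 1270   # length in the reference genome [19]
HV1_BP = 582               # length of hypervariable region 1 [20]
ASSUMED_PRIMER_BP = 20     # hypothetical primer length, for illustration only


def amplicon_length(target_bp, primer_bp=ASSUMED_PRIMER_BP):
    """Length of an amplicon whose two primers flank the target region."""
    return target_bp + 2 * primer_bp


def fits_in_one_amplicon(target_bp, primer_bp=ASSUMED_PRIMER_BP, limit=MAX_AMPLICON_BP):
    """True if the target plus both flanking primers stays within the length limit."""
    return amplicon_length(target_bp, primer_bp) <= limit


print(amplicon_length(HV1_BP), fits_in_one_amplicon(HV1_BP))
print(amplicon_length(CONTROL_REGION_BP), fits_in_one_amplicon(CONTROL_REGION_BP))
```

Since HV1 leaves only 8 bp of headroom below the limit, any realistic pair of flanking primers pushes the amplicon over 590 bp.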

PCR Procedures

Although not to the same extent as with primer design, previous experience with setting up PCR reactions, particularly at this scale, was quite limited. There was significant trial and error involved in finding working protocols, and especially early on, it was hard to distinguish human error from sub-optimal protocols when results were poor. As the project proceeded, however, more experience was obtained, both theoretically and practically, and it became easier to both perform and evaluate the PCR runs. At the close of the laboratory part of the project, many of the protocols were in all likelihood less than optimal, and can probably be improved upon at a later date.

As it stood then, quadruplexes of the outer primers appeared to be working, as evidenced by the subsequent successful singleplex amplification of inner primers from the diluted quadruplex reaction mixture, despite the fact that singleplexes of the outer primers themselves had ostensibly shown this not to be the case. These odd results were later elucidated by experiments performed post-project, which revealed that highly unequal concentrations of both inner and outer primers were required to ultimately yield even numbers of reads in sequencing.

Outer primers, when run in singleplex from earlier quadruplexes using equal concentrations, almost invariably yielded product for only primers 3, 4, 5, and 8. In light of the concentrations established by later experiments, where these previously successful outer primers were used at concentrations of around a tenth of those used for the previously unsuccessful ones, these original results hardly seem surprising. Given the exponential nature of PCR amplification, this difference in concentrations is highly significant. The same pattern can be seen for the inner primers.
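Both points can be made concrete with the values in Table 6. The sketch below compares the mean rebalanced concentration of the outer primers that were not visible at equal concentrations (1, 2, and 6; primer 7 was only faintly visible and is left out) with that of the visible ones (3, 4, 5, and 8), and then uses the textbook model of PCR, in which product grows as (1 + E)^n for a per-cycle efficiency E, to show how a modest efficiency difference compounds. The efficiencies 0.9 and 0.7 are hypothetical values chosen for illustration, not measurements from this work.

```python
# Outer primer concentrations in PCR1 [µM] after rebalancing, copied from Table 6.
OUTER_CONC_UM = {1: 0.5, 2: 0.4, 3: 0.08, 4: 0.04, 5: 0.04, 6: 0.4, 7: 0.15, 8: 0.035}

VISIBLE = (3, 4, 5, 8)   # yielded product in singleplex at equal concentrations
NOT_VISIBLE = (1, 2, 6)  # did not; primer 7 (faintly visible) is left out


def mean_conc(primers):
    """Mean rebalanced concentration [µM] of the given outer primers."""
    return sum(OUTER_CONC_UM[p] for p in primers) / len(primers)


def relative_yield(eff_a, eff_b, cycles):
    """Yield ratio after `cycles` cycles for per-cycle efficiencies eff_a and eff_b (0-1)."""
    return ((1 + eff_a) / (1 + eff_b)) ** cycles


ratio = mean_conc(NOT_VISIBLE) / mean_conc(VISIBLE)
print(round(ratio, 1))
# Hypothetical efficiencies of 0.9 and 0.7 over the 30 cycles used for the quadruplex.
print(round(relative_yield(0.9, 0.7, 30), 1))
```

The weaker primers end up at roughly nine times the concentration of the stronger ones, consistent with the "around a tenth" stated above.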

Indexing, Cleaning, and Sequencing

The incorporation of index primers into the sequencing construct seems to be overall

successful, but appears to vary significantly based primarily on the enzyme employed. This is

evident from the fact that the concentration of product post-purification, i.e. the specific

product in the correct size range, is significantly higher in the reactions that used PlatinumTaq

than the ones that used TagTaq.

The cleaning itself also appears successful, judging by the difference in the amount of product

pre- and post-cleaning, and the lack of low length sequences present post-cleaning. The

Bioanalyzer corroborates these results, showing samples clear of low length, unspecific

products, primer dimers, or loose primers, while showing peaks in the expected size range for

the desired, complete sequencing constructs.

The first attempt at sequencing was moderately successful, showing reads from most of the 32

different fragments for all sequenced individuals, while also displaying indications of the

expected issues with this original 32-fragment approach. There was far from an equal

representation of each primer fragment, and there were also clear indications that the relative

success of the outer primer from which an individual inner primer is derived heavily

influences the number of reads obtained for the inner primer in question. The latter conclusion

is derived from the pattern of lower yielding inner primers, which occur in groups of four that

correspond to the outer primers whereas they do not correspond to the physical inner

quadruplexes in which they were amplified. The samples that had the indices incorporated with PlatinumTaq had an overall higher number of reads than those that used TagTaq, but generally show the same patterns and variation between inner primer fragments.
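The reasoning about groups of four can be illustrated by mapping each inner fragment both to its outer fragment and to its original inner quadruplex. The sketch assumes that outer fragment k contains inner fragments 4k-3 through 4k, and that the original quadruplex Qj contained inner primers j, j+8, j+16, and j+24 (consistent with the unmodified groups in table 7); both mappings are inferred from the text and tables rather than quoted from the protocol.

```python
def outer_of(inner):
    """Outer fragment (1-8) assumed to contain inner fragment `inner` (1-32)."""
    return (inner - 1) // 4 + 1


def quadruplex_of(inner):
    """Original inner quadruplex (1-8, i.e. Q1-Q8) assumed to contain `inner`."""
    return (inner - 1) % 8 + 1


# Low-yielding inner fragments named in the results: 1 through 8 and 21 through 24.
LOW_YIELD = list(range(1, 9)) + list(range(21, 25))

outers_hit = sorted({outer_of(i) for i in LOW_YIELD})
quadruplexes_hit = sorted({quadruplex_of(i) for i in LOW_YIELD})
print(outers_hit)
print(quadruplexes_hit)
```

The low-yielding fragments fall into just three outer fragments but are spread over all eight inner quadruplexes, which is why the outer amplification, rather than the inner multiplexing, is the more likely cause.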

Using the results obtained post-project, it is clearly evident that mainly the outer primers were

very uneven in their comparative efficiency, but that drastically changing their relative

concentrations to compensate for this resulted in very favourable results.

Conclusions

While the initial sequencing run was not by any means perfect, the method as a whole shows promise. All outer primers appear functional, as do all inner primers, although some variation has been observed between individual experiments. All individual inner primers have been shown to yield product in different experiments, but never all at the same time, at least not to a degree readily visible on agarose gels. While the outer primers could not be proven functional by re-amplifying them in singleplex from the outer primer quadruplexes themselves, the subsequent successful inner primer amplification from outer primer quadruplexes shows that all outer primers must have amplified enough to provide adequate template for the inner primers. The cleaning on the MBS using the Illumina CA Purification protocol works well and does not seem to need any modifications, and the incorporation of indices works well, especially when using PlatinumTaq. The sequencing itself indicates that the amount of indexed sequencing constructs provided by the TagTaq enzyme was in fact enough to provide a substantial number of reads, although the numbers obtained with PlatinumTaq were generally greater by one order of magnitude.

The challenges yet to be overcome after the conclusion of the practical part of the project

were primarily those of levelling out the highly varying read numbers between the different

fragments. Since these appeared to correlate significantly with the outer primers from which

they were amplified, a solution seemed to be to alter the relative concentrations of the outer

primers in the initial outer quadruplexes, or in some other manner manipulate the relative

abundance of the outer primer products. This, in conjunction with slight adjustments to the

compositions and the relative concentrations of the primers in the inner quadruplexes, has

now, as described in the post-project results, been shown to drastically improve the ratio

between the number of reads obtained for the 32 different fragments. Based on these results,

the method appears very promising.

References

1. Savolainen P., Zhang Y-P., Luo J., Lundeberg J., and Leitner T. (2002)

Genetic Evidence for an East Asian Origin of Domestic Dogs

SCIENCE Vol. 298:1610-1613

2. Ding Z-L., Oskarsson M., Ardalan A., Angleby H., Dahlgren L-G., Tepeli C.,

Kirkness E., Savolainen P., and Zhang Y-P. (2011)

Origins of domestic dog in Southern East Asia is supported by analysis of Y-

chromosome DNA

Heredity (2011), 1–8

3. Pang J-F., Kluetsch C., Zou X-J., Zhang A-B., Luo L-Y., Angleby H., Ardalan A.,

Ekström C., Sköllermo A., Lundeberg J., Matsumura S., Leitner T., Zhang Y-P., and

Savolainen P. (2009)

mtDNA Data Indicate a Single Origin for Dogs South of Yangtze River, Less Than

16,300 Years Ago, from Numerous Wolves

Mol. Biol. Evol. 26(12): 2849–2864

4. van Asch B., Zhang A-B., Oskarsson M. C. R., Klütsch C. F. C., Amorim A., and

Savolainen P. (2013)

Pre-Columbian origins of Native American dog breeds, with only limited replacement

by European dogs, confirmed by mtDNA analysis

Proc R Soc B 280: 20131142

5. Oskarsson M. C. R., Klütsch C. F. C., Boonyaprakob U., Wilton A., Tanabe Y., and

Savolainen P. (2011)

Mitochondrial DNA data indicate an introduction through Mainland Southeast Asia

for Australian dingoes and Polynesian domestic dogs

Proc. R. Soc. B

DOI: 10.1098/rspb.2011.1395

6. Shapiro B., Cui P., Schuenemann V. J., Sawyer S. K., Greenfield D. L., Germonpré

M. B., Sablin M. V., López-Giráldez F., Domingo-Roura X., Napierala H., Uerpmann

H-P., Loponte D. M., Acosta A. A., Giemsch L., Schmitz R. W., Worthington B.,

Buikstra J. E., Druzhkova A., Graphodatsky A. S., Ovodov N. D., Wahlberg N.,

Freedman A. H., Schweizer R. M., Koepfli K-P., Leonard J. A., Meyer M., Krause J.,

Pääbo S., Green R. E., Wayne R. K. (2013)

Complete Mitochondrial Genomes of Ancient Canids Suggest a European Origin of

Domestic Dogs

SCIENCE Vol. 342:871-874

7. von Holdt B. M., Pollinger J. P., Lohmueller K. E., Han E., Parker H. G., Quignon P.,

Degenhardt J. D., Boyko A. R., Earl D. A., Auton A., Reynolds A., Bryc K., Brisbin

A., Knowles J. C., Mosher D. S., Spady T. C., Elkahloun A., Geffen E., Pilot M.,

Jedrzejewski W., Greco C, Randi E., Bannasch D., Wilton A., Shearman J., Musiani

M., Cargill M., Jones P. G., Qian Z., Huang W., Ding Z-L, Zhang Y-P., Bustamante

C. D., Ostrander E. A., Novembre J., and Wayne R. K. (2010)

Genome-wide SNP and haplotype analyses reveal a rich history underlying dog

domestication

Nature 464, 898-902

8. Ringo J. (2004)

Fundamental Genetics

9. DNA Sequencing with Solexa® Technology (2007)

Illumina, Pub. No. 770-2007-002 01May07

10. Illumina Sequencing Technology Highest data accuracy, simple workflow, and a

broad range of applications (2010)

Illumina, Pub. No. 770-2007-002 Current as of 11 October 2010

11. Sequencing Dual-Indexed Libraries on the HiSeq® System User Guide

ILLUMINA PROPRIETARY Part # 15032071 Rev. B July 2012

12. Fuller, C. W., Middendorf L. R., Benner S. A., Church G. M., Harris T., Huang X.,

Jovanovich S. B., Nelson J. R., Schloss J. A., Schwartz D. C., and Vezenov D. V.

(2009)

The challenges of sequencing by synthesis

Nature Biotechnology 27(11): 1013-1023

13. GenBank: U96639.2, Mitochondrial reference genome of the domestic canine on the

NCBI Database

http://www.ncbi.nlm.nih.gov/nuccore/U96639

14. Savolainen P., Arvestad L., and Lundeberg J. (2000)

mtDNA Tandem Repeats in Domestic Dogs and Wolves: Mutation Mechanism

Studied by Analysis of the Sequence of Imperfect Repeats

Mol. Biol. Evol. 17(4): 474–488

15. Primer BLAST primer alignment tool on the NCBI database

http://www.ncbi.nlm.nih.gov/tools/primer-blast/

16. Invitrogen information sheet on Platinum Taq

https://www.lifetechnologies.com/content/dam/LifeTech/migration/files/pcr/pdfs.par.26652.file.dat/platinumtaq-pps.pdf

17. Protocol for Illumina CA Purification

https://github.com/EnvGen/LabProtocols/blob/master/CA_cleaning.pdf

18. Lundin S., Stranneheim H., Pettersson E., Klevebring D., Lundeberg J. (2010)

Increased Throughput by Parallelization of Library Preparation for Massive

Sequencing.

PLoS ONE 5(4): e10029.

DOI:10.1371/journal.pone.0010029

19. Gundry R. L., Allard M. W., Moretti T. R., Honeycutt R. L., Wilson M. R., Monson

K. L., and Foran D. R. (2007)

Mitochondrial DNA Analysis of the Domestic Dog: Control Region Variation Within

and Among Breeds

DOI: 10.1111/j.1556-4029.2007.00425.x

20. Imes D. L., Wictum E. J., Allard M. W., Sacks B. N. (2012)

Identification of single nucleotide polymorphisms within the mtDNA genome of the

domestic dog to discriminate individuals with common HVI haplotypes

DOI: 10.1016/j.fsigen.2012.02.004

21. Natanaelsson C., Oskarsson MC., Angleby H., Lundeberg J., Kirkness E., Savolainen

P. (2006).

Dog Y chromosomal DNA sequence: identification, sequencing and SNP discovery.

BMC Genet 7: 45.

22. AlbaNova University Center

School of Biotechnology of the Royal Institute of Technology (KTH)

23. Meyer M., Stenzel U., Myles S., Prüfer K., and Hofreiter M. (2007)

Targeted high-throughput sequencing of tagged nucleic acid samples

DOI: 10.1093/nar/gkm566

24. Gunnarsdóttir E. D., Li M., Bauchet M., Finstermeier K., and Stoneking M. (2011)

High-throughput sequencing of complete human mtDNA genomes from the

Philippines

DOI: 10.1101/gr.107615.110

25. Maricic T., Whitten M., and Pääbo S. (2010)

Multiplexed DNA Sequence Capture of Mitochondrial Genomes Using PCR Products

DOI: 10.1371/journal.pone.0014004

26. Haff L. A. (1994)

Improved quantitative PCR using nested primers

Genome Res. 3: 332-337

27. Illumina MiSeq Specifications

http://www.illumina.com/systems/miseq/performance_specifications.html