MPSPlex - A large, massively parallel sequencing SNP panel for the identification of missing persons

Further development and optimisation

Felix Bittner

This project was performed at the International Commission on Missing Persons in the period 01.05.2018 - 10.12.2018.

Supervisor: Thomas Parsons, PhD, International Commission on Missing Persons
Examiner: Prof. Ate Kloosterman, PhD, University of Amsterdam
Date of submission: 10.12.2018
Student number: 11389001
Number of EC: 36

Master of Forensic Science
Institute for Interdisciplinary Studies
University of Amsterdam




Abstract

Missing persons are often identified by short tandem repeat (STR) analysis. Because reliable antemortem samples are usually lacking, the DNA of unknown persons must be statistically compared to that of putative family members. When the DNA is severely degraded, locus dropout lowers the statistical power of the analysis. This alone can prevent identification, and the problem is compounded if only distant relatives are available. Single nucleotide polymorphisms (SNPs) can be typed with shorter amplicons and are thus less susceptible to degradation; however, they lack statistical power compared to STRs. To take advantage of this, ICMP is developing MPSplex: a platform-agnostic, massively parallel sequencing SNP panel with 1456 loci, designed for the identification of missing persons.

This study identified 490 SNP loci in MPSplex with indicators of poor performance and suggested primer redesign approaches for 414 of them. For 153 loci, primers could be redesigned, which led to an average 10x coverage success rate of 0.99 for these loci on high quality DNA. After 55 loci were discarded during manual review and 61 loci were excluded at analysis, MPSplex reached an average 10x coverage success rate of 0.95-0.99, depending on input. Finally, MPSplex's fitness for purpose was demonstrated by sequencing 4 degraded bone samples. Here, 10x coverage rates and concordance compared favourably to co-sequenced NA12878 control DNA.

These results establish MPSplex as a robust panel with over 1205 SNPs and 46 microhaplotypes for the identification of missing persons via degraded bone samples, even if only distant relatives are available. Further work should focus on robotic automation, optimisation and validation.


Acknowledgements

I want to thank Thomas Parsons and Ate Kloosterman for their guidance and extremely valuable supervision. Without a doubt, this project would not have happened without them.

My colleagues Michelle and Sejla have spent countless hours in the lab with me, have motivated me and made me feel at home. The same goes for the rest of the team at ICMP; I cannot thank them enough.

Finally, I want to thank Qiagen, who have provided advice, technical support, reagents and their expertise throughout this project.


Contents

1 Introduction
2 Materials and Methods
  2.1 Original MPSplex panel design
  2.2 MPSplex library preparation and sequencing protocol
    2.2.1 QIAseq Targeted DNA Panel principle
  2.3 Panel performance before redesign
  2.4 Analysis of poorly performing loci
    2.4.1 Identification of poorly performing loci
    2.4.2 Primer redesign recommendations
    2.4.3 Addition of ancestry informative SNPs
  2.5 Panel performance after redesign
    2.5.1 High quality and sheared DNA
    2.5.2 Bone samples
3 Results
  3.1 Panel performance before redesign
  3.2 Analysis of poorly performing loci
    3.2.1 Identification of poorly performing loci
    3.2.2 Primer redesign recommendations
    3.2.3 Addition of ancestry informative SNPs
  3.3 Panel performance after redesign
    3.3.1 High quality and sheared DNA
    3.3.2 Degraded bone samples
4 Discussion & Conclusion


List of Figures

2.1 Absolute MPSplex primer to SNP target distances.
2.2 MPSplex: Number of SNPs per chromosome.
2.3 QIAseq library preparation (QIAGEN, n.d.).
3.1 Boxplots of performance metrics used to identify poorly performing loci.
3.2 PCA plot of 55 ancestry informative SNPs included in MPSplex.
3.3 SNPs without previous performance problems (N=1052) with at least 10x coverage.


List of Tables

2.1 Sample setup for sequencing of high quality NA12878 and GIAB family trio DNA.
2.2 Primer redesign recommendation categories.
2.3 Sample setup for sequencing of high quality and sheared DNA after redesign.
2.4 Sample setup for sequencing of degraded bone samples.
3.1 Summary of primer redesign suggestions. Loci may have multiple categories; in total 414 loci were suggested for redesign.
3.2 10x coverage performance of NA12878 after primer redesign for 21 and 22 UPCR cycles and 1 and 10ng input DNA.
3.3 Average coverage of SNPs without previous performance problems (N=1052) for bones and NA12878.


1. Introduction

Mass fatalities such as the 2004 Indian Ocean tsunami, the war in former Yugoslavia, the 9/11 attacks or flight MH17 lead to the disappearance of many people. In the aftermath, forensic investigators face the challenge of identifying the missing. One of their tools of choice is DNA analysis; however, missing persons investigations pose challenges not normally encountered in criminal casework.

Reliable antemortem samples of the missing person are rarely available, and DNA identifications are typically made through the DNA of family members. Such cases, however, often fail to reach statistical thresholds, because the likelihood of sharing alleles with a family member decreases with the distance of the relationship, that is, with the number of meioses. Even when investigators combine different kits to analyse additional short tandem repeat (STR) markers, current STR technology fails to provide sufficient statistical power when, for example, only a first cousin or a half-sibling is available for comparison. On top of that, adverse environmental conditions such as radiation, enzymatic or bacterial activity and other chemical reactions highly degrade the available DNA (Alaeddini, Walsh & Abbas, 2010). This is compounded by the often considerable delay between the incident and the investigation. The result is low-quantity DNA samples that are highly fragmented (Jakubowska, Maciejewska & Pawłowski, 2011; Amory, Huel, Bilic, Loreille & Parsons, 2012). Increasingly, STR loci will fail to amplify, for example because their length exceeds the size of DNA fragments in the sample, or because their primer regions are damaged in other ways. With fewer amplified loci, the statistical power of the resulting STR profile decreases, which complicates, if not prevents, reliable identification.

In recent years, scientists have addressed the issue of DNA degradation by reducing the size of STR amplicons (Hill, Kline, Coble & Butler, 2007; Butler, Shen & McCord, 2003; Chung, Drabek, Opel, Butler & McCord, 2004; Drabek, Chung, Butler & McCord, 2004; Coble & Butler, 2005). This has led to the availability of so-called mini-STR panels, for example the MiniFiler by Thermo Fisher (Mulero et al., 2008), with amplicon sizes of just 71-250bp. This reduction in amplicon size has improved recovery of DNA profiles from degraded samples (Hughes-Stamm, Ashton & van Daal, 2010), and mini-STR panels are now routinely used. Nonetheless, casework studies with mini-STR panels show that a large number of the especially challenging cases (e.g. 11.25% of total samples (Parsons et al., 2007)) still fail to reach thresholds for identification. Significant further progress is not expected, because STR amplicon size is necessarily limited by the repeat unit structure and the availability of close primer-binding sites. STR technology is thus unlikely to resolve cases with DNA so highly degraded that mini-STRs fail to produce sufficient results.

An alternative to traditional STR markers are single nucleotide polymorphisms (SNPs). By definition they are single base pair changes and thus require only very short amplicons. To match the statistical power of a standard STR kit one would need 40-60 SNP loci (Butler, Coble & Vallone, 2007), and nowadays commercial SNP kits such as SNPforID accomplish exactly this (Musgrave-Brown et al., 2007). In contrast to panels used in the clinical world, however, these panels are rather small and lack the statistical power desired for distant kinship analysis.

In the clinical world, large panels of hundreds, and even thousands, of SNPs are possible thanks to massively parallel sequencing (MPS) technology. With this technology, countless fragments are spatially separated and immobilised on a flow cell or semiconductor chip. Fragments are then subjected to a sequencing-by-synthesis (SBS) reaction and the results are recorded for each position separately. Millions of reads are then aligned to a reference genome to infer their genomic location (Metzker, 2009). Multiple commercial systems, such as QIAseq [16] and MiSeq [17], offer highly standardised workflows designed for high-throughput clinical laboratories. Some of these technologies have been adopted by the forensic community, with multiple SNP-based assays for human identification now developed and validated. Early panels included around 140 SNPs (Grandell, Samara & Tillmar, 2016; de la Puente et al., 2017), but recent publications have expanded this to over 400 SNPs in a single panel (Mo et al., 2018). Even though these panels are promising for identification via direct reference samples, they do not provide enough SNP markers to achieve sufficient statistical power for distant kinship matching in missing persons and disaster victim identification (DVI) investigations.

For these reasons, the ICMP is currently developing MPSplex: a large, massively parallel sequencing SNP panel on QIAseq chemistry, specifically designed for the identification of missing persons. This approach has several advantages over the currently available options. Firstly, it uses single primer extension, which results in more robust sequencing and smaller targets. On top of that, unique molecular identifiers (UMIs) are incorporated at the beginning of the process, which later provide information on the number of unique molecules each allele observation is based on (Kivioja et al., 2011). Crucially, such information can then be used to establish thresholds for reliable allele observations. Finally, a panel designed with the QIAseq chemistry is also instrument agnostic and can run on MiSeq as well as GeneReader instruments, thereby maximising applicability.

The goal of this work was to identify SNP loci in MPSplex with poor performance, improve their performance by primer redesign and finally demonstrate MPSplex's fitness for obtaining robust sequencing results from degraded bone samples. To this end, high quality DNA samples were first sequenced and SNP loci evaluated on a number of performance metrics. Statistical outliers for those metrics were subjected to extensive manual review of sequencing context, primers and performance, after which primer redesign recommendations were made. Primers were then redesigned by Qiagen and the new panel was tested again on high quality DNA as well as on degraded bone samples.


2. Materials and Methods

2.1 Original MPSplex panel design

MPSplex currently holds 1291 autosomal SNPs, 32 X-chromosomal SNPs and 46 microhaplotypes, each targeted by one or more primers. 93% of primers are closer than 80bp to the desired target (see figure 2.1). Each chromosome, excluding the Y chromosome, is well covered, with 24-108 loci each (figure 2.2). The sites were previously selected from 3000 tri- and tetra-allelic 1000 Genomes sites, based on high heterozygosity, balance across populations and no close linkage to commonly used SNPs or STRs (unpublished data by Tillmar, A. O. and Phillips, C.).

Figure 2.1: Absolute MPSplex primer to SNP target distances.

2.2 MPSplex library preparation and sequencing protocol

MPSplex is based on the QIAseq™ Targeted DNA Panel with custom primers. All sequencing was performed on a GeneReader instrument with the GeneRead Clonal Amp Q and GeneRead Advanced Sequencing Q kits. Library preparation, clonal amplification and sequencing were all performed according to the manufacturer's guidelines (see QIAGEN, October 2017, January 2017 and December 2016, respectively). Where significant adjustments to the protocol were made, they are described with the specific experiments below.


Figure 2.2: MPSplex: Number of SNPs per chromosome.

2.2.1 QIAseq Targeted DNA Panel principle

During QIAseq Targeted DNA Panel library preparation (figure 2.3), samples are first enzymatically fragmented, end-repaired and A-tailed. A construct consisting of a binding site for the universal primer, a sample index and a molecular barcode, also called unique molecular index (UMI), is then ligated to the 5' end of the fragments. Target fragments are enriched by a number of PCR cycles with one gene-specific primer and a universal primer, complementary to the ligated forward primer. A subsequent universal PCR then amplifies the final library and adds further, platform-specific adaptors and additional sample indices.
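To illustrate why the molecular barcode matters downstream, the sketch below counts unique molecules per locus and allele instead of raw reads. It is a minimal illustration and not the QIAseq analysis pipeline: the `Read` record, the locus name and the exact-match comparison of UMIs (no correction of sequencing errors in the barcode) are assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical aligned read: target locus, UMI sequence and observed allele."""
    locus: str
    umi: str
    allele: str


def unique_molecule_counts(reads):
    """Count distinct UMIs per (locus, allele).

    PCR copies of one original fragment carry the same UMI, so they
    collapse into a single observation. UMIs are compared by exact match.
    """
    umis = defaultdict(set)
    for read in reads:
        umis[(read.locus, read.allele)].add(read.umi)
    return {key: len(values) for key, values in umis.items()}


# Hypothetical reads for one locus.
reads = [
    Read("locus_1", "ACGTACGTACGT", "A"),
    Read("locus_1", "ACGTACGTACGT", "A"),  # PCR duplicate of the read above
    Read("locus_1", "TTGCATGCATGC", "A"),
    Read("locus_1", "GGATCCGGATCC", "G"),
]
counts = unique_molecule_counts(reads)
print(counts)
```

In this toy input, four reads reduce to three unique molecules, two supporting allele A and one supporting allele G.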

2.3 Panel performance before redesign

To assess the performance of MPSplex, NA12878 (National Institute of Standards & Technology, 2015) as well as the GIAB family trio (National Institute of Standards & Technology, 2016) were sequenced prior to this work (see table 2.1). These samples are well characterised and known genotypes are thus available. Libraries had been prepared according to the standard parameters of the manufacturer's protocol (QIAGEN, October 2017).


Figure 2.3: QIAseq library preparation (QIAGEN, n.d.).

Table 2.1: Sample setup for sequencing of high quality NA12878 and GIAB family trio DNA

Sample     5ng input      40ng input
NA12878    1 replicate    1 replicate
NA24149    1 replicate    1 replicate
NA24143    1 replicate    1 replicate
NA24385    1 replicate    1 replicate


2.4 Analysis of poorly performing loci

2.4.1 Identification of poorly performing loci

Loci were flagged as poorly performing if they were lower statistical outliers for one or more of the metrics quality score, allele balance, read efficiency or forward-reverse balance, or if they showed an average homozygous variant frequency below 95%. Read efficiency was defined as the ratio of locus coverage to the number of reads in a 500bp area around the locus. Loci without flags were categorised as 'confirmed', meaning well performing loci without need for further adjustment. The other loci were categorised as 'questioned', meaning they need further optimisation or simply further observation.
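The flagging rule can be sketched as follows. The text does not state how lower statistical outliers were defined, so the example assumes the common boxplot convention (values below Q1 - 1.5 x IQR). All function names and the toy numbers are hypothetical, and only one outlier metric is shown; the others would be passed in the same way.

```python
import statistics


def read_efficiency(locus_coverage, reads_in_window):
    """Ratio of locus coverage to the number of reads in the 500bp area around the locus."""
    return locus_coverage / reads_in_window if reads_in_window else 0.0


def lower_outliers(values):
    """Loci whose value lies below Q1 - 1.5 * IQR (boxplot rule, assumed here)."""
    q1, _, q3 = statistics.quantiles(list(values.values()), n=4)
    fence = q1 - 1.5 * (q3 - q1)
    return {locus for locus, value in values.items() if value < fence}


def flag_loci(metrics, hom_variant_freq, min_hom_freq=0.95):
    """Union of lower outliers over all metrics plus loci below the homozygous frequency cut-off."""
    flagged = set()
    for per_locus_values in metrics.values():
        flagged |= lower_outliers(per_locus_values)
    flagged |= {locus for locus, freq in hom_variant_freq.items() if freq < min_hom_freq}
    return flagged


# Hypothetical input: (locus coverage, reads in the surrounding 500bp) per locus.
raw = {"L1": (90, 100), "L2": (88, 100), "L3": (92, 100), "L4": (91, 100),
       "L5": (89, 100), "L6": (93, 100), "L7": (87, 100), "L8": (20, 100)}
efficiency = {locus: read_efficiency(cov, window) for locus, (cov, window) in raw.items()}
hom_freq = {locus: 0.99 for locus in raw}
hom_freq["L3"] = 0.90  # below the 95% cut-off

questioned = flag_loci({"read_efficiency": efficiency}, hom_freq)
confirmed = set(raw) - questioned
print(sorted(questioned), sorted(confirmed))
```

Here locus L8 is flagged as a read efficiency outlier and L3 through its homozygous variant frequency; all other loci would count as 'confirmed'.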

2.4.2 Primer redesign recommendations

Loci flagged for poor performance were subjected to manual review of their sequencing context, performance and primer sites. Loci were categorised according to identifiable causes of poor performance and primer redesign recommendations were given accordingly (see table 2.2).

Table 2.2: Primer redesign recommendation categories

Category                      Definition                                    Recommendation
Single primer                 Only one primer designed.                     Design additional primer on opposite, or same, strand.
One directional               Two primers on the same strand.               Design primer on opposite strand.
Primer distance above 80bp    One or both primer ends further than 80bp.    Design primer closer to target.
SNP in primer                 SNP position in the primer binding site.      Design degenerate primers for SNP.

All primer redesign recommendations were submitted to Qiagen and subjected to their primer design pipeline. Primers were redesigned accordingly and a new version of the MPSplex custom QIAseq™ Targeted DNA Panel was synthesised and shipped by Qiagen.
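The categories of table 2.2 can be expressed as simple rules on the primers of a locus. The sketch below is only an illustration of that logic; in this work the categorisation was made by manual review, and the `Primer` record and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Primer:
    """Hypothetical primer record for one locus."""
    strand: str               # "+" or "-"
    distance_to_target: int   # absolute distance to the target SNP in bp
    overlaps_snp: bool        # True if a known SNP lies in the binding site


def redesign_categories(primers, max_distance=80):
    """Return the table 2.2 categories that apply to a locus (possibly several)."""
    categories = []
    if len(primers) == 1:
        categories.append("Single primer")
    elif len({primer.strand for primer in primers}) == 1:
        categories.append("One directional")
    if any(primer.distance_to_target > max_distance for primer in primers):
        categories.append("Primer distance above 80bp")
    if any(primer.overlaps_snp for primer in primers):
        categories.append("SNP in primer")
    return categories


single_far = redesign_categories([Primer("+", 95, False)])
same_strand_snp = redesign_categories([Primer("+", 40, False), Primer("+", 60, True)])
well_designed = redesign_categories([Primer("+", 40, False), Primer("-", 50, False)])
print(single_far, same_strand_snp, well_designed)
```

A locus can fall into several categories at once, which is why the per-category counts reported later add up to more than the number of loci suggested for redesign.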

2.4.3 Addition of ancestry informative SNPs

When the country of origin of an unidentified person is unknown, antemortem sample collection becomes challenging. Here, ancestry inference may be a valuable investigative tool. For this reason, 55 highly ancestry informative SNPs, published by Kidd et al. (2014), were submitted to Qiagen for inclusion in the assay redesign.


2.5 Panel performance after redesign

2.5.1 High quality and sheared DNA

To assess MPSplex's performance after the redesign, high quality DNA (NA12878) was sequenced, sheared and unsheared, on a flow cell with the following adjustments to the manufacturer's protocol:

• Fragmentation: 24 minutes

• All purification steps: 1.4 followed by 1.2 bead/sample ratio

• Target enrichment: 8 cycles at 68°C for 10 min each

Table 2.3: Sample setup for sequencing of high quality and sheared DNA after redesign.

                             UPCR cycles
Sample name                  21 cycles    22 cycles
NA12878                      1ng, 10ng    1ng, 10ng
NA12878 sheared to 150bp     10ng         10ng
NA12878 sheared to 500bp     10ng         10ng
NA12878 sheared to 1000bp    10ng         10ng

2.5.2 Bone samples

To demonstrate MPSplex's fitness for the typing of degraded samples, 4 degraded bone samples with previous success in STR analysis were sequenced. The following adjustments to the manufacturer's protocol were made:

• Fragmentation: 24 minutes

• All purification steps: 1.4 followed by 1.2 bead/sample ratio

• Target enrichment: 6 cycles at 68°C for 15 min each

Samples were sequenced on three different flow cells: the first to assess bone performance with different inputs, the second to compare the highest bone input to NA12878 control DNA, and the last to maximise success from the lowest bone input (see table 2.4). Only loci without previously identified performance issues were considered for this analysis (N=1052).


Table 2.4: Sample setup for sequencing of degraded bone samples.

               Samples pooled
Sample name    12                   7                    4
Bone 1         1ng, 500pg, 250pg    1ng                  250pg
Bone 2         1ng, 500pg, 250pg    1ng                  250pg
Bone 3         1ng, 500pg, 250pg    1ng                  250pg
Bone 4         1ng, 500pg, 250pg    1ng                  250pg
NA12878        -                    1ng, 500pg, 250pg    -


3. Results

3.1 Panel performance before redesign

Over all SNPs and samples, MPSplex achieved a concordance of 0.9644 (±0.1853) and an average 10x coverage success rate of 0.9652 (±0.1832) for 40ng input and 0.98 (±0.1401) for 5ng input.
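The two headline metrics can be sketched as below. The exact computation is not spelled out in the text, so this example assumes that the 10x coverage success rate of a locus is the fraction of samples in which it reaches at least 10x coverage (summarised as mean and standard deviation over loci), and that concordance is the fraction of typed loci whose genotype matches the known genotype. All names and numbers are hypothetical.

```python
import statistics


def coverage_success_rates(coverage, threshold=10):
    """Per-locus fraction of samples whose coverage reaches the threshold."""
    return {
        locus: sum(depth >= threshold for depth in depths) / len(depths)
        for locus, depths in coverage.items()
    }


def concordance(called, known):
    """Fraction of loci typed in both sets with the same genotype (allele order ignored)."""
    shared = [locus for locus in called if locus in known]
    if not shared:
        return float("nan")
    matches = sum(sorted(called[locus]) == sorted(known[locus]) for locus in shared)
    return matches / len(shared)


# Hypothetical coverage of three loci in four samples.
coverage = {
    "locus_a": [35, 12, 50, 41],
    "locus_b": [9, 14, 3, 22],
    "locus_c": [0, 0, 4, 2],
}
rates = coverage_success_rates(coverage)
mean_rate = statistics.mean(list(rates.values()))
sd_rate = statistics.stdev(list(rates.values()))

# Hypothetical genotype calls against known genotypes.
called = {"locus_a": ("A", "G"), "locus_b": ("C", "C"), "locus_c": ("T", "G")}
known = {"locus_a": ("G", "A"), "locus_b": ("C", "C"), "locus_c": ("T", "T")}
agreement = concordance(called, known)
print(rates, mean_rate, sd_rate, agreement)
```

Under these assumptions a single locus that fails in most samples pulls the mean down and widens the standard deviation considerably, which is one plausible reading of the large ± values reported here.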

3.2 Analysis of poorly performing loci

3.2.1 Identification of poorly performing loci

In total, 490 loci were identified as lower outliers for read efficiency, base quality score, allele balance or forward-reverse balance (see figure 3.1 for the distributions). The highest number of loci was identified with the read efficiency metric (180), while base quality scores indicated only very few outliers.

3.2.2 Primer redesign recommendations

For 414 loci, the manual review led to at least one (often multiple) recommendation(s) for primer redesign (see table 3.1). 55 loci were deemed poor genomic targets for sequencing and removed from MPSplex.

Table 3.1: Summary of primer redesign suggestions. Loci may have multiple categories; in total 414 loci were suggested for redesign.

Category                      Loci suggested       Loci redesigned
Single primer                 63                   0
One directional               204                  0
Primer distance above 80bp    154                  0
SNP in primer                 158 (166 primers)    153 (161 primers)

Qiagen successfully redesigned 161 primers, corresponding to 153 unique loci with one or more primers each. All of these belong to the 'SNP in primer' category, where a SNP was observed in a primer-binding region. In all other categories, the sequence context did not permit additional or closer primers.

3.2.3 Addition of ancestry informative SNPs

A total of 107 primers corresponding to the 55 ancestry informative SNPs weredesigned by Qiagen.


Figure 3.1: Boxplots of performance metrics used to identify poorly performing loci.

3.3 Panel performance after redesign

3.3.1 High quality and sheared DNA

Overall, MPSplex achieved average 10x coverage success rates of 0.9748 (±0.1310) with 10ng input and 0.9217 (±0.2365) with 1ng input (see table 3.2 for details). Loci with redesigned primers had an average 10x coverage success rate above 0.99 and were highly concordant with known genotypes (0.9803). Most strikingly, only 61 of the loci with previous performance issues showed a 10x success rate below 0.9, most of them with primer distances above 80bp. If these are removed from the analysis, the 10x coverage success rate rises to 0.9969 (±0.0421) and 0.9620 (±0.1488) for 10ng and 1ng input, respectively.

All 55 ancestry informative SNPs reached 10x coverage in all samples. A PCA plot from these SNPs with a reference sample set spanning 5 major continental populations (Bioinformatics Group of the Faculty of Mathematics, n.d.) is shown in figure 3.2.

Table 3.2: 10x coverage performance of NA12878 after primer redesign for 21 and 22 UPCR cycles and 1 and 10ng input DNA.

                           10 ng                                             1 ng
                           21 cycles                22 cycles                21 cycles                22 cycles
                           Confirmed   Questioned   Confirmed   Questioned   Confirmed   Questioned   Confirmed   Questioned
Above 10x coverage         4414        763          4415        752          1090        165          1058        136
Total                      4428        872          4428        872          1107        218          1107        218
10x coverage percentage    99.68       87.50        99.71       86.24        98.46       75.69        95.57       62.39
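As a quick check of the arithmetic in table 3.2, each percentage is the number of locus observations above 10x coverage divided by the total. The counts below are those of the table; only the dictionary layout is an assumption of this sketch.

```python
# (input, UPCR cycles, locus category) -> (above 10x coverage, total), from table 3.2
counts = {
    ("10 ng", 21, "Confirmed"): (4414, 4428),
    ("10 ng", 21, "Questioned"): (763, 872),
    ("10 ng", 22, "Confirmed"): (4415, 4428),
    ("10 ng", 22, "Questioned"): (752, 872),
    ("1 ng", 21, "Confirmed"): (1090, 1107),
    ("1 ng", 21, "Questioned"): (165, 218),
    ("1 ng", 22, "Confirmed"): (1058, 1107),
    ("1 ng", 22, "Questioned"): (136, 218),
}

percentages = {
    key: round(100 * above / total, 2)
    for key, (above, total) in counts.items()
}

for (dna_input, cycles, category), percentage in percentages.items():
    print(f"{dna_input:>5}  {cycles} cycles  {category:<10}  {percentage:6.2f}%")
```

The 10 ng totals are four times the 1 ng totals (4428 = 4 x 1107, 872 = 4 x 218), which would be consistent with four 10 ng samples per cycle setting (unsheared plus three sheared samples) against a single 1 ng sample.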

3.3.2 Degraded bone samples

Between 646 and 1049 SNPs were sequenced with at least 10x coverage in the bone samples. In comparison, between 1006 and 1052 SNPs reached this coverage with NA12878 DNA. A notable drop in the number of SNPs reaching 10x coverage was only observed for bones 3 and 4 at 250pg (figure 3.3).

The average coverage over all SNPs without previous performance problems (N=1052) falls with decreasing input (see table 3.3) and is comparable between NA12878 and bone samples. All 4 bone samples reach average between-replicate concordances of over 99%, while NA12878 reaches an average concordance to known genotypes of 99.70%.

Table 3.3: Average coverage of SNPs without previous performance problems (N=1052) for bones and NA12878.

Input    Bone              NA12878
1ng      79.05 (±48.51)    77.52 (±26.53)
500pg    41.31 (±22.64)    66.49 (±24.32)
250pg    29.69 (±22.11)    28.62 (±12.51)


Figure 3.2: PCA plot of 55 ancestry informative SNPs included in MPSplex.

Figure 3.3: SNPs without previous performance problems (N=1052) with at least 10x coverage.


4. Discussion & Conclusion

Missing persons identification with current STR analysis faces two major issues: locus dropout due to DNA degradation and low statistical power if only distant relatives are available. The use of SNPs allows for very short amplicons that are less susceptible to DNA degradation, but available forensic panels do not contain many more than 400 markers. Thus, these SNP panels do not solve the issue of distant kinship matching with degraded samples. ICMP's MPSplex panel holds over 1200 SNPs and 46 short microhaplotypes, which will provide enough statistical power to fill this gap and provide a mechanism to address cases without previous expectation of resolution. To make MPSplex widely available, however, SNPs with low performance had to be identified and troubleshot. In addition, MPSplex's performance had to be demonstrated on degraded bone samples.

Due to the amount of data produced by massively parallel sequencing, the identification of poorly performing loci had to be streamlined. All metrics used in this study are regarded as important indicators of high quality results in both capillary electrophoresis (CE) and MPS data and thus served as excellent flagging mechanisms. For the evaluation of overall performance, 10x coverage and concordance were the most outcome-relevant metrics. 10x coverage in particular provides a good proxy for future, validated thresholds, as it is widely used in the sequencing literature. The use of this metric gains further credence because UMIs are traditionally not available, so 10x coverage in the literature includes PCR copies. With MPSplex, 10x coverage refers to 10 observed unique molecules, making this a very robust metric.

The performance of the 153 loci with redesigned primers was excellent. Even though many other loci had no primers redesigned, results on high quality and sheared DNA indicate that only 63 loci consistently fail to meet coverage thresholds, and overall the results are highly concordant with expected genotypes. Exclusion of these loci would lead to a robust panel with 1205 SNPs and 46 microhaplotypes. In particular, MPSplex's robustness in its application to degraded bone samples was demonstrated. The 10x coverage of bones compares extremely favourably to NA12878, and only at lower inputs do some bone samples have considerably fewer loci reaching the threshold. It is noteworthy, however, that even for those samples more than 500 loci were well covered. Since this data would represent the statistical power of multiple STR analyses, identification still seems feasible.

The addition of ancestry informative SNPs to the panel proved straightforward and provided extremely high sequencing success, even without additional optimisation. While for now this set of ancestry informative markers provides a categorisation into major continental populations, future additions could significantly expand this. Furthermore, the results suggest that well characterised markers can be included without problems, which opens a path for further growth of MPSplex or customisation by future users.


As MPSplex is in a development stage, its protocols are not fully optimised. To maximise recovery of loci, optimisation of protocol parameters such as DNA input, fragmentation time and bead-to-sample ratios should be considered. Crucially, the MPS approach may benefit from DNA extraction methods that specifically capture very small fragments that are disregarded in standard workflows. Extraction methods from the field of ancient DNA analysis, such as that of Rohland and Hofreiter (2007), may provide an excellent starting point. Finally, MPSplex will have to be fully validated, according to best forensic practices, in order to be applicable to routine casework.

At present, the sequencing protocol of MPSplex requires extensive manual pipetting, and sequencing is limited to 12 samples per flow cell. Full optimisation and validation, but also high-throughput casework, would consequently be very labour and time intensive. Automation on robotic platforms and an increase in the number of sample adaptors should therefore be prioritised, and steps have already been taken in this direction.

Despite the lack of full optimisation and validation, the results of this study demonstrate that MPSplex is very robust when used on degraded bone samples. Crucially, it will provide a robust panel with over 1205 SNPs and 46 microhaplotypes for the identification of missing persons via degraded bone samples, even if only distant relatives are available. Additionally, it provides capabilities for ancestry inference and could easily be extended with well characterised markers.


Bibliography

Alaeddini, R., Walsh, S. J. & Abbas, A. (2010). Forensic implications of genetic analyses from degraded DNA - a review. Forensic Science International: Genetics, 4 (3), 148–157. doi:10.1016/j.fsigen.2009.09.007

Amory, S., Huel, R., Bilic, A., Loreille, O. & Parsons, T. J. (2012). Automatable full demineralization DNA extraction procedure from degraded skeletal remains. Forensic Science International: Genetics, 6 (3), 398–406. doi:10.1016/j.fsigen.2011.08.004

Bioinformatics Group of the Faculty of Mathematics, U. o. S. d. C. (n.d.). Forensic MPS AIMs panel reference sets. Retrieved December 5, 2018, from http://mathgene.usc.es/snipper/forensic_mps_aims.html

Butler, J. M., Coble, M. D. & Vallone, P. M. (2007). STRs vs. SNPs: Thoughts onthe future of forensic DNA testing. Forensic Science, Medicine, and Pathology,3 (3), 200–205. doi:10.1007/s12024-007-0018-1

Butler, J. M., Shen, Y. & McCord, B. R. (2003). The development of reduced sizeSTR amplicons as tools for analysis of degraded DNA. Journal of ForensicSciences, 48 (5), 2003043. doi:10.1520/jfs2003043

Chung, D. T., Drabek, J., Opel, K. L., Butler, J. M. & McCord, B. R. (2004). A studyon the effects of degradation and template concentration on the amplificationefficiency of the STR miniplex primer sets. Journal of Forensic Sciences, 49 (4),1–8. doi:10.1520/jfs2003269

Coble, M. D. & Butler, J. M. (2005). Characterization of new MiniSTR loci to aidanalysis of degraded DNA. Journal of Forensic Sciences, 50 (1), 1–11. doi:10.1520/jfs2004216

de la Puente, M., Phillips, C., Santos, C., Fondevila, M., Carracedo, A. & Lareu, M.(2017). Evaluation of the qiagen 140-SNP forensic identification multiplex formassively parallel sequencing. Forensic Science International: Genetics, 28,35–43. doi:10.1016/j.fsigen.2017.01.012

Drabek, J., Chung, D. T., Butler, J. M. & McCord, B. R. (2004). Concordancestudy between miniplex assays and a commercial STR typing kit. Journal ofForensic Sciences, 49 (4), 1–2. doi:10.1520/jfs2004032

Grandell, I., Samara, R. & Tillmar, A. O. (2016). A SNP panel for identity and kinship testing using massive parallel sequencing. International Journal of Legal Medicine, 130 (4), 905–914. doi:10.1007/s00414-016-1341-4

Hill, C. R., Kline, M. C., Coble, M. D. & Butler, J. M. (2007). Characterization of 26 MiniSTR loci for improved analysis of degraded DNA samples. Journal of Forensic Sciences, 53 (1), 73–80. doi:10.1111/j.1556-4029.2008.00595.x

Hughes-Stamm, S. R., Ashton, K. J. & van Daal, A. (2010). Assessment of DNA degradation and the genotyping success of highly degraded samples. International Journal of Legal Medicine, 125 (3), 341–348. doi:10.1007/s00414-010-0455-3

Jakubowska, J., Maciejewska, A. & Pawłowski, R. (2011). Comparison of three methods of DNA extraction from human bones with different degrees of degradation. International Journal of Legal Medicine, 126 (1), 173–178. doi:10.1007/s00414-011-0590-5

Kidd, K. K., Speed, W. C., Pakstis, A. J., Furtado, M. R., Fang, R., Madbouly, A., ... Kidd, J. R. (2014). Progress toward an efficient panel of SNPs for ancestry inference. Forensic Science International: Genetics, 10, 23–32. doi:10.1016/j.fsigen.2014.01.002

Kivioja, T., Vaharautio, A., Karlsson, K., Bonke, M., Enge, M., Linnarsson, S. & Taipale, J. (2011). Counting absolute numbers of molecules using unique molecular identifiers. Nature Methods, 9 (1), 72–74. doi:10.1038/nmeth.1778

Metzker, M. L. (2009). Sequencing technologies — the next generation. Nature Reviews Genetics, 11 (1), 31–46. doi:10.1038/nrg2626

Mo, S.-K., Ren, Z.-L., Yang, Y.-R., Liu, Y.-C., Zhang, J.-J., Wu, H.-J., ... Ni, M. (2018). A 472-SNP panel for pairwise kinship testing of second-degree relatives. Forensic Science International: Genetics, 34, 178–185. doi:10.1016/j.fsigen.2018.02.019

Mulero, J. J., Chang, C. W., Lagace, R. E., Wang, D. Y., Bas, J. L., McMahon, T. P. & Hennessy, L. K. (2008). Development and validation of the AmpFℓSTR® MiniFiler™ PCR amplification kit: A MiniSTR multiplex for the analysis of degraded and/or PCR inhibited DNA*. Journal of Forensic Sciences, 53 (4), 838–852. doi:10.1111/j.1556-4029.2008.00760.x

Musgrave-Brown, E., Ballard, D., Balogh, K., Bender, K., Berger, B., Bogus, M., ... Court, D. S. (2007). Forensic validation of the SNPforID 52-plex assay. Forensic Science International: Genetics, 1 (2), 186–190. doi:10.1016/j.fsigen.2007.01.004

National Institute of Standards & Technology. (2015, November 24). Reference material 8398, human DNA for whole-genome variant assessment (daughter of Utah/European ancestry) (Report of Investigation No. 8398). National Institute of Standards & Technology. Gaithersburg, Maryland, United States of America.

National Institute of Standards & Technology. (2016, September 8). Reference material 8392, human DNA for whole-genome variant assessment (family trio of Eastern European Ashkenazim Jewish ancestry) (Report of Investigation No. 8392). National Institute of Standards & Technology. Gaithersburg, Maryland, United States of America.

Parsons, T. J., Huel, R., Davoren, J., Katzmarzyk, C., Milos, A., Selmanovic, A., ... Rizvic, A. (2007). Application of novel “mini-amplicon” STR multiplexes to high volume casework on degraded skeletal remains. Forensic Science International: Genetics, 1 (2), 175–179. doi:10.1016/j.fsigen.2007.02.003

QIAGEN. (n.d.). QIAseq Targeted DNA panels - Product Details. Retrieved October 30, 2018, from https://www.qiagen.com/us/shop/sample-technologies/dna/genomic-dna/qiaseq-targeted-dna-panels/#productdetails

QIAGEN. (December, 2016). QIAGEN GeneRead Advanced Sequencing Q Kit handbook. Version REF 185231. QIAGEN.

QIAGEN. (January, 2017). QIAGEN GeneRead Clonal Amp Q handbook. Version REF 185001. QIAGEN.

QIAGEN. (October, 2017). QIAseq Targeted DNA Panel handbook. QIAGEN.

Rohland, N. & Hofreiter, M. (2007). Ancient DNA extraction from bones and teeth. Nature Protocols, 2 (7), 1756–1762. doi:10.1038/nprot.2007.247