This project was supported in part by Thermo Fisher Scientific.

¹ Institute of Applied Genetics, Department of Molecular and Medical Genetics, University of North Texas Health Science Center, 3500 Camp Bowie Blvd., Fort Worth, TX 76107, USA
² Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia

Contact Jennifer D. Churchill for information regarding the content of this poster at: [email protected]

Traditionally, sequencing of the mitochondrial genome has been limited mostly to hypervariable regions (HVI and HVII) of the control region due to the high density of sequence variants and limitations with Sanger-type sequencing (STS) methodology. Massively parallel sequencing (MPS) offers an alternative to STS, and the Ion PGM™ and Ion S5™ Systems (Thermo Fisher Scientific) are promising MPS platforms for forensic analyses. Sequence data of the entire mitochondrial genome can increase discrimination power, and the increased resolution and quantitative nature afforded by MPS technologies allow for detection of heteroplasmy levels at each nucleotide position. These attributes provide avenues for potential mixture interpretation.

Overall, the results indicate that robust and accurate data can be generated. Sensitivity of detection was comparable to that of current capillary electrophoresis (CE) technologies. Successful analysis of casework-type samples, including challenged and mixture samples, was demonstrated. Lastly, when the panel is used with the Ion Chef™ System, an efficient workflow is available, which makes the process worthy of consideration for forensic casework. These results support the potential for incorporating whole mitochondrial genome analysis by MPS into forensic laboratories for routine mitochondrial DNA analyses and, potentially, mixture interpretation. Future work will involve full validation studies and expansion of current population studies. These studies are important for developing databases of whole mitochondrial genome data, evaluating amplicon success, determining the limitations of the system, and developing appropriate interpretation guidelines.

Jennifer D. Churchill¹, Jonathan L. King¹, Bruce Budowle¹˒²

Mitochondrial genome sequence data were generated for 108 reference samples, and several informative performance metrics were used to evaluate the quality and reliability of the data produced. Haplotype calls for overlapping samples were concordant with mitochondrial genome data previously generated by long PCR on the PGM (Figure 1) [8] and on the MiSeq [1]. Nominal amplicon dropout or low-performing amplicons were seen in 14 samples. Ninety-nine percent of the base pairs sequenced had a read depth of 5X or greater. Ninety-six percent of the base pairs sequenced had a read depth of 50X or greater. Average coverage across the mitochondrial genome ranged from 259X to 8,579X (Figure 2A). Reads were generated from both strands of the DNA (Figure 2B). Average noise across the mitochondrial genome ranged from 0.002% to 9.03% (Figure 2C).
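The metrics reported above (share of positions at or above a depth cutoff, mean coverage, strand representation, and noise) could be tallied from per-position read counts roughly as follows. This is a minimal sketch, not the pipeline used for the poster: the `PositionCounts` layout is hypothetical, the definition of noise as reads not supporting the called base(s) is an assumption, and the toy numbers are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class PositionCounts:
    """Read counts at one mitochondrial genome position (hypothetical layout)."""
    forward_reads: int  # reads mapped to the forward strand
    reverse_reads: int  # reads mapped to the reverse strand
    noise_reads: int    # reads not supporting the called base(s); assumed definition

    @property
    def depth(self) -> int:
        return self.forward_reads + self.reverse_reads


def fraction_at_or_above(positions, min_depth):
    """Fraction of positions whose read depth is >= min_depth."""
    return sum(p.depth >= min_depth for p in positions) / len(positions)


def summarize(positions):
    """Return mean coverage, mean forward-strand share, and mean noise percentage."""
    covered = [p for p in positions if p.depth > 0]  # avoid dividing by zero depth
    return {
        "mean_coverage": mean(p.depth for p in positions),
        "mean_forward_share": mean(p.forward_reads / p.depth for p in covered),
        "mean_noise_pct": mean(100 * p.noise_reads / p.depth for p in covered),
    }


# Toy data for four positions; these values are invented, not taken from the poster.
toy = [
    PositionCounts(60, 40, 1),
    PositionCounts(30, 30, 0),
    PositionCounts(4, 4, 0),
    PositionCounts(2, 1, 0),
]
frac_5x = fraction_at_or_above(toy, 5)
frac_50x = fraction_at_or_above(toy, 50)
stats = summarize(toy)
```

A forward-strand share near 0.5 indicates that both strands contribute about equally to the reads at a position.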

A dilution series from one ng to one pg of input genomic DNA illustrated the sensitivity of detection for this multiplex (Figure 4). Haplotype concordance across the dilution series was assessed. A variant was considered “dropped out” if coverage at that position was less than 10X. Results showed no incorrect SNP calls at positions with coverage ≥10X, and the overall success rate was exceedingly high. The results for the three samples are summarized as follows:

• Sample 1 - Haplotypes were concordant across the entire dilution series.
  • No missing variants
• Sample 2 - Haplotypes were concordant from 1 ng to 4 pg.
  • Three variants dropped out in the 2 pg sample
    • 16183C, 16189C, 8473C
  • Four variants dropped out in the 1 pg sample
    • 16183C, 16189C, 8473C, 14766T
  • All variants were present, but at a read depth <10X
• Sample 3 - Haplotypes were concordant from 1 ng to 125 pg.
  • One variant (the same variant) dropped out from the 62.5 pg through the 1 pg samples
  • 263G was present across the entire dilution series, but at a read depth <10X
  • The amplicon that covers this region tended to be a low performing amplicon in the 31 samples typed
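The dropout rule used for the dilution series (a variant counts as dropped out when coverage at its position is below 10X) can be expressed as a small check. The sketch below is illustrative only: the function name and the read depths are hypothetical, and only the four Sample 2 variant names and the 10X cutoff come from the results above.

```python
MIN_COVERAGE = 10  # a variant "drops out" when depth at its position is below 10X


def dropped_variants(reference_haplotype, depth_by_variant, min_coverage=MIN_COVERAGE):
    """Return reference variants whose depth is below min_coverage, sorted by position."""
    dropped = [v for v in reference_haplotype if depth_by_variant.get(v, 0) < min_coverage]
    # Variant names look like "8473C": position digits followed by one base letter.
    return sorted(dropped, key=lambda v: int(v[:-1]))


# Subset of a haplotype; the depths for the 2 pg input are invented for illustration.
reference = ["8473C", "14766T", "16183C", "16189C"]
depths_2pg = {"8473C": 7, "14766T": 12, "16183C": 3, "16189C": 5}

dropped = dropped_variants(reference, depths_2pg)
```

With these invented depths the function reports three dropped variants, mirroring the pattern described for the 2 pg sample; the reads are still present, but below the depth needed for a call.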

The mitochondrial genomes of 108 reference samples, including African, Hispanic, Caucasian, and Asian individuals, were amplified with the Precision ID mtDNA Whole Genome panel and sequenced on the Ion PGM™ and Ion S5™ Systems. Serial dilutions of input DNA amounts ranging from one ng to one pg for three control samples were analyzed. Challenged samples, including bones, aged buccal swabs, and hair shafts, and mixture samples were sequenced to evaluate the Precision ID mtDNA Whole Genome System’s success with casework-type samples.

• For manual library preparation, 10 μl of PCR product from each of the two multiplexes were combined for each sample and used for library preparation. Manual library preparation was performed according to the manufacturer’s protocols [3].

• The Ion Chef™ and manufacturer’s protocols were used to perform emulsion PCR, enrichment, and chip loading.

• Sequencing was performed with the Ion PGM™ Hi-Q™ Sequencing Kit as described in Churchill et al. 2015 [4].

• Raw data were aligned with Ion Torrent Suite Software, and variant calls were made with the variantCaller plugin. Data were analyzed further with IGV [5,6] and mitoSAVE [7]. A minimum coverage threshold of 10X and a point heteroplasmy threshold of 0.20 were set for mitochondrial DNA variant calls.

• Performance metrics such as concordance, amplicon success, coverage, strand balance, and noise were analyzed to evaluate the quality and reliability of the data produced.

• Additional mitochondrial genome libraries were prepped with the Ion Chef™ and sequenced on the Ion S5™ to evaluate quality and efficiency of the Precision ID mtDNA Whole Genome MPS workflow.
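The two calling thresholds listed above (minimum coverage of 10X and point heteroplasmy threshold of 0.20) can be illustrated with a short sketch. This is not the logic of the variantCaller plugin or mitoSAVE; `call_position` and its return values are hypothetical names chosen for illustration, and the input is assumed to be a simple tally of reads per base at one position.

```python
MIN_COVERAGE = 10              # minimum read depth for a call (threshold from the methods)
HETEROPLASMY_THRESHOLD = 0.20  # minimum read fraction for a point heteroplasmy call


def call_position(base_counts, min_coverage=MIN_COVERAGE,
                  het_threshold=HETEROPLASMY_THRESHOLD):
    """Classify one position from a dict mapping base -> read count.

    Returns (status, bases), where status is "no_call", "homoplasmic",
    or "heteroplasmic".
    """
    depth = sum(base_counts.values())
    if depth < min_coverage:
        return "no_call", []
    # Keep every base whose share of the reads meets the heteroplasmy threshold.
    kept = sorted(b for b, n in base_counts.items() if n / depth >= het_threshold)
    if len(kept) > 1:
        return "heteroplasmic", kept
    # Otherwise report only the most frequent base.
    top = max(base_counts, key=base_counts.get)
    return "homoplasmic", [top]


low_depth = call_position({"A": 3, "G": 2})
single_base = call_position({"C": 95, "T": 5})
two_bases = call_position({"C": 70, "T": 30})
```

Under these assumptions a second base at 5% of reads is treated as noise, while one at 30% is reported as point heteroplasmy.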

Figure 1. BAM files of sequence data across the entire mitochondrial genome generated with the short-amplicon mitochondrial genome panel (top) and long PCR (bottom) aligned in IGV.

Figure 4. Results from a portion of the serial dilution analyses on three samples with input DNA ranging from one ng to one pg. The average coverage for these three samples across the entire mitochondrial genome for one ng of input DNA ranged from 286X to 8,574X (A) and for one pg of input DNA ranged from 32X to 683X (B).

References:

(1) J.L. King, B.L. LaRue, N.M. Novroski, M. Stoljarova, S.B. Seo, X. Zeng, et al. High-quality and high-throughput massively parallel sequencing of the human mitochondrial genome using the Illumina MiSeq. Forensic Sci Int Genet. 12(2014) 128-35.

(2) M. Krzywinski, J.E. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19(2009) 1639-45.

(3) Thermo Fisher Scientific. HID-Ion AmpliSeq™ Mitochondrial Library Preparation. February 2015.

(4) J.D. Churchill, J. Chang, J. Ge, N. Rajagopalan, S.C. Wootton, C.W. Chang, R. Lagace, W. Liao, et al. Blind study evaluation illustrates utility of the Ion PGM™ System for use in human identity DNA typing. Croat Med J. 56(2015) 218-29.

(5) H. Thorvaldsdóttir, J.T. Robinson, J.P. Mesirov. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform. 14(2013) 178-92.

(6) J.T. Robinson, H. Thorvaldsdóttir, W. Winckler, M. Guttman, E.S. Lander, G. Getz, J.P. Mesirov. Integrative genomics viewer. Nat Biotechnol. 29(2011) 24–26.

(7) J.L. King, A. Sajantila, B. Budowle. mitoSAVE: Mitochondrial sequence analysis of variants in Excel. Forensic Sci Int Genet. 12(2014) 122-5.

(8) J.D. Churchill, J.L. King, R. Chakraborty, B. Budowle. Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality. Int J Legal Med.

A large multiplex, short-amplicon system was developed for sequencing the entire mitochondrial genome on the Ion PGM™ and Ion S5™ MPS platforms. The Applied Biosystems™ Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific) comprises two multiplexes, each containing 81 primer pairs, that generate amplicons ≤175 bp in length.
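A quick back-of-the-envelope calculation gives a feel for how densely such a panel tiles the genome. The sketch below assumes one amplicon per primer pair and uses the 16,569 bp length of the revised Cambridge Reference Sequence, which is not stated on the poster; the resulting figures are upper bounds and rough averages, not measured properties of the panel.

```python
GENOME_LENGTH = 16_569      # bp; rCRS length, an assumption not stated on the poster
PRIMER_PAIRS_PER_POOL = 81  # from the panel description
POOLS = 2                   # two multiplexes
MAX_AMPLICON_LENGTH = 175   # bp; amplicons are at most this long

# Assumes each primer pair yields exactly one amplicon.
total_amplicons = PRIMER_PAIRS_PER_POOL * POOLS

# Average distance between amplicon start points if they were spread evenly.
mean_spacing = GENOME_LENGTH / total_amplicons

# Upper bound on total amplified sequence, and on the average number of
# amplicons covering any one position.
max_total_bp = total_amplicons * MAX_AMPLICON_LENGTH
max_mean_redundancy = max_total_bp / GENOME_LENGTH

print(f"{total_amplicons} amplicons, about {mean_spacing:.0f} bp apart on average")
print(f"at most {max_total_bp} bp amplified, about {max_mean_redundancy:.2f}x the genome")
```

Because the amplicons are at most 175 bp but would start roughly every 102 bp if evenly spaced, neighbouring amplicons must overlap, which is consistent with splitting them between two separate multiplexes.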

Figure 1. Circos plot illustrating mitochondrial variant data from 283 previously sequenced mitochondrial genomes. Sequence and data analysis of these mitochondrial genomes are described in King et al. 2014 [1].

Figure 2. Circos plot [2] illustrating the tiled, overlapping pattern of the two multiplexes that make up the Precision ID mtDNA Whole Genome Panel.

Figure 2. Performance metrics evaluated for the 108 reference samples sequenced with the Precision ID mtDNA Whole Genome System included mean coverage (A), mean positive and negative strand balance (B), and the mean percentage of total number of reads attributed to noise (C) across the entire mitochondrial genome.

Figure 3. Circos plot [2] providing a simultaneous illustration of primer location for both multiplexes in the Precision ID mtDNA Whole Genome Panel, coverage, strand balance, noise, and sequence variants.

Successful analysis of both challenged samples (including bones, aged buccal swabs, and hair shafts; Figures 5 and 6) and mixture samples demonstrated the potential that this panel offers for analysis of casework-type samples. For the mixtures, the major contributor’s haplotype was successfully identified with nuclear DNA ratios of 1:1, 1:5, and 1:10 (minor contributor:major contributor).

Figure 5. BAM files of sequence data across the entire mitochondrial genome generated with the Precision ID mtDNA Whole Genome System (top) and coverage results (bottom) for a bone sample. Total amount of input DNA ranged from 1 ng to 191.5 pg as multiple extractions were carried out. Average coverage for this bone sample ranged from 339X to 8,252X.

Figure 6. BAM files of sequence data across the entire mitochondrial genome generated with the Precision ID mtDNA Whole Genome System (top) and coverage results (bottom) for a hair shaft sample. Total amount of input DNA for this sample was 65 pg. Coverage for this hair shaft sample ranged from 23X to 2,634X.

Figure 7. Examples of sequence variants seen in one mixture sample in IGV. In this mixture sample, the ratio of sequencing reads for each component of the mixture is comparable at each sequence variant (panels 1, 2, and 3). Thus, the three examples illustrate how the two components of the mixture can be separated quantitatively to determine the haplotype of each individual in the mixture.
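The idea behind Figure 7, that a consistent read ratio across variant positions lets the two contributors be separated, can be sketched as follows. This is an illustrative simplification, not the analysis performed for the poster: it assumes exactly two contributors, assumes the positions where they differ are already known, and uses invented read counts and hypothetical function names.

```python
from statistics import mean, pstdev


def minor_fractions(sites):
    """For each site (dict of base -> read count), return the minor base's read share."""
    fractions = []
    for counts in sites:
        # Assumes at least two bases were observed at every site.
        top_two = sorted(counts.values(), reverse=True)[:2]
        fractions.append(top_two[1] / sum(top_two))
    return fractions


def split_haplotypes(sites_by_position):
    """Assign the higher-count base to the major contributor, the other to the minor."""
    major, minor = {}, {}
    for position, counts in sites_by_position.items():
        ranked = sorted(counts, key=counts.get, reverse=True)
        major[position], minor[position] = ranked[0], ranked[1]
    return major, minor


# Invented read counts at three positions where the two contributors differ.
mixture = {
    73: {"A": 800, "G": 200},
    263: {"G": 410, "A": 90},
    16189: {"T": 1210, "C": 290},
}
fractions = minor_fractions(mixture.values())
estimated_minor_share = mean(fractions)
spread = pstdev(fractions)  # a small spread means the ratio is comparable across sites
major, minor = split_haplotypes(mixture)
```

When the read ratio approaches 1:1, ranking by count no longer separates the contributors reliably, so a sketch like this only applies when one component clearly dominates.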