Improved accuracy of ultra-low frequency variant detection using a novel library tagging strategy
Jiashi Wang*, Kevin Lai, Madelyn Light, Kristina Giorda, Mirna Jarosz, Yun Bao, and Caifu Chen
Integrated DNA Technologies, Redwood City, CA
* Corresponding author: [email protected]
Introduction
Accurate variant detection at 0.1% allele frequency
Simple workflow and analysis overview
Figure 1. Novel double-stranded molecular tagging strategy enables use
of genetic information embedded in both DNA strands. xGen Duplex Seq
Adapters incorporate degenerate bases to pair top and bottom strands during
analysis.
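The strand pairing in Figure 1 can be illustrated with a minimal Python sketch. It assumes a generic duplex-UMI layout of the kind used in published Duplex Sequencing schemes: each read pair carries two UMI halves, read as (alpha, beta) from one strand and (beta, alpha) from the complementary strand of the same molecule, and both strands share the same fragment coordinates. The actual tag structure of the xGen adapters is not described in this poster, and the read fields (`umi1`, `umi2`, `start`, `end`) and function names are hypothetical.

```python
from collections import defaultdict


def duplex_key(umi1: str, umi2: str) -> tuple[tuple[str, str], str]:
    """Return a strand-agnostic UMI key and a strand label for one read pair.

    Assumed (hypothetical) layout: one strand reads as (alpha, beta) and the
    complementary strand as (beta, alpha). Sorting the halves gives a key shared
    by both strands; the original order tells the strands apart. Which strand is
    called "top" is arbitrary but consistent within a family. If alpha == beta,
    the strands cannot be distinguished and both are labeled "top".
    """
    if umi1 <= umi2:
        return (umi1, umi2), "top"
    return (umi2, umi1), "bottom"


def group_duplex_families(read_pairs: list[dict]) -> dict:
    """Group read pairs by fragment coordinates plus the strand-agnostic UMI key."""
    families = defaultdict(lambda: {"top": [], "bottom": []})
    for rp in read_pairs:
        key, strand = duplex_key(rp["umi1"], rp["umi2"])
        families[(rp["chrom"], rp["start"], rp["end"], key)][strand].append(rp["name"])
    return dict(families)


# Usage: r1 and r3 come from one strand, r2 from the complementary strand.
reads = [
    {"name": "r1", "chrom": "chr1", "start": 100, "end": 400, "umi1": "ACG", "umi2": "TTA"},
    {"name": "r2", "chrom": "chr1", "start": 100, "end": 400, "umi1": "TTA", "umi2": "ACG"},
    {"name": "r3", "chrom": "chr1", "start": 100, "end": 400, "umi1": "ACG", "umi2": "TTA"},
]
families = group_duplex_families(reads)
```

Sorting the UMI halves is one simple way to obtain a key that both strands share; it is an illustrative choice, not necessarily how the authors' pipeline pairs strands.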
• NGS technologies and their throughput now allow analysis of low-input clinical
samples and are rapidly changing the way cancer care will be delivered
• Detection of ultra-low frequency (<1%) variants is confounded by errors
introduced during NGS sample preparation, library target enrichment, and
sequencing
• A library-tagging adapter strategy based on IDT xGen® Duplex Seq Adapters
(Tech Access) has been developed; it offers significantly higher library
conversion than previous molecular labeling approaches
Figure 2. Hybridization capture–based targeted sequencing workflow.
Figure 3. Duplexed molecular tagging and consensus analysis enable
error correction. Diagram of the analysis methods used to evaluate duplex
adapters, with true positive (TP) variants shown in green. Reads that map
to the same location and share the same unique molecular identifier (UMI)
are used to build single-strand consensus reads (Min3: minimum of 3 reads)
or, when both the top and bottom strands are observed, duplex consensus reads.
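A minimal sketch of the consensus step described for Figure 3, assuming reads have already been grouped into families by position and UMI and that bottom-strand sequences are given in reference orientation (as aligned reads are). Only the minimum of 3 reads comes from the caption; the per-position majority vote and the `min_fraction` cutoff are illustrative assumptions, not the authors' published parameters.

```python
from collections import Counter
from typing import Optional

MIN_READS = 3  # "Min3" in the Figure 3 caption


def single_strand_consensus(seqs: list[str], min_reads: int = MIN_READS,
                            min_fraction: float = 0.5) -> Optional[str]:
    """Per-position majority vote over the reads from one strand.

    Returns None if the family has fewer than min_reads reads. A position whose
    most common base does not exceed min_fraction (a hypothetical cutoff)
    becomes "N". Reads are compared over their shared length only.
    """
    if len(seqs) < min_reads:
        return None
    length = min(len(s) for s in seqs)
    consensus = []
    for i in range(length):
        base, count = Counter(s[i] for s in seqs).most_common(1)[0]
        consensus.append(base if count / len(seqs) > min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top: Optional[str], bottom: Optional[str]) -> Optional[str]:
    """Keep a base only where both strand consensuses agree; otherwise emit "N".

    Returns None if either strand lacks a consensus, i.e. the molecule was not
    observed as a duplex.
    """
    if top is None or bottom is None:
        return None
    return "".join(t if t == b and t != "N" else "N" for t, b in zip(top, bottom))


# Usage: one top-strand read carries a G>T error at position 3, and the two
# strands disagree at the last position.
sscs_top = single_strand_consensus(["ACGT", "ACGT", "ACTT"])     # "ACGT"
sscs_bottom = single_strand_consensus(["ACGA", "ACGA", "ACGA"])  # "ACGA"
duplex = duplex_consensus(sscs_top, sscs_bottom)                 # "ACGN"
```

An error present in a single read is outvoted at the single-strand step, and a difference present on only one strand is masked at the duplex step, which is why duplex reads are expected to carry the fewest false-positive bases.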
Conclusions
xGen Duplex Seq Adapters (1) are compatible with common library preparation kits and many sample types, including FFPE and
cfDNA; (2) are easily incorporated into hybridization-based target enrichment workflows; (3) enable consensus-based error correction
strategies that reduce the number of false-positive calls; and (4) can accurately detect rare variants at allele frequencies as low as 0.1%.
Figure 4. Low-frequency variant model. Genomic DNA from two Genome in a Bottle (GIAB) samples, NA12878 and NA24385, was mixed.
All libraries were enriched with a custom xGen Lockdown® Panel (IDT) targeting a 75 kb region of highly polymorphic SNPs.
Accuracy of variant calling was assessed over a 35 kb GIAB high-confidence region.
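The expected allele frequency of each truth-set variant in a two-sample mixture follows from the two genotypes and the mixing fraction. A minimal sketch, assuming autosomal diploid sites, equal genome copies per nanogram in both samples, and a hypothetical minor-sample fraction (the Figure 4 caption does not state the mixing ratio):

```python
def expected_vaf(minor_fraction: float, minor_alt_copies: int, major_alt_copies: int) -> float:
    """Expected alt-allele fraction at a diploid autosomal site in a two-sample mix.

    minor_fraction: fraction of total DNA contributed by the minor sample (assumed).
    *_alt_copies: number of alt alleles (0, 1, or 2) in each sample's genotype.
    Assumes both samples contribute equal genome copies per ng of DNA.
    """
    for g in (minor_alt_copies, major_alt_copies):
        if g not in (0, 1, 2):
            raise ValueError("alt copies must be 0, 1, or 2")
    if not 0.0 <= minor_fraction <= 1.0:
        raise ValueError("minor_fraction must be between 0 and 1")
    alt = minor_fraction * minor_alt_copies + (1 - minor_fraction) * major_alt_copies
    return alt / 2


# Usage: minor-sample variants at sites where the major sample is hom-ref.
het_vaf = expected_vaf(0.002, 1, 0)  # heterozygous in the minor sample
hom_vaf = expected_vaf(0.002, 2, 0)  # homozygous alt in the minor sample
```

Under these assumptions, a 0.2% minor fraction would place heterozygous minor-sample variants at 0.1% VAF and homozygous ones at 0.2%.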
Figure 5. Accurate low-frequency variant detection. 100 ng of cell-line DNA (0.2% mixture) was acoustically sheared to 300 bp for
library preparation with the KAPA Hyper Prep Kit (Kapa Biosystems) and xGen Duplex Seq Adapters. (A) Raw and duplicate-aware
coverages. (B) Sensitivity correlated with the coverage obtained with each deduplication method, using a variant-calling threshold of 0.
The positive predictive value (PPV) of low-frequency variant detection was largely determined by the degree of molecular tagging and
read consensus reconstruction. (C) Base substitution rates (i.e., pair-specific error rates) were measured with the Picard suite.
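The metrics in Figure 5 can be computed as sketched below. Sensitivity and PPV use their standard definitions. The binomial detection model is a simplifying assumption (independent, error-free consensus reads) that illustrates why sensitivity tracks deduplicated coverage; reading the caption's "threshold of 0" as calling any site with at least one alt-supporting read is also an assumption.

```python
from math import comb


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true variants that were called: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def ppv(tp: int, fp: int) -> float:
    """Fraction of calls that are true variants: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def detection_probability(depth: int, vaf: float, min_alt_reads: int = 1) -> float:
    """P(at least min_alt_reads alt-supporting reads) under a binomial model.

    Assumes each unique (deduplicated) read independently carries the alt allele
    with probability vaf and that no sequencing errors occur.
    """
    p_below = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_below


# Usage: a 0.1% VAF variant at two unique-coverage depths.
p_1k = detection_probability(1000, 0.001)
p_5k = detection_probability(5000, 0.001)
```

Under this model a 0.1% variant has roughly a 63% chance of being seen at least once at 1,000x unique depth and over 99% at 5,000x. Lowering the alt-read threshold raises sensitivity but, without consensus error correction, also admits more false positives, which is the PPV trade-off the caption describes.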
Figure 6. Improved coverage and variant calling for cell-free DNA samples. Commercially acquired cell-free DNA (cfDNA) samples
that were individually genotyped across the target region were mixed to model low-frequency variants with a minimum alternative allele frequency of 0.1%. 25 ng of the cfDNA mixture was used for library preparation. The raw sequencing depth was ~80,000X.
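Input mass caps the number of unique molecules available, however deep the raw sequencing. A back-of-the-envelope sketch, assuming about 3.3 pg per haploid human genome and, by default, complete library conversion; the `conversion` parameter is hypothetical and lets you explore less efficient libraries:

```python
PG_PER_HAPLOID_GENOME = 3.3  # approximate mass of one haploid human genome, in pg


def haploid_genome_equivalents(input_ng: float,
                               pg_per_genome: float = PG_PER_HAPLOID_GENOME) -> float:
    """Number of haploid genome copies (copies of each target site) in the input."""
    return input_ng * 1000.0 / pg_per_genome


def expected_mutant_molecules(input_ng: float, vaf: float) -> float:
    """Expected number of input molecules carrying the variant at a given site."""
    return haploid_genome_equivalents(input_ng) * vaf


def raw_reads_per_molecule(raw_depth: float, input_ng: float, conversion: float = 1.0) -> float:
    """Mean raw reads per unique input molecule.

    conversion: hypothetical fraction of input molecules that become library molecules.
    """
    return raw_depth / (haploid_genome_equivalents(input_ng) * conversion)


# Usage with the Figure 6 numbers (25 ng input, 0.1% VAF, ~80,000X raw depth).
genomes = haploid_genome_equivalents(25)
mutants = expected_mutant_molecules(25, 0.001)
reads_per_molecule = raw_reads_per_molecule(80_000, 25)
```

With 25 ng of input this gives about 7,600 haploid genome equivalents, so only about 7 to 8 mutant molecules per site at 0.1% VAF and roughly 10 to 11 raw reads per unique molecule at ~80,000X. High raw depth therefore mainly supplies the redundancy that single-strand and duplex consensus building require.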