a quick guide to the analytics behind genomic testing · understanding bioinformatics requires ......
TRANSCRIPT
![Page 1: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/1.jpg)
A Quick Guide to the
Analytics Behind Genomic Testing
1
Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories
![Page 2: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/2.jpg)
Learning Objectives
Catalogue various types of bioinformatics analyses that support clinical genomic testing
Enumerate types of variant classes
Describe algorithmic methods for variant detection by NGS
Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
Explain validation strategies for bringing best-in-class pipelines into clinical production
![Page 3: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/3.jpg)
The Human Reference Genome
~3B base pairs structured into
23 chromosome pairs
3,098,825,702 base pairs
20,805 coding genes
14,181 pseudogenes
196,501 gene transcripts
![Page 4: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/4.jpg)
![Page 5: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/5.jpg)
Why Genomic Testing?
KRAS G12D
1 in 4 cancer deaths are from lung cancer.
~222,500 new cases of lung cancer in the U.S. in 2017.
![Page 6: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/6.jpg)
Short-Read Sequencers Illumina Ion-Torrent
Long-Read Sequencers PacBio NanoPore 10X Nanostring
Genomic Testing
![Page 7: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/7.jpg)
7
Types of NGS Testing—Somatic & Germline
![Page 8: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/8.jpg)
Types of NGS Testing—cfDNA and ctDNA
Non-Invasive Prenatal Testing (NIPT) Liquid Biopsy
Trisomy 21 (Down Syndrome) Non-small cell lung cancer
EGFR
![Page 9: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/9.jpg)
Types of NGS Testing—Infectious Disease
![Page 10: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/10.jpg)
Types of NGS Testing—RNA-Seq
• Alternate transcripts
• Novel gene isoforms
• Gene fusions
![Page 11: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/11.jpg)
Role of Clinical Bioinformatics
Build pipelines Provide supplemental information for clinical interpretation and quality control
Other computationally heavy analytics are involved in evaluating:
Design of new panels
Identification of genetic patterns in patient cohorts
Discovery of gene pathways
![Page 12: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/12.jpg)
Understanding bioinformatics requires understanding the laboratory process.
![Page 13: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/13.jpg)
Steps in a bioinformatics pipeline:
1. Sample demultiplexing
2. Read alignment
3. BAM polishing steps
4. Variant calling
5. Variant annotations
6. QC calculations
Variant Calling Pipeline
![Page 14: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/14.jpg)
Step 1: Sample Demultiplexing
![Page 15: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/15.jpg)
Step 2: Read Alignment
Read Alignment
SAM Format
![Page 16: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/16.jpg)
A T C C T G
A T C C C T G
A T C C T G
A T C C T G
A T C C T G
A T C C T G
PCR Duplicate Removal Base Quality Score Recalibration
Step 3: BAM Polishing Steps
homopolymer
+1%
Q30 Phred base quality score → 99.9% → 1/1000
reference
read
C
C
C
C
C
![Page 17: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/17.jpg)
Step 4: Variant Calling by Class
![Page 18: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/18.jpg)
SNV/Insertion/Deletion • Position based callers
(GATK Unified Genotyper, LoFreq)
• Local de-novo assembly of haplotypes (GATK Haplotype Caller)
• Graph based variant callers (Graph Genome)
• Neural networks (Deep Variant)
Duplications/Structural variants • Pattern growth approach (Pindel)
• Split reads, discordant paired-end reads (Manta, DELLY, CREST)
• kmer + de-novo assembly (BreaKmer)
• Unmapped or partially mapped reads (ITD Assembler)
• Depth of coverage + background error correction + principal component analysis (XHMM)
• Tumor/normal
• B allele frequency
Example Variant Calling Algorithms
![Page 19: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/19.jpg)
Example KRAS G12D Variant Cell
![Page 20: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/20.jpg)
The annotated variant includes: • Gene • Gene Transcript • Nucleotide change (cdot) • Protein change (pdot) • Variant Type
– Polymorphism – Synonymous – Non-synonymous
• Nonsense • Missense
– Frame shift
The VCF variant includes: • chromosome • position • ID • reference base • alternate base • variant quality • meta-information
– information and individual format fields
– filter flags
Step 5: Variant Annotations
VCF variant
Annotated variant
![Page 21: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/21.jpg)
Sample Report QC metrics
Step 6: QC Calculations
Sequencing
Run Level Cluster density Base call quality score Fragment size
Sample Level Depth coverage Uniformity Mapping quality Duplication rate
Variant Level Novel variants Known variants Transition-to-transversion ratio
![Page 22: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/22.jpg)
Off-Target Read D
epth
On-Target
Gene Exon
Intronic regions
Sample-Level QC Metrics for Targeted Capture
Minimum Depth of Coverage Uniformity
Mapping Quality
Duplication Rate
![Page 23: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/23.jpg)
How does a bioinformatics job get executed in clinical production?
Job 1
Job 3
Job 2
Job 4
Job 5
Compute Infrastructure for Data Processing
![Page 24: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/24.jpg)
Data Storage Infrastructure
BCL (500–550 GB)
FASTQs (12–15 GB)
Database
Object Storage “hot”
Archive “cold”
99.9% availability 99.999999999% durability
Raw output for a single run
Exome ~150x
In-house
Cloud based
FASTQs, BAMs, VCFs
Bioinformatics HiSeq 4000
![Page 25: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/25.jpg)
• Recommendations from CAP/AMP – 17 recommendation statements
– 59 variants tested in each variant class
• Example Statistics – Positive percentage agreement (PPA)
– Positive predictive value (PPV)
– Reproducibility
– Allelic fraction lower limit of detection
• Validation required prior to use in clinical production
Bioinformatics Pipeline Validation
![Page 26: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/26.jpg)
Summary
Catalogue various types of bioinformatics analyses that support clinical genomic testing
Enumerate types of variant classes
Describe algorithmic methods for variant detection by NGS
Compare and contrast germline and somatic clinical bioinformatics pipeline methodologies
Discuss the infrastructure complexity required to support analytics for NGS testing at scale in the cloud
Explain validation strategies for bringing best-in-class pipelines into clinical production
![Page 27: A Quick Guide to the Analytics Behind Genomic Testing · Understanding bioinformatics requires ... Catalogue various types of bioinformatics analyses that support clinical genomic](https://reader036.vdocuments.mx/reader036/viewer/2022062600/5b5bff3f7f8b9ad21d8b4ca6/html5/thumbnails/27.jpg)
Questions? Elaine Gee, PhD Director of Bioinformatics ARUP Laboratories [email protected]