Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Target Enrichment Understanding the output Andrea Telatin BMR Genomics

Upload: andrea-telatin

Post on 02-Jul-2015

DESCRIPTION

Seminar on target enrichment performed with Illumina MiSeq. It describes the experiment and the output of the bioinformatic analyses, and shows how to use IGV to inspect the alignments and the variant calls.

TRANSCRIPT

Page 1: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Target Enrichment: Understanding the output

Andrea Telatin BMR Genomics

Page 2: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Today's menu:

• Disease research applications for TE panels

• Bioinformatic analysis of the data…

• …and how to handle the output

using a cardiomyopathy panel as a test case

Page 3: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Why? Technology Overview

Page 4: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

3.3 Gb 50 Mb 0.5 Mb
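To put the three sizes in proportion, here is a small arithmetic sketch. It assumes the figures stand for the whole human genome (3.3 Gb), an exome (50 Mb) and a custom panel (0.5 Mb); the slide shows only the numbers, so these labels and the constant names are an interpretation.

```python
# Sizes taken from the slide; the labels are an assumption (see above).
GENOME_BP = 3.3e9  # assumed: whole human genome
EXOME_BP = 50e6    # assumed: exome
PANEL_BP = 0.5e6   # assumed: custom target enrichment panel


def fraction_of_genome(target_bp, genome_bp=GENOME_BP):
    """Fraction of the genome covered by a target of the given size."""
    return target_bp / genome_bp


def fold_reduction(target_bp, genome_bp=GENOME_BP):
    """How many times smaller the target is than the genome."""
    return genome_bp / target_bp


if __name__ == "__main__":
    for name, size in (("exome", EXOME_BP), ("panel", PANEL_BP)):
        print(f"{name}: {fraction_of_genome(size):.4%} of the genome, "
              f"{fold_reduction(size):.0f}-fold smaller")
```

Under these assumptions the same sequencing output is spread over a far smaller region, which is where the higher depth per base comes from.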

Page 5: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Gilissen, Genome Biol 2011

Page 6: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Gilissen, Genome Biol 2011

Exome Seq Custom Panels

Page 7: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

With Custom TE

Finding relevant variants

Spending less

Focus on Your Favourite Genes

Page 9: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 10: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Case study

Page 11: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Case study

Antonio Puerta

Page 12: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Cardiomyopathies

• Targets the most common causative SNPs for:

• ARVC (Arrhythmogenic right ventricular cardiomyopathy)

• Brugada Syndrome

• Long QT

• Hypertrophic cardiomyopathy

The Panel

Page 13: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Cardiomyopathies

• We designed a panel for CMPD

• Platform of choice: Agilent HaloPlex

• Sequencer: Illumina MiSeq (PE 2x150)

• 56 targeted genes (165 regions)

• 500 kb target size

The Panel

Page 14: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Output at a glance

• Sequenced 44 samples so far

• Average coverage: 232X (±36X)

• Reads on target: 99.6%

• Target > 5X: 95.6%
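Metrics of this kind can be derived from per-base depth values over the target regions. The sketch below is only an illustration of the idea, not the pipeline used for the panel; `summarize_depth` and its parameters are hypothetical names, and the input is assumed to be a plain list with one depth value per targeted base.

```python
from statistics import mean


def summarize_depth(depths, min_depth=5):
    """Summarise per-base depth over the target.

    depths: one integer per targeted base (assumed input layout).
    Returns the mean depth and the fraction of bases with depth
    strictly greater than min_depth, mirroring "Target > 5X".
    """
    if not depths:
        raise ValueError("no depth values given")
    above = sum(1 for d in depths if d > min_depth)
    return {
        "mean_depth": mean(depths),
        "fraction_above": above / len(depths),
    }
```

In practice the per-base depths would come from a tool that reads the alignment file together with the target intervals; here that step is left out on purpose.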

Page 15: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

How? Bioinformatic Analysis

Page 16: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

• Target enrichment + Library Preparation

• Sequencing

• Alignment against reference

• Local realignment

• Variant calling

• Variant annotation

• Data mining

Format: SAM

Format: VCF

Page 17: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Sequence alignment

Page 18: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

This is a hard example.

That is another easy example.

Page 19: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

This is a --hard---- example.
||  |||||   | |     |||||||||
That is another easy example.

This is a-- h-ard---- example.
||  |||||   |  |     |||||||||
That is anothe-r easy example.

This is a hard example.------
||  |||||        |   |
That is another easy example.

Gap Cost
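To see how a gap cost changes which alignment looks best, the three alignments of the two example sentences can be scored column by column. This is a minimal sketch: the function name and the scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration, not values given in the slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=-2):
    """Score two already-aligned strings of equal length.

    '-' marks a gap. Every gap column pays the same (linear) cost.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


# The three alignments from the slide, as (top, bottom) pairs.
ALIGNMENTS = [
    ("This is a --hard---- example.", "That is another easy example."),
    ("This is a-- h-ard---- example.", "That is anothe-r easy example."),
    ("This is a hard example.------", "That is another easy example."),
]

if __name__ == "__main__":
    for top, bottom in ALIGNMENTS:
        print(score_alignment(top, bottom), top, "/", bottom)
```

Making the gap penalty heavier or lighter changes the ranking of the candidates, which is why the gap cost is a key parameter of any aligner.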

Page 20: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

To discover more…

• The standard algorithms for sequence alignment are Needleman-Wunsch (global) and Smith-Waterman (local)

• For large sequences the standard is BLAST

• For short reads one of the most popular choices is BWA (based on the Burrows-Wheeler transform, BWT)

• Interesting CUDA-enabled implementations

2003 Thesis
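As an illustration of the first bullet, here is a compact sketch of the Needleman-Wunsch dynamic programming recurrence that returns only the optimal global alignment score. The scoring values are assumptions (match +1, mismatch -1, linear gap -2), and the function name is hypothetical; real aligners add traceback, affine gaps and substitution matrices.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of strings a and b.

    Keeps only two rows of the dynamic programming matrix, so memory
    grows with len(b) rather than len(a) * len(b).
    """
    # Row 0: aligning an empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("This is a hard example.",
                                 "That is another easy example."))
```

The cost is proportional to the product of the two lengths, which is fine for two sentences but not for millions of reads against a genome; that is the reason index-based tools such as BWA exist.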

Page 21: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Sequence alignment

Short

Page 22: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Chromosomes (reference)

Short reads

Page 24: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Challenges

• Millions of reads to be aligned

• Short reads are less likely to be “unique”
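The second challenge can be made concrete by asking what fraction of substrings of a given length occur exactly once in a reference. The sketch below is a toy illustration with a hypothetical function name; it ignores the reverse strand and sequencing errors, and it would not scale to a real genome.

```python
from collections import Counter


def unique_fraction(reference, read_length):
    """Fraction of start positions whose substring of read_length
    occurs exactly once in the reference (forward strand only)."""
    if not 0 < read_length <= len(reference):
        raise ValueError("read_length must be between 1 and len(reference)")
    reads = [reference[i:i + read_length]
             for i in range(len(reference) - read_length + 1)]
    counts = Counter(reads)
    unique = sum(1 for r in reads if counts[r] == 1)
    return unique / len(reads)


if __name__ == "__main__":
    toy_reference = "ACACACACGT"  # made-up repetitive sequence
    for k in (2, 4, 8):
        print(k, unique_fraction(toy_reference, k))
```

On a repetitive sequence, short substrings mostly map to several places, while longer ones become unique; paired-end reads help for the same reason.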

Page 25: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

The SAM/BAM formats

• SAM (Sequence Alignment/Map) is a plain-text format designed for short-read alignments

• It is hard for humans to read, because it was designed for machines

• It has been a major improvement in NGS analyses
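To make the format less intimidating, here is a minimal sketch that splits one SAM alignment line into its 11 mandatory tab-separated fields and decodes three of the FLAG bits. The helper names are hypothetical, optional tags after the eleventh column are ignored, and for real work a dedicated library is the better choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str  # read name
    flag: int   # bitwise flags
    rname: str  # reference sequence name
    pos: int    # 1-based leftmost mapping position
    mapq: int   # mapping quality
    cigar: str  # alignment operations, e.g. "150M"
    rnext: str  # reference name of the mate
    pnext: int  # position of the mate
    tlen: int   # observed template length
    seq: str    # read sequence
    qual: str   # base qualities (ASCII encoded)


def parse_sam_line(line):
    """Parse one alignment line (not a header line starting with '@')."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line has at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def describe_flag(flag):
    """Decode a few of the FLAG bits defined by the SAM specification."""
    return {
        "paired": bool(flag & 0x1),
        "unmapped": bool(flag & 0x4),
        "reverse_strand": bool(flag & 0x10),
    }
```

BAM stores the same records in a compressed binary form, which is why both formats are usually mentioned together.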

Page 26: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

SAM DATA

Page 27: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 28: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Sequence realignment

Page 29: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

• Sequence alignment is (mostly) done one sequence at a time

• At the end we can “rethink” the choices made while aligning, looking at the whole picture

Page 30: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Variants?

Page 31: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Variants?

Page 32: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Errors!

Page 33: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 34: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 35: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 36: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

• Once the alignment is “cleaned”, variant calling becomes a little bit easier.

• Several aspects are involved, much more than merely “counting differences”

• These aspects are complex, interesting… …but we are not talking about them today!
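For contrast, this is roughly what mere “counting differences” at a single position would look like. It is a deliberately naive sketch with hypothetical names and arbitrary thresholds; real variant callers also model base and mapping qualities, strand, genotype likelihoods and more.

```python
from collections import Counter


def naive_call(ref_base, pileup_bases, min_depth=5, min_alt_fraction=0.2):
    """Return (alt_base, fraction) if a non-reference base is frequent
    enough among the bases observed at one position, else None.

    pileup_bases: string with one observed base per overlapping read.
    The two thresholds are arbitrary illustration values.
    """
    depth = len(pileup_bases)
    if depth < min_depth:
        return None  # too few reads to say anything
    counts = Counter(b for b in pileup_bases.upper() if b != ref_base.upper())
    if not counts:
        return None  # every read agrees with the reference
    alt_base, alt_count = counts.most_common(1)[0]
    fraction = alt_count / depth
    return (alt_base, fraction) if fraction >= min_alt_fraction else None
```

With this rule a handful of sequencing errors at low depth is indistinguishable from a real heterozygous variant, which is exactly the kind of problem the more complex models address.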

Page 37: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

The VCF format
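A VCF data line has eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by per-sample genotype columns. The sketch below parses only the fixed columns; the function name is hypothetical and header lines (starting with '#') are assumed to be skipped by the caller.

```python
def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 columns")
    chrom, pos, variant_id, ref, alt, qual, filt, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags.
    info_dict = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_dict[key] = value if sep else True

    return {
        "chrom": chrom,
        "pos": int(pos),        # 1-based position
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # several alternate alleles are allowed
        "qual": None if qual == "." else float(qual),
        "filter": filt,
        "info": info_dict,
    }
```

Annotation tools typically add their results as extra keys in the INFO column, so a parser like this is enough to start exploring an annotated file.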

Page 38: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Annotation

Page 39: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 40: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Chromosomes (reference)

Short reads

Genes/Transcripts

G>G Y>. C>W

Amino acid changes

Functional annotation

Disease database

Effect predictors

Literature links

Page 41: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

VEP: Variant Effect Predictor

• genes and transcripts affected by the variants

• location of the variants (e.g. upstream, in coding sequence, in non-coding RNA, in regulatory regions)

• consequence of your variants on the protein sequence (e.g. stop gained, missense, stop lost, frameshift)

• known variants that match yours, and associated minor allele frequencies from the 1000 Genomes Project

• SIFT and PolyPhen scores for changes to protein sequence
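The "consequence on the protein sequence" for a single-base substitution can be sketched by translating the reference and the altered codon with the standard genetic code. This is not how VEP is implemented; the names below are hypothetical, and frameshifts, splice effects and transcript choice are left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a codon substitution by its effect on the amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop gained"
    if ref_aa == "*":
        return "stop lost"
    return "missense"


if __name__ == "__main__":
    # Codons chosen to reproduce G>G, Y>stop and C>W amino acid changes.
    for ref, alt in (("GGT", "GGC"), ("TAT", "TAA"), ("TGT", "TGG")):
        print(ref, alt, classify_codon_change(ref, alt))
```

The three example codon pairs correspond to a synonymous change, a stop gained and a missense change respectively.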

Page 42: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 43: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

ANNOVAR

ANNOVAR is an efficient tool to functionally annotate genetic variants.

• Gene-based annotation: identify whether SNPs or CNVs cause protein coding changes and the amino acids that are affected.

• Region-based annotations: identify variants in specific genomic regions, for example, conserved regions among 44 species, predicted transcription factor binding sites,…

• Filter-based annotation: identify variants that are reported in dbSNP, or identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project, or identify subset of non-synonymous SNPs with SIFT score>0.05, …
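The idea behind filter-based annotation can be sketched as a lookup of each variant in a table of known allele frequencies. This is not ANNOVAR's interface: the function, the dictionary layout and the key format are assumptions made up for the illustration.

```python
def split_by_frequency(variants, known_maf, threshold=0.01):
    """Split variants into common (MAF > threshold) and the rest.

    variants:  list of dicts with 'chrom', 'pos', 'ref', 'alt' keys
               (hypothetical layout).
    known_maf: dict mapping (chrom, pos, ref, alt) to a minor allele
               frequency, e.g. built from a population database.
    Variants absent from known_maf are kept in the second list with
    maf set to None.
    """
    common, other = [], []
    for v in variants:
        key = (v["chrom"], v["pos"], v["ref"], v["alt"])
        maf = known_maf.get(key)
        annotated = {**v, "maf": maf}
        if maf is not None and maf > threshold:
            common.append(annotated)
        else:
            other.append(annotated)
    return common, other
```

In a disease panel the common variants are usually set aside first, so that attention goes to rare or previously unseen ones.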

Page 44: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 45: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 46: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Can open: ALIGNMENTS (BAM), ANNOTATIONS (BED), VARIANTS (VCF)

Page 47: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics
Page 48: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Any questions?

Page 49: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Summarizing!

• Target enrichment: many individuals sequenced on genes of interest

• SAM/BAM formats to store alignments

• The IGV program to visualise tracks (including alignments)

• The VCF format to store genomic variations

• Annotation programs add information (affected genes, consequences, known variants) to a flat file

Page 50: Target Enrichment with NGS: Cardiomyopathy as a case study - BMR Genomics

Acknowledgments: BMR Genomics

• CEO: Barbara Simionati

• NGS Team Leader: Giorgio Malacrida

• Target Enrichment specialist: Ilena Li Mura

• Variant annotation specialist: Ivano Zara

…and everybody else there, making the whole team special.