comprehensive detection of germline and ... - bionano genomics · bionano genomics, san diego,...



[Figure: Saphyr workflow. (1) Extraction of long DNA molecules from blood, cells, tissue, or microbes; (2) labeling of DNA at specific sequence motifs; (3) linearization of DNA in NanoChannel arrays on the Saphyr Chip (free DNA in solution forms a Gaussian coil, DNA in a microchannel is partially elongated, DNA in a nanochannel is linearized); (4) automated imaging of single molecules in NanoChannel arrays by Saphyr; (5) detection of molecules and labels in the images; (6) assembly of optical maps by Bionano Access software. Axis: Position (kbp).]

Methods

(1) Long DNA molecules are extracted and (2) labeled with Bionano reagents, which incorporate fluorophores at a specific sequence motif throughout the genome. (3) The labeled genomic DNA is then linearized in NanoChannel arrays on the Saphyr Chip. (4) Single molecules are imaged by Saphyr and digitized. (5) Each molecule is uniquely identifiable by its distinct distribution of sequence-motif labels, (6) which allows the molecules to be assembled by pairwise alignment into de novo genome maps.
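The labeling and identification steps can be illustrated with a small in-silico sketch. It assumes a hypothetical, non-palindromic recognition motif (`GCTCTTC` in the usage example; the actual labeling chemistry and motif are not specified in this poster), finds where a label would fall on either strand of a reference sequence, and reports the spacing between consecutive labels, which is the kind of pattern that makes a molecule identifiable. All function names are illustrative and are not part of Bionano software.

```python
import re


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def label_positions(sequence: str, motif: str) -> list[int]:
    """0-based start positions of the motif on either strand (hypothetical motif)."""
    sequence = sequence.upper()
    motif = motif.upper()
    hits = set()
    for pattern in {motif, reverse_complement(motif)}:
        # The lookahead lets re.finditer report overlapping occurrences.
        for match in re.finditer(f"(?=({re.escape(pattern)}))", sequence):
            hits.add(match.start())
    return sorted(hits)


def interval_pattern(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the 'barcode' used to recognize a molecule."""
    return [b - a for a, b in zip(positions, positions[1:])]
```

For example, `label_positions("AAGCTCTTCAAAAGAAGAGCAAA", "GCTCTTC")` returns `[2, 13]`: one forward-strand site and one reverse-strand site, 11 bp apart. Real label patterns span hundreds of kilobases and are matched with error-tolerant alignment rather than exact comparison.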


Comprehensive Detection of Germline and Somatic Structural Mutation in Cancer Genomes by Bionano Genomics Optical Mapping

Conclusions

We demonstrate that the Saphyr system can accurately detect the hallmark genetic mutations of cancer samples. These include large rearrangements ranging from translocations and intrachromosomal fusions to copy number alterations. Researchers can perform mapping experiments to uncover somatic variants by comparing against our control sample database and a matched non-tumor sample. Furthermore, our molecule mapping approach enables us to identify mutations present at low allele fractions. Our results indicate that Saphyr can capture a broad spectrum of functionally important variation and can provide a straightforward solution for cancer studies.

Acknowledgements

We would like to thank our collaborators Wigard Kloosterman, Jose Espejo Valle-Inclan, and Edwin Cuppen for providing the samples, for input on the study design, and for advice on the analysis.


Abstract

In cancer genetics, the ability to identify constitutional and low-allele-fraction structural variants (SVs) is crucial. Conventional karyotyping and cytogenetic approaches are labor-intensive. Microarrays and short-read sequencing cannot detect variants in segmental duplications and repeats, often miss balanced variants, and have trouble finding low-frequency mutations.

We describe the use of Bionano Genomics’ Saphyr platform to comprehensively identify SVs in cancer genomes. DNA molecules longer than 100 kbp are extracted, labeled at specific sequence motifs, and linearized in NanoChannel arrays for visualization. The molecule images are digitized and assembled de novo into Bionano genome maps that are megabases long. Somatic mutations can be identified with the variant annotation pipeline, which compares the cancer sample’s assembly calls against more than 600,000 SVs in Bionano’s control sample SV database and against the variants called in a matched control sample. In addition, two new Bionano pipelines use these long molecules to identify further somatic SVs: the copy number variation pipeline and the molecule mapping pipeline.

By examining the depth of molecule coverage in alignments to the public human reference genome, the copy number variation pipeline can identify amplifications and deletions that are megabases long. Similarly, clusters of split-molecule alignments against the reference can reliably reveal translocations and other rearrangements.
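The coverage-depth idea can be sketched as follows. This is not the Bionano copy number variation pipeline: it assumes per-bin molecule coverage along one chromosome has already been computed, normalizes each bin to the median of the supplied bins, and reports runs of at least `min_bins` consecutive bins above or below illustrative thresholds (`gain_ratio`, `loss_ratio`). All names and threshold values are hypothetical.

```python
from statistics import median


def call_cnv_segments(depths, bin_size, gain_ratio=1.4, loss_ratio=0.6, min_bins=3):
    """Return (start_bp, end_bp, 'gain' | 'loss') for runs of abnormal coverage.

    depths: per-bin molecule coverage along one chromosome.
    bin_size: bin width in bp. Thresholds and min_bins are illustrative only.
    """
    if not depths:
        return []
    baseline = median(depths)

    # Label each bin as gain, loss, or normal (None) relative to the baseline.
    states = []
    for depth in depths:
        ratio = depth / baseline if baseline > 0 else 0.0
        if ratio >= gain_ratio:
            states.append("gain")
        elif ratio <= loss_ratio:
            states.append("loss")
        else:
            states.append(None)

    # Merge consecutive bins with the same state into segments.
    segments = []
    run_start = 0
    for i in range(1, len(states) + 1):
        # Close the current run when the state changes or the chromosome ends.
        if i == len(states) or states[i] != states[run_start]:
            state = states[run_start]
            if state is not None and i - run_start >= min_bins:
                segments.append((run_start * bin_size, i * bin_size, state))
            run_start = i
    return segments
```

A production caller would also model ploidy, noise, and reference gaps; fixed ratio thresholds are used here only to make the elevation/depletion principle concrete.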

We applied this suite of discovery tools to construct a comprehensive map of SVs in COLO829, a well-studied melanoma cell line. We collected data from the tumor line and its matched blood cell line, constructed contiguous assemblies (N50 > 50 Mbp), and called over 6,000 SVs in each genome. By comparing the tumor with the blood control, we classified 51 of the tumor's SVs as somatic. Molecule mapping identified additional mutations. The copy number profile captured the BRAF amplification on chr7, as well as other chromosome-arm gains and losses. In conclusion, Saphyr provides a single comprehensive platform that can discover a broad range of functionally relevant SVs that have traditionally been refractory to detection, furthering our understanding of cancer etiology.

Background

Generating high-quality finished genomes, with accurate identification of structural variation and high completeness (minimal gaps), remains challenging with short-read sequencing technologies alone. The Saphyr™ system directly visualizes long DNA molecules in their native state, bypassing the statistical inference needed to align paired-end reads with an uncertain insert-size distribution. These long labeled molecules are assembled de novo into physical maps spanning the entire diploid genome. The resulting maps can be used to correctly position and orient sequence contigs into chromosome-scale scaffolds and to detect a wide range of homozygous and heterozygous structural variation with very high efficiency.

A.W.C. Pang, J. Lee, T. Anantharaman, E. Lam, A. Hastie, H. Cao, M. Borodkin

Bionano Genomics, San Diego, California, United States of America

Metric                              COLO 829    COLO 829-BL
Assembly size (haplotype-aware)     6.29 Gbp    6.34 Gbp
Genome map N50                      52.7 Mbp    57.3 Mbp
SV calls against hg19:
  Insertions                        4,089       4,218
  Deletions                         1,736       1,771
  Duplications                      181         164
  Inversion breakpoints             308         305
  Intrachromosomal translocations   0           0
  Interchromosomal translocations   2           0
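As a quick arithmetic check on the assembly table, summing each sample's SV calls against hg19 (counting each inversion breakpoint as one call, as the table does) gives totals consistent with the "over 6,000 SVs in each genome" stated in the abstract. The values are copied from the table; the dictionary layout is only for illustration.

```python
# SV calls against hg19, copied from the assembly statistics table.
sv_calls = {
    "COLO 829": {"insertions": 4089, "deletions": 1736, "duplications": 181,
                 "inversion_breakpoints": 308, "intra_translocations": 0,
                 "inter_translocations": 2},
    "COLO 829-BL": {"insertions": 4218, "deletions": 1771, "duplications": 164,
                    "inversion_breakpoints": 305, "intra_translocations": 0,
                    "inter_translocations": 0},
}

# Total calls per sample across all SV types.
totals = {sample: sum(counts.values()) for sample, counts in sv_calls.items()}
print(totals)  # {'COLO 829': 6316, 'COLO 829-BL': 6458}
```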

Variant annotation pipeline (input: sample SV call set; output: annotated sample SV call set)

Step 1

• Compare with the Bionano SV database (~180 normal/healthy individuals)

• Examine coverage and chimera quality scores

• Check overlap with gene annotation

Step 2

• Compare the sample’s SV calls with the matched control’s SV calls

Step 3

• Align the blood control’s molecules to the sample’s assembly

• Does the blood control contain molecule support for the sample’s SV?

  • Yes: the SV was falsely classified as somatic because of a false-negative call in the blood control.

  • No: the SV is sample-unique. Re-align the sample’s molecules to the sample’s assembly.

• Does the sample contain molecule support for the sample’s SV?

  • Yes: true SV.

  • No: false-positive SV.
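The decision logic of Steps 2 and 3 can be written as a small function. This is a sketch under stated assumptions, not Bionano's implementation: the `SVCall` record, its field names, and the `min_support` threshold are hypothetical, and the Step 1 checks (database comparison, quality scores, gene overlap) are treated as annotations that do not change the classification here.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    """Hypothetical per-SV summary used by this classification sketch."""
    sv_id: str
    called_in_control: bool         # Step 2: the matched control has the same SV call
    control_molecule_support: int   # Step 3: control molecules supporting the SV on the sample assembly
    sample_molecule_support: int    # Step 3: sample molecules supporting the SV on the sample assembly


def classify(call: SVCall, min_support: int = 3) -> str:
    """Return a label following the Step 2 and Step 3 branches of the flowchart."""
    # Step 2: an SV also called in the matched control is not somatic.
    if call.called_in_control:
        return "shared with control"
    # Step 3, first question: blood-control molecules support the SV, so the
    # control call was a false negative and the SV is not somatic.
    if call.control_molecule_support >= min_support:
        return "false somatic (false-negative call in control)"
    # Sample-unique SV: confirm it with the sample's own re-aligned molecules.
    if call.sample_molecule_support >= min_support:
        return "somatic (true sample-unique SV)"
    return "false positive"
```

Ordering the checks this way mirrors the flowchart: the cheaper call-set comparison runs first, and molecule re-alignment is only consulted for SVs that remain sample-unique.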

Somatic SV type                     COLO 829
Insertions                          5
Deletions                           28
Duplications                        16
Inversion breakpoints               0
Intrachromosomal translocations     0
Interchromosomal translocations     2
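The somatic counts can be checked the same way: the categories sum to the 51 somatic SVs reported in the abstract, which is well under 1% of the tumor's SV calls. The tumor total used here is the sum of the COLO 829 column of the assembly statistics table.

```python
# Somatic SV counts for COLO 829, copied from the table above.
somatic = {
    "insertions": 5,
    "deletions": 28,
    "duplications": 16,
    "inversion_breakpoints": 0,
    "intrachromosomal_translocations": 0,
    "interchromosomal_translocations": 2,
}
# Total COLO 829 SV calls against hg19 (sum of the assembly statistics table column).
TUMOR_TOTAL_CALLS = 4089 + 1736 + 181 + 308 + 0 + 2

total_somatic = sum(somatic.values())
percent_somatic = round(100 * total_somatic / TUMOR_TOTAL_CALLS, 2)
print(total_somatic, percent_somatic)  # 51 0.81
```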

[Figure panels: (3) COLO 829 map 1871 aligned to hg19 chr10 and hg19 chr18, showing the t(10;18) translocation (gene labels RAB31, ITIH5, TXNDC2). (6) hg19 chr10:89,593,241-89,751,246 with gene labels CFL1P1, PTEN, and AK130076; a 12.0 kb deletion is called on COLO 829 maps 241 and 242, and COLO 829-BL maps 71 and 72 cover the same region. Chromosome 7 copy number variation plot showing the BRAF amplification. Additional gene labels: TMEM185A, MAGEA11.]

Case study: Identifying SVs in metastatic melanoma fibroblast and matched lymphoblastoid cell lines

De novo assembly and SV detection against hg19

1) Assembly and SV statistics for the two samples.

2) A circos plot of the large-scale rearrangements uniquely detected in COLO829.

3) An assembled map capturing a t(10;18) translocation.

[Circos plot legend: Insertion, Deletion, Inversion, Duplication, Translocation, CNV.]

4) Variant annotation pipeline designed to identify somatic variants for cancer studies.

5) Number of somatic variants detected in COLO 829.

6) A 12 kb deletion interrupting the PTEN gene, identified in the tumor but not in the blood cell line.

Detection of low-allele-fraction mutations by direct molecule alignment to hg19

[Figure: COLO 829 local assembly maps aligned to hg19 chrX (gene labels TLR8-AS1, TLR7, PRPS2, TMSB4X) and hg19 chr21 (gene labels TPTE, BAGE).]

7) A t(21;X) translocation uniquely detected by molecule alignment to hg19. We called SVs directly from molecules and then constructed a local assembly to validate each call. The red arrows indicate the location of the breakpoint on the local assembly map.
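Calling SVs directly from molecules can be illustrated by clustering split-molecule alignments. The sketch below is not the Bionano molecule mapping pipeline: it assumes each split alignment has been reduced to a hypothetical `SplitAlignment` record with two reference ends, groups records whose ends fall in the same fixed-size windows, and keeps clusters supported by at least `min_molecules` distinct molecules. Because support is counted per molecule rather than requiring the variant to assemble, a modest threshold can flag events carried by only a minority of molecules. The window size and threshold are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitAlignment:
    """One molecule whose two parts align to different reference loci (hypothetical record)."""
    molecule_id: str
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def cluster_breakpoints(alignments, window=50_000, min_molecules=3):
    """Group split alignments whose ends fall in the same pair of windows.

    Returns {(chrom1, bin1, chrom2, bin2): sorted molecule ids} for clusters
    supported by at least min_molecules distinct molecules.
    """
    clusters = defaultdict(set)
    for aln in alignments:
        end_a = (aln.chrom_a, aln.pos_a // window)
        end_b = (aln.chrom_b, aln.pos_b // window)
        # Order the two ends so A->B and B->A evidence land in one cluster.
        key = end_a + end_b if end_a <= end_b else end_b + end_a
        clusters[key].add(aln.molecule_id)
    return {key: sorted(ids) for key, ids in clusters.items() if len(ids) >= min_molecules}
```

Fixed windows can split a true cluster that straddles a window boundary; a more careful implementation would merge nearby ends by distance. Candidate clusters would then be validated, for example by local assembly as described in the caption.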

[Figure: molecules and the COLO 829 local assembly map aligned to hg19 chrX.]

8) A 75.8 kb somatic inversion on chrX that was captured only by the molecule alignment and local assembly approach.

[Figure: chromosome 3 copy number variation; x-axis: Position (bp). Labeled events: 24 Mb deletion, 170 Mb duplication, 2.4 kb amplification.]

9) Copy number variation can be captured as elevated or depleted coverage in molecule alignments to hg19.
