
Single-molecule analysis reveals widespread structural variation in multiple myeloma

Aditya Gupta (a,b,c), Michael Place (b,c), Steven Goldstein (b,c), Deepayan Sarkar (d), Shiguo Zhou (b,c), Konstantinos Potamousis (b,c), Jaehyup Kim (c,e), Claire Flanagan (c,e), Yang Li (f), Michael A. Newton (c,g), Natalie S. Callander (c,e), Peiman Hematti (c,e), Emery H. Bresnick (c,h), Jian Ma (f), Fotis Asimakopoulos (c,e), and David C. Schwartz (a,b,c,1)

(a) Biophysics Graduate Program, University of Wisconsin–Madison, Madison, WI 53706; (b) Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics and Biotechnology Center, University of Wisconsin–Madison, Madison, WI 53706; (c) University of Wisconsin Carbone Cancer Center, Madison, WI 53705; (d) Indian Statistical Institute, New Delhi 110016, India; (e) Department of Medicine, School of Medicine and Public Health, University of Wisconsin–Madison, Madison, WI 53705; (f) Institute for Genomic Biology, University of Illinois at Urbana–Champaign, Urbana, IL 61801; (g) Department of Statistics, University of Wisconsin–Madison, Madison, WI 53706; and (h) Department of Cell and Regenerative Biology, University of Wisconsin–Madison Blood Research Program, Madison, WI 53705

Edited* by David E. Housman, Massachusetts Institute of Technology, Cambridge, MA, and approved May 6, 2015 (received for review September 26, 2014)

Multiple myeloma (MM), a malignancy of plasma cells, is characterized by widespread genomic heterogeneity and, consequently, differences in disease progression and drug response. Although recent large-scale sequencing studies have greatly improved our understanding of MM genomes, our knowledge of genomic structural variation in MM remains limited by the constraints of commonly used sequencing approaches. In this study, we apply optical mapping, a single-molecule, whole-genome analysis system, to discover new structural variants in a primary MM genome. Through this analysis, we identified and characterized widespread structural variation in this tumor genome. Additionally, we describe our efforts toward comprehensive characterization of genome structure and variation by integrating our findings from optical mapping with those from DNA sequencing-based genomic analysis. Finally, by studying this MM genome at two time points during tumor progression, we demonstrate an increase in mutational burden with tumor progression at all length scales of variation.

structural variation | copy number | multiple myeloma | optical mapping | DNA sequencing

Multiple myeloma (MM) is the malignancy of B lymphocytes that terminally differentiate into long-lived, antibody-producing plasma cells. Like other cancers, it is characterized by many genomic aberrations, including single nucleotide variants (SNVs) (1, 2), translocations (most notably involving the Ig heavy chain locus on chr14), and copy number changes, including aneuploidy (3). Recent large-scale sequencing studies have described widespread inter- and intratumor genomic heterogeneity (1, 2), clonal evolution (4, 5), and clonal tides (4) in MM. However, most of this work focuses on point mutations and large-scale copy number changes. Although the role of structural variation in normal human genome polymorphism (6, 7) and in disease (8) is widely appreciated, a comprehensive analysis of structural variation in MM has yet to be reported.

The therapeutic landscape for MM has been transformed over the past decade by the introduction of proteasome inhibitors (bortezomib, carfilzomib) and thalidomide analogs (9, 10), and patient survival has improved substantially as a result (11). However, MM remains incurable: almost all patients with symptomatic MM die of their disease because acquired drug resistance limits the efficacy of current therapies and shortens overall survival (12). Understanding how contemporary treatments shape genomic selection in MM may therefore provide fundamental insights for preventing or circumventing drug resistance, through judicious use of existing therapies, rational design of novel agents, or both.

To address these issues, we used optical mapping (7, 13–19) and DNA sequencing to comprehensively characterize structural variation in a primary MM genome at two stages of tumor progression and drug response. The two stages represent a sensitive relapse (MM-S; the patient responded to subsequent treatments) and a subsequent refractory relapse (MM-R; no response to any treatment) (SI Materials and Methods and Fig. 1). Optical mapping is a single-molecule system that constructs large datasets of ordered restriction maps (Rmaps; each Rmap is the restriction map of a single genomic DNA molecule) (Fig. S1). These datasets are submitted to a computational pipeline, run on a computing cluster, for genome assembly (15) and discovery of structural variants (7, 14, 16, 19). The final assembly presents a relatively unbiased, long-range view of the genome, free of amplification and cloning artifacts, that supports the identification of structural variants and large-scale copy number changes.

Optical mapping has previously been used to uncover structural variation in normal (7), disease-risk (17), and cancerous (18) human genomes. Here, we connect long-range structural variation findings from optical mapping with results from whole-genome DNA sequencing data analysis (Fig. 1). This integrated analysis enabled us to comprehensively identify somatic variation in these tumor samples across all length scales, including structural, copy number, and single nucleotide variation. Additionally, by analyzing these tumor samples at two time points during tumor progression, we observed an increase in mutational burden with tumor progression.
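Because Rmaps and the reference genome are compared as ordered lists of restriction fragment lengths, one simple data structure can represent both. The minimal sketch below (all names hypothetical) stores a map as fragment lengths and derives an in silico map from sequence by cutting at BamHI sites (GGATCC, cut after the first G), the enzyme named in the Fig. 3 legend. It is an illustration only: real Rmaps are sized from fluorescence images and carry sizing error, missing cuts, and false cuts, none of which are modeled here.

```python
from dataclasses import dataclass
from typing import List

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence; the enzyme cuts between G and GATCC


@dataclass
class RestrictionMap:
    """Ordered restriction fragment lengths (bp) along one molecule or reference region."""
    name: str
    fragments: List[int]

    @property
    def total_length(self) -> int:
        return sum(self.fragments)


def in_silico_digest(name: str, seq: str, site: str = BAMHI_SITE,
                     cut_offset: int = 1) -> RestrictionMap:
    """Cut `seq` at every occurrence of `site` (cut = match start + cut_offset)
    and return the ordered fragment lengths as a RestrictionMap."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    fragments = [right - left for left, right in zip(bounds, bounds[1:]) if right > left]
    return RestrictionMap(name, fragments)


# Toy 26 bp "reference" containing two BamHI sites.
ref = in_silico_digest("toy_ref", "AAAA" + "GGATCC" + "TTTTTTTT" + "GGATCC" + "CC")
print(ref.fragments, ref.total_length)  # [5, 14, 7] 26
```

An observed single-molecule Rmap would be stored the same way, for example `RestrictionMap("molecule_1", [...])`, with lengths estimated from the imaged fragments instead of computed from sequence.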

Significance

In the last several years, we have seen significant progress toward personalized cancer genomics and therapy. Although genomic variation is now routinely detected and interpreted at the single-base-pair and chromosomal levels, comprehensive analysis of genome variation, particularly structural variation, remains a challenge. We present an integrated approach using optical mapping (a single-molecule, whole-genome analysis system) and DNA sequencing to comprehensively identify genomic structural variation in sequential samples from a multiple myeloma patient. Through our analysis, we have identified widespread structural variation and an increase in mutational burden with tumor progression. Our findings highlight the need to routinely incorporate structural variation analysis at many length scales to understand cancer genomes more comprehensively.

Author contributions: A.G., F.A., and D.C.S. designed research; A.G., M.P., and S.G. performed research; A.G., M.P., S.G., D.S., S.Z., K.P., J.K., M.A.N., N.S.C., P.H., F.A., and D.C.S. contributed new reagents/analytic tools; A.G., M.P., S.G., C.F., Y.L., E.H.B., J.M., F.A., and D.C.S. analyzed data; and A.G., N.S.C., F.A., and D.C.S. wrote the paper.

The authors declare no conflict of interest.

*This Direct Submission article had a prearranged editor.

Freely available online through the PNAS open access option.

Data deposition: The sequences reported in this paper have been deposited in the Sequence Read Archive (SRA), www.ncbi.nlm.nih.gov/sra (accession no. SRP058274).

1 To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1418577112/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1418577112 PNAS | June 23, 2015 | vol. 112 | no. 25 | 7689–7694

APPLIED BIOLOGICAL SCIENCES

Downloaded by guest on February 14, 2021


Results

Rmap Alignments Reveal Widespread Copy Number Changes in the MM Genome. In any region of the genome, the total number of aligned Rmaps (depth of coverage) serves as an indicator of copy number. For somatic copy number analysis using optical mapping data, we compared the depth of coverage of both tumor samples (MM-S and MM-R) with that of a reference dataset (normal) using a hidden Markov model-based coverage analysis algorithm (18, 19), which partitioned the tumor genomes into low, normal, and high copy number states. This analysis is analogous to traditional hybridization- or sequencing-based copy number analysis (20), except that alignments of Rmaps (300–2,500 kb in length) are scored in place of probes or short sequence reads. Comparing MM-R with the paired normal sample, we found widespread genomic gains and losses that spanned close to one-third of the reference genome and were generally associated with chromosomal ends (Fig. 2, ring D). A comparison of optical mapping-based copy number analysis with DNA sequencing-based copy number analysis (21) revealed that, for events greater than 500 kb, 97% of the genome was assigned a concordant copy number state by both methods (Fig. 2, rings D and E). Finally, copy number changes spanning ∼172 Mb, relative to the reference genome, were observed only in the MM-R sample and not in the MM-S sample, indicating an increase in copy number changes with tumor progression (Fig. 2, sectors highlighted with a yellow background).
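The coverage-based segmentation described above can be illustrated with a generic three-state hidden Markov model. This is a minimal sketch under stated assumptions, not the algorithm of refs. 18 and 19: it assumes per-window counts of aligned Rmaps, Gaussian emissions on library-size-normalized log2(tumor/normal) ratios, and hand-picked means, standard deviation, transition probability, and pseudocount (all hypothetical), and it decodes the most likely low/normal/high path with the Viterbi algorithm.

```python
import math
from typing import List, Sequence

STATES = ("low", "normal", "high")
# Hypothetical emission means for log2(tumor/normal) coverage per window:
# one copy lost (log2 1/2), two copies (0), one copy gained (log2 3/2).
MEANS = (-1.0, 0.0, math.log2(1.5))
SD = 0.3     # hypothetical emission standard deviation
STAY = 0.99  # hypothetical probability of remaining in the same state


def coverage_log_ratios(tumor: Sequence[int], normal: Sequence[int]) -> List[float]:
    """Library-size-normalized log2 ratios of aligned-Rmap counts per window
    (a 0.5 pseudocount avoids log of zero)."""
    t_total, n_total = sum(tumor), sum(normal)
    return [math.log2(((t + 0.5) / t_total) / ((n + 0.5) / n_total))
            for t, n in zip(tumor, normal)]


def log_gauss(x: float, mu: float, sd: float) -> float:
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def viterbi(ratios: Sequence[float]) -> List[str]:
    """Most likely low/normal/high state path for a sequence of log2 ratios."""
    if not ratios:
        return []
    k = len(STATES)
    log_stay, log_move = math.log(STAY), math.log((1 - STAY) / (k - 1))
    trans = [[log_stay if i == j else log_move for j in range(k)] for i in range(k)]
    # Uniform start distribution over the three states.
    score = [math.log(1 / k) + log_gauss(ratios[0], MEANS[s], SD) for s in range(k)]
    backpointers = []
    for x in ratios[1:]:
        prev = score
        best = [max(range(k), key=lambda i: prev[i] + trans[i][j]) for j in range(k)]
        score = [prev[best[j]] + trans[best[j]][j] + log_gauss(x, MEANS[j], SD)
                 for j in range(k)]
        backpointers.append(best)
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[s])
    path = [state]
    for best in reversed(backpointers):
        state = best[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]


print(viterbi([0.0] * 5 + [-1.0] * 5 + [0.6] * 5))
```

The sticky transition matrix is what turns noisy per-window ratios into contiguous segments: a state change is accepted only when several consecutive windows support it.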

[Fig. 1 diagram: NORMAL, MM-S (Tumor 1), and MM-R (Tumor 2) samples analyzed by optical mapping (structural variants: deletions; insertions; translocations, inversions, and others; copy number analysis) and by DNA sequencing (SNVs/indels: GATK/Strelka; read pair: BreakDancer; read depth: CNVnator/FREEC; split read: Pindel).]

Fig. 1. Overview of the cancer genome analysis pipeline comprising optical mapping and DNA sequencing data. Red text indicates that the method identifies somatic variation directly by comparing the tumor with the normal sample. Colored outlines highlight different variation types analyzed by integrating data from both approaches; for example, deletions from optical mapping (blue outline) were analyzed along with deletions from BreakDancer, Pindel, and CNVnator.

[Fig. 2 image: Circos plot with reference chromosomes 1–22 on the outer ring (positions in Mb) and inner rings A–F.]

Fig. 2. Circos plot of genomic variation in MM. Tracks are as follows. The outer ring represents reference chromosomes 1 through 22 in clockwise orientation (chr8 reversed; chrs X and Y excluded for clarity; numbers on the ring represent chromosomal position in Mb). Ring A: Green bars show somatic deletions shared between MM-S and MM-R samples; brown bars show somatic deletions additionally acquired in the MM-R sample. Ring B: Green dots show nonsynonymous SNVs shared between MM-S and MM-R samples; brown dots show nonsynonymous SNVs additionally acquired in the MM-R sample. Ring C: Density plot of loci where the normal sample is heterozygous whereas the MM-R tumor sample shows loss of heterozygosity; light green bars highlight genomic regions with copy number neutral loss of heterozygosity. Ring D: Somatic copy number (CN) changes identified from Rmap coverage analysis of MM-R vs. normal sample; low CN (blue), normal CN (gray), and high CN (red) states are shown. Ring E: Somatic CN changes from DNA sequencing data analysis of MM-R vs. normal sample; CN loss (blue), normal CN (gray), and CN gain (red) regions are shown. Ring F: Links represent structural rearrangements that coincide with, and lead to, CN changes identified from Rmap assembly and/or DNA sequencing data. Gray links connect translocation breakpoints; the red link represents the t(11;14) translocation; blue links represent deletions; green links represent inversions. Yellow background highlights (extending from the center to beyond the circle) mark CN changes that are observed only in the MM-R sample, not in the MM-S sample.
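The 97% agreement between optical mapping and sequencing copy number calls (rings D and E) can be read as a base-weighted concordance. A minimal sketch, assuming both methods report half-open segments labeled low, normal, or high on the same coordinates, and assuming that restricting to "events greater than 500 kb" means shorter non-normal calls are relabeled normal before comparison; the text does not spell out the exact procedure, and all names here are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int, str]  # half-open [start, end) in bp, state "low"/"normal"/"high"


def mask_small_events(segs: List[Segment], min_len: int) -> List[Segment]:
    """Relabel non-normal segments no longer than min_len as 'normal'."""
    return [(s, e, st if st == "normal" or e - s > min_len else "normal")
            for s, e, st in segs]


def state_at(segs: List[Segment], pos: int) -> Optional[str]:
    for s, e, st in segs:
        if s <= pos < e:
            return st
    return None


def cn_concordance(a: List[Segment], b: List[Segment], min_len: int = 500_000) -> float:
    """Fraction of bases covered by both segmentations that carry the same state."""
    a, b = mask_small_events(a, min_len), mask_small_events(b, min_len)
    # Sweep over the union of all segment boundaries.
    points = sorted({p for s, e, _ in a + b for p in (s, e)})
    agree = total = 0
    for left, right in zip(points, points[1:]):
        sa, sb = state_at(a, left), state_at(b, left)
        if sa is None or sb is None:
            continue  # position not covered by one of the methods
        total += right - left
        if sa == sb:
            agree += right - left
    return agree / total if total else float("nan")


om_calls = [(0, 1_000_000, "normal"), (1_000_000, 3_000_000, "low"),
            (3_000_000, 4_000_000, "normal")]
seq_calls = [(0, 1_200_000, "normal"), (1_200_000, 3_000_000, "low"),
             (3_000_000, 3_300_000, "high"), (3_300_000, 4_000_000, "normal")]
print(cn_concordance(om_calls, seq_calls))  # 0.95
```

In this toy input the 300 kb "high" call is masked, so the only disagreement is the 200 kb where the two methods place the start of the loss differently.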


Rmap Assemblies Reconstruct the MM Genome and Characterize Structural Rearrangements. Consensus optical maps, constructed by iterative assembly of Rmaps, provide a genome-wide scaffold with nearly telomere-to-telomere information about the genome under study. We analyzed chimeric consensus maps, which form as a result of interchromosomal rearrangements or of intrachromosomal rearrangements whose breakpoints are separated by at least 300 kb, and found that the locations of many chimeric consensus maps coincided with copy number breakpoints (Fig. 2, rings D and E). By integrating this analysis with DNA sequencing-based structural variation analysis, we characterized the genomic rearrangements at 31 of the 37 copy number breakpoints observed in the MM-R genome to base pair resolution. These rearrangements are summarized in Table 1 and include unbalanced translocations, interstitial deletions, chromosomal truncations, tandem duplications, and more complex rearrangements (Fig. 2, ring F). By combining individual events, we pieced together the structure of many chromosomes, effectively generating a karyotypic representation of these chromosomes. Here, we describe the structure of two chromosomes, chr2 and chr5, in more detail.

Detailed structure of chr2. Chr2 presents with two regions of copy number loss: a 23 Mb region spanning 138.85 to 162.03 Mb and a 35 Mb region spanning 208.07 to 243.19 Mb (Fig. 2). Consensus maps revealed an interstitial deletion that explains the loss of 138.85–162.03 Mb (Fig. 3A). Additionally, we identified an unbalanced translocation between chr2 and chr10, t(2;10)(q33.3;p12.2), which explains both the loss of the 35 Mb region on chr2 (208.07–243.19 Mb) and the amplification of the 23 Mb region on chr10 (from 23.2 Mb to the start of the chromosome) (Fig. 3B).

Detailed structure of chr5. Chr5 presents with two regions of copy number loss, 70.74–78.41 Mb and 96.63–134.01 Mb (Fig. 2). From Rmap assembly, we observed a chromosomal end at 70.74 Mb, indicating that the region beyond this breakpoint was lost via truncation of one copy (Fig. 4A). Furthermore, we identified a tandem duplication of an 18 Mb region spanning 78.41 to 96.63 Mb, which explains the copy number neutral loss of heterozygosity in this region (Fig. 4B): because the region is retained on only one chr5 copy (the other is truncated at 70.74 Mb), the duplication restores two copies that both derive from a single homolog. We were not able to explain the normal copy number of the region spanning 134.01 Mb to the end of chr5. Finally, we observed an unbalanced translocation between chr5 and chr11, t(5;11)(q35.3;q23.3), which is associated with loss of ∼900 kb at q-ter of chr5 (180.03–180.91 Mb) and amplification of ∼15.5 Mb from q-ter of chr11 (119.45–135.00 Mb) (Fig. 4C). Notably, the 70.74 Mb truncation breakpoint overlaps a segmental duplication and the 96.63 Mb tandem duplication breakpoint overlaps a LINE repeat, which makes these regions difficult to analyze fully with DNA sequencing data alone. We were able to resolve the highly rearranged structure of chr5 by integrating information from Rmap assembly, copy number analysis, and DNA sequencing data.
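The copy number and loss-of-heterozygosity reasoning for chr5 can be made explicit with a toy two-homolog model. The sketch assumes, consistent with the text, that homolog A is truncated at 70.74 Mb and homolog B carries the 78.41–96.63 Mb segment twice; it stops at 134.01 Mb because the text leaves the rest of chr5 unexplained, and it ignores the t(5;11) region. The model and all names are illustrative, not the authors' reconstruction software.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]  # reference interval in Mb, half-open [start, end)

# Hypothetical homolog structures for chr5 up to 134.01 Mb.
CHR5_MODEL: Dict[str, List[Segment]] = {
    "A": [(0.0, 70.74)],                                     # truncated copy
    "B": [(0.0, 96.63), (78.41, 96.63), (96.63, 134.01)],    # tandem duplication
}


def copies_on(homolog: List[Segment], pos: float) -> int:
    """How many times a homolog carries reference position `pos` (Mb)."""
    return sum(1 for start, end in homolog if start <= pos < end)


def copy_state(homologs: Dict[str, List[Segment]], pos: float) -> Tuple[int, bool]:
    """Return (total copy number, loss of heterozygosity) at `pos`.
    LOH here means every retained copy derives from a single homolog."""
    per_homolog = [copies_on(segs, pos) for segs in homologs.values()]
    total = sum(per_homolog)
    retained = sum(1 for n in per_homolog if n > 0)
    return total, retained == 1


for pos in (50.0, 75.0, 85.0, 100.0):
    print(pos, copy_state(CHR5_MODEL, pos))
# 50.0 (2, False)   two copies, heterozygous
# 75.0 (1, True)    copy number loss
# 85.0 (2, True)    copy number neutral LOH
# 100.0 (1, True)   copy number loss
```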

Structural Rearrangements Explain Many Canonical and Noncanonical MM Copy Number Changes. The copy number changes observed in these tumor samples include both canonical and noncanonical MM genomic losses and gains. Examples drawn from Fig. 2 are described here along with their underlying structural rearrangements.

Concerning 1p loss, we identified a 72 Mb long deletion, del(1)(p34.1;p13.1), on the short arm of chr1. The other chr1 allele carried a 112 kb deletion at 1p32.3 encompassing the genes FAF1 (FAS [TNFRSF6]-associated factor 1) and CDKN2C (Fig. S2), leading to homozygous loss of FAF1 and CDKN2C. Deletion of the negative cell cycle regulator CDKN2C is known to be a key early aberration in MM (22). Concerning chr11 and chr14, the t(11;14)(q13;q32) translocation involves the Ig heavy chain locus at 14q32 and the cyclin gene CCND1 at 11q13, and leads to overexpression of CCND1. The chr11 breakpoint is known to vary from 1 kb to 1 Mb upstream of CCND1 (23). We identified t(11;14) in both tumor samples (Fig. S3). Consistent with previous reports, the breakpoint on chr11 is ∼2 kb upstream of CCND1. Concerning chr13, Rmap alignment-based coverage analysis showed loss of one copy of chr13. Concerning chr17, deletions involving the TP53 locus occur in ∼10% of untreated MM patients (24). We observed an ∼12 Mb deletion in the 17p region that includes TP53. This deletion forms part of a complex genomic rearrangement in which two genomic segments (2 Mb to 13.89 Mb and 15.89 Mb to 16.23 Mb) are deleted and the intervening region (13.89 Mb to 15.89 Mb) is inserted in an inverted orientation (Fig. S4). Concerning chr14, we identified a 4.71 Mb long inversion at the 14q21.3-q22.1 locus (Fig. S5).

Among other rearrangements, a translocation between chr17 and chr19, t(17;19)(q21.31;q13.43), led to amplification of ∼37 Mb at q-ter of chr17 and loss of ∼1 Mb at q-ter of chr19 (Fig. S6). In another event, an ∼26 Mb long region from q-ter of chr18 was found to be amplified in a fusion at the end of chr9 (Fig. S7).

Table 1. Structural variants that underlie large-scale copy number changes in the MM-R sample (positions in bp; – indicates no gene or no entry)

Breakpoint 1 chr | Breakpoint 1 position | Breakpoint 1 gene | Breakpoint 2 chr | Breakpoint 2 position | Breakpoint 2 gene | Event
Chr1 | 46,360,804 | MAST2 | Chr1 | 117,273,548 | – | Deletion
Chr1 | 160,523,168 | CD84 | Chr6 | 51,118,469 | – | Translocation
Chr1 | 180,927,627 | – | Chr9 | 131,339,039 | SPTAN1 | Translocation
Chr2 | 138,855,727 | – | Chr2 | 162,035,600 | TANK | Deletion
Chr2 | 208,070,175 | – | Chr10 | 23,276,740 | ARMC3 | Translocation
Chr4 | 170,232,951 | – | Chr4 | 182,336,593 | – | Deletion
Chr5 | 70,740,000 | – | – | – | – | Truncation
Chr5 | 78,413,500 | BHMT | Chr5 | 96,631,200 | – | Tandem Duplication
Chr5 | 134,007,729 | SEC24A | Chr7 | 66,958,972 | – | Translocation
Chr5 | 180,039,850 | FLT4 | Chr11 | 119,453,437 | – | Translocation
Chr7 | 20,878,695 | – | Chr16 | 78,290,600 | WWOX | Translocation
Chr7 | 46,962,661 | – | Chr16 | 5,998,555 | – | Translocation
Chr7 | 67,099,767 | – | Chr7 | 67,262,743 | – | Inverse Duplication
Chr7 | 120,756,417 | CPED1 | Chr12 | 20,883,779 | SLCO1C1 | Translocation
Chr8 | 36,284,561 | – | Chr9 | 32,013,790 | – | Translocation
Chr8 | 39,393,625 | – | Chr8 | 41,441,031 | AGPAT6 | Deletion
Chr11 | 69,453,552 | – | Chr14 | 106,107,045 | – | Translocation
Chr14 | 39,296,961 | LINC00639 | Chr21 | 29,006,964 | MIR5009 | Translocation
Chr14 | 48,211,973 | – | Chr14 | 52,920,149 | TXNDC16 | Inversion
Chr15 | 30,873,871 | ULK4P1, ULK4P2 | Chr15 | 31,189,559 | – | Deletion
Chr17 | 2,023,044 | SMG6 | Chr17 | 13,892,185 | – | Deletion
Chr17 | 13,892,185 | – | Chr17 | 15,895,185 | ZSWIM7 | Inversion
Chr17 | 15,895,213 | ZSWIM7 | Chr17 | 16,236,159 | – | Deletion
Chr17 | 44,041,355 | MAPT | Chr19 | 58,412,876 | – | Translocation
Chr18 | 52,209,201 | – | Chr9 | Past end | – | Translocation
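Rows of Table 1 can also be handled programmatically. The sketch below transcribes a few rows into a small record type (a hypothetical structure) to compute breakpoint separations and to apply the rule stated earlier in the text: chimeric consensus maps arise from interchromosomal rearrangements or from intrachromosomal rearrangements separated by at least 300 kb. Treating single-breakpoint events such as truncations as non-chimeric is our simplification, not a statement from the source.

```python
from dataclasses import dataclass
from typing import Optional

CHIMERA_MIN_SEPARATION = 300_000  # bp; separation quoted in the text for chimeric maps


@dataclass
class SVEvent:
    chrom1: str
    pos1: int
    chrom2: Optional[str]  # None for single-breakpoint events such as truncations
    pos2: Optional[int]    # None when no partner position is given ("Past end")
    event: str

    def span(self) -> Optional[int]:
        """Breakpoint separation in bp for intrachromosomal events, else None."""
        if self.chrom2 != self.chrom1 or self.pos2 is None:
            return None
        return abs(self.pos2 - self.pos1)

    def expect_chimeric_map(self) -> bool:
        """Interchromosomal, or intrachromosomal with breakpoints >= 300 kb apart."""
        if self.chrom2 is None:
            return False
        if self.chrom2 != self.chrom1:
            return True
        span = self.span()
        return span is not None and span >= CHIMERA_MIN_SEPARATION


# A few rows transcribed from Table 1.
rows = [
    SVEvent("chr2", 138_855_727, "chr2", 162_035_600, "Deletion"),
    SVEvent("chr5", 70_740_000, None, None, "Truncation"),
    SVEvent("chr7", 67_099_767, "chr7", 67_262_743, "Inverse Duplication"),
    SVEvent("chr15", 30_873_871, "chr15", 31_189_559, "Deletion"),
    SVEvent("chr17", 13_892_185, "chr17", 15_895_185, "Inversion"),
    SVEvent("chr18", 52_209_201, "chr9", None, "Translocation"),
]
for r in rows:
    print(r.chrom1, r.event, r.span(), r.expect_chimeric_map())
```

Under this toy rule the ∼163 kb chr7 inverse duplication falls below the chimeric-map threshold, which illustrates why the text pairs map-based detection with DNA sequencing-based structural variation analysis.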

Analysis of Deletions from Optical Mapping and DNA Sequencing. Using an automated pipeline that identifies structural variants, we compared the consensus maps from Rmap assemblies, via alignment, with in silico restriction maps generated from the human reference sequence. This comparison identifies intrachromosomal structural variants, such as deletions and insertions, ranging in size from 3 kb to ∼300 kb.

Using this pipeline, we identified 139, 149, and 176 deletions in the normal, MM-S, and MM-R samples, respectively, ranging from ∼3 kb to 180 kb in size, with a mean size of ∼10 kb and a median size of ∼6 kb (Fig. S8A and Dataset S1, Table S1). We compared the deletions from optical mapping with those from DNA sequencing data analyzed using read-pair–based (25), split-read–based (26), and read-depth–based (27) approaches. Close to 80% of the optical mapping deletions were validated by one or more sequencing-based methods (Fig. S8C). We further analyzed the deletions identified only by optical mapping in the MM-R sample and found that a majority of them overlapped with segmental duplications (13/38) or repeat DNA sequences (11/38). Overall, this analysis points to high accuracy of deletion calling and to the ability of optical mapping to identify variation in duplication- and repeat-rich regions.

Previous analysis of normal human samples using optical mapping has revealed a large number of structural polymorphisms in normal genomes (7). Accordingly, most of these deletions, as evidenced by the 139 deletions observed in the normal sample, are germline and must be filtered out to identify somatic deletions.

[Fig. 3 image: (A) del(2)(q22.1;q24.2), normal and tumor chr2 consensus maps with breakpoint regions chr2: 138.738–138.844 Mb (∼106 kb) and chr2: 162.025–162.162 Mb (∼137 kb); (B) t(2;10)(q33.3;p12.2), breakpoint regions chr2: 207.958–208.068 Mb (∼110 kb) and chr10: 23.282–23.150 Mb (∼131 kb); scale bars, 10 µm.]

Fig. 3. Genomic reconstruction of chr2 in the MM-R sample. Chr2 presents with an interstitial deletion [138.85–162.03 Mb; del(2)(q22.1;q24.2)] and an unbalanced translocation with chr10 [208.07 Mb; t(2;10)(q33.3;p12.2)] in the MM-R sample. Chimeric consensus maps, constructed from Rmap assembly, reveal breakpoints (red lines) and span ∼5.75 Mb at each breakpoint to elucidate long-range genomic structure at the deletion (A) and translocation (B) events. The drawn length of each restriction fragment is proportional to its size (in kb). Spanning each breakpoint, fluorescence micrographs of six BamHI-digested DNA molecules are shown; each DNA molecule comprises a series of daughter restriction fragments, which appear as dashed white lines. These single-molecule Rmaps, along with others, were assembled to generate the consensus maps.

[Fig. 4 image: (A) truncation, consensus map (OM) aligned to chr5: 70.5–71.0 Mb (5q13.2); (B) tandem duplication, OM aligned to chr5: 78.15–79 Mb and chr5: 96.0–96.8 Mb (5q14.1–5q15), with the 79–96 Mb segment duplicated; (C) translocation t(5;11)(q35.3;q23.3), OM aligned to chr5: 179.47–180.48 Mb (5q35.3) and chr11: 118.83–119.84 Mb (11q23.3), with copy number (CN) annotations.]

Fig. 4. Genomic reconstruction of chr5 in the MM-R sample. Chr5 presents with a truncation (A), a tandem duplication (B), and an unbalanced translocation with chr11 (C) in the MM-R sample. (A) Alignment of the consensus map (OM) to the in silico reference map (chr5, 70.5–71 Mb) is shown and indicates truncation. The reference map is annotated with chromosomal band, RefSeq genes, and segmental duplications (top to bottom) from the University of California, Santa Cruz (UCSC) genome browser. (B) Alignment of the consensus map (OM) to in silico reference maps (chr5, 78.15–79 Mb, red outline; and chr5, 96.0–96.8 Mb, dark gray outline) is shown, highlighting tandem duplication of the 78.4–96.6 Mb region. (C) Alignment of the consensus map (OM) to in silico reference maps (chr5, 179.47–180.48 Mb; and chr11, 118.83–119.84 Mb) is presented. The topmost and bottommost tracks annotate the reference maps from chr5 and chr11 with copy number profiles obtained from DNA sequencing data analysis.

Using a combination of optical mapping and DNA sequencing analysis, we identified many somatic deletions, which we divided into two categories: those shared between the MM-S and MM-R samples (Fig. 2, ring A, green bars) and those additionally acquired in the MM-R sample (Fig. 2, ring A, brown bars). Based on the approaches used, we partitioned these deletions into two size ranges: 10–400 bp and >400 bp (Dataset S1, Table S2).

Somatic deletions shared between MM-S and MM-R samples. We identified 38 somatic deletions larger than 400 bp, ranging in size from 497 bp to 192 kb (Dataset S1, Table S3A). Of these, 10 deletions overlap exons. Furthermore, we identified 6 small deletions (10–400 bp) that overlap exons (Dataset S1, Table S3B). Among these deletions, we noticed a 73 bp deletion affecting a TP53 exon. As discussed above, the MM-S and MM-R samples also show del(17p), which indicates that both copies of TP53 are inactivated.

Somatic deletions acquired additionally in MM-R sample. We identified another 27 somatic deletions larger than 400 bp, ranging in size from 445 bp to 290 kb, that were additionally acquired in the MM-R sample (Dataset S1, Table S4A). Of these 27, 7 deletions overlap exons. Furthermore, we identified 2 small deletions (10–400 bp) that overlap exons (Dataset S1, Table S4B). As with large-scale copy number changes, the increasing number of somatic deletions highlights an increasing mutational burden with tumor progression.
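The split of somatic deletions into "shared" and "acquired in MM-R" sets can be sketched as interval matching across the three samples. This is an illustrative sketch, not the authors' procedure: the text does not state how calls were matched, so the 50% reciprocal-overlap rule below is an assumption (a common convention for comparing structural variant calls), and the function names are hypothetical.

```python
from typing import List, Tuple

Deletion = Tuple[str, int, int]  # (chromosome, start, end), half-open bp coordinates


def reciprocal_overlap(a: Deletion, b: Deletion, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of each interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def found_in(d: Deletion, calls: List[Deletion], min_frac: float = 0.5) -> bool:
    return any(reciprocal_overlap(d, c, min_frac) for c in calls)


def classify_somatic(normal: List[Deletion], mm_s: List[Deletion], mm_r: List[Deletion],
                     min_frac: float = 0.5) -> Tuple[List[Deletion], List[Deletion]]:
    """Split MM-R deletions into somatic ones shared with MM-S and ones acquired in MM-R.
    Deletions seen only in MM-S fall into neither category."""
    shared: List[Deletion] = []
    acquired: List[Deletion] = []
    for d in mm_r:
        if found_in(d, normal, min_frac):
            continue  # also present in the normal sample, so treated as germline
        if found_in(d, mm_s, min_frac):
            shared.append(d)
        else:
            acquired.append(d)
    return shared, acquired


normal = [("chr1", 1_000, 11_000)]
mm_s = [("chr1", 1_000, 11_000), ("chr2", 50_000, 60_000)]
mm_r = [("chr1", 1_200, 10_800), ("chr2", 50_500, 60_500), ("chr3", 5_000, 9_000)]
print(classify_somatic(normal, mm_s, mm_r))
```

Requiring overlap in both directions keeps a small deletion from being "explained" by an unrelated, much larger call that happens to contain it.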

Analysis of Insertions from Optical Mapping and DNA Sequencing. Using the automated variation calling pipeline described above for deletions, we identified 450, 428, and 384 insertions in the normal, MM-S, and MM-R samples, respectively, ranging from ∼3 kb to 367.5 kb in size, with a mean size of ∼10 kb and a median size of ∼5 kb (Fig. S8B and Dataset S1, Table S1). Although deletions are relatively straightforward to identify using sequencing-based approaches, identifying insertions can be problematic. We compared the insertions from optical mapping to insertions identified using the DNA sequencing data analysis approaches Pindel (26) and CNVnator (27). Pindel generates insertion breakpoints with no information about the length or structure of the insertions, whereas CNVnator generates a list of duplications with no structural or contextual information. As expected given these limitations, a much smaller percentage (24–40%) of optical mapping insertions had an overlapping Pindel or CNVnator call (Fig. S8D). Finally, we did not discover any somatic insertions from our analysis.
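The overlap comparison can be sketched as follows, assuming each optical mapping insertion is localized to a reference interval, a Pindel call is a single breakpoint position, and a CNVnator call is a duplication interval. The support rule used here (a Pindel breakpoint falling inside the interval, or any overlapping CNVnator duplication) is a hypothetical choice for illustration; the text does not state the exact matching criteria.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]   # (chrom, start, end), half-open
Breakpoint = Tuple[str, int]    # (chrom, position)


def fraction_supported(om_insertions: List[Region],
                       pindel_breakpoints: List[Breakpoint],
                       cnvnator_dups: List[Region]) -> float:
    """Fraction of optical mapping insertions with an overlapping Pindel or CNVnator call."""
    # Index Pindel breakpoints per chromosome as sorted lists for binary search.
    bp_by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in pindel_breakpoints:
        bp_by_chrom.setdefault(chrom, []).append(pos)
    for positions in bp_by_chrom.values():
        positions.sort()

    def has_breakpoint(chrom: str, start: int, end: int) -> bool:
        positions = bp_by_chrom.get(chrom, [])
        i = bisect_left(positions, start)  # first breakpoint >= start
        return i < len(positions) and positions[i] < end

    def has_duplication(chrom: str, start: int, end: int) -> bool:
        return any(c == chrom and s < end and start < e for c, s, e in cnvnator_dups)

    if not om_insertions:
        return 0.0
    supported = sum(1 for chrom, s, e in om_insertions
                    if has_breakpoint(chrom, s, e) or has_duplication(chrom, s, e))
    return supported / len(om_insertions)
```

For example, with toy insertions `[("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 0, 50), ("chr3", 10, 20)]`, one Pindel breakpoint at `("chr1", 150)`, and one CNVnator duplication `("chr2", 40, 80)`, two of the four insertions are supported, giving 0.5.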

SNP/SNV Analysis from DNA Sequencing Data.

Commonly associated SNPs. Of the eight inherited susceptibility markers that have been identified by genome-wide association studies (GWAS) in MM (28–30), we found seven in our tumor samples (Dataset S1, Table S5). Among these markers is an SNP in the gene CCND1 (c.870G>A, rs9344) at 11q13.3 that has been implicated as a risk factor for t(11;14)(q13;q32) MM (29).

SNV analysis reveals increasing mutational burden with tumor progression. We identified 10,224 and 13,511 SNVs in the MM-S and MM-R tumor samples, respectively, relative to the normal sample. This analysis yielded tumor-specific point mutation rates of 3.53 and 4.52 per million bases for the MM-S and MM-R samples, respectively, indicating an increase in mutational burden with tumor progression. Although these rates are somewhat higher than the average tumor-specific point mutation rate in the Multiple Myeloma Research Consortium (MMRC) cohort (2.9 per million bases) (1), the difference may be explained by higher sequencing depth or an increased incidence of somatic mutations in our samples. No SNVs were shared between our work, the MMRC cohort (1), and a previous study by Egan et al. (4).

Functional annotation of SNVs and further analysis of nonsynonymous SNVs revealed that 60 such SNVs were shared between both tumor samples (Fig. 2, ring B, green dots). The MM-R sample acquired an additional 41 SNVs (Fig. 2, ring B, brown dots), corroborating the increase in mutational burden with tumor progression (Dataset S1, Tables S6 and S7). Consistent with previous findings from primary MM samples (31), we did not find any PSMB5 proteasomal mutations.
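The reported rates follow from rate = (number of SNVs / number of bases assessed) × 10^6. The text does not state the denominator, so the short sketch below only back-calculates the denominator implied by the reported counts and rates (about 2.90 Gb for MM-S and 2.99 Gb for MM-R, close to the size of the human genome). Treat this as arithmetic on rounded published numbers, not as the study's actual callable-base counts.

```python
def mutations_per_mb(n_mutations: int, callable_bases: float) -> float:
    """Point mutation rate per million bases: count / assessed bases x 1e6."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / callable_bases * 1e6


def implied_callable_bases(n_mutations: int, rate_per_mb: float) -> float:
    """Invert the rate formula to recover the denominator implied by a reported rate."""
    return n_mutations / rate_per_mb * 1e6


if __name__ == "__main__":
    for label, n, rate in [("MM-S", 10_224, 3.53), ("MM-R", 13_511, 4.52)]:
        bases = implied_callable_bases(n, rate)
        print(f"{label}: implied denominator ~{bases / 1e9:.2f} Gb")
```

Because the denominators differ slightly between samples, comparing raw SNV counts and comparing rates are not strictly equivalent. The rate is the better-normalized quantity.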

SNP analysis identifies regions with copy number neutral loss of heterozygosity. From SNP/SNV analysis, we identified genomic loci that were heterozygous in the normal sample but homozygous in the tumor samples, thereby reflecting loss of heterozygosity. When we plotted the density of such loci across the genome (Fig. 2, ring C), we observed, as expected, that many regions with increased density overlapped with copy number losses. However, four regions spanning a total of 215 Mb on the q arms of chr1, chr5, and chr14 presented with copy number neutral loss of heterozygosity in the MM-R sample (Fig. 2, ring C, light green bars).
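The logic of this analysis (heterozygous in normal, homozygous in tumor, binned density, then excluding windows explained by copy number loss) can be sketched in Python. The VCF-style genotype strings, the 1-Mb window, the `min_sites` threshold, and the assumption that a diploid copy number of 2 is "neutral" are all illustrative choices, not the parameters used in the study. Missing genotypes (`"."`) are assumed to have been filtered out beforehand.

```python
from collections import Counter
from typing import Dict, List, Tuple

Site = Tuple[str, int]    # (chromosome, position)
Window = Tuple[str, int]  # (chromosome, window index)


def is_het(genotype: str) -> bool:
    """VCF-style genotype such as '0/1' or '1|1'; heterozygous if the alleles differ."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) > 1


def loh_sites(normal: Dict[Site, str], tumor: Dict[Site, str]) -> List[Site]:
    """Loci heterozygous in the normal sample but homozygous in the tumor."""
    return [site for site, gt in normal.items()
            if is_het(gt) and site in tumor and not is_het(tumor[site])]


def loh_density(sites: List[Site], window: int = 1_000_000) -> Counter:
    """Number of LOH loci in each fixed-size genomic window."""
    return Counter((chrom, pos // window) for chrom, pos in sites)


def copy_neutral_windows(density: Counter, copy_number: Dict[Window, int],
                         min_sites: int, neutral_cn: int = 2) -> List[Window]:
    """Windows dense in LOH loci whose tumor copy number is unchanged (copy-neutral LOH)."""
    return sorted(w for w, n in density.items()
                  if n >= min_sites and copy_number.get(w) == neutral_cn)
```

Windows with LOH but reduced copy number (for example, a window at copy number 1) are excluded, which separates copy-neutral LOH from the many LOH regions that simply reflect copy number losses.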

Discussion

Comparison with Existing Genomic Methods. Traditionally, chromosomal karyotyping, fluorescence in situ hybridization (FISH), and, more recently, hybridization- and sequencing-based technologies have been used to study clinical MM samples. However, these methods have certain limitations. FISH can address only known locations of structural variation and thus precludes comprehensive discovery. Karyotyping, on the other hand, has limited resolution and requires actively dividing cells, which is generally an issue for MM tumor cells because they are known to have a low proliferative index (32). Genome-wide hybridization technologies do not directly provide information about genome structure, whereas sequencing-based technologies are either limited by cost to exome analysis or have generally been used to study patterns of single nucleotide variation (1, 2). With optical mapping, we use primary tumor samples and are therefore able to identify structural variation representative of the underlying cell population, free of hybridization, library creation, or amplification artifacts. In addition, the ∼3-kb resolution for structural variants effectively generates a high-resolution genomic karyotype.

Single Nucleotide vs. Structural Variation. Our work did not find any common SNVs shared between this study, a previous study (4), and the MMRC cohort (1). The lack of common SNVs and the widespread distribution of structural variation indicate that structural variation might play a larger than appreciated role in MM pathogenesis and drug resistance, and they warrant a population study of structural variation in MM and other cancers.

Implications for MM Biology. The analysis presented here, although limited to a single individual, offers unique biological insights that may translate into novel therapeutic strategies. Individual genes mutated in both MM-S and MM-R samples may represent clonal drivers associated with core mechanisms of MM oncogenesis and/or acquisition of drug resistance. These events constitute legitimate therapeutic targets, particularly in light of recent data demonstrating the limits, and potential dangers, associated with targeting subclonal mutations (e.g., mutant BRAF) (2). Several events have been established previously as important progression factors in myeloma pathogenesis (e.g., TP53 and CDKN2C loss) (22, 24). PIK3R1 aberrations underscore the importance of PI3K pathway activation in MM (33). IGF2BP2 mutations affect insulin-like growth factor 2 (IGF-2) translation and thus growth control through an IGF receptor (IGFR)-controlled pathway that is currently the focus of intense preclinical development in myeloma (34). Other genes are less well studied in MM and may provide novel potential targets or regulatory pathways. ELL is an essential cofactor of the super elongation complex (SEC), a central node of transcriptional elongation checkpoint control and a key mutational target in myeloid and mixed-lineage leukemias (35). TCL1 is a regulator of apoptosis implicated in B-cell chronic lymphocytic leukemia as well as T-cell lymphomas (36). ASXL3 belongs to a family of epigenetic regulators whose genetic loss has been implicated in the myelodysplastic syndromes and myeloid leukemias (37). CAMK2D encodes a subunit of Ca2+/calmodulin-dependent protein kinase II, an essential regulator of Ca2+-dependent signal transduction. Interestingly, prior studies have implicated calmodulin-dependent pathways in proteasome inhibitor-resistant, constitutive NF-κB activity in MM (38, 39). Our analysis has therefore pinpointed potential

Gupta et al. PNAS | June 23, 2015 | vol. 112 | no. 25 | 7693

APPLIED BIOLOGICAL SCIENCES


targets that merit validation in larger cohorts of MM patients before functional experimentation using appropriate model systems.

Intriguingly, genes uniquely affected in the MM-R sample include a cell cycle regulator (CCNG2) and a mitotic checkpoint gene (ZWILCH), as well as a gene encoding a transcription factor (MYBL1) that is expressed specifically in centroblasts (40), a putative preplasmablastic cell of origin for MM within the germinal center reaction (24). These findings raise the possibility that disease progression and/or acquisition of drug resistance in MM may be associated with plasmacytic maturation arrest or dedifferentiation to earlier stages of B-cell ontogenesis. Thus, effective management of end-stage MM may necessitate approaches that promote mitotic quiescence or cell cycle exit and terminal plasmacytic differentiation.

Conclusions

Using optical mapping and DNA sequencing, we have characterized genomic variation in sequential samples obtained from an MM patient with progressive disease. Although each platform is a revealing genome analysis system on its own, the genomic variation they discern is complementary. Combining the unique advantages of these systems has enabled a comprehensive understanding of genomic structure in this tumor genome and has revealed widespread variation across the entire size spectrum of variation.

Materials and Methods

Case History, Study Design, Data Acquisition, and Rmap Construction and Assembly. This study was approved by the Institutional Review Board at the University of Wisconsin–Madison in accordance with the Declaration of Helsinki. DNA samples were prepared from purified CD138(+) plasma cells (MM-S and MM-R samples) and paired cultured stromal cells (normal) from a 58-y-old male MM patient with International Staging System (ISS) Stage IIIb disease, who had been treated with combinations of bortezomib, dexamethasone, lenalidomide, cyclophosphamide, and tandem autologous stem cell transplants at different stages of tumor progression (SI Materials and Methods). Large Rmap datasets, comprising ∼2 million Rmaps for each sample (normal, MM-S, MM-R), were constructed using the BamHI restriction endonuclease. The resulting Rmap datasets were submitted to our iterative assembly pipeline (7, 14–16), which constructs contigs and associated consensus maps from alignment and assembly of individual Rmaps and is used to identify structural and copy number variation. An overview of Rmap collection and assembly output is provided in Dataset S1, Table S8. The contig assemblies indicate almost complete (>99.5%; merged contig average size is >31 Mb) coverage of sequence scaffolds from the human reference sequence [National Center for Biotechnology Information (NCBI) Build 37].
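Rmaps are ordered restriction fragment sizes, and the in silico reference maps they are aligned against (such as those in Fig. 4) can in principle be derived by locating BamHI sites in the reference sequence. The Python sketch below shows only that step under simple assumptions: BamHI recognizes GGATCC and cuts between the two G's (G^GATCC); because the site is palindromic, scanning one strand finds every site. It is not the assembly or alignment pipeline cited above (7, 14–16), and it ignores the practical limit on how small a fragment can be sized optically.

```python
import re
from typing import List

BAMHI_SITE = "GGATCC"  # BamHI recognition sequence (palindromic)
CUT_OFFSET = 1         # top-strand cut falls after the first base: G^GATCC


def cut_positions(seq: str, site: str = BAMHI_SITE, offset: int = CUT_OFFSET) -> List[int]:
    """0-based top-strand positions where the enzyme cuts."""
    seq = seq.upper()
    # GGATCC cannot overlap itself, so non-overlapping finditer finds every site.
    return [m.start() + offset for m in re.finditer(site, seq)]


def restriction_map(seq: str) -> List[int]:
    """Ordered fragment lengths: an in silico analogue of a single-molecule Rmap."""
    cuts = cut_positions(seq)
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(restriction_map("AAAGGATCCTTTTggatccA"))  # [4, 10, 6]
```

In a real comparison, fragment lengths would be expressed in kilobases and matched against sized optical fragments with an error model, which is the role of the alignment methods cited above (14, 19).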

DNA Sequencing. We obtained 2 × 100-bp Illumina paired-end sequencing data with properly paired mean coverage depths of 59×, 68×, and 92× for the normal, MM-S, and MM-R samples, respectively, with an insert size of around 250 bp (Dataset S1, Table S8).
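Mean coverage depth is the total number of sequenced bases divided by the genome size, C = N × L / G, where for paired-end data N counts read pairs and each pair contributes 2 × L bases. The sketch below applies this to the 2 × 100-bp design to estimate roughly how many properly paired read pairs each reported depth corresponds to. The genome size of 3.1 Gb is an assumed approximate value, and real pipelines also discount unmapped, duplicate, and low-quality reads, so these are order-of-magnitude figures only.

```python
def mean_coverage(n_pairs: float, read_length: int, genome_size: float) -> float:
    """Mean depth = total sequenced bases / genome size, with 2 reads per pair."""
    return n_pairs * 2 * read_length / genome_size


def pairs_needed(target_depth: float, read_length: int, genome_size: float) -> float:
    """Invert the coverage formula: read pairs required for a target mean depth."""
    return target_depth * genome_size / (2 * read_length)


GENOME_SIZE = 3.1e9  # assumed approximate haploid human genome size (bases)

if __name__ == "__main__":
    for label, depth in [("normal", 59), ("MM-S", 68), ("MM-R", 92)]:
        pairs = pairs_needed(depth, 100, GENOME_SIZE)
        print(f"{label}: ~{pairs / 1e6:,.0f} million properly paired read pairs")
```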

Please refer to SI Materials and Methods for more information on materials and methods.

ACKNOWLEDGMENTS. We thank members of the Laboratory for Molecular and Computational Genomics for helpful discussions. We also thank Kristy Kounovsky-Shafer for preparing Fig. S1. This work was supported by grants from the University of Wisconsin Carbone Cancer Center (UWCCC) Pilot Project (to D.C.S., F.A., and E.H.B.), the UWCCC Trillium Fund for Multiple Myeloma Research, UW PRJ79DG, and National Human Genome Research Institute Grant R01HG000225 (to D.C.S.).

1. Chapman MA, et al. (2011) Initial genome sequencing and analysis of multiple myeloma. Nature 471(7339):467–472.

2. Lohr JG, et al.; Multiple Myeloma Research Consortium (2014) Widespread genetic heterogeneity in multiple myeloma: Implications for targeted therapy. Cancer Cell 25(1):91–101.

3. Walker BA, et al. (2010) A compendium of myeloma-associated chromosomal copy number abnormalities and their prognostic value. Blood 116(15):e56–e65.

4. Egan JB, et al. (2012) Whole-genome sequencing of multiple myeloma from diagnosis to plasma cell leukemia reveals genomic initiating events, evolution, and clonal tides. Blood 120(5):1060–1066.

5. Bolli N, et al. (2014) Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat Commun 5:2997.

6. Korbel JO, et al. (2007) Paired-end mapping reveals extensive structural variation in the human genome. Science 318(5849):420–426.

7. Teague B, et al. (2010) High-resolution human genome structure by single-molecule analysis. Proc Natl Acad Sci USA 107(24):10848–10853.

8. Zhang F, Gu W, Hurles ME, Lupski JR (2009) Copy number variation in human health, disease, and evolution. Annu Rev Genomics Hum Genet 10:451–481.

9. Palumbo A, Anderson K (2011) Multiple myeloma. N Engl J Med 364(11):1046–1060.

10. Rajkumar SV (2012) Multiple myeloma: 2012 update on diagnosis, risk-stratification, and management. Am J Hematol 87(1):78–88.

11. Kumar SK, et al. (2008) Improved survival in multiple myeloma and the impact of novel therapies. Blood 111(5):2516–2520.

12. Kumar SK, Rajkumar SV (2014) The current status of minimal residual disease assessment in myeloma. Leukemia 28(2):239–240.

13. Dimalanta ET, et al. (2004) A microfluidic system for large DNA molecule arrays. Anal Chem 76(18):5293–5301.

14. Valouev A, et al. (2006) Alignment of optical maps. J Comput Biol 13(2):442–462.

15. Valouev A, Schwartz DC, Zhou S, Waterman MS (2006) An algorithm for assembly of ordered restriction maps from single DNA molecules. Proc Natl Acad Sci USA 103(43):15770–15775.

16. Valouev A, Zhang Y, Schwartz DC, Waterman MS (2006) Refinement of optical map assemblies. Bioinformatics 22(10):1217–1224.

17. Antonacci F, et al. (2010) A large and complex structural polymorphism at 16p12.1 underlies microdeletion disease risk. Nat Genet 42(9):745–750.

18. Ray M, et al. (2013) Discovery of structural alterations in solid tumor oligodendroglioma by single molecule analysis. BMC Genomics 14:505.

19. Sarkar D, Goldstein S, Schwartz DC, Newton MA (2012) Statistical significance of optical map alignments. J Comput Biol 19(5):478–492.

20. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12(5):363–376.

21. Boeva V, et al. (2012) Control-FREEC: A tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28(3):423–425.

22. Leone PE, et al. (2008) Deletions of CDKN2C in multiple myeloma: Biological and clinical implications. Clin Cancer Res 14(19):6033–6041.

23. Walker BA, et al. (2013) Characterization of IGH locus breakpoints in multiple myeloma indicates a subset of translocations appear to occur in pregerminal center B cells. Blood 121(17):3413–3419.

24. Kuehl WM, Bergsagel PL (2012) Molecular pathogenesis of multiple myeloma and its premalignant precursor. J Clin Invest 122(10):3456–3463.

25. Chen K, et al. (2009) BreakDancer: An algorithm for high-resolution mapping of genomic structural variation. Nat Methods 6(9):677–681.

26. Ye K, Schulz MH, Long Q, Apweiler R, Ning Z (2009) Pindel: A pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25(21):2865–2871.

27. Abyzov A, Urban AE, Snyder M, Gerstein M (2011) CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res 21(6):974–984.

28. Broderick P, et al. (2012) Common variation at 3p22.1 and 7p15.3 influences multiple myeloma risk. Nat Genet 44(1):58–61.

29. Weinhold N, et al. (2013) The CCND1 c.870G>A polymorphism is a risk factor for t(11;14)(q13;q32) multiple myeloma. Nat Genet 45(5):522–525.

30. Chubb D, et al. (2013) Common variation at 3q26.2, 6p21.33, 17p11.2 and 22q13.1 influences multiple myeloma risk. Nat Genet 45(10):1221–1225.

31. Politou M, et al. (2006) No evidence of mutations of the PSMB5 (beta-5 subunit of proteasome) in a case of myeloma with clinical resistance to Bortezomib. Leuk Res 30(2):240–241.

32. Drewinko B, Alexanian R, Boyer H, Barlogie B, Rubinow SI (1981) The growth fraction of human myeloma cells. Blood 57(2):333–338.

33. Harvey RD, Lonial S (2007) PI3 kinase/AKT pathway as a therapeutic target in multiple myeloma. Future Oncol 3(6):639–647.

34. Menu E, van Valckenborgh E, van Camp B, Vanderkerken K (2009) The role of the insulin-like growth factor 1 receptor axis in multiple myeloma. Arch Physiol Biochem 115(2):49–57.

35. Smith E, Lin C, Shilatifard A (2011) The super elongation complex (SEC) and MLL in development and disease. Genes Dev 25(7):661–672.

36. Pekarsky Y, Zanesi N, Aqeilan R, Croce CM (2004) Tcl1 as a model for lymphomagenesis. Hematol Oncol Clin North Am 18(4):863–879.

37. Shih AH, Abdel-Wahab O, Patel JP, Levine RL (2012) The role of mutations in epigenetic regulators in myeloid malignancies. Nat Rev Cancer 12(9):599–612.

38. Markovina S, et al. (2010) Bone marrow stromal cells from multiple myeloma patients uniquely induce bortezomib resistant NF-kappaB activity in myeloma cells. Mol Cancer 9:176.

39. Berchtold CM, Chen K-S, Miyamoto S, Gould MN (2005) Perillyl alcohol inhibits a calcium-dependent constitutive nuclear factor-kappaB pathway. Cancer Res 65(18):8558–8566.

40. Golay J, et al. (1998) The A-Myb transcription factor is a marker of centroblasts in vivo. J Immunol 160(6):2786–2793.
