SUPPLEMENTARY NOTE

THE GENOMIC LANDSCAPE OF PEDIATRIC AND YOUNG ADULT T-LINEAGE ACUTE LYMPHOBLASTIC LEUKEMIA

Yu Liu1, John Easton1, Ying Shao1,2, Jamie Maciaszek3, Zhaoming Wang1, Mark R. Wilkinson2, Kelly McCastlain2, Michael Edmonson1, Stanley B. Pounds4, Lei Shi4, Xin Zhou1, Xiaotu Ma1, Edgar Sioson1, Yongjin Li1, Michael Rusch1, Pankaj Gupta1, Deqing Pei4, Cheng Cheng 4, Malcolm A. Smith5, Jaime Guidry Auvil6, Daniela S. Gerhard6, Mary V. Relling7, Naomi J. Winick8, Andrew J. Carroll9, Nyla A. Heerema10, Elizabeth Raetz11, Meenakshi Devidas12, Cheryl L. Willman13, Richard C. Harvey13, William L. Carroll14, Kimberly P. Dunsmore15, Stuart S. Winter16, Brent L Wood17, Brian P. Sorrentino3, James R. Downing2, Mignon L. Loh18, Stephen P Hunger19*, Jinghui Zhang1* and Charles G. Mullighan2*

1Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN 2Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN 3Department of Hematology, St. Jude Children’s Research Hospital, Memphis, TN 4Department of Biostatistics, St. Jude Children’s Research Hospital, Memphis, TN

5Cancer Therapy Evaluation Program, National Cancer Institute, Bethesda, MD 6Office of Cancer Genomics, National Cancer Institute, Bethesda, MD 7Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN 8University of Texas Southwestern Medical Center, Dallas, TX 9Department of Genetics, University of Alabama at Birmingham, Birmingham, AL 10Department of Pathology, College of Medicine, The Ohio State University, Columbus, OH 11Department of Pediatrics, Huntsman Cancer Institute and Primary Children's Hospital, University of Utah, Salt Lake City, UT; 12Department of Biostatistics, Colleges of Medicine, Public Health & Health Profession, University of Florida, Gainesville, FL 13Department of Pathology, The Cancer Research and Treatment Center, University of New Mexico, Albuquerque, NM 14Department of Pediatrics, Perlmutter Cancer Center, New York University Medical Center, New York, NY 15Health Sciences Center, University of Virginia, Charlottesville, VA 16Department of Pediatrics, University of New Mexico, Albuquerque, NM 17Seattle Cancer Care Alliance, Seattle, WA 18Department of Pediatrics, Benioff Children’s Hospital, University of California at San Francisco, San Francisco, CA 19Department of Pediatrics and the Center for Childhood Cancer Research, Children’s Hospital of Philadelphia and the Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA

Nature Genetics: doi:10.1038/ng.3909

SUPPLEMENTARY RESULTS

Identification of rearrangements from transcriptome sequencing data

Two hundred and thirty-two cases had cytogenetic data. Of these, fourteen were classified into subgroups according to elevated expression of core transcription factors and had cytogenetic data supporting a rearrangement of the transcription factor gene, but lacked a rearrangement identifiable in the RNA-seq data. These included 5 cases with putative BCL11B-TLX3 rearrangements [t(5;14)(q35;q32)], 3 TRA/D-TLX1 [t(10;14)(q24;q11.2)], 3 TRA/D-LMO2 [t(11;14)(p13;q11.2)], 1 TRB-TAL2 [t(7;9)(q34;q32)], 1 TAL1 deletion [del(1)(p32p32)] and 1 PICALM-MLLT10 [t(10;11)(p13;q21)] fusion that could result in HOXA overexpression. To further investigate the presence of rearrangements in these 14 cases, we used ChimeraScan1, which uses discordant read mapping to detect rearrangements, in contrast to CICERO, which is a local assembly-based method using unmapped and soft-clipped reads. All 14 cases were negative for rearrangements using this alternative method, consistent with the results of our standard analysis using CICERO. This suggests that gene deregulation in these cases arises from an alternative mechanism of genomic alteration that does not generate a chimeric fusion transcript.

This indicated that while RNA-sequencing may classify cases by deregulated expression of transcription factor genes, it does not identify all structural genomic alterations resulting in deregulation of these genes, particularly those alterations that do not result in the expression of chimeric fusion transcripts. To further define the utility of different genetic and genomic approaches to identify transcription factor gene alterations, we compared the genomic rearrangements driving transcription factor deregulation in each subtype identified by cytogenetic analysis with those identified from RNA-seq. Among the 232 cases with karyotypic data, 213 could be classified into a subtype based on genomic alterations identified by any of the three modalities used (SNP array, whole exome sequencing, and RNA-seq analysis of rearrangements and deregulated gene expression) and/or aberrant expression of key transcription factors. As the following results are based on those cases with karyotypic data, the numbers of cases with specific types of genomic rearrangement differ from the main manuscript, which describes the entire cohort. Fifty-four cases could be supported by cytogenetic results (53 rearrangements and 1 TAL1 deletion), while 122 could be identified by RNA-seq, and 16 TAL1 super-enhancer mutations were identified by sequencing. Importantly, 42 TAL1 deletions were identified by SNP array analysis among these 213 cases, 41 of which were identified by RNA-seq while one was detected by cytogenetics. Of cases with rearrangements, 40 were consistent between cytogenetics and RNA-seq, 14 were detected only by cytogenetics, and 82 only by RNA-seq (Supplementary Table 12).
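The concordance counts quoted above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers stated in this paragraph; the variable names are illustrative and are not taken from the study's analysis code.

```python
# Counts quoted in the text for cases with karyotypic data.
both = 40          # detected by both cytogenetics and RNA-seq
cyto_only = 14     # detected by cytogenetics only
rnaseq_only = 82   # detected by RNA-seq only

cyto_total = both + cyto_only                   # supported by cytogenetics
rnaseq_total = both + rnaseq_only               # identified by RNA-seq
either_total = both + cyto_only + rnaseq_only   # detected by either modality

print(cyto_total, rnaseq_total, either_total)   # prints: 54 122 136
```

The first two totals reproduce the 54 cases supported by cytogenetics and the 122 identified by RNA-seq reported in the text.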

The ability of RNA-seq to identify rearrangements varied between subgroups, with a median of 65% (ranging from 22% of LMO1/LYL1 cases to 86% of TAL1 cases). This variation reflects the different genomic mechanisms involved. For example, a total of 17 TAL1 cases were driven by super-enhancer mutations, which could not be identified from RNA-seq. By contrast, MLL fusions resulting in HOXA overexpression were chimeric fusions, all of which were identified by RNA-seq.

The other rearrangements were likely non-chimeric fusions resulting in deregulation, mostly by juxtaposition to enhancers at T cell antigen receptor loci. We used the BCL11B-TLX3 rearrangement as an example to explore the capability of RNA-seq to detect these rearrangements. We identified 8 cases carrying this rearrangement on analysis of RNA-seq data, none of which had evidence of rearrangement on cytogenetic analysis. One case (PASUGC) is shown as a representative example, with the breakpoints from all 8 cases indicated by arrows (Supplementary Figure 3). The breakpoints in nearly all cases (except one in a BCL11B intronic region) fell into intergenic regions, consistent with previous reports.2 We observed enhancer RNA/lncRNA transcripts in the intergenic regions around both breakpoints, as shown in the wiggle plot, which were transcribed because of the active enhancer. As samples were analysed using sequencing of total RNA, we retained the ability to detect these transcripts outside of coding regions, and thus the rearrangements. Together, these analyses suggest that distinct structural rearrangements may juxtapose BCL11B and TLX3, resulting in TLX3 deregulation, and that some of these are not evident as translocations on karyotyping and will require genome sequencing to resolve.

Analysis of germline mutations in T-ALL

We analyzed germline mutations that could potentially contribute to T-ALL predisposition, as previously described.3 A total of 89 genes were analyzed, including 60 genes known to be associated with autosomal dominant cancer predisposition syndromes and an additional 35 of the most recurrently somatically mutated genes in the T-ALL study. Three of the genes analyzed belong to the Fanconi anemia (FA) pathway: BRCA1, BRCA2 and PALB2. Heterozygous mutations were observed in other FA genes in our previous study3 but their frequency did not differ significantly between cancer patients and healthy controls, and these genes were therefore not included. After pathogenicity review, four germline mutations in the cohort of 264 cases were classified as pathogenic or likely pathogenic. These included three frameshift mutations, in BRCA1 (P1224fs in SJALL015687/PATSDS), BRCA2 (T256fs in SJTALL002045/PARWNW) and RUNX1 (R232fs in SJTALL002039/PASWFN), and one mutation affecting the canonical splice site in BRAF (R506_E13splice in SJALL015266/PARTPW).

Deletions of 6q in T-ALL

Deletions of chromosome 6q14-q23 were present in 19.3% of cases, and were enriched in cases with alterations deregulating TAL1, TLX1, LMO2, and NKX2-1 (Supplementary Figures 6 and 7). In nearly all cases, the deletion was broad (10-20 Mb); three cases exhibited more limited deletions involving as few as 3 genes. The target(s) of deletion in this region that facilitate leukemogenesis are poorly understood, although it has been postulated that CCNC, encoding cyclin C, a putative tumor suppressor that regulates CDK19, CDK8 and CDK3, and that phosphorylates NOTCH1 resulting in its degradation, may be a target.4 The only recurrent focal deletion in this region identified in this study involved HMGN3 in three cases. This putative minimal region of deletion was defined by a single case (rather than multiple focal deletions overlapping this gene), and another case had a focal deletion distal to HMGN3; thus, alteration of this gene is unlikely to be the only alteration on 6q driving leukemogenesis. In contrast, most deletions were large, suggesting that deletion of multiple genes in this region may be important (Supplementary Figure 7b).

To explore this, we conducted an integrative analysis of multimodality genomic data (DNA copy number alterations, mutations, structural rearrangements and expression levels from RNA-sequencing) to systematically identify genes perturbed in more than one genomic modality, a pattern likely to signify a role as a putative tumor suppressor. We used two computational approaches to identify the peak regions of somatic copy number alteration (SCNA): Genomic Identification of Significant Targets in Cancer (GISTIC)5 and the Genomic Random Interval (GRIN) model.6 GISTIC has been extensively used to identify significant SCNAs in cancer genomes and corrects for bias from the size of lesion, but does not consider mutational data. Like GISTIC, GRIN computes a significance value (p-value or q-value) for the abundance of gains and the abundance of losses observed across the entire genome. In addition, GRIN has capabilities that GISTIC lacks, such as the ability to compute significance values for the abundance of other genomic alterations (mutations, indels and structural breakpoints). For the locus of each of 33,560 genes, we used GRIN to compute a p-value for the number of subjects with a copy number gain, copy number loss, sequence mutation, or structural rearrangement. For each type of lesion, the q-value7 was computed with the p-value moment-based estimator8 of the proportion of tests with a true null hypothesis to characterize the false discovery rate (FDR) for multiple testing across many genes. The GRIN analysis was adjusted for the regions of the genome that were observable with each type of platform by transforming genomic coordinates into a new coordinate system that spliced out unobservable regions of the genome prior to p-value calculation.
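The coordinate transformation described above can be illustrated with a short sketch. This is not the GRIN implementation; it only shows one way to map positions into a coordinate system with unobservable regions spliced out. The function names, the half-open interval convention and the toy intervals are all assumptions made for illustration.

```python
from bisect import bisect_right

def build_spliced_map(observable):
    """Merge observable half-open (start, end) intervals on one chromosome
    and compute the spliced-coordinate offset at which each one begins."""
    merged = []
    for start, end in sorted(observable):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)   # extend overlapping interval
        else:
            merged.append([start, end])
    offsets, total = [], 0
    for start, end in merged:
        offsets.append(total)
        total += end - start
    return merged, offsets, total

def to_spliced(pos, merged, offsets):
    """Map a genomic position to spliced coordinates; None if unobservable."""
    starts = [s for s, _ in merged]
    i = bisect_right(starts, pos) - 1
    if i < 0 or pos >= merged[i][1]:
        return None
    return offsets[i] + (pos - merged[i][0])

# Toy intervals (hypothetical, not study data).
merged, offsets, total = build_spliced_map([(100, 200), (500, 650), (150, 250)])
print(total)                             # 300 observable bases
print(to_spliced(120, merged, offsets))  # 20
print(to_spliced(300, merged, offsets))  # None (falls in a spliced-out gap)
print(to_spliced(510, merged, offsets))  # 160
```

In this sketch a lesion that falls entirely in an unobservable region simply has no spliced coordinate, so it cannot contribute to the test for that platform.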

We have previously shown that GRIN typically has greater statistical power than does GISTIC.6 Accordingly, GRIN identified a broader region, and more loci of potential interest on 6q14-15, than GISTIC (Supplementary Figure 8). GRIN identified a deletional SCNA at chr 6q (64.4-116.8 Mb, 286 annotated genes) in at least 10 subjects, with a central region deleted in at least 39 cases (79.5-97.1 Mb, 102 genes). This deletion is significantly associated with the LMO1/2-rearranged subtype (30% of cases), TAL1 (29%) and TLX3 (23%; P=0.001), strongly suggesting that gene(s) deleted in this region cooperate in leukemogenesis.

We next used RNA-seq gene expression data to refine the initial empirical prioritization of candidate loci. To evaluate the association of each type of lesion with expression for each gene, the Wilcoxon rank-sum test9 was used to compare the RNA-seq FPKM expression values of subjects with the specific type of lesion (mutation, fusion, copy number gain, copy number loss) with those of subjects with no detected lesion. For each type of lesion, the q-value7 with the p-value moment-based estimator8 was used to characterize the false discovery rate of performing multiple tests across many genes. These analyses were only performed for genes with available FPKM data and at least 3 subjects in each group for differential expression analysis. Among the 102 genes in the chr 6q deletion noted above, 38 genes showed significant underexpression in subjects with the deletion compared to subjects with no detectable lesion of any type (q < 0.10). This suggests that alteration of multiple loci in this region contributes to leukemogenesis.
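The two statistical steps described above (a per-gene rank-sum test followed by q-values across genes) can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes a normal approximation without tie correction for the rank-sum test, and it assumes that the moment-based estimate of the proportion of true null hypotheses takes the form min(1, 2 x mean p-value); both choices, the function names and the toy values are assumptions.

```python
import math

def rank_sum_p(lesion, normal, min_n=3):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie
    correction). Returns None if either group has fewer than min_n values."""
    n1, n2 = len(lesion), len(normal)
    if n1 < min_n or n2 < min_n:
        return None
    pooled = sorted(lesion + normal)
    rank, i = {}, 0
    while i < len(pooled):                  # midranks: ties share the mean rank
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2
        i = j
    w = sum(rank[x] for x in lesion)        # rank sum of the lesion group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(w - mean) / sd / math.sqrt(2))

def q_values(pvals):
    """q-values using pi0 = min(1, 2 * mean(p)) (assumed estimator form)."""
    m = len(pvals)
    pi0 = min(1.0, 2.0 * sum(pvals) / m)
    order = sorted(range(m), key=lambda k: pvals[k])
    q, running = [0.0] * m, 1.0
    for steps, k in enumerate(reversed(order)):
        i = m - steps                       # 1-based rank of pvals[k]
        running = min(running, pi0 * m * pvals[k] / i)
        q[k] = running                      # enforce monotone q-values
    return q

# Toy FPKM values (hypothetical): cases with a loss vs. cases with no lesion.
p = rank_sum_p([1.0, 1.5, 2.0], [5.0, 6.0, 7.0, 8.0])
too_small = rank_sum_p([1.0, 2.0], [3.0, 4.0, 5.0])   # None: group under 3
q = q_values([0.01, 0.02, 0.03, 0.5])
```

With groups as small as the minimum of 3, an exact rank-sum test would normally be preferable to the normal approximation used in this sketch.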

Genomic determinants of outcome

T-ALL has often been associated with an outcome inferior to that of B-progenitor ALL, but the biological reasons for this have been unclear. ETP ALL was associated with very poor outcome in multiple cohorts, but this has been at least partly mitigated by intensive therapy in recent studies.10 Multiple studies have reported conflicting associations between CDKN2A/B and NOTCH1/FBXW7 status and outcome. The current study examined a single cohort of patients treated with a COG augmented Berlin-Frankfurt-Münster-based chemotherapy regimen and randomized administration of nelarabine.11 We observed multiple associations between genetic subgroups, genetic pathways, individual genetic lesions and outcome, as assessed by measurement of levels of minimal residual disease (MRD) 29 days after commencement of therapy, relapse, overall survival and event-free survival (Supplementary Tables 20-23). LMO2, LYL1 and TLX3 overexpression were each associated with inferior early treatment response (detection of any level of MRD), whereas TAL1 and NKX2-1 subtypes exhibited low rates of MRD positivity (Supplementary Table 20). Cell cycle pathway (and CDKN2A/B) alterations and ribosomal processing alterations were associated with favorable MRD responses, whereas JAK-STAT and Ras signaling pathway alterations were unfavorable features. FLT3, IKZF1, KDM6A, NRAS, NUP98, TCF7 and WT1 alterations were associated with inferior MRD outcome, with the rate of MRD positivity three times as high in WT1-mutant cases (63.3%) as in wild-type cases (20.6%). In contrast, LEF1-, MYB- and RPL10-mutated cases had favorable MRD responses. The incidence of relapse was only 7.5% (95% CI ±1.7%) in this cohort, and the only univariable associations with relapse identified were with AKT1, MLLT10, CNOT3 and PTEN alterations (Supplementary Table 21). No associations between NOTCH1/FBXW7 mutations and outcome were observed, either as individual genes or as combinations of NOTCH1 mutation subtypes and FBXW7 mutations, with the exception of the small number of cases with NOTCH1 HD, PEST and FBXW7 mutation, which had unfavorable survival (Supplementary Tables 22-23).

SUPPLEMENTARY TABLES

Tables are provided in the Excel workbook.

Supplementary Table 1. Details of patient cohort.

Supplementary Table 2. Clinical and laboratory characteristics of cases studied and not studied from the AALL0434 cohort.

Supplementary Table 3. Details of exome coverage.

Column header definitions:
Sample: Patient identifier
tissue_code: D, diagnosis; G, germline
perc_cds_ge_20x_cvg: Percentage of coding regions of genes covered at 20-fold or greater
reads_in_total: Total number of reads generated
reads_mapped: Number of reads mapped
reads_nondup_mapped: Non-duplicate reads mapped
Mpd%: Percentage of reads mapped
nondup_Mpd%: Percentage of non-duplicate reads mapped

Supplementary Table 4. Details of RNA sequencing coverage.

Column header definitions:
Sample: Patient identifier
#gene_ge_10x_cvg: Genes covered at 10-fold depth or greater
#gene_ge_20x_cvg: Genes covered at 20-fold depth or greater
Total_Reads: Total number of sequencing reads
Mapped: Number of mapped reads
NonDupMapped: Number of non-duplicate mapped reads
Mpd%: Percentage of reads mapped

Supplementary Table 5. RNAseq FPKM gene expression matrix

Supplementary Table 6. Variants subjected to verification sequencing.

Column header definitions:
Sample: Sample identifier
Gene: Entrez gene symbol
Type: Type of nucleotide mutation
maf_T: Mutant allele fraction in tumor
maf_G: Mutant allele fraction in germline
Chr: Chromosome
Pos: Genomic position (hg19)
Class: Protein coding consequence of mutation
mut_T: Number of mutant reads in tumor identified by verification sequencing
total_T: Total number of reads in tumor identified by verification sequencing
mut_G: Number of mutant reads in germline identified by verification sequencing
total_G: Total number of reads in germline identified by verification sequencing
Ref: Reference nucleotide sequence
Mut: Mutant nucleotide sequence
mRNA_acc: Refseq transcript ID
AAChange: Amino acid change
Pass Verification: Mutation passed verification
Comment: Comment

Supplementary Table 7. Association between immunophenotype and transcription factor subgroup.

Supplementary Table 8. Sequence mutations identified by exome sequencing.

Column header definitions:
Gene: Entrez gene symbol
Refseq: Refseq RNA transcript ID
Chr: Chromosome
Start: Genomic position at site of mutation
aa_change: Coding consequence of mutation
class: Mutation type
Sample: Sample ID
Subgroup: T-ALL genetic subtype

Supplementary Table 9. Driver mutations.

Column header definitions:
Gene: Entrez gene symbol
#Sample: Number of samples with mutations
#mut: Number of mutations identified
Medals: Summary of medals assigned for each gene in Medal Ceremony analysis. Gold, Silver, Bronze and Unknown are four tiers of driver classification, in decreasing order of significance.
MuSigCV: p-value from MutSigCV analysis. na, not significant.
MuSiC_fdr_fcpt: FDR calculated using Fisher’s Combined P-value Test from MuSiC analysis. na, not significant.
MuSiC_fdr_lrt: FDR calculated using Likelihood Ratio Test from MuSiC analysis. na, not significant.
MuSiC_fdr_ct: FDR calculated using Convolution Test from MuSiC analysis. na, not significant.

Supplementary Table 10. List of genes examined for germline mutations.

T-ALL gene: Gene was examined for germline mutations as a target of recurrent somatic mutations in this study.
NEJM paper: Gene studied in germline analysis of pediatric cancer, as reported in ref. 3.

Supplementary Table 11. Rearrangements identified from RNA-sequencing data.

Column header definitions:
gene_a(b): Gene involved in structural variation (SV)/rearrangement
refseq_a(b): Refseq identifier of gene_a(b)
chr_a(b): Chromosomal location of gene_a(b)
position_a(b): Genomic position (hg19) of SV in gene_a(b)
strand_a(b): Strand involved by rearrangement involving gene_a(b)
Subgroup: T-ALL subgroup

Supplementary Table 12. Concordance of cytogenetic and genomics for detection of transcription factor gene alterations

Supplementary Table 13. DNA copy number alterations in T-ALL

Curated table of DNA copy number segments identified by circular binary segmentation of SNP array data in ALL.

Column header definitions:
Case: Sample identifier
Chromosome: Chromosome
Start: Genomic start of segment of copy number alteration
End: Genomic end of segment of copy number alteration
N_genes: Number of genes in segment
log2_Ratio: log2 ratio of copy number state
Copy_number: Copy number of segment

Supplementary Table 14. GISTIC analysis of DNA copy number alteration data

Tables 14a and 14b show significant regions of deletion and gain, respectively.

Supplementary Table 15. Matrix of cases and genetic alterations

This table provides a comprehensive cross tabulation of cases, subgroup, gene expression and genetic alterations for the cohort.

Supplementary Table 16. Recurrently mutated pathways in T-ALL.

Listing of genes classified into functional pathways in T-ALL

Supplementary Table 17. Results of integrated DNA copy number, expression and mutation analysis.

The worksheet provides statistical results for each gene on one row. The meanings of the columns are provided below. The first four columns are the gene symbol and location. The next two columns are the number of “normal” cases that have no detected lesion in the locus of the indicated gene and the median expression of that gene for those cases (when available). Then, for each type of lesion (mutation, fusion, gain, loss), there are eight columns of results. The first column is the number of subjects with that type of lesion (for example, n.mut is the number of subjects with a mutation). The next two columns give the p-value and q-value for the test of overabundance of cases with a lesion (for example, p.n.mut and q.n.mut are the p-value and q-value for n.mut, respectively). The next three columns give the median expression among cases with the indicated lesion type and the p-value and q-value for differential expression between those cases and cases with no detected lesion (for example, mdn.expr.mut gives the median expression for cases with mutations, when expression data is available, and p.dex.mut and q.dex.mut give the p-value and q-value for differential expression). Analogous results are provided for fusions (suffix .fus), losses (suffix .loss), and gains (suffix .gain).

Column Name: Column Meaning
gene.symbol: Gene Symbol
chrom: Chromosome of Gene
loc.start: Start Location of Lesion of Gene
loc.end: End Location of Lesion of Gene
n.normal: No. of Cases without any detected lesion in this gene
mdn.expr.normal: Median Expression of this gene in those cases
n.mut: No. of Cases with mutation detected in this gene
p.n.mut: p-value for n.mut
q.n.mut: q-value for p.n.mut
mdn.expr.mut: Median Expression of this gene in cases with a mutation detected in this gene
p.dex.mut: p-value for Differential Expression between cases with a mutation and cases with no detected lesion
q.dex.mut: q-value for p.dex.mut
n.fus: No. of Cases with a detected fusion involving this gene
p.n.fus: p-value for n.fus
q.n.fus: q-value for p.n.fus
mdn.expr.fus: Median expression for cases with a fusion detected in this gene
p.dex.fus: p-value for differential expression between cases with a fusion and cases with no detected lesion
q.dex.fus: q-value for p.dex.fus
n.loss: number of cases with a loss affecting this gene
p.n.loss: p-value for n.loss
q.n.loss: q-value for p.n.loss
mdn.expr.loss: Median expression for cases with a loss detected in this gene
p.dex.loss: p-value for differential expression between cases with a loss and cases with no detected lesion
q.dex.loss: q-value for p.dex.loss
n.gain: number of cases with a gain affecting this gene
p.n.gain: p-value for n.gain
q.n.gain: q-value for p.n.gain
mdn.expr.gain: Median expression for cases with a gain detected in this gene
p.dex.gain: p-value for differential expression between cases with a gain and cases with no detected lesion
q.dex.gain: q-value for p.dex.gain

Supplementary Table 18. Gene-gene correlation matrix

Correlation coefficient and P values for association between genetic variants, pathways and expression subgroups.

Supplementary Table 19. Associations between mutated genes, pathways and T-ALL subgroups.

Rows are grouped by genetic and immunophenotype groups (ETP, near-ETP, non-ETP). Cases lacking immunophenotypic data are excluded from the ETP analyses. Columns are grouped first by pathways, then by outlier gene expression (e.g. “LMO2_25_EXP” refers to cases in the top quartile of LMO2 expression), then by individual lesions. Multiple combinations of NOTCH1 gene and NOTCH1 pathway (i.e., NOTCH1 and/or FBXW7) mutations were examined, alone and in combination, considering either all variants or only those considered clonal (mutant allele frequency, MAF>0.3). For example, “NOTCH_pathway MAF>0.3 PEST_FBXW7” refers to cases with clonal mutations in both the NOTCH1 PEST domain and FBXW7. “NOTCH1 gene any MAF HD_PEST” refers to cases with mutations of any MAF in both the heterodimerization and PEST domains of NOTCH1.
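The two thresholds mentioned in this description (top-quartile expression and MAF>0.3 for clonal variants) can be illustrated with a small sketch. The data, case labels, function names and the exact quartile definition are hypothetical; the workbook may have been built with a different quantile convention.

```python
import statistics

def top_quartile_cases(expr):
    """Return the set of cases whose expression exceeds the cohort's third
    quartile (computed with the statistics module's default method)."""
    q3 = statistics.quantiles(list(expr.values()), n=4)[2]
    return {case for case, value in expr.items() if value > q3}

def clonal_variants(variants, threshold=0.3):
    """Keep variants whose mutant allele frequency exceeds the threshold."""
    return [v for v in variants if v["maf"] > threshold]

# Toy expression values for one gene (hypothetical, not study data).
expr = {"case_A": 1.0, "case_B": 2.0, "case_C": 3.0, "case_D": 4.0,
        "case_E": 50.0, "case_F": 60.0, "case_G": 5.0, "case_H": 6.0}
outliers = top_quartile_cases(expr)

# Toy variant calls (hypothetical).
variants = [{"gene": "NOTCH1", "maf": 0.45},
            {"gene": "FBXW7", "maf": 0.12}]
clonal = clonal_variants(variants)
```

In this toy cohort, case_E and case_F are flagged as top-quartile outliers and only the NOTCH1 variant passes the clonality threshold.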

Supplementary Table 20. Associations between genetic alterations and elevated day 29 minimal residual disease

Supplementary Table 21. Associations between genetic alterations and incidence of relapse.

Supplementary Table 22. Associations between genetic alterations and overall survival.

Supplementary Table 23. Associations between genetic alterations and event-free survival.

SUPPLEMENTARY FIGURES

Supplementary Figure 1. Hierarchical clustering of RNA-seq data

Hierarchical clustering using Ward's minimum variance method of the top 500 most variably expressed genes from transcriptome sequencing data, with annotation of genetic alterations resulting in core transcription factor deregulation (top 7 rows, gray background with red bars denoting alteration), and transcription factor subgrouping (multicolored 8th bar).

Supplementary Figure 2. Recurrently mutated genes in T-ALL.

ProteinPaint12 visualizations of recurrently mutated genes in T-ALL. With the exception of ZFP36L2, structural variants (deletions, rearrangements) are not shown.

Supplementary Figure 3. Example of structure of BCL11B-TLX3 rearrangement

Wiggle plots showing transcription around the breakpoints of the BCL11B-TLX3 rearrangement in RNA-seq data. Case PASUGC is shown as an example, with its breakpoints marked by red/blue arrows. Breakpoints from the other 7 cases with BCL11B-TLX3 rearrangement are marked by black arrows.

Supplementary Figure 4. MYB rearrangements

Multiple mechanisms of MYB deregulation were identified in T-ALL. a, rearrangements juxtaposing the 5’ region of MYB to promoter/enhancer regions of rearrangement partners. b, rearrangements truncating a negative regulatory region at the 3’ end of MYB. c, MYB duplication.

Supplementary Figure 5. Transcription factor mutations in T-ALL.

Heatmap of mutations grouped by T-ALL subtype, showing only cases in each subgroup with alterations in transcription factor genes. Note that the copy number gains and losses in MYC refer to the NOTCH1-driven MYC enhancer (NME).

Supplementary Figure 6. GISTIC analysis of DNA copy number alterations

GISTIC significance peaks showing the most significant areas of (a) deletion and (b) gain in T-ALL. Data are provided in Supplementary Table 14.

Supplementary Figure 7. Chromosome 6q deletions

6q deletions identified in T-ALL, ordered by the proximal genomic boundary of the deletion. a, broad view showing the extent of deletion in all cases. b, zoomed view showing the only focal area of deletion, observed in three cases. The large deletions observed in the remaining cases suggest that concomitant deletion of multiple genes is likely important in leukemogenesis in many cases.

Supplementary Figure 8. Results of integrated analysis to examine genetic drivers of chromosome 6q deletion

The top panel shows alterations across the genome for all patients, and the next panel shows the results of significance testing from GISTIC, GRIN, expression and integrated GRIN-expression analyses. The third and fourth panels show analogous data for 6q, with GRIN identifying a broader region of significance than GISTIC. Representative genes in the region are shown: a gene with low expression and no difference between deleted and non-deleted cases (COL19A1), genes with reduced expression in deleted cases (CYB5R4, LYRM2, MAP3K7), and a gene with no expression in any case (POPDC3).

Supplementary Figure 9. NOTCH1 and FBXW7 mutations

The Venn diagrams show the overlap of cases with NOTCH1 and FBXW7 mutations, considering all NOTCH1 mutations (a) and NOTCH1 mutations stratified by location. HD, heterodimerization domain; PEST, domain rich in proline (P), glutamic acid (E), serine (S) and threonine (T).

Supplementary Figure 10. Correlation and clustering of genetic alterations

Hierarchical clustering, using Ward's minimum variance method, of pairwise gene correlation analysis showing gene-based associations between lesions in T-ALL. Only genes altered in more than 5 cases and significantly associated with at least one other gene were included. Significant association was defined as an absolute correlation coefficient greater than 0.1 and a p-value less than 0.05 from Spearman correlation analysis.
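The inclusion criteria above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: alterations are encoded as hypothetical 0/1 vectors per gene, the p-value uses a large-sample normal approximation (rho multiplied by the square root of n - 1) rather than whichever test the authors applied, and all names and toy data are invented. The subsequent Ward clustering is not shown.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho (Pearson correlation of ranks) with an approximate p-value."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    rho = sxy / math.sqrt(sxx * syy)
    # Assumption: two-sided p-value from the normal approximation for large n.
    p_value = math.erfc(abs(rho) * math.sqrt(n - 1) / math.sqrt(2))
    return rho, p_value


def significant_pairs(alterations, min_cases=5, min_abs_rho=0.1, alpha=0.05):
    """Gene pairs passing the recurrence, |rho| and p-value thresholds."""
    n_cases = len(next(iter(alterations.values())))
    # Keep genes altered in more than min_cases cases; genes altered in every
    # case are skipped because a constant vector has no defined correlation.
    eligible = sorted(g for g, calls in alterations.items()
                      if min_cases < sum(calls) < n_cases)
    pairs = []
    for g1, g2 in combinations(eligible, 2):
        rho, p_value = spearman(alterations[g1], alterations[g2])
        if abs(rho) > min_abs_rho and p_value < alpha:
            pairs.append((g1, g2, rho, p_value))
    return pairs


# Toy 0/1 alteration calls for 20 hypothetical cases (not study data).
toy_alterations = {
    "GENE_A": [1] * 10 + [0] * 10,
    "GENE_B": [1] * 10 + [0] * 10,  # always co-occurs with GENE_A
    "GENE_C": [1, 0] * 10,          # unrelated to GENE_A and GENE_B
    "GENE_D": [1] * 3 + [0] * 17,   # altered in too few cases
}
pairs = significant_pairs(toy_alterations)
# Genes retained for clustering: those in at least one significant pair.
kept_genes = sorted({g for g1, g2, _, _ in pairs for g in (g1, g2)})
print(kept_genes)
```

In this toy set only GENE_A and GENE_B are retained; GENE_C is recurrent but uncorrelated, and GENE_D fails the recurrence threshold.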

Supplementary Figure 11. Chromosome 9 and ABL1 alterations in T-ALL

a, copy number data for 18 cases with copy number alterations and/or rearrangement of ABL1. These include cases with large copy number gains involving 9q, several of which have 9p-/9q+, consistent with the presence of an isochromosome 9q. Focal gain of 9q34 accompanying NUP214 rearrangement is indicated by an arrow. Case SJTALL002118 has an MBNL1-ABL1 rearrangement but no copy number alteration of 9q. Case SJTALL022447 has iso(9q) with a focal deletion of 9q that results in SET-NUP214 fusion. b, zoomed view of 9q34 showing focal copy number gains for cases with NUP214-ABL1 rearrangement (SJALL16001, SJTALL001786, SJTALL002009 and SJTALL002105) and case SJTALL022447, which has an interstitial deletion within a region of gain that results in SET-NUP214 fusion. c, log ratio copy number data for two cases with NUP214-ABL1 rearrangement, showing gain of the NUP214-ABL1 region.

Supplementary Figure 12. TAL1 promoter mutations in T-ALL

The figure shows insertion mutations in the TAL1 promoter. The top panel shows mutations corresponding to those previously described that introduce a MYB binding site. The four cases in the bottom panel show a novel single nucleotide insertion that does not introduce a MYB binding site.

Reference sequence, chr1:47704940-47704990:

CAGAGACATCTGCCAGGAAGTAGGGTTACGTCTTTCTGTGACCCTCAGTTT

Insertion caused acquisition of MYB binding motif

PATBGC CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PATBTX CAGAGACATCTGCCAGGAAGTAGGGTTAACGGATATAACCGTCTTTCTGTGACCCTCAGT

PARJAY CAGAGACATCTGCCAGGAAGTAGGGTTAACGGAATTTCTAACCGTCTTTCTGTGACCCTC

PARSJG CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PASYAJ CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PATRAB CAGAGACATCTGCCAGGAAGTAGGGCTAACGGCGTCTTTCTGTGACCCTCAGTTT

PAUBXP CAGAGACATCTGCCAGGAAGTAGGGTTAACCGTCTTTCTGTGACCCTCAGTTT

PATENL CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PARNXJ CAGAGACATCTGCCAGGAAGTAGGGTTAACGGTCTTTCTGTGACCCTCAGTTT

PASXSI CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PASNEH CAGAGACATCTGCCAGGAAGTAGGGTTAACCGTCTTTCTGTGACCCTCAGTTT

PAUAFN CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT

PARASZ CAGAGACATCTGCCAGGAAGTAGGGTTAACCGTCTTTCTGTGACCCTCAGTTT

PARWNW CAGAGACATCTGCCAGGAAGTAGGGTTAACCGTCTTTCTGTGACCCTCAGTTT

PASFKA CAGAGACATCTGCCAGGAAGTAGGGACCGTTAATCAACGTCTTTCTGTGACCCTCAGTTT

Insertion caused acquisition of TCF1/2 binding motif and loss of GMEB1 binding motif

PATEIT CAGAGACATCTGCCAGGAAGTAGGGTTAACGTCTTTCTGTGACCCTCAGTTT

PASMHF CAGAGACATCTGCCAGGAAGTAGGGTTAACGTCTTTCTGTGACCCTCAGTTT

PARJNX CAGAGACATCTGCCAGGAAGTAGGGTTAACGTCTTTCTGTGACCCTCAGTTT

PASYWF CAGAGACATCTGCCAGGAAGTAGGGTTAACGTCTTTCTGTGACCCTCAGTTT
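The inserted bases in each case can be recovered from the sequences listed above by trimming the prefix and suffix shared with the reference. The sketch below is illustrative only: the function name is hypothetical, offsets are 0-based positions within the displayed reference window, and the placement of an insertion within a repeated base is arbitrary (left-most here). It does not test for transcription factor binding motifs.

```python
# Reference window and two case sequences copied from the figure.
REFERENCE = "CAGAGACATCTGCCAGGAAGTAGGGTTACGTCTTTCTGTGACCCTCAGTTT"
PATBGC = "CAGAGACATCTGCCAGGAAGTAGGGTTAAACCGTCTTTCTGTGACCCTCAGTTT"
PATEIT = "CAGAGACATCTGCCAGGAAGTAGGGTTAACGTCTTTCTGTGACCCTCAGTTT"


def find_insertion(reference, mutant):
    """Return (offset, inserted_bases) if mutant is reference plus one
    contiguous insertion; otherwise return None."""
    shortest = min(len(reference), len(mutant))
    # Length of the longest shared prefix.
    prefix = 0
    while prefix < shortest and reference[prefix] == mutant[prefix]:
        prefix += 1
    # Length of the longest shared suffix that does not overlap the prefix.
    suffix = 0
    while (suffix < shortest - prefix
           and reference[-1 - suffix] == mutant[-1 - suffix]):
        suffix += 1
    # A pure insertion leaves the whole reference covered by prefix + suffix.
    if len(mutant) <= len(reference) or prefix + suffix != len(reference):
        return None
    return prefix, mutant[prefix:len(mutant) - suffix]


print(find_insertion(REFERENCE, PATBGC))  # top-panel case
print(find_insertion(REFERENCE, PATEIT))  # bottom-panel case
```

Sequences that are cut off at the right edge of the figure (for example PATBTX) no longer share the full suffix with the reference, so this simple approach returns `None` for them; handling those would require the untruncated reads.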

Supplementary Figure 13. NME alterations in T-ALL

DNA copy number heatmaps generated using dChip (ref. 13) showing alterations of chromosome 8q. The left-most 12 cases have trisomy 8 or large gains of 8q that are not associated with elevated MYC expression. The remaining cases have focal changes at or adjacent to the NOTCH1 MYC Enhancer (NME), amplification of which has been shown to facilitate and augment MYC expression (refs. 14,15). All amplifications include the NME region (hg18 coordinates chr8:130249247-130250154). Notably, in addition to the amplifications of the NME previously described in T-ALL, 9 cases had deletions adjacent to the NME. In two cases (PATMTV and PATDFE) the deletions were adjacent to a focal amplification that involved the NME region; in the remaining cases, the deletions were present without NME amplification, either proximal (N=2) or distal (N=5) to the NME. In one case (PATFJD) the focal deletion is within a region of broad gain. The deletions were also associated with elevated MYC expression (data not shown), so these lesions are also considered to be NME-deregulating alterations.
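The positional categories used in this legend (involving the NME, proximal to it, or distal to it) amount to a simple interval comparison against the stated hg18 coordinates. The following sketch assumes segments are given as start/end positions on chr8 in hg18; the function name and the example segments are hypothetical and are not taken from the study data.

```python
# NME region on chr8, hg18 coordinates as stated in the legend.
NME_START, NME_END = 130249247, 130250154


def classify_segment(seg_start, seg_end):
    """Classify a chr8 copy number segment by its position relative to the NME.

    On the q arm, coordinates increase away from the centromere, so a segment
    ending before the NME is proximal and one starting after it is distal.
    """
    if seg_end < NME_START:
        return "proximal"
    if seg_start > NME_END:
        return "distal"
    return "overlaps NME"


# Hypothetical segments for illustration only.
example_segments = {
    "segment_1": (130100000, 130200000),
    "segment_2": (130240000, 130260000),
    "segment_3": (130300000, 130400000),
}
labels = {name: classify_segment(*coords)
          for name, coords in example_segments.items()}
print(labels)
```

A segment labelled "overlaps NME" may contain the enhancer only partially; a stricter check for complete containment would compare both boundaries against `NME_START` and `NME_END`.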

Supplementary Figure 14. Concomitant signaling mutations in T-ALL

The figure shows mutant allele fraction (MAF) data for cases with JAK3 mutations, giving the MAF of both the JAK3 mutation and the second signaling mutation with the highest MAF. The tumors can be grouped into three classes (C1-C3) based on the MAFs of JAK3 and the second mutation.

Supplementary Figure 15. Epigenomic mutations in T-ALL

Heatmap shows the enrichment of USP7 mutations in TAL1-rearranged T-ALL, and PHF6 mutations in TLX1/3-rearranged T-ALL.

SUPPLEMENTARY REFERENCES

1. Iyer, M.K., Chinnaiyan, A.M. & Maher, C.A. ChimeraScan: a tool for identifying chimeric transcription in sequencing data. Bioinformatics 27, 2903-4 (2011).

2. MacLeod, R.A., Nagel, S., Kaufmann, M., Janssen, J.W. & Drexler, H.G. Activation of HOX11L2 by juxtaposition with 3'-BCL11B in an acute lymphoblastic leukemia cell line (HPB-ALL) with t(5;14)(q35;q32.2). Genes Chromosomes Cancer 37, 84-91 (2003).

3. Zhang, J. et al. Germline Mutations in Predisposition Genes in Pediatric Cancer. N Engl J Med 373, 2336-46 (2015).

4. Li, N. et al. Cyclin C is a haploinsufficient tumour suppressor. Nat Cell Biol 16, 1080-91 (2014).

5. Mermel, C.H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol 12, R41 (2011).

6. Pounds, S. et al. A genomic random interval model for statistical analysis of genomic lesion data. Bioinformatics 29, 2088-95 (2013).

7. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100, 9440-5 (2003).

8. Pounds, S. & Cheng, C. Robust estimation of the false discovery rate. Bioinformatics 22, 1979-87 (2006).

9. Wilcoxon, F. Individual comparisons by ranking methods. Biometrics 1, 80-83 (1945).

10. Patrick, K. et al. Outcome for children and young people with Early T-cell precursor acute lymphoblastic leukaemia treated on a contemporary protocol, UKALL 2003. Br J Haematol 166, 421-4 (2014).

11. Winter, S.S. et al. Safe integration of nelarabine into intensive chemotherapy in newly diagnosed T-cell acute lymphoblastic leukemia: Children's Oncology Group Study AALL0434. Pediatr Blood Cancer 62, 1176-83 (2015).

12. Zhou, X. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat Genet 48, 4-6 (2016).

13. Lin, M. et al. dChipSNP: significance curve and clustering of SNP-array-based loss-of-heterozygosity data. Bioinformatics 20, 1233-40 (2004).

14. Yashiro-Ohtani, Y. et al. Long-range enhancer activity determines Myc sensitivity to Notch inhibitors in T cell leukemia. Proc Natl Acad Sci U S A 111, E4946-53 (2014).

15. Herranz, D. et al. A NOTCH1-driven MYC enhancer promotes T cell development, transformation and acute lymphoblastic leukemia. Nat Med 20, 1130-7 (2014).
