
Supplementary Note Recurrent mutation of the ID3 gene in Burkitt Lymphoma identified by integrated genome, exome and transcriptome sequencing

Julia Richter1*, Matthias Schlesner2*, Steve Hoffmann3*, Markus Kreuz4*, Ellen Leich5*, Birgit Burkhardt6,7*, Maciej Rosolowski4, Ole Ammerpohl1, Rabea Wagener1, Stephan H. Bernhart3, Dido Lenze8, Monika Szczepanowski9, Maren Paulsen10, Simone Lipinski10, Robert B. Russell11, Sabine Adam-Klages12, Gordana Apic11, Alexander Claviez13, Dirk Hasenclever4, Volker Hovestadt14, Nadine Hornig1, Jan O. Korbel15, Dieter Kube16, David Langenberger3, Chris Lawerenz2, Jasmin Lisfeld7, Katharina Meyer17, Simone Picelli14, Jordan Pischimarov5, Bernhard Radlwimmer14, Tobias Rausch15, Marius Rohde7, Markus Schilhabel10, René Scholtysik18, Rainer Spang17, Heiko Trautmann19, Thorsten Zenz20,21,22, Arndt Borkhardt23, Hans G. Drexler24, Peter Möller25, Roderick A.F. MacLeod24, Christiane Pott19, Stephan Schreiber26, Lorenz Trümper16, Markus Loeffler4, Peter F. Stadler27, Peter Lichter14, Roland Eils2,28, Ralf Küppers18, Michael Hummel8#, Wolfram Klapper9#, Philip Rosenstiel10#, Andreas Rosenwald5#, Benedikt Brors2#, Reiner Siebert1#

on behalf of the German ICGC MMML-Seq-Project+

* These authors contributed equally to this work, # These authors jointly directed this work

1 Institute of Human Genetics, Christian-Albrechts-University, Kiel, Germany;
2 Deutsches Krebsforschungszentrum Heidelberg (DKFZ), Division Theoretical Bioinformatics, Heidelberg, Germany;
3 Transcriptome Bioinformatics, LIFE Research Center for Civilization Diseases, University of Leipzig, Leipzig, Germany;
4 Institute for Medical Informatics Statistics and Epidemiology, University of Leipzig, Leipzig, Germany;
5 Institute of Pathology, University of Wuerzburg, Wuerzburg, Germany;
6 University Hospital Münster - Pediatric Hematology and Oncology, Muenster, Germany;
7 University Hospital Giessen - Pediatric Hematology and Oncology, Giessen, Germany;
8 Institute of Pathology, Charité – University Medicine Berlin, Berlin, Germany;
9 Hematopathology Section, Christian-Albrechts-University, Kiel, Germany;
10 Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany;
11 Cell Networks, Bioquant, University of Heidelberg, Heidelberg, Germany;
12 Institute of Immunology, Christian-Albrechts-University, Kiel, Germany;
13 Department of Pediatrics, University Hospital Schleswig-Holstein, Campus Kiel, Kiel, Germany;
14 Deutsches Krebsforschungszentrum Heidelberg (DKFZ), Division Molecular Genetics, Heidelberg, Germany;
15 European Molecular Biology Laboratory (EMBL), Genome Biology Research Unit, Heidelberg, Germany;
16 Department of Hematology and Oncology, Georg-Augusts-University of Göttingen, Göttingen, Germany;
17 Institute of Functional Genomics, University of Regensburg, Regensburg, Germany;
18 Institute of Cell Biology (Cancer Research), University of Duisburg-Essen, Medical School, Essen, Germany;
19 Department of Internal Medicine II: Hematology and Oncology, University Medical Centre, Campus Kiel, Kiel, Germany;
20 Department of Medicine V, University of Heidelberg, Heidelberg, Germany;
21 Department of Translational Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany;
22 Department of Medicine III, Ulm University, Ulm, Germany;
23 Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine-University, Düsseldorf, Germany;
24 Department of Human and Animal Cell Cultures, German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany;
25 Institute of Pathology, Medical Faculty of the Ulm University, Ulm, Germany;
26 Department of General Internal Medicine, Christian-Albrechts-University, Kiel, Germany;
27 Bioinformatics Group, Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany;
28 Institute of Pharmacy and Molecular Biotechnology, Bioquant, University of Heidelberg, Heidelberg, Germany

Nature Genetics: doi:10.1038/ng.2469

Supplementary Note: Contents

1. List of members
   Full list of members of the ICGC MMML-Seq
   Full list of members of the MMML
2. Analyzed Materials
3. Supplementary Tables
   Table S1: Clinical and molecular characteristics of four pediatric Burkitt lymphoma subjected to genome, exome, transcriptome and methylome sequencing.
   Table S2: Statistics of whole-genome sequencing
   Table S3: Statistics of whole-exome sequencing
   Table S4: Statistics of whole-transcriptome sequencing
   Table S6: Comparison of SNPs called by the SNV pipeline with SNP Array 6.0 (Affymetrix) data
   Table S7: Summary of somatic SNVs identified by genome and exome sequencing
   Table S8: Summary of somatic structural aberrations identified by genome and exome sequencing
   Table S10: Combined whole-exome and whole-genome data identified high confidence somatic mutations of MYC and TP53 and detected previously known mutations.
   Table S11: Validation of somatic SNVs using Sanger sequencing.
   Table S15: ID3 mutations at hotspots of somatic hypermutation machinery
   Table S16: ID3 mutation pattern as well as wildtype allele were identified by 454 deep sequencing.
   Table S17: Analysis of B-cell lymphoma cell lines.
   Table S18: Mutation analysis of ID3, CCND3, RHOA and NHLH1 in cell lines.
   Table S19: Correlation of ID3 mutation status with clinical data.
   Table S20: Characteristics of the six non-mBL with ID3 mutations
   Table S21: Predicted effect of ID3 mutations suitable for modeling on the TCF3/TCF4 interaction.
4. Supplementary Figures
   Figure S1: Coverage and BAF plots of whole-genome sequencing of all four index samples.
   Figure S2: Complex rearrangement of chromosome 1q in BL1 and BL4.
   Figure S3: Mutational spectrum of all detected SNVs.
   Figure S4: Heatmap of the expression pattern of mutated genes in mBLs, Intermediates, non-mBLs and in germinal centre cells
   Figure S5: Biallelic mutation in the ID3 gene were identified in BL4 verified by Sanger sequencing
   Figure S6: ID3 mutations identified by Sanger sequencing of IG-MYC positive B-cell lymphoma and B-cell lymphoma cell lines.
   Figure S7: Biallelic involvement was identified on the genomic (DNA) and on the RNA (cDNA) level by (RT-)PCR followed by Sanger sequencing.
   Figure S8: Distribution of mutations over the ID3 gene skewed to the functional parts of the protein.
   Figure S9: Frequencies of distinctive ID3 mutation patterns as well as wildtype allele fractions identified by deep sequencing.
   Figure S10: Western blot analysis identified cell lines with complete loss of ID3 protein expression.
   Figure S11: Comparison of ID3 mutated and unmutated IG-MYC positive B-cell lymphoma.
   Figure S12: Characteristics of ID3 mutated and unmutated proto-typic mBLs.
   Figure S13: ID3 mutated and unmutated index samples share similar ID3 DNA methylation patterns.
   Figure S14: 450K Illumina methylation array and Bisulfite-Pyrosequencing analysis of the ID3 locus showed no differences between ID3 mutated and unmutated samples.
   Figure S15: Changes in 18q were associated with ID3 wildtype status in proto-typic mBL.
   Figure S16: TCF4 expression is increased in ID3 wildtype mBL with 18q gain in comparison to ID3 mutated mBL.
   Figure S17: Model of the interaction between ID3 and TCF4 together with a DNA fragment.
   Figure S18: ID3 mutations affected highly conserved regions.
Supplementary References


1. List of Members

Full List of Members of the ICGC MMML-Seq

Kiel: Ole Ammerpohl1, Alexander Claviez2, Daniela Esser3, Nadine Hornig1, Wolfram Klapper4, Christiane Pott5, Gesine Richter1, Julia Richter1, Philip Rosenstiel3, Markus Schilhabel3, Stefan Schreiber6, Reiner Siebert1, Georg Hemmrich-Stanisak3, Monika Szczepanowski4, Heiko Trautmann5, Rabea Wagener1;
Göttingen: Sonja Ebert7, Dieter Kube7, Katharina Meyer7, Lorenz Trümper7;
Giessen: Birgit Burkhardt8,9, Jasmin Lisfeld9, Marius Rohde9;
Heidelberg: Benedikt Brors10, Jürgen Eils10, Roland Eils10,11, Barbara Hutter10, Natalie Jäger10, Jan Korbel12, Chris Lawerenz10, Peter Lichter10, Bernhard Radlwimmer10, Sylwester Radomski10, Matthias Schlesner10, Ingrid Scholz10, Thorsten Zenz13,14;
Ulm: Peter Möller15;
Berlin: Dido Lenze16, Michael Hummel16;
Wuerzburg: Ellen Leich17, Jordan Pischimarov17, Andreas Rosenwald17;
Duesseldorf: Vera Binder18, Arndt Borkhardt18, Jessica Hoell18, Kebria Hezaveh18;
Leipzig: Stephan H. Bernhart19, Hans Binder20, Steve Hoffmann19, Lydia Hopp20, Markus Kreuz21, David Langenberger19, Markus Loeffler21, Maciej Rosolowski21, Peter F. Stadler22;
Frankfurt: Martin-Leo Hansmann23;
Essen: Ralf Küppers24, Marc Weniger24, Rene Scholtysik24;
Regensburg: Rainer Spang25

1 Institute of Human Genetics, Christian-Albrechts-University Kiel, Germany;
2 Department of Pediatrics, University Hospital Schleswig-Holstein, Campus Kiel, Germany;
3 Institute of Clinical Molecular Biology, Christian-Albrechts-University, Kiel, Germany;
4 Hematopathology Section, Christian-Albrechts-University, Kiel, Germany;
5 Department of Internal Medicine II: Hematology and Oncology, University Medical Centre, Campus Kiel, Germany;
6 Department of General Internal Medicine, Christian-Albrechts-University, Kiel, Germany;
7 Department of Hematology and Oncology, Georg-Augusts-University of Göttingen, Göttingen, Germany;
8 University Hospital Münster - Pediatric Hematology and Oncology, Germany;
9 University Hospital Giessen - Pediatric Hematology and Oncology, Germany;
10 Deutsches Krebsforschungszentrum Heidelberg (DKFZ), Division Theoretical Bioinformatics, Heidelberg, Germany;
11 Institute of Pharmacy and Molecular Biotechnology, Bioquant, University of Heidelberg, Heidelberg, Germany;
12 EMBL Heidelberg, Genome Biology, Heidelberg, Germany;
13 Department of Medicine V, University of Heidelberg, Heidelberg, Germany;
14 Department of Translational Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany;
15 Institute of Pathology, Medical Faculty of the Ulm University, Ulm, Germany;
16 Institute of Pathology, Charité – University Medicine Berlin, Germany;
17 Institute of Pathology, University of Wuerzburg, Germany;
18 Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine-University, Düsseldorf, Germany;
19 Transcriptome Bioinformatics, LIFE Research Center for Civilization Diseases, Leipzig, Germany;
20 Interdisciplinary Center for Bioinformatics, University Leipzig, Germany;
21 Institute for Medical Informatics Statistics and Epidemiology, Leipzig, Germany;
22 Bioinformatics Group, Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, Leipzig, Germany;
23 Senckenberg Institute of Pathology, University of Frankfurt Medical School, Frankfurt am Main, Germany;
24 Institute of Cell Biology (Cancer Research), University of Duisburg-Essen, Essen, Germany;
25 Institute of Functional Genomics, University of Regensburg, Germany.


Full List of Members of the MMML

Pathology group: Thomas F.E. Barth1, Heinz-Wolfram Bernd2, Sergio B. Cogliatti3, Alfred C. Feller2, Martin L. Hansmann4, Michael Hummel5, Wolfram Klapper6, Peter Möller1, Hans-Konrad Müller-Hermelink7, German Ott7, Andreas Rosenwald7, Harald Stein5, Monika Szczepanowski6, Hans-Heinrich Wacker6.
Genetics group: Thomas F.E. Barth1, Petra Behrmann8, Peter Daniel10, Judith Dierlamm8, Eugenia Haralambieva7, Lana Harder11, Paul-Martin Holterhus12, Ralf Küppers13, Dieter Kube13, Peter Lichter14, Jose I. Martín-Subero11, Peter Möller1, Eva M. Murga-Peñas9, German Ott7, Christiane Pott16, Armin Pscherer15, Andreas Rosenwald7, Carsten Schwaenen17, Reiner Siebert11, Heiko Trautmann16, Martina Vockerodt18, Swen Wessendorf16.
Bioinformatics group: Stefan Bentink19, Hilmar Berger20, Dirk Hasenclever20, Markus Kreuz20, Markus Loeffler20, Maciej Rosolowski20, Rainer Spang19.
Project coordination: Benjamin Stürzenhofecker14, Lorenz Trümper14, Maren Wehner14.
Steering committee: Markus Loeffler19, Reiner Siebert11, Harald Stein5, Lorenz Trümper14.

1 Institute of Pathology, University Hospital of Ulm, Germany;
2 Institute of Pathology, University Hospital Schleswig-Holstein Campus Lübeck, Germany;
3 Institute of Pathology, Kantonsspital St. Gallen, Switzerland;
4 Institute of Pathology, University Hospital of Frankfurt, Germany;
5 Institute of Pathology, Campus Benjamin Franklin, Charité–Universitätsmedizin Berlin, Germany;
6 Institute of Hematopathology, University Hospital Schleswig-Holstein Campus Kiel/Christian-Albrechts University Kiel, Germany;
7 Institute of Pathology, University of Würzburg, Germany;
8 Cytogenetic and Molecular Diagnostics, Internal Medicine III, University Hospital of Ulm, Germany;
9 University Medical Center Hamburg-Eppendorf, Hamburg, Germany;
10 Department of Hematology, Oncology and Tumor Immunology, University Medical Center Charité, Germany;
11 Institute of Human Genetics, University Hospital Schleswig-Holstein Campus Kiel/Christian-Albrechts University Kiel, Germany;
12 Division of Pediatric Endocrinology and Diabetes, Department of Pediatrics, University Hospital Schleswig-Holstein Campus Kiel/Christian-Albrechts University Kiel, Germany;
13 Institute for Cell Biology (Tumor Research), University of Duisburg-Essen, Germany;
14 Department of Hematology and Oncology, Georg-August University of Göttingen, Germany;
15 German Cancer Research Center (DKFZ), Heidelberg, Germany;
16 Second Medical Department, University Hospital Schleswig-Holstein Campus Kiel/Christian-Albrechts University Kiel, Germany;
17 Cytogenetic and Molecular Diagnostics, Internal Medicine III, University Hospital of Ulm, Germany;
18 Department of Pediatrics I, Georg-August University of Göttingen, Germany;
19 Institute of Functional Genomics, University of Regensburg, Germany;
20 Institute for Medical Informatics, Statistics and Epidemiology, University of Leipzig, Germany.


2. Analyzed Materials

A. Burkitt Lymphomas subjected to next generation sequencing

For all four patients, pre-treatment tumor tissue and germline material (peripheral blood, buffy coats) were studied with informed consent of the respective parents and of the patients (if older than 14 years). The ICGC Malignant Lymphoma study was approved by the Institutional Review Board of the Medical Faculty of the University of Kiel (A150/10) and by the review boards of the recruiting centers.

B. Validation cohort from the MMML consortium

A cohort of 103 previously characterized mature aggressive B-cell lymphomas from the Deutsche Krebshilfe Network "Molecular Mechanisms in Malignant Lymphomas" (MMML), in which previous studies had identified an IG-MYC fusion, was used for validation. These cases have been extensively characterized by histopathology review, immunohistochemistry, interphase FISH, gene expression profiling, array-CGH/SNP arrays and mutation analysis, as described previously (refs. 1-9). The protocols of the MMML network have been approved by central (University of Göttingen) and local review boards (Institutional Review Board of the Medical Faculty of the University of Kiel, D403/05). In addition to cases included in previous publications (GEO Accessions: GSE4475, GSE10172, GSE22470), a total of 37 hitherto unreported IG-MYC positive cases, characterized according to the MMML protocols, were studied herein (refs. 2,7,9).

C. B-cell lymphoma cell lines

All cell lines were obtained from the German Cell Culture Collection (DSMZ, Braunschweig, Germany). Cell line authenticity was confirmed by STR analysis using the StemElite ID System (Promega, Mannheim, Germany). The ID3 gene was sequenced in all cell lines. Based on the results of Sanger sequencing, cell lines were selected for ID3 RT-PCR and Western blot analysis. Furthermore, CCND3 mutation analysis was performed on most of the cell lines. Based on the results of the ID3 mutation analysis, four cell lines were selected for functional analyses.


3. Supplementary Tables

Supplementary Table 1: Clinical and molecular characteristics of four pediatric Burkitt lymphoma patients subjected to genome, exome, transcriptome and methylome sequencing.

| patient code | BL1 | BL2 | BL3 | BL4 |
|---|---|---|---|---|
| patient ID | 4182393 | 4119027 | 4125240 | 4190495 |
| age at diagnosis (years) | 11 | 12 | 4 | 15 |
| gender | male | male | male | male |
| diagnosis | BL, sporadic | BL, sporadic | BL, sporadic | BL, sporadic |
| GEP | GCB | GCB | GCB | NA |
| molecular diagnosis (according to ref. 1) | molecular Burkitt lymphoma | molecular Burkitt lymphoma | molecular Burkitt lymphoma | NA |
| Pathway activation pattern (according to ref. 6) | BL-PAP | BL-PAP | BL-PAP | NA |
| CD20 | +++ | +++ | +++ | +++ |
| CD10 | +++ | +++ | +++ | +++ |
| CD5 | - | - | - | - |
| BCL2 | - | - | - | +/- |
| BCL6 | +++ | +++ | ++ | +++ |
| MUM1 | +++ | - | - | NA |
| HLA-DR | ++ | +++ | +++ | NA |
| Ki67 (in %) | 100 | 99 | 97 | 95 |
| MYC translocation | IGH-MYC | IGH-MYC | IGH-MYC | IGL-MYC |
| t(14;18)/BCL2 break | neg | neg | neg | neg |
| BCL6 break | neg | neg | neg | neg |
| MALT1 break | neg | neg | neg | NA |
| Treatment / Clinical Trial | B-NHL-BFM 04 | B-NHL-BFM 04 | NHL-BFM 95 | B-NHL-BFM 04 |
| Murphy Stage | III | III | II | III |
| Karnofsky Stage | 40-60% | 40-60% | 40-60% | 60-80% |
| B-symptoms | yes | no | no | no |
| bone marrow involvement | no | no | no | no |
| outcome | CR | CR | CR | R |

BL, Burkitt lymphoma; +++, high expression (> 75%); ++, moderate expression (51% - 75%); +, slight expression (25% - 50%); -, no expression (< 25%); BFM, Berlin-Frankfurt-Münster NHL trials; GEP, gene expression profile; GCB, germinal centre B cell signature; CR, complete remission; R, relapse; NA, not available
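The expression categories in the legend above amount to a simple threshold rule. The following Python sketch illustrates that reading; the function name `expression_score` is hypothetical, and assigning values between 50% and 51% to "++" is an assumption, because the legend leaves that gap undefined. The "+/-" entry reported for BCL2 in BL4 is not defined in the legend and is not modeled here.

```python
def expression_score(percent_positive: float) -> str:
    """Map a percentage of positive cells to the legend's symbols.

    Thresholds follow the Table 1 legend; the handling of values
    between 50% and 51% is an assumption (treated as "++").
    """
    if percent_positive > 75:
        return "+++"   # high expression (> 75%)
    if percent_positive > 50:
        return "++"    # moderate expression (51% - 75%)
    if percent_positive >= 25:
        return "+"     # slight expression (25% - 50%)
    return "-"         # no expression (< 25%)


for value in (80, 60, 30, 10):
    print(value, expression_score(value))
```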


Supplementary Table 2: Statistics of whole-genome sequencing

| patient code | tissue | total read count | total reads mapped (in %) | properly paired (in %) | singletons (in %) | aligned read count | pe insertsize (mapq>0) | qc bases / total bases w/o n | coverage (qc bases) |
|---|---|---|---|---|---|---|---|---|---|
| BL1 | germline | 1391968880 | 90.65 | 89.98 | 0.31 | 1261819789 | 269 | 119782926733/2861327131 | 41.86x |
| BL1 | tumor | 1156401981 | 90.88 | 89.03 | 0.45 | 1050938120 | 268 | 90030595847/2861327131 | 31.46x |
| BL2 | germline | 659371114 | 93.85 | 92.92 | 0.26 | 618819790 | 246 | 59032432335/2861327131 | 20.63x |
| BL2 | tumor | 760511023 | 94.02 | 93.35 | 0.32 | 715032463 | 222 | 64638321205/2861327131 | 22.59x |
| BL3 | germline | 1504280586 | 87.29 | 86.20 | 0.44 | 1313086523 | 235 | 123374717398/2861327131 | 43.12x |
| BL3 | tumor | 1162204055 | 88.68 | 83.80 | 0.78 | 1030642555 | 328 | 82971878082/2861327131 | 29.0x |
| BL4 | germline | 1195691065 | 91.05 | 90.44 | 0.30 | 1088676714 | 240 | 104086615244/2861327131 | 36.38x |
| BL4 | tumor | 1056601011 | 92.39 | 90.04 | 0.52 | 976193674 | 267 | 87709640067/2861327131 | 30.65x |

Total read count, number of reads obtained from sequencer; total reads mapped, percentage of reads which could be mapped to reference genome; properly paired, percentage of reads which form a proper pair on reference genome; singletons, percentage of reads for which no mate was aligned by the mapper; aligned read count, number of reads which were mapped to reference genome (product of total read count * total reads mapped (in %)/100); pe insertsize, average size of sequenced fragment calculated from proper pairs with mapping quality > 0; qc bases / total bases w/o n, number of sequenced bases which passed qc criteria / number of bases in reference genome without N's; coverage (calculated from values of previous column). Coverage calculations following duplicate removal considered all informative bases of the reference genome (excluding Ns). A mean Phred-scaled base quality of at least 25 across the length of the read was required. For target capture sequencing, only bases of reads overlapping on-target regions were considered for coverage calculations.
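The two derived columns described in this legend can be recomputed from the other columns. The Python sketch below does so for the BL1 germline row; the function names are hypothetical, and truncating the aligned read count to an integer is an assumption. Because the mapped percentage in the table is rounded to two decimals, recomputed read counts are approximations and need not match every row exactly.

```python
def aligned_read_count(total_reads: int, mapped_percent: float) -> int:
    """Total read count * total reads mapped (in %) / 100, truncated."""
    return int(total_reads * mapped_percent / 100)


def coverage(qc_bases: int, reference_bases_without_n: int) -> float:
    """Mean coverage: qc-passed bases per non-N reference base."""
    return qc_bases / reference_bases_without_n


# Values taken from the BL1 germline row of Supplementary Table 2.
REFERENCE_BASES_WITHOUT_N = 2_861_327_131
bl1_coverage = coverage(119_782_926_733, REFERENCE_BASES_WITHOUT_N)
bl1_aligned = aligned_read_count(1_391_968_880, 90.65)

print(f"coverage: {bl1_coverage:.2f}x")    # coverage: 41.86x
print(f"aligned reads: {bl1_aligned}")
```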


Supplementary Table 3: Statistics of whole-exome sequencing

| patient code | tissue | total read count | total reads mapped (in %) | properly paired (in %) | singletons (in %) | aligned read count | pe insertsize (mapq>0) | qc bases | on target ratio | on target coverage (qc bases) | qc bases on target / total bases on target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BL1 | germline | 77953512 | 93.41 | 91.42 | 0.26 | 72816375 | 154 | 5205781108 | 0.79 | 114.47x | 4116466904/35961919 |
| BL1 | tumor | 81836099 | 89.98 | 87.12 | 0.42 | 73636121 | 146 | 5248147699 | 0.80 | 117.29x | 4218060980/35961919 |
| BL2 | germline | 78898735 | 97.05 | 96.11 | 0.18 | 76571222 | 177 | 5499017560 | 0.75 | 114.61x | 4121504034/35961919 |
| BL2 | tumor | 130748000 | 95.28 | 93.83 | 0.2 | 124576694 | 177 | 8533713914 | 0.76 | 179.44x | 6452901137/35961919 |
| BL3 | germline | 80725816 | 97.34 | 96.33 | 0.16 | 78578509 | 180 | 5643736901 | 0.75 | 117.89x | 4239502035/35961919 |
| BL3 | tumor | 67987635 | 92.62 | 90.24 | 0.27 | 62970147 | 154 | 4504983139 | 0.81 | 101.27x | 3641874744/35961919 |
| BL4 | germline | 84771073 | 94.52 | 92.79 | 0.22 | 80125618 | 160 | 5738807862 | 0.79 | 125.3x | 4505865849/35961919 |
| BL4 | tumor | 74735636 | 82.42 | 77.29 | 0.47 | 61597111 | 130 | 4378155105 | 0.82 | 99.53x | 3579450691/35961919 |

Additional columns for exome sequencing: on target ratio, number of bases in reads which overlap the target region divided by all sequenced bases; qc bases on target / total bases on target, number of bases which passed qc criteria in reads which overlap target region / all bases of target region; on target coverage, calculated from the values of the column "qc bases on target / total bases on target". Coverage calculations following duplicate removal considered all informative bases of the reference genome (excluding Ns). A mean Phred-scaled base quality of at least 25 across the length of the read was required. For target capture sequencing, only bases of reads overlapping on-target regions were considered for coverage calculations.
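For the exome data, the on-target ratio and on-target coverage can be recomputed in the same spirit. This is a minimal Python sketch with hypothetical function names; it assumes that the unlabeled large count preceding the ratio in each row (5205781108 for BL1 germline) is the total number of qc-passed bases, an interpretation that is consistent with the reported ratio but is not stated explicitly in the legend.

```python
def on_target_ratio(qc_bases_on_target: int, qc_bases_total: int) -> float:
    """Fraction of qc-passed bases that lie in reads overlapping the target."""
    return qc_bases_on_target / qc_bases_total


def on_target_coverage(qc_bases_on_target: int, target_bases: int) -> float:
    """Mean coverage of the target region by qc-passed bases."""
    return qc_bases_on_target / target_bases


# Values taken from the BL1 germline row of Supplementary Table 3.
TARGET_BASES = 35_961_919
ratio = on_target_ratio(4_116_466_904, 5_205_781_108)
depth = on_target_coverage(4_116_466_904, TARGET_BASES)

print(f"on target ratio: {ratio:.2f}")       # on target ratio: 0.79
print(f"on target coverage: {depth:.2f}x")   # on target coverage: 114.47x
```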

Supplementary Table 4: Statistics of whole-transcriptome sequencing

| patient code | tissue | total read count | total reads mapped (in %) | properly paired (in %) | uniquely mapped (in %) | singletons (in %) |
|---|---|---|---|---|---|---|
| BL1 | tumor | 134646148 | 91.66 | 91.41 | 86.20 | 0.25 |
| BL2 | tumor | 122070424 | 91.37 | 91.09 | 87.3 | 0.28 |
| BL3 | tumor | 118552896 | 87.95 | 87.68 | 85.49 | 0.28 |
| BL4 | tumor | 126625502 | 92.04 | 91.62 | 84.46 | 0.42 |

total read count, number of reads obtained from sequencer; total reads mapped, percentage of reads which could be mapped to reference genome; properly paired, percentage of reads which form a proper pair on reference genome; uniquely mapped, percentage of reads which could be mapped uniquely to reference genome; singletons, percentage of reads for which no mate was aligned by the mapper.


Supplementary Table 6: Comparison of SNPs called by the SNV pipeline with SNP Array 6.0 (Affymetrix) data

Column groups: SNPChip6.0; NGS; SNPChip6.0 vs NGS (hc); Comparison SNP Array 6.0 vs NGS

| patient code | positions on array | SNVs called | not reference allele | array hom | array het | positions covered (>7 reads) | variant position | position called | matching alleles | matching hom | array hom, ngs het | matching het | array het, ngs hom | called (in %) | called (sg, in %) | called (hc, in %) | called (sg, hc, in %) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BL1 | 906380 | 895769 | 408666 | 176666 | 229165 | 894620 | 405965 | 400264 | 400134 | 175627 | 866 | 222697 | 944 | 99.34 | 98.85 | 97.94 | 97.47 |
| BL2 | 906380 | 900494 | 403671 | 179071 | 221921 | 895547 | 401172 | 376629 | 376458 | 175307 | 2937 | 197975 | 239 | 99.38 | 98.51 | 93.30 | 92.47 |
| BL3 | 906380 | 897467 | 407430 | 176984 | 227407 | 894597 | 404531 | 390534 | 390394 | 175864 | 806 | 213102 | 622 | 99.29 | 98.89 | 95.85 | 95.47 |
| BL4 | 906380 | 897770 | 405540 | 180207 | 222921 | 896257 | 403271 | 391628 | 391492 | 177972 | 1768 | 210955 | 797 | 99.44 | 98.75 | 96.57 | 95.90 |

array, SNP Array 6.0; hom, homozygous; het, heterozygous; NGS, combined data of whole-exome and whole-genome sequencing data (>7 reads); sg, same genotype; hc, high confidence: only SNP positions from the SNP array which were not homozygous for the reference allele in the SNP array and which were covered by at least seven reads in the tumor BAM file were included in the comparison.

Supplementary Table 7: Summary of somatic SNVs identified by genome and exome sequencing

Number of SNVs:

| patient code | exonic | non-exonic | total |
|---|---|---|---|
| BL1 | 34 | 1567 | 1601 |
| BL2 | 35 | 2420 | 2455 |
| BL3 | 32 | 2065 | 2097 |
| BL4 | 55 | 5122 | 5177 |

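Supplementary Table 8: Summary of somatic structural aberrations identified by genome and exome sequencing

| length (in bp) | patient code | deletions | duplications | insertions | inversions | replacements | translocations | total | total (per length class) |
|---|---|---|---|---|---|---|---|---|---|
| < 10 | BL1 | 167 | 0 | 56 | 0 | 5 | NA | 228 | 1072 |
| < 10 | BL2 | 157 | 0 | 46 | 0 | 15 | NA | 218 | |
| < 10 | BL3 | 140 | 0 | 38 | 0 | 6 | NA | 184 | |
| < 10 | BL4 | 330 | 0 | 104 | 0 | 8 | NA | 442 | |
| 10 - 50 | BL1 | 14 | 0 | 56 | 0 | 16 | NA | 86 | 222 |
| 10 - 50 | BL2 | 12 | 0 | 25 | 0 | 12 | NA | 49 | |
| 10 - 50 | BL3 | 6 | 0 | 20 | 0 | 9 | NA | 35 | |
| 10 - 50 | BL4 | 12 | 0 | 32 | 0 | 8 | NA | 52 | |
| > 50 / TX | BL1 | 17 | 7 | 7 | 4 | 3 | 4 | 42 | 139 |
| > 50 / TX | BL2 | 11 | 7 | 1 | 8 | 4 | 1 | 32 | |
| > 50 / TX | BL3 | 18 | 1 | 1 | 4 | 1 | 4 | 29 | |
| > 50 / TX | BL4 | 25 | 3 | 0 | 4 | 3 | 1 | 36 | |

NA, not available; TX, translocation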


Supplementary Table 10: Combined whole-exome and whole-genome data identified high confidence somatic mutations of MYC and TP53 and detected previously known mutations

| patient code | gene | chr | position | ref | alt | region | consequence | analyzed within MMML | SNV confirmed by MMML |
|---|---|---|---|---|---|---|---|---|---|
| BL1 | MYC | chr8 | 128749190 | G | C | intronic | NA | Y | Y |
| BL1 | MYC | chr8 | 128749254 | G | A | intronic | NA | Y | Y |
| BL1 | MYC | chr8 | 128749349 | G | A | intronic | NA | Y | Y |
| BL2 | MYC | chr8 | 128749349 | G | A | intronic | NA | Y | Y |
| BL2 | MYC | chr8 | 128749446 | T | G | intronic | NA | Y | Y |
| BL3 | MYC | chr8 | 128750598 | C | T | exonic | N45N | Y | Y |
| BL3 | MYC | chr8 | 128750921 | T | G | exonic | F153C | Y | Y |
| BL3 | TP53 | chr17 | 7578204 | A | C | exonic | S215R | Y | Y |
| BL2 | TP53 | chr17 | 7578415 | A | T | exonic | V172D | Y | Y |

chr, chromosome; ref, reference allele; alt, mutated allele; NA, not analyzable; Y, Yes. For BL4 no previous data were available.

Supplementary Table 11: Validation of somatic SNVs using Sanger sequencing

| gene | chr | position | ref/alt | consequence | patient code | mutation status |
|---|---|---|---|---|---|---|
| CCND3 | chr6 | 41903688 | A/T | I218K | BL3 | confirmed |
| CREBBP | chr16 | 3789616 | G/A | Q1377X | BL3 | confirmed |
| CSNK1A1 | chr5 | 148904672 | T/C | E98G | BL4 | confirmed |
| DPCR1 | chr6 | 30916653 | C/T | R138C | BL4 | confirmed |
| FBXO11 | chr2 | 48036837 | A/T | I783N | BL1 | confirmed |
| FBXO11 | chr2 | 48045966 | A/G | L653P | BL3 | confirmed |
| ID3 | chr1 | 23885677 | G/A | Q81X | BL1 | confirmed |
| ID3 | chr1 | 23885677 | G/A | Q81X | BL4 | confirmed |
| ID3 | chr1 | 23885616 | A/G | splicing | BL4 | confirmed |
| ID3 | chr1 | 23885511 | C/T | splicing | BL1 | confirmed |
| MEGF6 | chr1 | 3428569 | C/T | splicing | BL3 | confirmed |
| NHLH1 | chr1 | 160340768 | C/T | R83X | BL3 | confirmed |
| RHOA | chr3 | 49413009 | C/T | R5Q | BL2 | confirmed |
| RHOA | chr3 | 49413009 | C/T | R5Q | BL4 | confirmed |
| RHOA | chr3 | 49412955 | A/C | I23R | BL4 | confirmed |
| ROCK1 | chr18 | 18625301 | T/C | E181G | BL1 | questionable* |
| SIN3A | chr15 | 75715017 | G/A | Q113X | BL2 | confirmed |
| SMARCA4 | chr19 | 11134252 | G/A | R973Q | BL1 | confirmed |
| SMARCA4 | chr19 | 11105679 | T/C | splicing | BL3 | confirmed |
| TCF3 | chr19 | 1611845 | C/T | R606Q | BL4 | confirmed |

PID, patient ID; chr, chromosome; position, affected nucleotide position based on GRCh37/hg19; ref, reference allele; alt, variant allele; consequence, amino acid change introduced by SNV; splicing, splice-site is affected by SNV; *, variant allele frequency as determined from SNV calling was only 0.08 which is below the sensitivity limit of Sanger sequencing.


Supplementary Table 15: ID3 mutations at hotspots of somatic hypermutation machinery analyzed samples Mutations overlapping RGYW motif Mutations at G of RGYW motif

primary lymphoma observed mutations 64 observed mutations at G/C 48

observed mutations overlapping motif 35 observed mutations at G of motif 27

sequence length 607 occurrences of motif in sequence 40

bases overlapping motif 154 number of G/C in sequence 378

expected mutations overlapping motif 16.23723 expected mutations at G of motif 5.079

cell lines observed mutations 12 observed mutations at G/C 12

observed mutations overlapping motif 11 observed mutations at G of motif 8

sequence length 607 occurrences of motif in sequence 40

bases overlapping motif 154 number of G/C in sequence 378

expected mutations overlapping motif 3.044481 expected mutations at G of motif 1.269841

Mutations in the genomic region of the ID3 gene (chr1:23885428-23886034) were analyzed for a bias towards targeting the RGYW motif (or its reverse complement WRCY). Expected values were calculated by dividing the number of bases overlapping the motif by the number of all bases in the region and multiplying by the total number of mutations (or non-mutations) observed (left part), or by dividing the number of G/C in the motif by the number of all G/C in the region and multiplying by the total number of mutations (or non-mutations) observed at G/C positions (right part). In primary IG-MYC positive mature aggressive B-cell lymphomas, ID3 mutations were significantly enriched (P-value = 0.0054) at hotspots of the somatic hypermutation machinery (RGYW). Additionally, a significant proportion of mutations (N=27) hit the G of the RGYW motif (P-value = 7.7e-7). In concordance, ID3 mutations of cell lines were also significantly enriched at the RGYW motif (P-value = 0.02889).
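The expected counts in Supplementary Table 15 follow directly from the proportions described above. The short Python sketch below recomputes them from the tabulated inputs; the function and variable names are illustrative only, and the significance test behind the reported P-values is not reproduced because the text does not name it.

```python
def expected_hits(target_sites: int, all_sites: int, observed: int) -> float:
    """Expected number of mutations falling on the target sites if the
    observed mutations were spread uniformly over all sites."""
    return observed * target_sites / all_sites


# Inputs taken from Supplementary Table 15.
SEQUENCE_LENGTH = 607        # bases in chr1:23885428-23886034
BASES_IN_MOTIF = 154         # bases overlapping RGYW/WRCY
GC_IN_SEQUENCE = 378         # G/C positions in the region
G_OF_MOTIF = 40              # one G (or C on the other strand) per motif occurrence

# Primary lymphomas: 64 mutations in total, 48 of them at G/C positions.
primary_overlap = expected_hits(BASES_IN_MOTIF, SEQUENCE_LENGTH, 64)
primary_at_g = expected_hits(G_OF_MOTIF, GC_IN_SEQUENCE, 48)

# Cell lines: 12 mutations in total, all 12 at G/C positions.
cell_line_overlap = expected_hits(BASES_IN_MOTIF, SEQUENCE_LENGTH, 12)
cell_line_at_g = expected_hits(G_OF_MOTIF, GC_IN_SEQUENCE, 12)

print(round(primary_overlap, 5), round(primary_at_g, 3))
print(round(cell_line_overlap, 6), round(cell_line_at_g, 6))
```

The observed counts (35 and 27 in primary lymphomas, 11 and 8 in cell lines) can then be compared against these expectations.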


Supplementary Table 16: ID3 mutation patterns as well as wildtype alleles identified by 454 deep sequencing.

Columns: Sample ID; pattern; 454 read count; 454 read fraction; clone count; alleles at the variant positions.

Variant positions (column order): c.139_263del; c.143A>G; c.144C>A; c.146C>T; c.152T>A; c.190C>G,T; c.202G>A; c.205A>G; c.227A>G; c.232C>G; c.237_248del; c.241C>T; c.248T>G; c.300+1G>A,T.

For each sample, only the variant positions altered in that sample are listed; ".", reference allele.

18

normal 317 0.033 1 . . .

mutation pattern 1 5789 0.600 3 . . T

mutation pattern 2 3360 0.348 3 G G .

mutation pattern 3 66 0.007 0 . G .

mutation pattern 4 47 0.005 0 G . T

mutation pattern 5 44 0.005 1 G G T

91

normal 1079 0.132 0 . . . .

mutation pattern 1 4094 0.499 3 G . G del

mutation pattern 2 2760 0.337 1 . A . .

mutation pattern 3 99 0.012 0 G . . .

mutation pattern 4 44 0.005 0 . A G del

mutation pattern 5 42 0.005 0 . . G del

mutation pattern 6 19 0.002 0 G A . .

527

normal 2288 0.187 3 . . .

mutation pattern 1 6547 0.536 9 A T .

mutation pattern 2 2910 0.238 2 . . G

mutation pattern 3 426 0.035 1 A T G

mutation pattern 4 29 0.002 0 . T .

580

normal 1784 0.105 0 . .

mutation pattern 1 8067 0.475 6 . A

mutation pattern 2 7007 0.413 5 T .

mutation pattern 3 119 0.007 0 T A

586

normal 535 0.051 0 . .

mutation pattern 1 5178 0.493 6 . A

mutation pattern 2 3044 0.290 7 T T

mutation pattern 3 1226 0.117 2 T A

mutation pattern 4 501 0.048 0 T .

mutation pattern 5 23 0.002 0 . T

623

normal 2888 0.113 1 . . .

mutation pattern 1 10098 0.396 6 . . G

mutation pattern 2 9678 0.380 8 A G .

mutation pattern 3 953 0.037 0 A G G

mutation pattern 4 816 0.032 0 . G .

mutation pattern 5 686 0.027 0 A . G

mutation pattern 6 262 0.010 0 A . .

mutation pattern 7 88 0.003 0 . G G


clone count, number of clones that showed the same mutation pattern in the conventional subcloning experiments. Results of ID3 exon 1 sequencing from six primary Burkitt lymphoma samples using 454 technology. To eliminate sequencing errors, only variants previously identified by Sanger sequencing were included in this analysis. Additionally, only mutation patterns supported by a read fraction higher than the fraction of non-reference reads in the control samples were considered. The finding of multiple hits on the same allele was confirmed by a number of conventional subcloning experiments of the same cases (see column clone count).


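Supplementary Table 17: Analysis of B-cell lymphoma cell lines.

| cell line name | diagnosis | age | gender | ID3 Sanger sequencing | ID3 RT-PCR | ID3 Western blot | CCND3 Sanger sequencing | Functional analyses |
|---|---|---|---|---|---|---|---|---|
| BL-2 | BL | 7 | male | + | + | + | + | + |
| BL-41 | BL | 8 | male | + | NA | + | + | + |
| BL-70 | BL | 16 | male | + | + | + | + | |
| Blue-1 | BL | 29 | male | + | NA | + | + | |
| CA-46 | BL | na | male | + | + | + | + | |
| Daudi | BL | 16 | male | + | NA | NA | + | |
| DG-75 | BL | 10 | female | + | + | + | + | + |
| EB-1 | BL | 9 | female | + | NA | + | + | |
| HT | DLBCL | 70 | male | + | NA | + | + | |
| Karpas-422 | DLBCL | 73 | female | + | NA | + | + | |
| MC-116 | B-cell lymphoma | n.d. | n.d. | + | NA | NA | + | |
| Namalwa | BL | child | n.d. | + | NA | + | + | |
| Raji | BL | 12 | male | + | + | + | + | + |
| Ramos | BL | 3 | male | + | NA | NA | + | |
| SU-DHL-10 | B-NHL | 25 | male | + | NA | NA | NA | |
| SU-DHL-5 | B-NHL | 17 | female | + | + | + | + | |
| SU-DHL-6 | B-NHL | 43 | male | + | NA | + | + | |
| U-698-M | BL | 7 | male | + | NA | + | NA | |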

BL, Burkitt lymphoma; DLBCL, diffuse large B-cell lymphoma; B-NHL, B-cell non-Hodgkin lymphoma; NA, not analyzed; n.d., no data available; +, cell line was used for corresponding analysis.


Supplementary Table 18: Mutation analysis of ID3, CCND3, RHOA and NHLH1 in cell lines

| cell line | subtype | ID3 mutation | ID3 splice-site mutation | ID3 biallelic | ID3 RT-PCR | ID3 Western blot | CCND3 mutation | RHOA mutations | NHLH1 mutations |
|---|---|---|---|---|---|---|---|---|---|
| BL-2 | BL | | c.300G>A(sm), c.300+1G>C | | E1t-E2, E1-E2* | slight expression | wt | wt | wt |
| BL-41 | BL | c.[202G>C];[202G>C], p.[Glu68*];[Glu68*] | | Y | | no expression | c.798_846del (#), p.Pro267Leufs*21 | wt | wt |
| BL-70 | BL | c.139_264del, p.Cys47Pro*32 | c.300+1G>A | Y | E1t-E2, E1del-E2* | no expression | c.801_802insC, p.Lys268Glnfs*56 | wt | wt |
| Blue-1 | BL | c.236_240delACCTG, p.Asp79Alafs*13 | | | | expressed | c.869T>G, p.Ile290Arg | wt | wt |
| CA-46 | BL | c.160C>G, p.Leu54Val; c.190C>T, p.Leu64Phe | | | E1-E2 | strong expression | wt | wt | wt |
| Daudi | BL | c.160C>G, p.Leu54Val; c.241C>T, p.Gln81* | | | | | wt | wt | wt |
| DG-75 | BL | wt | | | E1-E2 | strong expression | c.811_812insG, p.Ser273Leufs*51 | wt | wt |
| EB-1 | BL | wt | | | | expressed | wt | wt | nd |
| Namalwa | BL | c.220_360+66del (#), p.Ile74Val*26 | | | | expressed | wt | wt | wt |
| Raji | BL | wt | | | E1-E2 | expressed | wt | wt | wt |
| Ramos | BL | wt | | | | | wt | nd | nd |
| U-698-M | BL | c.166C>T, p.Pro56Ser; c.233T>C, p.Leu78Pro | | | | slight expression | NA | nd | nd |
| MC-116 | B-cell lymphoma | | c.300G>A(sm), c.300+1G>A | | | | c.866_872del, p.Ile290Cysfs*12 | nd | nd |
| HT | DLBCL | wt | | | | no expression | wt | wt | wt |
| Karpas-422 | DLBCL | wt | | | | no expression | wt | wt | wt |
| SU-DHL-10 | B-NHL | wt | | | | | NA | nd | nd |
| SU-DHL-5 | B-NHL | wt | | | E1-E2* | slight expression | wt | nd | nd |
| SU-DHL-6 | B-NHL | wt | | | | slight expression | wt | nd | nd |

NA, not analyzable; nd, not done; wt, wildtype; Y, yes; sm, silent mutation affecting the last base of ID3 exon 1; E1-E2, normal splicing of exon 1 to exon 2; E1t, truncated exon 1; E1del, deletion in exon 1; *, verified by sequencing; (#), complex aberration. Mutation code of structural variants (e.g. deletions, duplications): owing to sequence homology between the start and the end of some structural mutations, two genomic codes (c.) are conceivable in some patients; both always resulted in the same amino acid change. Mutation analysis targeted exon 5 of CCND3 and the coding exons of RHOA (with the exception of parts of exon 4) and NHLH1. Exon 4 of RHOA could not be completely sequenced because of T-stretches directly before and behind exon 4. Using an internal sequencing primer, only the last part of exon 4 could be sequenced and interpreted for mutation analysis.


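Supplementary Table 19: Correlation of ID3 mutation status with clinical data

all, all molecular diagnoses (n=100); mBL, only mBL (lacking IGH-BCL2 fusion and BCL6 breaks; n=53). Values are number of cases (%).

| variable | category | all: ID3-mutated | all: ID3-unmutated | all: P-value | mBL: ID3-mutated | mBL: ID3-unmutated | mBL: P-value |
|---|---|---|---|---|---|---|---|
| gender | female | 12 (29%) | 22 (38%) | | 10 (28%) | 3 (18%) | |
| | male | 30 (71%) | 36 (62%) | 0.395 | 26 (72%) | 14 (82%) | 0.511 |
| CD10 (>0%: pos, =0%: neg) | neg | 2 (5%) | 13 (24%) | | 2 (6%) | 1 (6%) | |
| | pos | 38 (95%) | 42 (76%) | 0.021 | 32 (94%) | 15 (94%) | 1 |
| CD5 (>0%: pos, =0%: neg) | neg | 36 (90%) | 48 (92%) | | 30 (88%) | 14 (88%) | |
| | pos | 4 (10%) | 4 (8%) | 0.724 | 4 (12%) | 2 (11%) | 1 |
| BCL2 (>25%: pos, ≤25%: neg) | neg | 33 (85%) | 28 (50%) | | 29 (85%) | 12 (75%) | |
| | pos | 6 (15%) | 28 (50%) | < 0.001 | 5 (15%) | 4 (25%) | 0.442 |
| BCL6 (>25%: pos, ≤25%: neg) | neg | 0 (0%) | 12 (21%) | | 0 (0%) | 1 (6%) | |
| | pos | 39 (100%) | 44 (79%) | 0.001 | 33 (100%) | 16 (94%) | 0.34 |
| MUM1 (>25%: pos, ≤25%: neg) | neg | 22 (63%) | 41 (77%) | | 20 (67%) | 12 (75%) | |
| | pos | 13 (37%) | 12 (23%) | 0.155 | 10 (33%) | 4 (25%) | 0.739 |
| KI67 (≥90%: pos, <90%: neg) | neg | 2 (5%) | 24 (44%) | | 2 (6%) | 0 (0%) | |
| | pos | 38 (95%) | 31 (56%) | < 0.001 | 32 (94%) | 16 (100%) | 1 |
| MYC status | IG-MYC | 42 (100%) | 58 (100%) | | 36 (100%) | 17 (100%) | |
| t(14;18) | neg | 42 (100%) | 44 (76%) | | 36 (100%) | 17 (100%) | |
| | pos | 0 (0%) | 14 (24%) | < 0.001 | 0 (0%) | 0 (0%) | |
| BCL6 break | neg | 42 (100%) | 49 (84%) | | 36 (100%) | 17 (100%) | |
| | pos | 0 (0%) | 9 (16%) | 0.009 | 0 (0%) | 0 (0%) | |
| GCB ABC signature | ABC | 0 (0%) | 6 (10%) | | | | |
| | GCB | 36 (86%) | 48 (83%) | | 31 (86%) | 17 (100%) | |
| | unclassified | 6 (14%) | 4 (7%) | 0.051 | 5 (14%) | 0 (0%) | 0.163 |
| molecular diagnosis | intermediate | 6 (14%) | 23 (40%) | | | | |
| | mBL | 36 (86%) | 19 (33%) | | 36 (100%) | 17 (100%) | |
| | non-mBL | 0 (0%) | 16 (28%) | < 0.001 | | | |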

Clinical data of three index cases (BL4 was not analyzed in the forerunner project MMML) and of the validation cohort for which array-based gene expression data from the MMML study were available (N=100).


Supplementary Table 20: Characteristics of the six non-mBL with ID3 mutations

| patient code | ID3 status | age | gender | panel diagnosis | CD20 | CD10 | CD5 | BCL2 | BCL6 | MUM1 | HLADR | KI67 | MYC status | MYC break | t(8;14) | t(14;18) | BCL6 break | MALT1 break | GCB ABC signature | Molecular diagnosis (according to ref. 1) | Pathway activation pattern (according to ref. 6) | mBL signature index score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 569 | mut | 49 | male | DLBCL | 4 | 4 | 0 | 0 | 3 | NA | 4 | 95 | IG-MYC | pos | pos | neg | neg | neg | GCB | intermediate | mind-L | 0.927 |
| 275 | mut | 42 | female | DLBCL | 4 | 3 | 0 | 3 | 2 | 3 | 4 | 90 | IG-MYC | pos | pos | neg | neg | neg | unclassified | intermediate | BL-PAP | 0.894 |
| 574 | mut | 10 | male | BL | 4 | 4 | 0 | 0 | 2 | 0 | 4 | 99 | IG-MYC | pos | pos | neg | neg | neg | GCB | intermediate | mind-L | 0.923 |
| 619 | mut | 5 | female | BL | 4 | 4 | 0 | 0 | 2 | 0 | 4 | 90 | IG-MYC | pos | pos | neg | neg | neg | GCB | intermediate | mind-L | 0.505 |
| 593 | mut | 15 | male | BL | 4 | 4 | 0 | NA | 3 | 2 | 4 | 95 | IG-MYC | pos | pos | neg | neg | neg | GCB | intermediate | mind-L | 0.839 |
| 572 | mut | 15 | male | BL | 4 | 4 | 0 | 0 | 3 | 2 | 0 | 95 | IG-MYC | pos | pos | neg | neg | NA | GCB | intermediate | mind-L | 0.101 |

mut, mutated; 0, 0% of cells are positive; 1, 1-25% of cells are positive; 2, 26-50% of cells are positive; 3, 51-75% of cells are positive; 4, 76-100% of cells are positive; NA, not analyzed. With respect to characteristics of conserved oncogenic expression signatures, the cases belonged to different patterns of pathway activity (PAPs): mBL-specific PAP (BL-PAP), DLBCL-PAPs (PAP-1, 2, 3) and samples with a heterogeneous pattern, termed molecular individual lymphoma (mind-L) (ref. 6). According to Hummel et al. (ref. 9), lymphomas were classified by their mBL-signature index score as molecular Burkitt lymphoma (mBL, score greater than 0.95). In contrast, lymphomas with an mBL-signature index score of less than 0.05 were assigned to the non-mBL group. Cases with a score between 0.05 and 0.95 could not be assigned unambiguously to the mBL or non-mBL group and were assigned to the intermediate group, representing the transition zone between the mBL and non-mBL groups (ref. 9).
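The score-based assignment described in this legend can be written as a small decision rule. The Python sketch below is illustrative only: the function name is hypothetical, and treating scores of exactly 0.05 or 0.95 as intermediate is an assumption, since the legend does not specify the boundary cases.

```python
def classify_by_mbl_index(score: float) -> str:
    """Assign a molecular diagnosis from the mBL-signature index score.

    Assumption: scores of exactly 0.05 or 0.95 fall into the intermediate group.
    """
    if score > 0.95:
        return "mBL"
    if score < 0.05:
        return "non-mBL"
    return "intermediate"


# mBL-signature index scores of the six cases in Supplementary Table 20.
table_20_scores = {
    "569": 0.927, "275": 0.894, "574": 0.923,
    "619": 0.505, "593": 0.839, "572": 0.101,
}

diagnoses = {case: classify_by_mbl_index(s) for case, s in table_20_scores.items()}
print(diagnoses)
```

Applied to the six tabulated scores, the rule places every case in the intermediate group, which matches the molecular diagnosis column of the table.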


Supplementary Table 21: Predicted effect of ID3 mutations suitable for modeling on the TCF3/TCF4 interaction.

| Mutation | Location | Predicted effect TCF3/4 | Comment on predicted effect and interaction partner |
|---|---|---|---|
| p.Ser39Arg | DNA | severe | Contacts DNA; adding a much larger residue would disrupt the interaction. Positive charge added close to the DNA backbone (might increase affinity). |
| p.Tyr48Cys | dimer | severe | Makes various hydrophobic contacts with TCF3/4; the mutation would disrupt many of these (aa54 Leu, aa22 Phe, aa18 Ile). Close to the DNA binding site. |
| p.Ser49Phe | surface | severe | No contacts to other subunits or DNA, but adds a hydrophobic aa on the surface |
| p.Leu51His | dimer | severe | Contacts four hydrophobic residues (aa45 Leu, aa49 Val, aa25 Leu, aa52 Ile) which would be lost |
| p.Leu54Val | dimer | slight | Central residue at the dimer interface, but change is mild (Leu to Val) |
| p.Val55Glu | dimer | moderate | Contacts one residue (aa52 Ile), which would result in some (limited) changes |
| p.Val55Ala | dimer | slight | Contacts one residue (aa52 Ile), which would result in some (limited) changes |
| p.Pro56Ser | surface | moderate | No contacts to other subunits or DNA, but replaces a Pro with a Ser, which might cause some (limited) changes as the Pro doesn't appear to be in a Pro-specific orientation |
| p.Pro56Ala | surface | moderate | No contacts to other subunits or DNA, but replaces a Pro with an Ala, which might cause some (limited) changes as the Pro doesn't appear to be in a Pro-specific orientation |
| p.Pro56Arg | surface | moderate | No contacts to other subunits or DNA, but replaces a Pro with an Arg, which might cause some (limited) changes as the Pro doesn't appear to be in a Pro-specific orientation |
| p.Leu64Val | interior | slight | No contacts to other subunits or DNA, contacts monomer, slight changes |
| p.Leu64Phe | interior | slight | No contacts to other subunits or DNA, contacts monomer, slight changes |
| p.Leu64Arg | interior | severe | Probably disrupts hydrophobic packing, adds a positive charge next to another Arg (either aa37 or aa57) |
| p.Ser65Arg | DNA | severe | Contacts the DNA backbone; changing the character of this aa would affect this (might even increase affinity) |
| p.Ser65Asn | DNA | severe | Contacts the DNA backbone; changing the character of this aa would affect this (might even increase affinity) |
| p.Glu68Lys | DNA | severe | Contacts the DNA and an Arg from TCF3/4 (aa14); would likely have severe consequences |
| p.Ile69Val | interior | slight | Contact within the monomer, mild change |
| p.Arg72_Gln81dup | tetramer | slight | Would prevent tetramerization, could affect solubility, would be a problem for dimer/DNA binding |
| p.Ile74Asn | dimer | severe | Hydrophobic contacts with three residues (aa28 Met, aa25 Leu, aa21 Ala) which would be lost upon mutation to the polar Asn |
| p.Ile74Val | dimer | slight | Hydrophobic contacts with three residues (aa28 Met, aa25 Leu, aa21 Ala) which would mostly be maintained as the change is slight (Ile to Val) |
| p.Asp75_Tyr76insTyrAsp | dimer | severe | Extensive contacts at the dimer interface (e.g. aa25 Leu, aa51 Val, aa48 Ala); places an Asp where an Ile was |
| p.Tyr76Cys | dimer | severe | Various hydrophobic contacts (aa52 Ile, aa53 Leu, aa49 Val, aa48 Ala) and one possible hydrogen bond (aa56 Glu) would all be lost upon mutation to Cys |
| p.Leu78Val | surface | slight | Mild change; the residue is not so central |
| p.Asp79Gly | surface or tetramer | slight | No contacts predicted, but might interfere with tetramer |
| p.Leu80Pro | dimer | severe | Extensive hydrophobic contacts within the dimer interface (e.g. aa51 Val, aa52 Ile, aa55 Leu) would be lost |
| p.Gln81_Leu84del | tetramer | moderate | At the C-terminus of the model, might affect tetramerization |
| p.Gln81* | tetramer | moderate | At the C-terminus of the model, might affect tetramerization, deletes the entire C-terminus and would eradicate function of this region |
| p.82-100del | tetramer | moderate | At the C-terminus of the model, might affect tetramerization, deletes the entire C-terminus and would eradicate function of this region |

Additional information on the splice-site / stop-gain mutations (Q81X):

| Mutation | Consequence |
|---|---|
| splice site mutation c.300+1G>A, G>T; c.300+2T>C | in-frame deletion of amino acids 82-100 |
| splice site mutation c.301-1G>A | frame shift deletion, replacement of amino acids 101-119 (end) with PSSLRNLSSPTTKGAFATDSAVSX |

aa, amino acid. Classification of locations: Dimer, predicted to lie at the heterodimer interface; Tetramer, predicted to lie at the interface between two heterodimers; Interior, predicted to lie at the interior of the ID3 structure; Surface, predicted to lie at the ID3 surface; DNA, predicted to lie at the ID3/DNA interface. The severity of mutations was judged manually on the nature of the residue environment and the amino acid change: Slight < Moderate < Severe.

Please note that within the 61 residues that can be modeled, TCF3 and TCF4 are 92% identical. Moreover, the 5 changes all conserve the overall character of the amino acid in terms of hydrophobic or polar character. For the purposes of modeling or interpretation, the sequences are thus indistinguishable.

The ID3 protein model is based on the HLH domains and their interacting partners. Most of the single nucleotide ID3 mutations could be modeled and their effect could be predicted. In the context of the model (which covers just the helix-loop-helix domain), the splice-site and stop-gain changes are not predicted to have a severe effect, though we cannot rule out that the loss of part or all of the C-terminus could have a severe effect on the protein function. Indeed, Deed et al., 1996 (ref. 10) found that replacement of the C-terminal amino acids with a sequence similar to what we observed in the splice site mutant c.301-1G>A drastically reduced the ability of ID3 to antagonize TCF3 binding to DNA, and Chen et al., 1997 (ref. 11) showed that the C-terminal 12 residues are critical for ID3's transcriptional inhibitory activity on TCF3. There may thus be two classes of mutations: first, mutations in the HLH domain that affect TCF3/4 binding and thus impair the inhibitory effect of ID3, and second, splice-site and stop-gain mutations which impair the inhibition of transcriptional activation (while the interaction probably remains intact).


4. Supplementary Figures

Supplementary Figure 1: Coverage and BAF plots of whole-genome sequencing of all four index samples. For each patient, the upper part shows a genome-wide coverage plot with balanced genomic regions in black, chromosomal gains in green and chromosomal losses in red. The BAF plot below each coverage plot shows the allele-specific copy number at the corresponding chromosomal region, with balanced status in black and aberrant allelic distribution in orange.


Supplementary Figure 2: Complex rearrangement of chromosome 1q in BL1 and BL4. A coverage plot of chromosome 1 is shown for patients BL1 and BL4, both of which carry complex genomic rearrangements of 1q. Above each plot, the pattern of abnormally mapped ends indicates complex translocations (trans), inversions (inv) and deletions (del) in chromosome 1q.


Supplementary Figure 3: Mutational spectrum of all detected SNVs. (a) Comparison of SNVs detected in the germline and the tumor of the four index patients, showing all SNVs in the upper part, non-exonic SNVs in the middle part and exonic SNVs in the lower part. The pattern of germline SNVs was highly consistent between the four samples. Differences could be observed especially in the distribution of somatic exonic SNVs, but these need to be interpreted with caution due to small numbers. (b) Comparing the total count of SNVs showed highly similar numbers and distributions of germline SNVs in the four samples. However, the number and distribution of somatic SNVs was strikingly different in BL4 compared to the other samples. BL4 was the only one of the index cases in which the disease relapsed after initial treatment. (Note that the sequenced tumor sample was taken before treatment.)


Supplementary Figure 4: Heatmap of the expression pattern of all mutated genes in mBLs (red), Intermediates (grey), non-mBLs (blue) and in germinal centre cells (white). Heatmap of the genes affected by potentially protein changing somatic mutations, showing their expression in lymphoma tumor samples (Hummel et al., NEJM 2006; ref. 9) and in germinal center control samples. The probe sets (rows) are sorted by their fold changes between the mBL (N = 44) and the control samples (last N = 13 columns on the right). The most obvious pattern within the heatmap can be seen at the top left, which mostly represents the expression pattern of genes recurrently mutated in mBL. Expression data: blue, low expression; red, high expression; GCB/ABC signature: ABC (green), GCB (orange), unclassified (grey); MOLDIAG (molecular diagnosis): mBLs (red), Intermediates (grey), non-mBLs (blue), germinal center cells (used as controls, white).


Supplementary Figure 5: Biallelic mutation in the ID3 gene was identified in BL4 (upper panel, IGV) and verified by Sanger sequencing (middle panel, Sanger sequences are shown in reverse orientation as ID3 is encoded on the minus strand). The detected splice-site mutation leads to aberrant splicing as shown by transcriptome sequencing (lower panel).


Supplementary Figure 6: ID3 mutations identified by Sanger sequencing of IG-MYC positive B-cell lymphoma (a) and B-cell lymphoma cell lines (b). In (a) and (b) a schematic ID3 gene structure with identified single nucleotide mutations above (red: nonsense mutation, black: missense mutations) and the structural mutations below (red: deletion, green: duplication) are shown. For each mutation the DNA and the protein mutation code are depicted. The number behind each mutation code indicates the number of samples in which the respective mutation was identified. E1, exon 1; E2, exon 2


Supplementary Figure 7: Biallelic involvement was identified on the genomic (DNA) and on the RNA (cDNA) level by (RT-)PCR followed by Sanger sequencing. (a) The RT-PCR reaction of the BL-70 cell line yielded two bands on the agarose gel. The DNA of the bands was isolated using the EasyPure DNA Purification Kit (Biozym, Hessisch Oldendorf, Germany) and sequenced using the PCR primers and a 3100 ABI Sequencer (Applied Biosystems). Sequence analysis identified the 125 bp deletion and normal splicing of exon 1 to exon 2 on one allele (upper panel), whereas on the other allele a truncated exon 1 was spliced to exon 2 (below). The splice-site mutation leads to an in-frame message with a loss of 57 bp of exon 1. (b) ID3 mutation analysis identified a homozygous genomic deletion in sample 095. The panel below shows the wildtype sequence of exon 1 at the first breakpoint.


Supplementary Figure 8: The distribution of mutations over the ID3 gene is skewed to the functional parts of the protein. Mutations (orange bars, height corresponds to number of mutations) accumulate over the HLH domain (blue rectangle in exon 1), while there are hardly any mutations affecting the N-terminal part of the protein or the intron, although these parts contain a number of RGYW motifs (purple bars) and are located within the 2 kb window from the transcription start site (TSS) which is typically affected by somatic hypermutation (with a probability exponentially decreasing with increasing distance from the TSS). This indicates selection of mutations affecting ID3 function. The x-axis indicates the distance from the TSS in base pairs.
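Locating RGYW motifs (and the reverse complement WRCY) in a sequence, as marked by the purple bars, can be sketched with two regular expressions. The Python example below is a minimal illustration and not the procedure used for the figure; the function names are hypothetical, and counting both strands and allowing overlapping occurrences are assumptions.

```python
import re

# IUPAC codes: R = A/G, Y = C/T, W = A/T.
# Lookaheads are used so that overlapping occurrences are all reported.
RGYW = re.compile(r"(?=[AG]G[CT][AT])")
WRCY = re.compile(r"(?=[AT][AG]C[CT])")  # reverse complement of RGYW


def motif_starts(seq: str) -> list[int]:
    """0-based start positions of RGYW or WRCY occurrences in seq."""
    seq = seq.upper()
    starts = {m.start() for m in RGYW.finditer(seq)}
    starts |= {m.start() for m in WRCY.finditer(seq)}
    return sorted(starts)


def bases_overlapping_motif(seq: str) -> int:
    """Number of distinct bases covered by at least one motif occurrence."""
    covered = set()
    for start in motif_starts(seq):
        covered.update(range(start, start + 4))
    return len(covered)


# Toy sequence: GGTA matches RGYW at position 0, TACC matches WRCY at position 2.
print(motif_starts("GGTACC"), bases_overlapping_motif("GGTACC"))
```

Run on the genomic sequence of the ID3 region, functions of this kind would yield motif positions and a count of bases overlapping the motif comparable to the quantities listed in Supplementary Table 15, provided the same counting conventions are used.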


Supplementary Figure 9: Frequencies of distinctive ID3 mutation patterns as well as wildtype allele fractions identified by deep sequencing. ID3 exon 1 from six primary Burkitt lymphoma samples and three controls (CD19+ MACS sorted cells from peripheral blood of healthy individuals) was sequenced using 454 technology to a coverage of 8197 - 38009 (median 20240) evaluated reads per sample. To eliminate sequencing errors, only variants previously identified by Sanger sequencing were included in this analysis. Additionally, only mutation patterns supported by a read fraction higher than the fraction of non-reference reads in the control samples (mean of three samples plus three standard deviations; 0.188 %) were considered. The numbers at the bar segments indicate the altered positions in the corresponding mutation pattern. Numbers of reads supporting each mutational pattern as well as mutated positions in the minor fractions are indicated in Supplementary Table 16. In all samples, these analyses confirmed that the majority of sequences were two differently mutated DNA strands, in line with biallelic mutation of ID3. The finding of multiple hits on the same allele was confirmed by a number of conventional subcloning experiments of the same cases which led to the same conclusions (Supplementary Table 16). In some samples different combinations of mutations could be identified at low levels. This could indicate a low level of ongoing somatic hypermutation. However, in our view these are most likely PCR artifacts originating from an incomplete elongation in one PCR cycle and aberrant subsequent elongation in the following PCR cycle, by which PCR hybrids can be created (Küppers, 1997; ref. 12). #, c.300+1G>T; x, c.300+1G>A
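The read-fraction filter described in this caption can be illustrated as follows. The Python sketch uses the read counts of sample 91 from Supplementary Table 16 and the 0.188 % cutoff quoted above. The denominator of 8197 evaluated reads is an assumption (it is the lowest coverage given in the caption and reproduces the tabulated fractions of this sample), the use of the sample standard deviation is likewise an assumption, and all function names are hypothetical.

```python
from statistics import mean, stdev


def noise_threshold(control_fractions):
    """Mean plus three standard deviations of the control non-reference fractions.
    Assumption: the sample standard deviation is used."""
    return mean(control_fractions) + 3 * stdev(control_fractions)


def supported_patterns(read_counts, total_reads, threshold):
    """Keep the patterns whose read fraction exceeds the noise threshold."""
    fractions = {name: count / total_reads for name, count in read_counts.items()}
    return {name: frac for name, frac in fractions.items() if frac > threshold}


THRESHOLD = 0.00188  # 0.188 %, value quoted in the caption

# Read counts of sample 91 (Supplementary Table 16).
sample_91 = {
    "normal": 1079, "pattern 1": 4094, "pattern 2": 2760, "pattern 3": 99,
    "pattern 4": 44, "pattern 5": 42, "pattern 6": 19,
}
TOTAL_READS_91 = 8197  # assumed number of evaluated reads for this sample

kept_91 = supported_patterns(sample_91, TOTAL_READS_91, THRESHOLD)
for name, frac in kept_91.items():
    print(f"{name}: {frac:.3f}")
```

Under these assumptions all seven listed patterns of sample 91 exceed the cutoff, including the rarest one with 19 reads.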


Supplementary Figure 10: Western blot analysis identified cell lines with complete loss of ID3 protein (13 kD) expression. Total protein lysates of different Burkitt lymphoma cell lines (mutated and unmutated) and other B-cell lymphoma cell lines were subjected to Western blot analysis using an anti-ID3 antibody (Santa Cruz) and an actin antibody. ID3 expression could be identified in Burkitt lymphoma cell lines (Raji, DG-75, EB-1). In contrast, other B-cell lymphoma cell lines showed no (Karpas-422) or slight expression (SU-DHL-5, -6) of ID3. The BL-70 and BL-41 Burkitt lymphoma cell lines both with biallelic ID3 mutation showed complete loss of ID3 protein expression. In contrast, Namalwa (Burkitt lymphoma cell line with a complex ID3 deletion) showed strong expression. The BL-2 cell line which had a splice site aberration of ID3 showed only a slight expression.


Supplementary Figure 11: Comparison of ID3 mutated and unmutated IG-MYC positive B-cell lymphoma. Significant differences in age at diagnosis, chromosomal complexity (expressed by the number of aberrant segments determined by Array-CGH), and IGH mutation rate between ID3-mutated and -unmutated IG-MYC positive B-cell lymphomas (N=100) can be observed.


Supplementary Figure 12: Characteristics of ID3-mutated and -unmutated proto-typic mBLs. (a) Significant differences in chromosomal complexity were determined by Array-CGH. No significant differences occurred for the age at diagnosis and IGH mutation rate between ID3 mutated and unmutated IG-MYC positive mBL excluding cases with t(14;18) and BCL6 break (N=53). (b) Kaplan-Meier analysis and log-rank test indicate a slightly but not significantly increased overall survival (OS) of ID3 mutated as compared to unmutated mBL (P-value=n.s.). For N=32 ID3 mutated patients we observed 10 events and a 3-year overall survival of 0.72 [0.58-0.89]. In the group of 17 patients without mutation we observed 8 events and a 3-year survival of 0.54 [0.34-0.86]. Overall survival was defined as the time from the first day of therapy to death from any cause. Patients without an event in OS were censored at the last day with valid information.
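The Kaplan-Meier estimate underlying panel (b) handles censored patients by removing them from the risk set without counting an event. The Python sketch below shows the product-limit calculation on invented follow-up data; the function name and the example values are hypothetical and are not patient data from this study, and confidence intervals and the log-rank test are not included.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient (first day of therapy to death or censoring)
    events : 1 if the patient died, 0 if censored at the last valid follow-up
    Returns a list of (time, survival probability) at each distinct event time.
    """
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times in years; the second patient is censored.
example_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(example_curve)
```

In this toy example the censored patient leaves the risk set after year 2, so the death at year 3 is evaluated against two remaining patients rather than three.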


Supplementary Figure 13: ID3 mutated (red) and unmutated (blue) index samples share similar DNA methylation patterns at CpGs at the ID3 locus. Whole-methylome sequencing of BL1-BL4 identified a heterogeneous pattern of hypo- and hypermethylation surrounding the ID3 gene locus. However, these patterns were consistent between ID3 mutated (red) and unmutated samples (blue). The methylation pattern of each sample is shown as smoothed methylation values ranging from 0 (fully unmethylated) to 1 (fully methylated). The plot was generated using the R package bsseq version 0.2.8 with default parameters (ref. 13).


Supplementary Figure 14: 450k Illumina methylation array (a) and bisulfite pyrosequencing (b) analysis of the ID3 locus shows no differences between ID3 mutated and unmutated samples (within a and b the right panel indicates the ID3 mutation status). (a) DNA methylation results (avg. beta, 0: completely unmethylated to 1: completely methylated) of tumor and germline material of the four index patients and two BL cell lines are shown. The cell lines have been analyzed repeatedly (Raji N=5, Daudi N=5) and the mean of the avg. beta values is shown. (b) DNA methylation (in %) of three different regions of ID3 (UCSC hg19; assay1: chr1:23886266-23886379; assay2: chr1:23885911-23886068; assay4: chr1:23884928-23885032) was analyzed by bisulfite pyrosequencing of 12 Burkitt cell lines. 4 CpGs were analyzed by both methods (*) and showed excellent agreement. mc_mean, mean of methylation of two methylated controls; pool pb, pooled DNA of 20 healthy peripheral blood samples; WGA_mean, mean of methylation of two whole genome amplified DNA samples. ID3 mutation status: wt, wildtype; mut, mutated; +, detected; ++, two mutations detected. ID3 expression pattern: +, slight expression; -, no expression; ++, expression; +++, high expression; NA, not analyzed.


Supplementary Figure 15: A clone-wise comparison of CN gains and losses in ID3-mutated and unmutated cases using Fisher's exact test revealed that changes in 18q were associated with ID3 wildtype status in proto-typic mBL (samples with t(14;18) were excluded from this analysis). Due to the larger overlap (81/100 cases vs. 45/100 cases), data derived by aCGH were used to compare the CN profiles between groups. A genome-wide display of chromosome-specific aberrations (chromosome names on the abscissae), indicating genomic gains in blue and genomic losses in yellow, shows the discriminating pattern between ID3 mutated and unmutated mBLs. A gain of chr18:51,894,728-54,354,319 (hg19) and a loss within chr18:61,779,156-72,349,045 (hg19) were detected in 4/14 and 5/14 unmutated cases, respectively, but in 0/34 cases with ID3 mutation (nominal P-value<0.01). For visualization the P-values were transformed to the negative log10 scale, in which 2 corresponds to a nominal P-value of 0.01 and 1.3 to a P-value of 0.05. The overall genetic complexity (i.e. the number of aberrant regions per sample) between the mutated and unmutated group was compared by the Mann–Whitney U test.
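The counts given in this caption are enough to illustrate the test for a single region. The Python sketch below implements a two-sided Fisher's exact test from the hypergeometric distribution and the negative log10 transformation used for display. It is a plain re-implementation for illustration, not the authors' analysis code; the function names are hypothetical and the two-sided convention (summing all tables no more probable than the observed one) is an assumption.

```python
from math import comb, log10


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability of x cases in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


def neg_log10(p: float) -> float:
    """Transformation used for plotting the nominal P-values."""
    return -log10(p)


# Rows: ID3 unmutated, ID3 mutated; columns: aberration present, absent.
p_gain = fisher_exact_two_sided(4, 10, 0, 34)   # 18q gain: 4/14 vs. 0/34
p_loss = fisher_exact_two_sided(5, 9, 0, 34)    # 18q loss: 5/14 vs. 0/34
print(p_gain, p_loss, neg_log10(0.01), neg_log10(0.05))
```

Both comparisons give nominal P-values below 0.01, in line with the caption, and the transformation maps 0.01 to 2 and 0.05 to about 1.3.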


Supplementary Figure 16: TCF4 expression is increased in ID3 wildtype mBL with 18q gain (n=4) in comparison to ID3 mutated mBL (n=34). All ID3 mutated mBLs showed normal copy number for 18q. Dots indicate the mean difference of normalized gene expression data and the lines mark the 95% confidence interval. The gene-dosage effect on TCF4 expression was assessed per probe set by comparing ID3 wildtype mBL with 18q gain to ID3 mutated mBL (n=34) using Student's t-test. Adjustment for multiple testing was performed by controlling the FDR at 10%. Significant probe sets (FDR = 10%) are marked in red.
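The caption does not state which FDR procedure was applied. As an illustration only, the Python sketch below shows the widely used Benjamini-Hochberg step-up procedure at a 10% FDR; the choice of this procedure, the function name and the example P-values are all assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.10):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans (same order as p_values) that marks the
    tests declared significant at the given false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank whose P-value lies below its rank-specific threshold.
        if p_values[i] <= fdr * rank / m:
            cutoff_rank = rank
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant


# Hypothetical per-probe-set P-values from t-tests.
example_flags = benjamini_hochberg([0.001, 0.5, 0.02, 0.04], fdr=0.10)
print(example_flags)
```

Probe sets flagged in this way would correspond to those marked in red, if the same procedure was used.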


Supplementary Figure 17: Model of the interaction between ID3 (red) and TCF3/4 (blue) together with a DNA fragment (grey). Mutated residues in ID3 predicted in the model to impact moderately or severely on DNA binding (yellow) or the dimer interface (green), as described in Supplementary Table 21, are shown as bonds and labeled. Models for TCF3 and TCF4, though generated separately, were indistinguishable regarding the observations made regarding ID3 mutations. The DNA is shown only to illustrate the approximate location of the protein relative to it. The figure was generated by VMD version 1.9.1.a10 (Humphrey et al., J Mol Graph 1996; ref. 14).


Supplementary Figure 18: ID3 mutations affected highly conserved regions. Highly conserved regions were identified by ClustalW2 analysis of the ID3 protein sequence of 37 species available at ENSEMBL. Structural aberrations (deletions and duplications) and single amino acid changes identified by Sanger sequencing (arrows indicate mutated positions) were primarily found to be located in the highly conserved regions encoding the two helices of the ID3 protein. Only the positions of mutations are provided, not the respective number of mutations. *, stop gain at the indicated position; red arrows, mutated amino acid positions (do not indicate the number of mutations).


Supplementary References:

1. Klapper, W. et al. Patient age at diagnosis is associated with the molecular characteristics of diffuse large B-cell lymphoma. Blood 119, 1882-1887 (2012).

2. Salaverria, I. et al. Translocations activating IRF4 identify a subtype of germinal center-derived B-cell lymphoma affecting predominantly children and young adults. Blood 118, 139-147 (2011).

3. Scholtysik, R. et al. Detection of genomic aberrations in molecularly defined Burkitt's lymphoma by array-based, high resolution, single nucleotide polymorphism analysis. Haematologica 95, 2047-2055 (2010).

4. Martín-Subero, J.I. et al. New insights into the biology and origin of mature aggressive B-cell lymphomas by combined epigenomic, genomic, and transcriptional profiling. Blood 113, 2488-2497 (2009).

5. Schwaenen, C. et al. Microarray-based genomic profiling reveals novel genomic aberrations in follicular lymphoma which associate with patient survival and gene expression status. Genes Chromosomes Cancer 48, 39-54 (2009).

6. Bentink, S. et al. Pathway activation patterns in diffuse large B-cell lymphomas. Leukemia 22, 1746-1754 (2008).

7. Klapper, W. et al. Molecular profiling of pediatric mature B-cell lymphoma treated in population-based prospective clinical trials. Blood 112, 1374-1381 (2008).

8. Dierlamm, J. et al. Gain of chromosome region 18q21 including the MALT1 gene is associated with the activated B-cell-like gene expression subtype and increased BCL2 gene dosage and protein expression in diffuse large B-cell lymphoma. Haematologica 93, 688-696 (2008).

9. Hummel, M. et al. A biologic definition of Burkitt's lymphoma from transcriptional and genomic profiling. N Engl J Med 354, 2419-2430 (2006).

10. Deed, R.W., Jasiok, M. and Norton, J.D. Attenuated function of a variant form of the helix-loop-helix protein, Id-3, generated by an alternative splicing mechanism. FEBS Lett. 393, 113-116 (1996).

11. Chen, B., Han, B.H., Sun, X.-H. and Lim, R.L. Inhibition of muscle-specific gene expression by Id3: requirement of the C-terminal region of the protein for stable expression and function. Nucleic Acids Res. 25, 423-430 (1997).

12. Küppers, R. Ongoing somatic mutation in mantle cell lymphomas questioned. Br J Haematol 97, 932-934 (1997).

13. Hansen, K.D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat. Genet. 43, 768-775 (2011).

14. Humphrey, W., Dalke, A. and Schulten, K. VMD: visual molecular dynamics. J Mol Graph. 14, 33-38, 27-28 (1996).
