HAL Id: inserm-02440510
https://www.hal.inserm.fr/inserm-02440510
Submitted on 15 Jan 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Transcriptomic definition of molecular subgroups of small round cell sarcomas
Sarah Watson, Virginie Perrin, Delphine Guillemot, Stéphanie Reynaud, Jean-Michel Coindre, Marie Karanian, Jean-Marc Guinebretière, Paul Fréneaux, Francois Le Loarer, Megane Bouvet, et al.
To cite this version:
Sarah Watson, Virginie Perrin, Delphine Guillemot, Stéphanie Reynaud, Jean-Michel Coindre, et al. Transcriptomic definition of molecular subgroups of small round cell sarcomas. The Journal of pathology and bacteriology, John Wiley & Sons, 2018, 245 (1), pp.29-40. ⟨10.1002/path.5053⟩. ⟨inserm-02440510⟩
Watson et al. 1
Title: Transcriptomic definition of molecular subgroups of small round cell sarcomas

Running Title: Molecular classification of sarcoma subtypes

Authors
Sarah Watson1,2, Virginie Perrin1,2, Delphine Guillemot3, Stephanie Reynaud3, Jean-Michel Coindre4,5, Marie Karanian6, Jean-Marc Guinebretière7, Paul Freneaux8, François Le Loarer4,5, Megane Bouvet3, Louise Galmiche-Rolland9,10, Frédérique Larousserie11, Elisabeth Longchampt12, Dominique Ranchere-Vince6, Gaelle Pierron3*, Olivier Delattre1,2,3,13*, Franck Tirode1,2,14*
*Co-senior authors

Affiliations
1 INSERM U830, Laboratory of Genetics and Biology of Cancer, F-75005, Paris, France
2 Institut Curie, Paris Sciences et Lettres, F-75005, Paris, France
3 Institut Curie, Unité de génétique somatique, F-75005, Paris, France
4 Institut Bergonié, Department of Pathology, F-33000, Bordeaux, France
5 Université Bordeaux 2, F-33000, Bordeaux, France
6 Centre Leon Bérard, Department of Pathology, F-69008, Lyon, France
7 Service de Pathologie, Hôpital René-Huguenin, Institut Curie, F-92210, Saint-Cloud, France
8 Département de Biologie des Tumeurs, Institut Curie, Service d'anatomie pathologique, F-75005, Paris, France
9 Service d’Anatomie Pathologique, Hôpital Necker Enfants malades, F-75015, Paris, France
10 Université Paris Descartes, F-75006, Paris, France
11 Service d'Anatomie Pathologique, Hôpital Cochin, F-75014, Paris, France
12 Service d'Anatomie et de Cytologie Pathologiques, Hôpital Foch, F-92151, Suresnes, France
13 Ligue Contre le Cancer, Equipe labellisée
14 Univ Lyon, Université Claude Bernard Lyon 1, INSERM 1052, CNRS 5286, Cancer Research Center of Lyon, Centre Léon Bérard, F-69008, Lyon, France

Corresponding authors:
Franck Tirode, Ph.D.
Biology of Rare Sarcomas Group
Cancer Research Center of Lyon, INSERM U1052
Centre Léon Bérard, 28 Rue Laënnec, 69373 Lyon Cedex 08
Tel.: +33 4 69 85 61 45
Email: [email protected]

Olivier Delattre, M.D., Ph.D.
INSERM U830 “Genetics and Biology of Cancers”
Institut Curie Research Centre
26 rue d’Ulm, 75248 Paris cedex 05, France
Tel.: +33 1 56 24 66 79
Fax.: +33 1 56 24 66 30
Email: [email protected]

Conflict of Interest statement: Authors have nothing to disclose
Word count: 3929
Abstract
Sarcoma represents a highly heterogeneous group of tumours. We report here the first
unbiased and systematic search for gene fusions combined with unsupervised expression
analysis of a series of 184 small round cell sarcomas. Fusion genes were detected in 59%
of samples, with half of them being observed recurrently. We identified biologically
homogeneous groups of tumours such as the CIC-fused (to DUX4, FOXO4 or NUTM1) and
BCOR-rearranged (BCOR-CCNB3, BCOR-MAML3, ZC3H7B-BCOR and BCOR internal
duplication) tumour groups. VGLL2-fused tumours represented a more biologically and
pathologically heterogeneous group. This study also refined the characteristics of some
entities such as EWSR1-PATZ1 spindle cell sarcoma or FUS-NFATC2 bone tumours that
are different from EWSR1-NFATC2 tumours and transcriptionally resemble CIC-fused
tumour entities. We also describe a completely novel group of epithelioid and spindle-cell
rhabdomyosarcomas characterized by EWSR1- or FUS-TFCP2 fusions. Finally, expression
data identified some potentially new therapeutic targets or pathways.
Keywords
Sarcoma, Fusion genes, FET-TFCP2, FUS-NFATC2, EWSR1-PATZ1, VGLL2-NCOA2,
BCOR-Rearranged, CIC-Fused, RNAseq.
Introduction
Small round cell sarcoma is a heterogeneous group of tumours mostly affecting children and
young adults and characterized by an overall poor prognosis [1]. They remain challenging for
pathologists since those tumours share overlapping morphological and immunophenotypical
features. The identification of specific fusion transcripts has considerably improved their
diagnosis as many subtypes are characterized by a specific fusion gene, such as
EWSR1/FUS-ETS, SS18-SSX and PAX-FOXO1 fusions in Ewing sarcoma [2], synovial
sarcoma [3] and aRMS [4], respectively, or the more recently described CIC-DUX4 [5],
EWSR1-NFATC2 [6] and BCOR-CCNB3 [7] in “Ewing-like” tumours. As a result, diagnostic
evaluation of specific chromosome translocations by FISH or of the resulting fusion
transcripts by RT-PCR has become an asset for the molecular assessment of small round
cell sarcoma. However, a number of sarcomas remain uncharacterised, raising both
diagnostic and therapeutic issues.
In the present study, we report RNA-sequencing of 184 small round cell sarcoma samples
from three different cohorts. By applying several fusion algorithms and performing
unsupervised clustering analysis, we were able to show that recurrent fusions define
homogeneous groups of tumours based on pathological features and expression profile
analyses, including FUS-NFATC2-positive, EWSR1-PATZ1-positive, CIC-fused or BCOR-
rearranged tumours. We also identified a new epithelioid rhabdomyosarcoma group
characterized by an EWSR1- or FUS-TFCP2 fusion gene. Finally, we propose genes and
pathways specifically expressed in a variety of different entities that may serve as biomarkers
or potential therapeutic targets.
Materials and Methods
Tumour samples. Tumour samples were selected on the basis of their pathological diagnosis:
unclassified sarcoma, known sarcoma types that did not carry the pathognomonic genetic
aberration, or lesions suspicious for sarcoma with simple genetic or monomorphic features
(supplementary material, Figure S1). Samples were used in accordance with the French
Biobank legislation.
Paired-end RNA sequencing. Total RNA was isolated from crushed frozen tumours using a
Trizol reagent kit (Life Technologies). Libraries were constructed following the
TruSeq Stranded mRNA LS protocol (Illumina). Sequencing was performed on either HiSeq
2500 (100 nt paired-end) or NextSeq 500 (150 nt paired-end) Illumina sequencing machines.
All fastq files were deposited in the European Genome-phenome Archive
(https://www.ebi.ac.uk/ega/studies/EGAS00001002189). Raw sequencing files of
SMARCA4-DTS and MRT control samples were previously deposited in the Sequence Read
Archive (#SRP052896).
Bioinformatics analyses. For fusion gene discovery, sequencing reads were processed with
both deFuse [8] and FusionMap [9], using the hg19 genome as reference. Only
fusion transcripts supported by both tools with at least 2 split reads (and 3 spanning pairs for
deFuse) were considered. Gene expression values were extracted with Kallisto v0.42.5 [10]
using the GRCh38 release 79 genome annotation. Clustering, BGA, PCA and t-SNE were
computed using the R packages cluster v2.0.3, made4 v1.44.0, FactoMineR v1.31.4 and
Rtsne v0.10, respectively. Gene ontology analyses were performed online
(https://david.ncifcrf.gov/) with the DAVID v6.7 tool [11].
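The two-tool consensus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the `FusionCall` record, its field names and the `consensus_fusions` helper are hypothetical, the real deFuse and FusionMap output formats are not parsed here, and the rule is read as "at least 2 split reads in each tool, plus at least 3 spanning pairs in deFuse". Matching calls on sample and ordered gene pair is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    """Hypothetical, simplified record for one fusion call from one tool."""
    sample: str
    gene5: str               # 5' partner gene
    gene3: str               # 3' partner gene
    split_reads: int         # reads crossing the fusion junction
    spanning_pairs: int = 0  # read pairs flanking the junction (deFuse only here)


def call_key(call):
    # Assumption: two calls describe the same fusion if sample and ordered partners match.
    return (call.sample, call.gene5, call.gene3)


def consensus_fusions(defuse_calls, fusionmap_calls, min_split=2, min_span_defuse=3):
    """Keep fusions that pass the read-support thresholds in BOTH tools."""
    defuse_ok = {
        call_key(c) for c in defuse_calls
        if c.split_reads >= min_split and c.spanning_pairs >= min_span_defuse
    }
    fusionmap_ok = {call_key(c) for c in fusionmap_calls if c.split_reads >= min_split}
    return sorted(defuse_ok & fusionmap_ok)


# Toy calls with invented read counts and placeholder sample/gene names.
defuse = [
    FusionCall("toy_1", "EWSR1", "PATZ1", split_reads=5, spanning_pairs=4),
    FusionCall("toy_1", "GENE_A", "GENE_B", split_reads=1, spanning_pairs=6),
]
fusionmap = [
    FusionCall("toy_1", "EWSR1", "PATZ1", split_reads=3),
    FusionCall("toy_1", "GENE_C", "GENE_D", split_reads=9),
]
retained = consensus_fusions(defuse, fusionmap)
print(retained)
```

In this toy input only the first fusion is retained: the second deFuse call has too few split reads and the second FusionMap call has no deFuse counterpart.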
Fusion validation. PCR amplification of the fusion point using specifically designed primers
and 50 ng of cDNA was performed with AmpliTaq Gold® DNA Polymerase with Buffer II
(Thermo Fisher Scientific, Waltham, MA USA 02451) prior to Sanger sequencing on an
Applied 3500XL Genetic Analyzer.
Immunohistochemistry. Immunohistochemistry was performed using the following antibodies
and methods: NUTM1 (C52B1, Cell Signaling, dilution 1:50) and ALK (5A4, Cell Signaling,
dilution 1:50) IHC were performed on a Ventana BenchMark platform with Cell Conditioning
Solution 1 pre-treatment. NUTM1 was incubated for 56 min and ALK for 80 min before being
revealed with the ultraView Universal DAB Detection Kit. ETV4 (clone 16, Santa Cruz
Biotechnology, dilution 1:15) IHC was performed overnight at 4°C, following pre-heating at
98°C for 30 min in Tris buffer pH 9, and revealed with the REAL EnVision kit (Dako, Glostrup,
Denmark).
Results
In molecular diagnostic routine, primitive small round or spindle cell sarcomas are
systematically tested for the most common fusion genes using multiplexed Q-PCR
(supplementary material, Table S1). In the present study, we performed whole transcriptome
sequencing to investigate its efficacy for proper tumour classification. Our investigation
cohort was composed of 94 samples randomly selected from about 700 retrospective
samples in which no common fusion gene could be identified by standard diagnostic
investigation (see the general scheme of the analysis in supplementary material, Figure S1).
We first investigated the presence of gene fusions and the retained fusion candidates were
subsequently validated and searched for within the remaining available samples (around 600
cases) using specific RT-PCR assay. The new cases thus identified were also RNA-
sequenced and generated our follow-up cohort (12 samples). Basic clinical data, initial
diagnosis and results of molecular analyses for all the 184 cases sequenced in this study are
summarized in the supplementary material, Table S2. We next performed expression profile
analyses that we compared using t-Distributed Stochastic Neighbour Embedding [12] and
unsupervised clustering analyses (Figure 1A and B, supplementary material, Figure S2) to a
subset of 78 well defined cases composing our control cohort. For each group, differential
expression analyses against each of all other tumour types were performed. Top variant
genes are reported in the supplementary material, Table S3 together with ontology
enrichment and gene set enrichment analyses (GSEA).
A fusion gene could be identified in almost 3/5 of the investigation cohort samples
We identified fusion genes in 55 out of the 94 tumours of the investigation cohort. Except for
one case (SARC036) for which numerous fusions were found on chromosome 1, suggestive
of chromothripsis, the mean number of fusion events detected per sample was 1.8. Forty
samples carried a single fusion gene, while multiple (2 to 9) gene fusions were detected in 14
samples (supplementary material, Table S2). All fusion genes were subsequently confirmed
by RT-PCR and Sanger sequencing.
Nearly half of the RNA-seq fusion-positive samples (26/55) expressed previously described fusion
genes that were not included in our RT-PCR routine procedure, namely single cases of each
of the following fusions: EWSR1-PBX1, ACTB-GLI1, PAX3-MAML3, COL1A1-PDGFB,
VGLL2-CITED2, FUS-NFATC2, BCOR-MAML3, ZC3H7B-BCOR, TPR-NTRK1, BRD3-
NUTM1, KIAA1549-BRAF; two cases with EWSR1-PATZ1, TPM3-NTRK1, EML4-ALK or
NAB2-STAT6 fusions; and three cases with VGLL2-NCOA2 or CIC-NUTM1 fusions. We also
identified a variant of a SS18-SSX2 fusion with an atypical SS18 gene breakpoint. RT-PCR
screening of these fusions in the initial cohort led to the identification of eleven additional
cases: 3 EWSR1-PATZ1, 1 EML4-ALK, 2 FUS-NFATC2, 3 VGLL2-NCOA2 and 2 CIC-
NUTM1 (supplementary material, Table S2).
Novel fusions were detected in 29 samples (supplementary material, Table S2). Two cases
presented variants of known fusions: UXT-TFE3, a variant of the ASPSCR1- or SFPQ-TFE3
fusions found in alveolar soft part sarcomas and renal cell carcinoma; and IKBKG-ALK, a
new variant among the numerous ALK fusions found in inflammatory myofibroblastic
tumours. In two cases, we identified a completely new fusion between EWSR1 or FUS and
TFCP2. Eleven samples harboured previously undescribed fusions, involving genes relevant
for tumorigenesis, but for which no additional sample could be identified during the RT-PCR
screen. These fusions were then considered as private to their respective tumour. Finally, in
the remaining 14 fusion-positive cases, we were unable to propose a definitive driver event
due to our lack of knowledge on the implication in cancer of the fused genes.
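The counts reported in this section can be cross-checked with a short tally. The sketch below only re-adds numbers transcribed from the text; it assumes that SARC036 is the one fusion-positive sample not covered by the "single" and "multiple" categories, and that the atypical SS18-SSX2 variant is counted among the 26 samples with previously described fusions.

```python
# Counts transcribed from the Results text (investigation cohort).
cohort_size = 94
fusion_positive = 55

# Samples per fusion-burden category (SARC036 assumed to be the 55th sample).
burden = {
    "single fusion": 40,
    "multiple fusions (2 to 9)": 14,
    "numerous chromosome 1 fusions (SARC036)": 1,
}

# Previously described fusions: 11 seen once, 4 seen twice, 2 seen three times,
# plus the atypical SS18-SSX2 variant (assumed to belong to this group).
known = 11 * 1 + 4 * 2 + 2 * 3 + 1

# Novel fusions: variants of known fusions, EWSR1/FUS-TFCP2, private fusions,
# and cases without a definitive driver event.
novel = 2 + 2 + 11 + 14

# Additional cases found by RT-PCR screening of the initial cohort.
rt_pcr_extra = {
    "EWSR1-PATZ1": 3,
    "EML4-ALK": 1,
    "FUS-NFATC2": 2,
    "VGLL2-NCOA2": 3,
    "CIC-NUTM1": 2,
}

assert sum(burden.values()) == fusion_positive
assert known + novel == fusion_positive
assert sum(rt_pcr_extra.values()) == 11

print(f"fusion-positive: {fusion_positive}/{cohort_size} = {fusion_positive / cohort_size:.1%}")
print(f"previously described: {known}/{fusion_positive} = {known / fusion_positive:.1%}")
```

Under these assumptions the categories add up to the 55 fusion-positive samples (58.5% of the 94-sample investigation cohort), of which 26 (47.3%) carry previously described fusions.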
Emerging VGLL2-NCOA2/CITED2, CIC-fused and BCOR-rearranged sarcoma subtypes
A total of 7 samples harbouring a VGLL2 fusion with either NCOA2 (n=6) or CITED2 (n=1)
were identified (supplementary material, Figure S3A), with two samples forming a
primary/relapse pair (SARC070_Primary and SARC070_Relapse). These fusion genes
characterized tumours occurring in very young children (under 5 years of age), as previously described
[13]. Centralized review of cases highlighted two subtypes with specific pathological aspects.
In three cases (SARC061, SARC065 and SARC070_primary), the tumours were composed
of a large amount of fibrous stroma with scarce tumour cells (Figure 2A). Tumour cells
presented neither atypia nor abnormal mitotic figures. They were negative for desmin,
whereas few cells (below 1%) showed a nuclear positivity for myogenin. Fewer than 25% of
cells were Ki67 positive. In contrast, in the four other cases (SARC070 at relapse, SARC085,
SARC088 and SARC102), the cellular mass was far denser with cells presenting numerous
atypia and mitoses. Myogenin staining was strongly positive in around 10% of cells and
desmin and Ki67 immuno-reactivity was displayed by over 30% of tumour cells (Figure 2B).
Consistent with this histological heterogeneity, t-SNE analysis and unsupervised clustering
revealed a relatively disparate group of tumours (Figure 1A and supplementary material,
Figure S2A) whose samples could further be separated according to histology when
different clustering conditions were applied (supplementary material, Figure S2C-D). When
considering these two histological subtypes separately it appeared that the “fibrous” subtype
was enriched in immune/inflammatory response genes, while the “dense” subtype was
enriched in cell cycle/proliferation genes (supplementary material, Table S3). Supervised
group comparisons and between group analyses (BGA) indicated that VGLL2-fused tumours
expressed numerous muscle-related genes as well as genes involved in extracellular matrix
and in the epithelial-mesenchymal transition process. Despite clear positivity for muscle
differentiation markers, neither of the two subtypes clustered with rhabdomyosarcoma
samples.
A total of five CIC-NUTM1 (supplementary material, Figure S3B) fusions were retrieved. All
but one case were observed in young children, at various locations (supplementary material,
Table S2). Tumours harbouring CIC-NUTM1 fusion clustered tightly with the other CIC-
DUX4- or -FOXO4-positive samples (Figure 1). All these tumours overexpressed ETS family
members of the PEA3 type [5,14,15] as well as a variety of secreted and matrix protein
genes such as VGF, BMP2, glypican 3/4, NRG1, PTX3, pleiotrophin and spondins. GSEA
revealed enrichment in genes of extracellular matrix but also in genes overlapping
proliferation/response to drug/immune response categories (supplementary material, Table
S3).
The seven samples presenting a rearrangement of BCOR, including BCOR-MAML3 or
ZC3H7B-BCOR fusions [16] and internal tandem duplication (ITD) [17–19] (supplementary
material, Figure S3C), grouped together with the BCOR-CCNB3-positive samples from the
control cohort, forming a very well-defined cluster. These samples also shared some
pathological features, as previously described [17]. Among the most significantly enriched
gene ontologies for this BCOR-rearranged group were “developmental protein” and
“homeobox” (supplementary material, Table S3) with a strong overexpression of HOX-A, -B,
-C and -D family genes as well as HMX1, PITX1, ALX4 or DLX1. More specifically, we
observed very strong gene sets and ontology enrichments for the morphogenesis,
development and differentiation of neurons and for the skeletal system development. Finally,
different genes encoding membrane receptors including RET, FGFR2/3, EGFR, PDGFRA,
NTRK3, KIT or NGFR were also highly and constantly overexpressed in these tumours and
may hence constitute actionable target genes.
Identification of a FUS-NFATC2 fusion gene in a new bone tumour entity
transcriptionally distinct from EWSR1-NFATC2-positive tumours
We identified FUS-NFATC2 fusion genes in three tumours of the femur in adult patients
(median age 38.3 years), like the only case described in the literature (17). Tumours displayed
areas composed of round tumour cells arranged in sheets or short fascicles. Tumour cells
were embedded in a variably myxoid stroma (Figure 3A). A focal haemangiopericytic vascular
network was present in all cases (Figure 3B). More distinctively, these tumours harboured
focal myxohyaline foci raising suspicion of cartilaginous differentiation (Figure 3C). All
tumours displayed brisk mitotic activity and necrosis. Altogether, these tumours were
histologically reminiscent of CIC-fused sarcomas. Clustering analyses indicated that FUS-
NFATC2-positive tumours grouped together and were clearly distinct from all other FET-
fused samples, including EWSR1-NFATC2-positive tumours (Figure 1). While the EWSR1-
NFATC2 tumours were strongly enriched in genes associated with inflammatory and immune
responses (supplementary material, Tables S3 and S4), the FUS-NFATC2 tumours were
enriched in proliferation and drug resistance signatures. In accordance with the potential
areas of cartilaginous differentiation, genes involved in the extracellular matrix of
cartilaginous tissues (like ACAN, COL9A2, MATN3, COMP or CILP2) or encoding secreted
proteins (pleiotrophin, DKK3, ANGPTL2, SBSPON or WNT5B) were preferentially expressed
in FUS-NFATC2-positive tumours.
EWSR1-PATZ1 fusion gene is found in a new tumour group unrelated to any other
EWSR1-fused tumour.
While only single cases of tumours carrying an EWSR1-PATZ1 fusion have been previously
reported [21–23], we identified in our cohorts a total of 5 cases (Figure 4A and
supplementary material, Figure S3D). All tumours were from soft tissues and occurred
across a very broad age range (from 0.9 to 68.5 years). Centralized pathological review of three
cases indicated three different morphological aspects (Figure 4B): i) A first tumour was
composed of bundles of relatively monomorphic spindle cells with an eosinophilic cytoplasm
and enlarged hyperchromatic nuclei; ii) A second case consisted of a diffuse proliferation
with several vessels forming a “haemangioma-like” vasculature. Tumour cells were mostly
round, and focally spindle, with scant cytoplasm and an enlarged nucleus containing one
nucleolus; iii) The last case consisted of a diffuse proliferation of spindle cells displaying an
eosinophilic cytoplasm, oval nuclei with an irregular chromatin distribution with one or several
nucleoli. Some nests of epithelioid cells with abundant, eosinophilic or clear, cytoplasm and
atypical nuclei could be seen. Nevertheless, two features were observed consistently: a
fibrous stroma and the presence of at least one component of spindle-shaped cells. All cases
were negative for EMA and AE1/AE3 and inconsistent for PS100, cytoplasmic CD99 and
vimentin staining. Ki67 was high (between 20% and 70%). In agreement with this
heterogeneous histological presentation, the proposed diagnoses for EWSR1-PATZ1-
positive tumours were quite variable including unclassified spindle cell sarcoma, myxoid
liposarcoma, Ewing-like sarcoma, or unclassified malignant neuroectodermal tumours.
Nevertheless, all five tumours tightly clustered together and away from other EWSR1-fused
tumours, indicative of a transcriptionally different entity. GSEA identified enrichment of genes
correlated with SMARCA2 expression in prostate cancer (supplementary material, Table S3).
A new “epithelioid rhabdomyosarcoma” entity characterized by EWSR1/FUS-TFCP2
fusion
New fusions involving either EWSR1 (exon 5) or FUS (exon 6) and TFCP2 (exon 2) were
identified in three cases (Figure 5A and supplementary material, Figure S4A), linking part of
the low complexity domains of EWSR1/FUS to the CP2 DNA binding and the SAM/pointed
domains of TFCP2. The three tumour samples formed a discrete cluster away from any other
EWSR1/FUS-fused tumour samples (Figure 1 and supplementary material, Figure S2).
EWSR1/FUS-TFCP2-positive tumours arose in young adult females (age range 16-38 years)
and developed in either the pelvic region, chest wall or the sphenoid bone. All three tumours
were extremely aggressive, since patient survival did not exceed 5 months. Upon review, all
three tumours were composed of an epithelioid proliferation arranged in small sheets or short
fascicles. Tumour cells presented monotonous round nuclei with high grade features and
prominent nucleoli (Figure 5B). They were associated with variable amounts of fibrous
stroma with focal sclerosing areas that were present in all cases. The three tumours stained
positive for desmin, MYOD1 and myogenin. Gene expression profile comparison confirmed a
strong expression of MYOD1 and DES as well as an impressive overexpression of TERT
and ALK (supplementary material, Figure S4B), the latter being heterogeneously expressed
with both a cytoplasmic and membranous staining (Figure 5B). In silico functional analyses
highlighted T cell immune response as well as keratin intermediate filament enrichment
(supplementary material, Table S3). Altogether the observation of positivity for both muscle
and epithelial markers suggests that this EWSR1/FUS-TFCP2 fusion defines a new
aggressive “epithelioid rhabdomyosarcoma” entity.
Expression profiling reveals distinct transcriptomic patterns and potential biomarkers
Supervised group comparisons and between group analyses (BGA) highlighted genes
specifically expressed in the molecularly defined entities (Figure 6; supplementary material,
Table S3). More specifically, by intersecting the pairwise differential analyses with the BGA, we
identified genes that were specifically expressed in each tumour type, including VGLL2-fused
tumours (LANCL2), CIC-fused tumours (ETV4), BCOR-rearranged tumours (HES7), FUS-NFATC2-
positive sarcomas (CD8B), EWSR1-PATZ1-positive tumours (GPR12) and FET-TFCP2-
positive tumours (REQL), as well as in almost all of the other tumour entities (supplementary
material, Figure S5).
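The idea of a gene being "specifically expressed" in one group, in the sense of winning every pairwise comparison against the other groups, can be illustrated with a deliberately simplified sketch. This is not the statistical procedure used in the study: the differential expression tests and the BGA are replaced here by a plain difference of group means, the `specific_genes` helper and the `min_diff` threshold are hypothetical, and all expression values are invented toy numbers.

```python
from statistics import mean


def specific_genes(expr, min_diff=2.0):
    """Return, per group, the genes whose mean expression exceeds that of
    EVERY other group by at least `min_diff` (log2 scale assumed).

    expr: {group: {gene: [log2 expression value per sample]}}
    """
    group_means = {
        group: {gene: mean(values) for gene, values in genes.items()}
        for group, genes in expr.items()
    }
    markers = {}
    for group, means in group_means.items():
        others = [m for other, m in group_means.items() if other != group]
        markers[group] = sorted(
            gene for gene, mu in means.items()
            # One pairwise comparison per other group; all must pass.
            if all(gene in o and mu - o[gene] >= min_diff for o in others)
        )
    return markers


# Invented toy values; GENE_X is a placeholder for a gene expressed in every group.
toy_expr = {
    "CIC-fused": {"ETV4": [8.0, 9.0], "HES7": [1.0, 1.0], "GENE_X": [10.0, 10.0]},
    "BCOR-rearranged": {"ETV4": [2.0, 3.0], "HES7": [7.0, 8.0], "GENE_X": [10.0, 11.0]},
}
markers = specific_genes(toy_expr)
print(markers)
```

Requiring the gene to pass the comparison against each other group separately, rather than against the pooled remainder, is what keeps a marker from being shared with a single closely related entity.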
Discussion
We have used RNA sequencing to investigate unclassified small round or spindle cell
sarcomas. We first confirm that sarcomas, like haematological malignancies [24],
demonstrate a high incidence of gene fusions as compared to carcinomas. In this respect, it
is noteworthy that the mesenchyme, from which sarcomas are derived, and haematopoiesis,
from which arise leukaemias and lymphomas, both originate from the mesoderm. One
hypothesis may rely on the activity of specific recombinases in mesoderm-derived tissues.
In this respect we can mention the role of the RAG1 recombinase in the generation of
lymphoma-specific translocations [25] and the recently suspected role of the PGBD5
recombinase in malignant rhabdoid tumours [26].
Combining fusion gene discovery and expression profiling, our analysis resulted in the
delineation of biologically homogeneous groups of tumours. While the CIC-NUTM1 fusion
was first described as defining a new brain tumour entity [14], we show here that CIC-NUTM1-positive
tumours belong to a homogeneous CIC-fused group of tumours together with CIC-DUX4- and CIC-FOXO4-
positive samples [27]. The overexpression of genes of the PEA3 type of the ETS family
observed in all CIC-fused samples is known to be a consequence of a loss of function of CIC
[28], which has also been implicated in resistance to MAPK inhibitors [28,29] and in the promotion of
metastasis [30]. Further analyses should confirm whether a dominant negative effect on
wild-type CIC is a consequence of CIC fusions and whether it correlates with the
aggressiveness of CIC-fused tumours. The BCOR-rearranged group of tumours includes
various BCOR-fusion genes and BCOR-ITD. BCOR-ITD were described in CCSK [19], in a
new brain tumour entity CNS HGNET-BCOR [14] and in soft tissue undifferentiated round cell
sarcoma of infancy sharing strong similarities with CCSK [17]. This homogeneous biological
entity may also present clinical specificities depending on the type of genetic lesions. Indeed,
while BCOR-CCNB3 was exclusively observed in bone, BCOR-ITD was only observed in soft
tissues. BCOR rearrangements may lead to an abnormal activity of the non-conventional
polycomb repressive BCOR (or PRC1.1) complex [31]. In this regard, the observation that
BCOR-rearrangements are associated with increased expression of genes correlated with
SMARCA2 expression (Figure 6A) may suggest that impairment of PRC1.1 activity leads to
an increased activity of the antagonist SWI/SNF complex. Thanks to a unique collection, we
found several samples carrying FUS-NFATC2 or EWSR1-PATZ1 fusions, which formed tight
groups. FUS-NFATC2-positive tumours present different morphologies but with features
rather reminiscent of CIC-fused tumours and are definitively different from EWSR1-NFATC2-
positive tumours. EWSR1-PATZ1 tumours present relatively divergent cell morphologies,
rendering their pathological identification challenging, though areas of spindle cells found in
all three samples may guide pathologists during diagnosis. The identification of more cases
is nevertheless mandatory to improve the characterization of these ultra-rare FET-fused
sarcomas.
We also describe here a new entity carrying an EWSR1- or FUS-TFCP2 fusion gene. TFCP2
(also known as LSF or LBP1) does not resemble any of the usual FET-fusion partners. It was
first described as an activator of the late SV40 promoter and later found to bind globin and
HIV-1 promoters [32,33]. TFCP2 is ubiquitously expressed and plays determinant roles in
lineage-specific gene expression or cell cycle regulation [34]. Potential involvement of
TFCP2 in oncogenesis has been suggested and may depend on tumour type: in
hepatocellular carcinomas, TFCP2 is overexpressed [35] and its inhibition by small
molecules leads to cell cycle arrest [36], whereas in skin melanoma TFCP2 is under-
expressed, and its re-expression induces growth arrest [37]. Interestingly, in addition to the
expression of MYOD1 (supplementary material, Table S3), a number of up-regulated genes
in these FUS/EWSR1-TFCP2-positive tumours are involved in muscle biology (such as
Cholinergic Receptor Nicotinic subunit alpha1, delta and gamma or sarcoglycan alpha). This
may suggest either a muscle cellular origin of these tumours or a muscle differentiation
program triggered by the fusion protein. Consistent with the genes expressed, pathological
review confirmed that TFCP2-rearranged tumours were an epithelioid variant of
rhabdomyosarcoma, although the morphological spectrum of these tumours needs to be
assessed on a larger scale. Despite a common muscle phenotype, these samples do not
cluster together with other rhabdomyosarcomas. Considering potential therapeutic targets, it
is noteworthy that ALK and TERT are highly expressed in these tumours. Moreover, recently
developed small molecules impeding the DNA-binding activity of TFCP2 [36,38], which
remains in the fusion protein, might also be effective in hindering EWSR1-TFCP2 binding
and possibly in inhibiting the development of these very aggressive tumours.
Samples carrying VGLL2-NCOA2 or VGLL2-CITED2 fusion genes demonstrate some
transcriptome heterogeneity related to two different histological subtypes: one presenting
only few tumour cells surrounded by fibrous stroma, rather suggestive of relatively benign
tumours, the other with a spindle cell rhabdomyosarcoma-like morphology similar to that
reported [13] and exhibiting expression profiles enriched in cell cycle genes. Interestingly,
one fibrous tumour and one rhabdomyosarcoma-like tumour represented the primary and the
relapse tumours from the same patient, respectively. It is therefore likely that the
rhabdomyosarcoma-like tumour represents a more aggressive evolution of the fibrous
tumour. Further analyses including exome-sequencing of the two tumours may enable the
identification of additional genetic mutations accounting for this evolution.
In addition to the fundamental role of expert surgical pathology, molecular analysis is now
becoming an integral part of the diagnosis of small round cell tumours, in particular for
resolving frequent diagnostic dilemmas. With the increasing number of sarcoma subclasses, it
is becoming difficult for the pathologist to reach a precise diagnosis in a reliable and cost-effective
way. In this respect, and depending on the expertise and technical resources available at
each pathology department, two molecular approaches may be proposed. One may rely on
the design of a specific panel of genes, based on our selection of genes that are specifically
expressed in each tumour group, tested in a simple assay (e.g. using NanoString
technology). Alternatively, thanks to the constant decrease of whole transcriptome costs and
the availability of robust bioinformatics tools, we strongly believe that RNA-sequencing is a
key approach. Indeed, our work indicates that the convergent information from gene fusion
detection and expression profiles-based clustering is extremely powerful to identify
biologically homogeneous groups of tumours. Hence, in the short term, RNA-sequencing is
expected to constitute a mandatory methodology for an efficient molecular diagnosis of
sarcoma, a requirement recently proposed in the GENSARC study [39]. Such a precise
subgrouping of sarcomas is essential i) to investigate the clinical characteristics of each
subgroup, particularly regarding the risk of evolution and response to current treatment
options, ii) to identify new therapeutic opportunities, such as, potentially, ALK overexpression in
EWSR1/FUS-TFCP2 sarcomas, iii) to construct relevant animal models to design robust pre-
clinical studies, and iv) to design highly specific assays to monitor circulating cell and tumour
DNA during treatment. Further increasing the number of samples, within large international
consortia, is essential to identify groups of tumours of sufficient size to enable robust
conclusions and to allow the collection of otherwise “orphan” samples.
Acknowledgments
The authors wish to thank pathologists who provided tumour material: E. Angot, H. Antoine-
Poirel, S. Aubert, A. Babik, J.P. Barbet, C. Bastien, A.M. Bergemer-Fouquet, D. Berrebi, L.
Boccon-Gibod, C. Bossard, S. Boudjemaa, E. Cassagnau, J. Champigneulle, M.A.
Chrestian, A. Clemenson, S. Collardeau-Frachon, S. Corby, J.F. Cote, A. Coulomb-
Lhermine, A. Croue, C. Daniliuc, A. De Muret, A. Dhouibi, C. Douchet, F. Dujardin, J.M.
Dumollard, H. Duval, M. Fabre, C. Fernandez, S. Fraitag, F. Galateau-Salle, L. Gibault, A.
Gomez-Brouchet, C. Guettier, M.F. Heymann, J.F. Ikoli, J.F. Jazeron, C. Jeanne-Pasquier,
A. Jouvet, R. Kaci, M. Karanian-Philippe, O. Kerdraon, J. Klijanienko, C. Labit-Bouvier, M.
Lae, T. Lazure, F. Le Pessot, F. Lemoine, A. Liprandi, F. Llamas-Gutierrez, M.C. Machet,
A. Maran-Gonzalez, L. Marcellin, P. Marcorelles, C. Marin, A. Maues De Paula, C.A.
Maurage, A. Moreau, A. Neuville, H. Perrochia, J.M. Picquenot, C. Renard, V. Rigau, J.
Riviere, C. Rouleau, A. Rouquette, H. Sartelet, I. Serre, N. Stock, H. Szabo, P. Terrier, M.
Terrier-Lacombe, V. Thomas De Montpreville, J. Tran Van Nhieu, E. Uro-Coste, P. Validire, I.
Valo, V. Verkarre, J.M. Vignaud, M.O. Vilain, M.L. Wassef, D. Zachar, L. Zemoura and the
tumorothèque Necker-Enfants Malades. We also thank Brigitte Manship for the careful
reading of the manuscript. This work was supported by the Institut National de la Santé et de
la Recherche Médicale, the Institut Curie, the Ligue Nationale Contre le Cancer, the Institut
National du Cancer and la Direction générale de l'offre de soins (INCa-DGOS_5716), the
European PROVABES (ERA-NET TRANSCAN JTC-2011), ASSET (FP7-HEALTH-2010-259348),
and EEC (HEALTH-F2-2013-602856) projects. U830 is also indebted to the
Société Française des Cancers de l’Enfant, Enfants et Santé, Courir pour Mathieu, Dans les
pas du Géant, La course de l’espoir du mont Valérien, Au nom d'Andréa, Association
Abigaël, Association Marabout de Ficelle, Les Bagouz à Manon, Les amis de Claire and the
association Adam. SW was supported by a grant from the Fondation Nuovo-Soldati. High-throughput
sequencing was performed by the NGS platform of Institut Curie, supported
by the grants ANR-10-EQPX-03 and ANR-10-INBS-09-08 from the Agence Nationale de la
Recherche (investissements d’avenir) and by the Canceropôle Ile-de-France.
Author contributions
SW, GP, OD and FT designed the study and wrote the manuscript. SW, VP, DG, SR and MB
performed all the experiments. SW, GP and FT acquired and analysed the data. JMC, MK,
JMG, PF, DRV and FLL provided samples and were the reference senior pathologists. LGR,
FL and EL recruited patients and provided numerous samples and clinical information.
References
1 Antonescu C. Round cell sarcomas beyond Ewing: emerging entities. Histopathology 2014; 64: 26-37
2 Delattre O, Zucman J, Melot T, et al. The Ewing family of tumors – A subgroup of small-round-cell tumors defined by specific chimeric transcripts. N Engl J Med 1994; 331: 294-299
3 Clark J, Rocques PJ, Crew AJ, et al. Identification of novel genes, SYT and SSX, involved in the t(X;18)(p11.2;q11.2) translocation found in human synovial sarcoma. Nat Genet 1994; 7: 502-508
4 Galili N, Davis RJ, Fredericks WJ, et al. Fusion of a fork head domain gene to PAX3 in the solid tumour alveolar rhabdomyosarcoma. Nat Genet 1993; 5: 230-235
5 Kawamura-Saito M, Yamazaki Y, Kaneko K, et al. Fusion between CIC and DUX4 up-regulates PEA3 family genes in Ewing-like sarcomas with t(4;19)(q35;q13) translocation. Hum Mol Genet 2006; 15: 2125-2137
6 Szuhai K, IJszenga M, de Jong D, et al. The NFATc2 gene is involved in a novel cloned translocation in a Ewing sarcoma variant that couples its function in immunology to oncology. Clin Cancer Res 2009; 15: 2259-2268
7 Pierron G, Tirode F, Lucchesi C, et al. A new subtype of bone sarcoma defined by BCOR-CCNB3 gene fusion. Nat Genet 2012; 44: 461-466
8 McPherson A, Hormozdiari F, Zayed A, et al. deFuse: An Algorithm for Gene Fusion Discovery in Tumor RNA-Seq Data. PLOS Comput Biol 2011; 7: e1001138
9 Ge H, Liu K, Juan T, et al. FusionMap: detecting fusion genes from next-generation sequencing data at base-pair resolution. Bioinformatics 2011; 27: 1922-1928
10 Bray NL, Pimentel H, Melsted P, et al. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 2016; 34: 525-527
11 Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009; 37: 1-13
12 Van der Maaten L, Hinton G. Visualizing High-Dimensional Data Using t-SNE. J Mach Learn Res 2008; 9: 2579-2605
13 Alaggio R, Zhang L, Sung Y-S, et al. A Molecular Study of Pediatric Spindle and Sclerosing Rhabdomyosarcoma: Identification of Novel and Recurrent VGLL2-Related Fusions in Infantile Cases. Am J Surg Pathol 2016; 40: 224-235
14 Sturm D, Orr BA, Toprak UH, et al. New Brain Tumor Entities Emerge from Molecular Classification of CNS-PNETs. Cell 2016; 164: 1060-1072
15 Le Guellec S, Velasco V, Pérot G, et al. ETV4 is a useful marker for the diagnosis of CIC-rearranged undifferentiated round-cell sarcomas: a study of 127 cases including mimicking lesions. Mod Pathol 2016; 29: 1523-1531
16 Specht K, Zhang L, Sung Y-S, et al. Novel BCOR-MAML3 and ZC3H7B-BCOR Gene Fusions in Undifferentiated Small Blue Round Cell Sarcomas. Am J Surg Pathol January 2016
17 Kao Y-C, Sung Y-S, Zhang L, et al. Recurrent BCOR Internal Tandem Duplication and YWHAE-NUTM2B Fusions in Soft Tissue Undifferentiated Round Cell Sarcoma of Infancy: Overlapping Genetic Features With Clear Cell Sarcoma of Kidney. Am J Surg Pathol 2016; 40: 1009-1020
18 Kao Y-C, Sung Y-S, Zhang L, et al. BCOR Overexpression Is a Highly Sensitive Marker in Round Cell Sarcomas With BCOR Genetic Abnormalities. Am J Surg Pathol 2016; 40: 1670-1678
19 Roy A, Kumar V, Zorman B, et al. Recurrent internal tandem duplications of BCOR in clear cell sarcoma of the kidney. Nat Commun 2015; 6: 8891
20 Brohl AS, Solomon DA, Chang W, et al. The genomic landscape of the Ewing sarcoma family of tumors reveals recurrent STAG2 mutation. PLoS Genet 2014; 10: e1004475
21 Mastrangelo T, Modena P, Tornielli S, et al. A novel zinc finger gene is fused to EWS in small round cell tumor. Oncogene 2000; 19: 3799-3804
22 Qaddoumi I, Orisme W, Wen J, et al. Genetic alterations in uncommon low-grade neuroepithelial tumors: BRAF, FGFR1, and MYB mutations occur at high frequency and align with morphology. Acta Neuropathol (Berl) 2016; 131: 833-845
23 Johnson A, Severson E, Gay L, et al. Comprehensive Genomic Profiling of 282 Pediatric Low‐ and High‐Grade Gliomas Reveals Genomic Drivers, Tumor Mutational Burden, and Hypermutation Signatures. The Oncologist September 2017: theoncologist.2017-0242
24 Lieber MR. Mechanisms of human lymphoid chromosomal translocations. Nat Rev Cancer 2016; 16: 387-398
25 Papaemmanuil E, Rapado I, Li Y, et al. RAG-mediated recombination is the predominant driver of oncogenic rearrangement in ETV6-RUNX1 acute lymphoblastic leukemia. Nat Genet 2014; 46: 116-125
26 Kentsis A, Eisenberg A, Blackford AN, et al. PGBD5 promotes site-specific oncogenic mutations in human tumors. Nat Genet 2017; 49: 1005-1014
27 Sugita S, Arai Y, Tonooka A, et al. A novel CIC-FOXO4 gene fusion in undifferentiated small round cell sarcoma: a genetically distinct variant of Ewing-like sarcoma. Am J Surg Pathol 2014; 38: 1571-1576
28 Dissanayake K, Toth R, Blakey J, et al. ERK/p90RSK/14-3-3 signalling has an impact on expression of PEA3 Ets transcription factors via the transcriptional repressor capicúa. Biochem J 2011; 433: 515-525
29 Wang B, Krall EB, Aguirre AJ, et al. ATXN1L, CIC, and ETS Transcription Factors Modulate Sensitivity to MAPK Pathway Inhibition. Cell Rep 2017; 18: 1543-1557
30 Okimoto RA, Breitenbuecher F, Olivas VR, et al. Inactivation of Capicua drives cancer metastasis. Nat Genet 2017; 49: 87-96
31 Gearhart MD, Corcoran CM, Wamstad JA, et al. Polycomb Group and SCF Ubiquitin Ligases Are Found in a Novel BCOR Complex That Is Recruited to BCL6 Targets. Mol Cell Biol 2006; 26: 6880-6889
32 Swendeman SL, Spielholz C, Jenkins NA, et al. Characterization of the genomic structure, chromosomal location, promoter, and development expression of the alpha-globin transcription factor CP2. J Biol Chem 1994; 269: 11663-11671
33 Yoon JB, Li G, Roeder RG. Characterization of a family of related cellular transcription factors which can modulate human immunodeficiency virus type 1 transcription in vitro. Mol Cell Biol 1994; 14: 1776-1785
34 Veljkovic J, Hansen U. Lineage-specific and ubiquitous biological roles of the mammalian transcription factor LSF. Gene 2004; 343: 23-40
35 Yoo BK, Emdad L, Gredler R, et al. Transcription factor Late SV40 Factor (LSF) functions as an oncogene in hepatocellular carcinoma. Proc Natl Acad Sci U S A 2010; 107: 8357-8362
36 Rajasekaran D, Siddiq A, Willoughby JLS, et al. Small molecule inhibitors of Late SV40 Factor (LSF) abrogate hepatocellular carcinoma (HCC): Evaluation using an endogenous HCC model. Oncotarget 2015; 6: 26266-26277
37 Goto Y, Yajima I, Kumasaka M, et al. Transcription factor LSF (TFCP2) inhibits melanoma growth. Oncotarget 2016; 7: 2379-2390
38 Grant TJ, Bishop JA, Christadore LM, et al. Antiproliferative small-molecule inhibitors of transcription factor LSF reveal oncogene addiction to LSF in hepatocellular carcinoma. Proc Natl Acad Sci 2012; 109: 4503-4508
39 Italiano A, Di Mauro I, Rapp J, et al. Clinical effect of molecular methods in sarcoma diagnosis (GENSARC): a prospective, multicentre, observational study. Lancet Oncol 2016; 17: 532-538
Figure Legend
Figure 1: Fusion genes define tumour groups.
Kallisto-extracted expression values were used for all genes having a CDS annotation in
Ensembl GRCh38p5. A. t-Distributed Stochastic Neighbour Embedding (t-SNE) analysis.
Tumour samples (coloured spheres) carrying identical or similar fusion genes are linked to
the centroid of the group (small grey sphere). Fusion genes defining major groups are
indicated. Unclassified sarcomas are not represented here for the sake of clarity but were
taken into account in the analysis. An enlarged view of the centre of the figure is presented in
the supplementary material, Figure S2A. B. Hierarchical clustering. 1 - Pearson correlation and
Ward’s method were used as distance and clustering method, respectively; identified
recurrent fusion genes are indicated. Enlarged views of TFE3-fused, CIC-fused and
BCOR-rearranged samples are presented below (see supplementary material, Figure S2B
for the full annotated cluster).
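The distance named in the legend of Figure 1B can be illustrated with a minimal sketch. It computes 1 - Pearson correlation between expression profiles using only the Python standard library. The sample names and values are hypothetical toy data, not data from the study, and the function names are illustrative. The Ward linkage step is not reimplemented here; in practice it would be delegated to a clustering library, for instance SciPy's `scipy.cluster.hierarchy.linkage`, which is an assumption about tooling rather than the authors' stated implementation.

```python
import math


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)


def distance_matrix(profiles):
    """Pairwise 1 - Pearson distances for a dict {sample_name: [values]}."""
    names = sorted(profiles)
    return {
        (a, b): pearson_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical toy expression values, not data from the study.
toy = {
    "sample_A": [1.0, 2.0, 3.0, 4.0],
    "sample_B": [2.0, 4.0, 6.0, 8.0],  # same shape as sample_A
    "sample_C": [4.0, 3.0, 2.0, 1.0],  # opposite trend to sample_A
}
d = distance_matrix(toy)
```

With this distance, two profiles that rise and fall together sit at distance 0 regardless of absolute expression level, while perfectly anti-correlated profiles sit at the maximum distance of 2.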
Figure 2: Subtypes of VGLL2-NCOA2-positive samples
A. Representative images of fibrous tumour samples. SARC070_primary and SARC065
tumour samples contained abundant fibrous stroma, displayed low Ki67 and were mostly
negative for myogenin and desmin. B. Representative images of dense tumour samples.
SARC070_relapse and SARC102 cells were packed and strongly positive for Ki67, with
numerous cells being positive for myogenin and desmin. Scale bar: 100 µm
Figure 3: FUS-NFATC2-positive tumours
Morphology (hematoxylin/eosin staining) of FUS-NFATC2 tumours. A. Proliferation of round
tumour cells embedded in a variable myxoid stroma. B. Round to spindle tumour cells
arranged in sheets associated with haemangiopericytic vessels. C. Tumour cells were focally
embedded within myxohyaline stroma reminiscent of cartilaginous differentiation. Scale bars:
100 µm
Figure 4: EWSR1-PATZ1-positive tumours
A. Schematic diagram of native EWSR1 and PATZ1 proteins indicating fusion point of
EWSR1-PATZ1 fusion gene. LC: Low complexity domain; RRM: RNA recognition motif;
RGG: Arg-Gly-Gly rich region; Zn: Zinc Finger, RanBP2-type; BTB/POZ: Broad-Complex,
Tramtrack and Bric a brac / POxvirus and Zinc finger; H: AT hook; C2H2: Cys2-His2 Zinc-
finger. Exon number based on the indicated RefSeq accession number is indicated below
each protein scheme together with amino acid length scale. B. H & E staining of three
EWSR1-PATZ1-positive samples demonstrating a variety of morphologies (scale bar: 100
µm).
Figure 5: FET-TFCP2 fusion
A. Schematic diagram of native proteins and fusion proteins. EWSR1-TFCP2 fusion was
identified in SARC049 while FUS-TFCP2 was identified in RNA009_16_062 and
RNA020_16_136 samples. LC: Low complexity domain; RRM: RNA recognition motif; RGG:
Arg-Gly-Gly rich region; Zn: Zinc Finger, RanBP2-type; SAM: Sterile alpha motif / pointed
domain. Exon number based on the indicated RefSeq accession number is indicated below
each protein scheme. Amino acid length scale is presented at the bottom. B. Morphological
spectrum of FET-TFCP2 tumours. Hematoxylin/eosin staining (left panels) illustrating
epithelioid (top), spindling (middle) and sclerosing (bottom) areas and immunohistochemistry
(right panels) for MYOD1 (top), desmin (middle), and ALK (bottom) seen in all cases are
illustrated in EWSR1-TFCP2-positive sample SARC049. ALK staining was cytoplasmic-
positive with nuclear and membrane exclusion. Morphological aspects were evocative of an
epithelioid and spindled variant of rhabdomyosarcoma, with numerous mitotic figures. Scale
bar: 40 µm
Figure 6: Expression profiles functional analyses
Heatmap representing the specificity of the top 100 genes (detailed in the supplementary
material, Table S3) identified in the between-group analysis for each of the 24 tumour groups
containing at least two samples. Data were centred and scaled (in row direction) prior
to analysis. Colour key of the scaled expression values is shown.
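The row-wise centring and scaling mentioned in the legend of Figure 6 can be sketched as follows. Each gene (row) is centred on its mean and divided by its standard deviation across samples. The use of the sample standard deviation (n - 1 denominator) and the handling of constant rows are assumptions, since the legend does not specify them; the function name and toy values are hypothetical.

```python
import statistics


def scale_rows(matrix):
    """Centre each row on its mean and divide by its sample standard deviation."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample SD, n - 1 denominator (assumption)
        if sd == 0:
            # A constant row carries no contrast; map it to zeros (assumption).
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - mean) / sd for value in row])
    return scaled


# Hypothetical toy matrix: rows are genes, columns are samples.
genes = [
    [2.0, 4.0, 6.0],
    [10.0, 10.0, 10.0],
]
z = scale_rows(genes)
```

Scaling by row puts every gene on a comparable scale, so that the colour key reflects relative over- or under-expression across groups rather than absolute abundance.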
Supporting information:
Supplementary Figure S1: Constitution of the cohorts
Supplementary Figure S2: Details of the unsupervised analyses
Supplementary Figure S3: Details of VGLL2-NCOA2/CITED2, CIC-NUTM1, BCOR-ITD and
EWSR1-PATZ1 fusion points.
Supplementary Figure S4: Sequence of the fusion points and genes expressed in FET-
TFCP2-positive tumors
Supplementary Figure S5: Expression of the most specific gene for each tumor entity (with
more than 2 samples)
Supplementary Table S1: List of the fusion genes routinely tested by RT-PCR at the Institut
Curie Unité de Génétique Somatique.
Supplementary Table S2: Description of the samples from the three cohorts
Supplementary Table S3: Differentially expressed genes in the BGA and supervised
analyses and Gene ontology / GSEA enrichment for all sarcoma subtypes
Supplementary Table S4: Differentially expressed genes and gene ontology enrichment for
FUS-NFATC2- vs EWSR1-NFATC2-positive tumors
Watson et al. Figure 1
[Panel A: t-SNE plot (axes Tsne 1 and Tsne 2). Labelled groups: BCOR-CCNB3, CIC-DUX4, EWSR1-WT1, BAF-deficient, FET-ETS, EWSR1-PATZ1, NAB2-STAT6, FET-NR4A3, HEY1-NCOA2, FET-TFCP2, BRD3/4-NUTM1, FUS-NFATC2, EWSR1-NFATC2, VGLL2-NCOA2/CITED2, BCOR-ITD, BCOR-MAML3 / ZC3H7B-BCOR, EML4-ALK, EWSR1-CREB1/ATF1, FET-CREB3L1/2, FET-DDIT3, SS18-SSX, CIC-NUTM1.]
[Panel B: hierarchical clustering dendrogram (Ward's distance, 0.0 to 0.8). Labelled groups: FET-ETS, CIC-Fused, EWSR1-WT1, BCOR-Rearranged, FUS-NFATC2, HEY1-NCOA2, SS18-SSX, EWSR1-PATZ1, PAX-FOXO, TFE3-Fused, NAB2-STAT6, FET-NR4A3, BRD3/4-NUT, EML4-ALK, VGLL2-Fused, FET-TFCP2, BAF-deficient, ERMS, EWSR1-CREB1/ATF1, EWSR1-NFATC2, AT/RT. Enlarged views: UXT-TFE3, ASPSCR1-TFE3, SFPQ-TFE3; CIC-DUX4, CIC-NUTM1, CIC-FOXO4; BCOR-CCNB3, BCOR-MAML3, ZC3H7B-BCOR, BCOR-ITD.]
Watson et al. Figure 2
[Panel A: SARC070 (primary) and SARC065; HES, Ki67, Desmin and Myogenin images.]
[Panel B: SARC070 (relapse) and SARC102; HES, Ki67, Desmin and Myogenin images.]
Watson et al. Figure 3
[Panels A, B and C: images.]
Watson et al. Figure 4
[Panel A: schematic of EWSR1 (NM_005243, exons 1 to 17; domains LC, RRM, RGG, Zn) and PATZ1 (NM_014323, exons 1 to 5; domains BTB/POZ, H, C2H2), with amino acid length scale from 0 to 687. Sequence detail (nucleotide ruler 1610 to 1700; amino acid ruler 320 to 340, NM_014323):]
accagcctccagctgggctacatcgaccttcctcctccgaggctgggtgagaatgggctacccatctctgaagaccccgacggcccccgaaag
T S L Q L G Y I D L P P P R L G E N G L P I S E D P D G P R K
[Panel B: H & E images of SARC041, SARC017 and G155_RNA045_17_001.]
Watson et al. Figure 5
[Panel A: schematic of native proteins EWSR1 (NM_005243, exons 1 to 17; domains LC, RRM, RGG, Zn), FUS (NM_004960, exons 1 to 15; domains LC, RRM, RGG, Zn) and TFCP2 (NM_005653, exons 1 to 15; CP2 transcription factor and SAM domains), and of the fusion proteins: SARC049: EWSR1-TFCP2; RNA009_16_062 & RNA020_16_136: FUS-TFCP2 (LC, CP2 transcription factor and SAM domains). Amino acid length scale from 0 to 700.]
[Panel B: HES images (left) and MYOD1, desmin and ALK immunohistochemistry (right).]
Watson et al. Figure 6
[Heatmap. Colour key: Scaled expression value (Log2), from -3 to 3. Tumour groups: TFE3-Fused, Embryonal rhabdomyosarcoma, BCOR-rearranged, CIC-Fused, EWSR1-WT1, EML4-ALK, PAX-FOXO, FET-ETS, EWSR1-PATZ1, EWSR1-NFATC2, FET-NR4A3, FET-TFCP2, NTRK3-Fused, FUS-NFATC2, FET-CREB3L1/2, NTRK1-Fused, HEY1-NCOA2, FET-CREB3L2/3, BRD3/4-NUTM1, NAB2-STAT6, SS18-SSX, VGLL2-Fused sarcoma cluster 1, VGLL2-Fused sarcoma cluster 2, EWSR1-CREB1/ATF1.]
Watson et al., Supplementary Figure S1
Supplementary Figure S1: Constitution of the cohorts

Sarcoma samples received for molecular diagnosis
-> Molecular investigations including QPCR on known fusion transcripts *
-> Identification of a pathognomonic alteration: yes / no

yes: Selection of 78 samples -> RNA-sequencing -> Control cohort N=78
  Ewing Sarcoma (n=7)
  CIC-DUX4 sarcoma (n=6)
  Rhabdomyosarcoma (n=6)
  BCOR-CCNB3 sarcoma (n=5)
  Clear cell sarcoma (n=5)
  Solitary fibrous tumor (n=5)
  EWSR1-NFATC2 Ewing-like sarcoma (n=4)
  Extraskeletal myxoid chondrosarcoma (n=4)
  NUT midline carcinoma (n=4)
  Alveolar soft part sarcoma (n=3)
  Renal cell carcinoma (n=3)
  Desmoplastic small round cell tumor (n=3)
  Myxoid liposarcoma (n=3)
  Congenital fibrosarcoma (n=2)
  Low-grade fibromyxoid sarcoma (n=2)
  Mesenchymal chondrosarcoma (n=2)
  Synovial sarcoma (n=1)
  SMARCA4-DTS (n=4)
  Malignant rhabdoid tumor (n=7)
  SCCOHT (n=1)
  + CIC-FOXO sarcoma (n=1)**

no: No pathognomonic alteration identified -> Pool of samples without detectable fusion
  Random selection of 94 cases -> RNA-sequencing -> Investigation cohort N=94
    Robust fusion gene detected in 55 cases
      Single fusion gene detected in 40 cases
      Multiple fusion genes detected in 15 cases (1 sample with chromothripsis)
    No robust fusion gene detected in 39 cases

700+ retrospective samples without detectable fusion with good quality RNA
-> PCR screening*** -> RNA-sequencing -> Follow-up cohort N=12

* See Supplementary table S1
** : RNAseq raw fastq files kindly provided by Yasuhito Arai and Tadashi Hasegawa [27]
*** Screened fusions were:
BCOR-MAML3, CDKN2A-CCDC12, CIC-NUTM1, EML4-ALK, EWSR1-PATZ1, EWSR1-TFCP2,
FUS-TFCP2, FUS-NFATC2, KMT2A-YAP1 / YAP1-KMT2A, MN1-TAF3, NF1-RHOT1,
PARG-BMS1, VGLL2-NCOA2 and VGLL2-CITED2.
Watson et al., Supplementary Figure S2
[Panel A: t-SNE plot (axes Tsne 1 and Tsne 3). Labelled groups: VGLL2-Fused, NTRK3-Fused, NTRK1-Fused, BRD3/4-NUTM1, PAX-FOXO, FET-CREB3L1/2, FET-DDIT3, EWSR1-CREB1/ATF1, ERMS.]
[Panel B: hierarchical clustering dendrogram (Ward's distance, 0.0 to 0.8) annotated with Fusion Gene and SampleID for every sample, with the same group labels as Figure 1B.]
[Panel C: hierarchical clustering of TopHat/Cufflinks expression profiles (Ward's distance, 0 to 5), with annotation tracks Cohort, Sequencing_Run, Technology, Tumor_tissue, Tumor site, Gender, Age and Tumor Type.]
[Panel D: hierarchical clustering of Kallisto expression profiles (Ward's distance, 0 to 5), with the same annotation tracks.]
Supplementary Figure S2: Detailed unsupervised analyses.
A) Zoom on the centre region of the t-SNE analysis (Figure 1A) from another angle, demonstrating
dispersion of EWSR1-CREB3L1/2, NTRK1-, NTRK3- and VGLL2-fused tumors, and to a lesser extent
EWSR1-CREB1/ATF1 and FET-DDIT3-fused tumors, as compared to the more defined EWSR1-NR4A3,
EWSR1-WT1 or EWSR1-PATZ1 groups. B) Details of the clustering analysis (identical to Figure 1B)
showing all the fusion genes identified. C&D) Expression profiles were generated using two different
methods: in C) sequencing reads were aligned with TopHat2 and expression profiles were extracted using
the Cufflinks2 tool. In D) expression profiles were extracted with Kallisto, which does not require prior
alignment. In both cases, hierarchical clustering using the ten percent most variable genes, ranked by
interquartile range, gave quite similar results. The main difference with Figure 1 concerned the VGLL2-fused
samples, which split into two or more groups. No clustering bias due to either patients' clinical data (age, tumor
site, gender or tissue type) or experimental conditions (technology used or sequencing run) could be
observed.
Legend
Age: <3, <10, <25, >25
Gender: M, F
Tumor site: Head and neck, Trunk, Distal limbs, Proximal limbs
Tumor tissue: soft tissue, bone, brain tissue
Technology: HiSeq PE100, NextSeq PE150
Cohort: INVESTIGATION COHORT, CONTROL COHORT, SCREENED SAMPLES
Tumor Type: Inflammatory myofibroblastic tumor, FIC, Myxoid Liposarcoma, Low-grade fibromyxoid sarcoma, Myoepithelioma, FET-TFCP2, Extraskeletal myxoid chondrosarcoma, MRT Extracerebral, Lipofibromatosis-like Neural Tumor, EWSR1-PATZ1, Mesenchymal chondrosarcoma, EWSR1-NFATc2 sarcoma, Biphenotypic sinonasal sarcoma, Alveolar soft part sarcoma, Clear cell sarcoma, DSCRT, EML4-ALK sarcoma, Dermatofibrosarcoma protuberans, Ewing sarcoma, CIC-Fused sarcoma, AT/RT, ARMS, ERMS, BCOR-Rearranged sarcoma, Unclassified Sarcoma, VGLL2-Fused, NUT midline carcinoma, Solitary fibrous tumor, SMARCA4-DTS, SCCOHT, Synovial Sarcoma, Pericytoma, Renal cell carcinoma
Sequencing Run: HiSeq.RunB127, NextSeq.Run46, NextSeq.Run49, NextSeq.Run1, NextSeq.Run41, NextSeq.Run34, NextSeq.Run35, HiSeq.RunB102, HiSeq.RunB137, HiSeq.RunB68, HiSeq.RunB72, NextSeq.Run48, HiSeq.RunB135, HiSeq.RunB70, HiSeq.RunA119, HiSeq.RunA53, EXT.HASEGAWA, HiSeq.RunB143, NextSeq.Run54, NextSeq.Run29, NextSeq.Run38, HiSeq.RunA125, NextSeq.Run118, NextSeq.Run142, HiSeq.RunA304, NextSeq.Run73, NextSeq.Run81, NextSeq.Run43, NextSeq.Run45
Watson et al., Supplementary Figure S3
C
SARC084 & RNA012_16_074
GACATCTTCACCTTTGACCGTACAGCATCTGCATTGCCGGGACCGGATATG
D I F T F D R T A S A L P G P D M
CIC exon 18 NUTM1 exon 3
SARC101
GACATCTTCACCTTTGACCGTACAGTGTACATTCCGAAGAAGGCAGCCTCC
D I F T F D R T V Y I P K K A A S
CIC exon 18 NUTM1 exon 6
RNA003_16_008
CGCAAGAAGAGGAAGAACTCCACGGTGTACATTCCGAAGAAGGCAGCCTCC
R K K R K N S T V Y I P K K A A S
CIC exon 17 NUTM1 exon 6
RNA012_16_073
CCCAGCCCCGCAGGGGGCCCAGACCACGGACCTCCTGCTCCTGAGGCACCC
P S P A G G P D H G P P A P E A P
CIC exon 20 NUTM1 exon 6
B
WT CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
SARC068 CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVSLEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
SARC050 CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVFMEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
SARC078 CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
SARC074 CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
SARC030 CSKDLEAFNPESKELLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYWLDLVEFTNEIQTLLGSSVEWLHPSDLASDNYW*
A
SARC061
SARC070
SARC088
SARC102
VGLL2 exon 2 NCOA2 exon 14
CCAAGGAGCTCTGGGCCCTGGCGAGGAATGATTGGTAACAGTGCTTCTCGG
P R S S G P W R G M I G N S A S R
VGLL2 exon 2 NCOA2 exon 13
CCAAGGAGCTCTGGGCCCTGGCGAGCTTTTAATAACCCACGACCAGGGCAA
P R S S G P W R A F N N P R P G Q
GGTCTGGGCCTCAGCGTGGACTCAGGAATGATTGGTAACAGTGCTTCTCGG
G L G L S V D S G M I G N S A S R
VGLL2 exon 3 NCOA2 exon 14
SARC065
D
Supplementary Figure S3: Details of VGLL2-NCOA2/CITED2 (A), CIC-NUTM1 (B), BCOR-ITD (C) and
EWSR1-PATZ1 (D) fusion points.
Sanger sequences of the fusion points are given in A, B and D together with amino acid sequences of the
translated fusion genes. Amino acid sequences of the internal tandem duplications in exon 15 of BCOR are
shown for BCOR-ITD-positive tumors (C).
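The amino acid lines shown under each Sanger sequence follow from translating the nucleotide sequence in frame with the standard genetic code. A minimal sketch of that translation is given here, assuming the standard code and a reading frame given as an offset from the first base shown; the `translate` function and its `frame` argument are hypothetical names introduced for illustration. The first example is the CIC exon 18 / NUTM1 exon 3 junction reported for SARC084.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, frame=0):
    """Translate a DNA sequence, starting `frame` bases after its first base.

    Trailing bases that do not form a complete codon are ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)

# Junction sequence reported in the figure for SARC084 (CIC exon 18 / NUTM1 exon 3).
junction = "GACATCTTCACCTTTGACCGTACAGCATCTGCATTGCCGGGACCGGATATG"
print(translate(junction))

# A sequence from panel D whose displayed translation starts at the second base.
print(translate("CCTTCCTCCTCCGAGGCTGGGTGAG", frame=1))
```

Comparing the translated junction with the wild-type protein on either side of the breakpoint is what shows whether a fusion keeps the reading frame of the 3' partner.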
EWSR1 exon 8 PATZ1 exon 1
SARC041
CGGGGAGGAGGACGCGGTGGAATGGGAATGAACTTCAAAATTAAAGACGGCCCCCGAAAGAGG
R G G G R G G M G M N F K I K D G P R K R
EWSR1 intron 8 (936 nt from exon 8)
EWSR1 exon 8 PATZ1 exon 1
SARC026
CGGGGAGGAGGACGCGGTGGAATGGGAAGTGACCCCGACGGCCCCCGAAAG
R G G G R G G M G S D P D G P R K
EWSR1 exon 8 PATZ1 exon 1
RNA045_17_001
CGGGGAGGAGGACGCGGTGGAATGGGCCTCCAGCTGGGCTACATCGACCTT
R G G G R G G M G L Q L G Y I D L
CGGGGAGGAGGACGCGGTGGAATGGGTCCCACTTTTCATTATGCTGCCGGGGGCCCCCGAAAGAGGAGCCGG
R G G G R G G M G P T F H Y A A G G P R K R S R
EWSR1 exon 8 PATZ1 exon 1
RNA034_16_244
EWSR1 intron 8 (1689 nt from exon 8)
EWSR1 exon 9 (NM_001163287)
GCTGGGTGAGAATGGGCTACCCATC
L G E N G L P I
EWSR1 exon 8 PATZ1 exon 1
SARC017
CCTTCCTCCTCCGAGGCTGGGTGAG
L P P P R L G E
CGGGGAGGAGGACGCGGTGGAATGGG
R G G G R G G M G
Watson et al., Supplementary Figure S4
CGTGGAGGCAGAGGTGGCATGGGTGATGTCCTTGCATTGCCCATTTTTAAG
R G G R G G M G D V L A L P I F K
FUS exon 6 TFCP2 exon 2
RNA020_16_136 & RNA009_16_062
CCAGCAGCCACTGCACCTACAAGTGATGTCCTTGCATTGCCCATTTTTAAG
P A A T A P T S D V L A L P I F K
EWSR1 exon 5 TFCP2 exon 2
SARC049
[Panel B: boxplots of ALK (ENSG00000171094) and TERT (ENSG00000164362) expression per tumor group. Y-axis: intensities in log2(TPM + 2). Significance markers (**) as defined in the caption.]
Tumor groups (number of samples): VGLL2-Fused cluster 1 (3); VGLL2-Fused cluster 2 (4); BCOR-Rearranged sarcoma (12); NUT midline carcinoma (5); FIC (2); DSRCT (3); EWSR1-NFATC2 sarcoma (4); FUS-NFATC2 sarcoma (3); extraskeletal myxoid chondrosarcoma (4); Unclassified sarcoma (60); Mesenchymal chondrosarcoma (2); LPF-NT (3); Low grade fibromyxoid sarcoma (2); BAF-Deficient (12); Solitary fibrous tumor (7); EWSR1-PATZ1 sarcoma (5); Renal cell carcinoma (3); CIC-Fused (12); Myxoid Liposarcoma (3); Synovial Sarcoma (2); Alveolar soft part sarcoma (4); Clear cell sarcoma (5); ERMS (2); ARMS (4); FET-ETS Ewing sarcoma (7); EML4-ALK sarcoma (3); FET-TFCP2 (3)
Supplementary Figure S4: Sequence of the fusion points and genes expressed in FET-TFCP2-positive
tumors
A. Sanger sequencing of the EWSR1-TFCP2-positive sample (SARC049) and of the FUS-TFCP2-positive
samples (RNA020_16_136, similar to RNA009_16_062).
B. Boxplots of ALK and TERT gene expression levels, as log2(transcripts per million + 2), across all
tumor samples show strong signals in FET-TFCP2 tumor samples. The number of samples in
each boxplot is indicated in brackets. **: Welch t-test p-value < 0.01.
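The two quantities named in this legend, the log2(TPM + 2) transform and the Welch t-test, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the TPM values are hypothetical, and only the t statistic and the Welch-Satterthwaite degrees of freedom are computed, since the p-value requires the Student t distribution, which the Python standard library does not provide.

```python
from math import log2, sqrt
from statistics import mean, variance

def log2_tpm(tpm_values, offset=2.0):
    """Apply the log2(TPM + offset) transform used on the boxplot axes."""
    return [log2(x + offset) for x in tpm_values]

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two groups are not assumed to share
    the same variance.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical TPM values (not data from the study).
group_of_interest = log2_tpm([62, 126, 254])
other_samples = log2_tpm([0, 2, 6, 2])
t_stat, dof = welch_t(group_of_interest, other_samples)
print(round(t_stat, 3), round(dof, 3))
```

With SciPy available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.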
[Panels A to H: boxplots of gene expression per tumor group. Y-axis: intensities in log2(TPM + 2), except for the CD8B panel, whose axis is labeled log2(FPKM + 2). Genes shown: LANCL2 (ENSG00000132434); ETV4 (ENSG00000175832); BCOR (ENSG00000183337); HES7 (ENSG00000179111); LHX8 (ENSG00000162624); CD8B (ENSG00000172116); GPR12 (ENSG00000132975); RECQL (ENSG00000004700).]
Watson et al., Supplementary Figure S5
[Panels I to P: boxplots of gene expression per tumor group. Y-axis: intensities in log2(TPM + 2). Genes shown: TRIM63 (ENSG00000158022); NGB (ENSG00000165553); PIPOX (ENSG00000179761); TMEM45B (ENSG00000151715); NMB (ENSG00000197696); LIPI (ENSG00000188992); ADIPOQ (ENSG00000181092); BCL11A (ENSG00000119866).]
Supplementary Figure S5: Expression of the most specific gene for each tumor entity (with more than 2 samples)
A. Expression level of the LANCL2 gene across all samples, demonstrating its specificity for VGLL2-fused tumors
B. Expression level of the ETV4 gene across all samples, demonstrating its specificity for CIC-fused tumors
C. Expression level of the BCOR gene across all samples, demonstrating its lack of specificity for BCOR-rearranged tumors
D. Expression level of the HES7 gene across all samples, demonstrating its specificity for BCOR-rearranged tumors
E. Expression level of the LHX8 gene across all samples, demonstrating its specificity for EML4-ALK-positive tumors
F. Expression level of the CD8B gene across all samples, demonstrating its specificity for FUS-NFATC2-positive tumors
G. Expression level of the GPR12 gene across all samples, demonstrating its specificity for EWSR1-PATZ1-positive tumors
H. Expression level of the RECQL gene across all samples, demonstrating its specificity for FET-TFCP2-positive tumors
I. Expression level of the TRIM63 gene across all samples, demonstrating its specificity for TFE3-fused sarcoma
J. Expression level of the NGB gene across all samples, demonstrating its specificity for EWSR1-WT1 desmoplastic small round cell tumors
K. Expression level of the PIPOX gene across all samples, demonstrating its specificity for PAX-FOXO alveolar rhabdomyosarcoma
L. Expression level of the TMEM45B gene across all samples, demonstrating its specificity for EWSR1-NFATC2-positive tumors
M. Expression level of the NMB gene across all samples, demonstrating its specificity for FET-NR4A3 extraskeletal myxoid chondrosarcoma
N. Expression level of the LIPI gene across all samples, demonstrating its specificity for FET-ETS Ewing sarcoma
O. Expression level of the ADIPOQ gene across all samples, demonstrating its specificity for FET-DDIT3 myxoid liposarcoma
P. Expression level of the BCL11A gene across all samples, demonstrating its specificity for NUTM1-BRD3/4 NUT midline carcinoma
Q. Expression level of the NEMP2 gene across all samples, demonstrating its specificity for NAB2-STAT6 solitary fibrous tumors
R. Expression level of the GLP1R gene across all samples, demonstrating its specificity for FET-CREB1/ATF1 clear cell sarcoma
The number of tumors in each group is indicated in brackets. *, ** and ***: Welch t-test p-value < 0.05, 0.01 and 10^-4,
respectively; n.s.: not significant.
[Panels Q and R: boxplots of gene expression per tumor group. Y-axis: intensities in log2(TPM + 2). Genes shown: NEMP2 (ENSG00000189362); GLP1R (ENSG00000112164).]