genomic characterization of metastatic patterns from ... · 28/06/2021 · 2 nguyen, b., et al....
TRANSCRIPT
Nguyen, B., et al. (2021). BioRxiv [preprint] 1
Genomic characterization of metastatic patterns from prospective clinical sequencing of 25,000 patients Bastien Nguyen1,2,3*, Christopher Fong1,2,3*, Anisha Luthra1,2,3, Shaleigh A. Smith1,2,3, Renzo G. DiNatale4,5, Subhiksha Nandakumar1,2,3, Henry Walch1,2,3, Walid K. Chatila1,2,3, Ramyasree Madupuri1,2,3, Ritika Kundra1,2,3, Craig M. Bielski2,6, Brooke Mastrogiacomo1,2,3, Adrienne Boire2, Sarat Chandarlapaty2,7, Karuna Ganesh4,7, James J. Harding6,7, Christine A. Iacobuzio-Donahue2,9, Pedram Razavi6,7, Ed Reznik1,2,3, Charles M. Rudin4,7, Dmitriy Zamarin6,7, Wassim Abida7, Ghassan K. Abou-Alfa7, Carol Aghajanian7, Andrea Cercek7, Ping Chi7, Darren Feldman7, Alan L. Ho7, Gopakumar Iyer7, Yelena Y. Janjigian7, Michael Morris7, Robert J. Motzer7, Eileen M. O'Reilly7, Michael A. Postow7, Nitya P. Raj7, Gregory J. Riely7, Mark E. Robson7, Jonathan E. Rosenberg7, Anton Safonov7, Alexander N. Shoushtari7, William Tap7, Min Yuen Teo7, Anna M. Varghese7, Martin Voss7, Rona Yaeger7, Marjorie G. Zauderer7, Nadeem Abu-Rustum8, Julio Garcia-Aguilar8, Bernard Bochner8, Abraham Hakimi8, William R. Jarnagin8, David R. Jones8, Daniela Molena8, Luc Morris8, Eric Rios-Doria8, Paul Russo8, Samuel Singer8, Vivian E. Strong8, Debyani Chakravarty9, Lora H. Ellenson9, Anuradha Gopalan9, Jorge S. Reis-Filho9, Britta Weigelt9, Marc Ladanyi9, Mithat Gonen3, Sohrab P. Shah3, Joan Massague10, Jianjiong Gao1,2,3, Ahmet Zehir9, Michael F. Berger1,2,9, David B. Solit1,2,6,7, Samuel F. Bakhoum2, Francisco Sanchez-Vega3,8,11 and Nikolaus Schultz1,2,3,11 1Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY, USA 2Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA 3Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA 4Molecular Pharmacology Program, Sloan Kettering Institute, New York, NY, USA 5Urology and Renal Transplantation Service, Virginia Mason Medical Center, Seattle, WA, USA 6Weill Medical College at Cornell University, New York, NY, USA 7Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA 8Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA 9Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA 10Cancer Biology and Genetics Program, Sloan Kettering Institute, New York, NY, USA 11Senior author *These authors contributed equally to this work Correspondence: [email protected] and [email protected]
INTRODUCTION Although most cancer deaths are due to metastatic spread, little is known about the genomic determinants of cancer metastasis. Once metastatic cancer cells have detached from the primary tumor site, they can invade all parts of the body (Lambert et al., 2017; Massagué and Obenauf, 2016). However, the distribution of metastatic sites for a given primary tumor is not random and is dictated by factors such as anatomical location, cell of origin and molecular subtype, among others (Gao et al., 2019; Nguyen et al., 2009). Furthermore, tumor cell-extrinsic factors such as treatment, target organ microenvironment and other systemic factors such as circulating chemokines and cytokines can also influence the pattern of metastatic progression (Massagué and Ganesh, 2021). The classical seed-and-soil hypothesis, according to which disseminated cancer cells preferentially colonize organs that enable and are compatible with their own growth, has been explored for more than a century (Paget, 1889). Yet much remains unknown about the interplay between tumor genomic features and metastatic potential, as well as organ-specific patterns of metastasis. Molecular profiling of tumors coupled with clinical annotation of metastatic events could help provide insight into this question. However, large-scale cancer sequencing efforts have so far
focused on primary, untreated tumors (e.g., The Cancer Genome Atlas (Sanchez-Vega et al., 2018)), or they have characterized the overall genomic landscape of metastatic disease without explicitly interrogating specific routes of metastatic dissemination (Priestley et al., 2019; Robinson et al., 2017; Zehir et al., 2017). Other studies have investigated the genomic complexity of cancer metastasis by reconstructing tumor evolution across different organs at varying levels of resolution, but they have been limited by small sample sizes (Brastianos et al., 2015; Brown et al., 2017; Eckert et al., 2016; Hu et al., 2020; Jiménez-Sánchez et al., 2017; Makohon-Moore et al., 2017; Naxerova et al., 2017; Noorani et al., 2020; Reiter et al., 2020; Shih et al., 2020). Identifying associations between genomic features and specific patterns of metastatic spread is an active area of research and several landmark studies on this topic have been published during the past few years (Birkbak and McGranahan, 2020). In particular, richly annotated datasets combining genomic features and detailed clinical history of metastases for individual patients have been recently made available through large collaborative efforts such as METABRIC in breast cancer (Rueda et al., 2019) and TRACERx in clear-cell renal cell carcinoma (Turajlic et al., 2018). However, a study involving thousands of participants across multiple tumor types in which clinical and genomic data has been homogeneously processed through a unified computational pipeline is still lacking.
Progression to metastatic disease remains the main cause of cancer death. Yet, the underlying genomic mechanisms driving metastasis remain largely unknown. Here, we present MSK-MET, an integrated pan-cancer cohort of tumor genomic and clinical outcome data from more than 25,000 patients. We analyzed this dataset to identify associations between tumor genomic alterations and patterns of metastatic dissemination across 50 tumor types. We found that chromosomal instability is strongly correlated with metastatic burden in some tumor types, including prostate adenocarcinoma, lung adenocarcinoma and HR-positive breast ductal carcinoma, but not in others, such as colorectal adenocarcinoma, pancreatic adenocarcinoma and high-grade serous ovarian cancer. We also identified specific somatic alterations associated with increased metastatic burden and specific routes of metastatic spread. Our data offer a unique resource for the investigation of the biological basis for metastatic spread and highlight the crucial role of chromosomal instability in cancer progression.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 2
Figure 1. Overview of the MSK-MET cohort. Metastatic patterns of 50 tumor types. For each tumor type, the following attributes are shown from left to right: tumor type abbreviation, number of patients, distribution of age at sequencing (red vertical line indicates the median), overall survival in years from time of sequencing (red vertical line indicates the median OS), sex ratio (female = gold, male = grey), distribution of metastatic burden across all patients (ranging from 0 to 6 distinct metastatic sites), and a heatmap with the percentage of metastatic patients with metastases at specific metastatic sites (the entire clinical course was taken into consideration). The number in each cell indicates the frequency of patients having at least one reported metastasis at that given site. For each tumor type, the distribution of all metastasis events by 21 organ sites is shown as a stacked barplot to the right of the heatmap. For each metastatic site, the distribution of all 50 tumor types is shown as a stacked barplot below the heatmap. For each metastatic site, the number of patients having at least one metastasis is indicated in parentheses. Frequencies for sex-specific target organs (female genital, ovary and male genital) were calculated using patients of the corresponding sex. See also Table S1 and Figure S1.
We assembled a pan-cancer cohort of >25,000 patients with tumor genomic profiling and clinical information on metastatic events and outcomes, which we designate MSK-MET (Memorial Sloan Kettering - Metastatic Events and Tropisms). All samples were profiled using the MSK-IMPACT targeted sequencing platform (Cheng et al., 2015), which identifies somatic mutations, rearrangements and copy-number alterations in 341-468 cancer genes, as well as tumor mutational burden (TMB), chromosomal instability and microsatellite instability. Metastatic events were extracted from the electronic health records (EHR) and mapped to a reference set of 21 anatomic locations. We analyzed genomic differences between primary and metastatic samples and between primary tumors from metastatic and non-metastatic patients, stratified by tumor type and molecular subtypes. Our analysis identified associations between metastatic burden (defined as the number of distinct organs affected by metastases throughout a patient’s clinical course) and specific genomic features, including mutational burden, chromosomal instability, and somatic alterations in individual cancer genes. We also identified associations between genomic alterations and organ-specific patterns of metastatic dissemination and progression. The clinical
and genomic data used in our study have been made publicly available and constitute a valuable resource that will help further our understanding of metastatic disease. RESULTS Overview of the MSK-MET cohort A total of 25,775 patients were included in the present study, consisting of 15,632 (61%) primary and 10,143 (39%) metastatic specimens spanning 50 different tumor types (Figure S1A-D; Table S1). The median interval between sample acquisition and sequencing was 62 days (interquartile range (IQR) = 0-287 days). The median sequencing coverage was 653x (IQR = 525-790x) and the median tumor purity assessed by pathologists was 40% (IQR = 20-50%) (Figure S1B). The majority of sequenced samples obtained from metastatic sites were from lymph nodes (n=2305, 23%), liver (n=2289, 23%), lung (n=982, 10%), or bone (n=726, 7%). Among primary tumors, 11,741 (75%) were from patients with metastatic disease at the time of sequencing or at a later time (Figure S1D). Over the entire course of the disease, a total of 99,419 metastatic events from 21,546 metastatic patients were retrieved from the EHR and mapped to 21 organ sites. The most
Pan−cancer PanCan 25775Head and Neck Squamous HNSC 412Adenoid Cystic Carcinoma ACYC 152
Thyroid Papillary THPA 307Thyroid Poorly Differentiated THPD 131
Small Cell Lung Cancer SCLC 314Lung Adenocarcinoma LUAD 4064
Lung Squamous Cell Carcinoma LUSC 537Lung Neuroendocrine LNET 109Pleural Mesothelioma PLMESO 244
Breast Ductal HR+HER2− IDC HR+HER2− 1524Breast Ductal HR+HER2+ IDC HR+HER2+ 256Breast Ductal HR−HER2+ IDC HR−HER2+ 115
Breast Ductal Triple Negative IDC TN 341Breast Lobular HR+ ILC HR+HER2− 373
Esophageal Adenocarcinoma ESCA 542Stomach Adenocarcinoma STAD 320
Small Bowel cancer SBS 94Colorectal Hypermutated CRC HM 351
Colorectal MSS CRC MSS 3197Gastrointestinal Neuroendocrine GINET 150
Gastrointestinal Stromal GIST 403Appendiceal Adenocarcinoma APAD 202
Anal Squamous Cell ANSC 87Hepatocellular Carcinoma HCC 205
Gallbladder Cancer GBC 171Cholangio Extrahepatic CHOL extra 119Cholangio Intrahepatic CHOL intra 407
Pancreatic Adenocarcinoma PAAD 1779Pancreatic Neuroendocrine PANET 211
Renal Clear Cell CCRCC 421Upper Tract Urothelial UTUC 197
Bladder Urothelial BLCA 961Prostate Adenocarcinoma PRAD 2172Testicular Non−Seminoma TGCT non−SEM 240
Testicular Seminoma TGCT SEM 97Ovarian High−Grade Serous HGSOC 997Ovarian Low−Grade Serous LGSOC 89
Ovarian Clear Cell CCOV 97Uterine Carcinosarcoma UCEC CSRC 192
Uterine Endometrioid UCEC ENDO 567Uterine Hypermutated UCEC HM 268
Uterine Serous UCEC SEROUS 288Cervical Squamous Cell CESC 103
Melanoma Cutaneous SKCM 864Melanoma Mucosal MUCM 175
Melanoma Uveal UM 103Cutaneous Squamous Cell CSCC 105
Sarcoma Leiomyo SARC LMS 291Sarcoma Lipo SARC LS 228
Sarcoma UPS/MXF SARC UPSMFS 20318 90 0 5
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
748974747857616890796664637366657472898180648963757879667887788289717360899186847682797369734476829178
Unsp
ecifie
d (1
5919
)
2840335150242016122344484654312427201022107105071817131435559414384863826415335345756
3634
19
375381668033504927513037404510311729104779134939192338266733526145720181329323420263351434534713569
1619429202843371636191414281410107272791045565023653133212210147514119798161231
46681622101591736953421010214102100431196224110163334358
411738112444222042165246364734494257216772654753357656766483303021152516584556402714433031428810482116
12796111466355714826222959217366731391329522831564337281353173162017021299
2966179861014117710293470582641315288242458463542222621179343592907565413671301920382484713
15210212212213214423371419623652152221327494106467266413222223124122142334810
1411211322423416194202538291611522512195114112165716550466476322012121235
33476130595048373719676049547028251261718109212912929141544282965168101715151713171728312724331634
9112410151115984151441117833351211610620165517402145422711121781046
8565112215107555494131033602311744682317445763954263135949244
4411325351122351222020221310331172071810756231467410414407
811010110111228186472621100301101403618573836312017162026241017174
1900122220245211231913919314613101231606721
74635061545456564160028104
900000112042121533423417376106125252125
2114152516923821301061
910101102140
0
229380563900011011121282215
2700894
61313383324118182132213351342770333205331311091026242327103210312
22215191642331110213100100100000002011101100011128112211
101000111081011109011002000100100111000334111102212101
Dist
ant L
N (6
049)
Lung
(792
9)Pl
eura
(352
5)
Med
iast
inum
(895
)
Live
r (87
93)
Bilia
ry tr
act (
2668
)
Intra
−Abd
. (61
59)
Bowe
l (32
12)
CNS/
Brai
n (2
921)
Bone
(719
7)PN
S (1
870)
Adre
nal G
land
(180
8)
Kidn
ey (8
74)
Blad
der/U
T (1
824)
Fem
ale
Gen
ital (
2076
)
Ova
ry (9
82)
Mal
e G
enita
l (90
8)
Skin
(132
7)He
ad a
nd N
eck
(390
)
Brea
st (3
08)
FemaleMale
Sex Met. burden
0 1 32 4 5 6+ 0 50 100%25 75
Frequency
SexAge OSMet. burdenTumor type Abbreviation N
Head and Neck
Endocrine
Thoracic
Breast
Core GI
Develop-mental GI
Genitourinary
Gynecologic
Skin
Soft Tissue
y.y.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 3
common target organ sites were lung, liver, bone, or unspecified (Figure S1E). The frequencies of organ-specific metastasis of individual tumor types were similar to previous reports (Budczies et al., 2015; Gao et al., 2019) (Figure S1F). Internal validation using 4,859 (22.5%) patients included in previous studies with available metastatic events extracted through manual chart review (Abida et al., 2017; Jones et al., 2021; Razavi et al., 2018; Shoushtari et al., 2021; Yaeger et al., 2018) revealed a high concordance and sensitivity with metastatic events extracted from the EHR (Figure S1G-H). We used this data to map patterns of metastatic dissemination from 50 tumor types to 21 metastatic organ sites (Figure 1). For the whole cohort, the median age at sequencing was 64y, ranging from a median of 33y for patients with testicular non-seminoma to a median of 70y for patients with cutaneous squamous cell carcinoma. Overall, the median follow-up time was 30 months and the five-year survival rate was 40%, ranging from 90% in testicular seminoma to 10% in pancreatic adenocarcinoma. There was a median of four metastatic events per patient, ranging from one in hypermutated colorectal cancer to eight in high-grade serous ovarian carcinoma. Metastatic patterns differed by tumor types and histological subtypes. For example, compared to lung adenocarcinoma, lung neuroendocrine cancer had a higher prevalence of liver metastasis but a lower prevalence of CNS/Brain metastasis. Similarly, lobular breast cancer had a lower prevalence of lung metastasis but a higher prevalence of bone, ovary and peritoneum metastasis, compared to ductal breast cancer as reported before (Borst and Ingold, 1993). Differences in metastatic patterns were also observed across molecular subtypes of the same tumor type. For example, and in line with a previous study (Kennecke et al., 2010), HR-/HER2+ ductal breast cancer had a higher prevalence of CNS/Brain metastasis, while the HR+/HER2- subtype had a higher prevalence of bone metastasis. Genomic differences between primary and metastatic tumors To determine sample type specific genomic differences across 50 tumor types, we compared the genomic features of primary (n=15,632) and metastatic tumors (n=10,143) (independent of the metastatic status of patients). The number of sequenced primaries was higher than the number of sequenced metastases for most tumor types, with some exceptions, such as cutaneous melanoma, high-grade serous ovarian cancer and adenoid cystic carcinoma. In 16 tumor types, metastases were significantly more chromosomally unstable, as inferred by a higher fraction of genome altered (FGA), compared to primary tumors, consistent with previous findings (Bakhoum et al., 2018; Ben-David and Amon, 2020; Hieronymus et al., 2018; Shukla et al., 2020; Stopsack et al., 2019; Watkins et al., 2020) (Figure 2A-B; Table S2). The difference in tumor purity- and ploidy-adjusted FGA (adjusted FGA) was confirmed in 11 tumor types using a subset of samples with available FACETS data (n=17,224) (Table S2). FACETS allowed us to estimate the frequency of whole-genome duplication (WGD) and assess the clonality of individual variants. As previously reported, WGD frequencies varied across tumor types (Bielski et al., 2018). In eight tumor types, we observed a significantly higher frequency of WGD in metastases compared to primary tumors (Figure 2A-B; Table S2). The higher chromosomal instability and higher frequency of WGD were particularly marked in uterine endometrioid, which can be explained by differences in the distribution of genomic subtypes within these two groups (Cancer Genome Atlas Research Network et al., 2013). Tumor mutational burden (TMB) was significantly higher in metastases from 10 tumor types, while TMB was lower only in metastases from hypermutated uterine cancer (Figure 2A-B; Table S2). Consistent with the evolutionary bottleneck
hypothesis (Birkbak and McGranahan, 2020), metastases from 12 tumor types were significantly more homogeneous, with a higher fraction of clonal mutations compared to primary tumors. We further explored the clinical significance of TMB by comparing the percentage of patients with a high TMB ( 10 mut/Mb) and observed a higher percentage of TMB-high tumors in metastases from lung adenocarcinoma, HR+/HER2- ductal, lobular breast and prostate cancer patients. In five tumor types, we detected a significantly higher proportion of any actionable mutations (OncoKB levels 1 to 3, Methods) in metastases compared to primary tumors, but these differences were not significant after adjusting for differences in FGA and TMB (Figure 2B; Table S2). Next, we investigated differences in the frequency of arm-level copy number alterations between primary tumors and metastases. Because FGA was generally higher in metastases, we used a multivariable model to adjust for FGA and found 26 statistically significant differences (Figure 2B; Table S2). For example, in pancreatic adenocarcinoma, gain of chromosome 12p gain, where the oncogene KRAS is located, was more frequent in metastases than in primary tumors (17% vs. 4%, q-value = 0.002). In HR+/HER2- ductal and lobular breast cancer, loss of chromosome 16q, a feature of low-grade breast cancer (Natrajan et al., 2009), was more frequent in primary tumors than in metastases (41% vs. 30%, q-value = 1.56E-07 and 68% vs. 56%, q-value = 0.002, respectively). Finally, we investigated the frequency of recurrent oncogenic alterations between primary tumors and metastases and identified a total of 67 statistically significant differences across 17 tumor types. We also investigated the frequency of oncogenic pathways and identified 47 statistically significant differences across 11 tumor types (Figure 2C; Table S2). Amongst the statistically significant alterations, 53 were more frequent in metastases, while only 14 alterations were more frequent in primary tumors. TP53 mutations were the most commonly observed significant alterations and were more frequent in metastases in 7 tumor types (lung adenocarcinoma, prostate adenocarcinoma, HR+/HER2- ductal breast, colorectal MSS, lobular breast cancer, pancreatic neuroendocrine and uterine endometrioid). A possible explanation is that TP53 mutation is a later event in some of these tumor types; in others, it may simply be a hallmark of more aggressive disease. The notable exception was head and neck cancer, where TP53 mutations were more frequent in primary tumors. Other genomic alterations that were most often enriched in metastases included CDKN2A deletion (significant in 5 tumor types), PTEN mutations and deletion (4 tumor types) and MYC amplification (4 tumor types). The most common significantly enriched oncogenic pathways in metastases were p53, Cell Cycle and DNA damage repair. The most significant differences were observed for alterations known to be associated with resistance to hormonal therapy in hormone-sensitive tumors. For example, AR amplification and AR mutations were significantly more frequent in prostate cancer metastases (1% vs. 30% and 0% vs. 6%, q-value < 0.05), and ESR1 mutations were more frequent in HR+/HER2- ductal breast cancer (2% vs. 19%, q-value < 0.05), lobular breast cancer (2% vs. 13%, q-value < 0.05), and endometrioid uterine cancer metastases (3% vs. 10%, q-value < 0.05). These differences can likely be attributed to positive selection due to therapy since most patients with prostate cancer and ER+ breast cancer receive hormone therapy. TERT mutations were more frequent in metastases from papillary thyroid cancer and cutaneous melanoma patients (46% vs. 69% and 70% vs. 81%, q-value < 0.05), but higher in primary tumors from head and neck squamous cell carcinoma patients (41% vs. 25%, q-value < 0.05). ALK fusions, a predictive biomarker for the use of ALK inhibitors, were slightly more frequent in lung adenocarcinoma metastases (3% vs. 6%, q-value < 0.05). KRAS mutations were
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 4
Figure 2. Genomic differences between primary tumors and metastases. (A) Comparisons of the median fraction genome altered (FGA), median whole-genome duplication (WGD) frequency, median tumor mutation burden (TMB), and median clonal fraction for each tumor type in metastatic vs. primary tumors. Tumor types with statistically significant differences are labeled. For TMB both axes were limited to 10mut/Mb. (B) The following clinical and genomic features are shown side-by-side for primary (top row within each cancer type) and metastatic (bottom row) sequenced samples using a combination of barplots and violin plots; from left to right: sample counts, FGA, fraction of samples with WGD, TMB, clonality, fraction of samples with high TMB, and distribution of the highest actionable alteration levels. The black vertical line in each violin plots represents the median. The heatmap shows the frequency of individual arm level alterations in primary tumors and metastases (only the frequency of the more frequent event, gain or loss, is shown). Tumor types are ordered from top to bottom by decreasing FGA in metastasis and grouped by organ systems. * indicates q-value < 0.05. WGD and clonality were available for a subset of 17,224 samples with FACETS data. (C) Statistically significant differences in the frequency of oncogenic alterations and pathways between primary tumors and metastases in individual tumor types. Triangles summarize oncogenic alteration frequencies in primary tumors vs. metastases and are colored according to alteration type. Gene names in italics refer to specific genes, those in regular font refer to pathways. See also Table S2.
9
467
71
319
3810
192832
2863
4162
621
11
16
617
13
1374
378
161
7
58
202527
210
111
201
24
8
529
80234446
1117
77
677
21232630
26
699
111824
42
45
891011
192125
111223
57
21
4468810
2223
51333455568
1036
41
20
693
81
1032
5324
38
6
4247
2545
1750
66
38
1332
611
2448
142123
3610
20
1012
2839
326
20
73234
61311
20
925
847
58668
813
81
111013
30373843
192
1215
619
2832
36
715
2622182328
4245
86
30766
118
2236
789101213
3532
676587911
88
177
2957
0 1 5 10 20 50 100%
CDKN2A
TERT
GRIN2ATERT
PTENSETD2
Cell CycleCDKN2A
RASA1POLE
ATM
PI3Kp53
TERTTP53
MYCCell Cycle
PTENTERT
CDKN2A
MYCp53
ERBB2ESR1TP53
PTEN
DDRRTK/RAS
p53Cell Cycle
KRASTP53
DDRPI3K
EpigeneticCell Cycle
TGF-βSMAD4
CDKN2A
TGF-βp53
RTK/RASMDM4ESR1
ERBB2TP53
MYCPI3Kp53
NKX3-1SMAD4
MYCDNMT3B
SRCBCL2L1FBXW7PIK3CA
TP53
DDRNOTCH
MYCEpigeneticRTK/RASCell Cycle
p53ESR1CBFBMYCPAK1
MAP3K1FGFR1CCND1
TP53PIK3CA
NOTCHMYC
Cell CycleWNT
RTK/RASDDR
EpigeneticPI3Kp53AR
NKX3-1AR
RB1CDKN1BCTNNB1
MYCAPC
PTENTP53
TGF-βNOTCH
WNTNRF2MYCDDR
Cell CycleEpigenetic
p53ALK
SMARCA4FOXA1
MYCEGFR
NKX2-1TERT
KEAP1CDKN2A
RBM10KRASTP53
Esophageal AdenocarcinomaThyroid Papillary
Melanoma Cutaneous
Renal Clear Cell
Lung Squamous Cell Carcinoma
Uterine Hypermutated
Head and Neck Squamous
Gastrointestinal Stromal
Uterine Endometrioid
Pancreatic Neuroendocrine
Pancreatic Adenocarcinoma
Breast Lobular HR+
Colorectal MSS
Breast Ductal HR+HER2−
Prostate Adenocarcinoma
Lung Adenocarcinoma2
0 1 5 10 20 50 100%2MutationAmplificationDeletionFusion
Higher in met.
Lower in met.Pathway
Alteration frequency
Enriched inprimary
Enriched inmetastasis
q-value < 0.05ns.
THPA
HNSC
SCLC
ILC HR+HER2-
IDC HR+HER2+LUSC
UCEC HM
ESCA
SKCMGIST
CRC MSSIDC HR+HER2-
UCEC ENDO
PAAD
LUADPRAD
0 10 20 30 40 50
0
10
20
30
40
50
CCRCCSKCM
UCEC ENDO
IDC HR+HER2-
PAAD
CRC MSSLUAD
PRAD
0 20 40 60 80
0
20
40
60
80
CCRCC
THPA
TGCT non-SEM
IDC HR+HER2+
GIST
PAAD
ILC HR+HER2-
LUAD
IDC HR+HER2-PRAD
0 2 4 6 8 10
0
2
4
6
8
10IDC TN
ESCA
LUSC
UCEC HM
SKCM
SCLC
CRC MSS
IDC HR+HER2- PAADPRAD
UCEC ENDO
LUAD
50 60 70 80 90 100
50
60
70
80
90
100
Met
asta
sis
med
ian
%
Met
asta
sis
med
ian
%
Met
asta
sis
med
ian
mut
/Mb
Met
asta
sis
med
ian
%
Primary median % Primary median % Primary median mut/Mb Primary median %
Fraction of genome altered Whole genome duplication Tumor mutational burden Clonal fractionA
B
C
179128597210250215197146275198763
4515236611181228601312
11488403111102103658717935521107
2090133270122420881407412916113051546934683
1815711886854574963117081071
3716811299163210583941
506510914714319862206125442
3661573237669319564735063129332111585
24791443935851162152
Thyroid Papillary (307)Thyroid Poorly Differentiated (131)
Adenoid Cystic Carcinoma (152)Head and Neck Squamous (412)
Renal Clear Cell (421)Bladder Urothelial (961)
Upper Tract Urothelial (197)Testicular Seminoma (97)
Testicular Non−Seminoma (240)Prostate Adenocarcinoma (2172)
Appendiceal Adenocarcinoma (202)Colorectal Hypermutated (351)Stomach Adenocarinoma (320)
Small Bowel cancer (94)Gastrointestinal Neuroendocrine (150)
Anal Squamous Cell (87)Colorectal MSS (3197)
Gastrointestinal Stromal (403)Esophageal Adenocarinoma (542)
Sarcoma Lipo (228)Sarcoma UPS/MXF (203)
Sarcoma Leiomyo (291)Cutaneous Squamous Cell (105)
Melanoma Uveal (103)Melanoma Cutaneous (864)
Melanoma Mucosal (175)Gallbladder Cancer (171)
Cholangio Extrahepatic (119)Cholangio Intrahepatic (407)
Pancreatic Adenocarcinoma (1779)Hepatocellular Carcinoma (205)Pancreatic Neuroendocrine (211)
Breast Lobular HR+ (373)Breast Ductal HR+HER2− (1524)
Breast Ductal HR−HER2+ (115)Breast Ductal HR+HER2+ (256)
Breast Ductal Triple Negative (341)Uterine Hypermutated (268)Uterine Endometrioid (567)
Ovarian Clear Cell (97)Ovarian Low−Grade Serous (89)
Cervical Squamous Cell (103)Uterine Serous (288)
Ovarian High−Grade Serous (997)Uterine Carcinosarcoma (192)
Pleural Mesothelioma (244)Lung Adenocarcinoma (4064)
Lung Squamous Cell Carcinoma (537)Lung Neuroendocrine (109)
Small Cell Lung Cancer (314)
*
*
*
***
*
*
**
*
**
**
*
*
*
*
*
*
*
*
*
*
*
**
*
*
**
*
*
*
*
*
*
*
*
*
***
**
*
*
**
*
*
*
*
*
* *
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
* *
**
*
*
*
*
% Fraction mut/Mb % Fraction Fraction#
L1 L2 L3A
L3B
L4 Onc
VUS
None
Actionabl.
50 40 30 20 10 10 20 30 40 50
%loss %gainq-value < 0.05*
0 2479 0 100 0 1 0 10 100 0 100 0 1 0 1 p q p q p q p q p q p q p q p q p q p q p q p q q q q p q p q p q p q p q q q
No. patients FGA WGD TMB Clonality TMBhiActionability Arm−Level Copy−Number Alterations0 2479 0 100 0 1 0 10 100 0 100 0 1 0 1 1 2 3 4 5 6 7 8 9 10 11 12 13 1516 17 18 19 20 2214 21
Met.Prim.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 5
more frequent in metastases from pancreatic neuroendocrine patients (1% vs. 10%, q-value < 0.05) as was the overall frequency of RTK/RAS pathway alteration in this tumor type (6% vs. 21%, q-value < 0.05). While KRAS mutations are a hallmark of pancreatic adenocarcinoma, this could suggest the existence of a transdifferentiation mechanism from neuroendocrine to an adenocarcinoma phenotype during metastatic progression. Collectively, these data indicate that metastases have higher chromosomal instability across many tumor types and that mutations in a multitude of driver alterations occur at different frequencies in primary and metastatic tumors. Genomic differences between primary samples from metastatic and non-metastatic patients Many of the primary tumors included in the previous analysis were from patients with metastatic disease. To identify genomic determinants of metastatic disease present in primary tumors we compared the genomic features of primary tumors from metastatic patients (n=11,993) to primary tumors from non-metastatic patients (n=3,669). The median follow-up time for these two groups was 33 months and 27 months, respectively. In 10 tumor types, FGA was significantly higher in primary tumors from metastatic patients as compared to primary tumors from patients without metastases. Compared to non-metastatic patients, TMB was significantly higher in six tumor types but lower in head and neck squamous cell carcinoma (Figure S2A-B; Table S3). When interrogating the frequencies of recurrent oncogenic alterations, we identified statistically significant frequency differences in 32 genes across 12 tumor types and 21 oncogenic pathways across 9 tumor types (Figure S2C; Table S3), with most observed at higher frequencies in primary tumors from metastatic patients. Compared to non-metastatic patients, TP53 mutations were significantly more frequent in metastatic patients with lung adenocarcinoma, HR+/HER2- ductal breast cancer, lobular breast cancer, urothelial bladder cancer, prostate adenocarcinoma, and endometrioid uterine cancer. TERT promoter mutations were more frequent in metastatic patients with papillary thyroid cancer. The frequency of MYC amplification was significantly higher in metastatic patients with prostate adenocarcinoma, microsatellite stable (MSS) colorectal cancer, and TN ductal breast cancer. On the other hand, SPOP mutations were less frequent in primary tumors from metastatic prostate adenocarcinoma patients, PIK3CA mutations were less frequent in the primary tumors of HR+/HER2- ductal breast cancer metastatic patients, and CDKN2A mutations were less frequent in the primary tumors of pancreatic cancer metastatic patients. These findings support the hypothesis that a higher chromosomal instability is associated with metastatic progression in multiple tumor types and that several individual driver mutations might inform metastatic risk. Only a few of these, such as SPOP mutations in prostate cancer, which has been previously reported to be more frequent in primary tumors (Armenia et al., 2018), are associated with decreased metastatic potential. Genomic features associated with metastatic burden To explore the genomic determinants of metastatic burden, we analyzed the relationship between genomic alterations and the number of metastatic sites per patient (n=21,546). Not surprisingly, a higher metastatic burden was significantly associated with shorter overall survival in most (39/50, 78%) tumor types (Table S4). We observed that chromosomal instability, as inferred by FGA, was positively correlated with metastatic burden on a pan-cancer level and in 11 individual tumor types. TMB, on the other hand, was not associated with metastatic burden on a pan-cancer level; it was positively correlated with metastatic burden in four tumor types, and negatively associated with metastatic burden in endometrioid and hypermutated uterine cancer (Figure 3A-B;
Table S5). One of the strongest correlations between FGA and metastatic burden was observed in prostate cancer (rho = 0.33, q-value = 7.0E-45), which is in line with previous studies (Hieronymus et al., 2018; Taylor et al., 2010). Conversely, we did not observe such association in many tumor types, including MSS colorectal cancer, where chromosomal instability is already high in patients with low metastatic burden (Figure 3B). Next, we investigated the association between recurrent oncogenic alterations and metastatic burden and identified a total of 24 statistically significant associations across 8 tumor types. We also investigated the association with oncogenic pathways and identified 16 statistically significant differences across 7 tumor types (Figure 3C; Table S5). Consistent with its role as a gatekeeper against chromosomal instability (Bieging et al., 2014), we observed a significant positive correlation between TP53 mutations and metastatic burden in prostate adenocarcinoma, lung adenocarcinoma and HR+/HER2- ductal breast cancer. There was also a significant positive correlation between p53 pathway alterations and metastatic burden in endometrioid uterine cancer. In metastatic prostate adenocarcinoma, AR amplification frequency was positively associated with metastatic burden. The frequency of ESR1 mutations increased with metastatic burden in HR+/HER2- ductal and lobular breast cancer. CDKN2A deletion frequency was positively correlated with metastatic burden in bladder urothelial cancer, lung adenocarcinoma and papillary thyroid cancer, while MYC amplification frequency was associated with increasing metastatic burden in lung adenocarcinoma and prostate adenocarcinoma. Of note, the frequency of four oncogenic alterations and one oncogenic pathway were negatively correlated with metastatic burden; FOXA1 in prostate adenocarcinoma, CBFB in HR+/HER2- ductal breast cancer, CDH1 in lobular breast cancer, ERCC2 in urothelial bladder cancer and the epigenetic pathway in MSS colorectal cancer (Figure 3C; Table S5). These results demonstrate that the relationship between higher chromosomal instability and increasing metastatic burden is tumor lineage dependent and that several driver mutations are associated with metastatic burden in both directions. Genomic differences of metastases according to their organ location Next, we investigated the genomic characteristics of metastases (n=10,143) according to their organ location. As expected, the location of the sequenced metastases differed by tumor type (Figure S3A). We found 17 significant associations between FGA and the metastatic site in six tumor types, 10 of which were also significant when using adjusted FGA (Figure S3B; Table S6). CNS/Brain metastases from patients with lung adenocarcinoma, MSS colorectal cancer and cutaneous melanoma had a significantly higher FGA, while lymph node metastases from patients with lung adenocarcinoma, pancreatic adenocarcinoma, bladder urothelial and cutaneous melanoma had a significantly lower FGA. There were seven significant associations between TMB and the metastatic site. A total of 31 genomic alterations in nine tumor types were significantly associated with specific metastatic sites and 25 oncogenic pathways across six tumor types (Figure S3C; Table S6). TP53 mutations were significantly more frequent in CNS/Brain metastasis from lung adenocarcinoma and liver metastasis from pancreatic adenocarcinoma, but less frequent in liver metastasis from urothelial bladder cancer and neuroendocrine lung cancer, as well as in intra-abdominal metastasis from pancreatic adenocarcinoma. In HR+/HER2- ductal breast cancer ESR1 mutations were significantly more frequent in liver metastasis. In lobular breast cancer, RHOA mutations were significantly more frequent in ovarian metastasis and FOXA1 mutations were enriched in liver metastasis. In lung adenocarcinoma, CDKN2A deletion was more frequent in skin and
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 6
Figure 3. Genomic features associated with metastatic burden. (A) Spearman’s correlation coefficient between FGA (circle) and TMB (diamond) with metastatic burden. Associations without a significant trend are shown in grey, and the lines indicate 95% CI. (B) Correlation between FGA and TMB with metastatic burden in the entire data set, prostate adenocarcinoma, hypermutated uterine cancer, and MSS colorectal cancer. Boxplots display median point, IQR boxes and 1.5 IQR whiskers for all samples. Split violin plots show the distribution of FGA and TMB in primary tumors (left, not filled) and metastases (right, filled). (C) Statistically significant oncogenic alterations and pathways associated with metastatic burden in individual tumor types. Spearman’s correlation coefficient is shown for each event, and the lines indicate 95% CI. Gene names in italics refer to specific genes, those in regular font refer to pathways. See also Table S4 and S5.
liver metastases but less frequent in lymph nodes. Similarly, in urothelial bladder cancer, CDKN2A deletion was more frequent in lung metastases but less frequent in lymph nodes. PTEN mutations, as well as PI3K pathway alterations, were higher in brain metastases from melanoma, which is in line with a previous melanoma-specific study (Bucheit et al., 2014). Among others, we found that ERG fusions were less frequent in bone metastasis of prostate cancer patients, NF1 mutations were more frequent in lung metastasis of melanoma patients and that FGFR3 mutations were more frequent in lung metastasis of bladder urothelial patients. Taken together, our results show that metastases from different organs can have different genomic makeup.
Genomic features associated with metastasis to specific target organs We analyzed the relationship between genomic features of metastatic patients and their organ-specific patterns of metastasis (n=21,546). We found 13 significant associations between FGA and organotropisms in 11 tumor types, seven of which were also significant when using adjusted FGA (Table S7). We observed a significant positive association between FGA and patients with liver metastasis in four tumor types (HR+/HER2- ductal breast, prostate adenocarcinoma, pancreatic adenocarcinoma and head and neck squamous), patients with lung metastasis in two tumor types (endometrioid uterine and cutaneous melanoma), and bonemetastasis in two tumor types (HR+/HER2- ductal breast and
ARTP53
PTENMYC
ARPTEN
FOXA1p53
PI3KCell Cycle
DDRMYC
EGFRTP53
CDKN2AMYCp53
Cell Cycle
ESR1TP53CBFB
Cell CycleEpigenetic
PI3K
KRASSMAD4
RTK/RASTGF-β
Epigenetic
CDKN2ATERT
RBM10Cell Cycle
CCND1/FGF19Cell Cycle
ESR1CDH1
CDKN2AERCC2
p53
-0.3 -0.1 0 0.1 0.2 0.3
Uterine Endometrioid
Bladder Urothelial
Breast Lobular HR+
Breast Ductal HR+HER2+
Thyroid Papillary
Colorectal MSS
Breast Ductal HR+HER2−
Lung Adenocarcinoma
Prostate Adenocarcinoma-0.2
0
20
40
60
80
100
0125
102050
100200500
0
20
40
60
80
100
0125
102050
100200500
0
20
40
60
80
100
0125
102050
100200500
0
20
40
60
80
100
1 2 3 4 5 6+0125
102050
100200500
1 2 3 4 5 6+
-0.6 -0.4 -0.2 0 0.2 0.4 0.6Spearman’s correlation
Gallbladder Cancer (153)Hepatocellular Carcinoma (150)
Cholangio Extrahepatic (112)Pancreatic Adenocarcinoma (1669)
Cholangio Intrahepatic (356)Pancreatic Neuroendocrine (189)
Sarcoma Leiomyo (258)Sarcoma UPS/MXF (169)
Sarcoma Lipo (196)Small Cell Lung Cancer (272)
Lung Neuroendocrine (100)Pleural Mesothelioma (195)
Lung Squamous Cell Carcinoma (435)Lung Adenocarcinoma (3349)
Cutaneous Squamous Cell (50)Melanoma Mucosal (134)
Melanoma Cutaneous (716)Melanoma Uveal (77)
Breast Ductal HR−HER2+ (94)Breast Ductal Triple Negative (259)
Breast Lobular HR+ (285)Breast Ductal HR+HER2− (1151)
Breast Ductal HR+HER2+ (207)Head and Neck Squamous (378)Adenoid Cystic Carcinoma (140)
Thyroid Poorly Differentiated (123)Thyroid Papillary (261)
Small Bowel cancer (86)Anal Squamous Cell (70)
Appendiceal Adenocarcinoma (189)Colorectal MSS (2938)
Stomach Adenocarcinoma (292)Colorectal Hypermutated (285)
Gastrointestinal Neuroendocrine (131)Esophageal Adenocarcinoma (455)
Gastrointestinal Stromal (229)Uterine Carcinosarcoma (174)
Ovarian High−Grade Serous (949)Ovarian Clear Cell (80)
Uterine Serous (243)Ovarian Low−Grade Serous (86)
Cervical Squamous Cell (88)Uterine Hypermutated (171)Uterine Endometrioid (307)
Bladder Urothelial (714)Renal Clear Cell (352)
Testicular Non−Seminoma (220)Upper Tract Urothelial (159)
Prostate Adenocarcinoma (1762)Testicular Seminoma (88)
PanCan (21546)
Higher withincreasing met. burden
Lower withincreasing met. burden
FGATMBq-value < 0.05
PanCan
Prostate Adenocarcinoma
Uterine Hypermutated
Colorectal MSS
Metastatic burden Metastatic burden
Prim. Met.
FGA (%) TMB (mut/Mb)
Spearman’s correlation
Higher withincreasing met. burden
Lower withincreasing met. burden
MutationAmplificationDeletionPathway
B
C
A
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 7
Figure 4. Genomic features associated with metastasis to specific target organs. (A) Statistically significant oncogenic alterations and pathways associated with organ-specific patterns of metastatic spread. Gene names in italics refer to specific genes, those in regular font refer to pathways. (B) Schematic drawing summarizing the main findings from (A). See also Table S7.
prostate adenocarcinoma). For TMB, we found eight significant associations between TMB and organ-specific patterns of metastatic in six tumor types, four positives (lung adenocarcinoma to brain and adrenal gland, pancreatic adenocarcinoma to liver, head and neck squamous to head and neck) and four negatives (prostate adenocarcinoma to bone, cutaneous melanoma to intra-abdominal, lung adenocarcinoma to pleura and lung neuroendocrine to liver). We found 57 significant recurrent oncogenic alterations associated with specific patterns of metastasis in 10 tumor types. When interrogating oncogenic pathway alterations, we found 48 significant associations in 12 tumor types (Figure 4A; Table S7). Lung adenocarcinoma, MSS colorectal cancer, and prostate cancer were associated with the highest number of significant associations. These results are summarized in Figure 4B. For example, lung adenocarcinoma patients with CNS/Brain metastasis had a higher frequency of TP53 mutations, TERT amplification, and EGFR mutations, but a lower frequency of RBM10 mutations. MSS colorectal cancer patients with lung metastasis had a higher frequency of KRAS mutations which was previously reported (Cejas et al., 2009; Pereira et al., 2015; Tie et al., 2011) but a lower frequency of SRC amplification. Prostate cancer patients with bone metastasis had a higher frequency of AR amplification and PTEN deletion but a lower frequency of ERG fusion; those with liver metastasis had a higher frequency of PTEN loss, RB1 loss and APC mutations; those with brain metastasis had a higher frequency of AR amplification and NOTCH pathway alterations; and those with lung metastasis had a higher frequency of APC mutations and CTNNB1 mutations. Experimental work has revealed the role of WNT pathway activation in driving prostate cancer metastasis (Leibold et al., 2020) and discovered a vulnerability to tankyrase inhibition in WNT altered prostate cancer. When interrogating the association between oncogenic pathways and organotropisms, we found that 26% of prostate cancer patients with lung metastasis had WNT pathway
alterations, compared to 13% of patients without lung metastasis (Figure 4A). As previously reported (Gerratana et al., 2020), ESR1 mutations were more frequent in HR+/HER2- ductal breast cancer patients with liver metastasis (16% vs. 5%). CBFB mutations were less frequent in HR+/HER2- ductal breast cancer patients with bone metastasis, which was demonstrated in a mouse model (Ran et al., 2020) while alterations in the PI3K pathway were more frequent in patients with bone metastasis. HR+/HER2- ductal breast cancer patients with brain metastasis had a lower frequency of MAP3K1 mutations, which were recently shown to be a surrogate for the less aggressive luminal A breast cancer subtype (Nixon et al., 2019). In line with a previous study (Bucheit et al., 2014), PTEN mutations were more frequent in cutaneous melanoma patients with brain metastases, while TP53 mutations were less frequent in those patients. Thyroid papillary cancer patients with bone metastasis had a higher frequency of BRAF mutations, and esophageal cancer patients with lung metastasis had a higher frequency of ERBB2 amplification. In sum, while we did not observe gene or pathway alterations associated with specific routes of metastatic spread shared across different tumor types, our analysis revealed specific genomic alterations linked to specific organotropisms in individual tumor types. DISCUSSION We present MSK-MET, a unique, curated cohort of cancer patients with available genomic sequencing data and clinical information on metastatic disease and cancer outcome. Our study expands a previous pan-cancer dataset (Zehir et al., 2017) by including a larger number of patients with longer follow-up and by including a comprehensive description of metastatic events at the patient level. We demonstrate that mining of electronic health records can be used to extract relevant clinical information, and we present a pan-cancer map of metastasis in a contemporary cohort of patients treated at a single tertiary referral center.
↑KRAS mut ↑ESR1 mut
↓CBFB mut↑PI3K
↑AR amp↑PTEN del↓ERG fusion
↑PTEN del↑RB1 del
↑APC mut
↑APC mut↑CTNNB1 mut
↑WNT
↓MAP3K1 mut
↑AR amp↑NOTCH
↓BRAF mut
↑ERBB2amp
↓TP53 mut↑PTEN mut↑TP53 mut
↑TERT amp↑EGFR mut
↓RBM10 mut
583
636768
1719
8377
8071
542394444
43
97
80
2526272726
96
119
565659
312526
54
115
28242426
356
94
1713
54749
976
757886
2628
7983
7279
950525361
76
143
74
333539
3235
1511
713
666870
223136
36
209
22313336
2710
6810
218
5861
MYCp53
RTK/RAS
TGF-β
WNT
APCBRAF
KRAS
MYCSMAD4
SRCTP53
Cell Cycle
EpigeneticMYC
NOTCHNRF2
p53
PI3K
ALKCDKN2A
EGFR
KRASMDM2
RBM10SMARCA4
STK11TERTTP53
Intra−AbdBowelLungBone
CNS/BrainLiver
Intra−AbdIntra−Abd
LiverIntra−Abd
LiverIntra−AbdIntra−Abd
LungBone
CNS/BrainIntra−AbdIntra−Abd
LiverLungBowel
BoneCNS/Brain
PNSDistant LN
Adrenal GlandAdrenal Gland
Intra−AbdPleura
Adrenal GlandBone
CNS/BrainAdrenal Gland
PleuraCNS/Brain
Adrenal GlandLungLiverPNSLung
Adrenal GlandBone
CNS/BrainPNS
PleuraPNS
CNS/BrainAdrenal Gland
PleuraAdrenal Gland
CNS/BrainCNS/Brain
Adrenal Gland
Colorectal MSS
Lung Adenocarcinoma
10
27
71
37
5316
976
347
28
106463
2317
367
62
95
66
8344450
65
98
3526
151621
530
1313
55
175
1413
329
2912
3
53
61
31
73
7237
3552
2114
17
24240
3426
682
125
172
80
13465663
116
312
5136
313230
1249
1926
1112
1121
3435
824
519
3010
RB1
HIPPO
PI3K
Epigenetic
RTK/RASERBB2
EpigeneticBRAF
p53PTENTP53
MYCp53
TP53
Cell CycleEpigenetic
MYCp53
PI3KAKT2
CDKN2AMYCTP53
MYCp53
PI3KCBFBESR1
MAP3K1MYC
PIK3CATP53
Cell CycleEpigenetic
NOTCHPI3K
RTK/RASWNTAPC
AR
CTNNB1ERG
NKX3-1PTEN
RB1
Bowel
Bone
Liver
Pleura
LungLung
PNSBone
CNS/BrainCNS/BrainCNS/Brain
LungDistant LNDistant LN
LiverLiverLiverLiverLiverLiverLiver
Intra−AbdLiver
LiverDistant LN
BonePNS
BoneLiver
CNS/BrainDistant LN
PNSDistant LN
LiverBowel
PNSCNS/Brain
LiverLiverLungLiverLung
Male GenitalBone
CNS/BrainIntra−Abd
LungBonePNS
BoneLiverLiver
Sarcoma UPS/MXF
Pleural Mesothelioma
Lung Neuroendocrine
Adenoid Cystic Carcinoma
Esophageal Adenocarinoma
Thyroid Papillary
Melanoma Cutaneous
Head and Neck Squamous
Pancreatic Adenocarcinoma
Breast Ductal HR+HER2−
Prostate Adenocarcinoma0 1 2 5 10 20 50 100%
Alteration frequency0 1 2 5 10 20 50 100%
Alteration frequencyTargetOrganAlteration
TargetOrganAlteration
MutationAmplificationDeletionFusionPathwayHigher in pts w/ TO
Lower in pts w/ TO
A B
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 8
Our analysis of genomic alterations from unpaired primary and metastatic samples revealed that metastases generally had a higher level of chromosomal instability, along with a higher frequency of WGD and TP53 mutations. These results are consistent with previous studies that have shown an association between chromosomal instability and cancer progression (Bakhoum et al., 2018; Ben-David and Amon, 2020; Hieronymus et al., 2018; Shukla et al., 2020; Stopsack et al., 2019; Watkins et al., 2020). Our results also suggest that metastases generally have a higher fraction of clonal mutations. This lower intra-tumor heterogeneity could be attributed to clonal selection and selective pressure from cancer therapy (Birkbak and McGranahan, 2020). We also identified several genomic alterations and signaling pathways enriched in metastatic samples. As described before (Hu et al., 2020; Pareja et al., 2020; Razavi et al., 2018), the most significant enrichments were associated with known drug resistance mechanisms (e.g., AR alterations in prostate cancer, and ESR1 mutations in breast cancer). We also compared primary tumor samples from metastatic and non-metastatic patients. In several tumor types, we observed a higher chromosomal instability and a higher frequency of TP53 mutations amongst other drivers in primary samples from metastatic patients whereas the clonal fraction was generally similar. This highlights the importance of chromosomal instability in governing cancer progression and metastasis and shows that clonal selection is rather a hallmark of metastasis. In an analysis aimed at identifying genomic alterations associated with metastatic burden, we found that higher chromosomal instability was correlated with metastatic burden in several tumor types. This association, however, was absent in many tumor types, including colorectal cancer, where copy-number alteration patterns may be established early in tumor development. Several mechanisms can explain the pro-metastatic effects of chromosomal instability and have been reviewed before (Ben-David and Amon, 2020). It is believed that chromosomal instability can promote tumor progression by increasing subclonal diversity and tumor evolution (Watkins et al., 2020), but aneuploidy itself is not a universal promoter of transformation and recent studies suggest that aneuploidy is cancer-type-specific (Ben-David and Amon, 2020), which is in line with our observations. Beyond global chromosomal instability, we also identified several specific genomic alterations and signaling pathways associated with metastatic burden. The majority, including alterations associated with drug resistance, were enriched in samples from patients with higher metastatic burden. Few were associated with lower metastatic burden, including FOXA1 mutations in prostate cancer and CBFB mutations in breast cancer. Lastly, we investigated associations between genomic alterations and specific routes of metastatic dissemination. We compared independent metastatic samples according to their organ sites. This revealed that the genomic landscape of metastasis differed according to their target organs. Previous studies have also interrogated the differences between primary tumors and metastatic sites using either independent samples (Armenia et al., 2018; Priestley et al., 2019; Robinson et al., 2017; Shih et al., 2020) or paired samples (Brastianos et al., 2015; Brown et al., 2017; Eckert et al., 2016; Hu et al., 2020; Jiménez-Sánchez et al., 2017; Makohon-Moore et al., 2017; Naxerova et al., 2017; Noorani et al., 2020; Reiter et al., 2020). Clinical data extraction from the EHR allowed us to explore the genomic alterations of metastatic patients by taking into consideration a greater part of the metastatic events occurring in a patient's clinical course. We have generated a variety of hypotheses linking specific genomic alterations to specific organotropisms occurring in a cancer-specific manner. Future functional characterization of these
alterations could result in the identification of novel biomarkers and therapeutic approaches for metastatic cancer. Our study has several limitations. Firstly, while the overall cohort is large, sample size varied significantly between tumor types, which prevented us from drawing robust conclusions in less common tumor types. Secondly, the ICD billing codes used in our study likely do not fully capture all metastatic events and may be affected by inter-physician variability. Future improvements to the clinical data extraction process could come from the use of natural-language processing and machine learning approaches, which will be required to mine the wealth of data contained in EHR systems at scale. Despite these limitations, metastatic patterns observed across tumor types were consistent with previous reports (Budczies et al., 2015; Gao et al., 2019) and manual chart review of a subset of cases. Our study validated previous findings from others and was able to generate a vast array of new associations in several tumor types. The next challenge will be to prioritize the most clinically useful candidates to identify and validate prognostic and predictive biomarkers that will have the potential to influence the clinical management of patients. Although this study represents a first step towards understanding how genomic alterations shape tumor progression, metastatic burden and organotropisms, more integrated studies are needed to fully investigate the impact of tumor cell-extrinsic effects, such as cancer therapy, target organ microenvironment and systemic factors. These studies will require comprehensive clinical timelines with accurate information about all lines of therapy and metastatic events. Additionally, single-cell profiling methods will be required to fully understand the cross-talk between tumor cells and the metastatic niche. Finally, our systematic study highlights the importance of chromosomal instability in progression and metastasis and drugs targeting this hallmark could represent an attractive strategy in several tumor types. MSK-MET will be publicly available via the cBioPortal for Cancer Genomics (Cerami et al., 2012; Gao et al., 2013) and will provide a valuable resource for the community and stimulate further research and applications in cancer care. ACKNOWLEDGMENTS This study was supported by the MSK Cancer Center Support Grant (P30 CA008748), the MSK MIND initiative, Cycle for Survival, the Alan and Sandra Gerry Metastasis and Tumor Ecosystems Center, The Fund for Innovation in Cancer Informatics (ICI), the Robertson Foundation, a Prostate Cancer Foundation Challenge Award, and the Marie-Josée and Henry R. Kravis Center for Molecular Oncology. DECLARATION OF INTERESTS S.C. receives consulting fees from Novartis, Lilly, Sanofi and research funding from Daiichi-Sankyo and Paige.ai. J.J.H. receives consulting fees from Bristol Myers Squibb, Merck, Exelexis, Eisai, QED, Cytomx, Zymeworks, Adaptiimmune, ImVax and research funding from Bristol Myers Squibb. C.I.D. receives research funding from Bristol Myers Squibb. P.R. received consultation/Ad board/Honoraria from Novartis, Foundation Medicine, AstraZeneca, Epic Sciences, Inivata, Natera, and Tempus and institutional grant/funding from Grail, Illumina, Novartis, Epic Sciences, ArcherDx. C.M.R. has consulted regarding oncology drug development with AbbVie, Amgen, Astra Zeneca, Epizyme, Genentech/Roche, Ipsen, Jazz, Lilly, and Syros, and serves on the scientific advisory boards of Bridge Medicines, Earli, and Harpoon Therapeutics. D.Z. receives research funding from Astra Zeneca, Plexxikon, and Genentech and consulting fees from Merck, Synlogic Therapeutics, GSK, Bristol Myers Squibb, Genentech, Xencor, Memgen, Immunos, Agenus, Hookipa, Calidi, Synthekine. B.W. has a ad hoc membership advisory board Repare Therapeutics. W.A. receives speaking honoraria from Roche, Medscape, Aptitude Health, Clinical Education Alliance and consulting fees from Clovis Oncology, Janssen, ORIC pharmaceuticals, Daiichi Sankyo and reserach funding from AstraZeneca, Zenith Epigenetics, Clovis Oncology, ORIC pharmaceuticals, Epizyme. G.K.A. receives research funding from Arcus, Agios, Astra Zeneca, Bayer, BioNtech, BMS, Celgene, Flatiron, Genentech/Roche, Genoscience, Incyte, Polaris, Puma, QED, Sillajen, Yiviva, and consulting fees from Adicet, Agios, Astra Zeneca, Alnylam, Autem, Bayer, Beigene, Berry Genomics, Cend, Celgene, CytomX, Eisai, Eli Lilly Exelixis, Flatiron, Genentech/Roche, Genoscience, Helio, Incyte, Ipsen, Legend Biotech, Loxo,
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 9
Merck, MINA, Nerviano,QED, Redhill, Rafael, Silenseed, Sillajen, Sobi, Surface Oncology, Therabionics, Twoxar, Vector, Yiviva. E.M.O. receives research Funding from Genentech/Roche, Celgene/BMS, BioNTech, BioAtla, AstraZeneca, Arcus, Elicio, Parker Institute, AstraZeneca and consulting fees from Cytomx Therapeutics (DSMB), Rafael Therapeutics (DSMB), Sobi, Silenseed, Tyme, Seagen, Molecular Templates, Boehringer Ingelheim, BioNTech, Ipsen, Polaris, Merck, IDEAYA, Cend, AstraZeneca, Noxxon, BioSapien, Bayer (spouse), Genentech-Roche (spouse), Celgene-BMS (spouse), Eisai (spouse). M.A.P. receive consulting fees from BMS, Merck, Array BioPharma, Novartis, Incyte, NewLink Genetics, Aduro, Eisai, Pfizer and honoraria from BMS and Merck and research support from RGenix, Infinity, BMS, Merck, Array BioPharma, Novartis, AstraZeneca. G.J.R. has insitutional research funding from Mirait, Takeda, Merck, Roche, Novartis, and Pfizer. A.N.S. has advisory board / personal fees from Bristol-Myers Squibb, Immunocore, Castle Biosciences and research support from Bristol-Myers Squibb, Immunocore, Xcovery, Polaris, Novartis, Pfizer, Checkmate Pharmaceuticals and research funding from OBI-Pharma, GSK, Silenseed, BMS, Lilly. R.Y. receives consulting fees from Array BioPharma/Pfizer, Mirati Therapeutics, Natera and research funding from Pfizer, Boehringer Ingelheim. D.R.J. is member of the Advisory Council for Astra Zeneca and member of the Clinical Trial Steering Committee for Merck. D.M. reports disclosures from AstraZeneca, Johnson & Johnson, Boston Scientific, Bristol-Myers Squibb, Merck. S.P.D. is shareholder and consultant for Canexia Health Inc. M.F.B receives consulting fees from Roche, Eli Lilly, PetDx and research funding from Grail. D.B.S. has received consulted for and received honoraria from Pfizer, Lilly/Loxo Oncology, Vividion Therapeutics, Scorpion Therapetuics and BridgeBio. METHODS Samples and patients A total of 43,400 solid tumor samples from 38,933 patients sequenced at Memorial Sloan Kettering Cancer Center from 2013-11-18 to 2020-01-06 (6.1y) and included in the AACR Project Genomics Evidence Neoplasia Information Exchange (GENIE) (AACR Project GENIE Consortium, 2017) 9.0-public database were considered for this study. All tumors were profiled using the Memorial Sloan Kettering Integrated Molecular Profiling of Actionable Cancer Targets (MSK-IMPACT) clinical sequencing assay, a hybridization capture-based, next-generation sequencing platform (Cheng et al., 2015). Tumor types were defined using a unique cancer type and one or more cancer type detailed (Table S1). For endometrial and colorectal cancers, we defined a subset of hypermutated (HM) tumors as those having an oncogenic POLE mutation or exhibiting more than 25 mutations/Mb or having MSIsensor score (Niu et al., 2014) > 10. Exclusion criteria were as follows: unavailable matched normal; low sequencing coverage (<100x); low tumor purity as defined by the absence of somatic alterations (including silent); pediatric patients (<18y at time of sequencing); patients with more than one unique sequenced tumor type; cancer of unknown primary; tumor type in which metastasis are rare (e.g. Gliomas); breast cancer with unavailable molecular subtype information; tumor types with small sample size (e.i. n <80 and either primary n <30 or metastasis n <30). Finally, one sample per patient was selected using a set of priority rules as follows: the presence of a FACETS fit that passed qc > highest purity > highest sample coverage > most recent gene panel. A total of 25,775 samples spanning 50 tumor types were used for analysis (Figure S1A-D, Tables S1). This set included samples that were sequenced with three generations of the MSK-IMPACT panel, containing 341 genes (n = 1,801 samples), 410 genes (n = 6,372 samples), and 468 genes (n = 17,602 samples). Clinical data extraction procedures for the identification and mapping of metastatic events Clinical data were retrieved from the institutional electronic health records (EHR) database on 2020-11-05. Metastatic events were extracted from the pathology report of the sequenced samples and patients’ electronic health records. The anatomic location of the sequenced samples is described in the sample pathology reports as a free-text description by pathologists. The EHR includes International Classification of Diseases (ICD) billing codes which
classify a comprehensive list of diseases, disorders, injuries and other health conditions including metastatic events. Metastatic events from the sample pathology report and the ICD billing codes from the EHR were systematically mapped to a curated list of 21 organs (Table S8). Lymph nodes were also classified as distant or regional given the anatomic location of the primary tumor. Of note, the classification of distant vs. regional was not possible for tumor types in which the anatomic location of the primary tumor is not well defined (e.i. melanoma cutaneous, cutaneous squamous cell, sarcoma lipo and sarcoma UPS/MXF). The organ site mapping for metastatic cancer is available at https://github.com/clinical-data-mining/organ-site-mapping. For a user providing a table of organ site descriptions or ICD Billing codes, annotations of the 21 organ sites will be generated. Furthermore, additional annotations recognizing local extension and distant lymph node spread can be created. Metastatic burden was defined as the number of distinct organs (excluding regional lymph nodes) affected by metastases throughout a patient’s clinical course (ranging from 1 to 15 in the present study). Patients with more than six affected organ sites were grouped for analyses of metastatic burden. Comparison of metastatic sites automatically extracted from electronic health records vs. manual chart review A total of 4,859 patients (22.5%) with metastatic sites extracted through manual chart review and previously published were available (Abida et al., 2017; Jones et al., 2021; Razavi et al., 2018; Shoushtari et al., 2021; Yaeger et al., 2018). Ten tumor types were represented including the most frequent (prostate, lung, breast, colorectal, and melanoma). There was a strong correlation between the number of metastatic sites retrieved from manual chart review and the number of metastatic sites automatically extracted from electronic health records (Figure S1G). For colorectal hyper mutant and MSS only the first metastatic events were reported so we restricted the comparison to the first metastatic event extracted from EHR. It is also important to note that the manual chart review was done before this study. Therefore, the present study has a longer follow-up which resulted in a higher number of metastatic sites. We also calculated the sensitivity for each metastatic site and each tumor type (Figure S1H). The median sensitivity was 77% across tumor types and metastatic sites. Genomic analysis Tumor mutational burden (TMB) was calculated for each sample as the total number of nonsynonymous mutations, divided by the number of bases sequenced. Fraction of genome altered (FGA) was calculated for each sample as the percentage of the genome with absolute log2 copy ratios >0.2. Log2 copy-number ratios were derived as previously described (Cheng et al., 2015). Chromosome arm-level copy number alterations were computed using the ASCETS tool (Spurr et al., 2020) using default parameters. Allele-specific analyses of copy number alterations were performed using the FACETS tool (Shen and Seshan, 2016), which infers purity- and ploidy-corrected integer DNA copy number calls from sequencing data. The quality of FACETS fits was determined using a set of criteria as described in facets-preview (https://github.com/taylor-lab/facets-preview). To estimate a tumor purity- and ploidy-adjusted version of the FGA, we defined “adjusted FGA” as the fraction of the genome different from the major integer copy number (Mcn), where Mcn is defined as the integer total copy number spanning the largest portion of the genome. Tumor samples were considered to have undergone whole-genome doubling (WGD) if more than 50% of their autosomal genome had Mcn >2. The clonality of each mutation (clonal or subclonal or indeterminate) was determined as described in facets-suite (https://github.com/mskcc/facets-suite). For each tumor sample, the fraction of clonal mutations (clonal
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 10
fraction) was determined by dividing the total number of clonal mutations by the sum of clonal and subclonal mutations. MSI-H status was defined by an MSIsensor score >10 (Niu et al., 2014). Somatic alterations were annotated using OncoKB for oncogenicity and clinical actionability (Chakravarty et al., 2017) (Data version: v2.8, released on 2020-09-17). For hypermutated colorectal and hypermutated uterine cancer, only genes that were recurrently mutated based on MutSig-CV (q-value<0.1) were considered for association analyses. For each tumor type, recurrent oncogenic alterations were defined as those considered oncogenic or likely oncogenic by OncoKB and present in at least 5% of either primary or metastatic samples (median of 15 per tumor type, Table S1). Canonical oncogenic pathway-level alterations were computed using curated pathway templates as previously reported (Ding et al., 2018; Sanchez-Vega et al., 2018). Segmented copy-number data were processed using the CNtools package v1.4. Statistical analyses Comparisons between groups (primary vs. metastatic tumors, primary samples from metastatic vs. non-metastatic patients, and metastases according to their organ location) were performed using the non-parametric Mann-Whitney U test for continuous variables or the Fisher’s exact test for categorical variables. Differences in the frequency of actionable mutations (Levels 1 to 3, as defined by OncoKB) between groups (primary tumors vs. metastases and primary tumors from metastatic vs. non-metastatic patients) were further tested using a multivariable logistic regression model adjusted for TMB and FGA. Differences in the frequency of arm-level copy number alterations between groups (primary vs. metastatic tumors and primary samples from metastatic vs. non-metastatic patients) were tested using a multivariable logistic regression model adjusted for FGA. A genomic feature was considered to be significantly correlated with metastatic burden if (a) the Spearman’s correlation between the two variables was statistically significant (q-value < 0.05) and (b) the coefficient associated with the genomic feature as a predictive variable in a multivariable linear regression model adjusted for sample type (metastatic vs. primary tumor) was statistically significant (p-value < 0.05). The second condition was required because the ratio of metastatic samples to primary samples was associated with metastatic burden and could otherwise act as a confounding factor. We assessed genomic features associated with the presence or absence of metastasis in a target organ using only target organs present in at least 5% of the patients. A genomic feature was considered to be significantly associated with metastasis to specific target organs if (a) the Mann-Whitney U test for continuous variables or the Fisher’s exact test for categorical variables was statistically significant (q-value < 0.05) and (b) the coefficient associated with the genomic feature as a predictive variable in a multivariable logistic regression model adjusted for sample type (metastatic vs. primary tumor, categorical) and metastatic burden (1 to 6, numerical) was statistically significant (p-value < 0.05). The second condition was required because the ratio of metastatic samples to primary samples and metastatic burden were associated with metastasis to specific target organs and could otherwise act as a confounding factor. When TMB and FGA were used in a generalized linear model (linear and logistic model), their distributions were harmonized using a normal transformation as described before (Vokes et al., 2019) then scaled from 0 to 1 by subtracting the minimum and dividing by the maximum. Logistic regression was performed using Firth’s bias-reduction method as implemented in the R package brglm (Kosmidis and Firth, 2020). Overall survival (OS) was measured from the time of sequencing to death and was censored at the last time the patient was known to be alive. If a patient had more than one sequenced sample, the first time of sequencing was used.
Median follow-up time was calculated using the reverse Kaplan-Meier method. Median overall survival and five-year survival rate were calculated by the Kaplan-Meier method. The association between metastatic burden and overall survival was assessed using univariable Cox proportional hazards regression models. All reported p-values are two-tailed. Multiple testing correction was applied within each tumor type using the false discovery rate (q-value) method and q-value < 0.05 was considered significant. All analyses were performed using R v3.5.2 (www.R-project.org) and Bioconductor v3.4. REFERENCES AACR Project GENIE Consortium (2017). AACR Project GENIE: Powering Precision Medicine through an International Consortium. Cancer Discov. 7, 818–831. Abida, W., Armenia, J., Gopalan, A., Brennan, R., Walsh, M., Barron, D., Danila, D., Rathkopf, D., Morris, M., Slovin, S., et al. (2017). Prospective Genomic Profiling of Prostate Cancer Across Disease States Reveals Germline and Somatic Alterations That May Affect Clinical Decision Making. JCO Precis Oncol 2017. Armenia, J., Wankowicz, S.A.M., Liu, D., Gao, J., Kundra, R., Reznik, E., Chatila, W.K., Chakravarty, D., Han, G.C., Coleman, I., et al. (2018). The long tail of oncogenic drivers in prostate cancer. Nat. Genet. 50, 645–651. Bakhoum, S.F., Ngo, B., Laughney, A.M., Cavallo, J.-A., Murphy, C.J., Ly, P., Shah, P., Sriram, R.K., Watkins, T.B.K., Taunk, N.K., et al. (2018). Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472. Ben-David, U., and Amon, A. (2020). Context is everything: aneuploidy in cancer. Nat. Rev. Genet. 21, 44–62. Bieging, K.T., Mello, S.S., and Attardi, L.D. (2014). Unravelling mechanisms of p53-mediated tumour suppression. Nat. Rev. Cancer 14, 359–370. Bielski, C.M., Zehir, A., Penson, A.V., Donoghue, M.T.A., Chatila, W., Armenia, J., Chang, M.T., Schram, A.M., Jonsson, P., Bandlamudi, C., et al. (2018). Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195. Birkbak, N.J., and McGranahan, N. (2020). Cancer Genome Evolutionary Trajectories in Metastasis. Cancer Cell 37, 8–19. Borst, M.J., and Ingold, J.A. (1993). Metastatic patterns of invasive lobular versus invasive ductal carcinoma of the breast. Surgery 114, 637–641; discussion 641–642. Brastianos, P.K., Carter, S.L., Santagata, S., Cahill, D.P., Taylor-Weiner, A., Jones, R.T., Van Allen, E.M., Lawrence, M.S., Horowitz, P.M., Cibulskis, K., et al. (2015). Genomic Characterization of Brain Metastases Reveals Branched Evolution and Potential Therapeutic Targets. Cancer Discov. 5, 1164–1177. Brown, D., Smeets, D., Székely, B., Larsimont, D., Szász, A.M., Adnet, P.-Y., Rothé, F., Rouas, G., Nagy, Z.I., Faragó, Z., et al. (2017). Phylogenetic analysis of metastatic progression in breast cancer using somatic mutations and copy number aberrations. Nat. Commun. 8, 14944. Bucheit, A.D., Chen, G., Siroy, A., Tetzlaff, M., Broaddus, R., Milton, D., Fox, P., Bassett, R., Hwu, P., Gershenwald, J.E., et al. (2014). Complete loss of PTEN protein expression correlates with shorter time to brain metastasis and survival in stage IIIB/C melanoma patients with BRAFV600 mutations. Clin. Cancer Res. 20, 5527–5536. Budczies, J., von Winterfeld, M., Klauschen, F., Bockmayr, M., Lennerz, J.K., Denkert, C., Wolf, T., Warth, A., Dietel, M., Anagnostopoulos, I., et al. (2015). The landscape of metastatic progression patterns across major human cancers. Oncotarget 6, 570–583. Cancer Genome Atlas Research Network, Kandoth, C., Schultz, N., Cherniack, A.D., Akbani, R., Liu, Y., Shen, H., Robertson, A.G., Pashtan, I., Shen, R., et al. (2013). Integrated genomic characterization of endometrial carcinoma. Nature 497, 67–73. Cejas, P., López-Gómez, M., Aguayo, C., Madero, R., de Castro Carpeño, J., Belda-Iniesta, C., Barriuso, J., Moreno García, V., Larrauri, J., López, R., et al. (2009). KRAS mutations in primary colorectal cancer tumors and related metastases: a potential role in prediction of lung metastasis. PLoS One 4, e8199. Cerami, E., Gao, J., Dogrusoz, U., Gross, B.E., Sumer, S.O., Aksoy, B.A., Jacobsen, A., Byrne, C.J., Heuer, M.L., Larsson, E., et al. (2012). The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 11
Chakravarty, D., Gao, J., Phillips, S.M., Kundra, R., Zhang, H., Wang, J., Rudolph, J.E., Yaeger, R., Soumerai, T., Nissan, M.H., et al. (2017). OncoKB: A Precision Oncology Knowledge Base. JCO Precis Oncol 2017. Cheng, D.T., Mitchell, T.N., Zehir, A., Shah, R.H., Benayed, R., Syed, A., Chandramohan, R., Liu, Z.Y., Won, H.H., Scott, S.N., et al. (2015). Memorial Sloan Kettering-Integrated Mutation Profiling of Actionable Cancer Targets (MSK-IMPACT): A Hybridization Capture-Based Next-Generation Sequencing Clinical Assay for Solid Tumor Molecular Oncology. J. Mol. Diagn. 17, 251–264. Ding, L., Bailey, M.H., Porta-Pardo, E., Thorsson, V., Colaprico, A., Bertrand, D., Gibbs, D.L., Weerasinghe, A., Huang, K.-L., Tokheim, C., et al. (2018). Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics. Cell 173, 305–320.e10. Eckert, M.A., Pan, S., Hernandez, K.M., Loth, R.M., Andrade, J., Volchenboum, S.L., Faber, P., Montag, A., Lastra, R., Peter, M.E., et al. (2016). Genomics of Ovarian Cancer Progression Reveals Diverse Metastatic Trajectories Including Intraepithelial Metastasis to the Fallopian Tube. Cancer Discov. 6, 1342–1351. Gao, J., Aksoy, B.A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S.O., Sun, Y., Jacobsen, A., Sinha, R., Larsson, E., et al. (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci. Signal. 6, l1. Gao, Y., Bado, I., Wang, H., Zhang, W., Rosen, J.M., and Zhang, X.H.-F. (2019). Metastasis Organotropism: Redefining the Congenial Soil. Dev. Cell 49, 375–391. Gerratana, L., Davis, A.A., Polano, M., Zhang, Q., Shah, A.N., Lin, C., Basile, D., Toffoli, G., Wehbe, F., Puglisi, F., et al. (2020). Understanding the organ tropism of metastatic breast cancer through the combination of liquid biopsy tools. Eur. J. Cancer 143, 147–157. Hieronymus, H., Murali, R., Tin, A., Yadav, K., Abida, W., Moller, H., Berney, D., Scher, H., Carver, B., Scardino, P., et al. (2018). Tumor copy number alteration burden is a pan-cancer prognostic factor associated with recurrence and death. Elife 7. Hu, Z., Li, Z., Ma, Z., and Curtis, C. (2020). Multi-cancer analysis of clonality and the timing of systemic spread in paired primary tumors and metastases. Nat. Genet. 52, 701–708. Jiménez-Sánchez, A., Memon, D., Pourpe, S., Veeraraghavan, H., Li, Y., Vargas, H.A., Gill, M.B., Park, K.J., Zivanovic, O., Konner, J., et al. (2017). Heterogeneous Tumor-Immune Microenvironments among Differentially Growing Metastases in an Ovarian Cancer Patient. Cell 170, 927–938.e20. Jones, G.D., Brandt, W.S., Shen, R., Sanchez-Vega, F., Tan, K.S., Martin, A., Zhou, J., Berger, M., Solit, D.B., Schultz, N., et al. (2021). A Genomic-Pathologic Annotated Risk Model to Predict Recurrence in Early-Stage Lung Adenocarcinoma. JAMA Surg. 156, e205601. Kennecke, H., Yerushalmi, R., Woods, R., Cheang, M.C.U., Voduc, D., Speers, C.H., Nielsen, T.O., and Gelmon, K. (2010). Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277. Kosmidis, I., and Firth, D. (2020). Jeffreys-prior penalty, finiteness and shrinkage in binomial-response generalized linear models. Biometrika 108, 71–82. Lambert, A.W., Pattabiraman, D.R., and Weinberg, R.A. (2017). Emerging Biological Principles of Metastasis. Cell 168, 670–691. Leibold, J., Ruscetti, M., Cao, Z., Ho, Y.-J., Baslan, T., Zou, M., Abida, W., Feucht, J., Han, T., Barriga, F.M., et al. (2020). Somatic Tissue Engineering in Mouse Models Reveals an Actionable Role for WNT Pathway Alterations in Prostate Cancer Metastasis. Cancer Discov. 10, 1038–1057. Makohon-Moore, A.P., Zhang, M., Reiter, J.G., Bozic, I., Allen, B., Kundu, D., Chatterjee, K., Wong, F., Jiao, Y., Kohutek, Z.A., et al. (2017). Limited heterogeneity of known driver gene mutations among the metastases of individual patients with pancreatic cancer. Nat. Genet. 49, 358–366. Massagué, J., and Ganesh, K. (2021). Metastasis-Initiating Cells and Ecosystems. Cancer Discov. 11, 971–994. Massagué, J., and Obenauf, A.C. (2016). Metastatic colonization by circulating tumour cells. Nature 529, 298–306. Natrajan, R., Lambros, M.B.K., Geyer, F.C., Marchio, C., Tan, D.S.P., Vatcheva, R., Shiu, K.-K., Hungermann, D., Rodriguez-Pinilla, S.M., Palacios, J., et al. (2009). Loss of 16q in high grade breast cancer is associated with estrogen receptor status: Evidence for progression in tumors with a luminal phenotype? Genes Chromosomes Cancer 48, 351–365. Naxerova, K., Reiter, J.G., Brachtel, E., Lennerz, J.K., van de Wetering, M., Rowan, A., Cai, T., Clevers, H., Swanton, C., Nowak, M.A., et al. (2017). Origins of lymphatic and distant metastases in human colorectal cancer. Science 357, 55–60. Nguyen, D.X., Bos, P.D., and Massagué, J. (2009). Metastasis: from dissemination to organ-specific colonization. Nat. Rev. Cancer 9, 274–284.
Niu, B., Ye, K., Zhang, Q., Lu, C., Xie, M., McLellan, M.D., Wendl, M.C., and Ding, L. (2014). MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016. Nixon, M.J., Formisano, L., Mayer, I.A., Estrada, M.V., González-Ericsson, P.I., Isakoff, S.J., Forero-Torres, A., Won, H., Sanders, M.E., Solit, D.B., et al. (2019). PIK3CA and MAP3K1 alterations imply luminal A status and are associated with clinical benefit from pan-PI3K inhibitor buparlisib and letrozole in ER+ metastatic breast cancer. NPJ Breast Cancer 5, 31. Noorani, A., Li, X., Goddard, M., Crawte, J., Alexandrov, L.B., Secrier, M., Eldridge, M.D., Bower, L., Weaver, J., Lao-Sirieix, P., et al. (2020). Genomic evidence supports a clonal diaspora model for metastases of esophageal adenocarcinoma. Nat. Genet. 52, 74–83. Paget, S. (1889). The distribution of secondary growths in cancer of the breast. Lancet 133, 571–573. Pareja, F., Ferrando, L., Lee, S.S.K., Beca, F., Selenica, P., Brown, D.N., Farmanbar, A., Da Cruz Paula, A., Vahdatinia, M., Zhang, H., et al. (2020). The genomic landscape of metastatic histologic special types of invasive breast cancer. NPJ Breast Cancer 6, 53. Pereira, A.A.L., Rego, J.F.M., Morris, V., Overman, M.J., Eng, C., Garrett, C.R., Boutin, A.T., Ferrarotto, R., Lee, M., Jiang, Z.-Q., et al. (2015). Association between KRAS mutation and lung metastasis in advanced colorectal cancer. Br. J. Cancer 112, 424–428. Priestley, P., Baber, J., Lolkema, M.P., Steeghs, N., de Bruijn, E., Shale, C., Duyvesteyn, K., Haidari, S., van Hoeck, A., Onstenk, W., et al. (2019). Pan-cancer whole-genome analyses of metastatic solid tumours. Nature 575, 210–216. Ran, R., Harrison, H., Syamimi Ariffin, N., Ayub, R., Pegg, H.J., Deng, W., Mastro, A., Ottewell, P.D., Mason, S.M., Blyth, K., et al. (2020). A role for CBF in maintaining the metastatic phenotype of breast cancer cells. Oncogene 39, 2624–2637. Razavi, P., Chang, M.T., Xu, G., Bandlamudi, C., Ross, D.S., Vasan, N., Cai, Y., Bielski, C.M., Donoghue, M.T.A., Jonsson, P., et al. (2018). The Genomic Landscape of Endocrine-Resistant Advanced Breast Cancers. Cancer Cell 34, 427–438.e6. Reiter, J.G., Hung, W.-T., Lee, I.-H., Nagpal, S., Giunta, P., Degner, S., Liu, G., Wassenaar, E.C.E., Jeck, W.R., Taylor, M.S., et al. (2020). Lymph node metastases develop through a wider evolutionary bottleneck than distant metastases. Nat. Genet. 52, 692–700. Robinson, D.R., Wu, Y.-M., Lonigro, R.J., Vats, P., Cobain, E., Everett, J., Cao, X., Rabban, E., Kumar-Sinha, C., Raymond, V., et al. (2017). Integrative clinical genomics of metastatic cancer. Nature 548, 297–303. Rueda, O.M., Sammut, S.-J., Seoane, J.A., Chin, S.-F., Caswell-Jin, J.L., Callari, M., Batra, R., Pereira, B., Bruna, A., Ali, H.R., et al. (2019). Dynamics of breast-cancer relapse reveal late-recurring ER-positive genomic subgroups. Nature 567, 399–404. Sanchez-Vega, F., Mina, M., Armenia, J., Chatila, W.K., Luna, A., La, K.C., Dimitriadoy, S., Liu, D.L., Kantheti, H.S., Saghafinia, S., et al. (2018). Oncogenic Signaling Pathways in The Cancer Genome Atlas. Cell 173, 321–337.e10. Shen, R., and Seshan, V.E. (2016). FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 44, e131. Shih, D.J.H., Nayyar, N., Bihun, I., Dagogo-Jack, I., Gill, C.M., Aquilanti, E., Bertalan, M., Kaplan, A., D’Andrea, M.R., Chukwueke, U., et al. (2020). Genomic characterization of human brain metastases identifies drivers of metastatic lung adenocarcinoma. Nat. Genet. 52, 371–377. Shoushtari, A.N., Chatila, W.K., Arora, A., Sanchez-Vega, F., Kantheti, H.S., Rojas Zamalloa, J.A., Krieger, P., Callahan, M.K., Betof Warner, A., Postow, M.A., et al. (2021). Therapeutic Implications of Detecting MAPK-Activating Alterations in Cutaneous and Unknown Primary Melanomas. Clin. Cancer Res. 27, 2226–2235. Shukla, A., Nguyen, T.H.M., Moka, S.B., Ellis, J.J., Grady, J.P., Oey, H., Cristino, A.S., Khanna, K.K., Kroese, D.P., Krause, L., et al. (2020). Chromosome arm aneuploidies shape tumour evolution and drug response. Nat. Commun. 11, 1–14. Spurr, L.F., Touat, M., Taylor, A.M., Dubuc, A.M., Shih, J., Meredith, D.M., Pisano, W.V., Meyerson, M.L., Ligon, K.L., Cherniack, A.D., et al. (2020). Quantification of aneuploidy in targeted sequencing data using ASCETS. Bioinformatics. Stopsack, K.H., Whittaker, C.A., Gerke, T.A., Loda, M., Kantoff, P.W., Mucci, L.A., and Amon, A. (2019). Aneuploidy drives lethal progression in prostate cancer. Proc. Natl. Acad. Sci. U. S. A. 116, 11390–11395.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 12
Taylor, B.S., Schultz, N., Hieronymus, H., Gopalan, A., Xiao, Y., Carver, B.S., Arora, V.K., Kaushik, P., Cerami, E., Reva, B., et al. (2010). Integrative genomic profiling of human prostate cancer. Cancer Cell 18, 11–22. Tie, J., Lipton, L., Desai, J., Gibbs, P., Jorissen, R.N., Christie, M., Drummond, K.J., Thomson, B.N.J., Usatoff, V., Evans, P.M., et al. (2011). KRAS mutation is associated with lung metastasis in patients with curatively resected colorectal cancer. Clin. Cancer Res. 17, 1122–1130. Turajlic, S., Xu, H., Litchfield, K., Rowan, A., Chambers, T., Lopez, J.I., Nicol, D., O’Brien, T., Larkin, J., Horswell, S., et al. (2018). Tracking Cancer Evolution Reveals Constrained Routes to Metastases: TRACERx Renal. Cell 173, 581–594.e12. Vokes, N.I., Liu, D., Ricciuti, B., Jimenez-Aguilar, E., Rizvi, H., Dietlein, F., He, M.X., Margolis, C.A., Elmarakeby, H.A., Girshman, J., et al. (2019). Harmonization of Tumor Mutational Burden Quantification and Association With
Response to Immune Checkpoint Blockade in Non-Small-Cell Lung Cancer. JCO Precis Oncol 3. Watkins, T.B.K., Lim, E.L., Petkovic, M., Elizalde, S., Birkbak, N.J., Wilson, G.A., Moore, D.A., Grönroos, E., Rowan, A., Dewhurst, S.M., et al. (2020). Pervasive chromosomal instability and karyotype order in tumour evolution. Nature 587, 126–132. Yaeger, R., Chatila, W.K., Lipsyc, M.D., Hechtman, J.F., Cercek, A., Sanchez-Vega, F., Jayakumaran, G., Middha, S., Zehir, A., Donoghue, M.T.A., et al. (2018). Clinical Sequencing Defines the Genomic Landscape of Metastatic Colorectal Cancer. Cancer Cell 33, 125–136.e3. Zehir, A., Benayed, R., Shah, R.H., Syed, A., Middha, S., Kim, H.R., Srinivasan, P., Gao, J., Chakravarty, D., Devlin, S.M., et al. (2017). Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 13
SUPPLEMENTAL FIGURES
Solid tumors sequenced at MSKCC from 2013-11-18 to 2020-01-06 and included in the AACR Project GENIE
(n = 43,400)
Samples with sufficient quality(n = 41,799)
1,601 samples excluded based on:• Unavailable matched normal• Low sequencing coverage (<100x)• Low tumor purity (absence of somatic alterations including silent)
Samples from adult patients with a unique tumor type
(n = 39,325)
2,474 samples excluded based on:• Pediatric status of patients (<18y)• Presence of >1 sequenced tumor type
11,227 samples excluded based on:• Cancer of unknown primary• Tumor types in which distant metastases are rare (Gliomas)
• Unavailable breast cancer subtypes• Tumor type with small n (n < 80 and either primary n < 30 or metastasis n < 30)
Samples with well-defined tumor type (n = 28,098)
Samples eligible for analysis(n = 25,775)
2,323 samples excluded based on:• >1 sample per patient*
*priority rules: the presence of a FACETS fit that passed qc > highest purity > highest sample coverage > most recent gene panel
<30y 30-40y 40-50y 50-60y 60-70y 70-80y >80y0
2000
4000
6000
8000
Age at sequencing report<1m 3-6m 1-2y 2-3y 3-4y 4-5y 5-6y >6y
0
2000
4000
6000
8000
Interval b/w procedure and sequencing
1-2 2-3 4-5 5-6 6-7 7-8 8-9 9-10 11-12 >120
2000
4000
6000
8000
Sample coverage (x100)<10 10-20 20-30 30-40 40-50 50-60 60-70 70-80 >80
0
2000
4000
6000
8000
Tumor purity (%)
1-3m 6m-1y
10-11
median = 63.5y.IQR = 54-71y.
median = 62d.IQR = 0-287d.
median = 653IQR = 526-790
median = 40%IQR = 20-50%.
Thyroid Poorly Differentiated (131)Thyroid Papillary (307)
Adenoid Cystic Carcinoma (152)Head and Neck Squamous (412)
Sarcoma UPS/MXF (203)Sarcoma Lipo (228)
Sarcoma Leiomyo (291)Melanoma Uveal (103)
Cutaneous Squamous Cell (105)Melanoma Mucosal (175)
Melanoma Cutaneous (864)Ovarian Low−Grade Serous (89)
Ovarian Clear Cell (97)Cervical Squamous Cell (103)Uterine Carcinosarcoma (192)
Uterine Hypermutated (268)Uterine Serous (288)
Uterine Endometrioid (567)Ovarian High−Grade Serous (997)
Breast Ductal HR−HER2+ (115)Breast Ductal HR+HER2+ (256)
Breast Ductal Triple Negative (341)Breast Lobular HR+ (373)
Breast Ductal HR+HER2− (1524)Cholangio Extrahepatic (119)
Gallbladder Cancer (171)Hepatocellular Carcinoma (205)Pancreatic Neuroendocrine (211)
Cholangio Intrahepatic (407)Pancreatic Adenocarcinoma (1779)
Testicular Seminoma (97)Upper Tract Urothelial (197)
Testicular Non−Seminoma (240)Renal Clear Cell (421)
Bladder Urothelial (961)Prostate Adenocarcinoma (2172)
Lung Neuroendocrine (109)Pleural Mesothelioma (244)
Small Cell Lung Cancer (314)Lung Squamous Cell Carcinoma (537)
Lung Adenocarcinoma (4064)Anal Squamous Cell (87)Small Bowel cancer (94)
Gastrointestinal Neuroendocrine (150)Appendiceal Adenocarcinoma (202)
Stomach Adenocarcinoma (320)Colorectal Hypermutated (351)Gastrointestinal Stromal (403)
Esophageal Adenocarcinoma (542)Colorectal MSS (3197)
Endocrine (438)Head and Neck (564)Soft Tissue (722)Skin (1247)
Gynecologic (2601)
Breast (2609)
Developmental GI(2892)
Genitourinary (4088)
Thoracic (5268)
Core GI (5346)
Prim
ary
(389
1)P
rimar
y di
stan
t met
.(1
1741
)M
etas
tasi
s (1
0143
)
Male Genital (15)Breast (17)Kidney (19)Biliary tract (33)Distant LN (44)Mediastinum (44)Bladder/UT (58)Head and Neck (69)Female Genital (75)Adrenal Gland (124)Skin (128)Ovary (139)
Bowel (279)Pleura (381)CNS/Brain (421)
Intra−Abdominal (674)
Bone (726)
Lung (982)
Other (1321)
Liver (2289)
Lymph (2305)
0 10000 18912
Breast (318)
Head and Neck (415)
Mediastinum (958)
Male Genital (968)
Kidney (1069)
Skin (1462)
Ovary (1485)
PNS (1903)
Bladder/UT (2241)
Female Genital (2266)
Adrenal Gland (2477)
Biliary tract (2910)
CNS/Brain (3809)
Pleura (3924)
Bowel (4510)
Intra−Abd. (7045)
Distant LN (8889)
Bone (9035)
Liver (10353)
Lung (14470)
Unspecified (18912)
0 20 40 60 80 1000
20
40
60
80
100
This study (%)
Yang
Gao
et a
l. (%
)
rho = 0.84p = 7.7284e−09
OvarianLiverColonGastricThyroidRenalPancreasMelanomaLungProstateBreast Bone
LungLiverBrainPeritoneum
Distant LNLungPleuraMediastinumLiverBiliary tractIntra−AbdominalBowelCNS/BrainBonePNSKidneyBladder/UTFemale GenitalOvarySkinHead and Neck
A B
C D E F
G
H
73729/1005
77454/591
5368/128
7497/131
7963/80
4210/24
8625/29
538/15
574/7
86826/961
92123/133
98155/158
79205/260
93144/155
7996/122
359/26
8955/62
8636/42
1003/3
78133/171
8176/94
7629/38
6714/21
9110/11
574/7
719/257
79/125
53/62
72/29
143/21
102/20
961448/1505
97113/116
100644/647
94351/373
9167/74
804/5
9288/96
9557/60
9679/82
8630/35
8815/17
477/15
504/8
501/2
251/4
1001/1
88207/235
96124/129
7226/36
699/13
8533/39
1002/2
1004/4
759/12
6815/22
00/4
8815/17
00/1
94205/219
1005/5
9463/67
9664/67
7311/15
9325/27
1008/8
9518/19
10011/11
951817/1911
97769/789
9314/15
96559/584
8249/60
8122/27
89120/135
95142/150
95105/110
9036/40
1001/1
7710/13
835/6
1002/2
753/4
00/1
466/13
603/5
503/6
00/2
8511/13
672/3
898/9
1001/1
6946/67
7333/45
715/7
507/14
1001/1
7758/75
7727/35
1002/2
7828/36
501/2
8669/80
8335/42
8713/15
9110/11
753/4
1008/8
172/12
402/5
00/4
00/3
PanCan (4859)Prostate Adeno. (1740)
Colorectal MSS (908)Breast Duct HR+HER2− (714)
Melan. Cutaneous (566)Lung Adeno. (235)
Breast Duct TN (203)Breast Lob HR+ (189)
Breast Duct HR+HER2+ (155)Breast Duct HR−HER2+ (78)
Colorectal HM (71)
Dist
ant L
NLu
ngPl
eura
Med
iast
inum
Live
rBi
liary
trac
tIn
tra−A
bd.
Bowe
lCN
S/Br
ain
Bone
PNS
Kidn
eyBl
adde
r/UT
Fem
ale
Gen
ital
Ova
rySk
inHe
ad a
nd N
eck
0 50 100%25 75
Sensitivity
0 20 40 60 80 1000
20
40
60
80
100
This study (%)
Jan
Budc
zies
et a
l. (%
)
rho = 0.81p < 2.22e−16
Testicular Cervical Ovarian MelanomasUrinary Liver ProstateKidney Head and neck Bilary Pancreatic BreastColorectalEsophagogastric Lung Lung
BoneLiverCNS/BrainDistant LNPleuraIntra−AbdominalAdrenal GlandSkinKidneyOvary
0 5 10 150
5
10
15
This study (#)
Char
t rev
iew
(#)
Colorectal HM (71)
0 200 400 6000
100200300400500600
Colorectal MSS (908)
0 10 20 30 400
10
20
30
40
Breast Duct HR−HER2+ (78)
0 100 300 5000
100200300400500600
Breast Duct HR+HER2− (714)
0 20 40 60 800
20406080
100
Breast Duct HR+HER2+ (155)
0 20 60 100 1400
20406080
100120140
Breast Duct TN (203)
0 50 100 1500
50
100
150Breast Lob HR+ (189)
0 10 20 300
10
20
30
Lung Adeno. (235)
0 50 100 1500
50
100
150
Melan. Cutaneous (566)
0 200 600 10000
200400600800
1000
Prostate Adeno. (1740)
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 14
Figure S1. Study design and characteristics of the patients and samples included in MSK-MET, related to Figure 1. (A) CONSORT flow diagram of the study. (B) Distribution of age at time of sequencing, time interval between surgical procedure and sequencing, tumor sample coverage, tumor purity assessed by the pathologist. (C) Distribution of 25,775 tumors across 50 tumor types grouped by ten organ systems. (D) Distribution of the 25,775 tumors according to the sample type (primary vs. metastasis), site of metastatic sample and whether the primary sample was from a patient with evidence of distant metastasis at the time of the study or not. (E) Distribution of the 99,220 metastatic events mapped to 21 organ sites. (F) Comparison of the frequency of metastasis in several target organs from different tumor types reported in (Gao et al., 2019) and in (Budczies et al., 2015) vs. the present study. (G) Comparison of the number of metastasis using data from manual chart reviews and clinical data automatically extracted from the EHR (This study). (H) Heatmap showing the recall rate (sensitivity) across several target organs from different tumor types using patients retrieved from manual chart reviews.
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 15
Figure S2. Genomic differences between primary samples from metastatic and non-metastatic patients, related to Figure 2. (A) Scatterplot showing the comparison of the median FGA, median WGD, median TMB and median clonality for each tumor type in primary samples from metastatic and non-metastatic patients. (B) The following clinical and genomic features are shown side-by-side for primary samples from metastatic and non-metastatic patients using a combination of barplots and violin plots; from left to right: sample counts, FGA, fraction of samples with WGD, TMB, clonality, fraction of samples with TMB-high status and distribution of highest actionable alteration. The black vertical line in each violin plots represents the median. Heatmap shows the frequency of arm level alterations in primary tumors and metastases. Tumor types are ordered from top to bottom by decreasing FGA in metastasis and grouped by organ systems. * indicates q-value < 0.05. WGD and clonality were available for a subset of 10,106 samples with FACETS data. (C) Statistically significant differences in the frequency of oncogenic alterations between primary tumors and metastases across all tumor types. Triangles summarize oncogenic alterations frequencies in primary samples from metastatic vs. non-metastatic patients and are colored according to alteration type.
MutationAmplificationDeletionFusion
Higher in pts w/ met
Lower in pts w/ metPathway
Alteration frequency
Enriched inpts w/o met.
Enriched inpts w/ met
q-value < 0.05ns.
Fraction of genome altered Whole genome duplication Tumor mutational burden ClonalityA
B
C
GIST
UCEC SEROUS
UCEC HM
CRC MSS
PRAD
ILC HR+HER2-
BLCA
UCEC ENDOLUAD
IDC HR+HER2−
0 10 20 30 40 50
0
10
20
30
40
50
UCEC ENDO
CRC MSS
PRADLUAD
BLCA
IDC HR+HER2−
0 20 40 60 80 100
0
20
40
60
80
100
IDC HR+HER2−
HNSC
THPA
GIST
IDC HR−HER2+CCRCC
LUAD
0 2 4 6 8 10
0
2
4
6
8
10SKCM
0 20 40 60 80 100
0
20
40
60
80
100
Met
asta
sis
med
ian
%
Met
asta
sis
med
ian
%
Met
asta
sis
med
ian
mut
/Mb
Met
asta
sis
med
ian
%
No met. median % No met. median % No met. median mut/Mb No met. median %
19
30
2
1
20
3817
11
43
1299
4050
715
1531
36
21419
12
416
18
151921
6222
8912
1749
2445
1535
1122
45
28
1
6
11
12
56
237
54
1817
2220
1
6864
5412
753
19
62428
45
823
12
28303651
56
144
232938
57912
2455
44
6578
45
0 1 5 10 20 50 100%
JUN
KMT2C
MYC
TP53
TERT
Cell CycleCDKN2A
MYCMYC
MYCMYC
p53TP53
MED12
p53Cell CycleRTK/RAS
MDM2STAG2
TP53FGFR3
MYCPI3Kp53
MYCCDK12PTENTP53
SPOP
RTK/RASCell Cycle
p53PI3K
MDM2ARID1AFGFR1
CBFBCCND1
TP53PIK3CA
TGF-βWNT
NRF2DDR
Cell Cyclep53
FOXA1SMARCA4
NKX2-1EGFR
KEAP1CDKN2A
TP53
Sarcoma UPS/MXF
Upper Tract Urothelial
Gastrointestinal Stromal
Breast Lobular HR+
Thyroid Papillary
Pancreatic Adenocarcinoma
Colorectal MSS
Breast Ductal Triple Negative
Uterine Endometrioid
Bladder Urothelial
Prostate Adenocarcinoma
Breast Ductal HR+HER2−
Lung Adenocarcinoma2
0 1 5 10 20 50 100%29335657391117522209665252381193353810715
9863267612248631842650861183517
1847243106164354
6610931973297331836925103788137671867726843
96810312345782112684577364
44211004711979110961832594417302531315639308
421131618130
18975823118243813022
Thyroid Papillary (128)Thyroid Poorly Differentiated (72)
Adenoid Cystic Carcinoma (50)Head and Neck Squamous (197)
Renal Clear Cell (275)Bladder Urothelial (763)
Upper Tract Urothelial (152)Testicular Seminoma (61)
Testicular Non−Seminoma (122)Prostate Adenocarcinoma (1312)
Appendiceal Adenocarcinoma (88)Colorectal Hypermutated (311)Stomach Adenocarinoma (210)
Small Bowel cancer (58)Gastrointestinal Neuroendocrine (79)
Anal Squamous Cell (52)Colorectal MSS (2090)
Gastrointestinal Stromal (270)Esophageal Adenocarinoma (420)
Sarcoma Lipo (140)Sarcoma UPS/MXF (129)
Sarcoma Leiomyo (130)Cutaneous Squamous Cell (54)
Melanoma Uveal (34)Melanoma Cutaneous (181)
Melanoma Mucosal (118)Gallbladder Cancer (85)
Cholangio Extrahepatic (74)Cholangio Intrahepatic (311)
Pancreatic Adenocarcinoma (1071)Hepatocellular Carcinoma (168)Pancreatic Neuroendocrine (99)
Breast Lobular HR+ (210)Breast Ductal HR+HER2− (941)Breast Ductal HR−HER2+ (65)
Breast Ductal HR+HER2+ (147)Breast Ductal Triple Negative (198)
Uterine Hypermutated (206)Uterine Endometrioid (442)
Ovarian Clear Cell (61)Ovarian Low−Grade Serous (32)
Cervical Squamous Cell (66)Uterine Serous (195)
Ovarian High−Grade Serous (350)Uterine Carcinosarcoma (129)
Pleural Mesothelioma (211)Lung Adenocarcinoma (2479)
Lung Squamous Cell Carcinoma (393)Lung Neuroendocrine (51)
Small Cell Lung Cancer (152)
*
*
**
**
**
*
*
*
*
*
*
*
*
*
**
*
**
*
*
*
*
*
*
*
*
*
No. patients FGA WGD TMB Clonality TMBhi Actionabl. Arm−Level Copy−Number Alterations0 1897 0 100 0 1 0 10 100 0 100 0 1 0 1 1 2 3 4 5 6 7 8 9 10 11 12 13 1516 17 18 19 20 2214 21
MetsNo mets
1897 0 100 0 1 0 10 100 0 100 0 1 0 1% Fraction mut/Mb % Fraction Fraction#
0 p q p q p q p q p q p q p q p q p q p q p q p q q q q p q p q p q p q p q q q
L1 L2 L3A
L3B
L4 Onc
VUS
None
Actionabl.
50 40 30 20 10 10 20 30 40 50
%loss %gainq-value < 0.05*
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 16
Figure S3. Genomic differences of metastases according to organ location, related to Figure 4. (A) Distribution of sequenced metastasis according to organ location with the heatmap showing the percentage of metastatic samples where each row represents a tumor type and each column represents the organ location. For each tumor type, the distribution of all 21 organ locations is shown as a stacked barplot to the right of the heatmap. For each organ, the distribution of all 50 tumor types is shown as a stacked barplot below the heatmap. For each tumor type, the number of metastasis samples is indicated in parentheses. For each organ, the number of metastasis samples is indicated in parentheses. (B) Statistically significant association between FGA (black circle) and TMB (white diamond) and specific metastatic sites. (C) Statistically significant oncogenic alterations associated with specific metastatic sites.
232664931563738363616161024121591418610522951313284853634374450179171391322274128355025
235141222910280313324171330163632556946334547311866835201211311508666587187241811
131168224613933121518189131382584251363217112262101116131819194028242531182215161318287334
1039572014110212712161411421101301017824108031241961761061419158191412612281250
751081521211301782720553011100241020118052420100030003210214
700000102010016724191277144738132215101101113362517211462780400600
433157111270281652750020006300100602270200322139200101
41213116831245242210200200014110101100203320101000100
30000001060000725021141903242224011031216838116341112784
10000000002000901582401140022021020000000325430000100
11102010205310610200001000000010101020103010033012100
1011023350000114200000001110101722210000200001200000
1000010000000020002110100000000020003436911430200100
170632120000001100000000000000301010000000101006100
100000000001002000011010000000002300108220030400200
020300102000011010000200000000100123020000100000100
000231102010210010000003000000100000000000000010100
001001000000000010001200007002420003020000000000100
000100010000000000200003000000020023020000100000110
000000000010011010001000000000001000020000000002000
000000000000000000000000000000103020000000000200010
Lym
ph (2
305)
Live
r (22
89)
Oth
er (1
321)
Lung
(982
)Bo
ne (7
26)
Intra
−Abd
omin
al (6
74)
CNS/
Brai
n (4
21)
Pleu
ra (3
81)
Bowe
l (27
9)Ova
ry (1
39)
Skin
(128
)Ad
rena
l Gla
nd (1
24)
Fem
ale
Gen
ital (
75)
Head
and
Nec
k (6
9)
Blad
der/U
T (5
8)
Dist
ant L
N (4
4)
Med
iast
inum
(44)
Bilia
ry tr
act (
33)
Kidn
ey (1
9)Br
east
(17)
Mal
e G
enita
l (15
)
PanCan (10143)Head and Neck Squamous (215)Adenoid Cystic Carcinoma (102)
Thyroid Papillary (179)Thyroid Poorly Differentiated (59)
Small Cell Lung Cancer (162)Lung Adenocarcinoma (1585)
Lung Squamous Cell Carcinoma (144)Lung Neuroendocrine (58)Pleural Mesothelioma (33)
Breast Ductal HR+HER2− (583)Breast Ductal HR+HER2+ (109)
Breast Ductal HR−HER2+ (50)Breast Ductal Triple Negative (143)
Breast Lobular HR+ (163)Esophageal Adenocarinoma (122)
Stomach Adenocarinoma (110)Small Bowel cancer (36)
Colorectal Hypermutated (40)Colorectal MSS (1107)
Gastrointestinal Neuroendocrine (71)Gastrointestinal Stromal (133)
Appendiceal Adenocarcinoma (114)Anal Squamous Cell (35)
Hepatocellular Carcinoma (37)Gallbladder Cancer (86)
Cholangio Extrahepatic (45)Cholangio Intrahepatic (96)
Pancreatic Adenocarcinoma (708)Pancreatic Neuroendocrine (112)
Renal Clear Cell (146)Upper Tract Urothelial (45)
Bladder Urothelial (198)Prostate Adenocarcinoma (860)Testicular Non−Seminoma (118)
Testicular Seminoma (36)Ovarian High−Grade Serous (647)
Ovarian Low−Grade Serous (57)Ovarian Clear Cell (36)
Uterine Carcinosarcoma (63)Uterine Endometrioid (125)Uterine Hypermutated (62)
Uterine Serous (93)Cervical Squamous Cell (37)Melanoma Cutaneous (683)
Melanoma Mucosal (57)Melanoma Uveal (69)
Cutaneous Squamous Cell (51)Sarcoma Leiomyo (161)
Sarcoma Lipo (88)Sarcoma UPS/MXF (74)
-5 -3 -1 1 3 5log2(fold change)
Ovarian High-Grade Serous
Prostate Adenocarcinoma
Lung Neuroendocrine
Bladder Urothelial
Colorectal MSS
Melanoma Cutaneous
Pancreatic Adenocarcinoma
Lung Adenocarcinoma
Intra−Abd
Liver
Liver
LymphLiver
Intra−AbdOvary
CNS/Brain
LymphCNS/Brain
LymphLung
LymphLung
Intra−AbdLiverLiver
PleuraLymph
LiverCNS/Brain
PleuraCNS/Brain
Adrenal Gland
0-2-4 2 4 0 1 2 5 10 20 50 100%
15
83
43
1319
2820
7
6872
5311
57
14403130
1020
7467
268479
735
4125
2919
8081
672
127879
63
3614
3165
3026
919
1616
2831
2527
147
55
29
25
3336
339
1438
23
3542
8239
29
2758
1015
2437
6481
146458
8219
1946
11325562
857
245558
84
267
4476
1841
412
2760
1021
3960
714
68
Alteration frequency
ESR1
TP53
FOXA1RHOA
HIPPOPI3K
NF1PTEN
Cell Cyclep53
RTK/RASFGFR3
TP53
MYCPI3K
ARERGMYC
PTEN
RTK/RASTGF-β
WNTAPC
BRAF
Cell Cycle
Epigenetic
p53
AKT2CDKN2A
TP53
Cell CycleDDR
Epigeneticp53
PI3KWNT
CDKN2A
EGFR
STK11TERTTP53
Liver
Liver
LiverOvary
LymphCNS/Brain
LymphLung
CNS/Brain
LymphLungLungLungLung
LiverLiverLungBoneLiverLiver
LiverLungLung
Intra−AbdIntra−Abd
LiverLymph
Intra−AbdLiverLungLiver
LymphIntra−Abd
LiverLiverLiver
LymphIntra−Abd
Liver
PleuraPleura
CNS/BrainCNS/Brain
PleuraCNS/Brain
PleuraLymph
LiverSkin
Adrenal GlandLymphPleura
LungPleura
CNS/BrainCNS/Brain
Breast Ductal HR+HER2−
Lung Neuroendocrine
Breast Lobular HR+
Melanoma Cutaneous
Bladder Urothelial
Prostate Adenocarcinoma
Colorectal MSS
Pancreatic Adenocarcinoma
Lung Adenocarcinoma
TargetOrganAlteration
A B C
0 50 100%25 75
FrequencyMutationAmplificationDeletionFusion
Higher in TO
Lower in TOPathway
FGATMB
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint
Nguyen, B., et al. (2021). BioRxiv [preprint] 17
SUPPLEMENTAL TABLES Supplemental tables can be found online. Table S1. Summary of the 50 tumor types included in the MSK-MET cohort, Related to Figure 1 Table S2. Genomic differences between primary tumors and metastases, Related to Figure 2 Table S3. Genomic differences between primary samples from metastatic and non-metastatic patients, Related to Figure S2 Table S4. Association between metastatic burden and overall survival assessed using Cox proportional hazards model, Related to Figure 3 Table S5. Genomic associations with metastatic burden, Related to Figure 3 Table S6. Genomic differences of metastases according to their organ location, Related to Figure S3 Table S7. Genomic features associated with metastasis to specific target organs, Related to Figure 4 Table S8. Mapping between free-text description from pathology reports and ICD billing codes from EHR to a curated list of 21 organs, Related to Figure 1
(which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. The copyright holder for this preprintthis version posted June 30, 2021. ; https://doi.org/10.1101/2021.06.28.450217doi: bioRxiv preprint