the phenotypes of proliferating glioblastoma cells reside on a …€¦ · 1 the phenotypes of...

23
The phenotypes of proliferating glioblastoma cells reside on a single axis of variation. 1 Running Title: A draft single-cell atlas of human glioma 2 Lin Wang 1,3,* , Husam Babikir 1,3,* , Sören Müller 1,3,* , Garima Yagnik 1,3 , Karin Shamardani 1,3 , Francisca 3 Catalan 1,3 , Gary Kohanbash 4 , Beatriz Alvarado 1,2,3 , Elizabeth Di Lullo 2,3 , Arnold Kriegstein 2,3 , Sumedh 4 Shah 1,3 , Harsh Wadhwa 1,3 , Susan M. Chang 1,3 , Joanna J. Phillips 1,3 , Manish K. Aghi 1,3 , Aaron A. 5 Diaz 1,3,† 6 1 Department of Neurological Surgery. 7 2 Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research. 8 3 University of California, San Francisco, San Francisco, CA 94158. 9 4 Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15261. 10 *Equal contribution 11 †Correspondence: Aaron Diaz, 1450 3 rd Street, San Francisco, CA, 94158, [email protected], (415) 12 514-0408 13 The authors certify that they have no affiliations with or involvement in any organization or entity with 14 any financial interest or non-financial interest in the subject matter or materials discussed in this 15 manuscript. 16 Abstract 17 Although tumor-propagating cells can be derived from glioblastomas (GBMs) of the proneural and 18 mesenchymal subtypes, a glioma stem-like cell (GSC) of the classical subtype has not been identified. 19 It is unclear if mesenchymal GSCs (mGSCs) and/or proneural GSCs (pGSCs) alone are sufficient to 20 generate the heterogeneity observed in GBM. We performed single-cell/nuclei RNA-sequencing of 28 21 gliomas, and single-cell ATAC-sequencing for 8 cases. We find that GBM GSCs reside on a single axis 22 of variation, ranging from proneural to mesenchymal. In silico lineage tracing using both transcriptomics 23 and genetics supports mGSCs as the progenitors of pGSCs. Dual inhibition of pGSC-enriched and 24 mGSC-enriched growth and survival pathways provides a more complete treatment than combinations 25 targeting one GSC phenotype alone. This study sheds light on a long-standing debate regarding 26 lineage relationships among GSCs and presents a paradigm by which personalized combination 27 therapies can be derived from single-cell RNA signatures, to overcome intra-tumor heterogeneity. 28 Significance 29 Research. on May 29, 2020. © 2019 American Association for Cancer cancerdiscovery.aacrjournals.org Downloaded from Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

Upload: others

Post on 27-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

The phenotypes of proliferating glioblastoma cells reside on a single axis of variation. 1

Running Title: A draft single-cell atlas of human glioma 2

Lin Wang1,3,*, Husam Babikir1,3,*, Sören Müller1,3,*, Garima Yagnik1,3, Karin Shamardani1,3, Francisca 3

Catalan1,3, Gary Kohanbash4, Beatriz Alvarado1,2,3, Elizabeth Di Lullo2,3, Arnold Kriegstein2,3, Sumedh 4

Shah1,3, Harsh Wadhwa1,3, Susan M. Chang1,3, Joanna J. Phillips1,3, Manish K. Aghi1,3, Aaron A. 5

Diaz1,3,† 6

1 Department of Neurological Surgery. 7

2 Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research. 8

3University of California, San Francisco, San Francisco, CA 94158. 9

4Department of Neurological Surgery, University of Pittsburgh, Pittsburgh, PA 15261. 10

*Equal contribution 11

†Correspondence: Aaron Diaz, 1450 3rd Street, San Francisco, CA, 94158, [email protected], (415) 12

514-0408 13

The authors certify that they have no affiliations with or involvement in any organization or entity with 14

any financial interest or non-financial interest in the subject matter or materials discussed in this 15

manuscript. 16

Abstract 17

Although tumor-propagating cells can be derived from glioblastomas (GBMs) of the proneural and 18

mesenchymal subtypes, a glioma stem-like cell (GSC) of the classical subtype has not been identified. 19

It is unclear if mesenchymal GSCs (mGSCs) and/or proneural GSCs (pGSCs) alone are sufficient to 20

generate the heterogeneity observed in GBM. We performed single-cell/nuclei RNA-sequencing of 28 21

gliomas, and single-cell ATAC-sequencing for 8 cases. We find that GBM GSCs reside on a single axis 22

of variation, ranging from proneural to mesenchymal. In silico lineage tracing using both transcriptomics 23

and genetics supports mGSCs as the progenitors of pGSCs. Dual inhibition of pGSC-enriched and 24

mGSC-enriched growth and survival pathways provides a more complete treatment than combinations 25

targeting one GSC phenotype alone. This study sheds light on a long-standing debate regarding 26

lineage relationships among GSCs and presents a paradigm by which personalized combination 27

therapies can be derived from single-cell RNA signatures, to overcome intra-tumor heterogeneity. 28

Significance 29

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

Tumor-propagating cells can be derived from mesenchymal and proneural glioblastomas. 30

However, a stem cell of the classical subtype has yet to be demonstrated. We show that classical-31

subtype gliomas are comprised of proneural and mesenchymal cells. This study sheds light on a long-32

standing debate regarding lineage relationships between glioma cell-types. 33

Introduction 34

Malignant gliomas are the most common primary tumors of the adult brain and are essentially 35

incurable. Glioma genetics have been studied extensively and yet targeted therapeutics have produced 36

limited results. Glioblastomas (GBMs) have been classified into subtypes based on gene expression 37

(e.g. 1). However, we and others have shown that GBMs contain heterogeneous mixtures of cells from 38

distinct transcriptomic subtypes (2, 3). This intratumor heterogeneity is at least partially to blame for the 39

failures of targeted therapies. 40

Tumor-propagating cells expressing markers of the mesenchymal and proneural transcriptomic 41

subtypes can be derived from GBMs (e.g. 4). However, no GSC of the classical subtype has been 42

convincingly identified. The lineage relationship between proneural GSCs (pGSC) and mesenchymal 43

GSCs (mGSCs) is unknown. Surprisingly little is known about the cellular progeny of GSCs in vivo. It is 44

unclear if proneural and/or mesenchymal GSCs are sufficient to generate the heterogeneity observed in 45

GBM. 46

We performed single-cell RNA sequencing (scRNA-seq), single-nuclei RNA sequencing 47

(snRNA-seq), single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), 48

and whole-exome DNA sequencing (exome-seq) of specimens from untreated human gliomas. Via in-49

silico transcriptomic and genetic lineage tracing of these data we defined the lineage relationships 50

between glioma cell types. Integrating this with meta-analysis of sequencing data from The Cancer 51

Genome Atlas (TCGA) (https://cancergenome.nih.gov/) and spatial data from The Ivy Foundation 52

Glioblastoma Atlas Project (Ivy GAP) (http://glioblastoma.alleninstitute.org/), we mapped glioma cells to 53

analogous cell types in the developing brain and to specific tumor-anatomical structures. From the 54

scATAC-seq we elucidated cell-type specific cis-regulatory grammars and associated transcription 55

factors. Using immunohistochemistry (IHC) and automated image analysis of human GBM microarrays 56

we validated learned phenotypes at the protein level. Lastly, we performed an in vitro screen of drug 57

combinations that target genes identified from our single-cell analysis of GSCs. 58

We show that proliferating IDH-wildtype GBM cells can be described by a single axis of gene 59

signature, which ranges from proneural to mesenchymal. At the extremes of this axis reside stem-like 60

cells which express canonical markers of mGSCs and pGSCs. We identify a mGSC signature that 61

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

correlates with significantly inferior survival in IDH-wildtype GBM. Our analysis shows that mGSCs, 62

pGSCs, their progeny and stromal/immune cells are sufficient to explain the heterogeneity observed in 63

GBM. We show that combinatorial therapies that address intra-tumor heterogeneity can be designed 64

based on single-cell RNA signatures. 65

Results 66

Single-cell mRNA and bulk DNA profiling of untreated human gliomas 67

We applied scRNA-seq or snRNA-seq to biopsies from 22 primary untreated human IDH-68

wildtype GBMs and 6 primary untreated IDH-mutant gliomas (Table S1). Our goal was to profile both 69

for cellular coverage (to survey cellular phenotypes) and for transcript coverage (to compare genetics). 70

Therefore, we performed sc/snRNA-seq for 19 samples via the 10X Genomics Chromium platform 71

(10X) and obtained 3’-sequencing data for 31,281 cells after quality control. ScRNA-seq for 3 cases 72

was done using the Fluidigm C1 platform (C1), which yielded full-transcript coverage for 291 cells. We 73

incorporated 4 more published cases from our C1 pipeline, adding 384 cells (3). For 6 of the 10X cases 74

and 5 of the C1 cases, the biopsies were minced, split and both scRNA-seq and exome sequencing 75

(exome-seq) were performed (Table S1). 76

We applied our pipeline for sc/snRNA-seq pre-processing (5), quantification of expressed 77

mutations (3, 6), and cell-type identification (7, 8). We identified 10,816 tumor-infiltrating stromal and 78

immune cells based on expressed mutations, clustering and canonical marker genes (Fig. 1 A and Fig. 79

S1A). We term the remaining 20,465 cells neoplastic, as they express clonal malignant mutations. Only 80

neoplastic cells were used for all subsequent analyses. 81

The transcriptional phenotypes of proliferating IDH-wildtype GBM neoplastic cells can be explained 82

by a single axis that varies from proneural to mesenchymal 83

Unbiased principal component analyses (PCAs) of the IDH-wild-type GBM cases revealed two 84

patient-independent clusters of neoplastic cells, consistent across all 3 platforms (Fig. 1B-I; Table S2). 85

A differential-expression test between clusters identified markers of the proneural (e.g. ASCL1, OLIG2) 86

and mesenchymal (e.g. CD44, CHI3L1) subtypes as significant (Table S3). Mesenchymal cells 87

significantly over-express markers of response to hypoxia (e.g. HIF1A) and cytokines that promote 88

myeloid-cell chemotaxis (e.g. CSF1, CCL2, CXCL2). However, mesenchymal cells do not express high 89

levels of MKI67 or other markers of cell-cycle progression. Conversely, proneural cells express high 90

levels of MKI67 as well as cyclin-dependent kinase. We estimated the fraction of actively cycling cells 91

using the Seurat package (9). By this metric, 21%-30% of proneural cells are cycling compared to 92

0.3%-10% of mesenchymal cells (Fig. 1D,G). Importantly, all our clinical specimens contain cells of 93

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

both phenotypes: proliferating proneural cells and mesenchymal cells with a quiescent, cytokine-94

secretory phenotype (Fig. S1B). 95

In all PCAs, cells at the left and right extremes of principal component 1 (PC1) express high 96

levels pGSC or mGSC markers (4) respectively, as indicated by PC1 gene loadings (Fig. 1B,E). We 97

therefore interpreted the top-loading genes from each direction as representing pGSC and mGSC gene 98

signatures. We used those signatures to score all cells for stemness, controlling for technical variation 99

as previously described (9, 10). In all PCAs, principal component 2 (PC2) is loaded by markers of cell-100

cycle progression (e.g. MKI67). Thus, while stemness correlates with cell cycle for both proneural and 101

mesenchymal cells, more pGSCs than mGSCs express markers of cell-cycle progression, and at 102

higher levels. 103

IDH-mutant gliomas are composed of cycling stem-like cells, noncycling astrocyte- and 104

oligodendrocyte-like cells 105

A PCA analysis of the IDH-mutant gliomas identified 3 patient-independent populations (Fig. 106

S1C-F). Two of the clusters fall on opposite ends of PC1. They differentially express markers (e.g. 107

OLIG2, GFAP) of astrocyte-like (IDH-A) and oligodendrocyte-like (IDH-O) cell types, as has been 108

previously described in scRNA-seq of IDH-mutant glioma (11). PC2 is positively loaded by cell cycle 109

genes, such as MKI67. A third cluster of cells has high PC2 sample scores and specifically expresses 110

markers of IDH-mutant GSCs (10, 11), as well as genes associated with cancer stem-cells not 111

previously described in IDH-mutant glioma (Table S2). We term this cluster IDH-S following the 112

convention of Venteicher et al. (11). Most cells in IDH-S have low PC1 sample scores, and do not 113

express IDH-A or IDH-O genes. Almost all cycling cells are found in IDH-S (Fig. S1G). 114

IDH-wild-type GBM cells are stratified by differentiation gradients observed in gliogenesis 115

In our differential expression test we observed that proneural cells specifically express markers 116

of the oligodendrocyte lineage (e.g. OLIG2, SOX10), while mesenchymal cells instead express markers 117

of astrocytes (e.g. GFAP, AQP4). We evaluated the mGSC and pGSC gene signatures in scRNA-seq 118

of glia from fetal and adult human brain (12, 13). We found that pGSC-signature genes are enriched in 119

oligodendrocyte progenitor cells. The mGSC signature, however, is comprised of genes expressed by 120

neural stem cells as well as markers of astrocytes (Fig. 2A). 121

In addition to mGSCs and pGSCs, we find neoplastic cells (possessing clonal malignant 122

mutations) that do not express stemness or cell-cycle signatures above background. Instead they 123

express either markers of differentiated astrocytes (e.g. ALDOC) or differentiated oligodendrocytes 124

(e.g. MAG, MOG). While the GSCs express high levels of positive WNT-pathway regulators, these 125

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

more differentiated cells express high levels of WNT-pathway agonists (Fig. S2A). Thus, GBMs contain 126

mesenchymal and proneural populations that align with astrocyte and oligodendrocyte differentiation 127

gradients respectively. 128

PGSCs, mGSCs, their differentiated progeny and stromal/immune cells explain the phenotypic 129

heterogeneity observed in GBM 130

We found that our mGSC and pGSC gene signatures are co-expressed across TCGA datasets 131

(Fig. 2B, S2B). While mGSC and pGSC signature genes are correlated among themselves, the mGSC 132

and pGSC signatures are anti-correlated with each other. Moreover, the cell-cycle signature more 133

strongly correlates in TCGA data with the pGSC signature than the mGSC signature, consistent with 134

our scRNA-seq data (Fig. S2C). 135

Using our scRNA-seq data and published scRNA-seq from human brain tissue (12, 13) as a 136

basis, we pooled reads across cells of the same type. This yielded data-driven profiles for mGSCs, 137

pGSCs, astrocytes, oligodendrocytes, neurons, endothelial cells, myeloid cells and T-cells. We then 138

used these cell-type signatures as predictors in a linear regression model. We fit our model to each 139

TCGA GBM RNA-sequencing dataset individually. A heatmap of cell-type specific genes (Fig. 2C) 140

agreed with our regression analysis (Fig. 2D). We found that samples of both the mesenchymal and 141

classical Verhaak subtypes are enriched for mGSCs and depleted of pGSCs. While classical samples 142

are distinguished by higher infiltration of astrocytes, mesenchymal samples contain high levels of 143

infiltrating immune cells. Proneural samples are characterized by the highest levels of pGSCs, 144

oligodendrocytes and neurons. In summary, the full spectrum of heterogeneity observed in TCGA GBM 145

data can be explained by varying proportions of pGSCs and mGSCs, their differentiated progeny and 146

infiltrating stromal/immune content. Cox-regression analysis identifies the mGSC signature as 147

correlating with significantly inferior survival in TCGA data. However, pGSC content is not prognostic 148

(Fig. 2E). 149

GSC niche localization and microenvironment interactions elucidated using reference atlases and 150

quantitative IHC 151

We compared our mGSC, pGSC and cell-cycle signatures to RNA sequencing from the Ivy GAP 152

(Fig. 2F-G). The Ivy GAP has annotated, micro-dissected and RNA sequenced GBM anatomical 153

structures from human specimens. We found that mGSCs are enriched in hypoxic regions. PGSCs are 154

enriched in the tumor’s leading edge and in regions of diffuse infiltration of tumor-adjacent white matter. 155

To visualize and quantify the associations between GSCs, cell cycle and hypoxia, we performed 156

IHC for CD44 (mGSCs), DLL3 (pGSCs), CA9 (hypoxia) and Ki67 (cell cycle) on GBM microarrays (Fig. 157

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

2H; Table S4), and non-malignant controls (Fig. S2D). We did not find any cycling CD44+ cells in our 158

samples, while approximately 5% of DLL3+ cells expressed Ki67. On the other hand, CD44+ cells 159

colocalized with CA9 at 2-fold greater frequency than DLL3+ cells. This dovetails with our findings in 160

scRNA-seq, TCGA and GAP data, which show that pGSCs are more proliferative while mGSCs are 161

enriched in hypoxic regions. 162

Genetic and transcriptomic in-silico lineage tracing of GBM GSCs supports a mesenchymal to 163

proneural hierarchy 164

We identified all cycling cells from our IDH-wildtype GBM datasets (Methods). These cells were 165

then used to perform RNA velocity analysis via velocyto (14). This approach compares the fractions of 166

spliced to un-spliced transcripts per gene in order to estimate the time derivative of gene abundance in 167

single cells. When projected back onto a PCA axis (Fig. 3A), we found that RNA velocity supports 168

stable mGSC and pGSC populations as well as an intermediate population that gradually transitions 169

from a mesenchymal to proneural phenotype. For example, in this intermediate population mGSC 170

markers (e.g. CD44, CHI3L1) are highly expressed but have a negative time derivative, conversely 171

pGSC markers (e.g. DLL3, PDGFRA) are lowly expressed but with a high, positive velocity. 172

Mitochondrial expressed mutations have recently been identified as a reliable way to perform 173

genetic lineage tracing due to the robust coverage of mitochondrial genes, even in scRNA-seq (15). 174

Following this approach, for each sample we pooled cells and identified mitochondrial single-nucleotide 175

variants (SNVs) (Methods). For each cell we then annotated the percentage of its sample’s 176

mitochondrial mutations it harbored. We found that the percentage of expressed mitochondrial 177

mutations correlated with progression from the mesenchymal phenotype (Fig. 3B-D). Thus, both 178

genetic and transcriptomic lineage tracing supports a mesenchymal to proneural hierarchy. 179

We also compared chromosomal SNVs and CNVs in our scRNA-seq that were validated by 180

exome-seq data via our previously described approach (3, 6). We restricted ourselves to mutations that 181

occurred at a minimum of 10% variant allele frequency in exome-seq and identified cells in our scRNA-182

seq which expressed these mutations. For all patients, we found that all expressed, validated mutations 183

are present in the GSCs (Fig. S3A). This shows that the GSCs are sufficient to explain the genetic 184

heterogeneity observed in our tumor specimens. 185

IDH-mutant and -wildtype GSCs share a core signature which is prognostic 186

RNA-velocity analysis on IDH-mutant glioma datasets revealed a hierarchy with a single root 187

focused on the IDH-S population and two terminal points corresponding to differentiated IDH-A and 188

IDH-O populations (Fig. S3B-C). We compared gene loadings of PC2 between the original PCAs of our 189

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

IDH-wildtype and -mutant gliomas. This identified a core signature of stemness enriched in the cycling 190

cells of both diseases (Fig. S3D). This signature is prognostic in a combined cohort of IDH-mutant 191

grade II/III oligodendrogliomas and astrocytomas (Fig. S3E). An analysis of expressed mitochondrial 192

mutations showed that mitochondrial mutational load correlated with progression along the IDH-O and 193

IDH-A lineages (Fig. S3F-G), further supporting a cellular hierarchy with IDH-S at the apex. 194

scATAC-seq of human gliomas identifies cell-type specific regulatory grammars and supports a 195

single-axis hypothesis 196

We performed scATAC-seq on 4 IDH-wildtype GBMs, 2 IDH-mutant grade-II astrocytomas, and 197

2 IDH-mutant oligodendrogliomas. We considered the IDH-wildtype and IDH-mutant samples 198

separately. IDH-wildtype GBM scATAC-seq cells were triaged according to the presence of observed 199

genomic alterations (Fig. 4A; Methods). For each cell and each gene we computed a gene-body activity 200

score (i.e. the gene-wise sum scATAC-seq read-counts) using Seurat v3. We then clustered gene-body 201

activity scores separately for neoplastic and non-neoplastic cells using Seurat. The gene-body activities 202

of non-neoplastic cells readily separated by canonical markers of myeloid, glial and endothelial cells 203

(Fig. 4B). 204

When we clustered the gene activity scores of neoplastic cells we found 3 clusters of cells. We 205

then performed differential motif enrichment tests between these groups of cells to identify enriched 206

regulatory motifs and associated transcription factors (Fig. 4C-G; Table S5; Methods). 207

In the first cluster, the proneural bHLH genes OLIG2, ASCL1, NEUROG2 were enriched, both 208

their gene-activity scores and representative-motif frequencies were significant. The second cluster 209

showed enrichment for mesenchymal makers (e.g. CD44, STAT3). When we compared the third cluster 210

to the mesenchymal cluster, proneural markers were differentially enriched (Fig. 4E). Likewise, when 211

we compared the third cluster to the proneural cluster we saw differential enrichment of mesenchymal 212

markers (Fig. 4F). Moreover, the most cluster-specific gene activities identified for this third cluster 213

showed partial overlap with both the mesenchymal and proneural clusters, e.g. CD44, CDH9 (Fig. 4C, 214

Table S4). We plotted the expression of the cluster-specific genes from the third cluster on top of a 215

PCA of cycling cells. We found those genes expressed a population which was intermediate to the 216

mesenchymal and proneural populations in our RNA-velocity and mitochondrial-mutation analysis (e.g. 217

MET, Fig. S4A). Thus, we interpret this third cluster as an intermediate population between 218

mesenchymal and proneural clusters. 219

IDH-mutant glioma cells were likewise separated by the presence of clonal mutations into 220

neoplastic cells and non-malignant glial/immune cells (Fig. S4B-C). Clustering by gene-activity scores 221

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

identified 3 clusters corresponding to the IDH-S, -O and -A cell types (Fig. S4D-G). Motif enrichment in 222

each cluster was assessed to determine cell-type specific regulatory grammars (Fig. S4H). 223

A drug-combination screen based on single-cell signatures 224

We identified FDA-approved drugs which target genes found to be specific to pGSCs and 225

mGSCs in our single-cell analysis (Fig. 5A, S5A). Drugs were screened in combination to assess 226

synergy in vitro (Methods). Only combinations that targeted both the pGSC and mGSC phenotypes 227

were found to by synergistic (Fig. 5B, Table S6). In particular, inhibition of EGFR and FGFR3 growth-228

factor receptors did not synergize with WNT inhibition (all mGSC-specific targets), but targeting ether 229

EGFR, FGFR3 or WNT synergized with inhibition of pGSC-specific Survivan. 230

Discussion 231

It is known that an individual tumor may contain multiple GSC clones (e.g. 16). We applied 232

exome-seq, sc/snRNA-seq and scATAC-seq to human tumor-specimens and found that all GBMs 233

contain hierarchies of mesenchymal and proneural GSCs and their more differentiated progeny. We 234

also observe these same cellular hierarchies in scRNA-seq profiling of recurrent GBM following 235

treatment (Fig. S5B). However, our study was limited to the analysis of primary tumors. Additional 236

studies will be required to determine if standard therapy exerts a selection pressure on this hierarchy. 237

While our manuscript was in revisions Neftel et al. published a model of GBM heterogeneity 238

based on scRNA-seq analysis of adult and pediatric human specimens (17). Neftel et al. describe a 239

model with 4 cellular states: astrocyte and mesenchymal cell types enriched in Verhaak classical and 240

mesenchymal TCGA samples and NPC- and OPC-like cell-types enriched in proneural TCGA samples. 241

Their data corroborate a mesenchymal to proneural axis, and anti-correlation of mesenchymal (e.g. 242

CD44) and proneural (e.g. DLL3) genes in neoplastic cells (c.f. Neftel et al. Fig. 2C). Both studies 243

contribute to recent advances which use transcriptomics to identify clinically relevant phenotypes (e.g. 244

18, 19). 245

Murine GBMs can be separated into two cell populations that have different capacities for 246

tumorigenicity and self-renewal (20). The Id1high/Olig2- and Id1low/Olig2+ populations found in the model 247

of Barrett et al. (20) match the expression signatures of mGSCs and pGSCs respectively. Moreover, 248

the finding of Barrett et al. (20) that Id1high cells initiate tumors containing both Id1high and Olig2+ cells, 249

while tumors from Id1low/Olig2- cells do not generate Id1high progeny, is consistent with our 250

transcriptomic and genetic lineage tracing in human specimens. More recently, Narayanan et al. 251

showed that ASCL1 is a master-regulator of the proneural phenotype of GBM, and that ASCL1 directly 252

represses mesenchymal-phenotype genes such as CD44 and GFAP (21). This is consistent with our 253

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

data, in particular our scATAC-seq analysis shows anti-correlation of CD44 and other mesenchymal 254

genes with ASCL1 gene-body activity and motif enrichment (Fig. 4C, G). Knowledge of GBM cell types, 255

their lineage relationships and functional differences is needed to develop combination therapies that 256

address intratumor heterogeneity. 257

Material and Methods 258

Tumor tissue acquisition 259

We acquired fresh tumor tissue and peripheral blood from patients undergoing surgical 260

resection for glioma at UCSF. De-identified samples were provided by the UCSF Neurosurgery Tissue 261

Bank. Sample use was approved by the Institutional Review Board at UCSF. The experiments 262

performed here conform to the principles set out in the WMA Declaration of Helsinki and the 263

Department of Health and Human Services Belmont Report. All patients provided informed written 264

consent. 265

Tissue processing for scRNA-seq 266

Fresh tissues were minced in collection media (Leibovitz’s L-15 medium, 4 mg/mL glucose, 100 267

u/mL Penicillin, 100 ug/mL Streptomycin) with a scalpel. Samples dissociation was carried out in a 268

mixture of papain (Worthington Biochem. Corp) and 2000 units/mL of DNase I freshly diluted in EBSS 269

and incubated at 37 °C for 30 min. After centrifugation (5 min at 300 g), the suspension was 270

resuspended in PBS. Subsequently, suspensions were triturated by pipetting up and down ten times 271

and then passed through a 70-μm strainer cap (BD Falcon). Last, centrifugation was performed for 5 272

min at 300 g. After resuspension in PBS, pellets were passed through a 40-μm strainer cap (BD 273

Falcon), followed by centrifugation for 5 min at 300 g. The dissociated, single cells were then 274

resuspended in GNS (Neurocult NS-A (Stem Cell Tech.), 2 mM L-Glutamine, 100 U/mL Penicillin, 100 275

ug/mL Streptomycin, N2/B27 supplement (Invitrogen), sodium pyruvate). Nuclei were extracted from 276

frozen tissues following the “Frakenstein” protocol developed by Luciano Martelotto, Ph.D., Melbourne, 277

Centre for Cancer Research, Victorian Comprehensive Cancer Centre, and available from 10X 278

Genomics (https://community.10xgenomics.com/t5/Customer-Developed-Protocols/ct-p/customer-279

protocols). 280

Fluidigm C1-based scRNA-seq 281

Fluidigm C1 Single-Cell Integrated Fluidic Circuit (IFC) and SMARTer Ultra Low RNA Kit were 282

used for single-cell capture and complementary DNA (cDNA) generation. cDNA quantification was 283

performed using Agilent High Sensitivity DNA Kits and diluted to 0.15–0.30 ng/μL. The Nextera XT 284

DNA Library Prep Kit (Illumina) was used for dual indexing and amplification with the Fluidigm C1 285

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

protocol. CDNA was purification and size selection were carried out twice using 0.9X volume of 286

Agencourt AMPure XP beads (Beckman Coulter). 287

10X genomics-based scRNA-seq/snRNA-Seq 288

For fresh tissues, 10.2 μL of live cells, at a concentration of 1700 live-cells/μL, were loaded into 289

the 10X Chromium Single Cell capture chip. For frozen tissues, ~15,000 nuclei were loaded per 290

capture. Single-cell/nucleus capture, reverse transcription, cell lysis, and library preparation were 291

performed per manufacturer’s protocol. Sequencing was performed on an Illumina NovaSeq using a 292

paired-end 100bp protocol. 293

Public data acquisition 294

Normalized counts from TCGA RNA-seq data were obtained from the Genomics Data 295

Commons portal (https://gdc.cancer.gov/). Patients diagnosed as GBM and wild-type IDH1 expression 296

(n = 144) were normalized to log2(CPM + 1) and used for analysis. TCGA glioma microarray and 297

associated survival data were obtained from the Gliovis portal (22). Z-score normalized counts from 298

regional RNA-seq of 122 samples from ten patients were obtained via the web interface of the Ivy GAP 299

(http://glioblastoma.alleninstitute.org/) database. RNA-seq of non-malignant human brain tissues were 300

obtained from the GTEx portal (https://www.gtexportal.org/home/datasets). 301

Exome-sequencing and genomic mutation identification 302

The NimbleGen SeqCap EZ Human Exome Kit v3.0 (Roche) was used for exome capture on a 303

tumor sample and a blood control sample from each patient. Samples were sequenced with an 304

Illumina-HiSeq 2500 machine (100-bp paired-end reads). Reads were mapped to the human grch37 305

genome with BWA (23) and only uniquely matched paired reads were used for analysis. PicardTools 306

(http://broadinstitute.github.io/picard/) and the GATK toolkit (24) carried out quality score re-calibration, 307

duplicate-removal, and re-alignment around indels. Large-scale (>100 Exons) somatic copy number 308

variants (CNVs) were inferred with ADTex (25). To increase CNV size, proximal (< 1 Mbp) CNVs were 309

merged. Somatic SNVs were inferred with MuTect (https://www.broadinstitute.org/cancer/cga/mutect) 310

for each tumor/control pair and annotated with the Annovar software package (26). 311

Sc/snRNA-seq data pre-processing 312

Data processing of the C1 data was performed as described previously (3). Briefly, reads were 313

quality trimmed and TrimGalore! (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was 314

used to clip Nextera adapters. HISAT2 (27) was used to perform alignments to the grCh38 human 315

genome. Gene expression was quantified using the ENSEMBL reference with featureCounts (28). Only 316

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

correctly paired, uniquely mapped reads were kept. In each cell, expression values were scaled to 317

counts per million (CPM)/100+1 and log transformed. Low-quality cells were filtered by thresholding 318

number of genes detected at 1000 and at least 100,000 uniquely aligned reads. For 10x scRNA-Seq 319

and snRNA-Seq data we utilized CellRanger (version 3.0.2) for data pre-processing and gene 320

expression quantification, following the guidelines from the CellRanger website 321

(https://support.10xgenomics.com/single-cell-gene-322

expression/software/pipelines/latest/advanced/references#premrna). 323

Classification of somatic mutations 324

The presence/absence of somatic CNVs in 10x scRNA-Seq and snRNA-Seq data was 325

assessed with CONICSmat (6). We retained CNVs with a CONICSmat likelihood-ratio test <0.001 and 326

a difference in Bayesian Criterion >300. For each CNV we used a cutoff of posterior probability >0.5 in 327

the CONICSmat mixture model to infer the presence/absence of that CNV in a given cell. For the 328

Fluidigm C1 data we utilized our previous CNV and SNV classifications (3, 6). We retained only SNVs 329

detected in exome-seq at >10% variant frequency in the tumor and <10% variant frequency in patient‐330

matched normal blood. Cells expressing those tumor-restricted variant alleles were considered positive 331

for the respective SNVs. For the presence/absence of somatic CNVs in 10x snATAC-Seq data was 332

estimated with CONICSmat (version 1.0) (6). Here, the gene body activity generated by snapATAC 333

(29) was used to perform the CNVs analysis with with a CONICSmat likelihood-ratio test <0.001 and a 334

difference in Bayesian Criterion >300. SNV calls in 10X sc/snRNA-seq data were obtained by pooling 335

reads by patient and running the GATK RNA-seq best-practices pipeline 336

(https://software.broadinstitute.org/gatk/best-practices/workflow?id=11164). Variant assignments in 337

single cells were then assessed via VarTrix (version 1.0) tool (https://github.com/10xgenomics/vartrix). 338

Dimensionality reduction and calculation of stemness scores 339

First, we filtered cells that have >5% mitochondrial read counts. A gene with non-zero read 340

counts in more than 10 cells was considered as expressed, and each cell was required to have at least 341

200 expressed genes. Seurat v3 was used for tSNE plots based on the first 10 PCs (9). PCA was done 342

using R 3.4.2. The top 1000 genes with the highest regularized variances were identified via Seurat v3 343

for each case. PCA was then performed using those genes that were among the 1000 most variable 344

genes for at least three patients. We defined the mGSC and pGSC gene sets as the top 15% of genes 345

most strongly loading PC1, positively for mGSCs and negatively pGSCs. Stemness scores were 346

calculated using these gene sets as input to the AddModuleScore function from the Seurat package. 347

Cluster-specific genes were identified via the FindMarkers/FindAllMarkers function from Seurat 348

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

package, using a likelihood-ratio test. Only genes enriched and expressed at least in 25% of the cells of 349

at least one population and with a log fold-change bigger that 0.25 were considered. 350

Deconvolution of TCGA RNA-sequencing data via linear models 351

To deconvolve GBM RNA-sequencing data from TCGA according to cell types learned from 352

scRNA-seq, we first pooled scRNA-seq read-counts by cell type across mGSCs, pGSCs, non-353

malignant oligodendrocytes, astrocytes, neurons, TAMs, T-cells, and endothelial cells. The data used 354

for this were our GBM scRNA-seq, as well as scRNA-seq of human fetal and adult non-malignant brain 355

tissues (12, 13). The resulting 8 count vectors were independently normalized to log2(CPM/10+1). We 356

then fit a linear model to each TCGA RNA-sequencing dataset (also scaled to log2(CPM/10+1) using 357

these vectors as predictors. We assessed the relative contribution of each predictor to the overall 358

variance explained using the Lindeman, Merenda and Gold (lmg) method as implemented in the 359

relaimpo R package (30). 360

Lineage reconstruction via RNA velocity 361

RNA velocities were computed via velocyto (14). For GBM cases cells with PC2 sample-score > 362

0 were deemed cycling cells and used for velocyto analysis under default parameters. Root and 363

terminal points in lineage reconstruction were identified with the functions prepare_markov and 364

run_markov included in velocyto. These use backward and forward Markov processes on the transition 365

probability matrix of the cells to determine high density regions for the start and end points of 366

trajectories, respectively. 367

SnATAC-Seq data processing 368

The CellRanger ATAC software (version 1.1.0) was used for read alignment, deduplication, and 369

identifying transposase cut sites (https://support.10xgenomics.com/single-cell-370

atac/software/pipelines/latest/algorithms/overview). The output matrix of CellRanger was further 371

processed by using the snapATAC package (https://github.com/r3fang/SnapATAC) (29). We selected 372

the highest quality barcodes for each case based on two criteria: 1) number of filtered fragments at 373

least 1000; 2) fragments in promoter ratio (FRiP) at least 0.2 for the case. Clustering was performed 374

using Seurat v3 SNN graph clustering “FindClusters”, with the gene body-accessibility scores 375

generated by the snapATAC package as input. Differentially accessible regions, peaks, and motif 376

enrichments were computed using snapATAC “findDAR”, “runMACSForAll” and “runHomer”. 377

Immunohistochemistry 378

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

Immunohistochemistry (IHC) Optimization was performed on a Leica Bond automated 379

immunostainer using conditions optimized for each antibody (Table S4). Heat-induced antigen retrieval 380

was performed using Leica Bond Epitope Retrieval Buffer 1 (Citrate Buffer, pH 6.0) and Leica Bond 381

Epitope Retrieval Buffer 2 (EDTA solution, pH 9.0) for 20 minutes (ER2(20)). Non-specific antibody 382

binding was blocked using 5% milk in PBST or Novocastra Protein Block (Novolink #RE7158). Positivity 383

was detected using Novocastra Bond Refine Polymer Detection and visualized with 3’3 384

diaminobenzidine (DAB; brown) and alkaline phosphatase (AP; red). A Hematoxylin nuclear 385

counterstain (blue) was applied. Duplex controls were performed to provide a reference of specificity of 386

the selected primary antibody and secondary detection system. For the DLL3 and CA9 duplex, 2 sets of 387

single-stained control tissues were used since the antigens of interest are not commonly co-expressed 388

in normal tissue types. Image analysis of whole slide images (8 slides from GBM patients, 5 control 389

slides) was performed using the Aperio software (Leica) and the ImageDx slide-management pipeline 390

(Reveal Bio). All tissue and staining artifacts were digitally excluded from the reported quantification. 391

Drug screen 392

The U87MG cell line was obtained from American Type Culture Collection and cultured in 393

DMEM, supplemented with 10%FBS and 1% penicillin-streptomycin. Cells were seeded in 96-well 394

plates at a density of 5,000 cells/well with DMEM and held overnight at 37 °C, 5% CO2. The media was 395

then aspirated, and test compounds administered through serial dilution in DMEM. After 48h, the cells 396

were washed with PBS. DMEM containing AlamarBlue (Invitrogen) and supplemented with FBS, 397

penicillin and streptomycin was added, and the plate was then incubated for 4 hours. Cell viability was 398

assessed via absorbance using a microplate reader and compared to untreated control wells. 399

Declarations 400

Ethics approval and consent to participate 401

Study protocols were approved by the UCSF Institutional Review Board. All clinical samples 402

were analyzed in a de-identified fashion. All experiments were carried out in conformity to the principles 403

set out in the WMA Declaration of Helsinki as well as the Department of Health and Human 404

Services Belmont Report. Informed written consent was provided by all patients. 405

Availability of data and material 406

The study data are available from the European Genome-phenome Archive repository, under 407

EGAS00001002185, EGAS00001001900 and EGAS00001003845. Third-party data that were used in 408

the study are available from the GlioVis portal (http://gliovis.bioinfo.cnio.es/), the gbmseq portal 409

(http://gbmseq.org/), and The Glioblastoma Atlas Project (http://glioblastoma.alleninstitute.org/). 410

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

Competing interests 411

None declared. 412

Funding 413

This work has been supported by research awards from the UCSF Glioma Precision Medicine 414

Program to AD, JP, and SC; the UC Cancer Research Coordinating Committee (CRN-19-586041) to 415

AD. 416

Author contributions 417

A.D. conceived of and designed the study. G.K and G.Y performed the 10X Genomics single-418

cell sequencing under the supervision of A.D. S.C, J.P. and A.D. identified cases for inclusion in the 419

study. J.P. screened archival tissue specimens for suitability of use. B.A. and E.D. performed the 420

Fluidigm C1 single-cell sequencing under the supervision of A.D. and A.K. H.B. and K. S. performed 421

the single-nuclei RNA and ATAC sequencing under the supervision of AD. H.B., S.S., and H.W. 422

performed the drug screen under the supervision of AD. M.A. provided biopsies via the UCSF 423

Neurosurgery Tissue Core and contributed to the analysis of the data. S.M. and A.D. first formulated 424

the single-axis hypothesis based on the bioinformatics analysis of S.M. L.W. and F.C. performed the 425

bioinformatics analysis of the snRNA-seq, scATAC-seq, genetic and transcriptomic lineage tracing 426

under the supervision of A.D. A.D., H.B., S.M., and L.W. wrote the manuscript. All authors read and 427

approved the final manuscript. 428

Bibliography 429

1. R. G. W. Verhaak et al., Integrated genomic analysis identifies clinically relevant subtypes of 430

glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell. 431

17, 98–110 (2010). 432

2. A. P. Patel et al., Single-cell RNA-seq highlights intratumoral heterogeneity in primary 433

glioblastoma. Science. 344, 1396–401 (2014). 434

3. S. Müller et al., Single-cell sequencing maps gene expression to mutational phylogenies in 435

PDGF and EGF driven gliomas. Mol. Syst. Biol. 12, 889 (2016). 436

4. X. Jin et al., Targeting glioma stem cells through combined BMI1 and EZH2 inhibition. Nat. Med. 437

23, 1352–1361 (2017). 438

5. A. Diaz et al., SCell: Integrated analysis of single-cell RNA-seq data. Bioinformatics. 32 (2016), 439

doi:10.1093/bioinformatics/btw201. 440

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

6. S. Müller, A. Cho, S. J. Liu, D. A. Lim, A. Diaz, CONICS integrates scRNA-seq with DNA 441

sequencing to map gene expression to tumor sub-clones. Bioinformatics, 10–12 (2018). 442

7. T. E. Bartlett, S. Müller, A. Diaz, Single-cell Co-expression Subnetwork Analysis. Sci. Rep. 7 443

(2017), doi:10.1038/s41598-017-15525-z. 444

8. S. Müller et al., Single-cell profiling of human gliomas reveals macrophage ontogeny as a basis 445

for regional differences in macrophage activation in the tumor microenvironment. Genome Biol. 446

18 (2017), doi:10.1186/s13059-017-1362-4. 447

9. A. Butler, P. Hoffman, P. Smibert, E. Papalexi, R. Satija, Integrating single-cell transcriptomic 448

data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018). 449

10. I. Tirosh et al., Single-cell RNA-seq supports a developmental hierarchy in human 450

oligodendroglioma. Nature. 539, 309–313 (2016). 451

11. A. S. Venteicher et al., Decoupling genetics, lineages, and microenvironment in IDH-mutant 452

gliomas by single-cell RNA-seq. Science. 355, 1391–1402 (2017). 453

12. S. Darmanis et al., A survey of human brain transcriptome diversity at the single cell level. Proc. 454

Natl. Acad. Sci. 112, 7285–7290 (2015). 455

13. T. J. Nowakowski et al., Spatiotemporal gene expression trajectories reveal developmental 456

hierarchies of the human cortex. Science. 358, 1318–1323 (2017). 457

14. G. La Manno et al., RNA velocity of single cells. Nature. 560, 494–498 (2018). 458

15. L. S. Ludwig et al., Lineage Tracing in Humans Enabled by Mitochondrial Mutations and Single-459

Cell Genomics. Cell. 176, 1325-1339.e22 (2019). 460

16. M. Meyer et al., Single cell-derived clonal analysis of human glioblastoma links functional and 461

genomic heterogeneity. Proc. Natl. Acad. Sci. U. S. A. 112, 851–6 (2015). 462

17. C. Neftel et al., An Integrative Model of Cellular States, Plasticity, and Genetics for Glioblastoma. 463

Cell, 1–15 (2019). 464

18. A. C. Decarvalho et al., Discordant inheritance of chromosomal and extrachromosomal DNA 465

elements contributes to dynamic disease evolution in glioblastoma. Nat. Genet. 50, 708–717 466

(2018). 467

19. X. Hu et al., TumorFusions: An integrative resource for cancer-associated transcript fusions. 468

Nucleic Acids Res. 46, D1144–D1149 (2018). 469

20. L. E. Barrett et al., Self-renewal does not predict tumor growth potential in mouse models of 470

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

high-grade glioma. Cancer Cell. 21, 11–24 (2012). 471

21. A. Narayanan et al., The proneural gene ASCL1 governs the transcriptional subgroup affiliation 472

in glioblastoma stem cells by directly repressing the mesenchymal gene NDRG1. Cell Death 473

Differ., 1813–1831 (2018). 474

22. R. L. Bowman, Q. Wang, A. Carro, R. G. W. Verhaak, M. Squatrito, GlioVis data portal for 475

visualization and analysis of brain tumor expression datasets. Neuro. Oncol. 19, 139–141 476

(2017). 477

23. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. 478

Bioinformatics. 25, 1754–60 (2009). 479

24. A. McKenna et al., The Genome Analysis Toolkit: a MapReduce framework for analyzing next-480

generation DNA sequencing data. Genome Res. 20, 1297–303 (2010). 481

25. K. C. Amarasinghe et al., Inferring copy number and genotype in tumour exome data. BMC 482

Genomics. 15, 732 (2014). 483

26. K. Wang, M. Li, H. Hakonarson, ANNOVAR: Functional annotation of genetic variants from high-484

throughput sequencing data. Nucleic Acids Res. 38, 1–7 (2010). 485

27. M. Pertea, D. Kim, G. M. Pertea, J. T. Leek, S. L. Salzberg, Transcript-level expression analysis 486

of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat Protoc. 11, 1650–1667 487

(2016). 488

28. Y. Liao, G. K. Smyth, W. Shi, FeatureCounts: An efficient general purpose program for assigning 489

sequence reads to genomic features. Bioinformatics. 30, 923–930 (2014). 490

29. R. Fang et al., Fast and Accurate Clustering of Single Cell Epigenomes Reveals Cis-Regulatory 491

Elements in Rare Cell Types. bioRxiv (2019), doi:10.1101/615179. 492

30. R. Importance, L. Regression, T. Package, U. Groemping, Relative Importance for Linear 493

Regression in R : The Package relaimpo. October. 17, 139–147 (2006). 494

495

Figure Legends 496

Figure 1. Single-cell sequencing reveals a single axis of variation in proliferating GBM cells. (A) 497

Left: t-SNE plot of 8,992 10X scRNA-seq cells, from 6 patients; center: t-SNE plot of 15,975 nuclei from 498

10 patients; right: t-SNE plot of 568 cells from 7 patients. Cells/nuclei are colored by the presence (red) 499

or absence (black) of clonal CNVs. (B) (Top) PCA of neoplastic cells from 10X scRNA-seq of IDH-500

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

wildtype GBMs. Curves represent the density of a Gaussian mixture model fit to PC1 sample scores. 501

Heatmaps display the expression of top-loading PC1 genes across cells, sorted by PC1 sample score. 502

(C) Differentially expressed genes between PCA clusters (abs. log2 fold-change>1 and adj. p <0.001 in 503

red). (D) Fractions of cycling cells, ** : Fisher p<0.01, *** : p<0.001. (E-G) PCA and analysis as in (B-504

D), for 10X snRNA-seq of IDH-wildtype GBMs. (H) PCA of neoplastic cells from C1 scRNA-seq of IDH-505

wildtype GBMs. (I) Correlations between the C1-scRNA-seq and 10X-scRNA-seq loadings. 506

Figure 2. MGSCs and pGSCs explain the genetic and phenotypic heterogeneity of GBM. (A) 507

Expression of mGSC and pGSC markers in single cells from non-malignant human brain. (B) 508

Hierarchical clustering of Pearson correlations between mGSC, pGSC and cell-cycle genes in IDH-509

wildtype GBM RNA-seq samples from TCGA (n=144). (C-D) Heatmap (C) and boxplots (D) of the 510

relative contributions of predictor cell-types to the overall variance explained by a linear model fit to 511

each TCGA sample. (E) Kaplan-Meier analysis comparing survival of IDH-wildtype GBMs from TCGA 512

to average expression of the mGSC and pGSC gene signatures in patient-matched RNA sequencing. 513

(F-G) MGSC, pGSC and cell-cycle signatures in Ivy GAP RNA-sequencing of GBM-anatomical 514

structures. (H) Percentages of CD44/DLL3 cells also positive for CA9/Ki67; ** : Wilcoxon p<0.001. 515

Figure 3. In-silico genetic and transcriptomic lineage tracing supports a mGSC to pGSC 516

hierarchy. (A) (left) RNA velocities of cycling neoplastic 10X scRNA-seq IDH-wildtype GBM cells are 517

projected onto a PCA axis. (center) Velocyto-based lineage reconstruction identifies a stable mGSC 518

root population and pGSC terminal population. (right) Gene expression and velocity for mGSC and 519

pGSC marker genes. (B) The percent of expressed mitochondrial mutations found in a given cell, out of 520

all mitochondrial mutations in a patient’s sample. (C) Percent expressed mitochondrial mutations are 521

compared between pGSCs and mGSCs. (D) PCA analysis as in (A), but for 10X snRNA-seq IDH-522

wildtype GBM data. 523

Figure 4. (A) TSNE plot of IDH-wildtype GBM scATAC-seq annotated by the presence of clonal 524

mutations. (B) Clustering of scATAC-seq gene-body activity scores of non-neoplastic cells. (C) 525

Clustering of scATAC-seq gene-body activity scores of neoplastic cells, with boxplots of activity scores 526

for Verhaak-subtype gene sets annotate above. (D-F) Differential motif enrichment tests between 527

neoplastic cell clusters. (G) Neoplastic cluster-specific motifs and associated transcription factors. 528

Figure 5. (A) A schematic overview of the drug screen. (B) HSA synergy scores and dose responses 529

for drug combinations screened in U87 cells in vitro. 530

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

50-5-10PC1

0

10

20

PC2

Cycling

Non-cycling

pGSC

mG

SC

10X scRNA-seq

-25 0 25 50-50tSNE 1

-50

-25

025

50tS

NE

210X scRNA-seq

Cells without clonal mutationsCells with clonal mutations

−50

−25

0

25

50

-25 0 25 50tSNE1

-50

10X snRNA-seq

−15 −10 −5 0 5 10 15

−15

−10

−50

510

15

tSNE1

C1 scRNA-seq

OligodendrocytesAstrocytes Endothelial cellsMyeloid

A

B

***

% cycling

Proneural

Mesenchymal

0 5 10 15 20

0

10

20

PC2

-10

Cycling

Non-cycling

pGSC

mG

SC

50-5-10 1510

10X snRNA-seq

MKI

67TO

P2A

CEN

PFPR

C1

UBE

2TTP

X2C

HL1

APO

DTN

RN

TRK2

ALD

OC

NEA

T1N

TM

PC2

scor

e

PDGFRAOLIG2SOX10ASCL1DLL3BCANPTGDSCD99FN1EGFRVIMCHI3L1CD44

PC1 score

Z sc

ore

-22

−150 −100 −50 0 50 100

−50

050

100

PC1

PC2

C1 PCA

−0.1 0.0 0.1Loadings PC1 10x

−0.4

0.0

0.4

Load

ings

PC

1 C

1

MKI67OLIG2

PDGFRA

CCND1

DLL3

CHI3L1CD44

PCC=0.68P value=1.2e-09

MKI

67M

KI67

TOP2

AC

ENPF

PRC

1U

BE2T

TPX2

CH

L1AP

OD

TNR

NTR

K2AL

DO

CN

EAT1AT

1A NTM

Z sc

ore

-22

ND

RG

1N

RN

1N

RXN

1M

YT1

PLPP

R1

VEG

FAC

NP

CEN

PEC

ENPF

MKI

67C

ENPK

TOP2

A

PC2

scor

e

C

D

E F

G

H

I

***

% cycling

Proneural

Mesenchymal

0 10 20 30

Log2 FC-2 -1 0 1

-Log

10(p

val)

100

200

0

300

-3 2

CD44GFAP

CHI3L1OLIG2

DLL3

OLIG1SOX10

TNCBCAN

-Log

10(p

val)

100

200

0

PDGFRA

CD44

OLIG2

CDK1MKI67

GFAP

CHI3L1

SOD2

Log2 FC-2 -1 0 1 2 3

300

Figure 1Research.

on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

A B−0.8 0.8Cell-cycleMesenchymalProneural

ScRNA-seq derived signatures:

OLIG2PDGFRA

MKI67

TOP2A

DLL3

CCNB1

CHI3L1CD44

CD99

TNR

IGFBP7

PCC

DC

z-score

1-1

BCAN

CHI3L1CLUGFAPSPARCL1AQP4NAMPTCD44TNCC8orf4CD99SMOC1PDGFRATNROLIG1OLIG2

CCND1

NSCs AstrocytesOPCs Oligoden-drocytes

mG

SC m

arke

rspG

SCm

arke

rs

−20

12

−11

23

45

−10

12

3

−20

12

3

−20

2

−10

23

−10

12

34

−20

12

34

pGSC Myeloid EndithelialmGSC

1

Astrocyte Oligodendrocyte Neuron T-cell

z-sc

ore

of im

porta

nce

(lmg)

z-sc

ore

of im

porta

nce

(lmg)

-1

0

-1-1

−1 1ClassicalMesenchymalProneural Verhaak subtypes: z-score

mGSCAstrocyte

pGSCOligodendrocyte

Endothelial Myeloid

High myeloidinfiltration

High pGSC

High mGSC

High Astro-diffintermed. mGSC

High mGSCand Astro-diff.

mGSC signature genespGSC signature genes−0

.50.

51.

0

−0.5

0.5

1.5

2.5

NSCsOPCs Astro Oligo NSCsOPCs Astro Oligo

0

0 20 40 60 80

0.0

0.2

0.4

0.6

0.8

1.0

Months

Ove

rall s

urvi

val

p=0.4951

pGSC signaturehistology GBM, IDH-WT

High expr.Low expr.

0.0

0.2

0.4

0.6

0.8

1.0

Months0 20 40 60 80

Ove

rall s

urvi

val

High expr.Low expr.

p=0.001906

mGSC signaturehistology GBM, IDH-WT

-1.0 1.0z-score

PDGFRA TNROLIG2 BCANAPOD STMN2SCG3 MKI67BIRC5 CENPFUBE2C TOP2ANUSAP1 GTSE1PRC1 TPX2NEAT1 SOD2NMB CHI3L1CD99 CD44VIM

Cell cyclemGSCpGSC

Cellular Tumor

Infiltrating Tumor

Leading Edge

Microvascular proliferation

Pseudopali-sading cells

1

0

-1

pGSC signature mGSC signatue Cell Cycle-signatue

z-sc

ore

of m

ean

expr

essi

on

*** NS, all other comparisons ***

NS, all other comparisons ***

Cellular Tumor

Infiltrating Tumor

Leading Edge

Microvascular proliferation

Pseudopali-sading cells

DLL3

CD

44

% CA9 double positive0 10 20 30 40

100µmCD

44 (D

AB)/C

A9

(Red

)

20µmDLL

3 (D

AB)/C

A9

(Red

)

20µmCD

44 (D

AB)/K

i67

(Red

)

20µmDLL

3 (D

AB)/K

i67

(Red

)

DLL3

CD

44

% KI67 double positive0 10 20 30 40 50

p<0.001

E

F

G H

Expr

essio

n z-

scor

e

Figure 2Research.

on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

PC1

PC2

Root

Terminal

Low High

Density

PDGFRADLL3 CD44 CHI3L1

0 Max

ExpressionPositiveNegative

RNA velocity

0 Max

ExpressionPositiveNegative

RNA velocity

CD44 CHI3L1DLL3 PDGFRA

PC1

PC2

PC1

PC2

Low High

Density

Terminal

High

DensityDensity

Root

PC1 High

0%

20%

40%

60%

80%

-10 0 10-20PCA 1

PCA

2

-10

0

10

-5

5

Percent of mitochondiral mutaions

0.00

0.50

0.25

0.75

proneural mesenchymal

Perc

ent o

f mito

chon

dria

l mut

atio

ns in

sam

ple

***

A

B C

D

Figure 3Research.

on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

−10

−5

0

5

10

−10 −5 0 5

Cells without clonal CNVsCells with clonal CNVs

UMAP Dim1

UM

AP D

im2

A B

C

B

−2 2

OligodendrocytesAstrocytes

Endothelial cellsMyeloid

Gene body accessibility Z score

PLP1

MAG

HLA-DRAC1QA

AQP4ALDOC

APOLD1

CLDN5

D

E F

0

100

200

300

−5 0 5

-Log

(p v

alue

)

ASCL2

ASCL1

SOX10

OLIG2

TCF21TCF4

AP4

Rfx2Rfx

Stat3

Rfx1TCF12

TWIST2

Negative: MES Positive: PN

CEBP

EBF2

BCL6

Motif enrichment fold change

NF1

G Motif enrichment of known TF motifs

0 200-log10(P value)

ASCL1ASCL2

OLIG2NeuroG2

JunBJun

Stat3 ATF3

MES

PN

Intermediate

SOX4

NF1 361

119

TEAD3

AP-1

44

351

NeuroD1

Oct-6

579

95

Top enriched de novo motifs Best match -log10(P value)

ASCL1

OLIG2

DLL3

PDGFRA

CD44

GFAP

SOD2

MET

DDR2

−2 2z-score:

0

500

1000

Gen

eac

cess

ibili

ty Verhaaksignature

MESClassicalPN

* *NS

Gen

e ac

cess

ibili

ty

−5 0 5Motif enrichment fold change

Negative: Int Positive: PN

0

100

200

300

-Log

(p v

alue

)

−10

ASCL1

OLIG2

TCF21

TCF4NeuroG2 NeuroD1

SOX10

Oct4ASCL2

SCL

MYF5MYOD

Fra1 Fra2ATF3

MAFK

NFE2L2

ATOH1

Stat30

100

200

300

-Log

(p v

alue

)

−5 0 5

Negative: Int Positive: MES

−10Motif enrichment fold change

Rfx2

Rfx1

GRE

Rfx2

ARE

NF1

Stat3SOX4

Fra1Fra2ATF3 JunB

Bach2

NFE2

NeuroG2TCF21

NeuroG2

Fra1Fra2

Fosl2

Fosl2

TGIF2

BATF

TGIF2BATF

CD44CDH9

CDH9

SOD2

Figure 4Research.

on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

PDGFRAOLIG2SOX10ASCL1DLL3BCANPTGDSCD99FN1EGFRVIMCHI3L1CD44

pGSC mGSC

Single cell-derived signatures

SP600125 JUN PNYM155 BIRC5 PNTrifluridine TYMS PNiMDK MDK MESAZD4547 FGFR3 MESICG001 WNT MESAG1478 EGFR MES

DrugGenetarget

Cell-typetarget

Candidate therapeutics

Combination drug screen in vitro

A

B

20

10

0

-10

-20

syne

rgis

ticad

ditiv

ean

tago

nist

ic

HSA synergy score0

25.33

26.55

18.15

39.01

39.97

27.78

40.38

54.34

0

5

20

0 10 40ICG001 (uM)

YM15

5 (u

M)

inhibition (%)

0

25.33

26.55

13.93

39.4

40.43

24.41

39.72

55.34

0

5

20

0 5 20AZD4547 (uM)

YM15

5 (u

M)

inhibition (%)

0

25.33

26.55

8.96

38.9

39.01

11.44

33.97

51.58

0

5

20

0 5 10iMDK (uM)

YM15

5 (u

M)

inhibition (%)

0

13.93

24.41

18.15

2.73

21.24

27.78

22.36

36.58

0

5

20

0 10 40ICG001 (uM)

AZD

4547

(uM

)

inhibition (%)

0

6.1

14.06

18.15

4.22

13.04

27.78

17.62

20.85

0

10

20

0 10 40ICG001 (uM)

AG14

78 (u

M)

inhibition (%)

Figure 5

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329

Published OnlineFirst September 25, 2019.Cancer Discov   Lin Wang, Husam Babikir, Soren Muller, et al.   single axis of variationThe phenotypes of proliferating glioblastoma cells reside on a

  Updated version

  10.1158/2159-8290.CD-19-0329doi:

Access the most recent version of this article at:

  Material

Supplementary

  http://cancerdiscovery.aacrjournals.org/content/suppl/2019/09/25/2159-8290.CD-19-0329.DC1

Access the most recent supplemental material at:

  Manuscript

Authoredited. Author manuscripts have been peer reviewed and accepted for publication but have not yet been

   

   

   

  E-mail alerts related to this article or journal.Sign up to receive free email-alerts

  Subscriptions

Reprints and

  [email protected] at

To order reprints of this article or to subscribe to the journal, contact the AACR Publications

  Permissions

  Rightslink site. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC)

.http://cancerdiscovery.aacrjournals.org/content/early/2019/09/25/2159-8290.CD-19-0329To request permission to re-use all or part of this article, use this link

Research. on May 29, 2020. © 2019 American Association for Cancercancerdiscovery.aacrjournals.org Downloaded from

Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on September 25, 2019; DOI: 10.1158/2159-8290.CD-19-0329