RESEARCH PAPER

Systems Biomedicine 1:3, 1–10; July/August/September 2013; © 2013 Landes Bioscience

www.landesbioscience.com

Cell type-specific analysis of human brain transcriptome data to predict alterations in cellular composition

Xiaoxiao Xu,1 Arye Nehorai,1 and Joseph D Dougherty2,3*

1The Preston M. Green Department of Electrical & Systems Engineering; Washington University in St. Louis; St. Louis, MO USA; 2Department of Genetics; Washington University School of Medicine in St. Louis; St. Louis, MO USA; 3Department of Psychiatry; Washington University School of Medicine in St. Louis; St. Louis, MO USA

*Correspondence to: Joseph D Dougherty; Email: [email protected]

Submitted: 01/31/2013; Revised: 06/10/2013; Accepted: 07/03/2013

http://dx.doi.org/10.4161/sysb.25630

Keywords: cell type, transcriptome profiling, cell type-specific genes, gene expression analysis, co-expression network analysis, central nervous system, autism

Abbreviations: CNS, central nervous system

The central nervous system (CNS) is composed of hundreds of distinct cell types, each expressing different subsets of genes from the genome. High-throughput gene expression analysis of the CNS from patients and controls is a common method to screen for potentially pathological molecular mechanisms of psychiatric disease. One mechanism by which gene expression might be seen to vary across samples would be alterations in the cellular composition of the tissue. While the expression of gene “markers” for each cell type can provide some information about cellularity, for many rare cell types markers are not well characterized. Moreover, if only small sets of markers are known, any substantial variation of a marker’s expression pattern due to experimental conditions would result in poor sensitivity and specificity. Here, our proposed method combines prior information from mouse cell-specific transcriptome profiling experiments with co-expression network analysis to select large sets of potential cell type-specific gene markers in a systematic and unbiased manner. The method is efficient and robust, and identifies sufficient markers for further cellularity analysis. We then employ the markers to analytically detect changing cellular composition in human brain. Application of our method to temporal human brain microarray data successfully detects changes in cellularity over time that roughly correspond to known epochs of human brain development. Furthermore, application of our method to human brain samples with the neurodevelopmental disorder of autism supports the interpretation that changes in astrocytes and neurons might contribute to the disorder.

Introduction

Gene expression analysis of complex tissues [e.g., the central nervous system (CNS)] from patients and controls is a common method to screen for potentially pathological molecular mechanisms of disease.1,2 However, the substantial cellular heterogeneity of these tissues poses several difficulties in interpreting human gene expression data. First, it restricts the detection of gene expression changes.3–5 For example, biologically important changes in gene expression in one cell type may be masked by compensatory changes in another cell type. Also, even massive changes in gene expression may not be detected if the gene is only expressed in a rare cell type. Second, even if changes in gene expression are detected in a heterogeneous tissue, they are difficult to interpret. Changes in gene expression could be due to the arrival or loss of certain cell types, an expression change in all cells or in a fraction of cells, or combinations of all of these.

These challenges highlight the importance of understanding changes in the cellular composition of a tissue. Recent work profiling whole human brains across regions and time has suggested that many, and perhaps most, gene expression differences seen in these studies are driven by differences in cellular composition across the samples.6–8 If the human brain expression data can guide us to consistent cellular alterations, even in the context of distinct genetic or environmental causes of a disorder across different individuals, then treatments could be tailored to address the common cellular deficits.

Emerging methods are attempting to incorporate cellular information into analytical models for human brain data. However, these methods either tend to evade the complexity of cellular composition by using artificially predesigned cellular mixtures in tissues,9–11 or they rely on previous knowledge from low-throughput histology-based approaches, which have identified only three or four cell type-specific genes for each cell type at best,12 and would preclude analysis of more poorly described cell types. Consequently, when predicting the cellular composition in a new experimental data set, these small numbers of known markers limit statistical power and render the analysis more vulnerable to any measurement noise or technical artifacts present in a given marker, potentially resulting in poor sensitivity and specificity.

To improve our capability of interpreting human brain transcriptome data and predicting cellular composition, there is a need for methodologies that robustly and systematically integrate prior information to prioritize, interpret, and visualize relationships among cell type-specific genes, and identify changes in cellular composition. To address this need, we propose a novel cell type-specific analysis method.13 We first identify genes that are significantly enriched in each cell type from our previous cell-specific transcriptome profiling experiments in mouse brain.4,14 Each cell type is assigned a number of cell-specific genes as “prior” potential gene markers using our previously developed method validated for this purpose.14 We then perform co-expression network analysis on human brain transcriptome data of the prior gene markers of each cell type to uncover the genes whose expression is highly correlated with one another. As changes in cellularity should result in correlated changes in cell-specific genes, the highly correlated genes are then selected as the effective markers of the cell type. We believe that analysis based on these systematically selected gene markers provides an accurate prediction of cellular composition and alterations with better statistical power.

In this work, we apply our method to two problems, one where changes in cellularity are known, and one where they are not. Analysis 1: we employ the temporal transcriptome of human brain (specifically, from cortex and cerebellum)6 and successfully detect changes in cellularity over time that roughly correspond to known epochs of human brain development. Analysis 2: we apply our method to transcriptomes from brain samples (specifically, cortex) from individuals with the neurodevelopmental disorder of autism.15 The results indicate an increased signal from astrocytes and a reduced signal from neuronal cells, suggesting that a corresponding change in cellularity may contribute to the disorder.

Results and Discussion

Analysis 1: Determining temporal cellular changes in human brain

Here we apply our method to predict temporal cellular changes from transcriptome data of developing and adult post-mortem human brains. The dendrograms in Figure 1A and Figure 2A show the gene co-expression analyses (Step 5) on prior gene markers (obtained from Steps 1–4) of astrocytes in cortex and Purkinje cells in cerebellum, and the identification of their effective gene markers. For all four cell types in cortex and all eight cell types in cerebellum, the marker lists are given in Table S1, and the co-expression analyses are in Figure S1 and Figure S2, respectively. These plots also present an evaluation of the effective gene markers of all cell types, obtained by combining them into a single set and repeating the co-expression analysis. In cortex, the effective gene markers corresponding to the four cell types (astrocytes, mature oligodendrocytes, neurons, and progenitor cells) fall into completely separate clusters, indicating distinct cellular behaviors. In cerebellum, by contrast, the gene markers of some cell types are nearly perfectly clustered (mature oligodendrocytes, progenitor cells, granule cells, and Purkinje cells), while others are not as distinct. The largest sub-clusters for each of the remaining cell types are: astrocytes and Bergmann glia (6 out of 13 markers), inner Golgi neurons (4/6), and stellate and basket cells (3/5). One possible reason the markers are intermingled is that those cell types indeed share similar temporal patterns in cerebellum development.

Figure 1B and Figure 2B show the temporal changes of cell types across development in cortex and cerebellum, respectively. In these plots, during the embryonic and early to middle fetal periods, the eigengene of progenitor cells is dominant, while the eigengenes of the other cell types suggest that they have not yet appeared. Consistent with the progenitor cells then differentiating to form specific cell types in succession, the progenitor eigengene expression decreases as the others increase. In addition to this broad shift from progenitor to mature cell types, these figures indicate that we also successfully detect the known epochs of the birth and maturation of individual cell types.16–24 For instance, in cortex, neurons are known to be born first, followed by astrocytes, with oligodendrocytes last (Fig. 1B), and this is largely consistent across cortical regions (Fig. 3B). In cerebellum, our data indicate that the Purkinje cells are born before the granule cells (Fig. 2B). The birth and maturation of granule cells then dilute the signal of Purkinje cells. The observations in the enlarged plots of changes of cell types in cortex and cerebellum around birth date (Fig. 1C and Fig. 2C, respectively) are generally consistent with the ontogeny we have summarized from the literature (Fig. 1D and Fig. 2D, respectively),16–24 indicating that, broadly, our method accurately detects known changes in the cellular composition of the tissue.

To examine the robustness of the approach to marker selection, we performed a bootstrap analysis (Fig. 3A), sampling with replacement (n = 30) from the markers and examining the variance of the eigengenes at each time point. This analysis demonstrated the robustness of the method relative to marker choice. As a test of data variation across regions, Figure 3B shows temporal cellular changes in human cortex development, represented by the average of the eigengenes computed for the four cortical regions available at all time points: frontal lobe, parietal lobe, occipital lobe, and temporal lobe. We observe that the cellular patterns are largely consistent across these cortical regions.
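The marker bootstrap described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the dictionary layout, and the use of a plain average of marker profiles as the per-replicate summary (in place of the eigengene) are all assumptions made to keep the example short.

```python
import random
import statistics

def bootstrap_marker_profiles(markers, n_boot=30, seed=0):
    """Resample markers with replacement and summarize each time point.

    markers: dict mapping gene symbol -> list of expression values over time.
    Returns (mean, spread) lists, one entry per time point, where spread is
    the standard deviation of the summary across bootstrap replicates.
    Assumption: a simple average stands in for the eigengene here.
    """
    rng = random.Random(seed)
    genes = sorted(markers)
    n_time = len(markers[genes[0]])
    replicates = []
    for _ in range(n_boot):
        # Draw as many markers as the original set, with replacement.
        sample = [rng.choice(genes) for _ in genes]
        replicates.append(
            [statistics.fmean([markers[g][t] for g in sample]) for t in range(n_time)]
        )
    mean = [statistics.fmean([r[t] for r in replicates]) for t in range(n_time)]
    spread = [statistics.stdev([r[t] for r in replicates]) for t in range(n_time)]
    return mean, spread
```

A small spread at every time point would indicate that the temporal pattern does not depend strongly on which markers happen to be selected.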

Though the order of arrival of each cell type in cerebellum is correct, the time scale we detect for these cell types does appear to be somewhat compressed relative to the known ontogeny. We suspect that we are detecting the cell types as they arrive at an approximation of their mature gene expression profile, rather than at the moment they are born, as our eigengenes are built using prior information from the adult state of these cells. This would be consistent with classical morphological studies of the development of the cerebellum: while Purkinje cells are born substantially earlier than their counterparts, their final morphological maturation depends on instructive cues from the later-born cells.24 Likewise, in the cortex we detect an unexpected astrocyte-like signal slightly preceding neurons. This is likely to derive from a transient population of astroglia-like cells (the radial glia) that are present during the period of neurogenesis, and which later give rise to the more mature astroglial phenotype.25 Overall, however, the success of our approach in roughly recapitulating the cellular maturation of the nervous system suggests that it might be equally informative as a method to detect changes in cellularity in gene expression studies of psychiatric disorders, in which the relevant changes are not known.

Figure 1. Temporal cellular changes in human cortex development. (A) Identification of effective gene markers: co-expression analysis on prior gene markers of astrocytes as an example (top), where the effective markers (symbols not shown here) are selected as the largest clustered nodes under the red-line cut-off. Effectiveness testing: co-expression analysis on effective gene markers of all cell types (bottom). Co-expression analyses of all four cell types in cortex are in Figure S1. (B) Cellular changes across development, represented by the changes of the eigengenes’ expressions. The emergence of the cell types is indicated by the date when the expressions of eigengenes start to go up sharply. (C) Enlarged plot of the changes around birth date. (D) Ontogeny of the cell types summarized from literature,16–22 where the oval shape indicates the time of birth and period of maturation of the cell types. The cell types in the dendrogram are color-coded with the legends in (B).

Figure 2. Temporal cellular changes in human cerebellum development. (A) Identification of effective gene markers: co-expression analysis on prior gene markers of Purkinje cells as an example (top), where the effective markers are selected as the largest clustered nodes under the red-line cut-off. Effectiveness testing: co-expression analysis on effective gene markers of all cell types (bottom). Co-expression analyses of all eight cell types in cerebellum are in Supplementary Figure S2. (B) Cellular changes across development, represented by the changes of the eigengenes’ expressions. The emergence of the cell types is indicated by the date when the expressions of eigengenes start to go up sharply. (C) Enlarged plot of the changes around birth date. (D) Ontogeny of the cell types summarized from literature,16–22 where the oval shape indicates the time of birth and period of maturation of the cell types. The cell types in the dendrogram are color-coded with the legends in (B).

Analysis 2: Predicting cellular alterations in human neurodevelopmental disorder of autism

Here we apply our method to predict cellular alterations of cerebral cortex in autistic patients from transcriptome data.15 Similar to Analysis 1, the dendrograms in Figure 4A show the gene co-expression analysis (Step 5) on prior gene markers (obtained from Steps 1–4) of astrocytes in cortex, and the identification of their effective gene markers. For all four cell types in cortex, the marker lists are given in Table S2 and the co-expression analyses are in Figure S3. We observe that the effective gene markers corresponding to the four cell types are almost completely separated, except that three gene markers of progenitor cells are clustered with the markers of astrocytes (the largest sub-cluster of progenitor cells contains six out of nine markers). This perhaps reflects the fact that a subset of astrocytes can serve as adult progenitor cells in certain brain regions.26

Figure 4B presents average fold changes (log2 transformed) in the autism group relative to the controls for the expressions of the effective gene markers of the four cell types in cortex, as well as the corresponding P values from the t tests on the fold changes. This plot shows a decreased expression of neuronal genes (P value = 7.38e-3) and an increased expression of astrocytic genes (P value = 6.41e-4) in cortices from autistic patients. The results suggest that one of the cellular mechanisms of autism may be a relative deficit in the function or presence of neurons, or an enrichment of astrocytes (perhaps to support or remove dysfunctional neuronal cells).
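As a rough illustration of the fold-change summary described above, the sketch below computes a per-marker log2 fold change and a one-sample t statistic against zero. The function names and input layout are hypothetical. The code assumes linear-scale expression values, and the one-sample form of the test is an assumption, since the exact test configuration is not spelled out in the text. Converting the statistic to a P value would additionally require the t distribution, which is omitted here.

```python
import math
import statistics

def log2_fold_changes(autism, control):
    """Per-gene log2(mean autism / mean control).

    autism, control: dict mapping gene symbol -> list of linear-scale values.
    Assumption: values are positive and not already log-transformed.
    """
    return {
        gene: math.log2(statistics.fmean(autism[gene]) / statistics.fmean(control[gene]))
        for gene in autism
    }

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of values is zero."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return statistics.fmean(values) / standard_error
```

For a cell type's marker set, a strongly positive statistic would correspond to increased expression in the autism group, and a strongly negative one to decreased expression.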

Here the data tested are pooled samples from the temporal lobe of cortex (13 autistic samples and 13 controls) and the frontal lobe of cortex (16 autistic samples and 16 controls). We also apply our method independently to the samples of each region. The lists of effective gene markers for the temporal lobe and the frontal lobe (also given in Table S2) both substantially overlap the markers of the pooled samples. The overall changes observed in individual regions (Fig. S4; Fig. S5) are consistent with those in Figure 4, confirming the robustness of the analysis. We note that here we have used a single category of “neurons” in cortex; future investigations can apply the method to particular sub-types of cortical neurons.

Comparison of the cell type-specific markers derived from our method with markers from literature

To further evaluate the performance of our method, we also perform Analyses 1 and 2 for the four cell types in cortex based on the three or four gene markers per cell type collected in reference 12, which had been identified by classic tissue-based approaches. These markers are denoted as the “markers from literature” and are given in Table S3. We notice that six of the 13 markers from literature are also found by our method, a significant overlap given the size of the genome (P value = 2.096e-10 [Fisher’s exact test]). Figure 5A compares the eigengenes computed from our effective gene markers and the markers from literature for Analysis 1. The expression patterns of the eigengenes from our markers coincide with those from the markers from literature, which once again verifies the effectiveness of the markers collected by our method. This concordance suggests that there are a number of different genes that can function effectively as markers in this context. By contrast, Figure 5B presents the average fold changes in the autism group relative to the controls for the expressions of the gene markers from literature, and P values from the t test for Analysis 2. Though the trends of the fold changes from the literature markers are similar to ours, the P values from the t test are not significant. This suggests that the larger number of markers identified by our method provides superior statistical power for this type of analysis (Fig. 4B).
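The overlap test mentioned above can be illustrated with the upper tail of the hypergeometric distribution, which is what a one-sided Fisher's exact test on a 2×2 overlap table evaluates. The function and parameter names below are hypothetical, and the sketch makes no attempt to reproduce the reported P value, because the gene universe size and the total number of effective markers are not stated in this passage.

```python
from math import comb

def overlap_p_value(universe, n_marked, n_drawn, overlap):
    """One-sided P(X >= overlap) under the hypergeometric distribution.

    universe: number of genes that could have been selected (assumed input).
    n_marked: size of the first gene set (e.g., markers from one method).
    n_drawn:  size of the second gene set (e.g., markers from literature).
    overlap:  number of genes observed in both sets.
    """
    total = comb(universe, n_drawn)
    upper = min(n_marked, n_drawn)
    tail = sum(
        comb(n_marked, k) * comb(universe - n_marked, n_drawn - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

Exact integer arithmetic is used until the final division, so the result stays accurate even when the tail probability is very small.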

The results demonstrate several advantages of our method. It selects cell type-specific gene markers efficiently and robustly for predicting cellular composition and alterations in any given transcriptome data set. As verified above, this data-driven method achieves high sensitivity and specificity. It enables us to work with cell types even when few or no markers are available in the literature. Moreover, our method provides a larger number of markers for further analysis, which improves statistical power.

Figure 3. Analysis of variation indicates temporal patterns are robust to marker selection and consistent across regions. (A) Bootstrapping the markers for each cell type 30 times reveals a consistent temporal pattern regardless of which markers were selected. Error bars represent standard error of the means. (B) Temporal cellular changes in human cortex development, represented by the average of the eigengenes computed for four cortical regions: frontal lobe, parietal lobe, occipital lobe, and temporal lobe. Error bars represent standard error of the means.

Materials and Methods

Cell-specific transcriptome profiling experiments in mouse brain

We systematically examined and identified genes whose expressions were significantly enriched in each cell type relative to all others, from cell-specific transcriptome profiling experiments (conducted on a single platform for dozens of targeted cell types) in mouse brain,4,5,14 with our previously described specificity index statistics.4 Genes relatively specifically employed by a cell type are expected to contribute to the unique functions of that cell. These genes may both serve as proxies (markers) for detecting the cell type and become useful targets for the development of pharmacological tools for cell-specific manipulations in the future. The specificity index (SI) is a statistic that describes the relative uniqueness (specificity) of the expression of a certain gene in a certain cell type, when compared with all others. It is described in detail in reference 4. Essentially, for a given gene, it is the average rank, in terms of “fold-change,” across each pairwise comparison of the targeted cell type to each of the other available cell types (i.e., an SI of 1 would mean it was the most enriched gene in every pairwise comparison). Permutation tests are used to shuffle the gene expression values from the cell type of interest to calculate an empirical distribution of possible SI values. Then P values are ascribed to the genes based on the comparison between their experimental SI values and the ones that are calculated from the permuted data. We denote these P values as pSI. Previous validation with in situ hybridization databases indicated that genes with pSI < 1e-4 are likely to be nearly uniquely expressed in a particular cell type, while those with more modest P values (pSI < 1e-2) will merely show enrichment.4 Lists of genes with pSI values below thresholds (from 0.05 to 1e-4) for each cell type in mouse brain are available online at http://java.bactrap.org/bactrap/cells.jsp.
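The average-rank idea behind the SI, and the permutation step that yields pSI, can be sketched as follows. This is a simplified illustration under stated assumptions and not the published implementation in reference 4: the function names, the nested-dictionary layout, the tie handling (alphabetical), and the pooling of all permuted SI values into one null distribution are all choices made for this example.

```python
import random

def specificity_index(expr, target):
    """Average fold-change rank of each gene across pairwise comparisons.

    expr: dict mapping cell type -> dict mapping gene -> positive expression.
    Rank 1 is the most enriched gene in a comparison, so an SI of 1 means the
    gene ranked first against every other cell type.
    """
    genes = sorted(expr[target])
    others = [cell for cell in expr if cell != target]
    rank_sums = dict.fromkeys(genes, 0)
    for other in others:
        fold = {g: expr[target][g] / expr[other][g] for g in genes}
        ordered = sorted(genes, key=lambda g: fold[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            rank_sums[gene] += rank
    return {g: rank_sums[g] / len(others) for g in genes}

def empirical_psi(expr, target, n_perm=200, seed=0):
    """Fraction of permuted SI values at least as extreme as the observed SI."""
    rng = random.Random(seed)
    observed = specificity_index(expr, target)
    genes = sorted(expr[target])
    null = []
    for _ in range(n_perm):
        values = [expr[target][g] for g in genes]
        rng.shuffle(values)  # shuffle the target cell type's values across genes
        permuted = dict(expr)
        permuted[target] = dict(zip(genes, values))
        null.extend(specificity_index(permuted, target).values())
    return {g: sum(s <= observed[g] for s in null) / len(null) for g in genes}
```

With real data the number of permutations would need to be large enough to resolve thresholds such as 1e-4; the default of 200 here is only for demonstration.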

In this work, the genes are denoted by gene symbols using Entrez CDF files, version 14.27 We focus on four cell types in cortex, which are astrocytes, mature oligodendrocytes, neurons, and progenitor cells; as well as eight cell types in cerebellum, which are astrocytes and Bergmann glia, mature oligodendrocytes, inner Golgi neurons, stellate and basket cells, granule cells, progenitor cells, and Purkinje cells.14 We note that the cell types of the same names in cortex and cerebellum were assayed independently in each region. Meanwhile, the closely related astrocytes and Bergmann glia of the cerebellum are profiled in a single experiment.

Figure 4. Cellular alterations predicted from human autism cortical transcriptome data (29 autistic samples and 29 controls). (A) Identification of effective gene markers: co-expression analysis on prior gene markers of astrocytes as an example (top), where the effective markers (symbols not shown here) are selected as the largest clustered nodes under the red-line cut-off. Effectiveness testing: co-expression analysis on effective gene markers of all cell types (bottom). Co-expression analyses of all four cell types in cortex are in Figure S3. (B) Average fold changes (log2 transformed) in the autism group relative to the controls for the expressions of the effective gene markers of the four cell types in cortex, as well as of all the measured common genes. Error bars represent standard deviations of the means. The P values are obtained from the t tests on the effective gene markers and the same number of randomly selected genes out of all the common genes. Results of independent cell type-specific analyses on samples from temporal lobe of cortex and frontal lobe of cortex are given in Figures S4 and S5.

Figure 5. Comparison between the effective gene markers from our method and the gene markers from literature. (A) Analysis 1: temporal changes of the cell types in human cortex development, represented by the computed eigengenes. (B) Analysis 2: average fold changes (log2 transformed) in the autism group relative to the controls for the expressions of the gene markers from literature and P values from the t tests.

Temporal transcriptome of human brain (Analysis 1)

The transcriptome data of developing and adult post-mortem human brains were downloaded from GEO: GSE25219.6 The data set was generated from 1,340 tissue samples collected from 57 human brains in the cortex and cerebellum regions, covering 15 time periods (Table 1 in ref. 6) from embryonic development to late adulthood. The IDs of the transcript cluster probes were mapped to RefSeq accessions and then to gene symbols. The resulting data set includes 17,565 protein-coding genes, with expressions in each region and each time period.

Transcriptome of human brain with autism (Analysis 2)

The transcriptome data from the gene expression study of post-mortem human brains were downloaded from GEO: GSE28521.15 The IDs of the transcript cluster probes were mapped to gene symbols. For genes with multiple probe sets, the probe sets were averaged, resulting in a total of 17,942 genes with expressions in 58 cortex samples (29 autism and 29 controls). Within each of the autism and control groups, 13 samples are from the temporal lobe of cortex and 16 samples are from the frontal lobe of cortex.

Cell type-specific analysis method

Our method consists of the following steps, which are also summarized in Figure 6.

Figure 6. Workflow diagram of the cell type-specific analysis method.

Step 1: Because cell-specific transcriptome profiles are only available from the mouse brain for most cell types, we start from the assumption that the expression of most gene markers is conserved across species. Previous genomic analysis of expression across tissues and species suggests that this is a reasonable assumption.28 We filter the human brain transcriptome data to the subset of genes with clear homologs and data available in both human and mouse, matched by gene symbol. We find 8,380 genes in common for the data set for Analysis 1, and 8,846 genes for Analysis 2.

Step 2: Out of the “common genes”, we select the genes whose coefficient of variation (CV) of expression levels across samples is in the highest 20%, as invariant genes will be uninformative as markers. For Analysis 1, the samples are tissue samples of cortex and cerebellum regions covering 15 developmental time periods;6 for Analysis 2, they are cortex tissue samples from 29 autism cases and 29 controls.15 CV is a normalized measure of variability: the ratio of the standard deviation of a gene’s expressions in the samples to the mean of the expressions.29 The list of top CV genes is utilized in Step 4.
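A minimal sketch of the CV filter in Step 2 is given below. The function name and input layout are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
import statistics

def top_cv_genes(expression, fraction=0.2):
    """Return the genes whose coefficient of variation is in the top fraction.

    expression: dict mapping gene symbol -> list of expression values across
    samples. CV is standard deviation divided by mean; genes with a
    non-positive mean are skipped because CV is not meaningful for them.
    """
    cv = {}
    for gene, values in expression.items():
        mean = statistics.fmean(values)
        if mean > 0:
            cv[gene] = statistics.stdev(values) / mean
    n_keep = max(1, round(fraction * len(cv)))
    ranked = sorted(cv, key=cv.get, reverse=True)
    return set(ranked[:n_keep])
```

Returning a set makes the later intersection with the cell type-specific gene lists a single operation.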

Step 3: As mentioned, we identify cell type-specific genes from cell-specific transcriptome profiling experiments in mouse brain.4,14 Here, we select genes with pSI values below a threshold of 5e-3 for each cell type for further analysis. The threshold of 5e-3 ensures sufficient cell type-specificity while retaining a moderate number of genes.

Step 4: For each cell type, we overlap its cell type-specific genes obtained in Step 3 with the top CV genes obtained in Step 2. We denote these overlapped genes as prior gene markers for their corresponding cell type.
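Steps 3 and 4 amount to a threshold followed by a set intersection, which might look like the sketch below. The function name and the shape of the inputs are assumptions for illustration; only the 5e-3 threshold comes from the text.

```python
def prior_gene_markers(psi_by_cell_type, top_cv, threshold=5e-3):
    """Intersect each cell type's specific genes with the top-CV gene set.

    psi_by_cell_type: dict mapping cell type -> dict mapping gene -> pSI.
    top_cv: set of gene symbols that passed the CV filter (Step 2).
    Returns a dict mapping cell type -> set of prior gene markers.
    """
    return {
        cell: {gene for gene, p in psi.items() if p < threshold} & top_cv
        for cell, psi in psi_by_cell_type.items()
    }
```

A gene must pass both filters to become a prior marker, so a highly specific but invariant gene is dropped, as is a variable gene without cell type-specificity.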

Step 5: To further uncover the most representative and correlated gene markers and remove uncorrelated noisy genes, we perform gene co-expression analysis for individual cell types, using the weighted gene co-expression network analysis (WGCNA) package in R.30 For each cell type, the Pearson correlation matrix for the expressions of its gene markers is computed first, and a dissimilarity matrix using the Euclidean distance is calculated based on the correlation matrix. Then average linkage hierarchical clustering31 is performed on the dissimilarity matrix and the clustering tree (dendrogram) is formed (Fig. 1A; Fig. 2A; Fig. 4A). In the dendrogram, the nodes in the bottom row represent the genes, and the nodes in other rows represent the clusters to which the genes belong, with the branches connecting the nodes representing the distances (dissimilarities). The distance between merged clusters decreases monotonically with the level of splitting: the height of each node is the intergroup dissimilarity (two genes with exactly the same expression pattern across samples will have heights of zero).

In the dendrogram of each cell type, by setting a static (fixed) cut-off height30 of 1 and finding the largest cluster (module) of genes, we identify the most correlated set of genes, which we regard as the effective cell type-specific gene markers. If more than one cluster has the largest number of genes, we accept all of them as effective gene markers.
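The clustering in Step 5 was carried out with WGCNA in R; the Python sketch below substitutes SciPy to illustrate the same sequence of operations and may differ from WGCNA in details. It assumes that the dissimilarity is the Euclidean distance between rows of the Pearson correlation matrix, which is one reading of the description above. The function name `effective_markers` and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def effective_markers(expr, gene_names, cut_height=1.0):
    """Return the genes in the largest cluster(s) under a fixed dendrogram cut.

    expr: rows = prior gene markers of one cell type, columns = samples.
    """
    corr = np.corrcoef(expr)                   # Pearson correlation between genes
    dist = pdist(corr, metric="euclidean")     # distance between correlation profiles (assumed reading)
    tree = linkage(dist, method="average")     # average linkage hierarchical clustering
    labels = fcluster(tree, t=cut_height, criterion="distance")  # static cut
    sizes = np.bincount(labels)
    largest = np.flatnonzero(sizes == sizes.max())  # ties: keep every largest cluster
    keep = np.isin(labels, largest)
    return [g for g, k in zip(gene_names, keep) if k]

# Toy data (hypothetical): g1-g4 share one pattern, g5 and g6 do not.
base = np.arange(1.0, 7.0)
alt = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
expr = np.vstack([base, 2 * base + 1, base + 0.1 * alt, 3 * base, base[::-1], alt])
genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(effective_markers(expr, genes))  # ['g1', 'g2', 'g3', 'g4']
```

Keeping every cluster that ties for the largest size mirrors the rule stated above for accepting all largest clusters as effective markers.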

Step 6: To confirm the consistency of the gene markers, we combine the modules obtained in Step 5 for all cell types into a single set and repeat the gene co-expression analysis (Fig. 1A; Fig. 2A; Fig. 3A, where the different cell types are coded with different colors). Intuitively, distinct clustering of the gene markers suggests that they effectively represent the behaviors of distinct cell types. We can quantify this by comparing the number of markers in each cell type’s major cluster with its total number of potential markers.
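One way to express the quantification in Step 6 is as a per-cell-type fraction. The sketch below assumes that the "major cluster" is the cluster containing most of a cell type's markers and that the denominator is the number of that cell type's markers entered into the combined clustering; both are interpretations, and the name `marker_consistency` and the toy labels are hypothetical.

```python
from collections import Counter

def marker_consistency(cluster_labels, cell_types):
    """Fraction of each cell type's markers that fall in its most common cluster."""
    by_type = {}
    for label, cell_type in zip(cluster_labels, cell_types):
        by_type.setdefault(cell_type, []).append(label)
    return {
        cell_type: Counter(labels).most_common(1)[0][1] / len(labels)
        for cell_type, labels in by_type.items()
    }

# Hypothetical cluster assignments for 8 markers from two cell types.
labels = [1, 1, 1, 2, 2, 2, 2, 1]
types = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(marker_consistency(labels, types))  # {'A': 0.75, 'B': 0.75}
```

A value near 1 indicates that a cell type's markers cluster together when all cell types are analyzed jointly.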

Step 7: Based on the effective gene markers of each cell type, various statistical analyses can be performed to identify potential changes in cellularity from the human brain transcriptome data.

For Analysis 1, we aim to determine whether our method detects known changes in cellularity across human brain development, by analyzing the temporal transcriptome data.6 Thus for each cell type we compute an eigengene,32–34 which summarizes the temporal expression of its effective gene markers and is representative of the cell type’s temporal behavior. Explicitly, in each cell type, the temporal expression profile of each gene marker is scaled so that it has mean 0 and variance 1. Combining the standardized expression profiles of all the gene markers into one matrix, the temporal expression profile of the eigengene is defined as the first principal component of the matrix. Compared with an individual gene marker, the eigengene, as a scaled weighted average of gene markers, is less likely to be affected by noise and is therefore a more interpretable summary of the corresponding cell type.
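The eigengene computation described above can be sketched with a singular value decomposition. This is an illustrative NumPy version, not the WGCNA routine: the function name `eigengene`, the genes-by-time-points layout, and the toy values are assumptions. Because the sign of a principal component is arbitrary, the sketch orients it to agree with the average standardized profile, which is a convention assumed here.

```python
import numpy as np

def eigengene(expr):
    """First principal component of row-standardized expression.

    expr: rows = effective gene markers, columns = time points (assumed layout).
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)
    z = centered / expr.std(axis=1, ddof=1, keepdims=True)  # mean 0, variance 1 per gene
    # The first right singular vector is the dominant temporal pattern.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eg = vt[0]
    # Fix the arbitrary sign so the eigengene follows the average profile.
    if np.dot(eg, z.mean(axis=0)) < 0:
        eg = -eg
    return eg

# Toy data (hypothetical): three markers that all rise over five time points.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 11.0],
    [10.0, 11.0, 13.0, 14.0, 15.0],
])
eg = eigengene(expr)
print(np.round(eg, 3))
```

The returned vector has unit length, so only its shape over time, not its absolute scale, should be interpreted.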

For Analysis 2, we aim to predict relevant cellular alterations that may contribute to the human neurodevelopmental disorder autism, by analyzing the human brain transcriptome data of autism samples and control samples.15 Thus for each cell type we compute the expression fold changes35 (autism/control) of the effective gene markers, and then compute the mean and standard deviation of the fold changes, which represent the cellular difference between autism and control. To test the significance of the fold changes’ mean, we perform a paired t test36 of the fold changes of the gene markers compared with the fold changes of genes randomly selected from the “common genes” obtained in Step 1. Each randomly selected “common gene” list is the same size as the gene marker list, and random lists are drawn 1,000 times for 1,000 t tests. The P values from these permuted t tests are averaged on the log scale, and the average is converted back to the normal scale, which we denote as the final P value.
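The testing procedure for Analysis 2 can be sketched as below. This is a minimal illustration under stated assumptions: the function name `permuted_marker_test`, the fixed random seed, and the toy fold changes are hypothetical, and pairing the i-th marker with the i-th randomly drawn gene is one possible reading of the paired test. Averaging log P values and exponentiating is equivalent to taking the geometric mean of the P values.

```python
import numpy as np
from scipy import stats

def permuted_marker_test(marker_fc, background_fc, n_draws=1000, seed=0):
    """Compare marker fold changes with equally sized random gene lists.

    Returns (mean fold change, SD of fold changes, final P value).
    """
    marker_fc = np.asarray(marker_fc, dtype=float)
    background_fc = np.asarray(background_fc, dtype=float)
    rng = np.random.default_rng(seed)
    log_p = np.empty(n_draws)
    for i in range(n_draws):
        # Random list of "common genes" with the same size as the marker list.
        random_fc = rng.choice(background_fc, size=marker_fc.size, replace=False)
        result = stats.ttest_rel(marker_fc, random_fc)  # paired t test
        log_p[i] = np.log(result.pvalue)
    final_p = float(np.exp(log_p.mean()))  # average on the log scale, then convert back
    return float(marker_fc.mean()), float(marker_fc.std(ddof=1)), final_p

# Toy fold changes (hypothetical): markers near 1.3, background near 1.0.
marker_fc = np.linspace(1.2, 1.4, 10)
background_fc = np.linspace(0.9, 1.1, 200)
mean_fc, sd_fc, final_p = permuted_marker_test(marker_fc, background_fc)
print(mean_fc, sd_fc, final_p)
```

A P value of exactly zero would make the log average undefined, so a production version would need to guard against that case.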

Conclusion

Our cell type-specific analysis method provides a novel way to systematically integrate high throughput cellular transcriptome profiling experiments in mouse brain with gene co-expression networks to prioritize and select gene markers of individual cell types. Applying the method to different human brain transcriptome data sets confirms its effectiveness and shows that it is a good foundation for further statistical analysis of these data sets to detect cellular composition and alterations in human tissues. In particular, in this work the method detected temporal cellular changes that roughly correspond to known epochs of human brain development. We also applied the method to human autism cortical transcriptome data and identified potentially relevant cellular alterations that may contribute to the disorder. Validation of these predictions awaits future anatomical studies in post-mortem tissue from individuals with autism, when available. Nevertheless, we believe that our method is generally applicable to human transcriptome data from other disorders of the nervous system whose mechanisms are unknown. Furthermore, where appropriate cell-specific expression profiles exist, we believe this method could be adapted to other tissues as well. We hope that the identification of cellular disruptions will provide insights into the mechanisms of these disorders and help identify cell type-based treatments for them.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Acknowledgment

This work was supported by the National Science Foundation Grant CIF: IHCS-0963742 and the NINDS (4R00NS067239-03 to JDD). We would like to thank Natalia Rivera and Sukrit Singh for their assistance, and Matthew Tso for his helpful comments.

Supplementary Materials

Supplemental materials may be found here: www.landesbioscience.com/journals/systemsbiomedicine/article/25630

References

1. Cahoy JD, Emery B, Kaushal A, Foo LC, Zamanian JL, Christopherson KS, Xing Y, Lubischer JL, Krieg PA, Krupenko SA, et al. A transcriptome database for astrocytes, neurons, and oligodendrocytes: a new resource for understanding brain development and function. J Neurosci 2008; 28:264-78; PMID:18171944; http://dx.doi.org/10.1523/JNEUROSCI.4178-07.2008

2. Verhaak RG, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, et al.; Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010; 17:98-110; PMID:20129251; http://dx.doi.org/10.1016/j.ccr.2009.12.020

3. Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, et al. Genome-wide atlas of gene expression in the adult mouse brain. Nature 2007; 445:168-76; PMID:17151600; http://dx.doi.org/10.1038/nature05453

4. Doyle JP, Dougherty JD, Heiman M, Schmidt EF, Stevens TR, Ma G, Bupp S, Shrestha P, Shah RD, Doughty ML, et al. Application of a translational profiling approach for the comparative analysis of CNS cell types. Cell 2008; 135:749-62; PMID:19013282; http://dx.doi.org/10.1016/j.cell.2008.10.029

5. Dougherty JD, Fomchenko EI, Akuffo AA, Schmidt E, Helmy KY, Bazzoli E, Brennan CW, Holland EC, Milosevic A. Candidate pathways for promoting differentiation or quiescence of oligodendrocyte progenitor-like cells in glioma. Cancer Res 2012; 72:4856-68; PMID:22865458; http://dx.doi.org/10.1158/0008-5472.CAN-11-2632


6. Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M, Sousa AM, Pletikos M, Meyer KA, Sedmak G, et al. Spatio-temporal transcriptome of the human brain. Nature 2011; 478:483-9; PMID:22031440; http://dx.doi.org/10.1038/nature10523

7. Oldham MC, Konopka G, Iwamoto K, Langfelder P, Kato T, Horvath S, Geschwind DH. Functional organization of the transcriptome in human brain. Nat Neurosci 2008; 11:1271-82; PMID:18849986; http://dx.doi.org/10.1038/nn.2207

8. Oldham MC, Horvath S, Geschwind DH. Conservation and evolution of gene coexpression networks in human and chimpanzee brains. Proc Natl Acad Sci U S A 2006; 103:17973-8; PMID:17101986; http://dx.doi.org/10.1073/pnas.0605938103

9. Shen-Orr SS, Tibshirani R, Khatri P, Bodian DL, Staedtler F, Perry NM, Hastie T, Sarwal MM, Davis MM, Butte AJ. Cell type-specific gene expression differences in complex tissues. Nat Methods 2010; 7:287-9; PMID:20208531; http://dx.doi.org/10.1038/nmeth.1439

10. Dorazio RM, Royle JA. Mixture models for estimating the size of a closed population when capture rates vary among individuals. Biometrics 2003; 59:351-64; PMID:12926720; http://dx.doi.org/10.1111/1541-0420.00042

11. Wang M, Master SR, Chodosh LA. Computational expression deconvolution in a complex mammalian organ. BMC Bioinformatics 2006; 7:328; PMID:16817968; http://dx.doi.org/10.1186/1471-2105-7-328

12. Kuhn A, Thu D, Waldvogel HJ, Faull RL, Luthi-Carter R. Population-specific expression analysis (PSEA) reveals molecular changes in diseased brain. Nat Methods 2011; 8:945-7; PMID:21983921; http://dx.doi.org/10.1038/nmeth.1710

13. Xu X, Nehorai A, Dougherty J. Cell type specific analysis of human transcriptome data. Genomic Signal Processing and Statistics, (GENSIPS), 2012 IEEE International Workshop on. Washington, DC, USA, 2012:99-100; http://dx.doi.org/10.1109/GENSIPS.2012.6507737

14. Dougherty JD, Schmidt EF, Nakajima M, Heintz N. Analytical approaches to RNA profiling data for the identification of genes enriched in specific cells. Nucleic Acids Res 2010; 38:4218-30; PMID:20308160; http://dx.doi.org/10.1093/nar/gkq130

15. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S, Mill J, Cantor RM, Blencowe BJ, Geschwind DH. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 2011; 474:380-4; PMID:21614001; http://dx.doi.org/10.1038/nature10110

16. Morgane PJ, Mokler DJ, Galler JR. Effects of prenatal protein malnutrition on the hippocampal formation. Neurosci Biobehav Rev 2002; 26:471-83; PMID:12204193; http://dx.doi.org/10.1016/S0149-7634(02)00012-X

17. Sauvageot CM, Stiles CD. Molecular mechanisms controlling cortical gliogenesis. Curr Opin Neurobiol 2002; 12:244-9; PMID:12049929; http://dx.doi.org/10.1016/S0959-4388(02)00322-7

18. McKay BE, Turner RW. Physiological and morphological development of the rat cerebellar Purkinje cell. J Physiol 2005; 567:829-50; PMID:16002452; http://dx.doi.org/10.1113/jphysiol.2005.089383

19. Mori T, Buffo A, Götz M. The novel roles of glial cells revisited: the contribution of radial glia and astrocytes to neurogenesis. Curr Top Dev Biol 2005; 69:67-99; PMID:16243597; http://dx.doi.org/10.1016/S0070-2153(05)69004-7

20. Carletti B, Rossi F. Neurogenesis in the cerebellum. Neuroscientist 2008; 14:91-100; PMID:17911211; http://dx.doi.org/10.1177/1073858407304629

21. Malatesta P, Appolloni I, Calzolari F. Radial glia and neural stem cells. Cell Tissue Res 2008; 331:165-78; PMID:17846796; http://dx.doi.org/10.1007/s00441-007-0481-8

22. Reillo I, Borrell V. Germinal zones in the developing cerebral cortex of ferret: ontogeny, cell cycle kinetics, and diversity of progenitors. Cereb Cortex 2012; 22:2039-54; PMID:21988826; http://dx.doi.org/10.1093/cercor/bhr284

23. Altman J. Autoradiographic and histological studies of postnatal neurogenesis. 3. Dating the time of production and onset of differentiation of cerebellar microneurons in rats. J Comp Neurol 1969; 136:269-93; PMID:5788129; http://dx.doi.org/10.1002/cne.901360303

24. Altman J, Bayer SA. Development of the cranial nerve ganglia and related nuclei in the rat. Adv Anat Embryol Cell Biol 1982; 74:1-90; PMID:7090875; http://dx.doi.org/10.1007/978-3-642-68479-1_1

25. Schmechel DE, Rakic P. A Golgi study of radial glial cells in developing monkey telencephalon: morphogenesis and transformation into astrocytes. Anat Embryol (Berl) 1979; 156:115-52; PMID:111580; http://dx.doi.org/10.1007/BF00300010

26. Doetsch F, Caillé I, Lim DA, García-Verdugo JM, Alvarez-Buylla A. Subventricular zone astrocytes are neural stem cells in the adult mammalian brain. Cell 1999; 97:703-16; PMID:10380923; http://dx.doi.org/10.1016/S0092-8674(00)80783-7

27. Dai M, Wang P, Boyd AD, Kostov G, Athey B, Jones EG, Bunney WE, Myers RM, Speed TP, Akil H, et al. Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic Acids Res 2005; 33:e175; PMID:16284200; http://dx.doi.org/10.1093/nar/gni179

28. Liao BY, Zhang J. Evolutionary conservation of expression profiles between human and mouse orthologous genes. Mol Biol Evol 2006; 23:530-40; PMID:16280543; http://dx.doi.org/10.1093/molbev/msj054

29. Hendricks WA, Robey KW. The Sampling Distribution of the Coefficient of Variation. Ann Math Stat 1936; 7:4; http://dx.doi.org/10.1214/aoms/1177732503

30. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008; 9:559; PMID:19114008; http://dx.doi.org/10.1186/1471-2105-9-559

31. Hastie T, Tibshirani R, Friedman J. 14.3.12 Hierarchical clustering. The Elements of Statistical Learning (2nd ed) 2009:9.

32. Langfelder P, Horvath S. Eigengene networks for studying the relationships between co-expression modules. BMC Syst Biol 2007; 1:54; PMID:18031580; http://dx.doi.org/10.1186/1752-0509-1-54

33. Langfelder P, Castellani LW, Zhou Z, Paul E, Davis R, Schadt EE, Lusis AJ, Horvath S, Mehrabian M. A systems genetic analysis of high density lipoprotein metabolism and network preservation across mouse models. Biochim Biophys Acta 2012; 1821:435-47; PMID:21807117; http://dx.doi.org/10.1016/j.bbalip.2011.07.014

34. Chen C, Cheng L, Grennan K, Pibiri F, Zhang C, Badner JA, Gershon ES, Liu C; Members of the Bipolar Disorder Genome Study (BiGS) Consortium. Two gene co-expression modules differentiate psychotics and controls. Mol Psychiatry 2012; PMID:23147385; http://dx.doi.org/10.1038/mp.2012.146

35. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 2001; 98:5116-21; PMID:11309499; http://dx.doi.org/10.1073/pnas.091062498

36. Zimmerman DW. A note on interpretation of the paired-samples t test. J Educ Behav Stat 1997; 22:349-60