www.sciencemag.org/cgi/content/full/318/5851/801/DC1
Supporting Online Material for
A High-Resolution Root Spatiotemporal Map Reveals Dominant Expression Patterns
Siobhan M. Brady, David A. Orlando, Ji-Young Lee, Jean Y. Wang, Jeremy Koch, José R. Dinneny, Daniel Mace, Uwe Ohler, Philip N. Benfey*
*To whom correspondence should be addressed. E-mail: [email protected]
Published 2 November 2007, Science 318, 801 (2007)
DOI: 10.1126/science.1146265
This PDF file includes:
Materials and Methods
SOM Text
Figs. S1 to S21
References
Other Supporting Online Material for this manuscript includes the following (available at www.sciencemag.org/cgi/content/full/318/5851/801/DC1):
Tables S1 to S12 and Folders S1 to S6 have been compressed using Winzip. Tables are Excel files. Folders include data sets, text, images, and Excel files. All are described below.
Table S1. A description of root tissues, cell types, GFP marker lines and their cell type coverage, and marker line abbreviations used in all figures.
Table S2. (A) Genes enriched in each cell type. (B) Queries used to determine which genes are enriched in each cell type.
Table S3. GO term, cis element, and array annotation enrichment summary for each cell type.
Table S4. Radial and longitudinal patterns. Worksheets contain both raw and transformed values.
Table S5. Radial pattern membership.
Table S6. Longitudinal pattern membership.
Table S7. Radial pattern GO term enrichment.
Table S8. Radial pattern array annotation enrichment.
Table S9. Longitudinal pattern GO term enrichment.
Table S10. Longitudinal pattern array annotation enrichment.
Table S11. Percentage co-regulation of gene expression in the second root replicate for each cluster. Cells indicated in red are patterns where probe sets show potential phasing of gene expression.
Table S12. The top 50% of varying probe sets in the (A) radial data set and (B) longitudinal data set.
Table S13. Correlations between longitudinal section microarray chips.
Table S14. AGI loci identifiers and TAIR descriptions for all genes mentioned in the text and supplementary online material.
Folder S1. Radial pattern images (jpeg files) and AGI IDs corresponding with Gene Ontology (GO) and array biological enrichment analysis (text files) for cell type and radial pattern lists. AGI IDs corresponding with enriched transcription families for cell types (text files).
Folder S2. Longitudinal patterns represented in heat map and line graph images (jpeg files) and AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for longitudinal pattern lists.
Folder S3. Clustering heat map images (jpeg files) of probe sets assigned to a pattern in root 1 and their co-regulation in root 2.
Folder S4. Matrix heat map images (jpeg files), probe set, AGI ID and TAIR annotation lists (text files) of the intersections between radial and longitudinal patterns. AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for intersection lists.
Folder S5. Matrix images (jpeg files), probe set, AGI ID and TAIR annotation lists (text files) of the binary clusters. AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for intersection lists.
Folder S6. Files used for biological enrichment analysis. Genes on ATH1 Affymetrix microarray chip (text file), all singleton Affy ID–AGI ID pairs on the Affymetrix ATH1 microarray chip (text file), all Gene Ontology terms by AGI ID by chromosome (Excel spreadsheet), AGI IDs associated with genes annotated as transcription factors in two out of three transcription factor databases (text file).
Materials and Methods
Plant lines and growth conditions. Arabidopsis thaliana lines in the Columbia (Col-0) (COBL9,
S17, S32, S4, SUC2) and C24 (JO121, J2501, RM1000) ecotypes were used for the microarray
analysis. Plants to be used for microarray analysis were plated on nylon mesh as in (1). All plants
were grown vertically on 1X Murashige and Skoog salt mixture, 1% sucrose, and 2.3 mM
2-(N-morpholino)ethanesulfonic acid (pH 5.8) in 1% agar. Roots were cut approximately ¾ of the way up
and then treated with protoplasting solution as described previously (2, 3). The S17, S32 and S4
lines are described in (4). COBL9, SUC2 and JO121 are described in (5-7), respectively. J2501 is an
enhancer trap from the Jim Haseloff collection (http://www.plantsci.cam.ac.uk/Haseloff/Home.html)
and the microarray .CEL files from this line were a gift from Ken Birnbaum. The RM1000 line was a
gift from Laurent Nussaume. All tissues sampled for the radial data set were 5 to 6 days old. Roots
sampled for the longitudinal data set were 7 days old and from the Col-0 ecotype. Transcriptional
GFP fusion lines (AT4G23790, AT5G14750, AT4G05170, AT5G60200, AT3G05150, AT3G29035,
AT3G43430, AT3G15500) in the Col-0 ecotype are described in (4) and were prepared for
microscopy at 5 days of age. Seeds were sterilized as in (4).
Microarray data acquisition. Two to three biological replicates were performed for each marker line
in the radial data set. GFP-expressing cells from roots of marker lines were sorted according to (3).
Cells marked in each of these marker lines are described in Supplementary Table S1. The sorted cells
were frozen immediately after collection. Samples for the longitudinal data set were dissected in the
following steps. The section marked columella was taken from the most extreme point of the root tip.
Six sections of approximately equivalent size were dissected from the meristematic zone. Two
sections of approximately equivalent size were dissected from the elongation zone where cells
transition from being optically dense to optically transparent as they begin to elongate (3). The first of
four approximately equivalently sized sections in the maturation zone was dissected at the point of
visible root hair elongation. The final section was taken approximately midway along the root
longitudinal axis. Sections were collected into RNA extraction buffer and immediately frozen. Total
RNA was isolated from the frozen material using the Qiagen RNeasy kit (Valencia, California, United
States). RNA probes were labeled using the GeneChip® Eukaryotic Small Sample Target Labeling
Assay Version II and hybridized on the Affymetrix ATH1 GeneChip.
Normalization and mixed-model analysis. Mixed-model software used to globally normalize all
arrays and to identify differentially expressed probe sets is described in (1). The output of this
software is 2^residuals for each probe set by treatment. Marker lines or longitudinal sections are
considered a treatment. Normalization across all chips was performed as follows:
$$\log_2(\mathrm{Intensity}_{\mathrm{probe},\,\mathrm{array}}) = \text{logarithmic mean}_{\text{all array intensity values}} + \mathrm{residual}_{\mathrm{probe},\,\mathrm{array}}$$
For each array, all probes that were more than two standard deviations from the
probe-set mean were removed. For all remaining probe values across replicates in a treatment,
expression indices were calculated as:
$$\frac{2^{r_1} + 2^{r_2} + \cdots + 2^{r_N}}{N}$$
In the radial data set, these expression indices were then used to calculate p- and q-values for
pairwise comparisons of all probe sets across all treatments. R² values for all new CEL files are as
follows (COBL9: 1vs.2(0.97); JO121: 1vs.2(0.98), 1vs.3(0.98), 2vs.3(0.97); J2501: 1vs.2(0.96),
1vs.3(0.97), 2vs.3(0.96); RM1000: 1vs.2(0.98); S4: 1vs.2(0.95), 1vs.3(0.95), 2vs.3(0.99); S17:
1vs.2(0.99), 1vs.3(0.97), 2vs.3(0.98); SUC2: 1vs.2(0.97), 1vs.3(0.96), 2vs.3(0.97); S32: 1vs.2(0.93),
1vs.3(0.97), 2vs.3(0.94)). Longitudinal section correlations are found in Table S13.
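The expression index calculation can be illustrated with a minimal Python sketch. It removes probe values lying more than two standard deviations from the probe-set mean and then averages 2^residual over the remaining values. The function names, the input layout, and the use of the population standard deviation are assumptions made for illustration; the analysis itself used the mixed-model software described in (1).

```python
from statistics import mean, pstdev


def drop_outlier_probes(values, n_sd=2.0):
    """Keep values within n_sd standard deviations of the probe-set mean.

    Assumption: population standard deviation; the source does not say which
    estimator was used.
    """
    mu = mean(values)
    sd = pstdev(values)
    return [v for v in values if abs(v - mu) <= n_sd * sd]


def expression_index(residuals):
    """Average of 2**r over the retained log2 residuals r_1..r_N."""
    if not residuals:
        raise ValueError("no residuals left after outlier removal")
    return sum(2.0 ** r for r in residuals) / len(residuals)


if __name__ == "__main__":
    kept = drop_outlier_probes([0.0, 1.0, -1.0])
    print(expression_index(kept))
```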
Cell type-enrichment analysis. All Affy probe set to AGI ID assignments were done using the
Affy_ATH1_array_elements-2006-01-06.txt file downloaded from TAIR (www.arabidopsis.org). A
gene was determined to be enriched in a cell type if it was 1.2-fold enriched with a q-value of less
than 0.001 compared to all other non-overlapping cell types. These queries are described in
Supplementary Table S2. The intersection of genes common to this set of queries was then obtained.
Furthermore, a gene was also determined to be enriched in a cell type if it was 2-fold enriched with no
p-value or q-value threshold, as this criterion was determined to identify valid in vivo cell type expression
enrichment in (4).
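The two enrichment criteria can be sketched as a single predicate in Python. The sketch assumes that each gene comes with a list of (fold change, q-value) pairs, one per comparison against a non-overlapping cell type, and that the 2-fold criterion is also applied to every comparison; the function name and data layout are hypothetical.

```python
def enriched_in_cell_type(comparisons, min_fold=1.2, max_q=0.001, relaxed_fold=2.0):
    """Return True if a gene passes either enrichment criterion.

    comparisons: list of (fold_change, q_value) tuples, one for each
    non-overlapping cell type the gene is compared against (assumed layout).
    """
    if not comparisons:
        return False
    # Criterion 1: at least 1.2-fold with q < 0.001 in every comparison
    # (the intersection of all queries).
    strict = all(fc >= min_fold and q < max_q for fc, q in comparisons)
    # Criterion 2: at least 2-fold in every comparison, no q-value threshold.
    relaxed = all(fc >= relaxed_fold for fc, _ in comparisons)
    return strict or relaxed
```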
Biological Enrichment Analysis. The enrichment software was written in Java, and p-values were calculated from
the hypergeometric distribution using the Apache Commons Math library
(http://jakarta.apache.org/commons/math/). The software can be obtained by contacting the
corresponding author. This software tests for enrichment of Gene Ontology terms (obtained from
TAIR – 12_09_06), array annotation terms and transcription factor (TF) families specific to microarray
analysis using the Affymetrix ATH1 chip. As it is
difficult to objectively identify which gene model information to use for each AGI ID, we annotated a
GO term to an AGI ID if it appeared once for any gene model. We identified groups of genes that
were enriched in various biological processes based on microarray analysis in the literature. These
processes include root hair morphogenesis (8), primary cell wall biosynthesis (9), secondary cell wall
biosynthesis (9, 10), lateral root initiation (11), M-phase or S-phase of mitosis (12), and hormone-
specific responses (13). Genes were associated with these “array annotation” terms if they were
present in these lists. TFs belonging to a family were identified if they were considered to belong to a
family in 2 of 3 databases (AGRIS, RIKEN, DATF). We tested for enrichment within a query list when
compared to an ATH1 background. Two ATH1 background models were used. The first is all AGI
IDs found on the ATH1 chip using the most recent Affymetrix probe set ID (Affy) to AGI ID annotation
file (Affy_ATH1_array_elements-2006-01-06.txt). Many single Affy IDs are matched to multiple AGI
IDs. These occurrences of closely related genes in terms of sequence can produce confounding
results in enrichment analysis. We therefore used a second background model, called the singleton
model that considers only single Affy ID – AGI ID annotations. Term presence on query lists
(obtained from cell type-enrichment analysis, clustering analysis, intersection and binary intersection
analysis) is counted and compared to term presence on the singleton ATH1 chip. A p-value is
calculated using the hypergeometric distribution. Enrichment was considered significant if the
p-value was less than 10^-3. Term tables, ATH1 background and singleton ATH1 background
lists are found in Supplementary Folder S6. q-values were calculated using John Storey’s False
Discovery Rate method, also known as the q-value (14). p-value enrichment for each of these
categories was displayed on heat maps using TMEV3 software
(http://www.tm4.org/scgi-bin/getprogram.cgi?program=mev_old).
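As a worked illustration of the enrichment test, the Python sketch below computes an upper-tail hypergeometric p-value for a term found on a query list relative to a background. It is not the Java software described above: the function name and argument names are hypothetical, and the use of the upper tail (probability of observing at least as many annotated genes as found) is an assumption.

```python
from math import comb


def hypergeom_enrichment_p(pop_size, pop_hits, sample_size, sample_hits):
    """P(X >= sample_hits) for X ~ Hypergeometric(pop_size, pop_hits, sample_size).

    pop_size:    genes in the background (e.g. the singleton ATH1 list)
    pop_hits:    background genes annotated with the term
    sample_size: genes in the query list
    sample_hits: query genes annotated with the term
    """
    upper = min(pop_hits, sample_size)
    total = comb(pop_size, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    p = hypergeom_enrichment_p(pop_size=1000, pop_hits=20, sample_size=50, sample_hits=8)
    print(p, p < 1e-3)
```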
Finding Dominant Differential Expression Patterns:
A computational pipeline, summarized in Fig S17, was developed to find distinct dominant
differential expression patterns from the radial or longitudinal input data. The input data was filtered
before analysis to remove probe sets that do not display differential behavior across the root.
Probe sets which are not expressed in the root (no expression above 1.0 in any sample which is
equivalent to between 75 and 100 of MAS5 normalized values) were removed. This filter removed
7266 and 3881 of 22746 probe sets in the radial and longitudinal data sets, respectively. We assume
that the dominant differential expression patterns should have a set of strong exemplars that would
appear in at least the top 50% of varying probe sets. The remaining probe sets were ranked by their
variance across the samples in the data set, and the top 50% were retained for the radial analysis
resulting in a final input set of 7740 probe sets (Table S12). This variance filtering removes probe sets
which do not vary significantly and thus would not provide information about differentially expressed
patterns and also significantly reduces the computational load for the clustering step. For the
longitudinal input data set, each probe was ranked independently in each replicate, and the union of
the top 50% was used, resulting in a final data set of 10830 probe sets (Table S12).
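The expression and variance filters for the radial data set can be sketched in Python as follows. The dictionary layout, the function name, and the use of the population variance are assumptions for illustration only. The sketch covers the single-ranking case; the longitudinal procedure additionally takes the union of the per-replicate rankings.

```python
from statistics import pvariance


def filter_probe_sets(expr, min_expr=1.0, keep_fraction=0.5):
    """Return probe-set IDs that are expressed and among the most variable.

    expr: dict mapping probe-set ID -> list of expression values across samples
    (assumed layout).
    """
    # Drop probe sets with no expression above min_expr in any sample.
    expressed = {p: v for p, v in expr.items() if max(v) > min_expr}
    # Rank the remainder by variance across samples, highest first.
    ranked = sorted(expressed, key=lambda p: pvariance(expressed[p]), reverse=True)
    return ranked[: int(len(ranked) * keep_fraction)]


if __name__ == "__main__":
    # Consistency check on the counts reported for the radial data set.
    print(int((22746 - 7266) * 0.5))
```

With the radial counts given above, 22746 - 7266 = 15480 probe sets pass the expression filter, and half of these gives the 7740 probe sets reported.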
These filtered data sets were then clustered using the fuzzy K-means implementation FANNY
[http://cran.r-project.org/doc/packages/cluster.pdf] (15), with the distance between two probe sets
defined as (1-r)/2 where r is the Pearson correlation between the two probe sets and using a K=60.
Other larger choices of K were tested on the radial data set and it was found that the patterns
recovered by K=60 were also recovered by these other K choices, suggesting that K=60 is large
enough to capture the major dominant patterns. Clusters for the 10,830 longitudinal probe sets were
built from the data of root 1 only and root 2 was used to assess the robustness of the identified
patterns (see below).
The result of the fuzzy K-means clustering is a matrix containing the probability of each
probe set's belonging to each cluster, which is used to determine an initial set of patterns. The initial set
of patterns was generated by taking the median profile of all probe sets assigned to a cluster at or
above a probability of 0.4 (m-value). Using the probabilities as opposed to hard cluster assignment
allows for probe sets which do not fit well to any cluster to be removed (9.7% of probe sets for the
radial clustering and 28% of probe sets for the longitudinal clustering). Additionally the probabilities
allow for a probe set that fits well to two different clusters to be used to build both of these patterns
(10% of probe sets for the radial clustering and 2.5% of probe sets for the longitudinal clustering). The
probability cutoff of 0.4 was chosen because it is the point at which the average probe set is assigned
to ~1.0 cluster in the radial clustering output (Figure S18). To ensure that our final set of patterns is
distinct from one another, we hierarchically clustered (single linkage) the set of initial patterns, where
the distance between patterns was 1-r (r = Pearson correlation coefficient). The resulting tree was
then cut at a height of 0.1, corresponding to r=0.9, and leaves (patterns) on the same branch were
collapsed into a new pattern by taking the median profile of all probe sets belonging to either pattern.
The end result of this pipeline was a set of 51 radial and 40 longitudinal dominant differential
expression patterns.
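The construction of an initial pattern from the fuzzy membership matrix can be sketched in Python as below. The sketch takes membership probabilities as given (it does not reimplement FANNY), and the function name and data layout are hypothetical. It shows how a probe set whose probability is at or above 0.4 for two clusters contributes to both patterns, while a probe set below the cutoff everywhere contributes to none.

```python
from statistics import median


def pattern_from_memberships(profiles, memberships, cluster, cutoff=0.4):
    """Median profile of all probe sets with membership >= cutoff in one cluster.

    profiles:    dict probe-set ID -> expression profile (list of floats)
    memberships: dict probe-set ID -> list of membership probabilities,
                 one per cluster (assumed layout of the fuzzy clustering output)
    """
    members = [p for p, m in memberships.items() if m[cluster] >= cutoff]
    if not members:
        return None
    n_points = len(profiles[members[0]])
    return [median(profiles[p][i] for p in members) for i in range(n_points)]
```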
Distribution of Dominant Pattern Enrichment for the Radial and Longitudinal Data Sets
We wanted to assess, for the radial and longitudinal pattern collection, how expression was
distributed over cell types or longitudinal sections. In the case of the radial data set, we wanted to
determine the distribution of patterns with respect to single or multiple cell types. Since the
longitudinal axis is a developmental time continuum, we analyzed the patterns for sections of
continuous enrichment.
RADIAL: For each pattern profile, the radial median raw values in non-overlapping cell types were
visually inspected for relative high expression. The number of non-overlapping cell types where
expression was high was scored.
LONGITUDINAL: We were only interested in relative increases of expression across all patterns, and
the raw values were therefore not suitable. For each pattern profile, the longitudinal median
transformed values were ranked by experiment. The highest ranked value was identified as an
enrichment peak. To identify the number of sections that a peak covered, the number of positive
contiguous section values associated with the expression peak was counted.
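The longitudinal peak scoring can be illustrated with a short Python sketch that finds the highest transformed value in a pattern and counts the contiguous run of positive values around it. The function name and the return format are hypothetical, and ties for the highest value are resolved by taking the first occurrence, which is an assumption.

```python
def peak_width(values):
    """Return (peak index, number of contiguous positive sections around it).

    values: median transformed (relative) values of one pattern, ordered
    along the longitudinal axis.
    """
    peak = max(range(len(values)), key=lambda i: values[i])
    if values[peak] <= 0:
        return peak, 0
    left = peak
    while left > 0 and values[left - 1] > 0:
        left -= 1
    right = peak
    while right < len(values) - 1 and values[right + 1] > 0:
        right += 1
    return peak, right - left + 1
```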
Assignment of Probe Sets to Patterns
To assess the potential biological roles for the identified patterns, probe sets were assigned to
the patterns. A probe set was assigned to a pattern if the Pearson correlation between the probe set’s
expression pattern and the identified dominant expression pattern was greater than or equal to 0.85.
The high value of 0.85 was chosen so that only strongly correlated probe sets would be assigned to
the patterns. One potential problem in using Pearson correlation is that in cases where probe sets
have relatively low or invariant expression, spuriously high correlation values can be obtained. To
avoid this problem, we only assigned probe sets which are expressed and varying in the root, which is
the set of 7740 and 10830 radial and longitudinal probe sets from above. Probe-set lists were built for
each of the 51 radial and 40 longitudinal patterns. The average number of probe sets assigned to a
radial pattern was 129.8, with a range from 14 to 529 (Figure S8, S19). The average number of
probe sets assigned to a longitudinal pattern was 302.4, with a range from 23 to 1272 (Figure S8,
S19). The average number of patterns a probe set was assigned to for the radial data set was 0.85,
and for the longitudinal data set was 1.12 (Fig. S8). Genes were then mapped to these probe sets
using the Affy_ATH1_array_elements-2006-01-06.txt naming file (Table S5, S6) and these lists were
analyzed for potential biological significance (Table S7-S11).
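The assignment rule can be sketched in Python as follows: each filtered probe set is assigned to every pattern with which its Pearson correlation is at least 0.85, so a probe set may be assigned to zero, one, or several patterns. The function names and dictionary layout are hypothetical, and returning a correlation of 0 for an invariant profile is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: an invariant profile is treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def assign_to_patterns(probe_profiles, patterns, min_r=0.85):
    """Map each probe-set ID to the names of all patterns with r >= min_r."""
    return {
        probe: [name for name, pat in patterns.items() if pearson(prof, pat) >= min_r]
        for probe, prof in probe_profiles.items()
    }
```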
Comparing unsupervised patterns to cell type-specific lists
We predicted that the radial patterns we identify using our unsupervised approach should
contain the cell type-enriched probe sets we previously identified in our supervised approach. To
assess how well the probe sets assigned to the 51 radial patterns related to the cell type-enriched
genes, we compared the lists for each of the patterns from the two methods. For the majority of
cases, there is a clear overlap between one of the cell type-specific pattern lists and one of the lists
from the 51 unsupervised patterns (Fig. S20), suggesting that the unsupervised approach can identify
these cell type-specific patterns as well as other patterns.
Comparing expression across root replicates
Comparing between replicates in the longitudinal data set is non-trivial because each replicate was
gathered from independent single roots, and the individual sections from each root could not be taken
from precisely the same places with the same sizes. Thus, there is no obvious choice of similarity
measure to perform a time-series alignment of sections across replicates and assess for variability
between roots.
To assess the reproducibility of expression across the two different root replicates, we
calculated the root mean squared deviation (RMSD) between the expression curves from root 1
(excluding the columella section) and root 2 for all probe sets in our 10,830 set. When the distribution
of these true pair RMSD’s was compared against the random set generated by calculating the RMSD
for each probe set in root 1 (excluding the columella section) to 10 random probe sets in root 2, the
distributions were clearly different indicating that the expression pattern of a probe set between roots
is clearly not random (Fig. S10).
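The comparison of true-pair and random-pair RMSD values can be sketched in Python as below. The sketch assumes that the two expression curves for each probe set have already been brought to the same length (root 1 without the columella section) and normalized; the function names, the seeded random generator, and the exclusion of a probe set from its own random partners are assumptions.

```python
import random
from math import sqrt


def rmsd(x, y):
    """Root mean squared deviation between two equal-length curves."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def true_and_random_rmsd(root1, root2, n_random=10, seed=0):
    """Return (true-pair RMSDs, random-pair RMSDs).

    root1, root2: dicts mapping probe-set ID -> expression curve in each root.
    """
    rng = random.Random(seed)
    ids = list(root1)
    true_pairs = [rmsd(root1[p], root2[p]) for p in ids]
    random_pairs = []
    for p in ids:
        others = [q for q in ids if q != p]
        # Compare each root 1 curve with up to n_random other probe sets in root 2.
        for q in rng.sample(others, min(n_random, len(others))):
            random_pairs.append(rmsd(root1[p], root2[q]))
    return true_pairs, random_pairs
```

The two returned lists correspond to the two distributions that are compared; plotting them as histograms would give a figure of the kind shown in Fig. S10.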
Data from the second root was also used to test how robust the identified dominant expression
patterns were across roots. Since probe sets were assigned to a pattern on the basis of being highly
Pearson correlated to a particular pattern, and thus appear co-regulated in root 1, we asked how well
these same genes were co-regulated in the root 2 data set. The probe sets assigned to each pattern
from root 1 were hierarchically clustered with the distance between two probe sets equal to 1-r, where
r is the Pearson correlation of those two probe sets’ expression in root 2. The resulting tree was cut
into groups at a height of 0.5, corresponding to a Pearson correlation value of 0.5. This value
corresponds to the average inter-probe set correlation for probe sets assigned to a pattern from root 1.
As a measure of co-expression, we used the size of the largest group selected by the clustering as a
percentage of the size of the entire probe set. We found that 19/40 of the patterns show 90% or
greater co-regulation and 30/40 show at least 60% co-regulation in the second root (Fig. S12). This
strongly suggests that the probe sets we generated through assignment to the identified patterns are
actually sets of genes that are biologically co-regulated and that the patterns may have a biological
significance.
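The co-regulation measure can be illustrated with the Python sketch below, which groups the probe sets of one pattern by their root 2 correlation and reports the size of the largest group as a fraction of the pattern. The linkage method for this step is not stated above, so the sketch assumes single linkage, for which cutting the tree at height 0.5 is equivalent to joining any two probe sets whose distance 1 - r is at most 0.5. All function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if one is invariant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / sqrt(sxx * syy)


def largest_coregulated_fraction(root2_profiles, cut_height=0.5):
    """Fraction of a pattern's probe sets in the largest group after the cut.

    Assumption: single-linkage clustering, implemented as connected components
    of the graph linking probe sets with distance 1 - r <= cut_height.
    root2_profiles must be a non-empty dict of probe-set ID -> root 2 profile.
    """
    ids = list(root2_profiles)
    parent = {p: p for p in ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if 1.0 - pearson(root2_profiles[a], root2_profiles[b]) <= cut_height:
                parent[find(a)] = find(b)

    sizes = {}
    for p in ids:
        root = find(p)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(ids)
```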
Intersecting radial and longitudinal patterns (and visualization):
The 51 radial patterns describe dominant differential expression across root cell types
independent of longitudinal section and the 40 longitudinal patterns describe dominant differential
expression across longitudinal sections independent of cell type. In fact, the expression of genes may
be dependent upon both longitudinal section and cell type. To visualize the potential cell type and
longitudinal section-dependent expression patterns in terms of high relative expression we developed
a method for intersecting identified patterns to produce a relative visual expression map across both
cell type and longitudinal section. Radial and longitudinal patterns were intersected by adding the
individual log2 normalized values of each pattern, in cases where both values were positive or 0. This
conditional adding constrains the area in which expression can be present. This was based on the
following rationale: if a radial pattern shows no or very low expression in a specific cell type, then the
genes associated with that pattern are unlikely to be expressed in a longitudinal section that contains
this cell type. Similarly, if a pattern shows high expression in a specific tissue then expression must
be present in at least one longitudinal section which contains that cell type. A conditional add ensures
that both these cases will be true. A standard addition could place expression in a cell type and
longitudinal cross section where one data set was already specified to have low or no expression,
violating the first case described.
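The conditional add can be sketched in Python as follows. For every combination of cell type and longitudinal section, the two log2 normalized values are summed only when both are zero or positive. The value placed in cells that fail the condition is not stated above; the sketch assumes 0.0, and the function name is hypothetical.

```python
def conditional_add(radial, longitudinal, fill=0.0):
    """Intersect one radial and one longitudinal pattern.

    radial:       log2 normalized values, one per cell type
    longitudinal: log2 normalized values, one per longitudinal section
    Returns a matrix indexed as [cell type][section]. A cell holds the sum of
    the two values when both are >= 0, otherwise `fill` (assumed to be 0.0).
    """
    return [
        [r + l if r >= 0 and l >= 0 else fill for l in longitudinal]
        for r in radial
    ]


if __name__ == "__main__":
    for row in conditional_add([1.0, -0.5, 0.0], [2.0, -1.0]):
        print(row)
```

A plain sum would place a value of 1.5 in the cell combining a radial value of -0.5 with a longitudinal value of 2.0, which is the case the conditional add is designed to exclude.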
Probe-set lists were generated by intersecting the probe set lists for the radial and longitudinal
patterns. Relative expression heat maps for the 221 intersections containing at least 8 probe sets were
generated using all marker lines from the radial data set and all longitudinal sections.
We also created a heatmap of expression mapped onto a 3-dimensional Arabidopsis root using a
label or atlas image representing cell type and longitudinal section. We recomputed the radial pattern
accounting for cases where marker lines overlapped. The recomputed pattern was determined by a
weighted sum of all overlapping regions. The 3D root heatmap was then generated by intersecting
the recomputed radial patterns with the previously described longitudinal patterns using the same
conditional add. Intensity values were contrast enhanced in the maximum expressed region.
Identifying Binary Clusters
To assess the spatiotemporal patterns of absolute gene expression, as opposed to the
relative patterns analyzed earlier, we chose to intersect a binary representation of the
ANOVA normalized data. ANOVA normalized values were converted into a binary
representation by mapping all expression values less than 1.0 to 0, and all values
greater than or equal to 1.0 to 1. Any probe set which did not have any expression values
above 1.0 in either the radial or longitudinal data sets was removed before intersection.
Each radial binary vector was intersected with each longitudinal vector using the AND
operator. The binary heatmaps show the individual radial and longitudinal binary patterns on the
extreme axis, with red indicating presence in the cell type or section, and dark blue
indicating absence. These patterns are intersected using the AND operator to produce a
heat map indicating where expression should be present (yellow) or absent (black). 158 unique
spatiotemporal patterns, with at least 5 probe set members, were found.
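The binary intersection can be sketched in Python as below: values are thresholded at 1.0 and the radial and longitudinal binary vectors are combined with the AND operator for every cell type and section pair. The function names and the matrix layout are hypothetical.

```python
def binarize(values, threshold=1.0):
    """Map values below the threshold to 0 and values at or above it to 1."""
    return [1 if v >= threshold else 0 for v in values]


def binary_intersection(radial, longitudinal, threshold=1.0):
    """AND the binary radial and longitudinal vectors of one probe set.

    Returns a matrix indexed as [cell type][section]; 1 marks positions where
    expression is called present in both data sets.
    """
    rb = binarize(radial, threshold)
    lb = binarize(longitudinal, threshold)
    return [[r & l for l in lb] for r in rb]
```

Probe sets that produce identical matrices could then be grouped, for example by using a tuple of the matrix rows as a dictionary key, to count the members of each spatiotemporal pattern.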
Microscopy
Confocal images were obtained using a 10X lens or a 25X water-immersion lens on a Zeiss LSM-510
confocal laser-scanning microscope using the 488-nm laser line for excitation. Roots were stained
with 10 μg/mL propidium iodide for 0.5 to 2 minutes and mounted in water. GFP was rendered in
green and propidium iodide in red. Images were saved in .tif format. Images were manually stitched
together in Adobe Photoshop CS2 using the Photomerge command. No other image enhancement
was performed.
Image Analysis and Quantification
Images were straightened and preprocessed as described in (16). The images
were then aligned along the medial axis of the Arabidopsis root and partitioned into 50
equidistant sections. The GFP intensity for each section was obtained by summing all
pixel values within a bin and normalizing by the total number of pixels in each section.
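The per-section quantification can be illustrated with a minimal Python sketch. It assumes the image has already been straightened so that its rows run along the medial axis of the root, and it represents the GFP channel as a list of rows of pixel intensities; this layout and the function name are assumptions for illustration.

```python
def section_intensities(image, n_sections=50):
    """Mean GFP intensity in each of n_sections bins along the root axis.

    image: list of rows of pixel intensities, with rows ordered along the
    medial axis of the straightened root (assumed layout).
    """
    n_rows = len(image)
    result = []
    for s in range(n_sections):
        start = s * n_rows // n_sections
        stop = (s + 1) * n_rows // n_sections
        pixels = [v for row in image[start:stop] for v in row]
        # Sum of pixel values divided by the number of pixels in the section.
        result.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return result
```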
Supplementary Material
Inference of Regulatory Modules
We inferred putative network connections by analyzing enriched cis-elements and TFs contained
within our identified transcriptional patterns on the premise that expression co-regulation was due to
direct physical interaction of a TF with its target cis-element. Further analysis of the spatially and
temporally regulated transcriptional programs identified 3 putative network connections.
MYB Promotion of Auxin Biosynthesis
In radial transcription pattern 5, which shows co-expression of Trp-dependent auxin biosynthetic
genes (Fig. 2E), the MYB binding site was identified as enriched (p < 10^-4) using the ATHENA TF
binding site enrichment tool (17). A single MYB domain TF is present in this pattern, ALTERED
TRYPTOPHAN REGULATION1 (ATR1). We can therefore infer the following network module for
auxin biosynthesis in cell types previously presumed to be unrelated (Fig. S21A). Note that among
the genes containing MYB binding sites, several are key regulators of Trp-dependent auxin
biosynthesis. Furthermore, a loss-of-function allele of ATR1 shows altered Trp-dependent auxin
biosynthesis and indole glucosinolate biosynthesis (18).
Putative Auxin Response Factor-Regulated Gene Expression in the Columella
In radial transcription pattern 13 (Fig 2F), columella-specific expression is found for genes involved in
auxin homeostasis. Furthermore, two auxin response factors (ARFs) are found within this set of
genes. ARF transcription factors are known to regulate gene expression through binding to the
cis-element, TGTCTC, and ARFs are known to heterodimerize (19). The expression patterns of these
ARFs are further regulated by micro-RNA mediated mechanisms (20). Not all transcriptional interactions
between these ARFs have been identified yet. Transcriptional co-regulation of ARF gene
expression can provide clues as to specific spatial ARF-regulated modules. ARF10 has been
demonstrated to play a role in lateral root cap development (lateral root cap and columella) in
conjunction with ARF16, and ARF6 plays a role in floral maturation and development (21, 22). We
find strong co-expression of ARF10 and ARF6 in radial pattern 13. Furthermore, we have identified 5
putative targets of these ARFs which contain the TGTCTC element to infer a potential columella-
specific auxin-regulated module (Fig S21B). Note that one of these genes encodes SUPERROOT2
(At4g31500), a gene responsible for auxin homeostasis. This module could form a feedback loop to
regulate auxin levels (auxin regulates action of ARF genes, which in turn regulates expression of
target genes required for maintaining its homeostasis).
Putative WRKY-Regulated Modules in Hair Cells
We identified an enrichment of W box promoter elements (binding sites for WRKY TFs) and the
WRKY TF family in hair cells (p < 10^-10 and p < 10^-5, respectively). We analyzed our groups of genes resulting from
the intersection of the radial and longitudinal data sets to delineate putative WRKY-regulated modules
in hair cells along the root’s longitudinal axis. In intersection pattern 150 (Fig. S21C), the WRKY9 TF
has a peak of expression in the basal meristematic zone. Three genes with W-boxes are closely
correlated with expression of this WRKY TF – At5g01320, At5g42860 and At5g46230. Further into
the root elongation zone, in intersection pattern 155 (Fig S21C), WRKY65 shows a peak of
expression, and again a further peak of expression in the maturation zone. Two different genes that
contain W-boxes (At5g16910, At5g20050) are potential targets of this WRKY due to their
spatiotemporal co-expression. These are but three examples of transcriptional regulatory modules
that were inferred from the expression data. This data should provide a rich resource for further
inference of modules in specific cell types at specific developmental stages.
Supplementary Figure Captions
Figure S1. The number of genes enriched in each cell type. Vascular and hair cells contain the largest number of enriched genes, perhaps reflecting their specialized function.
Figure S2. Summary of significant GO term enrichment by cell type. GO term enrichment is displayed after hierarchical clustering. The hypergeometric distribution P value is log10 transformed. The figure is a duplicate of Fig. 2A at higher resolution.
Figure S3. Summary of significant array annotation term enrichment by cell type. The hypergeometric distribution P value is log10 transformed.
Figure S4. Summary of significant cis element enrichment by cell type. Cis element enrichment was determined by ATHENA (17). The hypergeometric distribution P value is log10 transformed.
Figure S5. Putative novel function is assigned to xylem cells in the meristematic zone. Significantly enriched GO categories (top panel), array annotations (middle panel) and cis elements (bottom panel) associated with xylem cells in the meristematic zone (hypergeometric distribution P value < 10^–3).
Figure S6. Patterns show enrichment of expression in tissues that are ontologically (A) or spatially (B) related, or in cell types that are spatially separated (C and D). (A) Pattern 42 shows enrichment in the xylem tissue in all developmental stages. Genes assigned to this pattern show a strong enrichment for genes involved in microtubule-based processes (P = 4.03E–6). (B) Pattern 49 shows enrichment in the more mature xylem and in phloem tissue and a corresponding enrichment in ceramidase activity (P = 6.79E–5) and microtubule binding (P = 1.5E–4). (C) Pattern 35 shows enrichment in the columella and in the mature xylem and an enrichment for genes involved in proteolysis (P = 7.21E–4). These tissues are spatially separated by many cells. (D) Pattern 33 shows high enrichment in the hair cells and in developing xylem. Enrichment analysis suggests a shared developmental pathway, presumably related to new cell wall deposition: COPII vesicle coat (P = 4.07E–5), protein amino acid glycosylation (P = 6.5E–5), and cell wall (P = 1.44E–4).
Figure S7. Each developmental zone shows a different distribution of pattern expression peaks. Based on peak expression within a longitudinal pattern, the pattern was assigned to the meristematic (A), elongation (B), or maturation (C) zone. The number of contiguous sections that show enrichment within each pattern was then counted. The distribution of these expression peaks was then plotted.
Figure S8. The number of probe sets assigned to each radial or longitudinal pattern. (A) The number of probe sets assigned to each radial pattern. (B) The number of probe sets assigned to each longitudinal pattern.
Figure S9. Patterns which display expression fluctuations along the root’s longitudinal axis. Of the 40 dominant expression patterns identified by our computational pipeline, 17 show fluctuations in expression. The top and bottom images within each row show expression values that were mean-normalized across the longitudinal data set and log2 transformed to yield relative expression indices. The top panel displays this information in a heat map, and the bottom graph displays this information in a line graph.
Figure S10. Probe set expression between replicates is reproducible. The distribution of root mean squared differences between true probe set pairs (black) and between each probe set and 10 randomly selected probe sets (red). Probe sets analyzed are the top 50% of expressed probe sets. For this analysis we normalized the data to mean = 0 and variance = 1. The two distributions are separate, demonstrating that the agreement between replicates is not due to chance and that the data are therefore reproducible.
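The comparison in Fig. S10 can be illustrated with a short sketch. It assumes that each probe set's profile is standardized within each root before the root mean squared difference is computed, and that the random comparison pairs a probe set in one root with other probe sets in the second root; the function names, the seed, and the input layout (a dictionary from probe set identifier to a list of values) are hypothetical.

```python
import math
import random

def standardize(profile):
    """Scale a profile to mean 0 and variance 1 (assumes it is not constant)."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    return [(x - mean) / sd for x in profile]

def rms_difference(a, b):
    """Root mean squared difference between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def true_and_random_rmsd(root1, root2, n_random=10, seed=0):
    """Return RMS differences for true pairs and for randomly matched pairs."""
    rng = random.Random(seed)
    ids = sorted(root1)
    z1 = {p: standardize(root1[p]) for p in ids}
    z2 = {p: standardize(root2[p]) for p in ids}
    true_pairs = [rms_difference(z1[p], z2[p]) for p in ids]
    random_pairs = []
    for p in ids:
        others = [q for q in ids if q != p]
        # Compare each probe set with up to n_random other probe sets.
        for q in rng.sample(others, min(n_random, len(others))):
            random_pairs.append(rms_difference(z1[p], z2[q]))
    return true_pairs, random_pairs
```

If expression is reproducible, the values in `true_pairs` should cluster near zero while those in `random_pairs` should be larger, which corresponds to the separation of the black and red distributions.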
Figure S11. Probe set co-expression between roots for each pattern. Most probe set members maintain greater than 90% co-expression in a second root. For each pattern, the largest co-regulated group was considered.
Figure S12. The distribution of co-regulation of expression between roots by zone. Probe sets displaying peak enrichment in the meristematic, elongation or maturation zone, or patterns showing fluctuating expression along developmental time were analyzed for co-regulation of expression in a second root. The distribution of this co-regulation is plotted for the summary of each cluster type. Probe set patterns in the meristematic and maturation zone are the most robust, as most of their sets show 90-100% co-regulation of probe set expression across a second root, while patterns in the elongation zone or patterns whose expression fluctuates are more variable.
Figure S13. Quantified GFP intensity matches the microarray expression levels. GFP quantification of images presented in Fig. 3: (A) AGAMOUS-LIKE21, (B) WEREWOLF, (C) At4g05170, (D) At5g60200. Each root was divided into 50 equidistant units, which are represented on the x axis. The y axis shows GFP intensity units normalized by the total area of each section.
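The binning described for Fig. S13 can be sketched as follows. The legend does not specify how intensity and area were measured, so this example assumes two hypothetical per-position lists along the root axis (summed GFP signal and root area at each position) and simply divides them into 50 equal units.

```python
def binned_gfp_profile(intensity, area, n_units=50):
    """Divide a root into n_units equidistant units along its axis and
    return GFP intensity normalized by the total area of each unit.

    intensity[i] and area[i] are hypothetical measurements (summed GFP
    signal and root area) at position i along the longitudinal axis.
    """
    n = len(intensity)
    profile = []
    for u in range(n_units):
        start = u * n // n_units
        stop = (u + 1) * n // n_units
        total_area = sum(area[start:stop])
        total_signal = sum(intensity[start:stop])
        # Units with no measured area are reported as 0.0.
        profile.append(total_signal / total_area if total_area else 0.0)
    return profile

# Hypothetical root with 100 measured positions of equal area.
signal = [float(i) for i in range(100)]
areas = [2.0] * 100
profile = binned_gfp_profile(signal, areas)
print(len(profile), profile[0], profile[-1])  # 50 0.25 49.25
```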
Figure S14. Expression conferred by transcriptional:GFP fusions validates the microarray expression profiles. Normalized expression of the gene is indicated in the top graph, and images of GFP expression conferred by the respective promoter are indicated in the lower image. (A) Expression of a NAC domain TF, AT3G29035, shows a peak of expression in the distal maturation zone. (B) Expression of a sugar transporter, AT3G05150, shows a peak of expression in the elongation zone. (C) Expression of a bZIP transcription factor, AT2G22850, shows a peak of expression in the maturation zone.
Figure S15. Expression conferred by the At3g43430 promoter, a zinc finger TF, shows fluctuation along the root longitudinal axis that varies between individual roots. (A) Expression values (y axis) of At3g43430 in both roots demonstrate varying fluctuation between the two root samples. (B-E) Four images of roots display fluctuations of expression that differ between individual roots and that validate the observations made with microarray expression profiling. (Top) Quantification of GFP expression levels along the root’s longitudinal axis as in Fig. S13. (Bottom) The imaged root whose GFP expression was quantified.
Figure S16. Intersection heat map matrix of a group containing AT4G05170. The intersection matrix is shown in the left panel and the corresponding GFP image of the transcriptional fusion in the right panel. The intersection predicts a peak of expression in the endodermis in the maturation zone.
Figure S17. Work-flow summary of our computational pipeline. Input data are filtered to remove probe sets with low expression or low variation. The remaining probe sets are then clustered using fuzzy k-means, and probe sets are kept in each cluster only if their probability of belonging to that cluster is greater than or equal to an m-value of 0.4. The resulting cluster profiles are then collapsed using Pearson correlation as a metric with a threshold value of 0.9 to identify distinct, dominant expression patterns. Probe sets are then assigned to each pattern if they correlate with it at a Pearson correlation coefficient of 0.85 or higher.
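The last two steps of this work flow can be sketched in a few lines. The example below is a simplified illustration, not the pipeline itself: it omits the filtering and fuzzy k-means steps, and it assumes a greedy merge in which a cluster profile joins the first existing group whose mean profile it matches at r >= 0.9. The legend does not state how profiles were collapsed, so this merge rule, the function names, and the example data are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient (assumes neither profile is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def mean_profile(group):
    """Element-wise mean of a list of equal-length profiles."""
    return [sum(col) / len(group) for col in zip(*group)]

def collapse_profiles(profiles, threshold=0.9):
    """Greedily merge cluster profiles that correlate at >= threshold."""
    groups = []
    for prof in profiles:
        for group in groups:
            if pearson(prof, mean_profile(group)) >= threshold:
                group.append(prof)
                break
        else:
            groups.append([prof])  # no match: start a new pattern
    return [mean_profile(g) for g in groups]

def assign_probe_sets(probe_sets, patterns, threshold=0.85):
    """Map each probe set to the indices of all patterns it correlates with."""
    return {
        name: [i for i, pat in enumerate(patterns)
               if pearson(values, pat) >= threshold]
        for name, values in probe_sets.items()
    }

# Hypothetical cluster profiles and probe sets.
clusters = [[1, 2, 3, 4], [1.1, 2.0, 3.2, 3.9], [4, 3, 2, 1]]
patterns = collapse_profiles(clusters)
probes = {"p1": [2, 4, 6, 8], "p2": [8, 6, 4, 2], "p3": [1, 5, 1, 5]}
print(len(patterns), assign_probe_sets(probes, patterns))
```

Because assignment tests every pattern independently, a probe set can be assigned to more than one pattern or to none.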
Figure S18. The average number of radial patterns a probe set belongs to plotted against m values (probability of belonging to a cluster). An m value of 0.4 was defined as a threshold to ensure that initial cluster profiles were built only from probe sets with a probability of at least 0.4 of belonging to that cluster.
Figure S19. Distribution of probe set assignments to 0, 1, 2, 3, 4, 5, or 6 patterns. Due to assignment of probe sets to patterns by Pearson correlation, genes may belong to more than one pattern.
Figure S20. Overlap between genes identified as cell type–enriched and genes present in the 51 dominant expression profiles. This comparison was performed to ensure that the computational pipeline identified genes enriched in individual cell types as assigned by our first supervised approach.
Figure S21. Inference of transcriptional regulatory modules. (A) A putative MYB-regulated auxin biosynthesis module. A single MYB transcription factor, At5g60890 (ATR1) is present in a set of genes enriched for auxin biosynthesis function (Fig. 2E) and for MYB-binding sites. All genes containing a MYB-binding site are indicated as downstream of ATR1. All genes annotated as having a function in Trp-dependent auxin biosynthesis are indicated with a blue box and yellow shading. (B) A putative ARF-regulated module in the columella. At1g30330 (ARF6) and At2g28350 (ARF10) are strongly co-expressed in the columella. A set of genes also strongly co-expressed with these two members contain the binding site for ARFs and are indicated as downstream of ARF6 and ARF10. (C) Putative WRKY-regulated transcriptional modules identified in different stages of root hair development from intersections of the radial and longitudinal data sets. WRKY TF binding sites (W-boxes) were enriched in root hairs (P < 1E–10). The left panel indicates a subset of this hair cell enrichment in the basal meristematic zone. The WRKY TF At1g68150 is present in this gene group and three potential targets of this WRKY TF were inferred as being downstream as they contain a W-box and are co-expressed with At1g68150. The right panel indicates an additional group of genes which show fluctuation of expression in hair cells, first in the elongation zone and further in the maturation zone. This group of genes contains the WRKY TF At1g68150 and its two potential co-expressed downstream targets.
Supplementary Figure 1
Supplementary Figure 2
Supplementary Figure 3
Supplementary Figure 4
Supplementary Figure 5
Supplementary Figure 6
Supplementary Figure 7
Supplementary Figure 8
Supplementary Figure 9
Supplementary Figure 10
Supplementary Figure 11 [plot: x axis, % Co-Regulated (bins 0-10 through 90-100); y axis, # Patterns (0 to 20)]
Supplementary Figure 12
Supplementary Figure 13
Supplementary Figure 14
Supplementary Figure 15
Supplementary Figure 16
Supplementary Figure 17
Supplementary Figure 18
Supplementary Figure 19 [plot: x axis, # of Patterns (0 to 6); y axis, % of Probesets (0 to 0.45); series: Radial, Longitudinal]
Supplementary Figure 20
Supplementary Figure 21
Supplementary Online Material References
1. Levesque et al., PLoS Biology 4, e143 (2006).
2. Birnbaum et al., Nature Methods 2, 615 (2005).
3. Birnbaum et al., Science 302, 1956 (2003).
4. Lee et al., PNAS 103, 6055 (2006).
5. Imlau et al., Plant Cell 11, 309 (1999).
6. Laplaze et al., J. Exp. Bot. 56, 2433 (2005).
7. Brady et al., Plant Physiology 143, 172 (2007).
8. Jones et al., Plant Journal 45, 83 (2006).
9. Persson et al., PNAS 102, 8633 (2005).
10. Brown et al., Plant Cell 17, 2281 (2005).
11. Vanneste et al., Plant Cell 17, 3035 (2005).
12. Menges et al., Plant Journal 41, 546 (2005).
13. Nemhauser et al., Cell 126, 467 (2006).
14. Storey, Journal of the Royal Statistical Society, Series B 64, 479 (2002).
15. Gasch et al., Genome Biology 3, research0059.1 (2002).
16. Mace et al., Bioinformatics 22, e323 (2006).
17. O'Connor et al., Bioinformatics 21, 4411 (2005).
18. Celenza et al., Plant Physiology 137, 253 (2005).
19. Hardtke et al., Development 131, 1089 (2004).
20. Mallory et al., Plant Cell 17, 1360 (2005).
21. Nagpal et al., Development 132, 4107 (2005).
22. Wang et al., Plant Cell 17, 2204 (2005).