www.sciencemag.org/cgi/content/full/318/5851/801/DC1

Supporting Online Material for

A High-Resolution Root Spatiotemporal Map Reveals Dominant Expression Patterns

Siobhan M. Brady, David A. Orlando, Ji-Young Lee, Jean Y. Wang, Jeremy Koch, José R. Dinneny, Daniel Mace, Uwe Ohler, Philip N. Benfey*

*To whom correspondence should be addressed. E-mail: [email protected]

Published 2 November 2007, Science 318, 801 (2007)

DOI: 10.1126/science.1146265

This PDF file includes

Materials and Methods
SOM Text
Figs. S1 to S21
References

Other Supporting Online Material for this manuscript includes the following (available at www.sciencemag.org/cgi/content/full/318/5851/801/DC1):

Tables S1 to S12 and Folders S1 to S6 have been compressed using Winzip. Tables are Excel files. Folders include data sets, text, images, and Excel files. All are described below.

Table S1. A description of root tissues, cell types, GFP marker lines and their cell type coverage, and marker line abbreviations used in all figures.
Table S2. (A) Genes enriched in each cell type. (B) Queries used to determine which genes are enriched in each cell type.
Table S3. GO term, cis element, and array annotation enrichment summary for each cell type.
Table S4. Radial and longitudinal patterns. Worksheets contain both raw and transformed values.
Table S5. Radial pattern membership.
Table S6. Longitudinal pattern membership.
Table S7. Radial pattern GO term enrichment.
Table S8. Radial pattern array annotation enrichment.
Table S9. Longitudinal pattern GO term enrichment.
Table S10. Longitudinal pattern array annotation enrichment.
Table S11. Percentage co-regulation of gene expression in the second root replicate for each cluster. Cells indicated in red are patterns where probe sets show potential phasing of gene expression.
Table S12. The top 50% of varying probe sets in the (A) radial data set and (B) longitudinal data set.
Table S13. Correlations between longitudinal section microarray chips.
Table S14. AGI loci identifiers and TAIR descriptions for all genes mentioned in the text and supplementary online material.

Folder S1. Radial pattern images (jpeg files) and AGI IDs corresponding with Gene Ontology (GO) and array biological enrichment analysis (text files) for cell type and radial pattern lists. AGI IDs corresponding with enriched transcription families for cell types (text files).
Folder S2. Longitudinal patterns represented in heat map and line graph images (jpeg files) and AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for longitudinal pattern lists.
Folder S3. Clustering heat map images (jpeg files) of probe sets assigned to a pattern in root 1 and their co-regulation in root 2.
Folder S4. Matrix heat map images (jpeg files), probe set, AGI ID and TAIR annotation lists (text files) of the intersections between radial and longitudinal patterns. AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for intersection lists.
Folder S5. Matrix images (jpeg files), probe set, AGI ID and TAIR annotation lists (text files) of the binary clusters. AGI IDs (text files) corresponding with Gene Ontology (GO) and array biological enrichment analysis for intersection lists.
Folder S6. Files used for biological enrichment analysis. Genes on ATH1 Affymetrix microarray chip (text file), all singleton Affy ID–AGI ID pairs on the Affymetrix ATH1 microarray chip (text file), all Gene Ontology terms by AGI ID by chromosome (Excel spreadsheet), AGI IDs associated with genes annotated as transcription factors in two out of three transcription factor databases (text file).

Materials and Methods

Plant lines and growth conditions. Arabidopsis thaliana lines in the Columbia (Col-0) (COBL9,

S17, S32, S4, SUC2) and C24 (JO121, J2501, RM1000) ecotypes were used for the microarray

analysis. Plants to be used for microarray analysis were plated on nylon mesh as in (1). All plants

were grown vertically on 1X Murashige and Skoog salt mixture, 1% sucrose, and 2.3 mM

2-(N-morpholino)ethanesulfonic acid (pH 5.8) in 1% agar. Roots were cut approximately ¾ of the way up,

and then treated with protoplasting solution as described previously (2, 3). The S17, S32 and S4

lines are described in (4). COBL9, SUC2 and JO121 are described in (5-7), respectively. J2501 is an

enhancer trap from the Jim Haseloff collection (http://www.plantsci.cam.ac.uk/Haseloff/Home.html)

and the microarray .CEL files from this line were a gift from Ken Birnbaum. The RM1000 line was a

gift from Laurent Nussaume. All tissues sampled for the radial data set were 5 to 6 days old. Roots

sampled for the longitudinal data set were 7 days old and from the Col-0 ecotype. Transcriptional

GFP fusion lines (AT4G23790, AT5G14750, AT4G05170, AT5G60200, AT3G05150, AT3G29035,

AT3G43430, AT3G15500) in the Col-0 ecotype are described in (4) and were prepared for

microscopy at 5 days of age. Seeds were sterilized as in (4).

Microarray data acquisition. Two to three biological replicates were performed for each marker line

in the radial data set. GFP-expressing cells from roots of marker lines were sorted according to (3).

Cells marked in each of these marker lines are described in Supplementary Table S1. The sorted cells

were frozen immediately after collection. Samples for the longitudinal data set were dissected in the

following steps. The section marked columella was taken from the most extreme point of the root tip.

Six sections of approximately equivalent size were dissected from the meristematic zone. Two

sections of approximately equivalent size were dissected from the elongation zone where cells

transition from being optically dense to optically transparent as they begin to elongate (3). The first of

four approximately equivalently sized sections in the maturation zone was dissected at the point of

visible root hair elongation. The final section was taken approximately midway along the root

longitudinal axis. Sections were collected into RNA extraction buffer and immediately frozen. Total

RNA was isolated from the frozen material using the Qiagen RNeasy kit (Valencia, California, United

States). RNA probes were labeled using the GeneChip® Eukaryotic Small Sample Target Labeling

Assay Version II and hybridized on the Affymetrix ATH1 GeneChip.

Normalization and mixed-model analysis. Mixed-model software used to globally normalize all

arrays and to identify differentially expressed probe sets is described in (1). The output of this

software is 2^residuals for each probe set by treatment. Marker lines or longitudinal sections are

considered a treatment. Normalization across all chips was performed as follows:

$\log_2(\mathrm{Intensity}_{\mathrm{probe},\,\mathrm{array}}) = \text{logarithmic mean}_{\text{all array intensity values}} + \mathrm{residuals}_{\mathrm{probe},\,\mathrm{array}}$. For

each array, all probes were removed that were greater than two standard deviations from the

probe-set mean. For all remaining probe values across replicates in a treatment, expression indices

are calculated as:

$$\frac{2^{r_1} + 2^{r_2} + \cdots + 2^{r_N}}{N}$$

In the radial data set, these expression indices were then used to calculate p- and q-values for

pairwise comparisons of all probe sets across all treatments. R² values for all new CEL files are as

follows (COBL9: 1vs.2 (0.97); JO121: 1vs.2 (0.98), 1vs.3 (0.98), 2vs.3 (0.97); J2501: 1vs.2 (0.96),

1vs.3 (0.97), 2vs.3 (0.96); RM1000: 1vs.2 (0.98); S4: 1vs.2 (0.95), 1vs.3 (0.95), 2vs.3 (0.99); S17:

1vs.2 (0.99), 1vs.3 (0.97), 2vs.3 (0.98); SUC2: 1vs.2 (0.97), 1vs.3 (0.96), 2vs.3 (0.97); S32: 1vs.2 (0.93),

1vs.3 (0.97), 2vs.3 (0.94)). Longitudinal section correlations are found in Table S13.
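
The expression index calculation described above can be sketched as follows. This is only an illustration and not the mixed-model software of (1): the function name, the input layout (one list of log2 residuals per replicate array), and the use of the population standard deviation for the two-standard-deviation filter are assumptions.

```python
from statistics import mean, pstdev

def expression_index(residuals_by_array, n_sd=2.0):
    """Mean of 2^residual over the probes that survive outlier removal.

    residuals_by_array: one list per replicate array, holding the log2
    residuals of the probes in a single probe set (hypothetical layout).
    """
    kept = []
    for residuals in residuals_by_array:
        mu = mean(residuals)
        sd = pstdev(residuals)  # assumption: population SD per array
        # Drop probes more than n_sd standard deviations from the probe-set mean.
        kept.extend(r for r in residuals if abs(r - mu) <= n_sd * sd)
    # Average the back-transformed residuals across all remaining probes.
    return sum(2.0 ** r for r in kept) / len(kept)
```

For example, a probe set with residuals of 0 on one array and 1 on another gives an index midway between 2^0 and 2^1.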

Cell type-enrichment analysis. All Affy probe set to AGI ID assignments were done using the

Affy_ATH1_array_elements-2006-01-06.txt file downloaded from TAIR (www.arabidopsis.org). A

gene was determined to be enriched in a cell type if it was 1.2-fold enriched with a q-value of less

than 0.001 compared to all other non-overlapping cell types. These queries are described in

Supplementary Table S2. The intersection of genes common to this set of queries was then obtained.

Furthermore, a gene was also determined to be enriched in a cell type if it was 2-fold enriched with no

p-value or q-value threshold as this was determined to identify valid in vivo cell type expression

enrichment in (4).
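
The two enrichment criteria can be expressed as a simple predicate. This is a minimal sketch under stated assumptions: the function and argument names are hypothetical, and it assumes that the 2-fold criterion, like the 1.2-fold criterion, must hold against every other non-overlapping cell type. The actual queries are those listed in Supplementary Table S2.

```python
def is_enriched(fold_changes, q_values, min_fold=1.2, max_q=0.001, fold_only=2.0):
    """Return True if a gene meets either enrichment criterion.

    fold_changes[i] and q_values[i] describe the comparison of the target
    cell type against the i-th non-overlapping cell type (hypothetical layout).
    """
    # Criterion 1: at least 1.2-fold with q < 0.001 in every comparison
    # (the intersection of all queries).
    strict = all(f >= min_fold and q < max_q
                 for f, q in zip(fold_changes, q_values))
    # Criterion 2: at least 2-fold in every comparison, no q-value threshold
    # (assumption: applied to all comparisons).
    relaxed = all(f >= fold_only for f in fold_changes)
    return strict or relaxed
```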

Biological Enrichment Analysis. The enrichment software was written in Java, and a p-value was calculated from

the hypergeometric distribution using the Apache Commons Math library

(http://jakarta.apache.org/commons/math/). The software can be obtained by contacting the

corresponding author. This software tests for enrichment of Gene Ontology terms (obtained from

TAIR – 12_09_06), array annotation terms and transcription factor (TF) families specific to microarray

analysis using the Affymetrix ATH1 chip. Gene Ontology terms were obtained from TAIR. As it is

difficult to objectively identify which gene model information to use for each AGI ID we annotated a

GO term to an AGI ID if it appeared once for any gene model. We identified groups of genes that

were enriched in various biological processes based on microarray analysis in the literature. These

processes include root hair morphogenesis (8), primary cell wall biosynthesis (9), secondary cell wall

biosynthesis (9, 10), lateral root initiation (11), M-phase or S-phase of mitosis (12), and hormone-

specific responses (13). Genes were associated with these “array annotation” terms if they were

present in these lists. TFs belonging to a family were identified if they were considered to belong to a

family in 2 of 3 databases (AGRIS, RIKEN, DATF). We tested for enrichment within a query list when

compared to an ATH1 background. Two ATH1 background models were used. The first is all AGI

IDs found on the ATH1 chip using the most recent Affymetrix probe set ID (Affy) to AGI ID annotation

file (Affy_ATH1_array_elements-2006-01-06.txt). Many single Affy IDs are matched to multiple AGI

IDs. These occurrences of closely related genes in terms of sequence can produce confounding

results in enrichment analysis. We therefore used a second background model, called the singleton

model that considers only single Affy ID – AGI ID annotations. Term presence on query lists

(obtained from cell type-enrichment analysis, clustering analysis, intersection and binary intersection

analysis) is counted and compared to term presence on the singleton ATH1 chip. A p-value is

calculated using the hypergeometric distribution. Enrichment was considered significant if the

p-value was less than 10^-3 (log10 p < -3). Term tables, ATH1 background and singleton ATH1 background

lists are found in Supplementary Folder S6. q-values were calculated using John Storey's False

Discovery Rate method (14). Enrichment p-values for each of these

categories were displayed on heatmaps using TMEV3 software

(http://www.tm4.org/scgi-bin/getprogram.cgi?program=mev_old).
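
A hypergeometric enrichment test of this kind can be sketched in a few lines. This is an illustration, not the Java software described above: the function and parameter names are hypothetical, and it assumes a one-sided upper-tail test (the probability of seeing at least the observed number of annotated genes in the query list).

```python
from math import comb

def hypergeom_enrichment_p(background_size, background_with_term,
                           query_size, query_with_term):
    """Upper-tail hypergeometric p-value P(X >= query_with_term).

    background_size: genes in the background model (e.g. singleton ATH1 list)
    background_with_term: background genes annotated with the term
    query_size: genes in the query list
    query_with_term: query genes annotated with the term
    """
    total = comb(background_size, query_size)
    upper = min(query_size, background_with_term)
    tail = sum(
        comb(background_with_term, k)
        * comb(background_size - background_with_term, query_size - k)
        for k in range(query_with_term, upper + 1)
    )
    return tail / total

def is_significant(p_value, threshold=1e-3):
    """Significance cutoff stated in the text: p < 10^-3."""
    return p_value < threshold
```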

Finding Dominant Differential Expression Patterns:

A computational pipeline, summarized in Fig S17, was developed to find distinct dominant

differential expression patterns from the radial or longitudinal input data. The input data was filtered

before analysis to remove probe sets that do not display differential behavior across the root.

Probe sets which are not expressed in the root (no expression above 1.0 in any sample which is

equivalent to between 75 and 100 of MAS5 normalized values) were removed. This filter removed

7266 and 3881 of 22746 probe sets in the radial and longitudinal data sets, respectively. We assume

that the dominant differential expression patterns should have a set of strong exemplars that would

appear in at least the top 50% of varying probe sets. The remaining probe sets were ranked by their

variance across the samples in the data set, and the top 50% were retained for the radial analysis

resulting in a final input set of 7740 probe sets (Table S12). This variance filtering removes probe sets

which do not vary significantly and thus would not provide information about differentially expressed

patterns and also significantly reduces the computational load for the clustering step. For the

longitudinal input data set, each probe was ranked independently in each replicate, and the union of

the top 50% was used, resulting in a final data set of 10830 probe sets (Table S12).
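
The two filters for the radial data set (expression above 1.0 in at least one sample, then the top 50% by variance) can be sketched as below. The function name and the dictionary input are hypothetical, and the use of population variance is an assumption; for the longitudinal data the same ranking would be run per replicate and the union taken.

```python
from statistics import pvariance

def filter_probe_sets(expression, min_expr=1.0, keep_fraction=0.5):
    """Return probe-set IDs that are expressed and among the most variable.

    expression: dict mapping probe-set ID -> list of expression values
    across samples (hypothetical layout).
    """
    # Step 1: keep probe sets with expression above min_expr in some sample.
    expressed = {p: v for p, v in expression.items() if max(v) > min_expr}
    # Step 2: rank by variance across samples, highest first.
    ranked = sorted(expressed, key=lambda p: pvariance(expressed[p]), reverse=True)
    # Step 3: retain the top fraction.
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]
```

With the numbers given above, 22746 - 7266 = 15480 radial probe sets pass the expression filter, and half of these is the final input set of 7740.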

These filtered data sets were then clustered using the fuzzy K-means implementation FANNY

[http://cran.r-project.org/doc/packages/cluster.pdf] (15), with the distance between two probe sets

defined as (1-r)/2 where r is the Pearson correlation between the two probe sets and using a K=60.

Other larger choices of K were tested on the radial data set and it was found that the patterns

recovered by K=60 were also recovered by these other K choices, suggesting that K=60 is large

enough to capture the major dominant patterns. Clusters for the 10,830 longitudinal probe sets were

built from the data of root 1 only and root 2 was used to assess the robustness of the identified

patterns (see below).
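
The distance used for clustering, (1-r)/2, maps a Pearson correlation of 1 to a distance of 0 and a correlation of -1 to a distance of 1. A minimal sketch follows; the function names are hypothetical, and the clustering itself (FANNY in R) is not reproduced here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_distance(x, y):
    """Distance between two probe sets as defined in the text: (1 - r) / 2."""
    return (1.0 - pearson(x, y)) / 2.0
```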

The result of the fuzzy K-means clustering is a matrix containing the probability of each

probe set's belonging to each cluster, which is used to determine an initial set of patterns. The initial set

of patterns was generated by taking the median profile of all probe sets assigned to a cluster at or

above a probability of 0.4 (m-value). Using the probabilities as opposed to hard cluster assignment

allows for probe sets which do not fit well to any cluster to be removed (9.7% of probe sets for the

radial clustering and 28% of probe sets for the longitudinal clustering). Additionally the probabilities

allow for a probe set that fits well to two different clusters to be used to build both of these patterns

(10% of probe sets for the radial clustering and 2.5% of probe sets for the longitudinal clustering). The

probability cutoff of 0.4 was chosen because it is the point at which the average probe set is assigned

to ~1.0 cluster in the radial clustering output (Figure S18). To ensure that our final set of patterns are

distinct from one another, we hierarchically clustered (single linkage) the set of initial patterns, where

the distance between patterns was 1-r (r = Pearson correlation coefficient). The resulting tree was

then cut at a height of 0.1, corresponding to r=0.9, and leaves (patterns) on the same branch were

collapsed into a new pattern by taking the median profile of all probe sets belonging to either pattern.

The end result of this pipeline was a set of 51 radial and 40 longitudinal dominant differential

expression patterns.
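
The pattern-building steps can be sketched as follows, assuming the membership matrix has already been produced by the fuzzy clustering. All names are hypothetical. The merging step relies on the fact that cutting a single-linkage tree at a given height yields the connected components of the graph that links patterns whose distance is at or below that height. The sketch only returns which patterns should be collapsed; the final median over the member probe sets is not shown.

```python
from math import sqrt
from statistics import median

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def initial_patterns(profiles, memberships, cutoff=0.4):
    """Median profile of the probe sets with membership >= cutoff per cluster.

    profiles: probe-set ID -> expression profile
    memberships: probe-set ID -> list of membership values, one per cluster
    """
    n_clusters = len(next(iter(memberships.values())))
    patterns = {}
    for k in range(n_clusters):
        members = [profiles[p] for p, m in memberships.items() if m[k] >= cutoff]
        if members:
            # Column-wise median across all member profiles.
            patterns[k] = [median(col) for col in zip(*members)]
    return patterns

def merge_groups(patterns, height=0.1):
    """Groups of pattern IDs joined by single linkage at distance 1 - r <= height."""
    keys = list(patterns)
    parent = {k: k for k in keys}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if 1.0 - pearson(patterns[a], patterns[b]) <= height:
                parent[find(a)] = find(b)
    groups = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())
```

A probe set with membership of at least 0.4 in two clusters contributes to both median profiles, and one below 0.4 everywhere contributes to none.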

Distribution of Dominant Pattern Enrichment for the Radial and Longitudinal Data Sets

We wanted to assess, for the radial and longitudinal pattern collection, how expression was

distributed over cell types or longitudinal sections. In the case of the radial data set, we wanted to

determine the distribution of patterns with respect to single or multiple cell types. Since the

longitudinal axis is a developmental time continuum, we analyzed the patterns for sections of

continuous enrichment.

RADIAL: For each pattern profile, the radial median raw values in non-overlapping cell types were

visually inspected for relative high expression. The number of non-overlapping cell types where

expression was high was scored.

LONGITUDINAL: We were only interested in relative increases of expression across all patterns, and

the raw values were therefore not suitable. For each pattern profile, the longitudinal median

transformed values were ranked by experiment. The highest ranked value was identified as an

enrichment peak. To identify the number of sections that a peak covered, the number of positive

contiguous section values associated with the expression peak was counted.
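
The longitudinal peak-width count can be illustrated with a short function. This is a sketch with a hypothetical name; it assumes the input is one pattern's transformed (relative) values ordered by section, and that ties for the highest value resolve to the first section.

```python
def peak_width(values):
    """Return (index of the highest value, number of contiguous positive
    sections that include that peak)."""
    peak = max(range(len(values)), key=lambda i: values[i])
    if values[peak] <= 0:
        return peak, 0
    # Extend left and right from the peak while values stay positive.
    left = peak
    while left > 0 and values[left - 1] > 0:
        left -= 1
    right = peak
    while right < len(values) - 1 and values[right + 1] > 0:
        right += 1
    return peak, right - left + 1
```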

Assignment of Probe Sets to Patterns

To assess the potential biological roles for the identified patterns, probe sets were assigned to

the patterns. A probe set was assigned to a pattern if the Pearson correlation between the probe set’s

expression pattern and the identified dominant expression pattern was greater than or equal to 0.85.

The high value of 0.85 was chosen so that only strongly correlated probe sets would be assigned to

the patterns. One potential problem in using Pearson correlation is that in cases where probe sets

have relatively low or invariant expression, spuriously high correlation values can be obtained. To

avoid this problem, we only assigned probe sets which are expressed and varying in the root, which is

the set of 7740 and 10830 radial and longitudinal probe sets from above. Probe-set lists were built for

each of the 51 radial and 40 longitudinal patterns. The average number of probe sets assigned to a

radial pattern was 129.8, with a range from 14 to 529 (Figure S8, S19). The average number of

probe sets assigned to a longitudinal pattern was 302.4, with a range from 23 to 1272 (Figure S8,

S19). The average number of patterns a probe set was assigned to for the radial data set was 0.85,

and for the longitudinal data set was 1.12 (Fig. S8). Genes were then mapped to these probe sets

using the Affy_ATH1_array_elements-2006-01-06.txt naming file (Table S5, S6) and these lists were

analyzed for potential biological significance (Table S7-S11).
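
The assignment rule can be sketched as below. The names are hypothetical, and the input is assumed to be already restricted to the expressed and varying probe sets, as described above. A probe set may be assigned to several patterns or to none, which is why the average number of patterns per probe set can fall below or above 1.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def assign_probe_sets(profiles, patterns, min_r=0.85):
    """Map each pattern ID to the probe sets whose profile correlates
    with the pattern at r >= min_r."""
    return {
        pid: [p for p in profiles if pearson(profiles[p], pattern) >= min_r]
        for pid, pattern in patterns.items()
    }
```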

Comparing unsupervised patterns to cell type-specific lists

We predicted that the radial patterns we identify using our unsupervised approach should

contain the cell type-enriched probe sets we previously identified in our supervised approach. To

assess how well the probe sets assigned to the 51 radial patterns related to the cell type-enriched

genes, we compared the lists for each of the patterns from the two methods. For the majority of

cases, there is a clear overlap between one of the cell type-specific pattern lists and one of the lists

from the 51 unsupervised patterns (Fig. S20), suggesting that the unsupervised approach can identify

these cell type-specific patterns as well as other patterns.

Comparing expression across root replicates

Comparing between replicates in the longitudinal data set is non-trivial because each replicate was

gathered from independent single roots, and the individual sections from each root could not be taken

from precisely the same places with the same sizes. Thus, there is no obvious choice of similarity

measure to perform a time-series alignment of sections across replicates and assess for variability

between roots.

To assess the reproducibility of expression across the two different root replicates, we

calculated the root mean squared deviation (RMSD) between the expression curves from root 1

(excluding the columella section) and root 2 for all probe sets in our 10,830 set. When the distribution

of these true pair RMSD’s was compared against the random set generated by calculating the RMSD

for each probe set in root 1 (excluding the columella section) to 10 random probe sets in root 2, the

distributions were clearly different indicating that the expression pattern of a probe set between roots

is clearly not random (Fig. S10).
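
The comparison of true-pair and random-pair RMSD distributions can be sketched as follows. The function names, the fixed random seed, and the choice to exclude a probe set from its own random partners are assumptions; profiles are assumed to have been normalized beforehand and to have the same number of sections in both roots.

```python
import random
from math import sqrt

def rmsd(x, y):
    """Root mean squared deviation between two equal-length profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def rmsd_distributions(root1, root2, n_random=10, seed=0):
    """Return (true-pair RMSDs, random-pair RMSDs).

    root1, root2: probe-set ID -> expression profile in each root.
    """
    rng = random.Random(seed)
    true_pairs = [rmsd(root1[p], root2[p]) for p in root1]
    random_pairs = []
    for p in root1:
        # Assumption: random partners are drawn from the other probe sets.
        others = [q for q in root2 if q != p]
        for q in rng.sample(others, min(n_random, len(others))):
            random_pairs.append(rmsd(root1[p], root2[q]))
    return true_pairs, random_pairs
```

If expression is reproducible, the true-pair values should be shifted towards zero relative to the random-pair values.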

Data from the second root was also used to test how robust the identified dominant expression

patterns were across roots. Since probe sets were assigned to a pattern on the basis of being highly

Pearson correlated to a particular pattern, and thus appear co-regulated in root 1, we asked how well

these same genes were co-regulated in the root 2 data set. The probe sets assigned to each pattern

from root 1 were hierarchically clustered with the distance between two probe sets equal to 1-r, where

r is the Pearson correlation of those two probe sets’ expression in root 2. The resulting tree was cut

into groups at a height of 0.5, corresponding to a Pearson correlation value of 0.5. This value

corresponds to the average inter-probe set correlation for probe sets assigned to a pattern from root 1.

As a measure of co-expression, we used the size of the largest group selected by the clustering as a

percentage of the total number of probe sets assigned to the pattern. We found that 19/40 of the patterns show 90% or

greater co-regulation and 30/40 show at least 60% co-regulation in the second root (Fig. S12). This

strongly suggests that the probe sets we generated through assignment to the identified patterns are

actually sets of genes that are biologically co-regulated and that the patterns may have a biological

significance.
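
The co-regulation measure can be sketched as below. The linkage method is not stated above, so this sketch assumes single linkage, for which cutting the tree at a height of 0.5 is equivalent to taking connected components of the graph linking probe sets with 1 - r <= 0.5. The function names are hypothetical, and a different linkage would give different group sizes.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def coregulation_percent(root2_profiles, height=0.5):
    """Size of the largest co-regulated group in root 2, as a percentage
    of all probe sets assigned to the pattern (single linkage assumed)."""
    ids = list(root2_profiles)
    parent = {p: p for p in ids}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if 1.0 - pearson(root2_profiles[a], root2_profiles[b]) <= height:
                parent[find(a)] = find(b)
    sizes = {}
    for p in ids:
        root = find(p)
        sizes[root] = sizes.get(root, 0) + 1
    return 100.0 * max(sizes.values()) / len(ids)
```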

Intersecting radial and longitudinal patterns (and visualization):

The 51 radial patterns describe dominant differential expression across root cell types

independent of longitudinal section and the 40 longitudinal patterns describe dominant differential

expression across longitudinal sections independent of cell type. In fact, the expression of genes may

be dependent upon both longitudinal section and cell type. To visualize the potential cell type and

longitudinal section-dependent expression patterns in terms of high relative expression we developed

a method for intersecting identified patterns to produce a relative visual expression map across both

cell type and longitudinal section. Radial and longitudinal patterns were intersected by adding the

individual log2 normalized values of each pattern, in cases where both values were positive or 0. This

conditional adding constrains the area in which expression can be present. This was based on the

following rationale: if a radial pattern shows no or very low expression in a specific cell type, then the

genes associated with that pattern are unlikely to be expressed in a longitudinal section that contains

this cell type. Similarly, if a pattern shows high expression in a specific tissue then expression must

be present in at least one longitudinal section which contains that cell type. A conditional add ensures

that both these cases will be true. A standard addition could place expression in a cell type and

longitudinal cross section where one data set was already specified to have low or no expression,

violating the first case described.
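
The conditional add can be illustrated with a short function. The name is hypothetical, and the value placed in cells that fail the condition is not stated above, so 0.0 is assumed here.

```python
def conditional_add(radial, longitudinal):
    """Intersect one radial and one longitudinal pattern.

    radial: log2 normalized values, one per cell type
    longitudinal: log2 normalized values, one per longitudinal section
    Returns a cell type x section matrix holding the sum where both values
    are positive or 0, and 0.0 elsewhere (assumed fill value).
    """
    return [
        [r + l if r >= 0 and l >= 0 else 0.0 for l in longitudinal]
        for r in radial
    ]
```

A plain sum would instead give a positive value wherever one strongly positive term outweighs a negative one, which is the situation the conditional add is meant to exclude.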

Probe-set lists were generated by intersecting the probe set lists for the radial and longitudinal

patterns. Relative expression heat maps for the 221 intersections containing at least 8 probe sets were

generated using all marker lines from the radial data set and all longitudinal sections.

We also created a heatmap of expression mapped onto a 3-dimensional Arabidopsis root using a

label or atlas image representing cell type and longitudinal section. We recomputed the radial pattern

accounting for cases where marker lines overlapped. The recomputed pattern was determined by a

weighted sum of all overlapping regions. The 3D root heatmap was then generated by intersecting

the recomputed radial patterns with the previously described longitudinal patterns using the same

conditional add. Intensity values were contrast enhanced in the maximum expressed region.

Identifying Binary Clusters

To assess the spatiotemporal patterns of absolute gene expression, as opposed to the

relative patterns analyzed earlier, we chose to intersect a binary representation of the

ANOVA normalized data. ANOVA normalized values were converted into a binary

representation by mapping all expression values less than 1.0 to 0, and all values

greater than or equal to 1.0 to 1. Any probe set which did not have any expression values

above 1.0 in either the radial or longitudinal data sets was removed before intersection.

Each radial binary vector was intersected with each longitudinal vector using the AND

operator. The binary heatmaps show the individual radial and longitudinal binary patterns on the

extreme axis, with red indicating presence in the cell type or section, and dark blue

indicating absence. These patterns are intersected using the AND operator to produce a

heat map indicating where expression should be present (yellow) or absent (black). A total of 158 unique

spatiotemporal patterns, with at least 5 probe set members, were found.
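
The binarization and AND intersection can be sketched as follows; the function names are hypothetical.

```python
def binarize(values, threshold=1.0):
    """Map values below the threshold to 0 and all others to 1."""
    return [1 if v >= threshold else 0 for v in values]

def binary_intersection(radial, longitudinal, threshold=1.0):
    """Cell type x section matrix: 1 where the probe set is called present
    in both the cell type and the section, 0 otherwise."""
    rb = binarize(radial, threshold)
    lb = binarize(longitudinal, threshold)
    return [[r & l for l in lb] for r in rb]
```

Probe sets that produce the same matrix share a spatiotemporal pattern, so grouping by the matrix gives the binary clusters.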

Microscopy

Confocal images were obtained using a 10X lens or a 25X water-immersion lens on a Zeiss LSM-510

confocal laser-scanning microscope using the 488-nm laser line for excitation. Roots were stained

with 10 μg/mL propidium iodide for 0.5 to 2 minutes and mounted in water. GFP was rendered in

green and propidium iodide in red. Images were saved in .tif format. Images were manually stitched

together in Adobe Photoshop CS2 using the Photomerge command. No other image enhancement

was performed.

Image Analysis and Quantification

Images were straightened and preprocessed as described in (16). The images

were then aligned along the medial axis of the Arabidopsis root and partitioned into 50

equidistant sections. The GFP intensity for each section was obtained by summing all

pixel values within a bin and normalizing by the total number of pixels in each section.
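
The per-section quantification can be sketched as below, assuming the straightened image is available as a list of pixel rows ordered along the medial axis. The function name and input layout are hypothetical, and the preprocessing of (16) is not reproduced.

```python
def section_intensities(image, n_sections=50):
    """Mean GFP intensity in each of n_sections equidistant sections.

    image: list of rows ordered along the root's medial axis, each row a
    list of GFP pixel intensities (hypothetical layout).
    """
    n_rows = len(image)
    result = []
    for s in range(n_sections):
        # Integer boundaries that split the rows as evenly as possible.
        start = s * n_rows // n_sections
        end = (s + 1) * n_rows // n_sections
        pixels = [v for row in image[start:end] for v in row]
        # Sum of pixel values normalized by the number of pixels in the section.
        result.append(sum(pixels) / len(pixels) if pixels else 0.0)
    return result
```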

Supplementary Material

Inference of Regulatory Modules

We inferred putative network connections by analyzing enriched cis-elements and TFs contained

within our identified transcriptional patterns on the premise that expression co-regulation was due to

direct physical interaction of a TF with its target cis-element. Further analysis of the spatially and

temporally regulated transcriptional programs identified 3 putative network connections.

MYB Promotion of Auxin Biosynthesis

In radial transcription pattern 5, which shows co-expression of Trp-dependent auxin biosynthetic

genes (Fig. 2E), the MYB binding site was identified as enriched (p < 10^-4) using the ATHENA TF

binding site enrichment tool (17). A single MYB domain TF is present in this pattern, ALTERED

TRYPTOPHAN REGULATION1 (ATR1). We can therefore infer the following network module for

auxin biosynthesis in cell types previously presumed to be unrelated (Fig. S21A). Note that among

the genes containing MYB binding sites, several are key regulators of Trp-dependent auxin

biosynthesis. Furthermore, a loss-of-function allele of ATR1 shows altered Trp-dependent auxin

biosynthesis and indole glucosinolate biosynthesis (18).

Putative Auxin Response Factor-Regulated Gene Expression in the Columella

In radial transcription pattern 13 (Fig 2F), columella-specific expression is found for genes involved in

auxin homeostasis. Furthermore, two auxin response factors (ARFs) are found within this set of

genes. ARF transcription factors are known to regulate gene expression through binding to the

cis-element TGTCTC, and ARFs are known to heterodimerize (19). The expression patterns of these

ARFs are further regulated by micro-RNA mediated mechanisms (20). All transcriptional interactions

between these ARFs have not yet been identified. Transcriptional co-regulation of ARF gene

expression can provide clues as to specific spatial ARF-regulated modules. ARF10 has been

demonstrated to play a role in lateral root cap development (lateral root cap and columella) in

conjunction with ARF16, and ARF6 plays a role in floral maturation and development (21, 22). We

find strong co-expression of ARF10 and ARF6 in radial pattern 13. Furthermore, we have identified 5

putative targets of these ARFs which contain the TGTCTC element to infer a potential columella-

specific auxin-regulated module (Fig S21B). Note that one of these genes encodes SUPERROOT2

(At4g31500), a gene responsible for auxin homeostasis. This module could form a feedback loop to

regulate auxin levels (auxin regulates action of ARF genes, which in turn regulates expression of

target genes required for maintaining its homeostasis).

Putative WRKY-Regulated Modules in Hair Cells

We identified an enrichment of W box promoter elements (binding sites for WRKY TFs) and the

WRKY TF family in hair cells (p < 10^-10 and p < 10^-5). We analyzed our groups of genes resulting from

the intersection of the radial and longitudinal data sets to delineate putative WRKY-regulated modules

in hair cells along the root’s longitudinal axis. In intersection pattern 150 (Fig. S21C), the WRKY9 TF

has a peak of expression in the basal meristematic zone. Three genes with W-boxes are closely

correlated with expression of this WRKY TF – At5g01320, At5g42860 and At5g46230. Further into

the root elongation zone, in intersection pattern 155 (Fig S21C), WRKY65 shows a peak of

expression, and again a further peak of expression in the maturation zone. Two different genes that

contain W-boxes (At5g16910, At5g20050) are potential targets of this WRKY due to their

spatiotemporal co-expression. These are but three examples of transcriptional regulatory modules

that were inferred from the expression data. This data should provide a rich resource for further

inference of modules in specific cell types at specific developmental stages.

Supplementary Figure Captions

Figure S1. The number of genes enriched in each cell type. Vascular and hair cells contain the largest number of enriched genes, perhaps reflecting their specialized function.

Figure S2. Summary of significant GO term enrichment by cell type. GO term enrichment is displayed after hierarchical clustering. The hypergeometric distribution P value is log10 transformed. The figure is a duplicate of Fig. 2A at higher resolution.

Figure S3. Summary of significant array annotation term enrichment by cell type. The hypergeometric distribution P value is log10 transformed.

Figure S4. Summary of significant cis element enrichment by cell type. Cis element enrichment was determined by ATHENA (17). The hypergeometric distribution P value is log10 transformed.

Figure S5. Putative novel function is assigned to xylem cells in the meristematic zone. Significantly enriched GO categories (top panel), array annotations (middle panel) and cis elements (bottom panel) associated with xylem cells in the meristematic zone (hypergeometric distribution P value < 10^-3).

Figure S6. Patterns show enrichment of expression in tissues that are ontologically (A) or spatially (B) related, or in cell types that are spatially separated (C and D). (A) Pattern 42 shows enrichment in the xylem tissue in all developmental stages. Genes assigned to these patterns show a strong enrichment for genes involved in microtubule-based processes (P = 4.03E–6). (B) Pattern 49 shows enrichment in the more mature xylem and in phloem tissue and a corresponding enrichment in ceramidase activity (P = 6.79E–5) and microtubule binding (P = 1.5E–4). (C) Pattern 35 shows enrichment in the columella and in the mature xylem and an enrichment for genes involved in proteolysis (P = 7.21E–4). These tissues are spatially separated by many cells. (D) Pattern 33 shows high enrichment in the hair cells and in developing xylem. Enrichment analysis suggests a shared developmental pathway, presumably related to new cell wall deposition: COPII vesicle coat (P = 4.07E–5), protein amino acid glycosylation (P = 6.5E–5), cell wall (P = 1.44E–4).

Figure S7. Each developmental zone shows a different distribution of pattern expression peaks. Based on peak expression within a longitudinal pattern, the pattern was assigned to the meristematic (A), elongation (B), or maturation (C) zone. The number of contiguous sections that show enrichment within each pattern was then counted. The distribution of these expression peaks was then plotted.

Figure S8. The number of probe sets assigned to each radial or longitudinal pattern. (A) The number of probe sets assigned to each radial pattern (B) The number of probe sets assigned to each longitudinal pattern.

Figure S9. Patterns which display expression fluctuations along the root’s longitudinal axis. Of the 40 dominant expression patterns identified by our computational pipeline, 17 show fluctuations in expression. The top and bottom images within each row show expression values that were mean-normalized across the longitudinal data set and log2 transformed to yield relative expression indices. The top panel displays this information in a heat map, and the bottom graph displays this information in a line graph.

Figure S10. Probe set expression between replicates is reproducible. The distribution of root mean squared differences between true probe set pairs (black) and between each probe set and 10 randomly selected probe sets (red). Probe sets analyzed are the top 50% of expressed probe sets. For this analysis we normalized the data to mean = 0 and variance = 1. The two distributions are separate, demonstrating that the agreement between replicates is not random and that the data are therefore reproducible.
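The comparison described for Figure S10 might be computed along these lines. This is only a sketch on synthetic data: it assumes both replicates are matrices with the same number of sections per probe set, that random partners are drawn without replacement from the other probe sets, and that no profile is constant (a constant profile would give zero variance). The function names and the synthetic replicates are hypothetical.

```python
import numpy as np

def standardize(x):
    """Scale each probe set profile to mean 0 and variance 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

def rmsd(a, b):
    """Root mean squared difference along the last axis (broadcasts rows)."""
    return np.sqrt(np.mean((a - b) ** 2, axis=-1))

def true_and_random_rmsd(rep1, rep2, n_random=10, seed=0):
    """RMSD for matched probe sets and for each probe set vs random others."""
    rng = np.random.default_rng(seed)
    z1, z2 = standardize(rep1), standardize(rep2)
    n = z1.shape[0]
    true_d = rmsd(z1, z2)
    random_d = []
    for i in range(n):
        # Candidate partners exclude the probe set itself.
        others = np.delete(np.arange(n), i)
        picks = rng.choice(others, size=min(n_random, n - 1), replace=False)
        random_d.extend(rmsd(z1[i], z2[picks]))
    return true_d, np.array(random_d)

# Synthetic replicates: a shared signal plus small independent noise.
rng = np.random.default_rng(1)
base = rng.normal(size=(200, 12))
rep1 = base + 0.1 * rng.normal(size=base.shape)
rep2 = base + 0.1 * rng.normal(size=base.shape)
true_d, random_d = true_and_random_rmsd(rep1, rep2)
```

Plotting histograms of `true_d` and `random_d` would give the two distributions the caption refers to; with reproducible data the true-pair distribution sits well below the random one.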

Figure S11. Probe set co-expression between roots for each pattern. Most probe set members maintain greater than 90% co-expression in a second root. For each pattern, the largest co-regulated group was considered.

Figure S12. The distribution of co-regulation of expression between roots by zone. Probe sets displaying peak enrichment in the meristematic, elongation or maturation zone, or patterns showing fluctuating expression along developmental time were analyzed for co-regulation of expression in a second root. The distribution of this co-regulation is plotted for the summary of each cluster type. Probe set patterns in the meristematic and maturation zone are the most robust, as most of their sets show 90-100% co-regulation of probe set expression across a second root, while patterns in the elongation zone or patterns whose expression fluctuates are more variable.

Figure S13. Quantified GFP intensity matches the microarray expression levels. GFP quantification of the images presented in Fig. 3: (A) AGAMOUS-LIKE21, (B) WEREWOLF, (C) At4g05170, (D) At5g60200. Each root was divided into 50 equidistant units, and these are represented on the x axis. Along the y axis are GFP intensity units normalized by the total area of each section.
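The quantification in Figure S13 could be approximated as below. This is a hedged sketch, not the procedure actually used: it assumes the root runs along the first image axis, that a boolean mask marks the root pixels, and that "normalized by the total area" means summed intensity divided by the number of masked pixels in each unit. The function name `gfp_profile` and the test image are hypothetical.

```python
import numpy as np

def gfp_profile(image, mask, n_units=50):
    """Area-normalized GFP intensity in equidistant units along axis 0."""
    image = np.asarray(image, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    # Integer row boundaries of the equidistant units.
    edges = np.linspace(0, image.shape[0], n_units + 1).astype(int)
    profile = []
    for start, stop in zip(edges[:-1], edges[1:]):
        unit_mask = mask[start:stop]
        area = unit_mask.sum()
        total = image[start:stop][unit_mask].sum()
        # Units with no root pixels are reported as NaN.
        profile.append(total / area if area else np.nan)
    return np.array(profile)

# Hypothetical 100 x 10 image whose intensity equals the row index.
image = np.repeat(np.arange(100.0)[:, None], 10, axis=1)
mask = np.ones_like(image, dtype=bool)
profile = gfp_profile(image, mask)
```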

Figure S14. Expression conferred by transcriptional:GFP fusions validates the microarray expression profiles. Normalized expression of the gene is indicated in the top graph, and images of GFP expression conferred by the respective promoter are indicated in the lower image. (A) Expression of a NAC domain TF, AT3G29035, shows a peak of expression in the distal maturation zone. (B) Expression of a sugar transporter, AT3G05150, shows a peak of expression in the elongation zone. (C) Expression of a bZIP transcription factor, AT2G22850, shows a peak of expression in the maturation zone.

Figure S15. Expression conferred by the promoter of At3g43430, a zinc finger TF, shows fluctuation along the root longitudinal axis that varies between individual roots. (A) Expression values (y axis) of At3g43430 in both roots demonstrate varying fluctuation between the two root samples. (B-E) Four images of roots display fluctuations of expression that differ between individual roots and that validate the observations made with microarray expression profiling. (Top) Quantification of GFP expression levels along the root’s longitudinal axis as in Fig. S13. (Bottom) The imaged root whose GFP expression was quantified.

Figure S16. Intersection heat map matrix of a group containing AT4G05170. The intersection matrix is shown in the left panel and the corresponding GFP image of the transcriptional fusion in the right panel. The intersection predicts a peak of expression in the endodermis in the maturation zone.

Figure S17. Work-flow summary of our computational pipeline. Input data are filtered to remove probe sets with low expression or low variation. The remaining probe sets are then clustered using fuzzy k-means, and probe sets are kept in each cluster only if their probability of belonging to that cluster is greater than or equal to an m-value of 0.4. The resulting cluster profiles are then collapsed using Pearson correlation as a metric with a threshold value of 0.9 to identify distinct, dominant expression patterns. Probe sets are then assigned to each pattern if they correlate with it with a Pearson correlation coefficient of 0.85 or higher.
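The later steps of the work flow in Figure S17 can be illustrated with a short sketch. Several points are assumptions rather than the pipeline itself: the fuzzy k-means membership matrix is taken as given, a cluster profile is taken to be the mean of its retained probe sets, and the collapse step is a simple greedy merge (the caption states only the metric and the 0.9 threshold, not the merging procedure). All function names and the toy data are hypothetical.

```python
import numpy as np

def pearson_matrix(a, b):
    """Pearson correlation between every row of a and every row of b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def cluster_profiles(expr, membership, m_threshold=0.4):
    """Mean profile of the probe sets with membership >= m_threshold."""
    profiles = []
    for k in range(membership.shape[1]):
        keep = membership[:, k] >= m_threshold
        if keep.any():
            profiles.append(expr[keep].mean(axis=0))
    return np.array(profiles)

def collapse_profiles(profiles, threshold=0.9):
    """Greedy merge (assumed): join a profile to the first correlated group."""
    groups = []
    for p in profiles:
        for g in groups:
            rep = np.mean(g, axis=0)
            if pearson_matrix(p[None, :], rep[None, :])[0, 0] >= threshold:
                g.append(p)
                break
        else:
            groups.append([p])
    return np.array([np.mean(g, axis=0) for g in groups])

def assign_probe_sets(expr, patterns, threshold=0.85):
    """Boolean matrix: probe set i is in pattern j if Pearson r >= threshold."""
    return pearson_matrix(expr, patterns) >= threshold

# Toy data: two rising profiles, one falling profile, one irregular profile.
expr = np.array([[1, 2, 3, 4, 5, 6],
                 [2, 4, 6, 8, 10, 12.5],
                 [6, 5, 4, 3, 2, 1],
                 [3, 1, 4, 1, 5, 2]], dtype=float)
# Hypothetical fuzzy k-means memberships for three clusters.
membership = np.array([[0.80, 0.15, 0.05],
                       [0.30, 0.60, 0.10],
                       [0.05, 0.05, 0.90],
                       [0.35, 0.30, 0.35]])
patterns = collapse_profiles(cluster_profiles(expr, membership))
assignment = assign_probe_sets(expr, patterns)
```

In this toy case the two rising cluster profiles collapse into one pattern, so three clusters yield two patterns, and the irregular probe set is assigned to neither. Because assignment is by correlation, a probe set may in general belong to more than one pattern.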

Figure S18. The average number of radial patterns a probe set belongs to plotted against m values (probability of belonging to a cluster). An m value of 0.4 was defined as a threshold to ensure that initial cluster profiles were built only from probe sets that had at least this probability of belonging to that cluster.
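The quantity plotted in Figure S18 can be sketched from a membership matrix as follows. This assumes a probe set counts as belonging to a cluster when its membership is at least the m value; the function name `mean_memberships` and the membership values are hypothetical.

```python
import numpy as np

def mean_memberships(membership, thresholds):
    """Average number of clusters per probe set at each m threshold."""
    membership = np.asarray(membership, dtype=float)
    return [float((membership >= m).sum(axis=1).mean()) for m in thresholds]

# Hypothetical memberships of four probe sets in three clusters.
membership = np.array([[0.80, 0.15, 0.05],
                       [0.30, 0.60, 0.10],
                       [0.05, 0.05, 0.90],
                       [0.35, 0.30, 0.35]])
averages = mean_memberships(membership, [0.2, 0.4, 0.7])
```

Raising the threshold lowers the average, which is the trade-off the curve in the figure is used to judge.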

Figure S19. Distribution of probe set assignments to 0, 1, 2, 3, 4, 5, or 6 patterns. Due to assignment of probe sets to patterns by Pearson correlation, genes may belong to more than one pattern.

Figure S20. Overlap between genes identified as cell type–enriched and genes present in the 51 dominant expression profiles. This was performed to ensure that the computational pipeline identified genes enriched in individual cell types as assigned by our first supervised approach.

Figure S21. Inference of transcriptional regulatory modules. (A) A putative MYB-regulated auxin biosynthesis module. A single MYB transcription factor, At5g60890 (ATR1), is present in a set of genes enriched for auxin biosynthesis function (Fig. 2E) and for MYB-binding sites. All genes containing a MYB-binding site are indicated as downstream of ATR1. All genes annotated as having a function in Trp-dependent auxin biosynthesis are indicated with a blue box and yellow shading. (B) A putative ARF-regulated module in the columella. At1g30330 (ARF6) and At2g28350 (ARF10) are strongly co-expressed in the columella. A set of genes also strongly co-expressed with these two members contain the binding site for ARFs and are indicated as downstream of ARF6 and ARF10. (C) Putative WRKY-regulated transcriptional modules identified in different stages of root hair development from intersections of the radial and longitudinal data sets. WRKY TF binding sites (W-boxes) were enriched in root hairs (P < 1E–10). The left panel indicates a subset of this hair cell enrichment in the basal meristematic zone. The WRKY TF At1g68150 is present in this gene group and three potential targets of this WRKY TF were inferred as being downstream as they contain a W-box and are co-expressed with At1g68150. The right panel indicates an additional group of genes which show fluctuation of expression in hair cells, first in the elongation zone and further in the maturation zone. This group of genes contains the WRKY TF At1g68150 and its two potential co-expressed downstream targets.

[Supplementary figure pages, beginning with Supplementary Figure 1; images not reproduced. Recoverable axis labels: a histogram of number of patterns (y axis, 0 to 20) against % co-regulated (x axis, bins 0-10 through 90-100); and a bar chart of % of probe sets (y axis, 0 to 0.45) against number of patterns (x axis, 0 to 6) for the radial and longitudinal data sets.]

Supplementary Online Material References

1. Levesque et al., PLoS Biology 4, e143 (2006).

2. Birnbaum et al., Nature Methods 2, 615 (2005).

3. Birnbaum et al., Science 302, 1956 (2003).

4. Lee et al., PNAS 103, 6055 (2006).

5. Imlau et al., Plant Cell 11, 309 (1999).

6. Laplaze et al., J. Exp. Bot. 56, 2433 (2005).

7. Brady et al., Plant Physiology 143, 172 (2007).

8. Jones et al., Plant Journal 45, 83 (2006).

9. Persson et al., PNAS 102, 8633 (2005).

10. Brown et al., Plant Cell 17, 2281 (2005).

11. Vanneste et al., Plant Cell 17, 3035 (2005).

12. Menges et al., Plant Journal 41, 546 (2005).

13. Nemhauser et al., Cell 126, 467 (2006).

14. Storey, Journal of the Royal Statistical Society, Series B 64, 479 (2002).

15. Gasch et al., Genome Biology 3, research0059.1 (2002).

16. Mace et al., Bioinformatics 22, e323 (2006).

17. O'Connor et al., Bioinformatics 21, 4411 (2005).

18. Celenza et al., Plant Physiology 137, 253 (2005).

19. Hardtke et al., Development 131, 1089 (2004).

20. Mallory et al., Plant Cell 17, 1360 (2005).

21. Nagpal et al., Development 132, 4107 (2005).

22. Wang et al., Plant Cell 17, 2204 (2005).
