
Nucleosomes Mark Exons / Schwartz et al.

Nucleosomes Mark Exons / Supplementary Materials

Schraga Schwartz, Eran Meshorer and Gil Ast

Supplementary Results

Analysis of non-coding exons: We sought to determine whether the increased nucleosome occupancy levels observed along exons stemmed from sequence biases within exons that are due to the non-random composition and distribution of codons. We therefore generated three datasets of non-coding exons: exons fully located within the 5’ UTR, exons fully located within the 3’ UTR, and internal exons from non-coding genes. All three datasets were generated based on the human (hg18) UCSC knownGene table. For the 5’ UTR exons, we required that the exons end upstream of the CDS start site, whereas for the 3’ UTR exons we required that the exons begin downstream of the CDS end site. For the dataset of exons from non-coding genes, we used the transcript information table (kgTxInfo), which is linked to the knownGene table and annotates whether each gene is translated. We also filtered out exons from non-coding genes that formed part of other transcripts in which these exons were annotated as coding. Finally, we filtered out exons longer than 300 nt, to make these analyses consistent with our other analyses. Our dataset included 5,927 internal non-coding exons, 8,198 internal 5’ UTR exons, and only 196 internal 3’ UTR exons. For all groups, we found increased nucleosome occupancy levels in activated human T cells within exons compared to introns (Supplementary Fig. 2b). 3’ UTR exons exhibited the same trend, but measurements were noisier due to the small sample (data not shown).
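The coordinate filters described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name, the 0-based half-open coordinates, and the strand handling (upstream and downstream interpreted relative to the direction of transcription) are assumptions, and the additional requirement that exons be internal is not checked here.

```python
def classify_utr_exon(exon_start, exon_end, cds_start, cds_end, strand, max_len=300):
    """Label an exon '5utr', '3utr', or None (hypothetical helper).

    Coordinates are assumed to be genomic, 0-based, half-open, with
    cds_start < cds_end regardless of strand.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Exons longer than 300 nt are discarded, as in the text.
    if exon_end - exon_start > max_len:
        return None
    left_of_cds = exon_end <= cds_start    # exon lies entirely before the CDS on the genome
    right_of_cds = exon_start >= cds_end   # exon lies entirely after the CDS on the genome
    if strand == "+":
        if left_of_cds:
            return "5utr"
        if right_of_cds:
            return "3utr"
    else:
        # Assumption: on the minus strand the genomic order is reversed.
        if right_of_cds:
            return "5utr"
        if left_of_cds:
            return "3utr"
    return None  # overlaps the CDS
```

An exon that overlaps the coding region, even partially, is returned as None, which matches the requirement that exons be fully located within a UTR.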

Analysis of sheared DNA: We were concerned that the bias of Solexa high-throughput sequencing towards GC-rich regions could bias our results (refs. 2,3). To address this concern, we downloaded a dataset of >17 million mapped reads from sheared DNA in Jurkat cells subjected to Solexa high-throughput sequencing from ref. 4. We analyzed each such read in a manner similar to the way nucleosome occupancy reads were analyzed by Schones et al. (ref. 5): if the read was on the plus strand, we considered the center of the mock nucleosome to be within a window of 70-80 nt downstream, and if it was on the minus strand, we considered the center of the mock nucleosome to be within a window of 70-80 nt upstream of the end coordinate of the read. We next correlated the levels of reads per position across the dataset of constitutive exons and introns. Since we were interested in assessing the effect of GC content, we also binned the exons into five equally sized bins of gradually increasing GC content. We noted only negligible peaks within exons (Supplementary Fig. 2c), and none in introns (Supplementary Fig. 2d).
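The strand-dependent placement of the mock nucleosome center can be illustrated with a short sketch. The function name and the coordinate convention are assumptions, as is the choice of the read start as the reference point for plus-strand reads (the text states the reference point explicitly only for minus-strand reads).

```python
def mock_nucleosome_center_window(read_start, read_end, strand, near=70, far=80):
    """Return (window_start, window_end) for the mock nucleosome center.

    Hypothetical helper. Assumes read_start < read_end in genomic
    coordinates, whichever strand the read maps to.
    """
    if strand == "+":
        # Assumption: 70-80 nt downstream of the read start.
        return (read_start + near, read_start + far)
    if strand == "-":
        # 70-80 nt upstream of the end coordinate of the read.
        return (read_end - far, read_end - near)
    raise ValueError("strand must be '+' or '-'")
```

For a 25-nt read spanning positions 1000 to 1025, this yields a window of 1070 to 1080 on the plus strand and 945 to 955 on the minus strand.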

Analysis of nucleosome occupancy based on tiling arrays in Drosophila melanogaster: We obtained raw data pertaining to nucleosome occupancy levels in Drosophila embryos from ref. 6. In this study, MNase-treated chromatin was hybridized to high-density GeneChip Drosophila Tiling 1.0R arrays (36-bp probe spacing) in two separate biological replicates. We preprocessed the two tiling arrays using the Affymetrix TAS software with (default) quantile normalization, scaled to a median intensity value of 500. Intensity values for each probe were calculated using a bandwidth of 72 nt. In this manner, the intensity value reported for each chromosomal location essentially reflects the intensities in the 145 nt surrounding this location (2 × 72 + 1 = 145), corresponding approximately to the length of a nucleosome. The intensity reported by TAS for each chromosomal location was assigned to the 25 nt interval (the length of the probe) of each of the ~3 million probes in the array. As a next step, we generated a dataset of 33,598 internal exons, based on the FlyBase Genes table of Drosophila (dm2, to correspond to the array). We then mapped probes to the regions of 500 nt upstream and downstream of both the 3’ss and 5’ss of exons, and averaged the probe intensities across these positions. Results for this analysis are presented in Supplementary Fig. 3a.
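The averaging of probe intensities around splice sites can be sketched as below. This is a deliberately simple, quadratic-time illustration rather than the pipeline used in the study; the function name, the input layout, and the strand-aware sign of the offset are assumptions.

```python
from collections import defaultdict

def average_profile(probes, anchors, flank=500, probe_len=25):
    """Mean probe intensity at each offset in [-flank, flank] around anchors.

    probes  : list of (probe_start, intensity); a probe is assumed to cover
              positions probe_start .. probe_start + probe_len - 1
    anchors : list of (position, strand) splice-site coordinates
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for anchor, strand in anchors:
        # Assumption: offsets are flipped for minus-strand exons so that
        # negative offsets always mean "upstream" in transcript orientation.
        sign = 1 if strand == "+" else -1
        for probe_start, intensity in probes:
            for pos in range(probe_start, probe_start + probe_len):
                offset = sign * (pos - anchor)
                if -flank <= offset <= flank:
                    totals[offset] += intensity
                    counts[offset] += 1
    return {off: totals[off] / counts[off] for off in sorted(totals)}
```

For genome-scale data an interval index or sorted-array lookup would replace the nested loops; the sketch only shows which value is averaged at which relative position.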

Characteristics and dynamics of H3K36me3 modified exons: In this section we describe in detail the analyses performed relating to the H3K36me3 modification. These analyses were all performed independently of the findings of ref. 7, and were moved to the Supplementary Materials in light of that publication.

To verify that the H3K36me3 modification truly marks exons, we analyzed ChIP-seq data for H3K36me3 in mouse embryonic stem cells (ESC) and mouse embryonic fibroblasts (MEF) (ref. 1). In both of these cell types, we observed an enrichment for H3K36me3 in the center of exons, but not of introns, which again correlated with transcript expression levels within these cell types (Supplementary Fig. 4a).

We next prepared custom UCSC Genome Browser tracks in order to visually inspect genes in which exons are marked by the H3K36me3 modification. For the generation of custom UCSC tracks (http://genome.ucsc.edu/), we expanded the binding sites to a 200-nt window centered on the binding point. The 100 nucleotides in the center were weighted with the number of tags covering them, and the 50 more distal nucleotides on each side were weighted with 50% of the number of tags, to reflect the lower confidence of these positions. Several genes exhibiting such marked patterns are displayed in Supplementary Fig. 5.
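The weighting scheme for the custom tracks can be written out as a small sketch. The function name and the half-open interval representation are assumptions made for illustration; only the window sizes and the 50% weight come from the text.

```python
def weighted_track_intervals(binding_point, n_tags):
    """Return (start, end, weight) half-open intervals for one binding site.

    Hypothetical helper: a 200-nt window centered on binding_point, with the
    central 100 nt at full weight and the 50 nt on each side at half weight.
    """
    left = binding_point - 100
    return [
        (left, left + 50, 0.5 * n_tags),         # distal 50 nt, lower confidence
        (left + 50, left + 150, float(n_tags)),  # central 100 nt, full weight
        (left + 150, left + 200, 0.5 * n_tags),  # distal 50 nt, lower confidence
    ]
```

Each tuple maps directly onto one line of a bedGraph-style track (chromosome name omitted here).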

To understand to what extent H3K36me3 might be functional, we assessed the conservation of H3K36me3 modified exons and their flanking introns. As expected from functional conservation, we found significantly higher evolutionary conservation (Mann-Whitney, P=3.2e-80) within H3K36me3 modified exons than in their non-modified counterparts (Supplementary Fig. 4b), implying that exonic regions bound to H3K36me3 modified nucleosomes are under evolutionary selection. As observed in our analysis of nucleosome occupancy (Fig. 1f), we also found that H3K36me3 enrichment was correlated with the level of exon inclusion (Supplementary Fig. 4c). There was a ~3 fold increase in the prevalence of H3K36me3 in constitutive exons with respect to introns, and a ~1.4-1.7 fold increase with respect to alternative exons; these results are consistent with results obtained by ref. 7.

To determine whether H3K36me3 levels within exons are consistent among different tissues and organisms, we analyzed the correlations between exon modification levels in mouse embryonic stem cells (ESC) and embryonic fibroblasts (MEF). Similarly to our approach in human, we generated a dataset of 54,564 constitutive exons, 2,103 alternative exons, 25,290 introns, and 12,605 promoters. We found a positive, highly significant association between modification levels in the two tissues (Pearson R=0.25, P~0). To assess the extent to which this correlation is dependent on transcription levels, we next binned all mouse genes into 8 bins, depending on the extent to which their expression changed between the two tissues. In each of these bins we calculated the correlation between H3K36me3 modification levels. We found that as the difference in expression increases, the correlation between the two tissues in terms of H3K36me3 modification decreases (Supplementary Fig. 4d). It is noteworthy, however, that even in the last bin, representing the greatest changes in gene expression, there is still a low but highly significant positive correlation between H3K36me3 modification levels across the two tissues.
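The binning-and-correlation step can be sketched as follows, assuming one record per exon that carries the expression level of its gene in each tissue. The function names are hypothetical, ties in ranking are broken by input order, and the absolute rank difference is used as the measure of expression change, following the caption of Supplementary Fig. 4d.

```python
import math

def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank of each value (0 = smallest); ties broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order):
        out[i] = rank
    return out

def binned_correlations(expr_a, expr_b, mod_a, mod_b, n_bins=8):
    """Correlate modification levels within bins of increasing expression change."""
    rank_a, rank_b = ranks(expr_a), ranks(expr_b)
    diff = [abs(a - b) for a, b in zip(rank_a, rank_b)]
    order = sorted(range(len(diff)), key=lambda i: diff[i])
    size = len(order) // n_bins
    results = []
    for b in range(n_bins):
        # The last bin absorbs any remainder.
        idx = order[b * size:] if b == n_bins - 1 else order[b * size:(b + 1) * size]
        results.append(pearson([mod_a[i] for i in idx], [mod_b[i] for i in idx]))
    return results
```

The returned list is ordered from the bin with the smallest expression change to the bin with the largest, so a decreasing trend along the list corresponds to the pattern described above.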

To obtain a wider perspective on the dynamics of this modification, we compiled a dataset of 44,100 human-mouse orthologous exons. We used Galaxy (ref. 8) to obtain coordinates of human/mouse pairwise alignments, in order to establish orthology between human and mouse. For a pair of human/mouse exons to be defined as orthologs, we required that at least 30 nt (or 50%) of each exon formed part of the alignment, according to the alignment block. For each pair of exons, we determined the extent of H3K36me3 modification in human T cells, in mouse ESC, and in MEF. As shown in Supplementary Fig. 4e, 2,611 orthologous exons were enriched with H3K36me3 in both human T cells and mouse ESC; this was highly statistically significant (hypergeometric test, P=2.4e-168), and similar results were obtained for additional pairwise comparisons across the different tissues and organisms (Supplementary Fig. 4f-h). However, for the majority of exon pairs, H3K36me3 modification was present only in either human T cells or mouse ESC, but not in both. Taken together, these results indicate that although H3K36me3 is to a large extent tissue-specific and organism-specific, a significant portion of modified exons nonetheless displays a tendency to remain stable across different organisms, different tissues, and different levels of expression.
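The significance of such an overlap can be assessed with the upper tail of the hypergeometric distribution, sketched below with exact integer arithmetic. The function name and argument names are assumptions. In the setting above, N would be the 44,100 orthologous pairs and k the 2,611 exons modified in both; the two marginal counts (exons modified in each sample separately) are not stated in the text, so the example values in the usage note are toy numbers only.

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N : total number of items (e.g. orthologous exon pairs)
    K : items marked in the first sample
    n : items marked in the second sample
    k : observed overlap
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    # Exact ratio first, then a single rounding to float.
    return float(Fraction(tail, total))
```

With toy values N=10, K=5, n=5, an overlap of all five items has probability 1/252, whereas an overlap of at least zero items has probability 1.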

Mutual information based analysis for comparing expression levels and GC content as predictors of histone modification levels: To assess whether modification levels are better explained in terms of GC content than in terms of gene expression levels, we used mutual information. Mutual information is a quantity that measures the mutual dependence of two variables, and is calculated as:

$$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p_1(x)\,p_2(y)}\right)$$


where p(x,y) is the joint probability distribution function of X and Y, and p_1(x) and p_2(y) are the marginal probability distribution functions of X and Y, respectively. For calculating this value we made use of the bioDist package (ref. 9) in R, based on discretization of the variables into 5 bins.
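As an illustration of the quantity defined above, the following sketch estimates mutual information from two numeric vectors after discretizing each into 5 bins. It is not the bioDist implementation: the equal-width binning, the natural logarithm, and the function names are assumptions made for this example.

```python
import math
from collections import Counter

def discretize(values, n_bins=5):
    """Assign each value to one of n_bins equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x, y, n_bins=5):
    """Plug-in estimate of I(X;Y) in nats from paired observations."""
    bx, by = discretize(x, n_bins), discretize(y, n_bins)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[i] / n) * (py[j] / n)))
    return mi
```

Cells with zero joint count contribute nothing to the sum and are simply skipped. Comparing `mutual_information(gc_content, modification)` with `mutual_information(expression, modification)` for the same set of exons (hypothetical vector names) gives the kind of comparison described in this section.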


Supplementary Tables

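| Organism | Modification | Total number of tags | # enriched windows identified by SISSRs | # alternative exons overlapping enriched windows | # constitutive exons overlapping enriched windows | # introns overlapping enriched windows | # promoters overlapping enriched windows |
|---|---|---|---|---|---|---|---|
| Human | H4K20me1 | 11015873 | 234648 | 633 | 11814 | 2846 | 1894 |
| Human | H3K36me3 | 13572575 | 122109 | 411 | 11204 | 2031 | 1143 |
| Human | H2BK5me1 | 8942880 | 137924 | 419 | 7685 | 1855 | 955 |
| Human | H3K4me1 | 11322526 | 219938 | 390 | 4278 | 1984 | 898 |
| Human | H3K79me1 | 5137886 | 66175 | 211 | 3039 | 969 | 414 |
| Human | H3K9me1 | 9311627 | 142750 | 298 | 2206 | 1352 | 934 |
| Human | H3K79me3 | 5929782 | 159098 | 227 | 2161 | 1490 | 720 |
| Human | H3K4me3 | 16845478 | 137286 | 351 | 2100 | 399 | 2436 |
| Human | H4K91ac | 3191156 | 78439 | 203 | 1566 | 383 | 1316 |
| Human | PolII | 4150378 | 55154 | 158 | 1495 | 297 | 1573 |
| Human | H3K4me2 | 5447902 | 87006 | 215 | 1409 | 584 | 668 |
| Human | H3K79me2 | 4712875 | 131738 | 170 | 1340 | 1215 | 518 |
| Human | H2BK5ac | 3330268 | 72141 | 170 | 1258 | 348 | 1308 |
| Human | H3K27ac | 3433165 | 81502 | 203 | 1222 | 407 | 1400 |
| Human | H2BK120ac | 3444551 | 74580 | 170 | 1208 | 294 | 1216 |
| Human | H2AZ | 7536100 | 109907 | 209 | 1204 | 239 | 1413 |
| Human | H2BK20ac | 4083727 | 72633 | 136 | 1016 | 299 | 965 |
| Human | H3K18ac | 4249604 | 81890 | 168 | 991 | 280 | 1324 |
| Human | H3K4ac | 3546672 | 47799 | 111 | 719 | 195 | 944 |
| Human | H4K5ac | 4118574 | 53617 | 108 | 642 | 233 | 636 |
| Human | CTCF | 2947043 | 26814 | 48 | 631 | 197 | 335 |
| Human | H3K9ac | 3950661 | 37876 | 133 | 607 | 92 | 1160 |
| Human | H2BK12ac | 3615226 | 36301 | 78 | 471 | 143 | 482 |
| Human | H4K8ac | 4278905 | 36129 | 72 | 439 | 132 | 556 |
| Human | H3K36ac | 4374235 | 32098 | 79 | 340 | 87 | 637 |
| Human | H3K27me1 | 10047279 | 16843 | 15 | 335 | 264 | 48 |
| Human | H4K16ac | 7059753 | 10283 | 22 | 328 | 43 | 265 |
| Human | H4K20me3 | 5720089 | 35890 | 39 | 203 | 120 | 96 |
| Human | H2AK5ac | 3442542 | 11734 | 9 | 170 | 72 | 42 |
| Human | H3K9me3 | 6348997 | 20814 | 19 | 99 | 64 | 38 |
| Human | H2AK9ac | 2070246 | 2950 | 10 | 73 | 6 | 120 |
| Human | H4K12ac | 3677187 | 2327 | 2 | 40 | 24 | 45 |
| Human | H3R2me1 | 9560224 | 4252 | 0 | 39 | 17 | 8 |
| Human | H3R2me2 | 6521560 | 3334 | 2 | 29 | 8 | 3 |
| Human | H3K23ac | 2527421 | 1618 | 0 | 15 | 12 | 8 |
| Human | H3K27me3 | 8970141 | 2458 | 1 | 15 | 3 | 6 |
| Human | H4R3me2 | 7357597 | 7994 | 2 | 10 | 4 | 1 |
| Human | H3K27me2 | 9070882 | 2095 | 1 | 8 | 0 | 0 |
| Human | H3K36me1 | 8077127 | 2139 | 1 | 8 | 3 | 0 |
| Human | H3K9me2 | 9782127 | 2310 | 1 | 7 | 0 | 0 |
| Human | H3K14ac | 3799058 | 722 | 1 | 2 | 1 | 1 |
| Mouse | MEF.H3K36me3 | 6893084 | 54596 | 121 | 6196 | 1212 | 263 |
| Mouse | ES.H3K36me3 | 6297739 | 51980 | 140 | 8018 | 1037 | 213 |
| Mouse | NP.H3K36me3 | 5050661 | 2960 | 12 | 402 | 86 | 13 |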

Supplementary Table 1: Summary of ChIP-seq analyses in human T cells and in mouse embryonic fibroblasts (MEF), stem cells (ES), and neural progenitors (NP). The number of short tags obtained for each modification is presented, along with the number of significantly enriched windows obtained by the SISSRs algorithm. In addition, the numbers of enriched windows overlapping the 400 nt surrounding the center of alternative exons, constitutive exons, introns, and transcription start sites are indicated. Modifications with fewer than 700 occurrences within constitutive exons are shaded in grey in the original table. We discarded these modifications from most of our analyses, since the signal-to-noise ratio within them is very low.


| Genome | Build | Number of exons | Mean exon length | Mean upstream intron length | Mean downstream intron length |
|---|---|---|---|---|---|
| Human | RefSeq, March 2006 | 168272 | 151.7 | 5882.4 | 4875.2 |
| Mouse | RefSeq, July 2007 | 152374 | 151.1 | 4944.9 | 4023.5 |
| Chicken | Ensembl, 2006 | 139890 | 139.7 | 2582.7 | 2253.2 |
| Zebrafish | Ensembl, July 2007 | 173972 | 135.8 | 2231.6 | 2100.5 |
| Ciona | Ensembl, March 2005 | 92337 | 142.5 | 549.9 | 535.0 |
| Drosophila | FlyBase, April 2006 | 35923 | 382.5 | 1376.8 | 1110.7 |
| C. elegans | Ensembl, May 2008 | 89358 | 144.1 | 851.6 | 772.9 |

Supplementary Table 2: Genome builds used for the evolutionary analysis in Fig. 4. The number of internal exons extracted for each organism, and the mean length of exons and their flanking introns are presented as well.


Supplementary Figures

Supplementary Fig. 1: Visualization of nucleosome occupancy levels in activated and inactivated T cells (two upper tracks, respectively), as well as H3K36me3 modification levels (orange tracks), across segments along four different genes based on customized UCSC genome browser tracks (http://genome.ucsc.edu/). The RefSeq track is included, indicating the presence of exons (boxes) and introns (lines). The conservation track (Mammal Cons) is shown at the bottom of each panel.


Supplementary Fig. 2: Validation of nucleosome occupancy peaks within exons. (a) Nucleosome occupancy as in Fig. 1b, but in resting T cells (ref. 5). Exons were aligned by their 3’ss (left panel) and by their 5’ss (right panel). Exons were divided into five bins based on transcript expression levels in resting T cells (ref. 5). (b) Nucleosome occupancy levels in activated T cells along non-coding exons in the 1000 nt surrounding the exons (as in a). Two sets of non-coding exons are shown: internal exons fully residing in the 5’ UTR and internal exons from non-coding genes. (c) Mock levels of nucleosome occupancy in the region surrounding the 3’ss and 5’ss of constitutive exons, after dividing them into five bins based on gradually increasing GC content. Mock nucleosome occupancy levels were calculated based on a dataset of sheared DNA in Jurkat cells (ref. 4). (d) Analysis as in c, but for the 600 nt surrounding the center of introns.


Supplementary Fig. 3: Additional validation of nucleosome occupancy peaks within exons. (a) Nucleosome occupancies along exonic and intronic regions in Drosophila melanogaster, based on analysis of MNase-treated chromatin hybridized to tiling arrays (ref. 6). Note that intensities were log2 transformed. (b) Nucleosome occupancy within the last 100 exonic nt and the first 300 nt downstream, as a function of PPT strength, complementary to Fig. 2b. (c) Nucleosome occupancy levels in activated T cells in the regions flanking the 3’ss (left panel) and 5’ss (right panel). Exons were divided into five equally sized groups, whose length ranges are given in the figure labels. In this analysis very long exons (>300 nt) were not discarded, in contrast to the analyses shown throughout the manuscript, since we were interested in including long exons as well. (d) Nucleosome occupancy levels in alternative and constitutive exons, divided into 10 bins. The value for each bin represents the average occupancy within a decile of the exon length.


Supplementary Fig. 4: Analysis of histone modification H3K36me3. (a) Tag density profiles for H3K36me3 across a 2,000 nt window surrounding the center of introns, alternative exons, constitutive exons, and the transcription start sites in embryonic stem cells and mouse embryonic fibroblasts. Exons/introns/promoters were binned into five equally sized bins based on their expression levels, derived from ref. 1. (b) Conservation of sequences within exons modified and non-modified by H3K36me3, depicted in red and green, respectively, among 18 placental organisms as a function of distance from the 3’ss (left panel) and the 5’ss (right panel). (c) Mean tag coverage across introns, alternatively spliced exons (with inclusion levels below and above 50%, respectively), and constitutively spliced exons. Error bars represent the SEM. The numbers on the bars indicate the percentage of exons/introns subjected to the modification. (d) Pearson correlation between H3K36me3 modification levels in mouse embryonic stem cells (ESC) and mouse embryonic fibroblasts (MEF), within a dataset of constitutive mouse exons. The correlations were derived across eight bins of gradually increasing differences in gene expression, calculated as the absolute difference between the ranks of the gene expression levels in the two tissues. Error bars represent the 95% confidence intervals for the correlations. (e-g) Venn diagrams presenting the extent of overlap between H3K36me3 modified exons across different tissues and/or organisms. MEF – mouse embryonic fibroblasts; ES – mouse embryonic stem cells.


Supplementary Fig. 5: Visualization of H3K36me3 modifications across four genes, based on customized UCSC genome browser tracks (http://genome.ucsc.edu/). The modifications are plotted in orange, above the RefSeq track indicating the presence of exons (boxes) and introns (lines). The conservation track (Mammal Cons) is shown at the bottom of each panel.


Supplementary Fig. 6: Prevalence of 8 different modifications across introns and constitutive exons, as a function of their location within genes. The ordinate depicts the percentage of exons subjected to a modification.


Supplementary Fig. 7: The majority of modifications are better explained in terms of GC content than in terms of expression. (a-d) Comparison between modification levels across constitutive exons (panel a) and promoters (panels b-d) for four sample modifications when binned either by GC content (left panels) or by expression levels (right panels). (e) Mutual information for each modification within constitutive exons when assessed based on expression and on GC content. The higher mutual information for GC content indicates that GC content is more informative than expression in terms of predicting whether a certain modification occurs. Similar results were obtained when performing the analysis on introns, alternative exons, and promoters (data not shown).


Supplementary Fig. 8: Interplay between transcription and nucleosome occupancy. (a) RNAPII levels within introns and constitutively spliced exons across seven equally sized bins of gradually increasing expression. Error bars indicate the SEM. (b) Changes in expression levels between activated and resting T cells, as a function of changes in nucleosome occupancy. For each exon, we calculated the difference in gene expression levels (∆expression) and in nucleosome occupancy levels (∆nucleosome occupancy) between activated and resting T cells. The exons were divided into 10 bins based on ∆nucleosome occupancy values and for each of these we calculated mean ∆expression values, indicating the difference between expression levels in activated and inactivated T cells. Nucleosome occupancy levels and gene expression values were first normalized separately for each tissue, to mean values of 0 and standard deviations of 1.


References

1. Mikkelsen, T.S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-60 (2007).
2. Dohm, J.C., Lottaz, C., Borodina, T. & Himmelbauer, H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36, e105 (2008).
3. Hillier, L.W. et al. Whole-genome sequencing and variant discovery in C. elegans. Nat Methods 5, 183-8 (2008).
4. Valouev, A. et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods 5, 829-34 (2008).
5. Schones, D.E. et al. Dynamic regulation of nucleosome positioning in the human genome. Cell 132, 887-98 (2008).
6. Mavrich, T.N. et al. Nucleosome organization in the Drosophila genome. Nature 453, 358-62 (2008).
7. Kolasinska-Zwierz, P. et al. Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet (2009).
8. Taylor, J., Schenck, I., Blankenberg, D. & Nekrutenko, A. Using Galaxy to perform large-scale interactive data analyses. Curr Protoc Bioinformatics Chapter 10, Unit 10.5 (2007).
9. Ding, B., Gentleman, R. & Carey, V. bioDist: Different distance measures.

Nature Structural & Molecular Biology: doi:10.1038/nsmb.1659