Taft RJ et al. – tiRNAs
Supplementary Information
Tiny RNAs associated with transcription start sites in animals
Ryan J. Taft1, Evgeny A. Glazov2, Nicole Cloonan1, Cas Simons1, Stuart Stephen1, Geoff Faulkner1, Timo Lassmann3, Alistair R.R. Forrest3,4, Sean M. Grimmond1, Kate Schroder1, Katharine Irvine1, Takahiro Arakawa3, Mari Nakamura3, Atsutaka Kubosaki3, Kengo Hayashida3, Chika Kawazu3, Mitsuyoshi Murata3, Hiromi Nishiyori3, Shiro Fukuda3, Jun Kawai3, Carsten O. Daub3, David A. Hume1,5, Harukazu Suzuki3, Valerio Orlando6, Piero Carninci3, Yoshihide Hayashizaki3 and John S. Mattick1
1Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia.
2Diamantina Institute for Cancer, Immunology and Metabolic Medicine, The University of Queensland, Princess Alexandra Hospital, Ipswich Road, Woolloongabba, Qld, 4102, Australia.
3RIKEN Omics Science Center, RIKEN Yokohama Institute, 1-7-22 Suehiro-cho Tsurumi-ku Yokohama, Kanagawa, 230-0045 Japan.
4The Eskitis Institute for Cell and Molecular Therapies, Griffith University, QLD 4111, Australia.
5The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Roslin, EH259PS, UK.
6Dulbecco Telethon Institute, IGB CNR, Epigenetics and Genome Reprograming lab, Via Pietro Castellino 111, Napoli, 80131, and Dulbecco Telethon Institute, IRCCS Santa Lucia at EBRI, Via del Fosso di Fiorano 64, Rome 00146, Italy.
Bioinformatics correspondence should be addressed to R.J.T. (r.taft@imb.uq.edu.au). Experimental correspondence should be addressed to V.O. (orlando@igb.cnr.it) or P.C. (carninci@riken.jp). General correspondence should be addressed to Y.H. (yosihide@gsc.riken.jp) or J.S.M. (j.mattick@imb.uq.edu.au).
Nature Genetics: doi:10.1038/ng.312
Supplementary Figures
Supplementary Figure 1
Example tiRNA loci. (a) Human (red) and chicken (light brown) tiRNAs are conserved at the EIF4G2 transcription start site in human. Regions of RNA PolII binding are depicted in dark brown, Sp1 binding regions in yellow, and CpG islands in green. The strand orientation of tiRNAs and deepCAGE clusters (dark blue) are indicated as white chevrons. (b) Drosophila tiRNAs (dark red) downstream of the Adh TSS. TiRNA strand orientation is indicated as white chevrons.
Supplementary Figure 2
Chicken small RNA size distributions by embryonic stage. The size distribution of all uniquely mapping small RNA tags from chicken embryonic stage day 5 (CE5, brown), day 7 (CE7, orange), and day 9 (CE9, yellow). CE5 displayed the weakest enrichment (16-fold) at Refgene TSSs, while both CE7 and CE9 showed ~60-fold enrichment.
Supplementary Figure 3
Drosophila tiRNA size and position characteristics. Small RNAs were obtained from Ruby et al.1. (a) The black line indicates the transcription start site, and the black arrow depicts the direction of transcription. Gray bars represent windows of 10 nt, and those above the x axis depict small RNAs with the same strand orientation as the TSS. Bars below the x axis (negative values) indicate small RNAs antisense to the TSS. Small RNAs are dominantly upstream and in the same orientation as the TSS. (b) Small RNAs that map to the same strand and are found in the region -60 to +120 relative to the TSS, or on the opposite strand within 400 nt upstream of the TSS, are dominantly 18 nt.
Supplementary Figure 4
Small RNA density and abundance with respect to TSSs in human. (a) The genome-wide distribution of THP-1 small RNA 5’ ends (red) and deepCAGE abundance (gray line) relative to transcription start sites (black bar and arrow, indicating the direction of transcription) shows an ~20 nt offset between peak densities, indicating that tiRNAs are not truncated 5’ capped transcripts. (b) The distribution of THP-1 small RNAs at 1 nt resolution with respect to the most highly expressed deepCAGE tag from active promoters identified as either broad with peak (PB) or single peak (SP). These promoter types have single dominant transcription start sites (see text). The black bar and arrow indicate transcription start and the direction of transcription, respectively.
Supplementary Figure 5
3’ end small RNAs. (a) The size distribution of unannotated human THP-1 small RNAs from the 3’ end of annotated Refgenes. 3’ end associated small RNAs and tiRNAs are significantly different in size (P < 10^-4; one-tailed T-test). (b) The size distribution of chicken small RNAs from the most 3’ end of Refgenes. Chicken 3’ end small RNAs and tiRNAs are also significantly different in size (P < 10^-4; one-tailed T-test).
Supplementary Figure 6
The relationship between deepCAGE and tiRNA abundance. Human THP-1 tiRNA and deepCAGE abundance do not exhibit a linear relationship, suggesting that, although tiRNAs are generally associated with highly expressed genes, they can also be associated with weakly expressed transcripts.
Supplementary Figure 7
Gene and tiRNA expression in 0-1h Drosophila embryo. Gene expression values from Arbeitman et al.2 0-1h embryos were compared with tiRNA abundance in either 0-1h replicate from Chung et al. (top two panels, GEO datasets GSM286604 and GSM286613), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).
Supplementary Figure 8
Gene and tiRNA expression in 2-6h Drosophila embryo. Gene expression values from Arbeitman et al.2 2-6h embryos were compared with tiRNA abundance in either 2-6h replicate from Chung et al. (top two panels, GEO datasets GSM286605 and GSM286606), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).
Supplementary Figure 9
Gene and tiRNA expression in 6-10h Drosophila embryo. Gene expression values from Arbeitman et al.2 6-10h embryos were compared with tiRNA abundance in either 6-10h replicate from Chung et al. (top two panels, GEO datasets GSM286607 and GSM286611), with the abundance of tiRNAs observed in both replicates (bottom left panel), or with the abundance of tiRNAs observed in either replicate (bottom right panel).
Supplementary Figure 10
Gene and tiRNA expression relationships. The relative expression of Drosophila genes and their corresponding tiRNAs at three embryonic time points. Heatmap data are sorted by gene expression values from 0-1h embryo, from high (red) to low (green). TiRNA abundance values range from highly expressed (red) to absent (black). Log2 median ratio gene expression values range from +3.3 to -6.0. TiRNA abundance was normalized as a proportion of the total abundance of the unannotated small RNAs in each library (see Table 1). TiRNA abundance is generally low, ranging from 1 to 1,000 normalized counts. We observed no consistent relationship between relative gene expression and tiRNA abundance. Gene expression data were obtained from Arbeitman et al.2 and tiRNA abundance heatmaps were generated using Mayday3 (see Supplementary Methods).
Supplementary Figure 11
TiRNAs in mutant Drosophila and Argonaute immunoprecipitations. (a) The size distribution of tiRNAs in Dcr-2-/-, loqs-/-, and wild type Drosophila ovaries from Czech et al. 4. TiRNAs are not affected by loss of Dcr-2 but may be affected by the loss of loquacious, although the absence of the characteristic 18 nt tiRNA peak in the loqs-/- library is likely due to alterations in library preparation. (b) The relative proportion and size of small RNAs which map to -60 to +120 nt relative to Refgene TSSs derived from AGO1 and AGO2 IPs. TiRNAs are not observed in these libraries.
Supplementary Figure 12
Properties of tiRNAs from undifferentiated human THP-1 cells. The density distribution of small RNA 5’ ends in undifferentiated THP-1 cells at (a) 10 nt and (b) 1 nt resolution relative to TSSs. The black bar and arrow indicate the transcription start site and the direction of transcription, respectively. (c) The size distribution of tiRNAs in undifferentiated THP-1 cells. (d) Genes with tiRNAs in undifferentiated THP-1 cells (red) are more highly expressed than those without tiRNAs (gray). (e) The proportion of deepCAGE tag defined promoters (black), deepCAGE promoters with tiRNAs that are not associated with Refgenes (blue), and deepCAGE promoters with tiRNAs and associated with Refgenes (red) that are associated with regions of the genome showing H3K9 acetylation or PU.1, RNA PolII, or Sp1 binding in undifferentiated THP-1 cells.
Supplementary Methods
THP-1 small RNA deep sequencing
Cell culture and RNA extraction
THP-1 cells were cultured in RPMI, 10% FBS, penicillin/streptomycin, 10mM HEPES, 1mM
sodium pyruvate, 50µM 2-mercaptoethanol, and treated with 30ng/ml PMA (Sigma) to
differentiate them into macrophage-like cells. We prepared 5 short RNA libraries from
specific size fractions (11-22 nt, 22-32 nt, 32-42 nt, 42-52 nt, and 52-82 nt) of
undifferentiated THP-1 cells, and an additional 6 small RNA libraries (~15 nt to ~40 nt) from
THP-1 cells over a time-course of PMA differentiation (0, 2, 4, 12, 24, 96h).
Total RNA was extracted using the AGPC (acid–guanidinium-phenol-chloroform)
method, and all precipitations were done with ethanol, instead of isopropyl alcohol, in order
to ensure the recovery of short oligonucleotides. CTAB selective precipitation of long RNA 5
was performed to separate long and short RNAs. Short RNAs (<75bp) were isolated from the
CTAB precipitation supernatant by precipitation with 2 volumes of ethanol. The RNA pellet
was resuspended in 7M GuCl and ethanol precipitated a second time.
Mixed short RNA library construction
Short RNAs derived from each time point were tagged with a 4nt tissue ID tag during the
adaptor ligation step. RNA-DNA hybrid oligonucleotide adaptor ligation was carried out
using 10µg total short RNA, 100µM of a 5’ adaptor containing an EcoRI recognition site
(Supplementary Table 2) and 100µM of a specific 3’ adaptor containing an EcoRI
recognition site and a 4 nt Tissue ID tag (Supplementary Table 2), with T4 RNA Ligase
(TaKaRa) for 16hrs at 15°C. The sample:adaptor:adaptor mixture ratio was 1µg short RNA:
100µM 5’adaptor 0.7µl : 100µM 3’adaptor 0.7µl. At the end of reaction, samples for each
mixed library were pooled, treated with 20mg/ml Proteinase K (15 mins, 37°C) and purified
by phenol/chloroform extraction and ethanol precipitation.
Purified short RNAs were separated from adaptor dimers on an 8% denaturing PAGE
gel. Short RNAs were excised and eluted from the gel in TEN elution buffer (10mM Tris·HCl
pH7.5, 1mM EDTA pH 7.5, 250mM NaCl) for ~16hrs at 4°C. Gel extracted short RNA tags
were filtered through MicroSpin Empty Columns (Amersham Biosciences) in TEN buffer
three times to remove any polyacrylamide contaminant and then purified by ethanol
precipitation.
cDNA synthesis was carried out on purified short RNAs by RT-PCR
(Supplementary Table 2) with M-MLV Reverse Transcriptase RNase H Minus, Point
Mutant (Promega). RT products were calibrated to determine the ratio of products derived
from individual time points in the libraries.
cDNAs derived from short RNA tags were amplified by PCR using adaptor-specific
primers (Supplementary Table 2). PCR was performed on 5 µl of template RT mixture,
with 1x buffer, 3 µl of DMSO, 12 µl of 2.5 mM dNTPs, 1.5 µl of 100 µM Primer 1
(Supplementary Table 2), 1.5 µl of 100 µM Primer 2 (Supplementary Table 2), 0.5 µl of
EX Taq polymerase (5 units/µl, TaKaRa) in a total volume of 50 µl. After incubating at 94°C
for 1 min, ~12-14 cycles were performed for 30 sec at 94°C, 30 sec at 57°C, 1 min at 70°C;
followed by 5 min incubation at 70°C. PCR products were pooled, purified, ethanol
precipitated and resuspended in 40 µl of TE buffer. The PCR products were purified on a
12% polyacrylamide gel. The 60~80 bp fraction was cut out of the gel, eluted in 500 µl of
SAGE elution buffer (2.5mM Tris·HCl pH7.5 /1.25mM ammonium acetate /0.17mM EDTA
pH 7.5) for 16hrs at room temperature. Gel extracted short RNA tags were filtered twice
through MicroSpin Empty Columns by centrifugation at 3000rpm for 2 min in SAGE
buffer (2.5mM Tris·HCl pH7.5, 1.25mM ammonium acetate, 0.17mM EDTA pH 7.5),
purified by ethanol precipitation, and re-suspended in 25 µl of 0.1x TE buffer and quantified
with Picogreen.
PCR-amplified, gel-purified short RNA tags were re-amplified in a total volume of
100 µl containing 2ng of short RNA tags, 6 µl of DMSO, 12 µl of 2.5 mM dNTPs, 2 µl of
100 µM Primer 1 (Supplementary Table 2), 2 µl of 100 µM Primer 2 (Supplementary Table
2), 0.8 µl of EX Taq polymerase (5 units/µl, TaKaRa). After incubating at 94°C for 1 min,
~8-9 cycles were performed at 30 sec at 94°C, 30 sec at 57°C, 1 min at 70°C followed by 5 min
at 70°C. The PCR products were pooled, purified, ethanol-precipitated and re-dissolved in 50
µl of TE buffer.
PCR products were further purified with G-50 micro-columns (GE Healthcare),
ethanol precipitated and resuspended in 100 µl of TE buffer. The concentration was measured
with Picogreen. PCR products were digested with EcoRI (Fermentas, 3µg/reaction), followed
by Proteinase K treatment (20mg/ml, 45°C, 15 minutes).
DNA tags derived from short RNAs were separated from the free DNA ends derived
from the ligated adaptors (cut off during restriction) by incubation with streptavidin-coated
magnetic beads. The cleaved tags were mixed with the beads (700 µl) and incubated at room
temperature for 15 mins with mild agitation. The beads were rinsed with 50 µl of 1x BW
buffer (1M NaCl, 0.5mM EDTA, 5mM Tris-HCl (pH7.5)), extracted by phenol/chloroform
followed by ethanol precipitation and resuspension in 40µl of TE buffer, or purified through
Microcon YM10 columns with buffer exchange into 0.1x TE. Short RNA tags were further
purified on a 12% polyacrylamide gel. The desired fraction was cut out of the gel, crushed,
and eluted in SAGE buffer for 16hrs at room temperature, followed by purification,
concentration with YM10 columns, and ethanol precipitation. The DNA was finally
resuspended in 6 µl of 0.1x TE buffer and quantified with Picogreen.
The short RNA tags (total yield) and adaptors (1/20 quantity of short RNA tags) were
concatenated in a 10 µl reaction with T4 DNA ligase (NEB) for 16hrs at 15°C. Proteinase K
digestion was carried out by adding 70µl of TE buffer and 20mg/ml Proteinase K and
digesting at 45°C for 15 minutes. Concatenated tags were purified with GFX columns
(Amersham) to eliminate short concatamers (<100bp). The eluted sample (50 µl) was
transferred for sequencing, and concatamerized tags derived from short RNAs were
sequenced using the Roche FLX Genome Sequencer6.
Preparation of the capped small RNA library with a modified oligo-capping protocol
Capped short RNAs were identified using an oligo-capping protocol, similar to what has been
previously described for the capture of capped 5’ ends of full-length mRNAs 7-9. Briefly, total
RNA was dephosphorylated, then decapped with Tobacco Acid Pyrophosphatase and
subsequently ligated to RNA/DNA linkers (as described above). To dephosphorylate small
RNAs we began with 3 µg of short RNAs in 30 µl total volume, which we heated for 5
minutes at 65°C and then chilled on ice. We added 6 µl of 10x Antarctic phosphatase reaction
buffer, 2 µl Cloned RNase inhibitor (40 U/µl TaKaRa), 6 µl Antarctic phosphatase (5 U/µl,
NEB), and 16 µl of water and incubated the solution at 37°C for 2 hours. The sample was
then phenol-chloroform extracted, ethanol precipitated, and dissolved in 42.9 µl of water. To
decap we heated the sample at 65°C for 5min, chilled it on ice, and then added 5 µl 10x TAP
buffer, 2 µl Cloned RNase Inhibitor, (40 U/µl, TaKaRa), 0.1 µl Tobacco acid
pyrophosphatase (150 U/µl, Nippon Gene). The solution was incubated at 37°C for 1 hour.
Next, cDNA was prepared by cleavage of the linkers and purification of the insert by
electrophoresis and concatenation, as described in the Mixed short RNA library construction,
above. Finally, the sample was subjected to Roche FLX Genome Sequencing.
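The reaction volumes given above can be cross-checked with simple arithmetic. The following Python sketch sums the stated components and reports the final buffer concentration. It assumes that the full 30 µl RNA sample and the full 42.9 µl resuspension volume enter the respective reactions; the function and variable names are illustrative only.

```python
def buffer_fold(components_ul, buffer_name, stock_fold=10.0):
    """Return (total volume in µl, final buffer concentration in 'x' units)."""
    total = sum(components_ul.values())
    return total, stock_fold * components_ul[buffer_name] / total

# Volumes (µl) as stated in the text; carrying the whole RNA sample
# into each reaction is an assumption.
dephosphorylation = {
    "short RNA sample": 30.0,
    "10x Antarctic phosphatase buffer": 6.0,
    "RNase inhibitor": 2.0,
    "Antarctic phosphatase": 6.0,
    "water": 16.0,
}
decapping = {
    "dephosphorylated RNA in water": 42.9,
    "10x TAP buffer": 5.0,
    "RNase inhibitor": 2.0,
    "Tobacco acid pyrophosphatase": 0.1,
}

dephos_total, dephos_fold = buffer_fold(dephosphorylation, "10x Antarctic phosphatase buffer")
decap_total, decap_fold = buffer_fold(decapping, "10x TAP buffer")
print(f"dephosphorylation: {dephos_total:.1f} µl, buffer at {dephos_fold:.2f}x")
print(f"decapping: {decap_total:.1f} µl, buffer at {decap_fold:.2f}x")
```

Under these assumptions the dephosphorylation reaction totals 60 µl and the decapping reaction 50 µl, each with its buffer at 1x.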
Short RNA library sequencing and tag extraction
We used in-house algorithms for linker masking and the extraction of short RNA tags. Short
RNA tags were extracted with the following parameters: EcoRI-ligated doublet linker (12-16 bp)
masking, with a maximum of 2 mismatched bases allowed; short RNA tag length, no limits.
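The in-house algorithms are not described further. As a rough illustration of linker masking with up to 2 mismatches, the Python sketch below scans a concatemer read for a linker and returns the sequences between linker copies. The linker sequence is a placeholder (the real sequences are in Supplementary Table 2), the function names are hypothetical, and the greedy left-to-right scan is a simplification that a production tool would probably replace with a best-match search.

```python
# Placeholder linker, NOT the sequence used in the study.
LINKER = "GTTGGTGTGGTT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def extract_tags(read, linker=LINKER, max_mismatch=2):
    """Split a concatemer read at every window matching the linker
    with at most `max_mismatch` substitutions; no tag length limit."""
    tags, tag_start, i = [], 0, 0
    n = len(linker)
    while i + n <= len(read):
        if hamming(read[i:i + n], linker) <= max_mismatch:
            if i > tag_start:
                tags.append(read[tag_start:i])
            i += n            # skip past the masked linker
            tag_start = i
        else:
            i += 1
    if tag_start < len(read):
        tags.append(read[tag_start:])
    return tags

if __name__ == "__main__":
    read = "ACCAACACCACAACCACA" + "GTTGGTGTGGTT" + "CACCAACCACACAACC"
    print(extract_tags(read))
```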
General bioinformatics
Bioinformatic analysis was done on a high performance computing station which houses a
local mirror of the UCSC Genome Browser10. All small RNA datasets were mapped using
Vmatch (http://www.vmatch.de/). We required small RNAs to map uniquely to the genome
of interest without any mismatches. The resulting set was further filtered to remove any small
RNAs that intersected with RepeatMasker annotations, random chromosomes, the
mitochondrial genome, miRNA and snoRNA loci, unannotated genomic sequences with high
homology to tRNAs or rRNAs, and assembly gaps. Filtered small RNA datasets are hereafter
referred to as “unannotated small RNAs”. Genomic features and annotations, unless
otherwise noted, were obtained through the local UCSC mirror. Transfer RNA and rRNA
sequences not annotated by RepeatMasker were identified by BLAST homology searches
(requiring 95% sequence identity) against rRNAs and tRNAs identified in GenBank from the
species of interest. Intersections between features (e.g. small RNAs and repeats) required a
minimum of 1 base of overlap, and were accomplished using a modified version of UCSC’s
back end tool, bedIntersect.
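The 1-base overlap criterion can be illustrated with a short Python sketch. This is not the modified bedIntersect tool; it assumes BED-style half-open coordinates, represents features as (chromosome, start, end) tuples, and uses a simple all-against-all comparison for clarity rather than speed. All names are illustrative.

```python
def overlap_bases(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))

def filter_unannotated(small_rnas, annotations, min_overlap=1):
    """Keep small RNAs that share fewer than `min_overlap` bases with
    every annotation on the same chromosome."""
    kept = []
    for chrom, start, end in small_rnas:
        hit = any(
            a_chrom == chrom and overlap_bases(start, end, a_start, a_end) >= min_overlap
            for a_chrom, a_start, a_end in annotations
        )
        if not hit:
            kept.append((chrom, start, end))
    return kept

if __name__ == "__main__":
    rnas = [("chr1", 100, 118), ("chr1", 300, 318), ("chr2", 100, 118)]
    repeats = [("chr1", 117, 200)]
    print(filter_unannotated(rnas, repeats))
```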
Bootstrap analysis
A Perl script executing a bootstrap analysis was used to estimate the likelihood of small
RNAs overlapping deepCAGE promoters (for THP-1 small RNAs) or Refgene TSSs (for
human, chicken and Drosophila small RNAs). For these analyses small RNAs and promoters
were collapsed down to individual loci using UCSC’s featureBits tool, eliminating the
possibility that multiple small RNAs and promoters mapping to the same region could
artificially inflate the results. Small RNAs were randomly assigned new chromosomal
locations, and the number intersecting with promoters or Refgene TSSs was tabulated. This
process was repeated for 105 iterations. Fold enrichment was determined by dividing the
number of observed overlaps by the average number of overlaps in all iterations.
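The procedure can be sketched as follows. This Python version is not the Perl script used for the analysis; it is a minimal illustration that assumes a single chromosome, half-open intervals, and loci that have already been collapsed. The function names, the seed and the default iteration count are arbitrary choices made for the example.

```python
import random

def count_overlapping(features, targets):
    """Count features sharing at least 1 base with any target interval."""
    return sum(
        any(min(f_end, t_end) > max(f_start, t_start) for t_start, t_end in targets)
        for f_start, f_end in features
    )

def fold_enrichment(small_rnas, promoters, chrom_length, n_iter=1000, seed=0):
    """Observed overlaps divided by the mean overlaps after random relocation."""
    rng = random.Random(seed)
    observed = count_overlapping(small_rnas, promoters)
    total_random = 0
    for _ in range(n_iter):
        shuffled = []
        for start, end in small_rnas:
            length = end - start
            # Random new location that keeps the small RNA on the chromosome.
            new_start = rng.randint(0, chrom_length - length)
            shuffled.append((new_start, new_start + length))
        total_random += count_overlapping(shuffled, promoters)
    mean_random = total_random / n_iter
    return observed / mean_random if mean_random else float("inf")

if __name__ == "__main__":
    rnas = [(1010, 1028), (1040, 1058), (50000, 50018)]
    proms = [(1000, 1100), (70000, 70100)]
    print(fold_enrichment(rnas, proms, chrom_length=100000))
```

A genome-wide version would additionally need to draw a chromosome for each relocated small RNA and exclude assembly gaps.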
THP-1 small RNA mapping and analysis
THP-1 small RNA tags were mapped to the human genome (UCSC hg18, NCBI Build 36.1) and
pooled across time points to increase the effective depth of the analysis, consistent with the
analysis of promoters identified by deepCAGE (see below). Intersections with genomic
features (e.g. known small RNA loci, repeats) were performed as described above. We
obtained a total of ~10 million human small RNA reads, which collapsed down to 46,076
uniquely mapping sequences with a total abundance of ~2 million reads. We found a total of
23,628 unannotated small RNAs with a total abundance of 345,753 reads (~7 counts per
unannotated small RNA per million mapped tags). We found 2312 tiRNAs with a total
abundance of 3702 (~0.8 counts per tiRNA per million mapped tags). To estimate the number
of tiRNAs per cell we calculated the ratio of mir-15a abundance (~20,000 counts) to average
tiRNA abundance (~0.8 counts) and compared this ratio with the number of mir-15a copies
per cell (~4000)11. We generated 7,518 control and 8,374 cap-trapped THP-1 small RNA
sequencing reads, and found a total abundance of 6 and 8 tiRNAs, respectively. Small RNA
distributions with respect to the TSS (both sense and antisense) were calculated by tabulating
the number of small RNA 5’ ends in 1 nt or 10 nt windows – e.g. the number of small RNA
5’ ends that map to bases 0 to +10 relative to the transcription start (either a Refgene
annotated TSS or the most abundant deepCAGE tag in a clustered promoter). Because some
TSSs map close to one another, a small RNA can be counted in more than one bin. However,
we found that this occurred for less than 15% of small RNAs and did not substantially affect
the results.
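The windowed tabulation of small RNA 5’ ends can be sketched in Python as below. This is an illustration rather than the code used for the analysis: it assumes one chromosome, takes small RNAs and TSSs as (position, strand) pairs, and the `max_dist` cutoff and all names are introduced here for the example. As noted above, a small RNA lying near two TSSs is counted once for each.

```python
from collections import Counter

def relative_position(five_prime, tss, tss_strand):
    """Offset of a 5' end from a TSS; positive values are downstream."""
    return five_prime - tss if tss_strand == "+" else tss - five_prime

def bin_five_prime_ends(small_rnas, tss_list, window=10, max_dist=200):
    """Count small RNA 5' ends per window, keyed by (orientation, window start).

    The window starting at 0 covers offsets 0 to window-1, the window
    starting at -window covers offsets -window to -1, and so on.
    """
    counts = Counter()
    for five_prime, rna_strand in small_rnas:
        for tss, tss_strand in tss_list:
            rel = relative_position(five_prime, tss, tss_strand)
            if -max_dist <= rel < max_dist:
                orientation = "sense" if rna_strand == tss_strand else "antisense"
                counts[(orientation, (rel // window) * window)] += 1
    return counts

if __name__ == "__main__":
    rnas = [(1015, "+"), (995, "-"), (1985, "-")]
    tsss = [(1000, "+"), (2000, "-")]
    print(bin_five_prime_ends(rnas, tsss, window=10))
```

Setting `window=1` gives the 1 nt resolution profile.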
To ensure that sequence composition biases at promoters were not affecting small
RNA mapping we examined all promoter regions (-60 to +120 nts relative to the most highly
expressed CAGE tag) with evidence of tiRNAs and created an index of all unique Nmers (14-23
nts) in the human genome. We found that unique 18mers are not overrepresented at
these promoters. We then analyzed the number of unique small RNA mappings at these
regions and compared them with the expected number of mappings, based on the unique
Nmer index. We found fewer small RNAs of every size class (except 14mers, which are the
most weakly represented), with respect to 18mers, than we would expect by chance. We also
examined multi-mapping tags to assess if restricting our analyses to uniquely mapping tags
was biasing our results. We found ~5x more 14mers and ~1.5x more 15mer multi-mapping
than uniquely mapping tags. However, we found fewer multi-mapping than uniquely
mapping tags in all other size classes. The inclusion of multi-mapping tags increases the
number of 14mer and 15mer tags in the tiRNA dataset (as expected by random chance), but these
are still more than two-fold less abundant than 18mer tags.
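The idea of a unique Nmer index can be shown on a toy sequence. The Python sketch below is not the index used in the analysis: it considers one strand only (a real index would probably also account for the reverse complement), holds the whole sequence in memory, and uses hypothetical function names. It reports which k-mers occur exactly once and what fraction of start positions in a region yield a unique k-mer, which is the quantity needed to derive an expected number of unique mappings for each size class.

```python
from collections import Counter

def unique_kmers(sequence, k):
    """Set of k-mers that occur exactly once in `sequence` (single strand)."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return {kmer for kmer, n in counts.items() if n == 1}

def unique_fraction(sequence, region_start, region_end, k):
    """Fraction of k-mer start positions in [region_start, region_end)
    whose k-mer is unique in the whole sequence."""
    uniq = unique_kmers(sequence, k)
    last_start = min(region_end, len(sequence) - k + 1)
    positions = range(region_start, last_start)
    if not positions:
        return 0.0
    return sum(sequence[i:i + k] in uniq for i in positions) / len(positions)

if __name__ == "__main__":
    toy_genome = "ACGTACGTTT"
    print(sorted(unique_kmers(toy_genome, 4)))
    print(unique_fraction(toy_genome, 0, 5, 4))
```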
Evofold, phastCons, and CpG island loci were obtained from the local mirror of the
UCSC Genome Browser. Intersections between tiRNAs and these genomic features were
performed using a modified version of UCSC’s bedIntersect. Sequence analysis was
performed using Python scripts and Unix tools. Refgene 3’ end associated small RNAs were
identified by dividing all Refgenes into deciles (to normalize for size), and extracting small
RNAs which mapped to the same strand and in the 3’ 10% of any Refgene. A one-tailed T-
test was used to test if size distributions were different between tiRNAs and 3’ end small
RNAs.
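The decile-based selection of 3’ end small RNAs can be sketched as follows. The Python function below is an illustration only: it assumes half-open gene coordinates, represents each small RNA by a single position (for example its 5’ end), and the function name is hypothetical.

```python
def in_three_prime_decile(gene_start, gene_end, gene_strand, rna_pos, rna_strand):
    """True if a same-strand small RNA lies in the 3'-most 10% of a gene.

    Gene coordinates are half-open [gene_start, gene_end); for minus-strand
    genes the 3' end is at the low-coordinate side.
    """
    if rna_strand != gene_strand or not (gene_start <= rna_pos < gene_end):
        return False
    length = gene_end - gene_start
    if gene_strand == "+":
        offset = rna_pos - gene_start        # distance from the 5' end
    else:
        offset = gene_end - 1 - rna_pos
    decile = offset * 10 // length           # 0 (5'-most) to 9 (3'-most)
    return decile == 9

if __name__ == "__main__":
    print(in_three_prime_decile(1000, 2000, "+", 1950, "+"))
    print(in_three_prime_decile(1000, 2000, "-", 1950, "-"))
```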
Analysis of THP-1 promoters
DeepCAGE8,9,12,13 was performed in triplicate at five time points (1, 4, 12, 24, and 96 hours)
during THP-1 differentiation in response to phorbol 12-myristate 13-acetate (PMA)
stimulation. DeepCAGE tags were mapped to the human genome (UCSC hg18, NCBI Build
36.1) by aligning perfectly matching tags first, then those tags that map with a single base
pair substitution and finally tags which contain a single insertion or deletion. A filter was
applied to remove rRNA-derived tags. For tags that map to multiple locations a probabilistic
model, previously described by Faulkner et al.14, was used to assign weights to each of the
possible genomic mappings. To identify promoters we first normalized the CAGE data from
each sample by scaling CAGE tag counts such that the distribution of the number of tags per
position matches a common reference (power-law) distribution. We used technical replicates
to estimate experimental noise. We found that the noise distribution is well described by a
convolution of multiplicative noise and Poisson sampling noise. Using this noise model, a
Bayesian procedure was used to calculate, for each consecutive pair of TSSs, the probability
that both TSSs were expressed in a fixed relative proportion across all samples. Neighbouring
TSSs with a high probability of expression in a constant proportion were then hierarchically
joined into clusters. Promoters were defined as significantly expressed clusters, i.e. those that
have at least 1 tag in at least 2 samples and whose maximum expression across all samples is
at least 10 tags per million. All other TSS clusters were discarded.
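The final significance filter on TSS clusters is simple enough to state as code. The Python sketch below covers only this last step, not the normalization, noise model or Bayesian clustering; it assumes that the "at least 1 tag" criterion refers to raw tag counts and the 10 tags per million criterion to normalized expression, and the function name is hypothetical.

```python
def is_promoter(raw_counts, tpm, min_samples=2, min_max_tpm=10.0):
    """Apply the cluster filter: >= 1 tag in >= `min_samples` samples and a
    maximum expression across samples of >= `min_max_tpm` tags per million.

    raw_counts and tpm are per-sample values for one TSS cluster.
    """
    samples_with_tags = sum(1 for count in raw_counts if count >= 1)
    return samples_with_tags >= min_samples and max(tpm) >= min_max_tpm

if __name__ == "__main__":
    print(is_promoter([3, 0, 1], [12.0, 0.0, 4.0]))   # two samples, peak above threshold
    print(is_promoter([5, 0, 0], [20.0, 0.0, 0.0]))   # tags in a single sample only
```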
DeepCAGE tags were clustered into a total of ~18,000 high confidence active
promoters. These promoters contain ~20% (~250,000) of all mapped deepCAGE tags. On
average these active promoters spanned 33 nt and were composed of 16 tags, with a mean tag
abundance of 2 counts per million (cpm) sequenced tags. Promoters that mapped to RepeatMasker
annotations, random chromosomes, assembly gaps, the mitochondrial genome, or
annotated small RNAs were removed from the analysis. The remaining 14,818 promoters
were used for all subsequent analysis. Less than 0.07% of promoters overlap any annotated
small RNA loci (including miRNAs and snoRNAs), indicating that the CAGE libraries are
not contaminated with small RNAs. Promoter architecture was assessed using a Python script
incorporating previously published criteria8. Promoters with fewer than 10 total tags were
excluded from promoter architecture analysis. Using previously reported promoter
architecture definitions we found that the promoters used in all tiRNA analyses were
predominantly broad with peak (PB, 46.1%), followed by generally broad (BR, 34.4%),
single peak (SP, 14.4%), and multimodal (MU, 5.1%)8.
THP-1 gene expression analysis
Refgene annotations were obtained from the local mirror of the UCSC Genome Browser. A
deepCAGE promoter cluster mapping within -300 to +100 nt relative to an annotated
Refgene TSS was defined as Refgene associated. Correspondingly, these genes were
identified as 'present' by deepCAGE. The most highly expressed deepCAGE tags from
promoters mapping within Refgene promoter regions are tightly associated with annotated
TSSs. Nearly one third map to the first nucleotide of an annotated Refgene TSS, and nearly
two thirds map within 50 nt of the annotated Refgene TSS. A two-tailed T-test was used to
test if deepCAGE expression levels between promoters with and without tiRNAs were
different.
To determine relative expression levels by microarray we queried THP-1 RNA
samples identical to those used for deepCAGE libraries (derived from undifferentiated THP-1
cells, and at 1, 4, 12, 24, and 96 hours after macrophage differentiation in response to PMA).
RNA was purified for expression analysis by Qiagen RNeasy columns, Takara FastPure RNA
Kit or by TRIzol. RNA quality was analyzed by Nanodrop and Bioanalyser. RNA (500 ng)
was amplified using the Illumina TotalPrep RNA Amplification Kit, according to
manufacturer’s instructions. cRNA was hybridized to the Illumina Human Sentrix-6 bead
chips Ver.2, according to standard Illumina protocols (http://www.illumina.com). Chip scans
were processed using Illumina BeadScan and BeadStudio software packages and summarized
data was generated in BeadStudio (version 3.1). Quantile normalization of Illumina data and
B-statistic calculations were carried out using the lumi and limma packages of Bioconductor
in the R statistical language 15-17. Refgenes associated with tiRNA promoters were identified,
and refSeq mRNA accession numbers were retrieved and mapped to the Human Illumina V2
probe centric "genome" in Genespring v7.3.1. Quantile normalized data generated from PMA
treated THP-1 biological replicates were used to examine expression levels. A chi-squared
test was used to determine statistical significance.
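For readers unfamiliar with quantile normalization, the following Python sketch shows the basic idea on toy data. It is not the lumi or limma implementation used here: it is written in plain Python, assumes every array has the same number of probes, and breaks ties by probe order instead of averaging them. The function name is illustrative.

```python
def quantile_normalize(samples):
    """Force every sample to share the same intensity distribution.

    samples: list of equal-length lists, one list of probe intensities
    per array. Returns new lists in the original probe order.
    """
    n_probes = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    # Reference distribution: mean intensity at each rank across arrays.
    rank_means = [
        sum(s[rank] for s in sorted_samples) / len(samples)
        for rank in range(n_probes)
    ]
    normalized = []
    for s in samples:
        order = sorted(range(n_probes), key=lambda i: s[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = rank_means[rank]
        normalized.append(out)
    return normalized

if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
```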
THP-1 promoter ChIP-chip analysis
THP-1 cells were cross-linked with 1% formaldehyde for 10 min, and 125mM glycine in
PBS was added. Cross-linked cells were collected by centrifugation and washed twice in cold
1 x PBS. The cells were sonicated for 5~7 min with a Branson 450 Sonicator to shear the
chromatin. Complexes containing DNA bound to histone H3 acetylated at lysine 9 (H3K9Ac)
were immunoprecipitated with an antibody against H3K9Ac (07-352, Upstate) by overnight
rotation at 4°C. The immunoprecipitated sample was incubated with magnetic beads/Protein
G (Dynal) for 1 hr at 4°C followed by one wash with each of (1) Low salt wash buffer (0.1%
SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris.HCl (pH8.1), 150mM NaCl), (2) High salt
wash buffer (0.1% SDS, 1% Triton X-100, 2mM EDTA, 20mM Tris.HCl (pH 8.1), 500mM
NaCl) and (3) LiCl wash buffer (10mM Tris.HCl (pH8.1), 0.25M LiCl, 0.5% NP-40, 0.5%
sodium deoxycholate, 1mM EDTA), and then two washes with TE buffer. The
antibody-H3K9Ac-DNA complexes were eluted from the magnetic beads by addition of 1% SDS and
100 mM NaHCO3. Beads were vortexed for 60 min at RT. The supernatants were incubated
for 3.5 hr at 65°C to reverse the cross-links, and incubated for a further 30 min at 65°C in the
presence of 20mg/ml RNaseA. To purify the DNA, proteinase K solution was added at a final
concentration of 100mg/ml, and the samples were incubated overnight at 45°C, followed by a
phenol:chloroform:isoamyl alcohol extraction and ethanol precipitation to recover the DNA.
PU.1, Sp1 and RNA Polymerase II (PolII) DNA complexes were likewise
immunoprecipitated using antibodies T-21 (Santa Cruz), 07-645 (Upstate), and 8WG16
(Abcam), for PU.1, Sp1 and PolII, respectively.
Immunoprecipitated DNA was blunted using 0.25U/µl T4 DNA polymerase (Nippon
Gene). Linker oligonucleotides (5’-accgcgcgtaatacgactcactataggg-3’ and Phosphate-5’-
ccctatagtgagtcgtattaca-3’) were annealed to the DNA while the temperature was decreased
gradually from 99°C to 15°C over 90 min. The blunted immunoprecipitated DNA sample
was ligated to the annealed oligonucleotides with 500U of T4 DNA ligase (Nippon Gene).
The cassette DNA fragments (45 µg/reaction) were amplified with Blend Taq Plus (Toyobo)
using the linker-specific oligonucleotide 5’-accgcgcgtaatacgactcactataggg-3’. PCR cycling
conditions were as follows: denaturation at 95°C for 1 min; 25 cycles of 95°C for 30 s, 55°C
for 30 s, 72°C for 2 min; and a final extension at 72°C for 7 min. Amplified DNA was
purified, fragmented with DNase I (Epicentre), and end-labeled with biotin-ddATP using
terminal deoxytransferase (Roche). Amplified DNA was hybridized to Affymetrix whole
genome tiling or promoter arrays for 18 h at 45°C, washed, and scanned using the Affymetrix
GeneChip System. Each sample was hybridized in triplicate. Affymetrix Human Tiling
Arrays (1.0) were used to measure H3K9Ac enrichment. PU.1 and Sp1 enrichment were
measured using Affymetrix Human Promoter arrays (1.0R). Three technical replicates were
performed for ChIP-chip experiments of H3K9Ac, Sp1 and PU.1, and two technical replicates
for those of PolII.
RNA Polymerase II-immunoprecipitated DNA was treated with CIP and poly-dT
tailed using terminal transferase. The T7 poly-A primer (5’-CATTAGCGGCCGCGAAATT
AATACGACTCACTATAGGGAGAAAAAAAAAAAAAAAAAA [C or T or G] -3’) was
annealed and the DNA sample was subjected to second strand synthesis using DNA
polymerase I (Invitrogen) as follows: 94°C for 2 min, ramp down to 35°C (1°C/sec), hold at
35°C for 2 min, ramp down to 25°C (0.5°C/sec), hold and add DNA polymerase I at 37°C for
90 min. After second strand synthesis, the reaction was terminated by EDTA addition and the
DNA was column-purified. DNA was amplified by in vitro transcription (IVT) using CUGA
T7-RNA polymerase (Nippon Gene). RNA obtained from poly-dT-tailed DNA was purified
using the RNeasy Mini kit (Qiagen) and used to synthesize cDNA with SuperScriptII
(Invitrogen) and random primers. The DNA T7-polyA primer was annealed to the first strand
DNA to synthesize second strand DNA. The second strand DNA was amplified in a second
round of IVT, performed as described above. The amplified RNA (cRNA) was purified
as in the first round of IVT. The collected cRNA was used to synthesize double-stranded cDNA.
The double-stranded cDNA, fragmented with DNase I (Epicentre), was end-labelled with
biotin-ddATP by using terminal deoxytransferase (Roche). After hybridizing the end-labelled
DNA fragments to the tiling arrays (Affymetrix Human Tiling Array 2.0R) for 18 h at 45°C,
the arrays were washed and scanned using the Affymetrix GeneChip System. Each of the
treatment and control samples was hybridized twice, to provide technical replicates.
The enrichment of DNA fragments immunoprecipitated with H3K9Ac compared to
the human genome was determined using the Affymetrix whole-genome tiling array (1.0R).
This array tiles the non-repetitive portion of the human genome at 35-bp intervals with more
than 41 M pairs of 25-mer probe sequences. The hybridization intensities of the probes
(background-subtracted intensity, PM – MM, where PM is the intensity detected by a 25-mer
probe that perfectly matches the genome and MM is the intensity detected by a probe carrying
a single-base mismatch) were measured in three technical replicates and quantile-normalized for each of the
treatment and control samples. A shift of the intensities in the treatment relative to control
data in a 400-bp window centered at each probe was evaluated by a Wilcoxon Rank Sum test,
which assigned a P-value to the probe position. We used the Affymetrix software, GTAS
(http://www.affymetrix.com/support/developer/downloads/TilingArrayTools) for the P-value
calculation. Enrichment of DNA fragments precipitated with RNA PolII compared to the
human genome was measured using the Affymetrix Human Tiling Array (2.0R). This array
tiles the same portion of the human genome as 1.0R, but with PM probes only. Two technical
replicates were performed for both treatment and control samples in measurement of the PolII
enrichment, and the enrichment measure (P-value) was calculated using GTAS as
described for H3K9Ac.
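The windowed test described above can be illustrated with a minimal sketch. It assumes the input intensities are already background-subtracted and quantile-normalized, and it uses a one-sided rank-sum test with a normal approximation; the function names (rank_sum_p, window_p_values) and the one-sided choice are assumptions made for illustration, and GTAS itself may differ in its details.

```python
from statistics import NormalDist


def rank_sum_p(treatment, control):
    """One-sided Wilcoxon rank-sum P-value (treatment shifted above control).

    Uses average ranks for ties and the normal approximation without a
    tie correction; this is a simplification, not the GTAS code.
    """
    n1, n2 = len(treatment), len(control)
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = sorted([(v, 0) for v in treatment] + [(v, 1) for v in control])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    return 1.0 - NormalDist().cdf((w - mean) / sd)


def window_p_values(positions, treatment, control, half_window=200):
    """Assign a P-value to each probe from all probes in a 400-bp window.

    positions: probe coordinates; treatment/control: one list of replicate
    intensities per probe, in the same order as positions.
    """
    p_values = []
    for centre in positions:
        t, c = [], []
        for pos, t_vals, c_vals in zip(positions, treatment, control):
            if abs(pos - centre) <= half_window:
                t.extend(t_vals)
                c.extend(c_vals)
        p_values.append(rank_sum_p(t, c))
    return p_values
```

The quadratic scan over probes keeps the sketch short; a genome-scale implementation would slide the window over sorted coordinates instead.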
Enrichment of PU.1 and Sp1-precipitated DNA was measured using the Affymetrix
Human Promoter arrays that tile promoter regions (7.5 kb upstream and 2.45 kb downstream
of transcription start sites) of annotated genes at 35-bp intervals with 25-mer probes.
Hybridization intensities were measured in three technical replicates for each of the treatment
and control samples. The enrichment measure expressed as a P-value was calculated by using
GTAS as described above.
The genome coordinates of the 25-mer probes, originally based on the hg16 version
of the human genome, were converted to hg18. The positions of the probes on hg18 were
determined by aligning the probe sequences to the human genome (hg18) using Vmatch
(http://www.vmatch.de).
ChIP-chip data were analysed such that a base was called 'present' only if it was bound to the
protein or marker of interest at statistically significant levels in both replicates, in both
undifferentiated cells and cells after 96 h of exposure to PMA. Undifferentiated and 96 h ChIP-chip data were pooled
and clustered such that any 'present' base must have at least one other 'present' base within 35
nt. Intersections between ChIP-chip features, deepCAGE, and tiRNAs were completed using
a modified version of UCSC’s bedIntersect.
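The filtering and clustering steps can be sketched as follows. The sketch reads the filter as a replicate intersection within each time point followed by pooling across time points, which is one interpretation of the text; the function names and the example coordinates are hypothetical, and this is not the modified bedIntersect used in the study.

```python
def present_in_all(*position_sets):
    """Bases called 'present' in every replicate supplied."""
    return set.intersection(*map(set, position_sets))


def cluster_present_bases(positions, max_gap=35):
    """Cluster 'present' bases; drop any base with no neighbour within max_gap nt.

    Returns a list of (start, end) clusters with inclusive coordinates.
    """
    clusters, current = [], []
    for pos in sorted(set(positions)):
        if current and pos - current[-1] > max_gap:
            if len(current) > 1:  # isolated bases are discarded
                clusters.append((current[0], current[-1]))
            current = []
        current.append(pos)
    if len(current) > 1:
        clusters.append((current[0], current[-1]))
    return clusters


def overlaps(a, b):
    """True if two inclusive (start, end) intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical coordinates for two replicates at each time point.
undiff = present_in_all({100, 120, 150, 400}, {100, 120, 150, 900})
pma_96h = present_in_all({180, 500, 1000}, {180, 500, 1000, 1020})
clusters = cluster_present_bases(undiff | pma_96h)
```

With these example inputs the bases at 500 and 1000 are dropped because neither has a 'present' neighbour within 35 nt, leaving a single cluster spanning 100 to 180.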
Undifferentiated THP-1 small RNA analysis
To ensure that pooling the deepCAGE and small RNA deep sequencing data across time
points was not distorting our results we restricted our analysis to small RNAs from
undifferentiated THP-1 cells (i.e. 0h). Using deepCAGE tags detected in at least two
replicates at 0h, we found that all trends observed for the pooled dataset are recapitulated at
0h, although overall less robustly. We found 156 small RNAs >200-fold enriched at 240
active promoters, which map to regions -60 to +120 nt relative to the TSS, and exhibit high
density 10 nt or further downstream (Supplementary Fig. 12a,b,c online). The vast majority
of these tiRNAs and their associated promoters map to Refgene TSSs (79% and 83%
respectively), which are highly expressed (Supplementary Fig. 12d online) and are enriched
for Sp1 and RNA PolII binding (Supplementary Fig. 12e online). 0h tiRNAs are predominantly
18 nt and have no intersection with Evofold predictions. Only one third intersect with a
phastCons element. Consistent with tiRNAs from the pooled dataset, we found that 0h
tiRNAs were ~72% GC.
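As an illustration of the positional and compositional measures quoted above, the following sketch computes a strand-aware offset from the TSS, tests membership of the -60 to +120 nt region, and computes GC content. Which coordinate of a small RNA is compared with the TSS is not specified here, so the single position argument is an assumption, and the 18-nt sequence is a made-up example rather than an observed tiRNA.

```python
def offset_from_tss(pos, tss, strand):
    """Signed distance from the TSS in the direction of transcription.

    Negative values are upstream of the TSS, positive values downstream.
    """
    return pos - tss if strand == "+" else tss - pos


def in_tss_window(pos, tss, strand, upstream=60, downstream=120):
    """True if pos lies between -upstream and +downstream nt of the TSS."""
    return -upstream <= offset_from_tss(pos, tss, strand) <= downstream


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical 18-nt sequence with 13 G/C bases (about 72% GC).
example_gc = gc_fraction("GGCCGGCCGCGCGATAAT")
```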
Chicken small RNA analysis
Approximately 3 million sequences from embryonic chicken small RNA libraries made from
embryos collected at day 5, day 7 and day 9 of incubation (hereafter referred to as CE5, CE7
and CE9) were obtained from Glazov et al.18 (GEO Series ID GSE10686). Tags were mapped
to UCSC genome build galGal3 (v2.1 draft assembly, Genome Sequencing Center,
Washington University School of Medicine). We obtained a total of 130,588 uniquely
mapping sequences (69,011, 39,964, 21,613 from CE5, CE7, and CE9 respectively) with a
total abundance of 3,559,917 reads (1,192,303, 1,193,318, and 1,174,296 reads from CE5,
CE7, and CE9 respectively). We found 115,271 unannotated small RNAs (53,694, 39,964,
and 21,613 from CE5, CE7, and CE9 respectively) with a total abundance of 484,124 reads
(210,811, 185,168, and 88,145 reads from CE5, CE7, and CE9 respectively), or ~1.2 counts
per unannotated small RNA per million mapped tags. We identified a total of 1628 tiRNAs
(485, 822, 321 from CE5, CE7, and CE9 respectively) with a total abundance of 1769 counts
(512, 917, 340 reads from CE5, CE7, and CE9 respectively), or ~0.3 counts per tiRNA per
million mapped tags. Refgene, phastCons, and CpG island coordinates were obtained directly
through the UCSC Genome Browser mirror. Known small RNA loci were compiled from
miRBase (v 10.0) and from sequence homology searches with known mammalian snoRNAs19.
Refgene TSS coordinates were extracted from the UCSC Genome Browser.
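The per-million figures quoted above appear to follow from dividing the mean abundance per distinct small RNA by the number of mapped reads in millions. The short sketch below applies that reading to the chicken numbers given in the text; the function name is hypothetical and the formula is inferred from the reported values rather than stated by the source.

```python
def counts_per_feature_per_million(total_abundance, n_features, mapped_reads):
    """Mean read count per distinct small RNA, scaled per million mapped reads."""
    return (total_abundance / n_features) / (mapped_reads / 1e6)


mapped_reads = 1_192_303 + 1_193_318 + 1_174_296  # CE5 + CE7 + CE9

# Unannotated small RNAs: 115,271 sequences, 484,124 reads.
unannotated = counts_per_feature_per_million(484_124, 115_271, mapped_reads)

# tiRNAs: 1,628 sequences, 1,769 reads.
tirna = counts_per_feature_per_million(1_769, 1_628, mapped_reads)
```

Under this reading the two values round to ~1.2 and ~0.3, matching the figures reported in the paragraph.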
Bootstrap enrichment was performed as described above. Small RNA distributions with
respect to the TSS (both sense and antisense) were calculated as described above. Due to the
paucity of Refgene annotations in the Gallus gallus genome, and therefore the limited
number of TSSs used in this analysis, small RNAs mapping to more than one window were
observed in less than 2% of cases. A one-tailed t-test was used to assess the significance of
the difference between tiRNA and 3’ end small RNA sizes.
Drosophila small RNA and gene expression analysis
Drosophila melanogaster deep sequencing libraries were obtained through NCBI GEO.
Libraries GSE7448 (ref. 1), GSE11624 (ref. 20) and GSE11086 (ref. 4) were mapped to the Drosophila genome
(UCSC dm3, BDGP Release 5) as described above. Acquisition of genomic features and
removal of small tags that mapped to small RNAs, repeats, etc. was accomplished as
described above. GSE7448 showed 111,017 uniquely mapping tags with a total abundance of
358,893, and 78,276 unannotated small RNAs with a total abundance of 123,183 or ~4
counts per unannotated small RNA per million mapped tags. We found 1972 tiRNAs in
GSE7448, with a total abundance of 3060, or ~4 counts per tiRNA per million mapped tags.
GSE11624 showed 1,055,295 uniquely mapping tags with a total abundance of 5,650,248,
and 664,962 unannotated small RNAs with a total abundance of 1,644,447 or ~0.4 counts per
unannotated small RNA per million mapped tags. We found 29,722 tiRNAs with a total
abundance of 52,941, or ~0.3 counts per tiRNA per million mapped tags. Bootstrap
enrichment was performed as described above. Small RNA distributions with respect to the
TSS (both sense and antisense) were calculated as described above. Small RNAs mapping to
multiple windows were observed in less than 10% of cases.
To examine the relationship between tiRNA abundance and gene expression we
obtained median normalized gene expression values for all genes detected at all life cycle
time points (3,318 Flybase genes) in the Arbeitman et al.2 dataset from our in-house UCSC
mirror (UCSC Genome Browser table hgFixed.arbFlyLifeMedianRatio). We created a
MySQL relational database of gene expression values and tiRNA abundance per gene. The
significance of differences in gene expression levels of genes with and without tiRNAs at
different embryonic time points was assessed by a Welch Two Sample t-test. Gene
expression and tiRNA abundance heatmaps were generated using Mayday3. tiRNA
abundance was normalized as a proportion of the total abundance of the unannotated small
RNAs in each library (see Table 1).
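The normalization and the Welch comparison can be sketched as below. The sketch computes only the Welch t statistic and the Welch-Satterthwaite degrees of freedom; a P-value would then be taken from the t distribution (for example with scipy.stats.t.sf, if SciPy is available). The function names and the example expression values are hypothetical.

```python
from statistics import mean, variance


def normalised_abundance(tirna_count, library_unannotated_total):
    """tiRNA abundance as a proportion of a library's unannotated small RNA reads."""
    return tirna_count / library_unannotated_total


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical expression ratios for genes with and without tiRNAs.
with_tirnas = [2.0, 4.0, 6.0, 8.0]
without_tirnas = [1.0, 2.0, 3.0]
t_stat, dof = welch_t(with_tirnas, without_tirnas)
```

Unlike Student's t-test, the Welch form does not assume equal variances in the two gene sets, which is why the degrees of freedom are generally non-integer.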
Gene Ontology analysis
Gene Ontology enrichment for human and Drosophila genes was assessed using a local
installation of GeneMerge21, which utilizes a hypergeometric test and a Bonferroni correction
to assess statistical significance. Gene Ontology enrichment for genes with tiRNAs and
present in the Arbeitman et al. gene expression data2 was done against a background of all
Arbeitman et al. Flybase genes. All other enrichments were done against all human or
Drosophila Refgenes.
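The statistical approach named above can be illustrated with a minimal sketch of a hypergeometric enrichment test with Bonferroni correction. It is not the GeneMerge code: the function names, the choice to correct by the number of terms observed in the study set, and the toy gene sets are assumptions made for illustration.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return total / comb(N, n)


def go_enrichment(study, background, annotations):
    """Return (term, fold, bonferroni_p) tuples sorted by corrected P-value.

    annotations maps each GO term to the set of genes annotated with it.
    """
    study, background = set(study), set(background)
    terms = {t: genes & background for t, genes in annotations.items()}
    tested = [t for t, genes in terms.items() if genes & study]
    results = []
    for term in tested:
        K = len(terms[term])
        k = len(terms[term] & study)
        fold = (k / len(study)) / (K / len(background))
        p = hypergeom_tail(k, K, len(study), len(background))
        results.append((term, fold, min(1.0, p * len(tested))))
    return sorted(results, key=lambda r: r[2])


# Toy example: 20 background genes, 5 study genes, two hypothetical terms.
background = [f"g{i}" for i in range(20)]
study = ["g0", "g1", "g2", "g3", "g4"]
annotations = {
    "A": {"g0", "g1", "g2", "g3"},
    "B": {"g0"} | {f"g{i}" for i in range(10, 19)},
}
results = go_enrichment(study, background, annotations)
```

In the toy example term A is four-fold enriched in the study set, whereas term B is depleted and its corrected P-value is capped at 1.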
References
1. Ruby, J.G. et al. Evolution, biogenesis, expression, and target predictions of a
substantially expanded set of Drosophila microRNAs. Genome Res. 17, 1850-64
(2007).
2. Arbeitman, M.N. et al. Gene expression during the life cycle of Drosophila
melanogaster. Science 297, 2270-5 (2002).
3. Dietzsch, J., Gehlenborg, N. & Nieselt, K. Mayday--a microarray data analysis
workbench. Bioinformatics 22, 1010-2 (2006).
4. Czech, B. et al. An endogenous small interfering RNA pathway in Drosophila. Nature
453, 798-802 (2008).
5. Lagonigro, M.S. et al. CTAB-urea method purifies RNA from melanin for cDNA
microarray analysis. Pigment Cell Res. 17, 312-5 (2004).
6. Margulies, M. et al. Genome sequencing in microfabricated high-density picolitre
reactors. Nature 437, 376-80 (2005).
7. Carninci, P. et al. High-efficiency full-length cDNA cloning by biotinylated CAP
trapper. Genomics 37, 327-36 (1996).
8. Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and
evolution. Nat. Genet. 38, 626-35 (2006).
9. Fromont-Racine, M., Bertrand, E., Pictet, R. & Grange, T. A highly sensitive method
for mapping the 5' termini of mRNAs. Nucl. Acids Res. 21, 1683-4 (1993).
10. Karolchik, D. et al. The UCSC Genome Browser Database: 2008 update. Nucl. Acids
Res. 36, D773-779 (2008).
11. Eis, P.S. et al. Accumulation of miR-155 and BIC RNA in human B cell lymphomas.
Proc. Natl. Acad. Sci. U S A 102, 3627-32 (2005).
12. Shiraki, T. et al. Cap analysis gene expression for high-throughput analysis of
transcriptional starting point and identification of promoter usage. Proc. Natl. Acad.
Sci. U S A 100, 15776-81 (2003).
13. de Hoon, M. & Hayashizaki, Y. Deep cap analysis gene expression (CAGE): genome-
wide identification of promoters, quantification of their expression, and network
inference. Biotechniques 44, 627-8, 630, 632 (2008).
14. Faulkner, G.J. et al. A rescue strategy for multimapping short sequence tags refines
surveys of transcriptional activity by CAGE. Genomics 91, 281-8 (2008).
15. Lin, S.M., Du, P., Huber, W. & Kibbe, W.A. Model-based variance-stabilizing
transformation for Illumina microarray data. Nucl. Acids Res. 36, e11 (2008).
16. Smyth, G.K. Linear models and empirical bayes methods for assessing differential
expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 3, Article3
(2004).
17. Smyth, G.K., Yang, Y.H. & Speed, T. Statistical issues in cDNA microarray data
analysis. Methods Mol. Biol. 224, 111-36 (2003).
18. Glazov, E.A. et al. A microRNA catalog of the developing chicken embryo identified
by a deep sequencing approach. Genome Res. 18, 957-64 (2008).
19. Griffiths-Jones, S., Saini, H.K., van Dongen, S. & Enright, A.J. miRBase: tools for
microRNA genomics. Nucl. Acids Res. 36, D154-8 (2008).
20. Chung, W.J., Okamura, K., Martin, R. & Lai, E.C. Endogenous RNA interference
provides a somatic defense against Drosophila transposons. Curr. Biol. 18, 795-802
(2008).
21. Castillo-Davis, C.I. & Hartl, D.L. GeneMerge--post-genomic analysis, data mining,
and hypothesis testing. Bioinformatics 19, 891-2 (2003).
Biological Process Molecular Function Cellular CompartmentGO Term Fold pValue Description GO Term Fold pValue Description GO Term Fold pValue Description
Human refGenesGO:0006412 3.23 6.87E-10 translation GO:0003735 3.40 7.77E-09 structural constituent of ribosome GO:0005842 5.31 2.59E-05 cytosolic large ribosomal subunit
GO:0004842 2.83 3.56E-03 ubiquitin-protein ligase activity GO:0005843 4.50 9.05E-03 cytosolic small ribosomal subunit GO:0005515 1.33 1.49E-05 protein binding GO:0005840 3.76 3.76E-08 ribosome
GO:0005737 1.57 4.70E-06 cytoplasmGO:0005634 1.36 1.89E-07 nucleus
(GSM286604 & GSM286613)0 -1h Embryo : expressed > 0.5 median ratioGO:0006413 4.21 1.463E-03 translational initiation GO:0003743 4.21 7.049E-04 translation initiation factor activity GO:0005842 4.66 8.558E-05 cytosolic large ribosomal subunit GO:0006412 3.38 2.754E-07 translation GO:0003735 3.23 5.638E-05 structural constituent of ribosome GO:0005829 3.42 2.959E-03 cytosol
0 -1h Embryo : expressed < 0.5 median ratioNA NA GO:0016459 9.29 6.720E-04 myosin complex
(GSM286605 & GSM286606)2 -6h Embryo : expressed > 0.5 median ratioGO:0006412 3.70 1.863E-05 translation GO:0003735 4.03 1.545E-05 structural constituent of ribosome GO:0005843 7.52 5.393E-04 cytosolic small ribosomal subunit
GO:0003676 2.31 9.321E-06 nucleic acid binding GO:0005842 5.65 1.855E-04 cytosolic large ribosomal subunit 2 -6h Embryo : expressed < 0.5 median ratioGO:0035152 15.61 8.538E-03 regulation of tracheal tube architecture NA NA
(GSM286607 & GSM286611)6 -10h Embryo : expressed > 0.5 median ratioGO:0016360 9.76 5.457E-03 sensory organ precursor cell fate determination GO:0003735 4.47 3.735E-08 structural constituent of ribosome GO:0005842 6.68 2.003E-07 cytosolic large ribosomal subunit GO:0007424 4.09 2.021E-03 tracheal system development (sensu Insecta) GO:0003700 2.54 7.826E-04 transcription factor activity GO:0005634 2.09 2.355E-08 nucleusGO:0006412 3.68 4.142E-06 translation GO:0003676 2.29 2.628E-06 nucleic acid binding
6 - 10h Embryo : expressed < 0.5 median ratioNA NA NA
Genes with tiRNAs in either replicate at all embryonic time pointsGO:0015992 13.66 2.821E-05 proton transport GO:0008553 14.64 5.149E-08 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0000276 19.52 5.064E-04 proton-transporting ATP synthase complex, coupling factor F(o) GO:0015986 9.76 2.101E-05 ATP synthesis coupled proton transport GO:0046961 9.76 8.702E-06 hydrogen ion transporting ATPase activity, rotational mechanism GO:0005843 11.71 2.231E-04 cytosolic small ribosomal subunit GO:0006412 5.44 1.278E-06 translation GO:0046933 9.76 8.702E-06 hydrogen ion transporting ATP synthase activity, rotational mechanismGO:0005842 9.25 8.901E-06 cytosolic large ribosomal subunit
GO:0003735 6.91 7.509E-09 structural constituent of ribosomeGO:0003676 2.63 2.509E-04 nucleic acid binding
GSM286604: 0 -1h EmbryoGO:0015992 5.45 1.711E-09 proton transport GO:0004129 5.21 2.188E-03 cytochrome-c oxidase activity GO:0005842 6.73 1.253E-21 cytosolic large ribosomal subunit GO:0006099 3.44 2.545E-03 tricarboxylic acid cycle GO:0008553 5.05 1.110E-10 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0005843 6.44 1.339E-14 cytosolic small ribosomal subunit GO:0015986 3.40 5.935E-05 ATP synthesis coupled proton transport GO:0003735 3.78 7.326E-23 structural constituent of ribosome GO:0005747 5.28 1.094E-10 respiratory chain complex I GO:0006412 1.94 1.758E-06 translation GO:0046961 3.30 1.268E-04 hydrogen ion transporting ATPase activity, rotational mechanism GO:0005751 5.24 2.713E-03 respiratory chain complex IV GO:0006118 1.88 7.402E-03 electron transport GO:0046933 3.30 1.268E-04 hydrogen ion transporting ATP synthase activity, rotational mechanismGO:0005840 4.42 1.544E-10 ribosome
GO:0003779 2.32 4.374E-03 actin binding GO:0005759 3.67 2.632E-05 mitochondrial matrixGO:0003676 1.81 5.218E-08 nucleic acid binding GO:0005739 1.95 7.426E-03 mitochondrion
GO:0005737 1.64 3.013E-03 cytoplasm
GSM286613: 0 -1h Embryo (replicate)GO:0016477 6.31 8.006E-03 cell migration GO:0016538 6.66 6.152E-04 cyclin-dependent protein kinase regulator activity GO:0005830 11.84 8.873E-04 cytosolic ribosome GO:0001708 5.92 1.262E-03 cell fate specification GO:0003743 3.95 4.368E-05 translation initiation factor activity GO:0005842 6.26 1.568E-14 cytosolic large ribosomal subunit GO:0007431 5.15 6.173E-03 salivary gland development GO:0003735 3.91 6.685E-20 structural constituent of ribosome GO:0005843 5.92 1.633E-09 cytosolic small ribosomal subunit GO:0007409 4.31 6.990E-03 axonogenesis GO:0003676 2.07 2.062E-11 nucleic acid binding GO:0005840 4.19 2.492E-07 ribosomeGO:0006413 4.10 9.464E-05 translational initiation GO:0005515 1.84 8.727E-06 protein binding GO:0005737 1.77 5.726E-04 cytoplasmGO:0007411 3.29 1.084E-03 axon guidance GO:0005634 1.56 6.082E-06 nucleusGO:0007422 3.00 8.728E-03 peripheral nervous system developmentGO:0009993 2.55 3.437E-04 oogenesis (sensu Insecta)GO:0006412 2.01 5.656E-06 translation
GSM286605: 2 - 6h EmbryoGO:0007400 4.91 8.542E-04 neuroblast fate determination GO:0016538 5.88 3.905E-04 cyclin-dependent protein kinase regulator activity GO:0005843 5.17 1.367E-09 cytosolic small ribosomal subunit GO:0007419 4.37 1.647E-03 ventral cord development GO:0016564 4.15 3.038E-04 transcriptional repressor activity GO:0005751 4.98 4.444E-03 respiratory chain complex IV GO:0007456 4.16 1.771E-08 eye development (sensu Endopterygota) GO:0008553 3.13 5.578E-03 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0005842 4.26 3.215E-08 cytosolic large ribosomal subunit GO:0007219 4.03 9.349E-04 Notch signaling pathway GO:0003735 3.05 5.258E-14 structural constituent of ribosome GO:0005840 3.18 1.054E-04 ribosomeGO:0015992 3.76 1.250E-03 proton transport GO:0003779 2.35 1.447E-03 actin binding GO:0005886 1.91 3.269E-04 plasma membraneGO:0007411 3.53 1.644E-06 axon guidance GO:0003700 1.88 9.340E-06 transcription factor activity GO:0005737 1.83 3.656E-06 cytoplasmGO:0007391 3.34 6.516E-06 dorsal closure GO:0003676 1.86 7.547E-10 nucleic acid binding GO:0005634 1.82 1.708E-16 nucleusGO:0006917 3.08 4.490E-03 induction of apoptosis GO:0005515 1.84 9.274E-08 protein binding
Drosophila library GSE11624 enrichments: Drosophila Refgenes with tiRNAs from Chung et al. were queried for Gene Ontology enrichments based on their library/tissue/developmental time point of origin. The results are generally consistent with tiRNAassociation with highly expressed genes and/or genes expected to be highly expressed in a particular library. For example, tiRNAs are heavily associated with developmental and CNS genes in 6- 10 h Drosophila embryos.
HUMAN: Human Refgenes with tiRNAs derived from THP-1 cells were queried for Gene Ontology enrichments. Results show enrichemnt for ribosom components and translation, consistent with tiRNA association with highly expressed transcripts
Drosophila Embryo Expression: We analyzed highly and weakly expressed genes (from Arbeitman et al.) with tiRNAs (from Chung et al.)for Gene Ontology enrichments across three developmental time points. Highly expressed genes with tiRNA are consistently associated with the ribosome and translational machinery. We observed no consistent enrichment for weakly expressed genes with tiRNAs.
Supplementary Table 1 Taft RJ et al - tiRNAs
28
Nature Genetics: doi:10.1038/ng.312
GO:0007417 3.04 1.848E-03 central nervous system development GO:0005524 1.57 1.439E-04 ATP bindingGO:0007422 3.01 4.236E-04 peripheral nervous system development GO:0008270 1.57 7.721E-05 zinc ion bindingGO:0007424 2.83 8.847E-04 tracheal system development (sensu Insecta)GO:0009993 2.34 4.108E-04 oogenesis (sensu Insecta)GO:0007010 2.17 5.125E-04 cytoskeleton organization and biogenesisGO:0045449 2.14 4.089E-03 regulation of transcriptionGO:0007398 2.04 9.985E-04 ectoderm development
GSM286606: 2 - 6h Embryo (replicate)NA GO:0016566 18.86 6.144E-05 specific transcriptional repressor activity GO:0005843 9.20 3.903E-05 cytosolic small ribosomal subunit
GO:0016538 12.77 5.998E-03 cyclin-dependent GO:0005842 7.71 5.090E-05 cytosolic large ribosomal subunit GO:0003735 4.42 2.429E-06 structural constituent of ribosome GO:0005840 5.66 2.662E-03 ribosomeGO:0003700 2.90 9.637E-05 transcription factor activity GO:0005634 2.28 1.209E-07 nucleusGO:0003676 2.61 3.507E-06 nucleic acid binding
GSM286607: 6 - 10h EmbryoGO:0016318 6.80 7.371E-03 ommatidial rotation GO:0016566 9.15 1.184E-03 specific transcriptional repressor activity GO:0005842 7.06 1.030E-11 cytosolic large ribosomal subunit GO:0007219 6.80 3.099E-06 Notch signaling pathway GO:0003735 3.94 3.707E-13 structural constituent of ribosome GO:0005843 5.10 3.715E-04 cytosolic small ribosomal subunit GO:0001708 6.80 7.371E-03 cell fate specification GO:0003676 2.17 6.370E-09 nucleic acid binding GO:0005840 4.18 1.474E-04 ribosomeGO:0007422 6.12 3.947E-12 peripheral nervous system development GO:0005515 1.99 3.562E-05 protein binding GO:0005886 2.46 2.134E-05 plasma membraneGO:0007423 5.00 1.069E-04 sensory organ development GO:0005737 2.01 1.715E-04 cytoplasmGO:0007456 4.74 3.805E-05 eye development (sensu Endopterygota) GO:0005634 2.01 1.918E-12 nucleusGO:0007411 4.49 1.661E-05 axon guidanceGO:0007507 4.25 1.083E-03 heart developmentGO:0007417 4.18 6.064E-04 central nervous system developmentGO:0007391 3.58 5.509E-03 dorsal closureGO:0007424 3.48 4.249E-03 tracheal system development (sensu Insecta)GO:0007399 2.73 3.846E-04 nervous system developmentGO:0007398 2.54 5.689E-04 ectoderm developmentGO:0006412 1.96 3.545E-03 translation
GSM286611: 6 - 10h Embryo (replicate)GO:0046667 11.52 4.656E-03 retinal cell programmed cell death (sensu Endopterygota) GO:0016566 7.97 6.412E-05 specific transcriptional repressor activity GO:0005830 11.52 1.049E-03 cytosolic ribosome GO:0030723 9.87 2.611E-03 ovarian fusome organization and biogenesis GO:0016564 4.74 1.545E-04 transcriptional repressor activity GO:0005842 5.87 3.859E-13 cytosolic large ribosomal subunit GO:0008356 7.04 3.460E-05 asymmetric cell division GO:0008553 4.06 5.502E-05 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0005843 5.47 2.867E-08 cytosolic small ribosomal subunit GO:0007219 6.25 6.008E-09 Notch signaling pathway GO:0003735 3.62 4.440E-17 structural constituent of ribosome GO:0005840 4.61 1.539E-09 ribosomeGO:0006414 5.76 6.117E-03 translational elongation GO:0046961 3.14 3.838E-03 hydrogen ion transporting ATPase activity, rotational mechanism GO:0005886 2.59 3.927E-10 plasma membraneGO:0007400 5.51 9.765E-04 neuroblast fate determination GO:0046933 3.14 3.838E-03 hydrogen ion transporting ATP synthase activity, rotational mechanismGO:0005737 2.08 2.654E-08 cytoplasmGO:0007419 5.35 1.545E-04 ventral cord development GO:0003779 2.43 5.277E-03 actin binding GO:0005634 1.87 8.912E-15 nucleusGO:0016481 5.12 8.392E-04 negative regulation of transcription GO:0005515 2.32 8.235E-15 protein bindingGO:0007431 5.01 8.914E-03 salivary gland development GO:0003676 2.10 1.606E-12 nucleic acid bindingGO:0001736 5.01 8.914E-03 establishment of planar polarity GO:0003700 2.10 4.277E-07 transcription factor activityGO:0007409 4.89 2.071E-04 axonogenesis GO:0005524 1.65 1.015E-04 ATP bindingGO:0030707 4.55 3.683E-05 ovarian follicle cell development (sensu Insecta) GO:0008270 1.51 9.295E-03 zinc ion bindingGO:0007411 4.48 2.423E-09 axon guidanceGO:0007422 4.45 1.170E-09 peripheral nervous system developmentGO:0015992 4.32 5.267E-04 proton transportGO:0008340 4.06 1.154E-04 determination of adult life spanGO:0007456 3.96 
1.716E-05 eye development (sensu Endopterygota)GO:0008355 3.93 4.592E-03 olfactory learningGO:0007611 3.93 4.592E-03 learning andGO:0007417 3.72 6.102E-05 central nervous system developmentGO:0007424 3.61 3.543E-06 tracheal system development (sensu Insecta)GO:0015986 3.27 2.405E-03 ATP synthesis coupled proton transportGO:0007476 3.22 3.068E-03 wing morphogenesisGO:0016337 2.91 3.284E-03 cell-cell adhesionGO:0009993 2.56 2.352E-04 oogenesis (sensu Insecta)GO:0007399 2.53 2.105E-05 nervous system developmentGO:0007155 2.44 2.026E-05 cell adhesionGO:0007398 2.40 1.648E-05 ectoderm developmentGO:0007498 2.19 9.839E-03 mesoderm developmentGO:0006412 1.82 9.594E-04 translation
GSM240749: Female headsGO:0015992 7.89 3.944E-05 proton transport GO:0008553 6.19 3.032E-04 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0005859 28.70 5.719E-03 muscle myosin complexGO:0015986 4.71 9.360E-03 ATP synthesis coupled proton transport GO:0046961 4.78 4.263E-03 hydrogen ion transporting ATPase activity, rotational mechanism GO:0000275 16.40 6.376E-03 proton-transporting ATP synthase complex, catalytic core F(1)GO:0006952 2.37 5.802E-04 defense response GO:0046933 4.78 4.263E-03 hydrogen ion transporting ATP synthase activity, rotational mechanismGO:0005840 8.39 5.431E-11 ribosome
GO:0003735 4.29 8.673E-09 structural constituent of ribosome GO:0005842 7.04 3.059E-06 cytosolic large ribosomal subunit GO:0020037 3.78 2.622E-03 heme binding GO:0005843 6.46 9.996E-04 cytosolic small ribosomal subunit
GSM286601: Male headsNA GO:0004022 23.80 2.805E-03 alcohol dehydrogenase activity GO:0005576 2.74 5.931E-03 extracellular region
GSM272651: S2 and KC cellsGO:0045034 6.76 3.676E-04 neuroblast division GO:0003735 3.07 6.037E-09 structural constituent of ribosome GO:0005842 7.65 1.983E-18 cytosolic large ribosomal subunit GO:0001736 5.88 1.853E-03 establishment of planar polarity GO:0003779 2.64 2.908E-03 actin binding GO:0005840 4.37 7.972E-07 ribosomeGO:0007422 3.97 1.040E-05 peripheral nervous system development GO:0005515 2.01 4.549E-07 protein binding GO:0005783 2.88 2.905E-03 endoplasmic reticulumGO:0007391 3.56 3.404E-04 dorsal closure GO:0005737 2.11 3.196E-07 cytoplasmGO:0007476 3.38 5.340E-03 wing morphogenesis GO:0005634 1.80 2.803E-10 nucleusGO:0008360 3.22 1.898E-03 regulation of cell shapeGO:0000910 3.12 9.633E-03 cytokinesis
Taft RJ et al - tiRNAs
29
Nature Genetics: doi:10.1038/ng.312
GO:0009993 2.83 5.305E-05 oogenesis (sensu Insecta)GO:0007010 2.27 8.715E-03 cytoskeleton organization and biogenesis
GSM272652: S2 cellsGO:0007430 5.50 4.057E-03 terminal branching of trachea, cytoplasmic projection extension (sensu Insecta) GO:0003735 2.90 4.226E-21 structural constituent of ribosome GO:0005842 4.67 4.393E-19 cytosolic large ribosomal subunit GO:0046843 4.02 1.430E-03 dorsal appendage formation GO:0003779 2.56 5.162E-09 actin binding GO:0005843 4.33 1.269E-11 cytosolic small ribosomal subunit GO:0007411 3.61 5.010E-13 axon guidance GO:0005525 2.19 9.133E-07 GTP binding GO:0005840 3.43 1.370E-10 ribosomeGO:0007298 3.56 8.010E-05 border follicle cell migration (sensu Insecta) GO:0003924 2.17 4.339E-05 GTPase activity GO:0005700 3.00 2.235E-03 polytene chromosomeGO:0007409 3.37 5.399E-04 axonogenesis GO:0000166 2.01 1.507E-04 nucleotide binding GO:0005783 2.50 4.307E-06 endoplasmic reticulumGO:0007422 3.22 9.004E-10 peripheral nervous system development GO:0004702 1.92 2.910E-03 receptor signaling protein serine GO:0005737 2.08 4.462E-18 cytoplasmGO:0008340 3.15 1.026E-05 determination of adult life span GO:0004674 1.87 8.431E-03 protein serine GO:0005622 1.79 1.309E-03 intracellularGO:0007391 3.01 4.413E-08 dorsal closure GO:0003676 1.78 1.488E-13 nucleic acid binding GO:0005886 1.79 1.378E-05 plasma membraneGO:0007015 2.79 1.229E-03 actin filament organization GO:0005515 1.73 4.859E-10 protein binding GO:0005634 1.65 1.255E-17 nucleusGO:0006897 2.75 9.997E-04 endocytosis GO:0005524 1.62 1.323E-09 ATP bindingGO:0008360 2.65 6.541E-06 regulation of cell shape GO:0008270 1.39 1.488E-03 zinc ion bindingGO:0007476 2.64 2.219E-04 wing morphogenesisGO:0009993 2.63 8.831E-12 oogenesis (sensu Insecta)GO:0007264 2.42 2.615E-04 small GTPase mediated signal transductionGO:0000910 2.38 1.965E-03 cytokinesisGO:0006457 2.13 6.587E-04 protein foldingGO:0007010 1.98 4.989E-05 cytoskeleton organization and biogenesisGO:0006468 1.84 4.525E-06 protein amino acid phosphorylationGO:0006886 1.82 4.118E-06 intracellular protein transportGO:0006412 1.55 1.649E-03 translation
GSM272653: KC cellsGO:0045034 5.03 4.180E-04 neuroblast division GO:0003735 2.77 4.026E-12 structural constituent of ribosome GO:0005853 8.38 6.131E-03 eukaryotic translation elongation factor 1 complexGO:0007298 4.06 2.603E-04 border follicle cell migration (sensu Insecta) GO:0003779 2.42 9.892E-05 actin binding GO:0005842 4.74 1.586E-12 cytosolic large ribosomal subunit GO:0008355 3.27 9.022E-03 olfactory learning GO:0000166 2.31 3.541E-05 nucleotide binding GO:0005843 3.77 4.744E-05 cytosolic small ribosomal subunit GO:0007611 3.27 9.022E-03 learning and GO:0005515 1.86 2.767E-09 protein binding GO:0005840 3.48 3.771E-07 ribosomeGO:0007411 3.26 4.580E-06 axon guidance GO:0005524 1.75 1.216E-09 ATP binding GO:0005783 2.50 6.997E-04 endoplasmic reticulumGO:0007015 3.12 2.984E-03 actin filament organization GO:0005200 1.75 4.910E-03 structural constituent of cytoskeleton GO:0005737 2.10 1.675E-12 cytoplasmGO:0007422 3.02 5.892E-05 peripheral nervous system development GO:0003676 1.63 6.660E-06 nucleic acid binding GO:0005886 2.00 4.395E-06 plasma membraneGO:0007391 2.98 8.109E-05 dorsal closure GO:0005634 1.47 4.740E-06 nucleusGO:0008360 2.79 2.202E-04 regulation of cell shapeGO:0007424 2.63 2.267E-03 tracheal system development (sensu Insecta)GO:0009993 2.52 1.454E-06 oogenesis (sensu Insecta)GO:0007010 2.41 2.257E-07 cytoskeleton organization and biogenesisGO:0000074 2.21 1.156E-03 regulation of progression through cell cycleGO:0007155 1.97 1.928E-03 cell adhesionGO:0006468 1.80 2.690E-03 protein amino acid phosphorylation
GSM275691: Imaginal discGO:0042049 8.35 5.998E-03 cell acyl-CoA homeostasis GO:0050809 9.74 5.649E-04 diazepam binding GO:0000221 5.56 4.401E-03 hydrogen ion transporting ATPase V1 domainGO:0015992 4.14 1.027E-04 proton transport GO:0004263 5.08 2.939E-04 chymotrypsin activity GO:0005842 5.14 2.493E-12 cytosolic large ribosomal subunit GO:0015986 3.78 5.431E-07 ATP synthesis coupled proton transport GO:0008553 4.96 1.534E-10 hydrogen-exporting ATPase activity, phosphorylative mechanism GO:0005843 5.11 6.247E-09 cytosolic small ribosomal subunit
GO:0046961 3.84 2.214E-07 hydrogen ion transporting ATPase activity, rotational mechanism GO:0005840 3.75 3.796E-07 ribosomeGO:0046933 3.84 2.214E-07 hydrogen ion transporting ATP synthase activity, rotational mechanismGO:0003735 3.06 1.687E-13 structural constituent of ribosomeGO:0003779 2.66 1.980E-05 actin bindingGO:0005200 1.79 6.974E-03 structural constituent of cytoskeleton
GSM286602: Male body
GO:0005977  15.22  2.021E-04  glycogen metabolic process
GO:0004129  8.54  5.235E-04  cytochrome-c oxidase activity
GO:0000275  14.50  8.035E-04  proton-transporting ATP synthase complex, catalytic core F(1)
GO:0006123  10.15  9.725E-04  mitochondrial electron transport, cytochrome c to oxygen
GO:0008553  7.96  3.394E-11  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0000276  13.53  1.504E-04  proton-transporting ATP synthase complex, coupling factor F(o)
GO:0015992  9.13  5.463E-11  proton transport
GO:0046961  5.23  4.321E-06  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0005747  10.82  5.964E-18  respiratory chain complex I
GO:0015986  5.45  1.285E-06  ATP synthesis coupled proton transport
GO:0046933  5.23  4.321E-06  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0005751  9.55  7.936E-05  respiratory chain complex IV
GO:0006099  4.97  1.797E-03  tricarboxylic acid cycle
GO:0003735  4.08  1.247E-11  structural constituent of ribosome
GO:0005842  7.66  3.481E-11  cytosolic large ribosomal subunit
GO:0006936  4.40  4.878E-05  muscle contraction
GO:0005843  7.10  4.482E-07  cytosolic small ribosomal subunit
GO:0006412  2.01  7.285E-03  translation
GO:0005840  5.31  1.450E-06  ribosome
GO:0005743  3.69  2.976E-03  mitochondrial inner membrane
GSM286603: Female body
GO:0005977  10.47  2.310E-03  glycogen metabolic process
GO:0045735  11.63  4.678E-03  nutrient reservoir activity
GO:0000275  9.97  6.258E-03  proton-transporting ATP synthase complex, catalytic core F(1)
GO:0006123  7.98  9.644E-04  mitochondrial electron transport, cytochrome c to oxygen
GO:0042708  8.37  9.656E-03  elastase activity
GO:0000276  9.30  1.671E-03  proton-transporting ATP synthase complex, coupling factor F(o)
GO:0015992  7.68  1.209E-12  proton transport
GO:0004263  6.68  6.514E-05  chymotrypsin activity
GO:0000221  7.98  2.476E-04  hydrogen ion transporting ATPase V1 domain
GO:0006119  5.71  7.140E-03  oxidative phosphorylation
GO:0004129  6.61  1.020E-03  cytochrome-c oxidase activity
GO:0005842  7.64  1.156E-17  cytosolic large ribosomal subunit
GO:0006099  5.41  2.531E-07  tricarboxylic acid cycle
GO:0008553  6.57  4.066E-12  hydrogen-exporting ATPase activity, phosphorylative mechanism
GO:0005751  7.39  1.244E-04  respiratory chain complex IV
GO:0015986  4.58  4.442E-07  ATP synthesis coupled proton transport
GO:0003735  4.60  7.960E-24  structural constituent of ribosome
GO:0005843  7.33  4.775E-12  cytosolic small ribosomal subunit
GO:0007010  2.34  3.792E-03  cytoskeleton organization and biogenesis
GO:0046961  4.44  1.317E-06  hydrogen ion transporting ATPase activity, rotational mechanism
GO:0005747  7.13  5.986E-13  respiratory chain complex I
GO:0006412  2.27  5.962E-08  translation
GO:0046933  4.44  1.317E-06  hydrogen ion transporting ATP synthase activity, rotational mechanism
GO:0005840  5.37  1.328E-10  ribosome
GO:0006508  1.70  5.976E-04  proteolysis
GO:0003676  1.96  1.719E-07  nucleic acid binding
GO:0005737  1.87  3.211E-04  cytoplasm
Supplementary Table 2
Primer Name: Primer sequence (RNA bases are upper case)
5’ adaptor: 5’-acgctcacagaattcAAA-3’
3’ adaptor: 5’-phosphate-UXXxxgaattctcacgaggccagcgt-biotin-3’
3’ RT-PCR primer (Primer 1): 5’-biotin-gcacgctggcctcgtgagaattc-3’
5’ PCR primer (Primer 2): 5’-biotin-cagccgacgctcacagaattcaaa-3’