Chapter · September 2008 · DOI: 10.1007/978-1-4020-8533-8_1
Chapter 1
Structures of MicroRNA Precursors
Piotr Kozlowski**, Julia Starega-Roslan**, Marta Legacz,
Marcin Magnus, and Wlodzimierz J. Krzyzosiak*
Abstract MicroRNAs are single-stranded regulatory RNAs of 18–25 nucleotide
length generated from endogenous transcripts that form local hairpin structures.
The processing of microRNA transcripts involves the activities of two RNase III
enzymes, Drosha and Dicer. In this study we analyzed structural features of human
microRNA precursors that make these transcripts Drosha and Dicer substrates.
The structures of minimal functional primary precursors (pri-microRNAs) and
secondary precursors (pre-microRNAs) were predicted. The frequency, nucleotide
sequence content and localization of various structure-destabilizing motifs were
analyzed. We identified numerous pri-microRNAs whose structures depart strongly
from the consensus structure and whose processing is hard to explain by the existing
model of the Microprocessor complex. We also found a biased distribution of symmetric
and asymmetric motifs along the pre-microRNA hairpin stem and an overrepresentation
of bulges on its 5′ arm (p < 0.000001), which may have considerable
functional implications.
Keywords miRNA biogenesis, Dicer, Drosha, RNA structure prediction, pri-miRNAs,
pre-miRNA structural motifs
1.1 Introduction
MicroRNAs (miRNAs) are a family of short single-stranded noncoding RNAs
identified in many eukaryotes from simple organisms to humans [1, 8]. It is
anticipated that hundreds of miRNAs regulate the expression of thousands of
human genes [20]. MiRNAs regulate gene expression at the posttranscriptional
level by programming the RNA-induced silencing complex (miRISC), which
interacts with complementary sequences in mRNAs, causing their translational
inhibition or cleavage [25, 33]. Specific miRNAs have been shown to be
engaged in a variety of processes such as development, cell proliferation,
differentiation and apoptosis [12]. The detailed cellular functions of the majority of
miRNAs remain unknown.

S.-Y. Ying (ed.) Current Perspectives in microRNAs (miRNA)
© Springer Science + Business Media B.V. 2008
Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Sciences,
Noskowskiego 12/14, 61-704 Poznan, Poland
* Corresponding author: E-mail: [email protected]
** These authors contributed equally to this work.
The primary transcripts of miRNA genes (pri-miRNAs) are generated by either
RNA polymerase II [19] or RNA polymerase III [4]. The pri-miRNAs, which harbor
a long stem-loop structure, are processed in the nucleus to shorter, approximately
60-nt hairpin precursors (pre-miRNAs) (Fig. 1.1A). The nuclear processing
enzyme is the ribonuclease Drosha [18], which acts together with the DGCR8 protein
[17] within the Microprocessor complex [5, 7]. Drosha, an RNase III enzyme, usually
leaves a 2-nt overhang at the 3′ end of the pre-miRNA and defines one end of the
mature miRNA [2]. The pre-miRNAs are then exported to the cytoplasm by
Exportin-5 [22] and further processed to miRNA duplexes by another RNase III
enzyme, Dicer [3, 26, 34], which defines the other miRNA end. Thus, the two RNA
processing steps reduce the stem of the primary precursor hairpin to its internal
portion, which is usually an imperfect duplex containing the functional miRNA
strand (Fig. 1.1B). Structural imperfections within this duplex allow its non-miRNA
strand to be later discarded from the miRISC by a “bypass” rather than a cleavage
mechanism [6].
It is clear that the structures of miRNA precursors are instrumental for their
proper recognition and correct cleavage by the processing complexes containing
Drosha and Dicer. Therefore, to analyze the structural aspects of both the nuclear
and cytoplasmic steps of miRNA biogenesis, the structures of the primary and secondary
miRNA precursors need to be established. This can be done either experimentally
or by computational structure prediction. The latter approach gives reliable
structures of miRNA precursors, which were confirmed by experimental analysis in most
of the investigated cases [13]. The computational approach is also much faster and thus
better suited for large-scale structure analysis. In order to predict the secondary
structures of miRNA precursors, their nucleotide sequences need to be known first.
These sequences, however, are not available in the existing miRNA databases. For
the purpose of this study, the sequences of pre-miRNAs were reconstructed from the
sequences of mature miRNAs as described earlier [14]. To analyze the pri-miRNA
sequences, the concept of a “minimal” functional precursor was adopted [9]. The
pri-miRNA precursors may be very long and as such are not amenable to
detailed structure analysis. Therefore, we analyzed minimal pri-miRNAs, which
were considered to be the shortest fragments of primary precursors that contain all
sequence and structure elements required to be functional substrates for the
Microprocessor. Our approach differs from that described earlier [9] in that
several different lengths of sequences harboring the miRNA, rather than a single
length, were analyzed. Both the reconstructed sequences of pre-miRNAs and arbitrarily
selected sequences of minimal pri-miRNAs were then subjected to structure
prediction and a detailed analysis of their secondary structures. We intended to learn
more about the occurrence and localization of different types of secondary structure
motifs within the precursor hairpin. A comprehensive inventory of such motifs was
generated and analyzed in relation to the Drosha and Dicer steps of miRNA
biogenesis.

Fig. 1.1 Biogenesis of miRNA. (A) Steps of miRNA biogenesis. Proteins involved in substrate
recognition and precursor processing are shown. (B) Schemes of the Microprocessor complex and
RNase Dicer interacting with their substrates, pri-miRNA and pre-miRNA, respectively. Arrows
indicate Drosha and Dicer cleavage sites. RIIIa and RIIIb indicate the RNase domains responsible
for cleavage. dsRBD denotes the double-stranded RNA binding domain. The fragment which
corresponds to miRNA is marked in gray. (C) Scheme of a minimal pri-miRNA precursor. The SD
(single-stranded–double-stranded) and SL (stem-loop) junctions, internal distances and analyzed
region are indicated. SD is postulated to be the DGCR8 protein binding site
1.2 Materials and Methods
1.2.1 Analysis of Pri-miRNA Structures
The sequences of minimal pri-miRNAs in four length variants (110, 130, 150
and 300 nt) were cut from longer sequences retrieved from GenBank for all
461 miRNAs deposited in miRBase version 8.2 [8]. To obtain a
minimal pri-miRNA sequence of the selected length, natural flanking sequences
of equal length were added to each end of the pre-miRNA. These
1,844 sequences were subjected to secondary structure prediction by free energy
minimization using the Mfold program [36]. The suboptimality parameter was set
at 5%, which means that all structures whose free energy of formation (∆G) was
within 5% of that of the lowest energy structure were also reported. For further
analysis, only the lowest energy structures were taken.
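The window construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `minimal_pri_mirna` and `all_variants`, the 0-based half-open coordinates, and the choice to give the 3′ flank one extra nucleotide when the flanks cannot be exactly equal are all assumptions.

```python
from typing import Dict

# Length variants of minimal pri-miRNAs used in the study (nt).
LENGTH_VARIANTS = (110, 130, 150, 300)


def minimal_pri_mirna(genomic: str, pre_start: int, pre_end: int, total_len: int) -> str:
    """Cut a window of `total_len` nt centred on the pre-miRNA (hypothetical helper).

    `genomic` is a 5'->3' sequence containing the pre-miRNA at the 0-based,
    half-open interval [pre_start, pre_end). Natural flanks of equal length are
    kept on both sides; if the difference is odd, the 3' flank gets the extra
    nucleotide (an assumption, not stated in the source).
    """
    pre_len = pre_end - pre_start
    extra = total_len - pre_len
    if extra < 0:
        raise ValueError("total_len is shorter than the pre-miRNA")
    left = extra // 2
    right = extra - left
    start, end = pre_start - left, pre_end + right
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence in the genomic record")
    return genomic[start:end]


def all_variants(genomic: str, pre_start: int, pre_end: int) -> Dict[int, str]:
    """Return the four length variants for one precursor."""
    return {n: minimal_pri_mirna(genomic, pre_start, pre_end, n) for n in LENGTH_VARIANTS}
```

With 461 precursors and four variants each, this kind of procedure yields 461 × 4 = 1,844 sequences for folding, the number given in the text.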
The critical parameter analyzed in the predicted structures was the length of the
base-paired region in which only minor structure-distorting motifs were allowed.
Only those minimal pri-miRNAs in which the analyzed fragment had the same structure
in at least the 150 and 300 nt length variants were taken for further analysis.
There were 246 such precursors, and among them were 180 pri-miRNAs in which the
fragment of interest had the same structure in all four length variants.
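The stability criterion above can be expressed as a simple filter. In this sketch each precursor is represented by a hypothetical dictionary mapping window length to a string that encodes the predicted structure of the analyzed fragment (for example, its dot-bracket substring); how that fragment is extracted from the Mfold output is not shown and is left as an assumption.

```python
from typing import Dict, List, Optional, Tuple

# structs[name][length] -> structure string of the analyzed fragment, or None.
Structures = Dict[str, Dict[int, Optional[str]]]


def select_stable(structs: Structures) -> Tuple[List[str], List[str]]:
    """Return (stable in the 150 and 300 nt variants, stable in all four variants)."""
    stable_150_300: List[str] = []
    stable_all: List[str] = []
    for name, by_len in structs.items():
        s150, s300 = by_len.get(150), by_len.get(300)
        # The minimal requirement: identical fragment structure at 150 and 300 nt.
        if s150 is None or s150 != s300:
            continue
        stable_150_300.append(name)
        # Stricter subset: identical in every length variant.
        variants = [by_len.get(n) for n in (110, 130, 150, 300)]
        if all(v == s150 for v in variants):
            stable_all.append(name)
    return stable_150_300, stable_all
```

Applied to the real predictions, a filter of this kind corresponds to the 246 precursors retained and the subset of 180 stable in all four variants.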
1.2.2 Analysis of Pre-miRNA Structures
Prior to the structure analysis of pre-miRNAs, their nucleotide sequences were
reconstructed following the rules described earlier [14]. Briefly, the terminus of one
arm of the precursor hairpin was defined by one end of the miRNA sequence, and the
terminus of the other arm was defined either by the miRNA* end or by assuming the
existence of a 2-nt 3′ overhang at the Drosha cleavage site. The secondary structures
of the pre-miRNAs were predicted using Mfold as described above for pri-miRNAs.
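The overhang rule can be illustrated on a folded pri-miRNA given in dot-bracket notation. This is a simplified sketch of the idea, not the exact procedure of [14]: the function names, the inclusive 0-based miRNA coordinates, and the decision to reject cases where the relevant nucleotide is unpaired are assumptions. The geometry is that the 3′-arm terminus protrudes two nucleotides past the base pair formed at the opposite terminus.

```python
from typing import Dict, List


def pair_table(dot_bracket: str) -> Dict[int, int]:
    """Map each paired position (0-based) to its partner; '.' positions are absent."""
    stack: List[int] = []
    pairs: Dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket string")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket string")
    return pairs


def reconstruct_pre_mirna(seq: str, dot_bracket: str, mir_start: int, mir_end: int,
                          on_5p_arm: bool, overhang: int = 2) -> str:
    """Cut a pre-miRNA out of a folded pri-miRNA assuming a 3' overhang (hypothetical helper).

    mir_start/mir_end: 0-based, inclusive coordinates of the mature miRNA.
    5'-arm miRNA: its 5' end is the pre-miRNA 5' terminus; the 3' terminus lies
    `overhang` nt beyond the partner of that nucleotide.
    3'-arm miRNA: its 3' end is the pre-miRNA 3' terminus; the 5' terminus is the
    partner of the nucleotide `overhang` positions upstream of it.
    """
    pairs = pair_table(dot_bracket)
    if on_5p_arm:
        partner = pairs.get(mir_start)
        if partner is None:
            raise ValueError("miRNA 5' end is unpaired; rule not applicable")
        return seq[mir_start:partner + overhang + 1]
    partner = pairs.get(mir_end - overhang)
    if partner is None:
        raise ValueError("nucleotide opposite the Drosha site is unpaired")
    return seq[partner:mir_end + 1]
```

For a miRNA on either arm of the same toy hairpin, both branches recover the same precursor, which is the consistency one would expect from a single pair of Drosha cuts.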
All secondary structure motifs present in the lowest free energy structures were
catalogued in a format that included the number and sequence of nucleotides
in the specific motif, the motif orientation and its localization within the
precursor structure. These motifs were classified into two major groups: symmetric
internal loops (SL) and asymmetric internal loops (AL). The first group includes
both single nucleotide mismatches (SL1:1) and longer symmetric loops (SL2:2, SL3:3,
etc.). The second group includes bulges of different lengths (AL0:1, AL0:2, AL0:3, etc.)
and asymmetric internal loops ALX:Y, where both X and Y are different from 0.
Thus, each motif is denoted by two numbers separated by a colon. The number
or sequence before and after the colon denotes the unpaired nucleotides in the
5′ and 3′ arm of the precursor, respectively. For example, the single nucleotide bulge
“a” located in the 5′ arm is denoted either AL1:0 or a:0. Motif positions were
numbered starting from the terminal nucleotide of the 5′ arm of the
pre-miRNA. To unify the position numbering system for all types of structural
motifs, the position of the nucleotide directly preceding a given motif was assigned
as the motif position.
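The cataloguing scheme can be sketched for an unbranched hairpin in dot-bracket notation: walk the stem from the base towards the loop and, between consecutive base pairs, count the unpaired nucleotides on each arm. This is an illustration under stated assumptions, not the authors' software: the names are hypothetical, the input must be a well-formed single hairpin, and each motif is placed at the 5′-arm nucleotide on its base side (how the original study numbered 3′-arm motifs is not detailed here).

```python
from typing import List, NamedTuple


class Motif(NamedTuple):
    kind: str       # "SL" (symmetric) or "AL" (asymmetric, incl. bulges)
    x: int          # unpaired nt on the 5' arm
    y: int          # unpaired nt on the 3' arm
    position: int   # 1-based 5'-arm position preceding the motif (assumed convention)
    seq5: str       # unpaired 5'-arm nucleotides, 5'->3', lower case
    seq3: str       # unpaired 3'-arm nucleotides, 5'->3', lower case

    @property
    def label(self) -> str:
        return f"{self.kind}{self.x}:{self.y}"

    @property
    def seq_label(self) -> str:
        return f"{self.seq5 or '0'}:{self.seq3 or '0'}"


def stem_motifs(seq: str, dot_bracket: str) -> List[Motif]:
    """Catalogue internal loops and bulges in an unbranched hairpin stem."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    pairs.sort()  # outermost pair first: walk from the hairpin base to the loop
    motifs = []
    for (i1, j1), (i2, j2) in zip(pairs, pairs[1:]):
        x, y = i2 - i1 - 1, j1 - j2 - 1
        if x == 0 and y == 0:
            continue  # stacked base pairs, no motif
        kind = "SL" if x == y else "AL"
        motifs.append(Motif(kind, x, y, i1 + 1,
                            seq[i1 + 1:i2].lower(), seq[j2 + 1:j1].lower()))
    return motifs
```

For example, a single unpaired A on the 5′ arm is reported as `AL1:0` with `seq_label` `a:0`, matching the two equivalent notations described above.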
1.2.3 Statistical Methods
For asymmetric motifs, an equal distribution between the two pre-miRNA arms was
assumed (null hypothesis). To assess potential deviation from this distribution,
the chi-squared (χ²) test was applied using Statistica (StatSoft, Tulsa, OK) or Prism v. 4.0
(GraphPad Software, San Diego, CA). To compare the distribution of symmetric
vs. asymmetric loops in pre-miRNAs having a moderate and a high number of stem
structure-distorting motifs, the Fisher exact test was used for the 2 × 2 contingency
table analysis (programs as above). Where applicable, the Bonferroni adjustment
for multiple comparisons was used.
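The equal-split test for two arms has a closed form with one degree of freedom, so it can be sketched with the standard library alone. This is a minimal illustration, assuming no continuity correction (the source does not say whether one was applied); the function names are hypothetical and this is not the Statistica or Prism implementation.

```python
import math
from typing import Tuple


def chi2_equal_split(n_5p: int, n_3p: int) -> Tuple[float, float]:
    """Chi-squared goodness-of-fit test of H0: motifs are split 50:50 between arms.

    Returns (statistic, p_value) for 1 degree of freedom, using the identity
    P(chi2_1 > s) = erfc(sqrt(s / 2)). No continuity correction is applied.
    """
    total = n_5p + n_3p
    if total == 0:
        raise ValueError("no motifs to test")
    expected = total / 2
    stat = ((n_5p - expected) ** 2 + (n_3p - expected) ** 2) / expected
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)
```

As a check, the orientation counts reported for c:u mismatches in Sect. 1.3.3 (36 c:u vs 58 u:c) give a statistic of 242/47 ≈ 5.15 and p ≈ 0.023, consistent with the marginal p = 0.02 stated there.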
1.3 Results
1.3.1 Pri-miRNA Structure and Drosha Step of miRNA Biogenesis
According to the recently proposed model of Microprocessor structure, the
DGCR8 protein binds to the base of the pri-miRNA hairpin stem and forms a
platform for Drosha binding and precursor cleavage [9] (Fig. 1.1B). DGCR8 anchors
to the single strand–double strand (SD) junction in the structure of the pri-miRNA. The
consensus structure of the minimal pri-miRNA was established by analyzing the
predicted structures of numerous human primary precursors, with structure
prediction performed on a 110-nt sequence for each pri-miRNA [9].
Given that human pre-miRNAs span a length range of 42–82 nt [14],
a length of 110 nt seemed insufficient for reliable prediction of the minimal
pri-miRNA structure. To minimize the risk of considering an incorrect structure,
three longer sequences harboring each miRNA were used for structure prediction in
our study in addition to the 110-nt sequence. This approach was chosen because
structures generated by RNA structure prediction programs based on
free energy minimization are more trustworthy if the same critical domains are predicted
from sequences of different lengths. The detailed analysis of the predicted
structures of minimal pri-miRNA precursors focused on the fragment located
between the pre-miRNA ends and the SD junction (Fig. 1.1C), which was proposed to
play a critical role in pri-miRNA recognition by the Microprocessor [9]. In the
consensus minimal pri-miRNA structure this region spanned 11 bp, which corresponds to
one helical turn of A-form RNA. We wanted to find out whether the consensus minimal
pri-miRNA structure is correct and to identify the most deviant structures that still
remain substrates for the Microprocessor. This analysis showed that there
are indeed pri-miRNAs whose secondary structures fit the consensus
structure ideally, e.g. pri-miR-33, but there are also precursors in which the analyzed
region is either much shorter, e.g. pri-miR-656, or much longer, e.g. pri-miR-607.
However, in the majority of the analyzed structures (62.6%) the SD junction was
located 9–13 nt below the Drosha cleavage site, which is in rough agreement with
the consensus structure proposed by Han et al. [9] and confirmed by Saetrom et al.
[28]. This region was shorter in 13% of the analyzed precursors and longer in
19.5% of them (Fig. 1.2). It is difficult to fit such precursors, especially
the extreme examples, into the presently accepted model of pri-miRNA
processing by the Microprocessor complex (Fig. 1.1C). This may suggest that either
alternative models of Microprocessor architecture need to be considered or the
precursors whose structures deviate most from the consensus are processed in an
entirely different way.
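The three-way classification used above can be sketched as a simple binning of the distance (in nt) between the Drosha cleavage site and the SD junction, with the 9–13 nt window treated as consensus. The distances here are assumed to be already measured on the predicted structures; that measurement, and the function names, are hypothetical and not shown in the source.

```python
from collections import Counter
from typing import Dict, Iterable

# Consensus window (nt) between the Drosha cleavage site and the SD junction.
CONSENSUS_MIN, CONSENSUS_MAX = 9, 13


def classify_sd_distance(distance: int) -> str:
    """Assign a pri-miRNA to a class by its Drosha-site-to-SD-junction distance."""
    if distance < CONSENSUS_MIN:
        return "shorter"
    if distance > CONSENSUS_MAX:
        return "longer"
    return "consensus"


def class_fractions(distances: Iterable[int]) -> Dict[str, float]:
    """Fraction of precursors falling into each class."""
    counts = Counter(classify_sd_distance(d) for d in distances)
    total = sum(counts.values())
    return {k: counts[k] / total for k in ("shorter", "consensus", "longer")}
```

The window is centred on the 11-bp consensus region, roughly one helical turn of A-form RNA.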
1.3.2 Pre-miRNA Structure and Dicer Step of miRNA Biogenesis
Structural insights into Dicer function have come from both biochemical studies [35]
and crystallography [23, 24]. Figure 1.1B shows the commonly accepted model of Dicer
with a single processing center [35]. Human Dicer is composed of several
functional domains: the PAZ domain, which binds with high affinity to the 3′
overhanging nucleotides of the pre-miRNA, the helicase domain, the DUF283 domain,
the dsRNA binding domain and two catalytic RNase III domains that form an
intramolecular dimer during pre-miRNA cleavage [24, 35]. In this arrangement, Dicer
functions as a molecular ruler and cleaves the pre-miRNA hairpin about two helical
turns away from the hairpin base to produce duplexes containing 18–24 nt long miRNAs.
It is intuitively expected that the length diversity of miRNAs has its source in the
structural features of pre-miRNA hairpins, which rarely contain perfectly base-paired
stems. Usually, single nucleotide mismatches, larger symmetric internal loops, bulges
and asymmetric internal loops break the regularity of the pre-miRNA double-helical stem
structure and may influence both the efficiency of Dicer binding and the specificity of
precursor cleavage. Therefore, a detailed analysis of the predicted secondary structures
of pre-miRNAs was performed to search for such structure distortions.

Fig. 1.2 Pri-miRNA stem length distribution. The length distribution of base-paired stems having
either full base complementarity or only minor disruptions (small internal loops, bulges) within
the analyzed fragment in 243 minimal pri-miRNA precursors. Above the graph, schematic
structures representing three classes of such precursors are shown. They differ in the length of the
analyzed fragment
Out of the 461 nucleotide sequences of human pre-miRNAs that were subjected
to structure prediction, nearly all (456) formed hairpins as the lowest free
energy structures. In total, 1,243 stem-destabilizing secondary structure motifs
were found in these precursor hairpins, and the occurrence of
various types of motifs in each arm of the hairpin stem is shown in Fig. 1.3. These
motifs include 631 symmetric internal loops of various sizes, including single
nucleotide mismatches (SL), and a similar number (612) of asymmetric internal loops,
including bulges (AL) (Fig. 1.3B). This means that, on average, 2.73 motifs
(0.97 mismatches, 0.42 symmetric loops of different length (2–5), 0.96 bulges and
0.38 asymmetric internal loops) occur per analyzed pre-miRNA.

Fig. 1.3 The frequency of structural motifs in stems of 456 analyzed pre-miRNAs. (A) The
chessboard-like table shows the number of occurrences of each type of structural motif
identified in the predicted structures of pre-miRNAs. These motifs are shown as pairwise
combinations of unpaired nucleotides present in the precursor 5′ and 3′ arms. E.g. symmetric
loops SL are located diagonally and bulges along the 0 column. In this and the subsequent
figures the motifs under consideration are shaded. (B) The total number of symmetric loops (SL)
including mismatches (SL1:1) and asymmetric loops (AL) including all types of bulges
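The per-precursor averages above follow from sorting each X:Y motif into one of the four classes and dividing by the number of folded precursors. A minimal sketch, assuming motifs are given as (X, Y) tuples of unpaired nucleotides on the 5′ and 3′ arms; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def motif_class(x: int, y: int) -> str:
    """Four-way classification of a stem motif X:Y (unpaired nt on 5' and 3' arms)."""
    if x == y == 0:
        raise ValueError("0:0 is a stacked pair, not a motif")
    if x == y:
        return "mismatch" if x == 1 else "symmetric loop"
    if x == 0 or y == 0:
        return "bulge"
    return "asymmetric internal loop"


def per_precursor_means(motifs: Iterable[Tuple[int, int]], n_precursors: int) -> Dict[str, float]:
    """Average number of motifs of each class per precursor, plus the overall mean."""
    counts = Counter(motif_class(x, y) for x, y in motifs)
    means = {k: v / n_precursors for k, v in counts.items()}
    means["all motifs"] = sum(counts.values()) / n_precursors
    return means
```

With the totals from the text, 1,243 motifs in 456 hairpins give 1243 / 456 ≈ 2.73 motifs per pre-miRNA.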
1.3.3 Symmetric Loops Including Single Nucleotide Mismatches in Pre-miRNA Structures
In cataloguing the structural motifs present in pre-miRNAs, we looked not
only at their type and size but also at their nucleotide sequence and orientation
within the precursor hairpin. This allowed us to count symmetric internal loops
containing different numbers of nucleotides and divide them into sequence- and
orientation-specific subgroups. Figure 1.4A shows the number of occurrences of
single nucleotide mismatches and 2–5 nt long symmetric internal loops, as well as
the number of different nucleotide combinations found in these motifs. The
frequency of symmetric loops clearly decreases with their size. Single nucleotide
mismatches are the most frequent and account for almost 2/3 of the total number of
symmetric internal loops. The largest is a 5 nt long loop, identified only once, in
hsa-mir-196a-1. As shown in Fig. 1.4A, all ten possible base combinations of single
nucleotide mismatches are represented in the pre-miRNAs, as are 61 different
combinations of 2 nt long internal loops. For 3–5 nt long internal loops, the number
of different sequence classes is almost equal to the total number of such loops.
This means that almost every symmetric internal loop formed by more than two
adjacent nucleotides has a different sequence and there is no preference for any
specific sequence within such loops.
For the single nucleotide mismatches and 2 nt long internal loops, we analyzed
their distribution between different sequence classes (Figs. 1.4B, C, respectively).
The a:c and c:u mismatches turned out to be the most frequent and the c:c mismatch
the least frequent (Fig. 1.4B). The orientation analysis of single nucleotide
mismatches shows that most of them are rather equally distributed in both
orientations, with the exception of c:u, in which u is more frequent in the
5′ arm and c in the 3′ arm (36 c:u and 58 u:c mismatches, respectively).
However, this bias is only marginally significant (χ² test, p = 0.02) and
not significant after Bonferroni correction. The 2 nt internal loops are almost
randomly distributed among sequence subclasses, and the only clear
overrepresentation is that of ug:ug (19 occurrences) (Fig. 1.4C).
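Separating orientation-specific counts from the cumulative counts (Fig. 1.4B and its inset) amounts to grouping oriented labels by their unordered nucleotide content. A minimal sketch, assuming labels in the 5′-arm:3′-arm sequence notation of Sect. 1.2.2; the function name is hypothetical. The same grouping also works for 2 nt loops such as `ug:ug`.

```python
from collections import Counter
from typing import Dict, Iterable


def orientation_table(labels: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Group oriented motif labels (e.g. 'u:c', 'c:u') by unordered class.

    Returns {class: {oriented label: count}}; the class name lists the two
    sides in alphabetical order, so 'u:c' and 'c:u' both fall under 'c:u'.
    """
    oriented = Counter(labels)
    table: Dict[str, Dict[str, int]] = {}
    for label, n in oriented.items():
        a, b = label.split(":")
        key = ":".join(sorted((a, b)))
        table.setdefault(key, {})[label] = n
    return table
```

Summing the values of one class gives the cumulative count; the two oriented counts are what the equal-orientation χ² test compares.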
1.3.4 Asymmetric Internal Loops Including Bulges in Pre-miRNA Structures
As many as 437 bulges account for the majority of asymmetric loops identified in the
analyzed pool of pre-miRNAs. These bulges vary in size from 1 nt (293 occurrences)
to 11 nt (single occurrence) (Fig. 1.5A). The frequency of bulges decreases with bulge
size, with an almost perfect exponential correlation (r² = 0.97). The distribution of the
most frequent bulges (1–3 nt) between the precursor hairpin arms shows their
overrepresentation in the 5′ arm (Fig. 1.5A). Testing the null hypothesis that bulges are
equally distributed between the 5′ and 3′ arms showed that the overall overrepresentation
of bulges in the 5′ arm is highly significant (χ² test, p < 0.000001). Individual
χ² p-values for the 1, 2 and 3 nt bulges are 0.00002, 0.014 and 0.23, respectively.
The nucleotides present in single nucleotide bulges and the sequences present in
2 nt bulges are shown in Fig. 1.5B, C, respectively. Among the former, the most
frequent is u and the least frequent is g (Fig. 1.5B). The 2 nt bulges are almost equally
distributed over sequence variants, and almost all combinations of 2 nt sequences
occur (except for gc) (Fig. 1.5C). The asymmetric internal loops containing a different
number of unpaired nucleotides on each side constitute a smaller and more
heterogeneous group (177 occurrences). In this group, small motifs such as AL1:2 and
AL2:1 are the most frequent, but single cases of large motifs, e.g. AL6:4 and AL3:10,
were also found. The analysis of their sequence content did not reveal any significant
preferences.

Fig. 1.4 The symmetric loops in pre-miRNAs. (A) The number of single nucleotide mismatches
and symmetric loops containing different numbers of unpaired nucleotides within the loop. The
number of different nucleotide combinations in symmetric loops (dark gray), and total number of
symmetric loops (light gray). (B) The number of different types of mismatches with their
orientation taken into account and cumulative number (inset). (C) As in (B) but for 2 nt
symmetric loops
Fig. 1.5 The bulges in pre-miRNAs. (A) (center) The total number of bulges containing a
different number of nucleotides (1–11 nt). (right) For the most frequent 1–3 nt bulges the total
number of bulges was split between two orientations, 0:X and X:0, for bulges in the 3′ and 5′ arm
of pre-miRNA, respectively. (B) The number of different nucleotides in single-nucleotide bulges.
The 0:1 and 1:0 orientations are shown separately. (C) As in (B) but for different combinations of
nucleotides in 2 nt bulges
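The exponential decrease of bulge frequency with bulge size can be quantified by a least-squares line through log(count) versus size. This is a sketch under an assumption: the source does not state how its r² = 0.97 was computed, and the per-size counts appear only in Fig. 1.5A, so the example uses synthetic counts. Sizes with zero occurrences must be dropped first, since log(0) is undefined.

```python
import math
from typing import Sequence, Tuple


def exp_fit(sizes: Sequence[float], counts: Sequence[float]) -> Tuple[float, float, float]:
    """Fit counts ~ A * exp(b * size) by least squares on log(counts).

    Returns (A, b, r_squared), where r_squared is that of the log-linear fit.
    All counts must be positive.
    """
    xs = [float(s) for s in sizes]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                 # decay rate per nucleotide of bulge size
    a = my - b * mx               # log of the prefactor
    r2 = sxy ** 2 / (sxx * syy)   # squared Pearson correlation on the log scale
    return math.exp(a), b, r2
```

For counts that halve with each additional nucleotide (320, 160, 80, 40, 20 for sizes 1–5), the fit returns b = ln 0.5, A = 640 and r² = 1.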
1.3.5 Localization of Structural Motifs Within the Pre-miRNA Hairpin Stem
The proper localization of structural motifs in pre-miRNAs may facilitate the
adaptation of precursor structures to the interacting proteins of the miRNA biogenesis
machinery. It may create a suitable environment for interaction with specific
RNA binding motifs of proteins and serve as a code for structure-specific RNA
recognition. We therefore also examined the localization of structural
motifs in the pre-miRNA hairpin stem. As shown in Fig. 1.6A, B, there is no specific
position at which single nucleotide mismatches and symmetric internal loops
are either overrepresented or underrepresented. However, the frequency of these
motifs clearly decreases from the precursor base towards the terminal loop. This
trend is clearest for single nucleotide mismatches and 2 nt long symmetric internal
loops; the number of longer symmetric loops (4 and 5 nt) is too low to reveal any trend
(Fig. 1.6A). Interestingly, the opposite trend is observed for asymmetric motifs, whose
frequency increases in the same direction (Fig. 1.6C).
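One simple way to express the direction of such a trend is the least-squares slope of motif counts against stem position: negative when motifs become rarer towards the terminal loop, positive when they become more common. This is a sketch, not the analysis performed in the study (which is descriptive); it assumes 1-based positions counted from the precursor base as in Sect. 1.2.2, a stem length greater than 1, and a hypothetical function name.

```python
from collections import Counter
from typing import Iterable


def positional_slope(positions: Iterable[int], stem_length: int) -> float:
    """Least-squares slope of motif counts against stem position 1..stem_length.

    Positions outside 1..stem_length are ignored.
    """
    counts = Counter(positions)
    xs = range(1, stem_length + 1)
    ys = [counts.get(x, 0) for x in xs]
    mx = sum(xs) / stem_length
    my = sum(ys) / stem_length
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Applied separately to symmetric and asymmetric motifs, opposite signs of the slope would reflect the opposite polarity described above.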
1.4 Discussion
The bioinformatics survey of miRNA precursor structures performed in
this study provides a comprehensive insight into the structural variety of both
pri-miRNAs and pre-miRNAs. This insight may be considered a next step towards
a better understanding of the role of RNA structure in miRNA biogenesis. The
resulting gallery of predicted structures of miRNA precursors will guide the selection
of specific precursors for more detailed experimental analysis of their
structures and studies of their interactions with the Drosha and Dicer protein complexes.
The structural features of the precursors of numerous known miRNAs will also help
to refine algorithms used for the identification of novel miRNA genes in genomes.
In addition, the structural information gathered in this study may be relevant to the
process of RISC loading by miRNA/miRNA* duplexes, which may retain the structural
imperfections present within miRNA precursors.
We catalogued the rich repertoire of secondary structure motifs that destabilize and
distort the stem structures of pre-miRNAs, paying attention to the nucleotide
sequences within these motifs and to their localization. The detailed analysis
of this data collection revealed that, with some exceptions, there are only minor
preferences for specific sequences in the destabilizing motifs present in pre-miRNA
structures. This suggests that the protein complexes involved in miRNA biogenesis use
a structural rather than a sequence code for precursor recognition. As shown in this
study, there are about 2.7 stem-destabilizing motifs in the average pre-miRNA
hairpin. The number of pre-miRNAs containing different numbers of such motifs is
almost normally distributed, with the extremes being 0 and 7 motifs per
pre-miRNA (Fig. 1.7A). Taking this distribution into account and setting a somewhat
arbitrary ~5% threshold for pre-miRNAs with extreme numbers of structural
motifs, we divided all the analyzed pre-miRNAs into three classes: (1) having no
structure-destabilizing motifs (2%), (2) containing a moderate number (1–4) of such
motifs (93%), and (3) harboring a high number (5–7) of such motifs (5%). The analysis
of the distribution of symmetric and asymmetric loops revealed a gradual increase in
asymmetric motifs with the total number of motifs in pre-miRNAs. A comparison of the
frequency of SL vs. AL in pre-miRNAs with a moderate and a high number of motifs
showed a significant excess of AL in pre-miRNAs containing a high total number of
motifs (p = 0.0005) (Fig. 1.7B). This could reflect a compensatory effect, in which the
structural distortion introduced by one bending motif is balanced by another.

Fig. 1.6 The localization of structural motifs in the pre-miRNA hairpin structure. (A) The
localization of mismatches SL1:1 and symmetric internal loops having a different number of
nucleotides (SL2:2 – SL5:5) within the precursor structure. (B) The cumulative number of
mismatched nucleotides along the pre-miRNA hairpin. (C) Localization of bulges within the
precursor structure
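The class assignment and the 2 × 2 comparison can be sketched with the Python standard library alone (`math.comb` requires Python 3.8+). The table layout assumed here (rows: moderate vs high motif load; columns: SL vs AL counts) is our reading of the text; the counts behind p = 0.0005 appear only in Fig. 1.7B, so none are reproduced. The function names are hypothetical, and the two-sided rule (summing tables no more likely than the observed one) is one common convention for the Fisher exact test.

```python
from math import comb


def motif_load_class(n_motifs: int) -> str:
    """Classes used in the text: none (0), moderate (1-4), high (5 or more)."""
    if n_motifs == 0:
        return "none"
    return "moderate" if n_motifs <= 4 else "high"


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def prob(x: int) -> float:
        # Probability of a table whose top-left cell equals x.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

On the textbook table [[1, 9], [11, 3]] this returns p ≈ 0.00276.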
The biased distribution of bulges observed in this study consists of their
overrepresentation in the 5′ arm of pre-miRNAs. To validate this result, we also
analyzed the distribution of bulges in the group of “prototypical” pre-miRNAs recently
distinguished by Tuschl’s group on the basis of precise processing of the miRNA 5′ end,
sequence conservation and a high number of putative target sites [16].
In the “prototypical” group, the overrepresentation of bulges in the 5′ arm is even
higher than in the whole set of pre-miRNAs analyzed in our study (compare the results
shown in Fig. 1.8A with those in Fig. 1.5A). Although the number of “prototypical”
miRNA precursors (266) is smaller than the total number of pre-miRNAs analyzed by
us (456), the statistical significance of the disproportionate distribution of bulges is
even higher for the “prototypical” group. In contrast, the bias disappears completely
in the group of “repeat-derived” miRNAs. The distribution of asymmetric internal loops
observed in this study is also consistent with a lower tolerance of extra nucleotides in
the 3′ arm of pre-miRNAs: our analysis shows an overrepresentation of loops having
a higher number of nucleotides in the 5′ arm (Fig. 1.3A). However, the number of
asymmetric internal loops is relatively small and this effect is not statistically
significant. To find out whether the bulges overrepresented in the 5′ arm of
pre-miRNAs are evenly distributed along the hairpin structure, we compared the
localization of the 5′ and 3′ arm bulges (Fig. 1.8B). This comparison shows
that bulges in the 5′ arm are not evenly distributed but tend to cluster at
two sites, with maxima at nucleotide positions 11 and 18. These sites could be
involved in the bending of pre-miRNA structures and/or in interactions with
specific protein domains.

Fig. 1.7 Statistics of pre-miRNAs containing a high number, moderate number and no
structure-destabilizing motifs. (A) The number and frequency (inset) of pre-miRNAs having a
different number of structure distortions in the hairpin stem. Assuming the ~5% threshold, we
distinguished three groups of miRNA precursors with a high number (5–7), moderate number (1–4)
and no (0) stem distortions. (B) Distribution of SL and AL motifs in precursors containing a
different number of structural motifs and frequency of SL and AL motifs in precursors having a
moderate and high number of structural motifs (inset)
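Clustering of bulges at particular stem positions can be located by smoothing the positional histogram and reporting strict local maxima. This is an illustrative sketch only: the 3-position moving-average window is an arbitrary assumption, the function name is hypothetical, and the example positions are synthetic values chosen to mimic the reported maxima rather than the study's data. Flat plateaus are deliberately not reported as peaks.

```python
from collections import Counter
from typing import Iterable, List


def local_maxima(positions: Iterable[int], stem_length: int, window: int = 3) -> List[int]:
    """1-based stem positions where the smoothed bulge count is a strict local maximum."""
    counts = Counter(positions)
    raw = [counts.get(p, 0) for p in range(1, stem_length + 1)]
    half = window // 2
    smooth = []
    for i in range(len(raw)):
        # Centred moving average, truncated at the ends of the stem.
        seg = raw[max(0, i - half): i + half + 1]
        smooth.append(sum(seg) / len(seg))
    return [i + 1 for i in range(1, len(smooth) - 1)
            if smooth[i] > smooth[i - 1] and smooth[i] > smooth[i + 1]]
```

With synthetic 5′-arm bulge positions concentrated around 11 and 18, the function reports exactly those two positions.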
Comprehensive information on the distribution of various structural motifs in
miRNA precursors will also be useful to fine-tune the algorithms used for ab initio
prediction of miRNA genes. Numerous algorithms have been developed to
distinguish miRNA precursors from other hairpin structures encoded by genomes [10,
21, 30, 31]. These algorithms use different conservation, thermodynamic, sequence
and structure parameters. The latter include general parameters such as the
length of the longest fully base-paired stem, terminal loop size and the number of
nucleotides in symmetric and asymmetric loops, including bulges [15, 27, 29], as
well as more specific structural characteristics such as the frequency of triplet structure
elements [11, 27, 32]. The results of our study show that there are also other highly
significant features of pre-miRNA structure that might facilitate miRNA gene
prediction. These include the strong overrepresentation of bulges in the 5′ arm
of pre-miRNAs, the opposite polarity of the distributions of symmetric and asymmetric
motifs along the hairpin stem, and the increased contribution of asymmetric motifs as
the total number of stem-destabilizing motifs in pre-miRNAs increases.

Fig. 1.8 The overrepresentation of bulges in the 5′ arm of prototypical pre-miRNAs. (A) As in
Fig. 1.5A but separately for prototypical and repeat-derived classes of miRNA [16].
(B) Localization of bulges in the 5′ arm (black) or 3′ arm (gray) of pre-miRNA structure
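As an illustration only, the three properties listed above could be turned into numeric features for a precursor classifier. The feature names, the split of the stem into base-proximal and loop-proximal halves, and the (X, Y, position) input format are hypothetical choices; none of the cited prediction algorithms is claimed to use them.

```python
from typing import Dict, List, Tuple

# (x, y, position): unpaired nt on the 5' and 3' arms, 1-based position from the base.
Motif = Tuple[int, int, int]


def structure_features(motifs: List[Motif], stem_length: int) -> Dict[str, float]:
    """Hypothetical pre-miRNA structure features motivated by the findings above."""
    bulges = [(x, y) for x, y, _ in motifs if (x == 0) != (y == 0)]  # exactly one arm unpaired
    bulges_5p = sum(1 for x, _ in bulges if x > 0)
    n_asym = sum(1 for x, y, _ in motifs if x != y)
    half = stem_length / 2
    sym_pos = [p for x, y, p in motifs if x == y]
    asym_pos = [p for x, y, p in motifs if x != y]

    def base_fraction(ps: List[int]) -> float:
        # Share of motifs located in the base-proximal half of the stem.
        return sum(p <= half for p in ps) / len(ps) if ps else 0.0

    return {
        "n_motifs": float(len(motifs)),
        "bulge_5p_fraction": bulges_5p / len(bulges) if bulges else 0.0,
        "asymmetric_fraction": n_asym / len(motifs) if motifs else 0.0,
        "sym_base_fraction": base_fraction(sym_pos),
        "asym_base_fraction": base_fraction(asym_pos),
    }
```

In this encoding, a high `bulge_5p_fraction`, a `sym_base_fraction` exceeding `asym_base_fraction`, and a rising `asymmetric_fraction` with `n_motifs` would correspond to the three trends reported in this study.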
Acknowledgement This work was supported by funding under the Sixth Research Framework
Programme of the European Union, Project RIGHT (LSHB-CT-2004-005276) and by the Ministry
of Science and Higher Education, Grant No. N301 112 32/3910.
References
1. Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,
281–297.
2. Basyuk, E., Suavet, F., Doglio, A., Bordonne, R., and Bertrand, E. (2003). Human let-7 stem-
loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res 31,
6593–6597.
3. Bernstein, E., Caudy, A. A., Hammond, S. M., and Hannon, G. J. (2001). Role for a bidentate
ribonuclease in the initiation step of RNA interference. Nature 409, 363–366.
4. Borchert, G. M., Lanier, W., and Davidson, B. L. (2006). RNA polymerase III transcribes
human microRNAs. Nat Struct Mol Biol 13, 1097–1101.
5. Denli, A. M., Tops, B. B., Plasterk, R. H., Ketting, R. F., and Hannon, G. J. (2004). Processing
of primary microRNAs by the microprocessor complex. Nature 432, 231–235.
6. Gregory, R. I., Chendrimada, T. P., Cooch, N., and Shiekhattar, R. (2005). Human RISC couples
microRNA biogenesis and posttranscriptional gene silencing. Cell 123, 631–640.
7. Gregory, R. I., Yan, K. P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and
Shiekhattar, R. (2004). The microprocessor complex mediates the genesis of microRNAs.
Nature 432, 235–240.
8. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., and Enright, A. J. (2006).
miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140–144.
9. Han, J., Lee, Y., Yeom, K. H., Nam, J. W., Heo, I., Rhee, J. K., Sohn, S. Y., Cho, Y., Zhang, B. T.,
and Kim, V. N. (2006). Molecular basis for the recognition of primary microRNAs by the
Drosha-DGCR8 complex. Cell 125, 887–901.
10. Huang, T. H., Fan, B., Rothschild, M. F., Hu, Z. L., Li, K., and Zhao, S. H. (2007). MiRFinder:
an improved approach and software implementation for genome-wide fast microRNA
precursor scans. BMC Bioinformatics 8, 341.
11. Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. (2007). MiPred: classification of real
and pseudo microRNA precursors using random forest prediction model with combined
features. Nucleic Acids Res 35, W339–344.
12. Kim, V. N., and Nam, J. W. (2006). Genomics of microRNA. Trends Genet 22, 165–173.
13. Krol, J., Sobczak, K., Wilczynska, U., Drath, M., Jasinska, A., Kaczynska, D., and
Krzyzosiak, W. J. (2004). Structural features of microRNA (miRNA) precursors and their
relevance to miRNA biogenesis and small interfering RNA/short hairpin RNA design. J Biol
Chem 279, 42230–42239.
14. Krol, J., Starega-Roslan, J., Milanowska, K., Nowak, D., Kubiaczyk, E., Nowak, M., Majorek, K.,
Kaminska, K., and Krzyzosiak, W. J. (2006). Structural Features of microRNAs and Their
Precursors. In microRNA: Biology, Function & Expression, N. Clarke and P. Sanseau, eds.
(DNA Press), Eagleville, PA, pp. 95–110.
15. Lai, E. C., Tomancak, P., Williams, R. W., and Rubin, G. M. (2003). Computational
identification of Drosophila microRNA genes. Genome Biol 4, R42.
16. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A.,
Kamphorst, A. O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas
based on small RNA library sequencing. Cell 129, 1401–1414.
17. Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical
region gene 8 and its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol
14, 2162–2167.
18. Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S.,
and Kim, V. N. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature
425, 415–419.
19. Lee, Y., Kim, M., Han, J., Yeom, K. H., Lee, S., Baek, S. H., and Kim, V. N. (2004).
MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051–4060.
20. Lewis, B. P., Burge, C. B., and Bartel, D. P. (2005). Conserved seed pairing, often flanked by
adenosines, indicates that thousands of human genes are microRNA targets. Cell 120,
15–20.
21. Lim, L. P., Glasner, M. E., Yekta, S., Burge, C. B., and Bartel, D. P. (2003). Vertebrate
microRNA genes. Science 299, 1540.
22. Lund, E., Guttinger, S., Calado, A., Dahlberg, J. E., and Kutay, U. (2004). Nuclear export of
microRNA precursors. Science 303, 95–98.
23. MacRae, I. J., Zhou, K., and Doudna, J. A. (2007). Structural determinants of RNA recognition
and cleavage by Dicer. Nat Struct Mol Biol 14, 934–940.
24. MacRae, I. J., Zhou, K., Li, F., Repic, A., Brooks, A. N., Cande, W. Z., Adams, P. D., and
Doudna, J. A. (2006). Structural basis for double-stranded RNA processing by Dicer. Science
311, 195–198.
25. Pillai, R. S., Bhattacharyya, S. N., Artus, C. G., Zoller, T., Cougot, N., Basyuk, E., Bertrand, E.,
and Filipowicz, W. (2005). Inhibition of translational initiation by Let-7 MicroRNA in human
cells. Science 309, 1573–1576.
26. Provost, P., Dishart, D., Doucet, J., Frendewey, D., Samuelsson, B., and Radmark, O. (2002).
Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J 21,
5864–5874.
27. Ritchie, W., Legendre, M., and Gautheret, D. (2007). RNA stem-loops: to be or not to be
cleaved by RNAse III. RNA 13, 457–462.
28. Saetrom, P., Snove, O., Nedland, M., Grunfeld, T. B., Lin, Y., Bass, M. B., and Canon, J. R.
(2006). Conserved microRNA characteristics in mammals. Oligonucleotides 16, 115–144.
29. Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M. J., Tuschl, T., van
Nimwegen, E., and Zavolan, M. (2005). Identification of clustered microRNAs using an ab initio
prediction method. BMC Bioinformatics 6, 267.
30. Sheng, Y., Engstrom, P. G., and Lenhard, B. (2007). Mammalian MicroRNA prediction
through a support vector machine model of sequence and structure. PLoS ONE 2, e946.
31. Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., and Li, Y. (2005). MicroRNA
identification based on sequence and structure alignment. Bioinformatics 21, 3610–3614.
32. Xue, C., Li, F., He, T., Liu, G. P., Li, Y., and Zhang, X. (2005). Classification of real and
pseudo microRNA precursors using local structure-sequence features and support vector
machine. BMC Bioinformatics 6, 310.
33. Yekta, S., Shih, I. H., and Bartel, D. P. (2004). MicroRNA-directed cleavage of HOXB8
mRNA. Science 304, 594–596.
34. Zhang, H., Kolb, F. A., Brondani, V., Billy, E., and Filipowicz, W. (2002). Human Dicer
preferentially cleaves dsRNAs at their termini without a requirement for ATP. EMBO J 21,
5875–5885.
35. Zhang, H., Kolb, F. A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004). Single processing
center models for human Dicer and bacterial RNase III. Cell 118, 57–68.
36. Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction.
Nucleic Acids Res 31, 3406–3415.