Structures of MicroRNA Precursors. Chapter, September 2008. DOI: 10.1007/978-1-4020-8533-8_1. Available at: https://www.researchgate.net/publication/226825590 (retrieved on 10 February 2016).


Chapter 1

Structures of MicroRNA Precursors

Piotr Kozlowski**, Julia Starega-Roslan**, Marta Legacz,

Marcin Magnus, and Wlodzimierz J. Krzyzosiak*

Abstract MicroRNAs are single-stranded regulatory RNAs, 18–25 nucleotides in length, generated from endogenous transcripts that form local hairpin structures. The processing of microRNA transcripts involves the activities of two RNase III enzymes, Drosha and Dicer. In this study we analyzed the structural features of human microRNA precursors that make these transcripts Drosha and Dicer substrates. The structures of minimal functional primary precursors (pri-microRNAs) and secondary precursors (pre-microRNAs) were predicted. The frequency, nucleotide sequence content and localization of various structure-destabilizing motifs were analyzed. We identified numerous pri-microRNAs whose structures strongly depart from the consensus structure and whose processing is hard to explain by the existing model of the Microprocessor complex. We also found a biased distribution of symmetric and asymmetric motifs along the pre-microRNA hairpin stem and an overrepresentation of bulges on its 5′ arm (p < 0.000001), which may have considerable functional implications.

Keywords miRNA biogenesis, Dicer, Drosha, RNA structure prediction, pri-miRNAs,

pre-miRNA structural motifs

S.-Y. Ying (ed.) Current Perspectives in microRNAs (miRNA), 1
© Springer Science + Business Media B.V. 2008

Laboratory of Cancer Genetics, Institute of Bioorganic Chemistry, Polish Academy of Science, Noskowskiego 12/14, 61-704 Poznan, Poland
* Corresponding author: E-mail: [email protected]
** These authors contributed equally to this work.

1.1 Introduction

MicroRNAs (miRNAs) are a family of short, single-stranded noncoding RNAs identified in many eukaryotes, from simple organisms to humans [1, 8]. It is anticipated that hundreds of miRNAs regulate the expression of thousands of human genes [20]. MiRNAs regulate gene expression at the posttranscriptional level by programming the RNA-induced silencing complex (miRISC), which interacts with complementary sequences of mRNAs and causes their translational inhibition or cleavage [25, 33]. Specific miRNAs have been shown to be engaged in a variety of processes such as development, cell proliferation, differentiation and apoptosis [12]. The detailed cellular function of the majority of miRNAs remains unknown.

The primary transcripts of miRNA genes (pri-miRNAs) are generated by either RNA polymerase II [19] or RNA polymerase III [4]. The pri-miRNAs, which harbor a long stem-and-loop structure, are processed in the nucleus to shorter, approximately 60-nt hairpin precursors (pre-miRNAs) (Fig. 1.1A). The nuclear processing enzyme is the ribonuclease Drosha [18], which acts together with the DGCR8 protein [17] within the Microprocessor complex [5, 7]. Drosha, an RNase III enzyme, usually leaves a 2-nt overhang at the 3′ end of the pre-miRNA and defines one end of the mature miRNA [2]. The pre-miRNAs are then exported to the cytoplasm by Exportin-5 [22] and further processed to miRNA duplexes by another RNase III enzyme, Dicer [3, 26, 34], which defines the other miRNA end. Thus, the two RNA processing steps reduce the stem of the primary precursor hairpin to its internal portion, which is usually an imperfect duplex containing a functional miRNA strand (Fig. 1.1B). The presence of structural imperfections within this duplex facilitates the later removal of its non-miRNA strand from the miRISC by the “bypass” rather than the cleavage mechanism [6].

It is clear that the structures of miRNA precursors are instrumental for their proper recognition and correct cleavage by the processing complexes containing Drosha and Dicer. Therefore, to analyze the structural aspects of both the nuclear and cytoplasmic steps of miRNA biogenesis, the structures of the primary and secondary miRNA precursors need to be established. This can be done either by experiment or by computational structure prediction. The latter approach gives reliable structures of miRNA precursors, which were confirmed by experimental analysis in most of the investigated cases [13]. The computational approach is also much faster and thus better suited for structure analysis on a large scale. In order to predict the secondary structures of miRNA precursors, their nucleotide sequences need to be known first. These sequences, however, are not available in the existing miRNA databases. For the purpose of this study the sequences of pre-miRNAs were reconstructed from the sequences of mature miRNAs as described earlier [14]. To analyze the pri-miRNA sequences, the concept of a “minimal” functional precursor was adopted [9]. The pri-miRNA precursors may be very long and as such are not amenable to detailed structure analysis. Therefore, we analyzed minimal pri-miRNAs, which were considered to be the shortest fragments of primary precursors that contain all sequence and structure elements required to be functional substrates for the Microprocessor. The difference between our approach and that described earlier [9] was that not a single length but several different lengths of sequences harboring miRNA were analyzed. Both the reconstructed sequences of pre-miRNAs and the arbitrarily selected sequences of minimal pri-miRNAs were then subjected to structure prediction and a detailed analysis of their secondary structures. We intended to learn more about the occurrence and localization of different types of secondary structure motifs within the precursor hairpin. A comprehensive inventory of such motifs was generated and analyzed in relation to the Drosha and Dicer steps of miRNA biogenesis.

Fig. 1.1 Biogenesis of miRNA. (A) Steps of miRNA biogenesis. Proteins involved in substrate recognition and precursor processing are shown. (B) Schemes of the Microprocessor complex and RNase Dicer interacting with their substrates, pri-miRNA and pre-miRNA, respectively. Arrows indicate Drosha and Dicer cleavage sites. RIIIa and RIIIb indicate the RNase domains responsible for cleavage. dsRBD denotes the double-stranded RNA binding domain. The fragment which corresponds to miRNA is marked in gray. (C) Scheme of minimal precursor pri-miRNA. The SD (single-stranded–double-stranded) and SL (stem-loop) junctions, internal distances and analyzed region are indicated. SD is postulated to be the DGCR8 protein binding site

1.2 Materials and Methods

1.2.1 Analysis of Pri-miRNA Structures

The sequences of minimal pri-miRNAs of four length variants (110, 130, 150 and 300 nt) were cut from longer sequences retrieved from GenBank for all 461 miRNAs deposited in the miRBase database, version 8.2 [8]. To obtain the minimal pri-miRNA sequence of a selected length, natural sequence extensions of the required and equal length were added to each end of the pre-miRNA. These 1,844 sequences were subjected to secondary structure prediction by free energy minimization using the Mfold program [36]. The suboptimality parameter was set at 5%, which means that all structures with a free energy of formation (∆G) up to 5% higher than that of the lowest energy structure were also shown. For further analysis only the lowest energy structures were taken.
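The construction of the four length variants can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `minimal_pri_mirna`, the 0-based coordinates and the treatment of an odd number of flanking nucleotides are hypothetical and are not taken from the original workflow.

```python
LENGTH_VARIANTS = (110, 130, 150, 300)  # nt, as listed in the text
N_MIRNAS = 461                          # miRBase 8.2 entries analyzed


def minimal_pri_mirna(genomic: str, pre_start: int, pre_end: int, total_len: int) -> str:
    """Return a window of total_len nt with the pre-miRNA in its centre.

    genomic   : longer sequence containing the precursor (hypothetical input)
    pre_start : 0-based index of the first pre-miRNA nucleotide
    pre_end   : 0-based index one past the last pre-miRNA nucleotide
    """
    extra = total_len - (pre_end - pre_start)
    if extra < 0:
        raise ValueError("requested length is shorter than the pre-miRNA")
    left = extra // 2
    right = extra - left  # assumption: the 3' flank gets the odd nucleotide
    start, end = pre_start - left, pre_end + right
    if start < 0 or end > len(genomic):
        raise ValueError("not enough flanking sequence")
    return genomic[start:end]


# Four variants for each of the 461 precursors give the 1,844 sequences.
print(N_MIRNAS * len(LENGTH_VARIANTS))
```

For a 60 nt pre-miRNA and the 110 nt variant, this sketch adds 25 nt of natural flanking sequence to each end.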

The critical parameter analyzed in the predicted structures was the length of the base-paired region in which only minor structure-distorting motifs were allowed. Only those minimal pri-miRNAs that had the same structure of the analyzed fragment in at least the 150 and 300 nt length variants were taken for further analysis. There were 246 such precursors, and among them were 180 pri-miRNAs that had the same structure of the fragment of interest in all four length variants.
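The selection rule described above can be expressed as a small classifier. The sketch assumes, purely for illustration, that the structure of the analyzed fragment in each length variant is available as a dot-bracket string; the function name and the class labels are hypothetical.

```python
from typing import Dict


def consistency_class(fragment_by_length: Dict[int, str]) -> str:
    """Classify one precursor by agreement of the analyzed fragment's structure.

    fragment_by_length maps each length variant (110, 130, 150, 300) to the
    predicted structure of the analyzed fragment (assumed dot-bracket string).
    """
    # Minimum requirement: identical fragment in the 150 and 300 nt variants.
    if fragment_by_length[150] != fragment_by_length[300]:
        return "rejected"
    # Stricter subset: identical fragment in all four variants.
    if len(set(fragment_by_length.values())) == 1:
        return "all_four"
    return "150_and_300"
```

In the terms of the text, the precursors labeled `all_four` or `150_and_300` together correspond to the 246 retained pri-miRNAs, and those labeled `all_four` to the subset of 180.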

1.2.2 Analysis of Pre-miRNA Structures

Prior to the structure analysis of pre-miRNAs, their nucleotide sequences were reconstructed following the rules described earlier [14]. Briefly, one arm terminus of the precursor hairpin was defined by one end of the miRNA sequence, and the second arm terminus was defined either by the miRNA* end or by assuming the existence of a 2-nt 3′ overhang at the Drosha cleavage site. The secondary structures of the pre-miRNAs were predicted using Mfold as described above for pri-miRNAs.

All secondary structure motifs present in the lowest free energy structures were catalogued in a format that included the number and sequence of nucleotides present in the specific motif, the motif orientation and its localization within the precursor structure. These motifs were classified into two major groups: symmetric internal loops (SL) and asymmetric internal loops (AL). The first group includes both single nucleotide mismatches (SL1:1) and longer symmetric loops (SL2:2, SL3:3, etc.). The second group includes bulges of different lengths (AL0:1, AL0:2, AL0:3, etc.) and asymmetric internal loops ALX:Y, where both X and Y are different from 0. Thus, each motif is denoted by two numbers separated by a colon. The number or sequence before and after the colon denotes nucleotides from the 5′ arm and 3′ arm of the precursor, respectively. For example, the single nucleotide bulge “a” located in the 5′ arm is denoted either AL1:0 or a:0. The localization of the motif was numbered counting from the terminal nucleotide of the 5′ arm of the pre-miRNA. To unify the position numbering system for all types of structural motifs, the position of the nucleotide directly preceding any specific motif was assigned as the motif position.
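The notation can be illustrated with a short parser that lists the motifs of a simple hairpin. This is a minimal sketch and not the cataloguing procedure used in the study: it assumes that the predicted structure is given in dot-bracket notation and contains a single unbranched stem, and the function name `stem_motifs` is hypothetical.

```python
from typing import List, Tuple


def stem_motifs(dot_bracket: str) -> List[Tuple[str, int]]:
    """List (motif, position) pairs for a simple hairpin in dot-bracket form.

    The motif is written SLn:n or ALx:y (5' arm count before the colon).
    The position is the 1-based index, counted from the 5' terminus, of the
    paired nucleotide directly preceding the motif on the 5' arm.
    """
    stack, pairs = [], []
    for idx, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(idx)
        elif ch == ")":
            pairs.append((stack.pop(), idx))
    pairs.sort()  # outermost base pair first

    motifs = []
    for (i, j), (i2, j2) in zip(pairs, pairs[1:]):
        if j2 > j:
            raise ValueError("structure is not a simple hairpin")
        n5 = i2 - i - 1  # unpaired nucleotides on the 5' arm
        n3 = j - j2 - 1  # unpaired nucleotides on the 3' arm
        if n5 == 0 and n3 == 0:
            continue     # stacked base pairs, no motif
        kind = "SL" if n5 == n3 else "AL"
        motifs.append((f"{kind}{n5}:{n3}", i + 1))
    return motifs


print(stem_motifs("((.((....))))"))   # single-nucleotide bulge in the 5' arm
print(stem_motifs("((((....)).))"))   # single-nucleotide bulge in the 3' arm
print(stem_motifs("((.((....)).))"))  # single-nucleotide mismatch
```

The terminal loop is not reported because it lies inside the innermost base pair, and the sequence-based form of the notation (e.g. a:0) would additionally require the nucleotide sequence.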

1.2.3 Statistical Methods

For asymmetric motifs their equal distribution between the two pre-miRNA arms was assumed (null hypothesis). To assess the potential deviation from this distribution the chi-squared test was applied using Statistica (StatSoft, Tulsa, OK) or Prism v. 4.0 (GraphPad Software, San Diego, CA). To compare the distribution of symmetric vs. asymmetric loops in pre-miRNAs having a moderate and a high number of stem structure-distorting motifs, the Fisher exact test was used for the 2 × 2 contingency table analysis (programs as above). Where applicable, the Bonferroni adjustment for multiple comparisons was used.
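For readers without access to the statistical packages named above, the Fisher exact test for a 2 × 2 table can be reproduced with the Python standard library. The sketch below is an assumption-laden stand-in and not the implementation used by Statistica or Prism; it takes the two-sided p-value as the sum of the probabilities of all tables that are no more likely than the observed one, and the example counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as probable as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Invented counts, e.g. rows = moderate/high motif number, columns = SL/AL.
print(fisher_exact_two_sided(3, 1, 1, 3))
```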

1.3 Results

1.3.1 Pri-miRNA Structure and Drosha Step of miRNA Biogenesis

According to the recently proposed model of the Microprocessor structure, the DGCR8 protein binds to the base of the pri-miRNA hairpin stem and forms a platform for Drosha binding and precursor cleavage [9] (Fig. 1.1B). DGCR8 anchors to the single strand-double strand (SD) junction in the structure of the pri-miRNA. The consensus structure of the minimal pri-miRNA was established based on the analysis of the predicted structures of numerous human primary precursors, and the structure prediction was performed using a 110 nt long sequence for each pri-miRNA [9]. In light of the fact that human pre-miRNAs span the length range of 42–82 nt [14], the length of 110 nt seemed insufficient for reliable minimal pri-miRNA structure prediction. To minimize the risk of taking an incorrect structure into consideration, three longer sequences harboring miRNAs were used for structure prediction in our study in addition to the 110 nt sequence. This approach was taken because structures generated by programs that predict RNA structure by free energy minimization are more trustworthy if the same critical domains are predicted from sequences of different lengths. The detailed analysis of the predicted structures of minimal pri-miRNA precursors was focused on the fragment located between the pre-miRNA ends and the SD junction (Fig. 1.1C), which was proposed to play a critical role in pri-miRNA recognition by the Microprocessor [9]. In the consensus minimal pri-miRNA structure this region spanned 11 bp, which equals one helical turn of A-RNA. We wanted to find out whether the consensus minimal pri-miRNA structure is correct and to find the most deviant structures that still remain substrates for the Microprocessor. This analysis showed that there are indeed pri-miRNAs whose secondary structures fit the consensus structure ideally, e.g. pri-miR-33, but there are also precursors in which the analyzed region is either much shorter, e.g. pri-miR-656, or much longer, e.g. pri-miR-607. However, in the majority of the analyzed structures (62.6%) the SD junction was located 9–13 nt below the Drosha cleavage site, which is in rough agreement with the consensus structure proposed by Han et al. [9] and confirmed by Saetrom et al. [28]. This region was shorter in 13% of the analyzed precursors and longer in 19.5% of the precursors (Fig. 1.2). It is difficult to fit such precursors, especially their extreme examples, into the presently accepted model of pri-miRNA processing by the Microprocessor complex (Fig. 1.1C). This may suggest that either some alternative models of Microprocessor architecture need to be considered or that precursors having structures most deviant from the consensus are processed in an entirely different way.
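The three stem-length classes discussed above can be tabulated with a simple binning step. The sketch assumes that the distance between the Drosha cleavage site and the SD junction has already been measured in nucleotides for each precursor; the function names, the class labels and the example distances are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def stem_class(distance_nt: int) -> str:
    """Bin the cleavage-site-to-SD-junction distance using the 9-13 nt window."""
    if distance_nt < 9:
        return "shorter"
    if distance_nt <= 13:
        return "consensus-like"
    return "longer"


def class_shares(distances: Iterable[int]) -> Dict[str, float]:
    """Return the percentage of precursors falling into each class."""
    values = list(distances)
    counts = Counter(stem_class(d) for d in values)
    return {name: 100.0 * n / len(values) for name, n in counts.items()}


# Invented distances for four precursors, for illustration only.
print(class_shares([5, 11, 11, 20]))
```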

1.3.2 Pre-miRNA Structure and Dicer Step of miRNA Biogenesis

Structural insights into Dicer function came from both biochemical studies [35] and crystallography [23, 24]. The commonly accepted model of a Dicer single processing center is shown in Fig. 1.1B [35]. Human Dicer is composed of several functional domains: the PAZ domain, which is used for high-affinity binding to the 3′ overhanging nucleotides of pre-microRNA, the helicase domain, the DUF283 subunit, the dsRNA binding domain and two catalytic RNase III domains that form the intramolecular dimer during pre-miRNA cleavage [24, 35]. Thus, Dicer functions as a molecular ruler and cleaves the pre-miRNA hairpin about two helical turns away from the hairpin base to produce duplexes containing 18–24 nt long miRNAs. It is intuitively understood that the length diversity of miRNAs has its source in the structural features of pre-miRNA hairpins, which rarely contain perfectly base-paired stems. Usually single nucleotide mismatches, larger symmetric internal loops, bulges and asymmetric internal loops break the regularity of the pre-miRNA double helical stem structure and may influence both the efficiency of Dicer binding and the specificity of precursor cleavage. Therefore, a detailed analysis of the predicted secondary structures of pre-miRNAs was performed to search for such structure distortions.

Fig. 1.2 Pri-miRNA stem length distribution. The length distribution of base-paired stems having either full base complementarity or only minor disruptions (small internal loops, bulges) within the analyzed fragment in 243 minimal pri-miRNA precursors. Schematic structures representing three classes of such precursors, which differ in the length of the analyzed fragment, are shown above the graph

Out of the 461 nucleotide sequences of human pre-miRNAs that were subjected to structure prediction, nearly all (456) formed hairpins as the lowest free energy structures. In these precursor hairpins as many as 1,243 secondary structure motifs destabilizing stem structures were found altogether, and the occurrence of various types of motifs in each arm of the hairpin stem is shown in Fig. 1.3. These motifs include 631 symmetric internal loops of various sizes, including single nucleotide mismatches (SL), and a similar number (612) of asymmetric internal loops, including bulges (AL) (Fig. 1.3B). This means that 2.73 motifs (0.97 mismatches, 0.42 symmetric loops of different length (2–5), 0.96 bulges and 0.38 asymmetric internal loops) occur, on average, per analyzed pre-miRNA.

Fig. 1.3 The frequency of structural motifs in stems of 456 analyzed pre-miRNAs. (A) The chessboard-like table shows in numbers the occurrences of each type of structural motif identified in the predicted structures of pre-miRNAs. These motifs are shown as pairwise combinations of unpaired nucleotides present in the precursor 5′ and 3′ arm. For example, symmetric loops (SL) are located diagonally and bulges along the 0 column. In this and the subsequent figures the motifs under consideration are shaded. (B) The total number of symmetric loops (SL) including mismatches (SL1:1) and asymmetric loops (AL) including all types of bulges
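The per-precursor averages quoted in the text follow directly from the motif totals and the number of hairpin-forming precursors. A short check, using only the counts given in this section (the helper name is hypothetical):

```python
N_PRE_MIRNAS = 456  # pre-miRNAs that formed hairpins

MOTIF_TOTALS = {
    "all motifs": 1243,
    "symmetric loops incl. mismatches (SL)": 631,
    "asymmetric loops incl. bulges (AL)": 612,
}


def per_precursor(total: int, n_precursors: int = N_PRE_MIRNAS) -> float:
    """Average number of motifs per analyzed pre-miRNA."""
    return total / n_precursors


for name, total in MOTIF_TOTALS.items():
    print(f"{name}: {per_precursor(total):.2f} per pre-miRNA")
```

The first value reproduces the 2.73 motifs per pre-miRNA stated in the text; the SL and AL values (1.38 and 1.34) agree with the sums of the corresponding subtype averages to within rounding.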

1.3.3 Symmetric Loops Including Single Nucleotide Mismatches in Pre-miRNA Structures

In cataloguing the structural motifs present in pre-miRNAs we looked not only at their type and size but also at their nucleotide sequence and orientation within the precursor hairpin. This allowed us to count symmetric internal loops containing different numbers of nucleotides and to divide them into sequence- and orientation-specific subgroups. Figure 1.4A shows the number of occurrences of single nucleotide mismatches and 2–5 nt long symmetric internal loops as well as the number of occurrences of different nucleotides and sequences present in these motifs. It is apparent that the frequency of symmetric loops decreases with their size. Single nucleotide mismatches are the most frequent and account for almost 2/3 of the total number of symmetric internal loops. The largest is the 5 nt long loop identified only once, in hsa-mir-196a-1. As shown in Fig. 1.4A, all ten possible base combinations of single nucleotide mismatches are represented in the pre-miRNAs, as are 61 different combinations of 2 nt long internal loops. For the 3–5 nt long internal loops the number of different sequence classes is almost equal to the total number of such loops. This means that almost every symmetric internal loop formed by more than two adjacent nucleotides has a different sequence and that there is no preference for any specific sequence within such loops. For the single nucleotide mismatches and 2 nt long internal loops we analyzed their distribution between different sequence classes (Fig. 1.4B, C, respectively). It turned out that a:c and c:u are the most frequent among the former and that the least frequent is the c:c mismatch (Fig. 1.4B). The orientation analysis of single nucleotide mismatches shows that most of them are rather equally distributed in both orientations, with the exception of c:u, in which u is more frequent on the 5′ arm and c on the 3′ arm (36 and 58 c:u and u:c mismatches, respectively). However, this bias is only marginally significant (chi2 p-val = 0.02) and not significant after Bonferroni correction. The 2 nt internal loops are almost randomly distributed among sequence subclasses; the only clear overrepresentation is that of ug:ug (19 occurrences) (Fig. 1.4C).
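The orientation test for the c:u mismatch can be retraced from the two counts given above. The sketch below is not the authors' software (they used Statistica or Prism); it computes the chi-squared statistic for an equal split and obtains the p-value for one degree of freedom from the complementary error function. The number of comparisons used for the Bonferroni adjustment is not stated in the text, so the value of 6 is purely illustrative.

```python
from math import erfc, sqrt
from typing import Tuple


def chi2_equal_split(n_a: int, n_b: int) -> Tuple[float, float]:
    """Chi-squared test of two counts against a 50:50 expectation (1 d.f.)."""
    expected = (n_a + n_b) / 2
    chi2 = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # Upper-tail probability of the chi-squared distribution with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni(p_value: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)


chi2, p = chi2_equal_split(36, 58)  # c:u vs. u:c counts from the text
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
print(f"adjusted p (6 comparisons, assumed) = {bonferroni(p, 6):.2f}")
```

The unadjusted p-value rounds to the 0.02 reported above, and any adjustment for three or more comparisons raises it above 0.05.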

1.3.4 Asymmetric Internal Loops Including Bulges in Pre-miRNA Structures

Fig. 1.4 The symmetric loops in pre-miRNAs. (A) The number of single nucleotide mismatches and symmetric loops containing different numbers of unpaired nucleotides within the loop. The number of different nucleotide combinations in symmetric loops (dark gray), and total number of symmetric loops (light gray). (B) The number of different types of mismatches with their orientation taken into account and cumulative number (inset). (C) As in (B) but for 2 nt symmetric loops

As many as 437 bulges account for the majority of asymmetric loops identified in the analyzed pool of pre-miRNAs. These bulges vary in size from 1 nt (293 occurrences) to 11 nt (single occurrence) (Fig. 1.5A). The frequency of bulges decreases with bulge size with an almost perfect exponential correlation (r2 = 0.97). The distribution of the most frequent bulges (1–3 nt) between the precursor hairpin arms shows their overrepresentation in the 5′ arm (Fig. 1.5A). Testing the null hypothesis that bulges are equally distributed between the 5′ and 3′ arms, we showed that the total overrepresentation of bulges in the 5′ arm is highly significant (chi2 p-val < 0.000001). Individual chi2 p-values for the 1-, 2- and 3-nt bulges are 0.00002, 0.014 and 0.23, respectively. The nucleotides present in the single nucleotide bulges and the sequences present in the 2 nt bulges are shown in Fig. 1.5B, C, respectively. Among the former the most frequent is u and the least frequent is g (Fig. 1.5B). The 2 nt bulges are almost equally distributed over sequence variants, and almost all combinations of 2 nt sequences occur (except for gc) (Fig. 1.5C). The asymmetric internal loops containing a different number of unpaired nucleotides on each side constitute a smaller and more heterogeneous group (177 occurrences). In this group small motifs such as AL1:2 and AL2:1 are the most frequent, but single cases of large motifs, e.g. AL6:4 and AL3:10, were also found. The analysis of their sequence contents did not reveal any significant preferences.
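An exponential dependence of bulge frequency on bulge size can be assessed by linear regression of the logarithm of the counts on the size. The text does not state how the r2 value was obtained, so the least-squares approach below is an assumption, and the counts in the example are synthetic (a series that halves at each step), not the data of Fig. 1.5A.

```python
from math import log
from typing import Sequence


def log_linear_r2(sizes: Sequence[float], counts: Sequence[float]) -> float:
    """Coefficient of determination for the fit ln(count) = a + b * size."""
    ys = [log(c) for c in counts]  # requires strictly positive counts
    n = len(sizes)
    mean_x, mean_y = sum(sizes) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, ys))
    return sxy * sxy / (sxx * syy)


# Synthetic, perfectly exponential series: r2 equals 1 up to rounding error.
print(log_linear_r2([1, 2, 3, 4, 5, 6], [320, 160, 80, 40, 20, 10]))
```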

Fig. 1.5 The bulges in pre-miRNAs. (A) (center) The total number of bulges containing a different number of nucleotides (1–11 nt). (right) For the most frequent 1–3 nt bulges the total number of bulges was split between two orientations 0:X and X:0 for bulges in the 3′ and 5′ arm of pre-miRNA, respectively. (B) The number of different nucleotides in single-nucleotide bulges. The 0:1 and 1:0 orientations are shown separately. (C) As in (B) but for different combinations of nucleotides in 2 nt bulges

1.3.5 Localization of Structural Motifs Within the Pre-miRNA Hairpin Stem

The proper localization of structural motifs in pre-miRNAs may facilitate the adaptation of precursor structures to the interacting proteins of the miRNA biogenesis machinery. It may create a suitable environment for interaction with the specific RNA binding motifs of proteins and serve as a code for structure-specific RNA recognition. Therefore, in this study we also looked at the localization of structural motifs in the pre-miRNA hairpin stem. As shown in Fig. 1.6, there is no specific position in which single nucleotide mismatches and symmetric internal loops are either overrepresented or underrepresented (Fig. 1.6A, B). However, the frequency of these motifs clearly tends to decrease from the precursor base towards the terminal loop. This trend is clearest for single nucleotide mismatches and 2 nt long symmetric internal loops. The number of longer symmetric loops (4 and 5 nt) is too low to show any trend (Fig. 1.6A). Interestingly, the opposite trend is observed for asymmetric motifs, the frequency of which increases in the same direction (Fig. 1.6C).

1.4 Discussion

The bioinformatics survey of miRNA precursor structures performed in this study provides a comprehensive insight into the structural variety of both pri-miRNAs and pre-miRNAs. This insight may be considered a next step towards a better understanding of the role of RNA structure in miRNA biogenesis. The obtained gallery of predicted structures of miRNA precursors will guide the selection of specific precursors for a more detailed experimental analysis of their structures and for studies of their interactions with Drosha and Dicer protein complexes. The structural features of the precursors of numerous known miRNAs will also help to refine algorithms used for the identification of novel miRNA genes in genomes. In addition, the structural information gathered in this study may be relevant to the process of RISC loading by miRNA/miRNA* duplexes, which may retain the structure imperfections present within miRNA precursors.

We catalogued the rich repertoire of secondary structure motifs destabilizing and distorting the stem structures of pre-miRNAs, paying attention to the nucleotide sequences present within these motifs and to motif localization. The detailed analysis of this data collection revealed that, with some exceptions, there are only minor preferences for specific sequences in the destabilizing motifs present in pre-miRNA structures. This means that protein complexes involved in miRNA biogenesis use a structure rather than a sequence code for precursor recognition. As shown in this study, there are about 2.7 stem structure-destabilizing motifs in the average pre-miRNA hairpin. The number of motifs per pre-miRNA is almost normally distributed, with the extremes being 0 and 7 motifs per pre-miRNA (Fig. 1.7A). Taking into account this distribution and assuming, somewhat arbitrarily, a ~5% threshold for pre-miRNAs with extreme numbers of structural motifs, we divided all the analyzed pre-miRNAs into three classes: (1) having 0 structure-destabilizing motifs (2%), (2) containing a moderate number (1–4) of such motifs (93%), and (3) harboring a high number (5–7) of motifs (5%). The analysis of the distribution of symmetric and asymmetric loops revealed a gradual increase of asymmetric motifs with the total number of motifs in pre-miRNAs. When we compared the frequency of SL vs. AL in pre-miRNAs with a moderate and a high number of motifs, we found a significant excess of AL in pre-miRNAs containing a high total number of motifs (p-val = 0.0005) (Fig. 1.7B). This could reflect a compensatory effect in which the structure distortion introduced by one bending motif is balanced by another.

Fig. 1.6 The localization of structural motifs in the pre-miRNA hairpin structure. (A) The localization of mismatches SL1:1 and symmetric internal loops having a different number of nucleotides SL2:2 – SL5:5 within the precursor structure. (B) The cumulative number of mismatched nucleotides along the pre-miRNA hairpin. (C) Localization of bulges within the precursor structure

The biased distribution of bulges observed in this study consisted of their overrepresentation in the 5′ arm of pre-miRNA. To validate this result we also analyzed the distribution of bulges in the group of “prototypical” pre-miRNAs recently distinguished by Tuschl’s group on the basis of precise miRNA 5′ end processing, sequence conservation and a high number of putative target sites [16]. We have shown that in the “prototypical” group the overrepresentation of bulges in the 5′ arm is even higher than that revealed in the group of pre-miRNAs analyzed in our study (compare the results shown in Fig. 1.8A with those in Fig. 1.5A). Although the number of “prototypical” miRNA precursors is smaller (266) than the total number of pre-miRNAs analyzed by us (456), the statistical significance of the disproportional distribution of bulges is even higher for the “prototypical” group. In contrast, the observed bias completely disappears in the group of “repeat-derived” miRNAs. The distribution of asymmetric internal loops observed in this study is also in line with a lower tolerance of excess nucleotides in the 3′ arm of pre-miRNA. Our analysis shows an overrepresentation of loops having a higher number of nucleotides in the 5′ arm (Fig. 1.3A). However, the number of asymmetric internal loops is relatively small and this effect is not statistically significant. To find out whether the bulges overrepresented in the 5′ arm of pre-miRNA are equally distributed along the hairpin structure, we compared the localization of the 5′ and 3′ arm bulges (Fig. 1.8B). It appears from this comparison that bulges in the 5′ arm are not equally distributed but tend to be clustered at two sites with maxima at nucleotide positions 11 and 18. These sites could be involved in the bending of the pre-miRNA structures and/or in interactions with specific protein domains.

Fig. 1.7 Statistics of pre-miRNAs containing a high number, moderate number and no structure destabilizing motifs. (A) The number and frequency (inset) of pre-miRNAs having a different number of structure distortions in the hairpin stem. Assuming the ~5% threshold we distinguished three groups of miRNA precursors with a high number (5–7), moderate number (1–4) and no (0) stem distortion. (B) Distribution of SL and AL motifs in precursors containing a different number of structural motifs and frequency of SL and AL motifs in precursors having a moderate and high number of structural motifs (inset)

Comprehensive information on the distribution of various structural motifs in miRNA precursors will also be useful for fine-tuning the algorithms used for the ab initio prediction of miRNA genes. Numerous algorithms have been developed to distinguish miRNA precursors from other hairpin structures encoded by genomes [10, 21, 30, 31]. These algorithms use different conservation, thermodynamic, sequence and structure parameters. The latter include some general parameters such as the length of the longest fully base-paired stem, the terminal loop size and the number of nucleotides in the symmetric and asymmetric loops including bulges [15, 27, 29], as well as more specific structural characteristics such as the frequency of triplet structure elements [11, 27, 32]. The results of our study show that there are also other highly significant features of pre-miRNA structure that might facilitate miRNA gene prediction. These include the strong overrepresentation of bulges in the 5′ arm of pre-miRNA, the opposite polarity of the distribution of symmetric and asymmetric motifs along the hairpin stem and the increased contribution of asymmetric motifs as the total number of stem structure-destabilizing motifs in pre-miRNAs increases.

Fig. 1.8 The overrepresentation of bulges in the 5′-arm of prototypical pre-miRNAs. (A) As in (Fig. 1.5A) but separately for prototypical and repeat-derived classes of miRNA [16]. (B) Localization of bulges in the 5′-arm (black) or 3′-arm (gray) of pre-miRNA structure

Acknowledgement This work was supported by funding under the Sixth Research Framework

Programme of the European Union, Project RIGHT (LSHB-CT-2004-005276) and by the Ministry

of Science and Higher Education, Grant No. N301 112 32/3910.

References

1. Bartel, D. P. (2004). MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116,

281–297.

2. Basyuk, E., Suavet, F., Doglio, A., Bordonne, R., and Bertrand, E. (2003). Human let-7 stem-

loop precursors harbor features of RNase III cleavage products. Nucleic Acids Res 31,

6593–6597.

3. Bernstein, E., Caudy, A. A., Hammond, S. M., and Hannon, G. J. (2001). Role for a bidentate

ribonuclease in the initiation step of RNA interference. Nature 409, 363–366.

4. Borchert, G. M., Lanier, W., and Davidson, B. L. (2006). RNA polymerase III transcribes

human microRNAs. Nat Struct Mol Biol 13, 1097–1101.

5. Denli, A. M., Tops, B. B., Plasterk, R. H., Ketting, R. F., and Hannon, G. J. (2004). Processing

of primary microRNAs by the microprocessor complex. Nature 432, 231–235.

6. Gregory, R. I., Chendrimada, T. P., Cooch, N., and Shiekhattar, R. (2005). Human RISC couples microRNA biogenesis and posttranscriptional gene silencing. Cell 123, 631–640.

7. Gregory, R. I., Yan, K. P., Amuthan, G., Chendrimada, T., Doratotaj, B., Cooch, N., and

Shiekhattar, R. (2004). The microprocessor complex mediates the genesis of microRNAs.

Nature 432, 235–240.

8. Griffiths-Jones, S., Grocock, R. J., van Dongen, S., Bateman, A., and Enright, A. J. (2006). miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 34, D140–144.

9. Han, J., Lee, Y., Yeom, K. H., Nam, J. W., Heo, I., Rhee, J. K., Sohn, S. Y., Cho, Y., Zhang, B. T.,

and Kim, V. N. (2006). Molecular basis for the recognition of primary microRNAs by the

Drosha-DGCR8 complex. Cell 125, 887–901.

10. Huang, T. H., Fan, B., Rothschild, M. F., Hu, Z. L., Li, K., and Zhao, S. H. (2007). MiRFinder: an improved approach and software implementation for genome-wide fast microRNA precursor scans. BMC Bioinformatics 8, 341.

11. Jiang, P., Wu, H., Wang, W., Ma, W., Sun, X., and Lu, Z. (2007). MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features. Nucleic Acids Res 35, W339–344.

12. Kim, V. N., and Nam, J. W. (2006). Genomics of microRNA. Trends Genet 22, 165–173.

13. Krol, J., Sobczak, K., Wilczynska, U., Drath, M., Jasinska, A., Kaczynska, D., and Krzyzosiak, W. J. (2004). Structural features of microRNA (miRNA) precursors and their relevance to miRNA biogenesis and small interfering RNA/short hairpin RNA design. J Biol Chem 279, 42230–42239.

14. Krol, J., Starega-Roslan, J., Milanowska, K., Nowak, D., Kubiaczyk, E., Nowak, M., Majorek, K., Kaminska, K., and Krzyzosiak, W. J. (2006). Structural Features of microRNAs and Their Precursors. In microRNA: Biology, Function & Expression, N. Clarke, and P. Sanseau, eds. (DNA Press), Eagleville, PA, pp. 95–110.

15. Lai, E. C., Tomancak, P., Williams, R. W., and Rubin, G. M. (2003). Computational identification of Drosophila microRNA genes. Genome Biol 4, R42.

16. Landgraf, P., Rusu, M., Sheridan, R., Sewer, A., Iovino, N., Aravin, A., Pfeffer, S., Rice, A.,

Kamphorst, A. O., Landthaler, M., et al. (2007). A mammalian microRNA expression atlas

based on small RNA library sequencing. Cell 129, 1401–1414.

17. Landthaler, M., Yalcin, A., and Tuschl, T. (2004). The human DiGeorge syndrome critical

region gene 8 and its D. melanogaster homolog are required for miRNA biogenesis. Curr Biol

14, 2162–2167.

18. Lee, Y., Ahn, C., Han, J., Choi, H., Kim, J., Yim, J., Lee, J., Provost, P., Radmark, O., Kim, S.,

and Kim, V. N. (2003). The nuclear RNase III Drosha initiates microRNA processing. Nature

425, 415–419.

19. Lee, Y., Kim, M., Han, J., Yeom, K. H., Lee, S., Baek, S. H., and Kim, V. N. (2004).

MicroRNA genes are transcribed by RNA polymerase II. EMBO J 23, 4051–4060.

20. Lewis, B. P., Burge, C. B., and Bartel, D. P. (2005). Conserved seed pairing, often flanked by

adenosines, indicates that thousands of human genes are microRNA targets. Cell 120,

15–20.

21. Lim, L. P., Glasner, M. E., Yekta, S., Burge, C. B., and Bartel, D. P. (2003). Vertebrate microRNA genes. Science 299, 1540.

22. Lund, E., Guttinger, S., Calado, A., Dahlberg, J. E., and Kutay, U. (2004). Nuclear export of

microRNA precursors. Science 303, 95–98.

23. Macrae, I. J., Zhou, K., and Doudna, J. A. (2007). Structural determinants of RNA recognition

and cleavage by Dicer. Nat Struct Mol Biol 14, 934–940.

24. Macrae, I. J., Zhou, K., Li, F., Repic, A., Brooks, A. N., Cande, W. Z., Adams, P. D., and

Doudna, J. A. (2006). Structural basis for double-stranded RNA processing by Dicer. Science

311, 195–198.

25. Pillai, R. S., Bhattacharyya, S. N., Artus, C. G., Zoller, T., Cougot, N., Basyuk, E., Bertrand, E.,

and Filipowicz, W. (2005). Inhibition of translational initiation by Let-7 MicroRNA in human

cells. Science 309, 1573–1576.

26. Provost, P., Dishart, D., Doucet, J., Frendewey, D., Samuelsson, B., and Radmark, O. (2002).

Ribonuclease activity and RNA binding of recombinant human Dicer. EMBO J 21,

5864–5874.

27. Ritchie, W., Legendre, M., and Gautheret, D. (2007). RNA stem-loops: to be or not to be

cleaved by RNAse III. RNA 13, 457–462.

28. Saetrom, P., Snove, O., Nedland, M., Grunfeld, T. B., Lin, Y., Bass, M. B., and Canon, J. R.

(2006). Conserved microRNA characteristics in mammals. Oligonucleotides 16, 115–144.

29. Sewer, A., Paul, N., Landgraf, P., Aravin, A., Pfeffer, S., Brownstein, M. J., Tuschl, T., van

Nimwegen, E., and Zavolan, M. (2005). Identification of clustered microRNAs using an ab initio

prediction method. BMC Bioinformatics 6, 267.

30. Sheng, Y., Engstrom, P. G., and Lenhard, B. (2007). Mammalian MicroRNA prediction

through a support vector machine model of sequence and structure. PLoS ONE 2, e946.

31. Wang, X., Zhang, J., Li, F., Gu, J., He, T., Zhang, X., and Li, Y. (2005). MicroRNA identification based on sequence and structure alignment. Bioinformatics 21, 3610–3614.

32. Xue, C., Li, F., He, T., Liu, G. P., Li, Y., and Zhang, X. (2005). Classification of real and

pseudo microRNA precursors using local structure-sequence features and support vector

machine. BMC Bioinformatics 6, 310.

33. Yekta, S., Shih, I. H., and Bartel, D. P. (2004). MicroRNA-directed cleavage of HOXB8

mRNA. Science 304, 594–596.

34. Zhang, H., Kolb, F. A., Brondani, V., Billy, E., and Filipowicz, W. (2002). Human Dicer

preferentially cleaves dsRNAs at their termini without a requirement for ATP. EMBO J 21,

5875–5885.

35. Zhang, H., Kolb, F. A., Jaskiewicz, L., Westhof, E., and Filipowicz, W. (2004). Single processing

center models for human Dicer and bacterial RNase III. Cell 118, 57–68.

36. Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction.

Nucleic Acids Res 31, 3406–3415.