the rna base-pairing problem and base- pairing solutions

19
The RNA Base-Pairing Problem and Base- Pairing Solutions Zhipeng Lu and Howard Y. Chang Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305 Correspondence: [email protected] SUMMARY RNA molecules are folded into structures and complexes to perform a wide variety of functions. Determination of RNA structures and their interactions is a fundamental problem in RNA biology. Most RNA molecules in living cells are large and dynamic, posing unique challenges to structure analysis. Here we review progress in RNA structure analysis, focusing on methods that use the cross-link, proximally ligate, and sequenceprinciple for high-throughput detec- tion of base-pairing interactions in living cells. Beginning with a comparison of commonly used methods in structure determination and a brief historical account of psoralen cross-linking studies, we highlight the important features of cross-linking methods and new biological in- sights into RNA structures and interactions from recent studies. Further improvement of these cross-linking methods and application to previously intractable problems will shed new light on the mechanisms of the modern RNA world.Outline 1 The everlasting RNA structure problem and ever-evolving solutions 2 Historical use of psoralen to analyze RNA structures and interactions 3 The next generation: Cross-linking with proximity ligation and high-throughput sequencing 4 Basic analysis of RNA structure cross-linking data 5 Prevalent long-range structures in the transcriptome 6 Conservation analysis of the RNA structures 7 Dynamic and complex structures: Catching RNA in action 8 Architecture of the XIST RNP: Modular structures for modular functions 9 Novel RNARNA interactions: The molecular social network 10 Limitations of the psoralen cross-linking methods and future directions 11 Integrating methods to solve complex RNA structures and interactions References Editors: Thomas R. Cech, Joan A. Steitz, and John F. Atkins Additional Perspectives on RNA Worlds available at www.cshperspectives.org Copyright # 2018 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a034926 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 1 on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/ Downloaded from

Upload: others

Post on 21-Apr-2022

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The RNA Base-Pairing Problem and Base- Pairing Solutions

The RNA Base-Pairing Problem and Base-Pairing Solutions

Zhipeng Lu and Howard Y. Chang

Center for Personal Dynamic Regulomes, Stanford University, Stanford, California 94305

Correspondence: [email protected]

SUMMARY

RNAmolecules are folded into structures and complexes to perform awide variety of functions.Determination of RNA structures and their interactions is a fundamental problem in RNAbiology. Most RNA molecules in living cells are large and dynamic, posing unique challengesto structure analysis. Here we review progress in RNA structure analysis, focusing on methodsthat use the “cross-link, proximally ligate, and sequence” principle for high-throughput detec-tion of base-pairing interactions in living cells. Beginning with a comparison of commonlyusedmethods in structure determination and a brief historical account of psoralen cross-linkingstudies, we highlight the important features of cross-linking methods and new biological in-sights into RNA structures and interactions from recent studies. Further improvement of thesecross-linking methods and application to previously intractable problems will shed new lighton the mechanisms of the “modern RNAworld.”

Outline

1 The everlasting RNA structure problem andever-evolving solutions

2 Historical use of psoralen to analyze RNAstructures and interactions

3 The next generation: Cross-linkingwith proximity ligation andhigh-throughput sequencing

4 Basic analysis of RNA structure cross-linkingdata

5 Prevalent long-range structuresin the transcriptome

6 Conservation analysis of the RNA structures

7 Dynamic and complex structures: CatchingRNA in action

8 Architecture of the XIST RNP: Modularstructures for modular functions

9 Novel RNA–RNA interactions: The molecularsocial network

10 Limitations of the psoralen cross-linkingmethods and future directions

11 Integrating methods to solve complex RNAstructures and interactions

References

Editors: Thomas R. Cech, Joan A. Steitz, and John F. AtkinsAdditional Perspectives on RNAWorlds available at www.cshperspectives.org

Copyright # 2018 Cold Spring Harbor Laboratory Press; all rights reserved; doi: 10.1101/cshperspect.a034926Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 1

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 2: The RNA Base-Pairing Problem and Base- Pairing Solutions

1 THE EVERLASTING RNA STRUCTURE PROBLEMAND EVER-EVOLVING SOLUTIONS

Ever since the discovery of theDNA structure, specific base-pairing has been seen as a perfect mechanism of heredity,as Watson and Crick postulated “a possible copying mech-anism for the genetic material” (Watson and Crick 1953).Although DNA structures are relatively limited in variety,RNA structures are more diverse, and their essential roleshave been repeatedly discovered and shown in manyfundamental RNA-centric processes that make up life onEarth. As the driving force in the formation of helices, RNAbase-pairing underlies both intramolecular structures andintermolecular interactions. In the early 1950s and 1960s, itwas shown that both transfer of information from DNA toRNA (transcription) and fromRNA to protein (translation)use the same sequence complementarity mechanism. In thepast three decades, structural analysis showed that it isthe complex and dynamic structures in the spliceosomeand the ribosome, rather than the protein components,that catalyze chemical reactions (Cech and Steitz 2014).RNA–RNA interactions also form the molecular basisof small RNA (sRNA)-mediated gene regulation in eukary-otes (small interfering RNAs [siRNAs] and microRNAs

[miRNAs]) and bacteria (sRNAs). These classical studieshave largely dispelled the long-held notion of RNA aspassive carriers of genetic information and put RNA-basedregulation at center stage of molecular biology.

A large variety of methods that use fundamentallydifferent principles have been developed to study RNAstructures (Table 1). X-ray crystallography provided the firstatomic resolution picture of an RNA molecule, when thefirst transfer RNA (tRNA) crystal structure was solved inthe 1970s (Kim et al. 1973; Robertus et al. 1974). Since then,X-ray crystallography has been the dominating tool instudying both protein and RNA structures (Shi 2014). Nu-clear magnetic resonance (NMR) probes the conformationof molecules in solution, providing an alternative meansto analyze RNA, especially flexible ones (Bothe et al.2011). Cryogenic electron microscopy (cryo-EM), on theother hand, examines single frozen-hydrated particles intheir native state, and therefore can be applied to largerand more dynamic molecules and assemblies. With newtechnological innovations in electron detection and imageprocessing, the resolution of cryo-EM has dramatically im-proved in the past few years and in many cases approachesthat of X-ray crystallography (Bai et al. 2015). Some ofthe most spectacular advances using cryo-EM include the

Table 1. Major categories of methods for the analysis of RNA structures and interactions

Categories Variety of methods Features General reference(s)

X-ray crystallography In vitro, atomic resolution, smaller, andnearly static molecules

Shi 2014

NMR In vitro, atomic resolution, smaller molecules,can be either static or flexible

Bothe et al. 2011

Cryo-EM In vitro, near atomic resolution, large, andheterogeneous complexes

Bai et al. 2015

Thermodynamic methods Nussinov algorithm, Zukeralgorithm, etc. RNA structure,Sfold, Mfold, UNAfold,ViennaRNA Package, etc.

Guarantees theoretically optimal structures,slow when permitting pseudoknottedstructures

Eddy 2004; Mathews et al. 2010

Comparative sequenceanalysis

Fold and Align, Fold then Align,Align then Fold, etc.

Implies functional relevance, limited byalignments and efficiency (local structures)

Mathews et al. 2010

Chemical probing DMS-seq, Structure-seq, SHAPE,icSHAPE, Mutate and Map, etc.

In vitro and in vivo, one-dimension averagedreactivity/flexibility/accessibility profile,can be used to probe secondary, tertiarystructures and RNA–protein interactions

Kladwang et al. 2011; Ding et al.2014; Rouskin et al. 2014; Spitaleet al. 2015

Enzymatic probing PARS, Frag-seq, etc. In vitro, one-dimension averaged reactivity/flexibility/accessibility profile

Kertesz et al. 2010; Underwood et al.2010

Cross-linking (based onphysical proximity)

PARIS, LIGR-seq, SPLASH,CLASH, hiCLIP, MARIO

In vitro and in vivo, direct physical base-pairing contacts (exceptMARIO), capturesalternative conformations

Helwak et al. 2013; Sugimoto et al.2015; Aw et al. 2016; Lu andChang 2016; Nguyen et al. 2016;Sharma et al. 2016

NMR, Nuclear magnetic resonance, Cryo-EM, cryogenic electron microscopy; DMS, dimethyl sulfate; SHAPE, selective 2′-hydroxyl acylation by primerextension; icSHAPE, in vivo click SHAPE; PARS, parallel analysis of RNA structures; Frag-seq, fragmentation sequencing; PARIS, psoralen analysis of RNAinteractions and structures; LIGR-seq, ligation of interacting RNA and high-throughput sequencing; SPLASH, sequencing of psoralen cross-linked, ligated, andselected hybrids; CLASH, cross-linking, ligation, and sequencing of hybrids; hiCLIP, RNA hybrid and individual nucleotide resolution cross-linking andimmunoprecipitation; MARIO, mapping RNA interactome in vivo.

Z. Lu and H.Y. Chang

2 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 3: The RNA Base-Pairing Problem and Base- Pairing Solutions

structures of spliceosomes corresponding to several inter-mediary steps in splicing, each containing multiple smallnuclear RNAs (snRNAs) and proteins (Kastner et al. 2019;Plaschka et al. 2019; Yan et al. 2019). Messenger RNAs(mRNAs) and long noncoding (lnc)RNAs, which makeup the majority of distinct RNA species, are generallymuch more flexible, with well-folded regions interspersedwith amorphous linker regions, and folded together with alarge collection of nonstoichiometric RNA partners, RNA-binding proteins (RBPs), as well as indirect binders. Al-though it is likely that further technological innovationswill greatly enhance the power of X-ray crystallography,NMR, and cryo-EM, currently the vast majority of RNAmolecules are beyond the reach of these in vitro methods.

In parallel with the in vitro experimental methods, twoclasses of computational algorithmswere developed to addressthe RNA-folding problem, using thermodynamics and com-parative sequence analysis, respectively (Table 1) (Fallmannet al. 2017). As RNAmolecules are assumed tomost likely foldinto lowest free energy states, stable structures can be predictedbased on free energy terms for each unit of stacked base pairsand loops, also called the Turner energy rules (Mathews et al.2010). Given the time and space complexity of these algo-rithms, restrictions are commonlyapplied, such as prohibitingthe formation of pseudoknots and long-range structures. Inaddition, our understanding of the basic energy rules in RNAstructure formation, as well as the contributions from thecellular environment, is limited. Together, these issues haveresulted in structure models that are “elegant and too oftenwrong” (Eddy 2004; Mathews et al. 2010).

RNA structures that have biological functions are underevolutionary pressure to preserve specific secondary or ter-tiary interactions, often in the formof invariant or covariantbase pairs (Rivas and Eddy 2001). Therefore, another ap-proach to determine RNA structures is by identifying thesesignatures of selection in multiple sequence alignments(Table 1). Comparative sequence analysis is not restrictedby our limited understanding of the thermodynamic rulesand, therefore, can reveal both standard base pairs andtertiary interactions. By the very nature of this approach,the structures derived from sequence comparison are prob-ably functional. However, comparative sequence analysisis only applicable to high-quality alignments with certainsequence variation levels, and often restricted to small se-quence windows in the interest of efficiency.

RNA folding brings nucleotides into specific chemicalenvironments that change their reactivity properties (orsolvent accessibility, flexibility). Therefore, reading out thereactivity of individual nucleotides reveals informationabout the specific folding. Based on this principle, a largenumber of methods have been developed in the past severaldecades, using various chemicals and enzymes to probe

nucleotide reactivity (Table 1) (Ehresmann et al. 1987; LuandChang 2016). The chemicals and enzymes differentiallyreact with the base, sugar, or the phosphodiester backbone,either in vitro or in vivo, resulting in modifications that canbe read out as reverse transcription stops in gel electropho-resis or high-throughput sequencing. Reactivity profilesobtained from such experiments correlatewith base-pairingstatus and can be used as constraints in secondary structuremodeling, often as additional pseudoenergy terms. Probingdata have been shown to improve the accuracy of structuremodeling to some extent yet are still limited to simple struc-tures. These experiments provide averaged signals for eachRNA species, which does not reflect the dynamic nature ofRNA in living cells (Lu and Chang 2016). Many of theinherent problems in RNA structure prediction persisteven with the addition of chemical probing data. Statisti-cally correlated modification patterns can be obtained fromreverse transcriptase-induced mutations but are very low inefficiency and limited in length (Siegfried et al. 2014). Amutate-and-map strategy was recently developed to deriveinformation about base pairs but is only applicable to oneRNA at a time in vitro (Kladwang et al. 2011).

Contrary to the conventional chemical and enzymaticprobing experiments, cross-linking-based methods, whichare the topic of the current review, aim to directly identifythe base-pairing regions and limit the “guesswork” of struc-ture modeling to very short sequences that typically foldinto a single possible duplex (Table 1). Originally developedin the 1970s, photoactivated psoralen cross-linking has pro-vided mechanistic insights into many important and chal-lenging problems in RNA biology. Recently, new technicalimprovements have led to a revival of this class of methods,bringing superior sensitivity, resolution, and throughput.Starting from a historical perspective, in this review wewill discuss the experimental and analytical aspects of sev-eral psoralen cross-linking techniques that use the sameprinciple of “cross-link, proximally ligate, and sequence.”Current applications of psoralen cross-linking have resultedin the surprising discoveries of long range, alternative, anddynamic structures, newRNA–RNA interactions, and prin-ciples in the high-level organization and networks of thetranscriptome. As no tool is perfect for any task, it is essen-tial to integrate cross-linking methods with other types ofstructure determination techniques. We will discuss thelimitations of current implementations and how to pushthe technologies to the next level.

2 HISTORICAL USE OF PSORALEN TO ANALYZERNA STRUCTURES AND INTERACTIONS

Psoralen is a group of planar tricyclic compounds thatcan intercalate in double-stranded nucleic acids (Fig. 1A)

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 3

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 4: The RNA Base-Pairing Problem and Base- Pairing Solutions

A

AMT254 nm

365 nmB

5S rRNA5′

F

Rabin and Crothers 1979

U2

Intron

U6

U4

U2

U6

U1

pre-mRNA

5′5′ 5′

3′5′

5′

Hausner et al. 1990Wassarman and Steitz 1993

Calvet and Pederson 1981Calvet et al. 1982Mount et al. 1983

I U4U5

U6

U2U1 U2U6U4

U5

U1

U2U6U5

U2U6U5

U2U6U5

U2U6

U5U4

U1

Complex A Complex B Complex B* Complex C

Exon Intron Exon

E. coli 16S,Thompson andHearst 1983

G

Hyb. probes 1 2 1 2

RNA1RNA2

RNA1 x RNA2

Mig

ratio

n

D

1D: n

ativ

e

2D: d

enat

ured

Cross-linkedfragments

E

C

S821STIS81STE ITS25.8SH

E3 (SNORA63)

Rimoldi et al. 1993

U3

Calvet andPederson 1981Maser and Calvet 1989Stroke and Weiner 1989Tyc and Steitz 1992Beltrame and Tollervey 1992Hartshorne 1998

E1 (U17a)

Rimoldi et al. 1993

E2 (SNORA62)

Rimoldi et al. 1993

snR30

Morrissey and Tollervey 1993

Fayet-Lebaron et al. 2009

O O

NH2

O

Calvet et al. 1982 Rinke et al. 1985

Cross-linking – – + +

Figure 1. Psoralen chemistry and historical applications. (A) Structure of the commonly used psoralen AMT(4′-aminomethyltrioxsalen). Arrows point to the double bonds that undergo cycloaddition to pyrimidines.(B) Model of psoralen intercalation into nucleic acid helices and reversible cross-linking. (C) Detection of psoralencross-linked RNA structures by electron microscopy. (Image courtesy of Paul L. Wollenzien et al.; reprinted, withpermission, fromWollenzien et al. 1978, ©National Academy of Sciences.) (D) Detection of RNA–RNA interactionsusing gel electrophoresis and northern blotting. Cross-linked RNA pairs can be identified by the simultaneoushybridization of probes to both RNAs. (E) The “native-denatured” 2D gel method for selecting psoralen cross-linkedRNA. The first-dimension gel is native. In the second dimension, the first-dimension gel slices are embedded in aurea-denatured gel and run at high temperature, in which the cross-linked fragments assume an “X” shape andmigratemuchmore slowly, above the diagonal, forming scattered dots for simple RNA species or a smear for complexRNA mixtures, such as total RNA. (Adapted from Lu et al. 2016, with permission, from Elsevier.) (F) Secondarystructure of the Escherichia coli 5S ribosomal RNA (rRNA), based on psoralen cross-linking and enzymatic sequenc-ing (Rabin and Crothers 1979). (G) Psoralen cross-linked duplexes (blue blocks and the blue dashed lines) super-imposed on the Escherichia coli 16S rRNA model derived from crystal structure (Petrov et al. 2014). (Redrawn, withpermission, from Thompson and Hearst 1983.) (H ) Summary of small nucleolar RNA (snoRNA)–rRNA interac-tions based on psoralen cross-linking. ETS/ITS: external/internal transcribed sequence. Arrows point to cross-linking sites. (I) Summary of several RNA–RNA interactions in precursor messenger RNA (pre-mRNA) splicingbased on psoralen cross-linking. (The splicingmodel was redrawn, with permission, fromWill and Luhrmann 2011.)

4 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 5: The RNA Base-Pairing Problem and Base- Pairing Solutions

(Hearst 1981). The extensively conjugated system under-goes cycloaddition with pyrimidines upon long-wavelengthultraviolet (UV) irradiation, forming monoadducts andthen diadducts, or interstrand cross-links (Fig. 1B). Short-wavelength UV reverses the cross-link, permitting analysisof cross-linked RNA using various methods. Cross-linkingpresents definitive and specific evidence for base-pairingand has provided the firstmechanistic insights into a varietyof noncoding (nc)RNAs including the ribosomal RNAs(rRNAs), spliceosomal snRNAs, and small nucleolar andsmall Cajal body RNAs (sno/scaRNAs). Hearst and col-leagues have extensively discussed the chemical propertiesand early applications of psoralen on nucleic acid structures(Hearst 1981; Cimino et al. 1985). Here we briefly reviewsome of the classical studies that paved the way for recentbreakthroughs in throughput, sensitivity, and resolution ofcross-linking-based methods.

Initially, electron microscopy (EM) was used to detectpsoralen cross-linking sites in denatured and psoralencross-linked nucleic acids (Fig. 1C) (Cech and Pardue1976; Wollenzien et al. 1978). EM is limited in resolutionto dozens of base pairs and only shows the overall contourof the secondary structure. To analyze RNA–RNA interac-tions, denatured gel electrophoresis was used to resolvepsoralen cross-linked samples, in which cross-linkedRNAs were identified as slower bands by northern blotting(Fig. 1D) (Calvet and Pederson 1981; Calvet et al. 1982).Reversal of cross-linking further confirms the interaction.This setup works well for interactions involving sRNAsin which the partners can be exhaustively tested. To analyzecomplex interactions and structures, Brimacombe andcolleagues developed the 2D gel method (Fig. 1E) (Zwieband Brimacombe 1980). The 2D gel uses the differences inmigration of the cross-linked RNA fragments in denaturedgel versus native gel. After the first-dimension native gel,cross-linked fragments assume an “X” shape in the dena-tured gel and migrate slower than non-cross-linked frag-ments. The retarded fragments can be isolated from abovethe diagonal for sequencing. Together, these three methodswere extensively used in the early studies of RNA structuresand interactions.

The ribosome is a large RNA–protein complex thatmakes proteins from genetic information stored in mRNA.Several groups used psoralen cross-linking to study thefolding of the rRNAs before crystallization was possible.Rabin and Crothers applied psoralen cross-linking to the5S rRNA folded in solution and showed the existenceof a terminal stem (Fig. 1F) (Rabin and Crothers 1979).Thompson and Hearst combined the in-solution psoralencross-linking, 2D gel purification, photoreversal, and enzy-matic sequencing to discover 13 base-paired regions in theEscherichia coli 16S rRNA (Fig. 1G) (Thompson andHearst

1983). Although most of the duplexes from cross-linkingare consistent with the evolutionary model built by Nollerand Woese and the crystal structure model (Noller andWoese 1981; Petrov et al. 2014), a few of them could beartifacts of in vitro folding (for review, see Sergiev et al.2001 and Whirl-Carrillo et al. 2002). Nevertheless, theseheroic efforts showed the possibility of using psoralen tostudy complex RNAs.

The rRNAs are transcribed as a polycistronic precursor,then extensively processed and modified into the maturesubunits. Psoralen cross-linking elucidated several criticalsteps in rRNA biogenesis and revealed functions of anabundant class of sRNAs, snoRNAs (Fig. 1H). Calvet andPederson showed that the U3 snoRNA could be cross-linked to the large rRNAs in vivo, providing the first directevidence of snoRNAs involvement in rRNA processing(Calvet and Pederson 1981). This interaction is requiredfor the cleavage of the rRNAs to release the mature 18SrRNA (Kass et al. 1990). Later studies have continued torefine the cross-linking and mapping methods to identifythe precise binding sites on rRNAs for U3, E1 (U17a),E2 (SNORA62), E3 (SNORA63), snR30, etc. (Calvet andPederson 1981; Maser and Calvet 1989; Stroke and Weiner1989; Beltrame and Tollervey 1992; Tyc and Steitz 1992;Morrissey and Tollervey 1993; Rimoldi et al. 1993; Hart-shorne 1998; Fayet-Lebaron et al. 2009). Together with ge-netic experiments that specifically edit the identifiedbinding sites, these studies showed the essential roles ofsnoRNAs in rRNA processing.

The psoralen cross-linking methods made critical con-tributions in elucidating the mechanisms of eukaryoticmRNA splicing (Fig. 1I). Steitz and colleagues hypothesizedthat the U-rich snRNAs are involved in the removalof introns from eukaryotic heterogeneous nuclear RNAs(hnRNAs; precursor messenger RNAs [pre-mRNAs])based on sequence complementarity of U1with the 5′ splicesite (Lerner et al. 1980). Shortly after that, Calvet andPederson showed that U1 andU2 small nuclear ribonucleo-protein complexes (snRNPs) can be reversibly cross-linkedto pre-mRNAs, providing the first physical evidence for thedirect interactions (Calvet and Pederson 1981; Calvet et al.1982). Since then, a series of studies from several groupspinpointed the interaction sites between pre-mRNAs andsnRNAs, revealing a complex and dynamic ribonucleo-protein complex (RNP)machine (Mount et al. 1983; Rinkeet al. 1985; Hausner et al. 1990; Wassarman and Steitz1993). The sequential formation and reorganization ofdistinct RNA–RNA interactions ensure efficient and pre-cise removal of introns (Fig. 1I). The splicing mechanismsrevealed by psoralen cross-linking have since been validat-ed and refined by genetic analysis, X-ray crystallography,and cryo-EM (Will and Luhrmann 2011).

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 5

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 6: The RNA Base-Pairing Problem and Base- Pairing Solutions

3 THE NEXT GENERATION: CROSS-LINKINGWITH PROXIMITY LIGATION ANDHIGH-THROUGHPUT SEQUENCING

Traditional methods for the analysis of cross-linked RNA,such as EM, 1D and 2D gels, and enzymatic sequencing, arelaborious and inefficient (Fig. 1C,D). As a result, applica-tions were limited to highly abundant RNAs one at a time,like the rRNAs, snRNAs, and snoRNAs (Fig. 1). Severalrecent technological advancements have transformed classi-cal psoralen cross-linking into a powerful transcriptome-scale discovery tool (Fig. 2A, left side). In the new generationof methods, including psoralen analysis of RNA interactionsand structures (PARIS), sequencing of psoralen cross-linked, ligated, and selected hybrids (SPLASH), and ligationof interacting RNA and high-throughput sequencing(LIGR-seq), cells are treated with psoralens and UV irradi-ation; then cross-linked RNA fragments are enriched andproximally ligated so that each RNAduplex can be uniquelyidentified via high-throughput sequencing (Aw et al. 2016;Lu et al. 2016; Sharma et al. 2016; Lu et al. 2018). Thisstrategy determines RNA structure and interactions withnear base-pair resolution and single molecule accuracy atthe transcriptome level. Three related methods that use asimilar strategy, butwithout psoralen cross-linking,were alsoreported recently (Fig. 2A, right side) (Kudla et al. 2011;Helwak et al. 2013; Sugimoto et al. 2015; Nguyen et al.2016). Here these methods are compared side by side toillustrate their similarities and differences.

Several major differences exist among the psoralencross-linkingmethods (Fig. 2B). First, biotinylated psoralenwas used in SPLASH instead of the more commonly used4′-aminomethyltrioxsalen (AMT), thus enabling conve-nient isolation of RNA fragments covalently bound to pso-ralen, either withmonoadducts (cycloaddition on one side)or cross-links. The bigger size of the bio-psoralen moleculealso reduces efficiency in the intercalation of RNAduplexes.The concentrations of psoralens used vary considerably.Conventional chemical probing usually selects chemicalconcentrations to aim for “single-hit kinetics” to avoid theinduction of structural changes during probing (Spitaleet al. 2015). However, high concentrations of psoralenmay not be an issue because cross-linking fixes structuresto prevent further changes.

Given the low efficiency of cross-linking, several differ-ent strategies were used to enrich cross-linked fragmentsfrom total RNA. The 2D gel technique used in PARIS,although laborious, enriches cross-linked duplexes upto 100% in purity. In LIGR-seq, an exoribonuclease,RNase R, was used to remove nonlinked single-strandedfragments. This approach is easier to perform but doesnot efficiently remove non-cross-linked fragments, espe-

cially highly structured ones. In SPLASH, purification ofbio-psoralen-reacted fragments is simple with the strepta-vidin beads, but both monoadduct and cross-linked RNAfragments are extracted. The three methods select RNA frag-ments at different sizes, affecting both proximity ligationefficiency and structure model resolution. Longer fragmentspromote ligation but also result in ambiguous base-pairingmodels.

Short-wavelength UV and bifunctional chemical cross-linkers were used in cross-linking, ligation, and sequencingof hybrids (CLASH), RNA hybrid and individual nucleo-tide resolution cross-linking and immunoprecipitation(hiCLIP), andmapping RNA interactome in vivo (MARIO;Table 1, Fig. 2). In CLASH and hiCLIP, UV–cross-linkedprotein–RNA complexes were selected using antibodies;therefore, only the targets of a single RBP are identified.These methods achieve deeper coverage of specific interac-tions given the limited scope. MARIO fixes both specific(from UV cross-linking) and nonspecific (from chemicalcross-linking) interactions in physical proximity, and there-fore the identified RNA fragment pairs are not necessarilybase-paired. The detection of all protein-associated inter-actions is possible with MARIO because all RNA–proteincomplexes are recovered from cell lysate. However, the largesize of the RNA fragments in MARIO precludes accuratestructure modeling (Fig. 2B).

A commonly used strategy in the analysis of chromatinconformations, proximity ligation joins two restriction-digested DNA fragments using DNA ligases to enable theidentification of their physical proximity in 3D space in thenucleus (Dekker et al. 2002). Proximity ligation in RNAwasfirst noticed in the analysis of RNA–protein cross-linkingstudies and then intentionally used to identify potentialRNA–RNA interactions (Kudla et al. 2011). In contrastto DNA, in which longer fragments and the sticky endsensure highly efficient ligation, the ends of shorter RNAduplexes are under steric constraint, resulting in much low-er ligation efficiency (Fig. 2B). The ligation efficiency hasnot improved much despite extensive optimizations. Thetypical T4 RNA ligase 1 (Rnl1) is used in most applications,whereas in LIGR-seq, a thermostable T4 Rnl1 homologCircLigase was used (Blondal et al. 2005). In hiCLIP andMARIO, a linker oligonucleotidewas used to bridge the twoends, but with no obvious improvement, except in MARIOin which much longer fragments are ligated.

The various technical approaches used in these methodsprovide flexible choices that can be used in a “mix-and-match” manner depending on the nature of the specificproblem to be solved. Users can choose psoralen derivatives,RNA fragmentationmethods, enrichment techniques, ligaseswith or without linkers, and antibodies or antisense oligosthat select specific subsets of RNAs. Together these methods

Z. Lu and H.Y. Chang

6 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 7: The RNA Base-Pairing Problem and Base- Pairing Solutions

~

AMTUV 365 nm in vivo

Proteinase and RNase(S1, ShortCut RNase III)

1D: n

ativ

e

2D: d

enat

ured

–AMT +AMT

PARIS

A Direct psoralen cross-linking Protein-mediated cross-linking

Proximityligation

SPLASH

~

Bio-psoralenUV 365 nm in vivo

Fragment withMg2+ and enrich

Proximityligation

LIGR-seq

~

AMTUV 365 nm in vivo

RNase (S1) andrRNA depletion

Denature andcircLigase(proximityligation)

RNase R(exonuclease)

UV 254 nm

Ligate adapter, RT,and sequencing

Structuremodel or

Ligate adapter, RT,and sequencing

Structuremodel

or

MARIOCLASH hiCLIP

Recover allproteins

UV 254 nmin vivo

RNase

UV 254 nmin vivo

RNase RNase

Proximityligation

Immuno-precipitate

Immuno-precipitate

Proximityligation with

adapter

B

B

PurifyRNA

PurifyRNA

PurifyRNA

UV 254 nmand chemical cross-linking

Proximityligation with

adapter

soralnm in

nt witenric

B

B

B

Methods PARIS LIGR-seq SPLASH CLASH hiCLIP MARIO

Scale Global Global Global Targeted Targeted Global

Cross-linker Psoralen AMT Psoralen AMT Bio-psoralen UV 254 nm UV 254 nm Aldehyde, glyoxal

Fragment S1, RNase III S1 Mg2+ (90–110 nt) RNase A/T1 RNase I RNase I (>500 nt)

Enrichment 2D gel Exonuclease Biotin-streptavidin Antibody + gel Antibody + gel Biotin on proteins

Ligation T4 Rnl1 CircLigase T4 Rnl1 + linker T4 Rnl1 T4 Rnl1 + linker T4 Rnl1 + linker

rRNA removal No Yes No No No No

Resolution Near bp Near bp Near bp Near bp Near bp Low resolution

Chimeric % 2.5%–6% < 0.4% 1.7%–5.9% < 1% ~ 2% 8%–30%

B Summary of comparison

Figure 2. Recently developed cross-linking-based methods for the direct detection of RNA interactions and struc-tures. Only the methods that use the “cross-link, proximally ligate, and sequence” principle are listed here.(A) Workflow of the methods. (B) Major differences among the methods. (Part of panel A is adapted from Luet al. 2016, with permission, from Elsevier.)

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 7

The RNA Base-Pairing Problem and Solutions

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 8: The RNA Base-Pairing Problem and Base- Pairing Solutions

have been applied tomultiple species and cell lines, providinga rich resource for future studies (Gong et al. 2018).

4 BASIC ANALYSIS OF RNA STRUCTURECROSS-LINKING DATA

The cross-linking (PARIS-like) experiments produce asmall percentage of noncontinuous reads as a result ofproximity ligation of base-paired duplexes (Fig. 3A,B).

The noncontinuous reads can be regularly gapped or chias-tic, depending on which side proximity ligation occurs (Fig.3A,B). Short-read mappers, like the spliced transcriptsalignment to a reference (STAR) software, are capable ofdirectly mapping both types of sequences with appropriateparameters (Dobin et al. 2013). Although the noncontinu-ous reads can be directly used, assembly of duplex groups(DGs) is a critical step that standardizes and simplifiessubsequent analysis of both structures and interactions

Rightarm

Leftarm

Gap/loop

Genome

DuplexGroup

Gapped read

Chiastic read

RNA duplex

A B

Rightarm

Leftarm

RNA 1 RNA 2

Conservedstructure

DG in one species

Multiple alignments

EStructure-based

DGin species X

DGin species Y

Conservedstructure

Lift species X tospecies Y

Direct comparison

F

Detect RNA structure Detect RNA–RNA interaction

D

Multiplealignments

...

...

Sliding window

G

DG1

DG2

(((((....)))))

(((((....)))))

.))))

(((((

Alternative/dynamic structures

Conformer 1Conformer 2

DG1

DG2

((((((........))))))

[[[[..........]]]]

((((((..[[[[..))))))..]]]]

Potential pseudoknots

H

C A typical gapped read and associated structure (human U4 snRNA)CGCAGUGGCAGUAUCGUAGCCAA-----CCGAGGCGCGAUUAUUGCUAAUUGA..(((((((((((((((.(((...........))))))).))))))).)))).

Figure 3. RNA structure and interaction analysis based on cross-linking data. (A) Detecting RNA structures. Dis-continuous sequencing reads, including “regularly” gapped and chiastic, are assembled into duplex groups (DGs).(B) Detecting RNA–RNA interaction. The two arms of discontinuous reads mapped to two RNA molecules areassembled into DGs. (C) An example of a discontinuous read from the U4 small nuclear RNA (snRNA). Pairs ofparentheses indicate base pairs. (D) Conventional sliding window screen for conserved structures from wholegenome alignments. Multiple genome alignments are split into windows with fixed width and step. Conservationand covariation are then scored in each window. (E,F) DGs guide comparative sequence analysis. (E) DGs from onespecies are used to extract two blocks of alignments separated by arbitrary distance for structure conservationanalysis. (F ) Structures can be directly compared with DGs from multiple species, using sequence alignments.(G) Alternative/dynamic RNA structures are identified based on the conflicts of the base-pairing models betweentwo DGs. (H ) Interlocked DGs suggest either alternative conformations or pseudoknot structures. (Panels E–G areadapted from Lu et al. 2016, with permission, from Elsevier.)

Z. Lu and H.Y. Chang

8 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 9: The RNA Base-Pairing Problem and Base- Pairing Solutions

(Lu et al. 2016, 2018). A DG is a group of reads from RNAmolecules in the same conformation, corresponding to onespecific RNA duplex. Many structure prediction programscan be used to build the base-pairing model from the DGs(Table 1). Given the small arm size and the specific psoralencross-linking of staggered uridines, a unique and unambig-uous structure model can be established (Fig. 3C). Thestructure models can be validated using various orthogonalmethods, including selective 2′-hydroxyl acylation by prim-er extension (SHAPE)-like chemical probing and conser-vation analysis (Lu et al. 2016). In addition, DGs enablefacile analysis of significance, assembly of complex struc-tures, including dynamic conformations, pseudoknots andhigh-level architectures, and analysis of RNA–protein in-teractions (Fig. 3C–H).

Whereas intramolecular structures are straightforward,intermolecular interactions are more difficult to determine,in part because of the complexity of the transcriptome,which contains homologous and repetitive sequences andfrequent errors in the gene annotation. Genome-mappedreads need to be extensively filtered to remove artifacts.Alternatively, carefully curated transcriptome annotationscan be used as reference to avoid spuriousmapping butmaymiss ones not in the annotation. The significance of de-tected structures has been determined using different ap-proaches. After DG assembly, a threshold was applied tothe ratio of the number of reads in the DG divided bysequencing depth at the two arms (Lu et al. 2016). Alterna-tively, the Blencowe group used a probabilistic model toassess the significance of the detected interactions basedon the observed and expected connections between thetwo arms (Sharma et al. 2016).

5 PREVALENT LONG-RANGE STRUCTURESIN THE TRANSCRIPTOME

One surprising observation in the psoralen cross-linkingand hiCLIP studies is the prevalence of long-range struc-tures (Fig. 4A). (Here we use “structure” to denote allintramolecular base-pairing interactions, and “interaction”to refer to only intermolecular interactions [Sugimotoet al. 2015; Aw et al. 2016; Lu et al. 2016; Sharma et al.2016].) Many structures span hundreds to thousands ofnucleotides, and in mRNAs, often span multiple exons.Computational predictions, including ones that incorporatechemical probing data, have focused on local structuresin the interest of efficiency. The resulting models shouldnow be viewed with caution in light of evidence showingprevalent long-range structures.

In both mRNAs and lncRNAs, structures that connectthe 5′ and 3′ ends have been detected. Wan and colleaguesshowed that there is a positive correlation between the

circularization score (tendency to form longer rangestructures) with translation efficiency for mRNAs (Fig. 4I)(Aw et al. 2016). In the case of mRNAs, the end-to-endconnections are reminiscent of the mRNA circularizationmechanism essential for translation, in which translationinitiation factors on the 5′ end interact with the poly(A)-binding protein on the 3′ end (Wells et al. 1998). mRNAcircularization can also be mediated by base-pairing, asshown in the p53 mRNA, and this long-range base-pairingbetween the 5′ and 3′ untranslated regions (UTRs) pro-motes translation (Chen and Kastan 2010). End-to-endbase-paired structures are frequently observed in virusesand play critical roles in virus replication and translation(Nicholson and White 2014). As RNA structures formduring 5′ to 3′ transcription, and are constantly reorganizedby splicing, translation, and other processes, it is importantto determine how long-range structures emerge given thatlocal contacts are more likely to form by chance. Moreinterestingly, how do long-range structures contributeto the various RNA metabolism events? Comprehensivedissection of each structure by mutagenesis is needed tostudy their functions.

6 CONSERVATION ANALYSIS OF THE RNASTRUCTURES

RNA structures form spontaneously as a result of thermo-dynamics; presence of RNA structures does not implyfunction. Base-pair covariation, however, is an importantindicator of evolutionary pressure on functional structures.Sifting through sequence alignments for conserved and co-varied base pairs is a classical strategy for identifying func-tional structures, and this can be applied to either individualRNAs or whole genomes (Washietl et al. 2005; Pedersenet al. 2006; Smith et al. 2013). In practice, the search isrestricted to small sliding windows across the genome be-cause of extraordinary computational requirements (Fig.3D). Significance can be calculated by shuffling in eachwindow (Gesell and von Haeseler 2006). However, thesemethods only examine local structures and lack experimen-tal validation. Given the stringency of these methods,conserved structures cannot be identified from sequenceslacking sufficient variation.

The direct determination of helices in the cross-linkingmethods enables targeted analysis of structure conservationwithout distance limit. Using the PARIS-determined du-plexes as guides, we developed two approaches to directlysearch for structure conservation (Lu et al. 2016). First, theDGs from one species were used to extract alignment blocksfrom multiple genomes, in which the two arms can comefrom distant regions (Fig. 3E). Second, when structure dataare available frommore than one species, direct comparison

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 9

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 10: The RNA Base-Pairing Problem and Base- Pairing Solutions

A

18S 28S5.8S

Peculis 1997

U8

Lu and Chang 2016conserved inmetazoans

DG

spa

n (n

t)

B

MousePARIS

HumanPARIS

HumanRPL8

Human-specific (21)Mouse-specific (23)

Common (23)p < 0.001 (1000 shuffles)

500 nt

PARISTUBB3′UTR

SelectedPARISDGs

C200 nt

2 3 1 5 6 7

4 8

*

*

*

*

*

*

*

*

*

SPEN

RNA–protein cross-link

One A repeat

Xist A repeat

1 2 3 4 5 6 7 8

GD

Knownstructure(3′→5′)

PARIScoverage(0-81)

PARISDGs

P1

P3

HumanTERC

Template/pseudoknotdomain

XIST Helix conservation (139, p < 0.01)F

EXIST PARIS all DGs (1386)

DGspan(bp)

1 2 3 4

42

DG span (bp)

RNADomains:

RNA Domains: 1

H J

Low circularization scoreHigh translation efficiency

High circularization scoreHigh translation efficiency

I

U8 snoRNA

CUGUCAGGUGGGAUAACCCUUCCCUGUUCCUC ||||| ||| ||| || |||||| AGUAGUCAUCCC-AUUUUGAUUGGACAGAGUG

28S rRNA

10

Genom

ic

Trans

cript

omic

100

1000

10000

1e+05

1e+06

P2

0

5000

10000

15000

0

5000

10000

15000

Membrane,antigen binding

Membranetransport

5

6

2

8

7

3

1

4

9

RNA binding

Transcription

Ion homeostasis,binding

Catabolic processes

Ribosome,translation

Cytoskeleton,nucleus

Cellular componentbiogenesis,membrane organization

Figure 4. New insights into the RNA structurome and interactome. (A) Long-range structures are prevalent in thetranscriptome. Duplex group (DG) spans were calculated from Henrietta Lacks cell line psoralen analysis of RNAinteractions and structures (HeLa PARIS) data on genomic and transcriptomic coordinates. (B) Example of con-served structures in RPL8 messenger RNA (mRNA) based on direct comparison of human (human embryonickidney cells 293 [HEK293]) and mouse (J1 embryonic stem cells) PARIS data. Significance of the overlap was testedusing random shuffling of DGs in the exons. (C) Example of alternative/dynamic structures in the 3′ untranslatedregion (UTR) of tubulin β class 1 (TUBB) mRNA fromHeLa PARIS data. The corresponding structure models (firsttrack) and DGs are color coded. (D) PARIS detects pseudoknot structure (interlocked DGs) in the telomerase RNAcomponent (TERC). (E) PARIS determines the architecture of X-inactive specific transcript (XIST). Each point inthe heat map shows the connection between the two regions indicated by the feet of the triangle. Fourmajor domainsare obvious from the clustered DGs. (Legend continued on following page.)

10 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

Z. Lu and H.Y. Chang

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 11: The RNA Base-Pairing Problem and Base- Pairing Solutions

of the DGs is performed on sequence alignments (Fig. 3F).The second approach is especially useful when variation istoo low or too high for statistical analysis of significance.Wefound that a substantial fraction of structures that spanmore than 200 nt are conserved. These analyses alsoshowed that overall architectures are conserved for bothcoding and noncoding long RNAs (Fig. 4B). On the otherhand, structure data may also be helpful in correcting se-quence-based alignments that often do not consider struc-ture information. This approach will be especially useful fordistantly related sequences that cannot be aligned properlyby sequence alone.

Given the ubiquitous presence of RNA structures, theycan be quickly adapted for specific functions without show-ing obvious signs of conservation. As a result, such func-tional structures would have been missed by conventionalmethods that rely on covariation. Bartel and colleaguesshowed that random structures in the 3′ UTR are requiredfor mRNA stability (Wu and Bartel 2017). This study high-lights the limits of conventional comparative sequenceanalysis and the need for direct determination of structures.

7 DYNAMIC AND COMPLEX STRUCTURES:CATCHING RNA IN ACTION

Original work in the 1950s byAnfinsen showed that proteinstructures aremostly determined by primary sequence (An-finsen 1973). Dynamics in protein structures occurs locally,usually without breaking secondary structures. However,RNA folding is more promiscuous given the simpler alpha-bet and base-pairing rules. In addition to local structuredynamics, global rearrangements of structures are frequentin RNA and essential for their activities (Dethoff et al.2012). As described earlier in this review, dramatic re-arrangement of structures and interactions occurs multipletimes during pre-mRNA splicing (Fig. 1H). Conventionalchemical probing, although applicable to structure deter-mination in vivo, results in averaged reactivity profiles thatare hard to deconvolve into the individual conformations.In contrast, psoralen cross-linking and proximity ligation

directly capture each conformation in a single sequencingread, thus providing a direct readout of all the “cross-link-able” conformations, like taking snapshots of the RNA inaction.

Dynamic and alternative structures take the form ofconflicting DGs, in which two DGs overlap on one arm(Figs. 3G and 4C). Significantly overlapped base pairs can-not form at the same time, with the exception of triplexes,and usually are the result of two mutually exclusive confor-mations. Using this simple criterion, a large number ofdynamic structures were uncovered, some of which areconserved between human and mouse (Lu et al. 2016).Whereas some of the conformations could be simply thekinetically or thermodynamically trapped folding interme-diates, othersmay represent important functional states in adynamic regulatory process. Wan and colleagues appliedthe SPLASH method to the human embryonic stem celldifferentiation process and found that the transition in cel-lular states is accompanied by alterations of both RNAstructures and RNA–RNA interactions, suggesting regula-tory mechanisms by the dynamic structures and inter-actions (Aw et al. 2016). The constant rearrangement ofintra- and intermolecular base-pairing interactions and thedynamic protein interactions create a rugged free energy land-scape that integrates genetic and environmental signals.Further investigation of RNA dynamics will likely uncovernew “switches” in gene regulation pathways.

Pseudoknots are composite structures that involve atleast two duplexes that are interlocked. Pseudoknots stabi-lize structures and participate in several critical cellularprocesses, including telomere maintenance and ribosomalframeshifting (Gilley and Blackburn 1999; Giedroc andCornish 2009). Computational methods can be used toidentify pseudoknots but are very slow (Rivas and Eddy1999). Psoralen cross-linking directly identifies interlockedduplexes that could be either alternative conformations orpseudoknots (Fig. 3H). For example, PARIS identified thepseudoknots in telomerase RNA component (TERC; Fig.4D), RNA component of mitochondrial RNA processingendoribonuclease (RMRP), and ribonuclease P RNA com-

Figure 4. (Continued.) (F) About 10% of PARIS-determined duplexes are conserved in eutherian mammals(p value < 0.01). (G) Model of SPEN-A-repeat complex in the XIST ribonucleoprotein (RNP). The base-pairinginteractions among the 8.5 repeats are stochastic and only one specific family of conformations from PARIS data isshown here. SPEN binding involves both single- and double-stranded regions but is only cross-linked to the single-stranded region 3–5 nt upstream of the interrepeat duplex unit. (H ) PARIS in human and mouse cells revealed theprecise U8:28S interaction, which is conserved in metazoans. PARIS-determined interaction sites in blue, and thepreviously reported site in gray (Peculis 1997). The base-pairingmodel is shown in the inset. (I) Long-span structuresas detected by sequencing of psoralen cross-linked, ligated, and selected hybrids (SPLASH) correlates with highertranslation efficiency. (J ) RNAs of related functions tend to interact more frequently than nonrelated ones. GlobalRNA–RNA interaction networks can be organized into modules and the network topology changes in response tostem cell differentiation. (Panels A–H are adapted from Lu et al. 2016; panels I–J are adapted from Aw et al. 2016,both, with permission, from Elsevier.)

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 11

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 12: The RNA Base-Pairing Problem and Base- Pairing Solutions

ponent H1 (RPPH1) RNAs in both human and mouse (Luet al. 2016). Careful analysis and validation of the inter-locked duplexes may reveal additional pseudoknots in thetranscriptome.

8 ARCHITECTURE OF THE XIST RNP: MODULARSTRUCTURES FOR MODULAR FUNCTIONS

The psoralen cross-linking methods identify physical con-tacts without a distance limit and therefore, reveal the over-all architecture of RNA molecules. With few exceptions,whole transcript architectures have remained elusive forthe vast majority of the transcriptome (Zappulla andCech 2004). X-inactive specific transcript (XIST) is alncRNA essential for X-chromosome inactivation (XCI)in placental mammals (eutheria). XIST serves as a scaffoldto recruit a variety of protein complexes to orchestrate thecomplex XCI process. Using PARIS, we showed thatthe XIST RNA folds into compact and discrete domains,each spanning hundreds to thousands of nucleotides (Fig.4E) (Lu et al. 2016, 2017). Despite the low sequence con-servation, roughly 10% of the structures are conserved (Fig.4F). The conserved duplexes support the demarcation ofdomains and suggest functional relevance of the high-levelarchitecture. The XIST RNA coordinates multiple steps inXCI, including XIST localization to the inactive X, mem-brane attachment of the inactive X, histone and DNAmod-ifications, and chromatin compaction. Targeted geneticanalysis of each of the XIST domains will uncover the struc-tural basis of the specific functions.

Stable and discrete domains may not be a commonfeature for all RNAs; for example, highly stable structureswould not persist in mRNA coding regions because of theconstant action of ribosomes. Even for lncRNAs, architec-tures differ greatly; for example, metastasis-associated lungadenocarcinoma transcript 1 (MALAT1) and XIST havemore long-range structures that are organized into compactdomains, whereas nuclear-enriched abundant transcript 1(NEAT1) has more local structures, without clearly sepa-rated modular domains (Lu et al. 2016). Comparison of theduplex spans in general showed that mRNAs have morelocal structures than lncRNAs (Aw et al. 2016). Thesestudies have revealed a complex picture for the high-levelorganization of RNA structures. However, with the excep-tion of a few RNAs (such as XIST andMALAT1), the extentof domain formation in other RNAs is still unknown be-cause of the limited sequencing coverage and the inherentbias of psoralen cross-linking.

The integration of PARIS with in vivo click SHAPE(icSHAPE), conservation analysis, and protein-bindingassays (in vitro electrophoretic mobile shift assay [EMSA]and cross-linking and immunoprecipitation [CLIP]) led

to the first structure-interaction model of the essentialA-repeat domain complex in XIST RNP (Fig. 4G) (Luet al. 2016). In this model, repeat units in the A-repeatregion randomly form duplexes with each other, resultingin near identical structure units. These structure unitsassociate with the essential XCI factor SPEN in a coopera-tive manner. The structure model is consistent with nucle-otide resolution reactivity of the repeats determined byicSHAPE. Stringent tests for covariation showed no signifi-cance for any pair of repeats, consistent with the stochasticnature of the duplex formation, which dilutes and distrib-utes the evolutionary pressure (Rivas et al. 2017). Thisstudy establishes a “top-down” approach for studyinglong RNAs, both coding and noncoding, in which modularRNA structure domains are identified and isolated for fur-ther studies.

9 NOVEL RNA–RNA INTERACTIONS:THE MOLECULAR SOCIAL NETWORK

Traditionally, the analysis of RNA–RNA interactions hasrequired knowledge of at least one partner in the RNA–RNA complex, or the proteins involved. For example, inthe CLASH and hiCLIP methods, the identification ofnew interactions was based on the proteins that bindthe RNA duplexes (Fig. 2). Application of psoralen cross-linking methods has resulted in important de novodiscoveries of RNA–RNA interactions, such as snoRNA:rRNA, scaRNA:snRNA, mRNA:mRNA, snoRNA:mRNA,and snRNA:lncRNA. These discoveries have both solvedsome long-standing questions in the field and revealedpotentially new principles that govern coordinated geneexpression programs. Similar to the analysis of intramolec-ular RNA structures, psoralen cross-linking achieves nearbase-pair level resolution and single-molecule accuracy,which greatly facilitates further mechanistic studies.

As discussed in the beginning, early studies of snoRNAsand scaRNAs led to the discovery of snoRNA-guided rRNAprocessing and modifications. Yet many sno/scaRNAs arestill “orphans,” with no identified targets. In certain cases,these RNAs may have more than one partner, but only onewas known owing to the limited scope of conventionalmethods. An example of these poorly annotated snoRNAsis U8, which is highly expressed and conserved in metazo-ans (Fig. 4H). Previous studies suggested that the 5′ end ofU8 base pairs with the 5′ end of the 28S rRNA to guide thecleavage of precursor ribosomal RNA (pre-rRNA; Fig. 4H,site labeled “Peculis 1997”). However, PARIS experimentsin both human and mouse cells clearly identified aninteraction site near the 3′ end of the 28S rRNA (Lu et al.2016). The newly identified interaction is conserved in allmetazoans and the duplex is more extensive than the pre-

Z. Lu and H.Y. Chang

12 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 13: The RNA Base-Pairing Problem and Base- Pairing Solutions

vious site (Fig. 4H, inset). This result highlights the criticalcontribution of psoralen cross-linking in providing directphysical evidence for RNA–RNA interactions.

In addition to the intermolecular interactions that in-volve sRNAs (snRNAs, snoRNAs, scaRNAs, etc.), psoralencross-linking also detected mRNA:mRNA interactions, al-though with much fewer reads given their low abundance.Interestingly, the mRNA:mRNA interactions form extend-ed networks, in which functionally related mRNAs areclustered together, indicating co-regulation on the RNAlevel (Fig. 4J) (Aw et al. 2016). Even more surprisingly,differentiation of human embryonic stem (ES) cells altersthe topology of the interaction networks. It remains unclearwhat factors drive the formation and the alteration of theinteraction networks and what functional consequencessuch physical contacts bring. Given the limited coverageof current cross-linking data, it is likely that future workwill uncover other new interactions and mechanisms.Quantitative information about the interactions is alsoneeded to further understand their relevance. For example,are these interactions kiss-and-run encounters or long-term committed relationships?

10 LIMITATIONS OF THE PSORALENCROSS-LINKING METHODSAND FUTURE DIRECTIONS

Psoralen cross-linking has provided novel insights into theRNA structurome and interactome in living cells. Despitemore than 40 years of work, current cross-linking-basedmethods still suffer from several limitations. Here wediscuss the major issues, which will hopefully facilitatedata interpretation and point to directions for furtherimprovement.

Psoralen cross-linking of nucleic acid duplexes is slowand inefficient, often taking 30 min or more to achieveeffective cross-linking (Lu et al. 2016). Prolonged cross-linking can change the cellular physiology to alter RNAstructures. For example, long exposure to low temperature(needed to avoid UV-induced heating) may induce stressresponse. Light-activated psoralens also modify othercellular components such as lipids and proteins (Ciminoet al. 1985). These side effects should be considered whendesigning experiments that examine the effects of certainconditions on RNA structures. In addition, the slow reac-tion makes it difficult to obtain dynamic information aboutRNA structures in response to environmental signals.

Psoralen cross-linking has significant bias on both thesequence and structure levels. Psoralens are preferentially(>90%) cross-linked to staggered uridines (Cimino et al.1985; Boyer et al. 1988). Compact RNP structures mayblock cross-linking, and this effect is even more difficult

to estimate. Therefore, we can only make semiquantitativeconclusions about the structures that we observe in theseexperiments. For example, it is not possible to directly cal-culate the relative abundance of alternative conformations.In addition, conclusions regarding the overall topology ofintra- and inter-RNA interaction networks are likely to beinaccurate because of the bias.

Psoralens are the only class of photoreversible cross-linkers available now for the general analysis of nucleicacid structure and interactions. New reversible cross-linkerswith improved reaction speed and reduced bias will beof great value for the RNA field. For example, reversibleRNA-modifying groups can be added to other classes ofcompounds that intercalate the DNA/RNA duplex. In fact,many cancer chemotherapy drugs act by modifying DNA,and can be potentially adapted to reversibly cross-link RNA(Deans and West 2011). Chemicals that cross-link tertiarycontacts will provide important new information currentlymissed by psoralen cross-linking in living cells. For exam-ple, nitrogen mustard has been used to successfully identifytertiary contacts in the ribosome and snRNAs (Datta andWeiner 1992; Sergiev et al. 2001).

Another major bottleneck in the cross-linking methodsis the low efficiency of proximity ligation. In general, thehighest efficiency achieved is only ∼5% for short fragmentsthat permit near base-pair resolution structure modeling(Fig. 2B). Improving proximity ligation efficiency wouldgreatly reduce sequencing cost and increase the amountof useful information obtained from high-throughput ex-periments. Potential solutions include new ligases that areless sensitive to the structural context, and ligation condi-tions that promote the “ligatable” conformations, such ashigher temperature, chemical denaturation, and linkersthat effectively increase the flexibility of the RNA ends.

Although sequencing cost will continue to drop, target-ed analysis is a more cost-effective solution to analyzingRNAs of interest, especially low abundance RNAs fromessential genes. In particular, the psoralen cross-linking-based methods could be combined with antibody enrich-ment of RBPs or antisense oligo enrichment of single orgroups of RNAs.

11 INTEGRATING METHODS TO SOLVE COMPLEXRNA STRUCTURES AND INTERACTIONS

The cross-linking-based methods are particularly powerfulin sorting through the complexities of RNA molecules inliving cells, which are commonly depicted as a plate of“cooked spaghetti.” X-ray crystallography, NMR, cryo-EM, computational modeling, chemical/enzymatic prob-ing, and cross-linking-based methods, as summarized inTable 1, each have their own merits and limitations; there-

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 13

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 14: The RNA Base-Pairing Problem and Base- Pairing Solutions

fore, integration of diverse approaches will lead to the mostcomprehensive understanding of the basic properties ofRNA (Fig. 5). In this context, the direct base-pairing infor-mation from cross-linking experiments provides an essen-tial guide for the integrated analysis.

Many approaches can be envisioned to integrate struc-ture data and other relevant information from the variousmethods described above. For example, incorporation ofSHAPE-like data into cross-linking-based structure modelsachieves both high confidence and high resolution thatis not possible by either method alone (Lu et al. 2016). AsSHAPE-like methods provide a largely unbiased averagereactivity profile of the structure ensemble and cross-linkingmethods nominate all cross-linkable conformations (seediagram in Fig. 5A), the integration of these two methodsenables quantitative analysis of the alternative conforma-tions. Analysis of RNA–protein interactions can nowbe performed in a structural context, which may explainthe lack of apparent specificity on the sequence level (Fig.5B). Also, as already shown (Fig. 3D,E), cross-linking-basedstructure models can guide comparative sequence analysis,and on the other hand, the conservation assigns functionalsignificance to structures, prioritizing some of them for

further studies (Fig. 5B). Over the years, extensive knowl-edge of RNA sequence motifs has been accumulated, suchas ones that regulate splicing, translation, stability, and lo-calization. Transcriptome-wide structures and interactionsplace such motifs in a larger context, which will help revealpreviously unknown regulatory mechanisms and serve asexamples for identifying novel functional motifs. A relevantexample is the identification of long-range structures thatbring RNA-binding Fox-1 Homolog 2 (Rbfox2) proteins tothe vicinity of splicing motifs (Lovci et al. 2013). Genome-wide association studies have identified large numbers ofgenetic variations in the noncoding parts of the genome,and these are very likely to exert their effects through RNAstructures (Halvorsen et al. 2010). Although conventionalchemical probing has already been used in characterizingand interpreting genetic variations, cross-linking methodsare more direct in revealing their structural contexts,especially for long-range, dynamic, and complex structures(Fig. 5B).

With the ability to define overall architectures of RNAby cross-linking, large RNA molecules refractory to atomicresolution structure analysis in vitro can be studied basedon the “divide-and-conquer” principle (Fig. 5C). Stable

Protein binding

Alternative structure/interaction 2

Alternative structure 1

Single-stranded

SHAPE averagereactivity profile

+

+

+

Cro

ss-li

nked

stru

ctur

es

RNA regionof interest

ARNA

duplexes(inter/intra)

SHAPE-like

Conservation/covariation

RBPbinding

Functionalmotif

Geneticvariation

SNP

Comprehensive structure–function models

B C

In vitro methods toobtain atomic

resolution structures

Figure 5. Integrated RNA structure analysis using multiple methods. (A) Comparison of cross-linking-based meth-ods and accessibility/flexibility measurement (here using selective 2′-hydroxyl acylation with primer extension[SHAPE] as an example). The measured average accessibility profile is the sum of all possible RNA structure andinteraction conformations plus the effect of protein binding. In contrast, cross-linking methods directly identify thestructure conformations. (B) Cross-linking data provide an essential guide to place various types of RNA data in astructural context, thereby facilitating integrated functional analysis of RNA molecules. (C) From the large andflexible RNA molecules (gray lines), cross-linking-based methods identify small, compact, and stable structuredomains (black lines), which can be further studied using higher resolution methods like X-ray crystallographyand nuclear magnetic resonance (NMR).

Z. Lu and H.Y. Chang

14 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 15: The RNA Base-Pairing Problem and Base- Pairing Solutions

and compact structure domains can be trimmed to removeflexible regions based on cross-linking data, purified andthen subject to X-ray crystallography, NMR, and cryo-EManalysis. The identification of stable duplex units in theXIST A-repeat domain is one such example, in whichthe physiologically relevant structure units can be studiedin great detail in vitro (see an example of the XISTA-repeatdomain in Fig. 4G). De novo and data-assisted 3D structuremodeling represent an alternative solution to directly solv-ing the structure by in vitro atomic resolutionmethods. Themodeling efforts will also benefit from the cross-linkingmethods that provide physical contact information.

The biogenesis and function of RNA molecules in cellsare on a long journey, in which constant remodeling ofstructures impacts every aspect of their life. The dissectionof these dynamic structures will certainly reveal new prin-ciples of gene regulation. Identification of RNA structuremotifs that control gene expression may lead to new ther-apeutic opportunities by revealing potential drug targets.For example, RNA viruses represent a major health con-cern, and the mechanistic studies of viral RNA structureswould provide important information for designing antivi-ral drugs. With this new generation of methods for directidentification of base-pairing interactions in nucleic acids,opportunities abound for many topics in the RNA field forboth basic and translational research.

ACKNOWLEDGMENTS

This work was supported by National Institutes of Health(NIH) R01-HG004361, P50-HG007735 (H.Y.C.) and Stan-ford Jump Start Award (Z.L.). Z.L. is a Layton Family Fellowof the Damon Runyon-Sohn Foundation Pediatric CancerFellowship Award (DRSG-14–15). We thank Fan Liu forcomments on the manuscript.

REFERENCES∗Reference is also in this collection.

Anfinsen CB. 1973. Principles that govern the folding of protein chains.Science 181: 223–230.

Aw JG, Shen Y, Wilm A, Sun M, Lim XN, Boon KL, Tapsin S, Chan YS,Tan CP, Sim AY, et al. 2016. In vivo mapping of eukaryotic RNAinteractomes reveals principles of higher-order organization and reg-ulation. Mol Cell 62: 603–617.

Bai XC, McMullan G, Scheres SH. 2015. How cryo-EM is revolutionizingstructural biology. Trends Biochem Sci 40: 49–57.

Beltrame M, Tollervey D. 1992. Identification and functional analysis oftwo U3 binding sites on yeast pre-ribosomal RNA. EMBO J 11: 1531–1542.

Blondal T, Thorisdottir A, Unnsteinsdottir U, Hjorleifsdottir S, Aevars-son A, Ernstsson S, Fridjonsson OH, Skirnisdottir S, Wheat JO,Hermannsdottir AG, et al. 2005. Isolation and characterization of athermostable RNA ligase 1 from a Thermus scotoductus bacteriophage

TS2126 with good single-stranded DNA ligation properties. NucleicAcids Res 33: 135–142.

Bothe JR, Nikolova EN, Eichhorn CD, Chugh J, Hansen AL, Al-HashimiHM. 2011. Characterizing RNA dynamics at atomic resolution usingsolution-state NMR spectroscopy. Nat Methods 8: 919–931.

Boyer V, Moustacchi E, Sage E. 1988. Sequence specificity in photoreac-tion of various psoralen derivatives with DNA: Role in biologicalactivity. Biochemistry 27: 3011–3018.

Calvet JP, Pederson T. 1981. Base-pairing interactions between smallnuclear RNAs and nuclear RNA precursors as revealed by psoralencross-linking in vivo. Cell 26: 363–370.

Calvet JP, Meyer LM, Pederson T. 1982. Small nuclear RNA U2 is base-paired to heterogeneous nuclear RNA. Science 217: 456–458.

Cech TR, Pardue ML. 1976. Electron microscopy of DNA crosslinkedwith trimethylpsoralen: Test of the secondary structure of eukaryoticinverted repeat sequences. Proc Nal Acad Sci 73: 2644–2648.

Cech TR, Steitz JA. 2014. The noncoding RNA revolution—Trashing oldrules to forge new ones. Cell 157: 77–94.

Chen J, Kastan MB. 2010. 5′-3′-UTR interactions regulate p53 mRNAtranslation and provide a target for modulating p53 induction afterDNA damage. Genes Dev 24: 2146–2156.

Cimino GD, Gamper HB, Isaacs ST, Hearst JE. 1985. Psoralens as photo-active probes of nucleic acid structure and function: Organic chemistry,photochemistry, and biochemistry. Ann Rev Biochem 54: 1151–1193.

Datta B, Weiner AM. 1992. Cross-linking of U1 snRNA using nitrogenmustard. Evidence for higher order structure. J Biol Chem 267: 4503–4507.

Deans AJ, West SC. 2011. DNA interstrand crosslink repair and cancer.Nat Rev Cancer 11: 467–480.

Dekker J, Rippe K, Dekker M, Kleckner N. 2002. Capturing chromosomeconformation. Science 295: 1306–1311.

Dethoff EA, Chugh J, Mustoe AM, Al-Hashimi HM. 2012. Functional com-plexity and regulation through RNA dynamics. Nature 482: 322–330.

Ding Y, Tang Y, Kwok CK, Zhang Y, Bevilacqua PC, Assmann SM. 2014.In vivo genome-wide profiling of RNA secondary structure revealsnovel regulatory features. Nature 505: 696–700.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P,Chaisson M, Gingeras TR. 2013. STAR: Ultrafast universal RNA-seqaligner. Bioinformatics 29: 15–21.

Eddy SR. 2004. How do RNA folding algorithms work? Nat Biotechnol22: 1457–1458.

Ehresmann C, Baudin F, Mougel M, Romby P, Ebel JP, Ehresmann B.1987. Probing the structure of RNAs in solution. Nucleic Acids Res 15:9109–9128.

Fallmann J,Will S, Engelhardt J, Gruning B, Backofen R, Stadler PF. 2017.Recent advances in RNA folding. J Biotechnol 261: 97–104.

Fayet-Lebaron E, Atzorn V, Henry Y, Kiss T. 2009. 18S rRNA processingrequires base pairings of snR30 H/ACA snoRNA to eukaryote-specific18S sequences. EMBO J 28: 1260–1270.

Gesell T, vonHaeseler A. 2006. In silico sequence evolutionwith site-specificinteractions along phylogenetic trees. Bioinformatics 22: 716–722.

Giedroc DP, Cornish PV. 2009. Frameshifting RNA pseudoknots: Struc-ture and mechanism. Virus Res 139: 193–208.

GilleyD, Blackburn EH. 1999. The telomerase RNApseudoknot is criticalfor the stable assembly of a catalytically active ribonucleoprotein. ProcNatl Acad Sci 96: 6621–6625.

Gong J, Shao D, Xu K, Lu Z, Lu ZJ, Yang YT, Zhang QC. 2018. RISE: Adatabase of RNA interactome from sequencing experiments. NucleicAcids Res 46: D194–D201.

Halvorsen M, Martin JS, Broadaway S, Laederach A. 2010. Disease-associated mutations that alter the RNA structural ensemble. PLoSGenet 6: e1001074.

Hartshorne T. 1998. Distinct regions of U3 snoRNA interact at two siteswithin the 5′ external transcribed spacer of pre-rRNAs inTrypanosomabrucei cells. Nucleic Acids Res 26: 2541–2553.

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 15

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 16: The RNA Base-Pairing Problem and Base- Pairing Solutions

Hausner TP, Giglio LM, Weiner AM. 1990. Evidence for base-pairingbetween mammalian U2 and U6 small nuclear ribonucleoproteinparticles. Genes Dev 4: 2146–2156.

Hearst JE. 1981. Psoralen photochemistry. Ann Rev Biophys Bioeng 10:69–86.

Helwak A, Kudla G, Dudnakova T, Tollervey D. 2013. Mapping thehumanmiRNA interactome by CLASH reveals frequent noncanonicalbinding. Cell 153: 654–665.

Kass S, Tyc K, Steitz JA, Sollner-Webb B. 1990. The U3 small nucleolarribonucleoprotein functions in the first step of preribosomal RNAprocessing. Cell 60: 897–908.

∗ Kastner B, Will CL, Stark H, Lührmann R. 2019. Structural insights intonuclear pre-mRNA splicing in higher eukaryotes. Cold Spring HarbPerspect Biol doi: 10.1101/cshperspect.a032417.

Kertesz M, Wan Y, Mazor E, Rinn JL, Nutter RC, Chang HY, Segal E.2010. Genome-wide measurement of RNA secondary structure inyeast. Nature 467: 103–107.

Kim SH, Quigley GJ, Suddath FL, McPherson A, Sneden D, Kim JJ,Weinzierl J, Rich A. 1973. Three-dimensional structure of yeast phe-nylalanine transfer RNA: Folding of the polynucleotide chain. Science179: 285–288.

Kladwang W, VanLang CC, Cordero P, Das R. 2011. A two-dimensionalmutate-and-map strategy for non-coding RNA structure.Nat Chem 3:954–962.

Kudla G, Granneman S, Hahn D, Beggs JD, Tollervey D. 2011. Cross-linking, ligation, and sequencing of hybrids reveals RNA–RNA inter-actions in yeast. Proc Natl Acad Sci 108: 10010–10015.

Lerner MR, Boyle JA, Mount SM, Wolin SL, Steitz JA. 1980. Are snRNPsinvolved in splicing? Nature 283: 220–224.

Lovci MT, GhanemD,Marr H, Arnold J, Gee S, Parra M, Liang TY, StarkTJ, Gehman LT, Hoon S, et al. 2013. Rbfox proteins regulate alternativemRNA splicing through evolutionarily conserved RNA bridges. NatStruct Mol Biol 20: 1434–1442.

Lu Z, ChangHY. 2016. Decoding the RNA structurome. Curr Opin StructBiol 36: 142–148.

Lu Z, Zhang QC, Lee B, Flynn RA, Smith MA, Robinson JT, DavidovichC, Gooding AR, Goodrich KJ, Mattick JS, et al. 2016. RNA duplex mapin living cells reveals higher-order transcriptome structure. Cell 165:1267–1279.

Lu Z, Carter AC, ChangHY. 2017.Mechanistic insights in X chromosomeinactivation. Philos Trans R Soc B 372: 20160356.

Lu Z, Gong J, Zhang QC. 2018. PARIS: Psoralen analysis of RNA inter-actions and structures with high throughput and resolution. MethodsMol Biol 1649: 59–84.

Maser RL, Calvet JP. 1989. U3 small nuclear RNA can be psoralen-cross-linked in vivo to the 5′ external transcribed spacer of pre-ribosomal-RNA. Proc Natl Acad Sci 86: 6523–6527.

Mathews DH, Moss WN, Turner DH. 2010. Folding and finding RNAsecondary structure. Cold Spring Harb Perspect Biol 2: a003665.

Morrissey JP, Tollervey D. 1993. Yeast snR30 is a small nucleolar RNArequired for 18S rRNA synthesis. Mol Cell Biol 13: 2469–2477.

Mount SM, Pettersson I, Hinterberger M, Karmas A, Steitz JA. 1983. TheU1 small nuclear RNA-protein complex selectively binds a 5′ splice sitein vitro. Cell 33: 509–518.

Nguyen TC, Cao X, Yu P, Xiao S, Lu J, Biase FH, Sridhar B, Huang N,Zhang K, Zhong S. 2016. Mapping RNA–RNA interactome and RNAstructure in vivo by MARIO. Nat Commun 7: 12023.

Nicholson BL, White KA. 2014. Functional long-range RNA–RNAinteractions in positive-strand RNA viruses. Nat Rev Microbiol 12:493–504.

Noller HF, Woese CR. 1981. Secondary structure of 16S ribosomal RNA.Science 212: 403–411.

Peculis BA. 1997. The sequence of the 5′ end of the U8 small nucleolarRNA is critical for 5.8S and 28S rRNA maturation. Mol Cell Biol 17:3702–3713.

Pedersen JS, BejeranoG, Siepel A, RosenbloomK, Lindblad-TohK, Land-er ES, Kent J, Miller W, Haussler D. 2006. Identification and classifi-

cation of conserved RNA secondary structures in the human genome.PLoS Comput Biol 2: e33.

Petrov AS, Bernier CR, Gulen B, Waterbury CC, Hershkovits E, Hsiao C,Harvey SC, Hud NV, Fox GE, Wartell RM, et al. 2014. Secondarystructures of rRNAs from all three domains of life. PloS One 9: e88222.

∗ Plaschka C, Newman AJ, Nagai K. 2019. Structural basis of nuclear pre-mRNA splicing: Lessons from yeast. Cold Spring Harb Perspect Bioldoi: 10.1101/cshperspect.a032391.

Rabin D, Crothers DM. 1979. Analysis of RNA secondary structure byphotochemical reversal of psoralen crosslinks. Nucleic Acids Res 7:689–703.

Rimoldi OJ, Raghu B, Nag MK, Eliceiri GL. 1993. Three new small nu-cleolar RNAs that are psoralen cross-linked in vivo to unique regions ofpre-rRNA. Mol Cell Biol 13: 4382–4390.

Rinke J, Appel B, Digweed M, Luhrmann R. 1985. Localization of a base-paired interaction between small nuclear RNAs U4 and U6 in intactU4/U6 ribonucleoprotein particles by psoralen cross-linking. J MolBiol 185: 721–731.

Rivas E, Eddy SR. 1999. A dynamic programming algorithm for RNAstructure prediction including pseudoknots. J Mol Biol 285: 2053–2068.

Rivas E, Eddy SR. 2001. Noncoding RNA gene detection using compar-ative sequence analysis. BMC Bioinformatics 2: 8.

Rivas E, Clements J, Eddy SR. 2017. Lack of evidence for conservedsecondary structure in long noncoding RNAs.NatMethods 14: 45–48.

Robertus JD, Ladner JE, Finch JT, Rhodes D, Brown RS, Clark BF, Klug A.1974. Structure of yeast phenylalanine tRNA at 3 Å resolution. Nature250: 546–551.

Rouskin S, ZubradtM,Washietl S, KellisM,Weissman JS. 2014. Genome-wide probing of RNA structure reveals active unfolding of mRNAstructures in vivo. Nature 505: 701–705.

Sergiev PV, Dontsova OA, Bogdanov AA. 2001. Chemical methods forthe structural study of the ribosome: Judgment day. Mol Biol 35: 472–496.

Sharma E, Sterne-Weiler T, O’Hanlon D, Blencowe BJ. 2016. Globalmapping of human RNA–RNA interactions. Mol Cell 62: 618–626.

Shi Y. 2014. A glimpse of structural biology through X-ray crystallogra-phy. Cell 159: 995–1014.

Siegfried NA, Busan S, Rice GM,Nelson JA,Weeks KM. 2014. RNAmotifdiscovery by SHAPE and mutational profiling (SHAPE-MaP). NatMethods 11: 959–965.

Smith MA, Gesell T, Stadler PF, Mattick JS. 2013. Widespread purifyingselection on RNA structure in mammals. Nucleic Acids Res 41: 8220–8236.

Spitale RC, Flynn RA, Zhang QC, Crisalli P, Lee B, Jung JW, Kuchelmeis-ter HY, Batista PJ, Torre EA, Kool ET, et al. 2015. Structural imprints invivo decode RNA regulatory mechanisms. Nature 519: 486–490.

Stroke IL, Weiner AM. 1989. The 5′ end of U3 snRNA can be crosslinkedin vivo to the external transcribed spacer of rat ribosomal RNA pre-cursors. J Mol Biol 210: 497–512.

Sugimoto Y, Vigilante A, Darbo E, Zirra A, Militti C, D’AmbrogioA, Luscombe NM, Ule J. 2015. hiCLIP reveals the in vivo atlas ofmRNA secondary structures recognized by Staufen 1. Nature 519:491–494.

Thompson JF, Hearst JE. 1983. Structure of E. coli 16S RNA elucidated bypsoralen crosslinking. Cell 32: 1355–1365.

Tyc K, Steitz JA. 1992. A new interaction between the mouse 5′ externaltranscribed spacer of pre-rRNA and U3 snRNA detected by psoralencrosslinking. Nucleic Acids Res 20: 5375–5382.

Underwood JG, Uzilov AV, Katzman S, Onodera CS, Mainzer JE, Ma-thews DH, Lowe TM, Salama SR, Haussler D. 2010. FragSeq:Transcriptome-wide RNA structure probing using high-throughputsequencing. Nat Methods 7: 995–1001.

Washietl S, Hofacker IL, Lukasser M, Huttenhofer A, Stadler PF. 2005.Mapping of conserved RNA secondary structures predicts thousandsof functional noncoding RNAs in the human genome. Nat Biotechnol23: 1383–1390.

Z. Lu and H.Y. Chang

16 Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 17: The RNA Base-Pairing Problem and Base- Pairing Solutions

Wassarman DA, Steitz JA. 1993. A base-pairing interaction between U2and U6 small nuclear RNAs occurs in >150S complexes in HeLa cellextracts: Implications for the spliceosome assembly pathway. Proc NatlAcad Sci 90: 7139–7143.

Watson JD, Crick FH. 1953. Molecular structure of nucleic acids: A struc-ture for deoxyribose nucleic acid. Nature 171: 737–738.

Wells SE, Hillner PE, Vale RD, Sachs AB. 1998. Circularization ofmRNA by eukaryotic translation initiation factors. Mol Cell 2: 135–140.

Whirl-Carrillo M, Gabashvili IS, Bada M, Banatao DR, Altman RB. 2002.Mining biochemical information: Lessons taught by the ribosome.RNA 8: 279–289.

Will CL, Luhrmann R. 2011. Spliceosome structure and function. ColdSpring Harb Perspect Biol 3: a003707.

Wollenzien PL, Youvan DC, Hearst JE. 1978. Structure of psoralen-cross-linked ribosomal RNA from Drosophila melanogaster. Proc Natl AcadSci 75: 1642–1646.

Wu X, Bartel DP. 2017. Widespread influence of 3′-end structures onmammalian mRNA processing and stability. Cell 169: 905–917.e11.

∗ YanC,WanR, Shi Y. 2019.Molecularmechanisms of pre-mRNA splicingthrough structural biology of the spliceosome. Cold Spring Harb Per-spect Biol doi: 10.1101/cshperspect.a032409.

Zappulla DC, Cech TR. 2004. Yeast telomerase RNA: A flexible scaffoldfor protein subunits. Proc Natl Acad Sci 101: 10024–10029.

Zwieb C, Brimacombe R. 1980. Localisation of a series of intra-RNAcross-links in 16S RNA, induced by ultraviolet irradiation ofEscherichia coli 30S ribosomal subunits. Nucleic Acids Res 8: 2397–2411.

The RNA Base-Pairing Problem and Solutions

Cite this article as Cold Spring Harb Perspect Biol 2018;10:a034926 17

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 18: The RNA Base-Pairing Problem and Base- Pairing Solutions

2018; doi: 10.1101/cshperspect.a034926Cold Spring Harb Perspect Biol  Zhipeng Lu and Howard Y. Chang The RNA Base-Pairing Problem and Base-Pairing Solutions

Subject Collection RNA Worlds

Alternate RNA StructuresMarie Teng-Pei Wu and Victoria D'Souza

Structural Biology of TelomeraseYaqiang Wang, Lukas Susac and Juli Feigon

Expressionof Long Noncoding RNA Regulation of Gene Approaches for Understanding the Mechanisms

Patrick McDonel and Mitchell Guttman

Splicing in Higher EukaryotesStructural Insights into Nuclear pre-mRNA

Berthold Kastner, Cindy L. Will, Holger Stark, et al.

Act on ChromatinThatExperiments to Study Long Noncoding RNAs

Principles and Practices of Hybridization Capture

Matthew D. Simon and Martin Machyna

UTRs Doing?′What Are 3Christine Mayr

Sequencerson Massively Parallel High-Throughput Linking RNA Sequence, Structure, and Function

Sarah K. Denny and William J. Greenleaf

Transcriptase EnzymesSingle-Molecule Analysis of Reverse

Linnea I. Jansson and Michael D. Stone

Insights into Eukaryotic TranslationComplexity: Ribosomal Structures Provide Extensions, Extra Factors, and Extreme

Melanie Weisser and Nenad Ban

RegulationCRISPR Tools for Systematic Studies of RNA

Gootenberg, et al.Jesse Engreitz, Omar Abudayyeh, Jonathan

with TranscriptionNascent RNA and the Coordination of Splicing

Karla M. Neugebauer

Relating Structure and Dynamics in RNA Biology

al.Kevin P. Larsen, Junhong Choi, Arjun Prabhakar, et

ComplexesRNA−Integrative Structural Biology of Protein

Magnetic Resonance (NMR) Spectroscopy for Combining Mass Spectrometry (MS) and Nuclear

AllainAlexander Leitner, Georg Dorn and Frédéric H.-T.

Synthetic GeneticsBeyond DNA and RNA: The Expanding Toolbox of

HolligerAlexander I. Taylor, Gillian Houlihan and Philipp

http://cshperspectives.cshlp.org/cgi/collection/ For additional articles in this collection, see

Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from

Page 19: The RNA Base-Pairing Problem and Base- Pairing Solutions

of mRNANucleotides That Comprise the Epitranscriptome Discovering and Mapping the Modified

Bastian Linder and Samie R. Jaffrey

Lessons from YeastStructural Basis of Nuclear pre-mRNA Splicing:

NagaiClemens Plaschka, Andrew J. Newman and Kiyoshi

http://cshperspectives.cshlp.org/cgi/collection/ For additional articles in this collection, see

Copyright © 2018 Cold Spring Harbor Laboratory Press; all rights reserved

on April 20, 2022 - Published by Cold Spring Harbor Laboratory Press http://cshperspectives.cshlp.org/Downloaded from