semiempirical variational approach to rna folding

17
ELSEVIER Physica A 248 (1998) 336-352 PHYSICA Semiempirical variational approach to RNA folding Ariel Fern~indez * , Andrrs Colubri Instituto de Matemd~tica, Universidad Nacional del Sur, Consejo Nacional de lnvestigaciones Cientificas y T~cnicas, ,4venida Alern 1253, 8000 Bahia Blanca, Argentina Received 9 June 1997; revised 12 August 1997 Abstract A principle of stepwise maximization in the economy of the RNA folding process has been previously formulated as the principle of sequential minimization of conformational entropy loss (SMEL), This principle leads to a predictive folding algorithm rooted in an "adiabatic ansatz". In this approximation, microstates are lumped up into base-pairing patterns (BPPs), each of which is treated as a quasi-equilibrium state, and folding pathways are resolved as series of BPP transitions. Within this ansatz, a marker for the expediency of folding is a coarse Shannon information entropy, tr, which has been shown to reach its absolute minimum within experimentally relevant time scales, well below the limit of thermodynamic times. A rigorous treatment developed in this work validates the adiabatic approximation. Thus, a Lagrangian is identified at a semiempirical microscopic level, which is the variational counterpart of the SMEL principle. The Lagrangian computation of tr is contrasted with the adiabatic computation, revealing the subordination of torsional microstate dynamics to BPP transitions within time scales relevant to folding. Furthermore, the expediency of the folding process is explained by showing that the time scale for saturation of the capacity to generate information within the coarse BPP description of conformation space is commensurate with biologically relevant time scales and it stems from the inherent Lagrangian structure of the dynamics at the semiempirical level of description of chain torsions. PACS: 87.10,+e; 87.15.He; 87.15.Da Keywords: RNA folding; Lagrangian dynamics; Monte Carlo simulations I. Introduction The expediency in the folding of functional biopolymers under in vitro conditions is bewildering since folding occurs within time scales incommensurably shorter than * Corresponding author. Fax: 54 91 551 447; e-mail: [email protected]. 0378-4371/98/$19.00 Copyright (~) 1998 Elsevier Science B.V. All rights reserved PII S0378-43 71 (97)00527-X

Upload: colbas

Post on 22-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

ELSEVIER Physica A 248 (1998) 336-352

PHYSICA

Semiempirical variational approach to RNA folding

Ariel Fern~indez * , Andrrs Colubri Instituto de Matemd~tica, Universidad Nacional del Sur, Consejo Nacional de lnvestigaciones Cientificas

y T~cnicas, ,4venida Alern 1253, 8000 Bahia Blanca, Argentina

Received 9 June 1997; revised 12 August 1997

Abstract

A principle of stepwise maximization in the economy of the RNA folding process has been previously formulated as the principle of sequential minimization of conformational entropy loss (SMEL), This principle leads to a predictive folding algorithm rooted in an "adiabatic ansatz". In this approximation, microstates are lumped up into base-pairing patterns (BPPs), each of which is treated as a quasi-equilibrium state, and folding pathways are resolved as series of BPP transitions. Within this ansatz, a marker for the expediency of folding is a coarse Shannon information entropy, tr, which has been shown to reach its absolute minimum within experimentally relevant time scales, well below the limit of thermodynamic times. A rigorous treatment developed in this work validates the adiabatic approximation. Thus, a Lagrangian is identified at a semiempirical microscopic level, which is the variational counterpart of the SMEL principle. The Lagrangian computation of tr is contrasted with the adiabatic computation, revealing the subordination of torsional microstate dynamics to BPP transitions within time scales relevant to folding. Furthermore, the expediency of the folding process is explained by showing that the time scale for saturation of the capacity to generate information within the coarse BPP description of conformation space is commensurate with biologically relevant time scales and it stems from the inherent Lagrangian structure of the dynamics at the semiempirical level of description of chain torsions.

PACS: 87.10,+e; 87.15.He; 87.15.Da Keywords: RNA folding; Lagrangian dynamics; Monte Carlo simulations

I. Introduction

The expediency in the folding of functional biopolymers under in vitro conditions

is bewildering since folding occurs within time scales incommensurably shorter than

* Corresponding author. Fax: 54 91 551 447; e-mail: [email protected].

0378-4371/98/$19.00 Copyright (~) 1998 Elsevier Science B.V. All rights reserved PII S0378-43 71 (97)00527-X

A. Fern(mdez, A. Colubri/ Physica A 248 (1998) 336-352 337

ergodic or thermodynamic times [ 1-3]. In a recent work specialized for an RNA con- text [2], we have shown that the amount of Shannon information, - a , of the system measured relative to a coarse fixed description of conformation space reaches its ab- solute maximum within experimentally relevant time scales. However, this result has been obtained making use of an "adiabatic ansatz", whereby microscopic conformations are lumped up or resolved as base-pairing patterns (BPPs) treated as quasi-equilibrium states, and the rate of each elementary BPP transition is computed from within an Arrhenius-type scenario of activated processes.

At this point, the following question may be posed: What is the microscopic origin of the expediency of the RNA folding process as revealed by the timely absolute maximization of the coarse Shannon injbrmation content? In the present work we address this question by developing a semiempirical microscopic model of folding and we identify the Lagrangian structure of its underlying dynamics. The validity of this variational principle is given by the fact that it is precisely the counterpart of the principle of sequential minimization of conformational entropy loss (SMEL) [4,5]. The SMEL principle, reflecting the maximization in the economy of means involved in each step of folding, is valid at the BPP level and has been confirmed in previous predictive computations [4,5].

The variational principle introduced in this work enables us to rigorously compute the coarse informational entropy, or, at each stage of folding and compare the results with the adiabatic computation [2]. This analysis reveals that the adiabatic approximation is valid within biologically relevant folding time scales, and thus, provides a detailed semiempirical understanding of the expediency of the folding process.

An outline of the work is as follows. In Section 2, we succintly describe the computation of the time-dependent a using the adiabatic ansatz. Details are to be found in Ref. [2]. In Section 3, we introduce our semiempirical microscopic model in detail. Section 4 is devoted to identifying the variational principle at the micro- scopic level. Finally, in Section 5, we compare the rigorous derivation of cr with that obtained by the adiabatic approximation, thus revealing the validity of the latter sim- plified approach within the experimental time scales involved in detectable folding events [6].

2. The organizational capacity of an RNA molecule within the adiabatic approximation

This section is devoted to determining the expediency of the RNA folding process by monitoring the time evolution of the information content associated to a search for the functionally active structure. Conformations have been resolved as BPPs, each of which is regarded as a quasi-equilibrium state according to an adiabatic ansatz. In simple terms, this means that microscopic degrees of freedom are integrated out as conformational entropy. Our working assumptions will lead us to state that the coarse information entropy a compatible with the BPP description will reach its absolute

338 A. Fern(mdez, A. ColubrilPhysica A 248 (1998) 336-352

minimum within experimental time scales, long before the thermodynamic limit of

ergodic times has been reached. The partition of conformation space X relevant to the computation of a is the col-

lection Z of BPPs, a family of mutually disjoint classes. Thus, two microstates or

conformations resolved at the atomic level, are regarded as equivalent if they belong

to the same BPP. Given Z, a stochastic process ~ defining transition probabilities be-

tween elements of Z has been previously implemented to simulate the folding process [2,4]. Let ~/ be the measure over the space of pathways over Z induced by ~. Then, the coarse information entropy a may be defined as [2]

o'(t) z -- Z ~t 17(/4 ) In [zct q(A)], ( 1 ) ACZ

where A is a generic BPP, and ~t/~ is the projection of the pathway measure at time

t, which may be computed as follows:

rctq(A) = (# BPP pathways generated passing through A at time t)

/(#pathways generated), (2)

where "pathways generated" indicate computational realizations of ~. To implement the process ~ at the computational level, we first make use

of current combinatorial algorithms (see, e.g., Ref. [7]) to predict all plausible BPPs. Such algorithms incorporate the pseudoknot as a tertiary interaction motif and

consider only base pairing and stacking as stabilizing interactions in intramolecu- lar structure. The stochastic process is determined by the activation energy barriers required to produce or dismantle stabilizing interactions. Thus, at each instant, the partially folded chain undergoes a series of disjoint elementary events with transi- tion probabilities dictated by the unimolecular rates of the events. The stochastic

process is Markovian since the choice of the set of disjoint events at each stage of folding is independent of the history that led to that particular stage of the process.

The process is mechanistically constructed as follows: For each time t E I , we define a map t ~ J (x , t) = { j : 1 <~j <<, n(x, t)}, where J(x , t) = collection of elementary events representing conformational changes which are feasible at time t, given that the initial conformation x has been chosen at time t = 0, and n(x , t ) -- number of possible elementary events at time t. Associated to each event, there is a unimolecular rate constant kj(x, t) --rate constant for the j th event which may take place at time t for a process that starts with conformation x. The mean time for an elementary refolding event is the reciprocal of its unimolecular rate constant.

n(x,t) We now introduce a random variable r C [0 ,~ j= l kj(x, t)] , uniformly distributed over the interval. Let r* be a particular realization of r arising in a simulation of the

A. Ferndndez, A. Colubri/Physica A 248 (1998) 33~352 339

process, then there exists an index j* such that

j*--I j*

E kj(x,t) < r* <~ E kj(x,t), j - 0 j=0

(ko(x,t) = 0 for any x,t). (3)

This implies that the event j* = if(x, t) is chosen at time t for the folding process that starts at conformation x. The map t ~ if(x, t) for fixed initial condition x constitutes a realization of the Markov process which determines the folding pathway ~x. In turn,

the probability that the j*-event is chosen at time t is [k j* (X, t)/~j,~J(x,t) kf (x, t )]. Explicit values of the unimolecular rate constants require an updated compilation of

the thermodynamic parameters at renaturation conditions [7]. These parameters are used to generate the set of kinetic barriers associated with the formation and dismantling of stabilizing interactions, the elementary events in our context of interest. Thus, the activation energy barrier for the rate-determining step in the formation of a stabilizing

interaction is known to be -TASioop, where ASioop indicates the loss of conformational entropy associated to closing a loop. Such a loop might be of any of four admissible classes: bulge, hairpin, internal or pseudoknotted. For a fixed number L of unpaired bases in the loop, we shall assume the kinetic barrier to be the same for any of the four possible types of loops [2,4]. This assumption is warranted since the loss in

conformational entropy is due to two overlapping effects of different magnitude: The excluded volume effect, meaningful for relatively large L (L >~ 100) and the orientational effect that tends to favor the exposure of phosphate moieties towards the bulk solvent domain for better solvation. Since both effects are independent of the type of loop, we may conclude in relatively good agreement with calorimetric measurements, that the kinetic barriers are independent of the type of loop for fixed L. On the other hand, the activation energy barrier associated with dismantling a stem is - A H ( s t e m ) , the amount of heat released due to base pairing and stacking when forming all contacts in the stem.

For completion we shall display the analytic expressions for the unimolecular rate constants [4]. If the j th step or event happens to be a helix decay process, we obtain

kJ = fn exp[Gh/RT], (4)

where n is the number of base pairs in the helix formed in the j th step and Gh is the (negative) free-energy contribution resulting from stacking of the base pairs in the

helix. Thus, the essentially enthalpic term --Gh = - A H (stem) should be regarded as the activation energy for helix disruption. On the other hand, if the formation of an admissible stabilizing interaction happens to be the event designated by the jth step, the inverse of the mean time for the transition will be given by

kj = fn exp[--AGloop/R T] , (5)

where AGloop ~ -TASloop is the change in free energy due to the closure of the loop concurrent with helix formation.

340 A. Fern(mdez, A. Colubri/Physica A 248 (1998) 336-352

3. Folding at a semiempirical microscopic level

The vast gap between the time scales accessible to molecular dynamics compu- tations, typically in the range l ps-10ns, and those inherent to transitions between contact patterns (CPs), typically in the range 1 las-103 s, suggests the need for a semiempirical model. Thus, the problem becomes how to incorporate effective in- ternal degrees of freedom of the chain whose dynamics translates or projects onto sequences of CP transitions. The aim o f this section is to introduce a matrix where

local torsional states o f the chain are codified in a simplified fashion, so that pat- terns o f locally encoded structural signals may be recognized and translated as

B P e s [8]. Since conflicting possibilities may arise yielding different evaluations, a probabilistic

approach appears to be necessary. A realization of this concept is introduced in this work and materializes in a semiempirical model which deals mechanistically with the rich dynamic hierarchy of time scales determined by the different levels of structural resolution of the folding process and their interplay.

Thus, the exploration of conformation space results from the parallel occurrence of trails of folding events. The base-pairing matrix (BPM) evolving in the 1 ms - 103 s time scale and built upon the map of Watson-Crick (W-C) antiparallel complemen- tarities is generated by a search for consensus windows in a dihedral spin-state matrix (DSSM) which records the phases of 3N two-state oscillators evolving within a period range 10-1°-10 -5 s. The DSSM represents a coarse microscopic realization of the CP represented by the BPM such that a consensus window reflects the fulfillment of the structural constraints determined by the putative intrachain contact: Specific dihedral torsions are in the "correct state" required for contact formation. In turn, this puta- tive contact involves regions of the chain which flank the consensus window. The way in which the DSSM is translated into the BPM must be re-prescribed after each BPM transition, since consensus evaluation depends on the last CP generated. Thus,

long-range correlations are introduced in the D S S M by means o f a renormalization operation.

This sketch of the operational tenets reveals that, although advantage is taken of the fact that there exists a wide separation between characteristic time scales associated to folding events (typically in the range 1 ms-103s) and internal backbone motions, essentially realized as dihedral torsions (typically in the range 10-z°-10-5 s) [ 9-11], no "adiabatic assumption" subordinating or enslaving microscopic degrees of freedom to CP transitions is introduced.

No treatment of the RNA folding problem in vitro can be satisfactory without ac- tually incorporating the effect of the solvent and Mg(II) ions on the entropic cost associated to forming intramolecular contacts [6,12]. Thus, the role of the solvent en- vironment and of the divalent Mg(II) ions upon the consensus evaluation must be in- corporated: The solvent effects determine a different evaluation operation depending on the size of the loop [12], while the chelating role of Mg(II) on adjacent charged phos- phate groups of the RNA backbone couples the dihedral oscillators for the backbone

A. Ferndmdez, A. Colubri/Physica A 248 (1998) 336-352 341

torsions engaged in the orientation of the two phosphate groups which serve as ligand

donors. To summarize, the basic representational and operational tenets of the computational

design for the semiempirical microscopic model are:

(a) A DSSM matrix simultaneously recording the state of each "vertebral" and phos- phate orientation dihedral. Each effective torsional state is generated by a two-state oscillator whose period is chosen from Gaussian temperature-dependent distributions which are different depending on whether the nucleotide is free or engaged in an intrachain contact and its concurrent loops. A new period is chosen for an indi-

vidual oscillator after completion of the previous period, thus incorporating thermal fluctuations.

(b) A Watson-Crick map (WCM) of antiparallel windows upon which the BPMs are built by translation of the information encoded in the DSSM.

(c) A built-in internal clock C, incorporated in order to synchronize the DSSM generation timing and its reading and subsequent evaluation at regular intervals. In accord with Shannon's information theory, the fixed beat period of C must be at most

half of the shortest dihedral period. (d) The folder F which evaluates the DSSM and identifies consensus windows. This

operation is matched with the WCM to generate BPMs and ponders the interrelationship

between "vertebral" and phosphate-orientation consensus. In addition, F may dismantle or relax contact regions in the BPM whenever a significant consensus bubble arises

within a previously formed consensus window. (e) The operation of F is feedbacked within a renormalization loop into the DSSM

generator, and the new evaluation of consensus windows is renormalized or prescribed

according to the last BPM generated. Taking into account that folding materializes in a statistical ensemble of

RNA molecules and that conflicting consensus evaluations demand a probabilistic ap- proach, we shall adopt an appropriate output representation: The statistical dynam- ics of consensus search is defined by the time evolution of a base-pairing prob- ability matrix (BPPM), representing the weighted overlap of different consensus

evaluations. The essential operation in our parallel algorithm, the folding operation, lies within

a renormalization feedback loop and consists in the translation of information casted

in terms of the state of internal microscopic degrees of freedom into a coarser repre- sentation, a CP defined by a BPM. The latter matrix is in tum built upon the WCM. Thus, the algorithm and its underlying semiempirical model identify the CP class to which a DSSM belongs at certain time intervals according to a prescribed set of rules. The generation of the DSSM is in turn affected by the last CP transition that has

taken place, since a new set of constraints arises with each new CP formed, and the prescription for the evaluation operation itself is renormalized and thus depends on the last CP which has occurred. Prior to defining the folding process explicitly within our model, we must specify the basic representational elements and their interrelationships with regard to the basic operations:

342

(a~

A. Ferndndez, A. ColubrilPhysica A 248 (1998) 336-352

(b) V-V-V-A-V-V-V Fig. 1. Schematic representation of vertebral or row (1) consensus: When the vertebrae are correctly po- sitioned, as in (a), loop closure becomes readily feasible, while it is precluded except for large loops (N(loop)>> 17) if vertebral consensus is not reached, as in situation (b).

3.1. The D S S M

This 3 x N matrix is a coarse representation o f a microscopic realization o f a CP.

Each entry adopts the values 1 and 0, representing two significant states o f an RNA

backbone torsion localized in a specific nucleotide nt. The entry values are generated

by two-state oscillators, one for each entry, whose period z is automatically adjusted

after one whole period has been completed from a fixed T-dependent distribution w(z)

according to renormalization specifications detailed below. Each column in the DSSM

represents a different nt, with the ith column (1 <<,i<~N) corresponding to the nt with

contour value i along the chain. Each of the three rows represents a different reading

space and their interrelationships are pondered each time the DSSM is translated into

the BPM, according to the size o f the consensus window.

Each entry in the first row, denoted (1), indicates the dihedral spin state for a back-

bone "vertebral" torsion, as schematized in Fig. 1. A consensus window of consecutive

spins in, say, state 1 in row (1) is a necessary condition for closure of a loop com-

prised o f the sequence o f nt 's within the associated contour window. The physical

interpretation o f this vertebral consensus as a necessary constraint for loop closure is

schematized in Fig. 1. Notice that vertebral consensus is not directly related to any geometric curvature condition for loop closure, which would make it impossible to

have consensus subwindows flanked by W - C complementary regions (it would be im-

possible to satisfy at the same time and in the same region of the chain geometric constraints for loop formation involving the whole region and those for any smaller loop involving a subregion). On the other hand, the possibility of different conflic- tive consensus evaluations is perfectly compatible with the probabilistic nature o f our model. Depending on the size o f the window, the existence o f vertebral consensus at

the time of a reading with flanking regions within the WCM may lead to the BPM recording of an intra-chain long-range contact formation between the nt 's flanking the consensus window. I f N(loop) indicates the size o f the consensus window, we may

A. Fern~ndez, A. Colubri/Physica A 248 (1998) 336352 343

state that the necessary condition becomes sufficient if and only if N( loop)> Nc = 17

(cf. [12]), as shown below.

The entries in the second and third rows, denoted (2) and (3), in the DSSM indicate the dihedral spin states of those backbone torsions engaged in orienting the negatively charged phosphate group. Thus, for the sake of convention, spin state 1 in rows (2) and (3) for column i indicates that the phosphate of the ith nt faces bulk solvent whenever

this nt is a part of an intramolecular loop.

3.2. The interrelationship amon9 the rows of the D S S M

To specify the interrelationships between these different reading spaces in consen- sus evaluation, we first define an intramolecular ( i , j ) contact with i < j as the W-C base-pairing engaging the two nt 's with contour values i and j . The occurrence of the (i , j) contact is marked by a 1 in the /j-entry of the triangular BPM. Suppose the initial CP corresponds to the random coil, i.e., there are no intramolecular con-

tacts, and N(loop) = IJ - (i + 1)1 > 17 (cf. Ref. [12]). Then, once a reading of the

DSSM takes place, an (i , j) contact will be produced after evaluation and recorded as such in the new BPM if all dihedral "vertebral" spin states for the segment of row (1) flanked by entries i and j are in the correct torsional conformation for folding,

i.e., they are in state 1 (cf. Fig. 1). The fact that the range of the putative ( i , j ) in- teraction must be larger than 17 to materialize with vertebral consensus only can be justified as follows: Loop closure defines two solvent domains, an inner low-dielectric domain confined by the loop of rod-like dimensions and an outer high-dielectric or bulk-like domain. If the loop is sufficiently large, encompassing more than 17 nt's, as demonstrated in Ref. [12], the dielectric difference between the inner and outer do-

main becomes negligible: Each charged phosphate group pointing to the inside of the loop admits four water-solvation layers for a loop of size 17 or larger. However, tbr

smaller loops the consensus demands are higher and vertebral consensus is no longer sufficient. The phosphate groups must be oriented towards the bulk for better solva- tion. This argument leads us to qualitatively distinguish rows (2) and (3) from row (1): as more orientational constraints are associated to loop closure, more consensus is needed for it to materialize, so that for a putative loop smaller or equal to 17, that

is I J - (i + 1)1 ~< 17, consensus windows in rows (1)-(3) , flanked by columns i and j becomes the necessary and sufficient condition.

3.3. The dihedral frequency distributor (DFD)

In attempting to project dihedral torsions onto the CP space, where the folding process is conventionally recorded, the time scale limitations of molecular dynamics simulations must be circumvented. This explains the need to introduce models, such as the one presented in this work, in which activated molecular motions in the ns to las range are considered [9-11] . Thus, faster diffusional-like unhindered torsions, such as the torsion around the glycosidic sugar-base bond in an unpaired nt are integrated

344 A. Ferndmdez, A. ColubrilPhysica A 248 (1998) 336-352

out as conformational entropy of the state defined by the DSSM representation. Such

motions, well into the ps range, determine the rod-like shape of the RNA molecule when viewed within the time scale window between two DSSM states. Precisely, this

interrelation between shape and time scale justifies the concept of inner and outer solvent domain defined by an intramolecular loop, as put forth in our consensus analysis of the entropic cost of loop closure. Thus, the microscopic mean time range relevant to

DSSM transitions is 1 ns-l~ts, covering the timescale for internal motions (1-10ns), of the order of the calculated diffusional displacements of flexible hinged domains [10], and, at the other end of the spectrum, the limiting value ( 1 - 1 las) for a localized

helix-unwinding event leading to a bubble within a helix [11]. These considerations lead us to define a temperature-dependent normalized distribu-

tion of periods, w = w(z). In particular, the periods of unhindered dihedral oscillators are assigned from this distribution in such a way that the effect of thermal fluctuations

on the formation of consensus and thus, on structural transitions is incorporated. The distribution has three Gaussian peaks, each with dispersion tr 2 = gT, where the con- stant g depends on the actual denaturation temperature T (denat.) and on the consensus interpretation of denaturation, as shown below. The peaks occur, respectively, at mean periods 1 ns, lOOns and 1 las. This distribution allows us to classify nt 's in two classes:

to class I belong all nt's with mean dihedral period 1 ns, while class II contains all nt 's whose mean dihedral period is either lOOns or 1 ~ts. The first class corresponds to internal dihedral torsions of the RNA chain of the type probed by fluorescence depo-

larization [10]. These torsions occur in free nt's, that is, unpaired nt's not belonging to a loop. Accordingly, the DFD in the folding machine establishes a lottery from which periods of dihedral spin oscillators for free nt's are assigned from within the period

range centered at 1 ns. A new period is assigned from the lottery to each oscillator each

time a whole previously assigned period has been completed. The frequency f = 1/'r of a dihedral spin in any row of the DSSM corresponding to an nt not engaged in an intra-chain interaction or loop satisfies the inequality

[ f - I _ 1 ns[ ~< [z' - 1 ns[, with z' satisfying : w(z ' ) = Infimum~ {w(z) ~> 1/3N}.

(6)

The condition yielding the extreme period ~' arises from the fact that there are at most 3N free oscillators in the chain.

The other two peaks in the distribution correspond, respectively, to mean periods for nt 's engaged in an A-U or G-C Watson~2rick base-pair within a helix, or to nt's within loops. In the latter case the period range covers the entire bimodal distribution. Again, the same considerations apply with regards to the period assignment to oscillators for class II nt's. These rules imply that the nt 's in loops concurrently formed with an intrachain helix adopt the same cadence as the helix nt 's themselves. This is so, since the rate of helix dismantling is exclusively dependent on the size of the helix taken by itself [4,13], and determined by the formation of a significant consensus bubble amongst the class II nt's engaged in the helix. Furthermore, this local limiting event is fairly independent of concurrent microscopic events in the associated loops.

A. Fernandez, A. ColubriIPhysica A 248 (1998) 336~52 345

The mechanistic aspects of period distribution, as performed by the DFD imply that

this operation is subject to renormalization with each CP transition: A CP determines which columns in the DSSM correspond to free or class I nt's in the chain and which correspond to nt 's engaged in an intrachain interaction, or, equivalently, they belong to class II. Thus, since a CP transition reclassifies the nt's, it also dictates the range from which new period assignments are drawn. The period range for a specific nt remains the same as before the CP transition if the transition does not alter the class of the nt, and changes if the CP transition transfers the nt to a different class.

3.4. The built-in internal clock C

In order to satisfy the basic tenets of Shannon's information transmission while making all operations C-synchronized, the intrinsic beat period of C should be taken

1 t to represent ~ , one-half of the shortest possible period to be assigned to a dihedral spin oscillator. Since r / depends on w(z), it is itself T-dependent. The time interval

between two consecutive readings of the DSSM, ,(read), is taken as constant and, in order to lower the operational cost, it is fixed at the shortest time that a CP transition could possibly take. Thus, , ( r e a d ) = 23×3rr, the shortest time to form the 3-loop, the

smallest possible loop, engaging the fastest oscillators.

3.5. The jolder F

The actual translation of the consensus evaluation of the DSSM into the BPM is performed by the folder F, with the aid of an updated version of the reading instructions manual (RIM). An updating takes place with each CP transition marked by a change in the BPM. The folder might form or dismantle(relax) several intrachain helices in

parallel and records in the BPM the consensus evaluation performed with the aid of

the RIM, as indicated in the schemes displayed in Fig. 2. The consensus search or evaluation under unconstrained conditions, i.e., with a RIM defined by the random coil CP, has been partially delineated in (a) for helix formation. The flow chart for this

parallel operation is displayed in Fig. 3. On the other hand, helix dismantling materializes and is recorded as such by dele-

tion in the BPM whenever a consensus bubble forms amongst class II nt's engaged in

base-pairing. By "consensus bubble" we mean that in any of the three rows, a con- secutive sequence within the set of dihedral spins of helix nt's of length 30% of the total helix length must be out of phase with the consensus value 1. In other words, the sequence within the helical region must adopt the state 0 at the time when the reading of the DSSM takes place. Because of the renormalization loop, this transition at the BPM level immediately transfers a new set of constraints for the generation of new DSSMs. The nt's previously engaged in the helix and in the concurrent loops are reclassified, being transferred from class II to the higher-frequency class I.

Stacking effects [14] reflect themselves mechanistically in the formation of the con- sensus bubble: The larger the helix, the more improbable to find a 30% out-of-phase

346 A. Fern(mdez, A. ColubrilPhysica A 248 (1998) 336-352

The folding machine

(a)

DSSM +

RIM c°nsensus I~ BPM I

parallel operation

(b) frequency redistribution +

+ renormalization

Fig. 2. (a) General scheme of the folding machine featuring its basic components: the inherent clock C, the dihedral frequency distributor DFD, the folder F, the output displayer O and the renormalizer R. (b) General scheme of the interrelation between the different representational elements: the dihedral spin-state matrix DSSM, the reading instructions manual RIM and the base-pairing matrix BPM.

subsequence of oscillators from class II. Furthermore, these considerations enable us to estimate the constant g which determines the effect of thermal fluctuations on the pe- riod distribution: At the denaturation temperature T (denat.), every helix formed must

develop a consensus bubble evaluated and recorded with the next reading of the DSSM. Thus, if tr is "large enough", the period distribution in the helix is broad enough so

that consensus cannot be preserved: The period range, of the order of a, is such that

a helix consensus cannot survive two consecutive readings. From these considerations, and taking into account our empirical estimate of the denaturation dispersion fixed at a = 0.6 ItS, and the typical experimental T (denat.) = 312 K for ribozymes such as the ones studied in this work [14], we get g ~ 1.2 x 10 -15 s2/K.

3.6. The renormalizer R

The renormalizer has two simultaneous roles: it updates the RIM by determining contour distances relative to the latest BPP generated, and, by readjusting frequencies, it places a new set of constraints on the generation of new DSSMs based on the latest CP translated. Thus, renormalization accounts for long-range correlations that develop

on the D S S M as a consequence of folding events recorded as CP transitions.

A. Fernfndez, A. Colubri/Physica A 248 (1998) 336-352

No No / ~

(" consensus .~yes

347

Fig. 3. The flowchart of the folding operation as a consensus evaluation after reading the three rows (1)-(3) of the DSSM. The following notation has been adopted: "consens. (1)?" and "consens. (2) and (3)?" refer, respectively, to finding a consensus region on row (1) or finding it on rows (2) and (3) on the DSSM; "W~S?" refers to finding Watson-Crick complementarity in the regions flanking the consensus region and "N ~>Nc?" refers to deciding whether the size N of the consensus region is larger than the critical size Nc.

4. A Lagrangian defined at the semiempirical microscopic level

Foldings o f the RNA chain have been identified as pattems o f locally encoded signals

whose generation, translation and renormalization have been studied in the previous

section. Since structural patterns have been identified with consensus windows within

which local topological constraints are fulfilled, it follows that the preferred folding

pathway resolved as a sequence of CP-transitions is the one for which each transition

entails the minimal cost in conformational entropy, while maximizing the enthalpy

decrease. This BPP-pathway is the easiest to form, and therefore, the most probable as

it entails the maximum economy of means for each step. This is so since consensus

as identified in a DSSM, implies the fulfillment o f local topological constraints which,

in turn, determine the CP and, on the other hand, the enthalpic content is exclusively

dependent on the contacts which have materialized. Thus, our semiempirical approach

is perfectly compatible with the SMEL principle [4,13].

In view of these facts, the following problem arises: is it possible to define a

Lagrangian L at the coarse microscopic level o f DSSM-transit ions which reproduces

the SMEL behavior when a DSSM-trajectory is translated as a sequence of CP-

transitions? According to our treatment, the solution to this central problem involves

determining L = K - H , where H is the enthalpy content, or potential in our context,

3 4 8 A. Ferndmdez, A. Colubri/Physica A 248 (1998) 336-352

and K is the "kinetic energy" of a DSSM, K = K(DSSM), which must satisfy the

following conditions: (A) It must be uniquely defined for all DSSMs associated to a specific CP, since

otherwise many DSSM-trajectories, amongst which is the one singled out by L, would

translate into the SMEL pathway. (B) A CP-transition is favored if it entails a minimal entropic cost, therefore its

corresponding DSSM-transition must entail a minimization of the kinetic energy loss.

In accord with conditions (A) and (B), we define

1 K(DSSM) = g Z f/2, (7) i = I , . . . , 3 N

where the f i 's , given in Hz, are the mean frequencies of the oscillators of the DSSM and hence, H must be given in Hz 2 units.

The choice given in (7) is a good one, as we shall demonstrate making use of the fact that the oscillator frequencies of the DSSM are determined by the renormaliza- tion operation. Let us suppose that a pattern of local signals has been identified and recorded in the BPM at a given time by the translation operation. Then, the renor- malization operation reclassifies the residues engaged in the latest CP generated, from class I into class II. The reader is reminded that nts. belonging to loops whose clo- sure is concurrent with interactions adopt the same cadence of those nts. that form the respective contacts. Thus, the loss in conformational entropy associated to the CP

transition is corresponded with a loss in kinetic energy o f the D S S M in such a way

that a minimization o f the entropy loss is corresponded with a minimization o f the

loss in kinetic eneryy. This implies that the trajectory which maximizes the L-action

is the one that, projected onto CP-space, represents the S M E L behavior.

On the other hand, the renormalization operation warrants the uniqueness of K for every DSSM which translates into the same CP, since the renormalization operation establishes the new frequencies exclusively on the basis of the latest CP formed. By virtue of the renormalization operation, the occurrence of intrachain contacts is reflected

as a lowering of oscillator mean frequencies. This is the reason why the Lagrangian

L underlies the coarse microscopic behavior which shows up at the level o f CP tran-

sitions as the S M E L principle.

With regard to the T-dependence of K, we must take into account that at the denatu- ration temperature T = T(denat.), consensus bubbles are created which dismantle every CP, as indicated in Section (3). This is the result of a broad spreading of frequen- cies beyond the critical spreading value and the concurrent impossibility of preserving phase in consensus windows, as indicated in Section (3). Thus, the renormalization operation near criticality will speed up all the oscillators for consensus windows where critical-size bubbles occur, while such bubbles will become more and more ubiquitous as T approaches T(denat.). Then, K increases with T and the following relation holds as T approaches T(denat.) from below:

limit K = Maximum K - N x 1022 H z 2 . ( 8 ) T ~ T(dena t . ) -

40

30

20

10

I I

3 5

Iogl0#0t-p loops)

A. Fernlmdez, A. Colubri/Physica A 248 (1998) 336-352 349

I

7

Fig. 4. Plot of the Hamming distance (x 100) between consecutively evaluated DSSMs along the folding pathway generated by iterating ~z p translation-renormalization loops.

To conclude, we may state that the most economic folding pathway resolved at the coarse microscopic level, y*, satisfies the following classical dynamics result:

L-action(y* ) = Maximum L-action(y), (9) all DSSM--trajectofies y

where "L-action(y)" denotes the sum of all the L(DSSM) terms for all DSSMs along the trajectory y in the previously alloted time interval.

Without loss of generality, we may assume that there exists a positive constant q so that q + L = U > 0. We may define a stochastic process E at the semiempirical microscopic level of DSSM transitions by defining transitional probabilities with respect to U:

p(V --~ W) = L'(V ~ W)/ ~ L'(V ~ W'), (10) W'

where V, W, W ~ are DSSMs, and the sum is extended over all W' states accessible from V within a pre-determined Hamming distance d = d(max) between DSSM matrices (see Fig. 4). This distance is defined as

d(V, W) = (½N) ~ Aij(V, W), i,j

Aij(V,W) = 1 if Vij ¢ Wij, and = 0, otherwise. (11)

Prior to carrying out the stochastic computation using the Lagrangian, we determine the time dependence of d(max) using the basic operational tenets defined in Section (3). A D - t plot (D = 100d(max)) has been generated by performing 107 (7c-p loops) of

the type indicated in Fig. 2, where n denotes translation and p renormalization. Thus, D is determined by computing after every z(read) interval the distance between two

consecutive DSSMs evaluated through two consecutive n-operations. A computation for the functional RNA TtLSU (long splicing unit of Tetrahymena), is displayed in Fig. 4. This intensively studied RNA is a prototype ribozyme or catalytic RNA of the

so-called group I [14,15].

5. Results

O

The previously identified Lagrangian L enables us to account for the microscopic origin of the phenomenon of saturation of Shannon's informational content due to the folding process [2]. This thorough saturation has been inferred previously [2] using the

adiabatic ansatz. The microscopic treatment requires a comparison between the coarse informational entropy, aL(t), determined by the Lagrangian-induced stochastic process E and that determined by the adiabatic process 4. The former is identified as follows:

aL(t) = Z ~ztyn- 1 (A) ln[ntyrc- 1 (A)], (12) AEZ

where n is the translation or projection of a DSSM into its BPP class and 7 is the proba- bility measure determined by E over the space of DSSM pathways. The time-dependent behavior of both coarse information entropies, a and aL, are displayed in Fig. 5 for the species TtLSU indicated above.

An understanding of Fig. 5 requires the previous interpretation of Fig. 4. The analysis of Fig. 4 is carried out taking into account the time evolution of patterns of locally encoded structural signals determined by the time dependence of the DSSM. Within the range 10-103 (n-p) loops we observe a steady growth in the Hamming distance until the maximum value D = 30 is reached. This steady increase in pattem fluctuations

33

99

66_

I I I

3 5 7 Ioglo #(~-p loops)

350 `4. Ferndndez, A. Colubri/Physica .4 248 (1998) 336-352

rigorous

adiabatic

Fig. 5. Time dependence of the coarse information entropy relative to the partition Z. The abscissas indicate time in # (n-p) loops units. Each DSSM (g-p)-evaluation takes place at z(read) ~ 5 × 10 -8 s.

A. Fern4ndez. A. Colubri/ Physica A 248 (1998) 336-352 351

is primarily due to the large initial kinetic energy of the patterns, implying that the

phase of up to 30% of the oscillators does not survive two consecutive evaluations

(n-translations) o f the DSSM. The maximum in D is due to the initial formation of

"misfolded" short-range structures which are easily dismantled within the ~ts time scale, as indicated below.

Incipient helices formed in structure-nucleation events in the range 5 x 10 - 7 - 5 ×

10-4S are easily dismantled (a 30% consensus bubble is formed more easily than in

fully developed structures) with concurrent increase in kinetic energy of the DSSM pat-

tern due to class II ~ class I transitions in all nts. formerly belonging to the helical stem as well as its concurrent loops. This fragility o f incipient structures causes the large

pattern fluctuations marked by a large D in the range 5 z 10 7 _ 5 x 10 - 4 S, or 10 l 0 4

(rt p) loops. A large plateau starting at 5 x 10 -3 s (105 (re-p) loops) marks the forma-

tion o f a relatively stable kinetic intermediate which contains all structural motifs which

may form noncooperatively. That is, those motifs whose associated L-loops lie within

the favorable ranges o f low conformational entropy cost: 3 ~<L~< 14 or 18 ~<L ~< 100

[2,12]. At 5 x 10 -1 s, cooperative events lead to other helices whose loops have favor-

able renormalized sizes, while their sizes relative to the random coil are unfavorable.

On the other hand, the increase in class II nts. beyond the formation of the kinetic

intermediate stabilizes the patterns determining the survival of the oscillator phase in

consecutive DSSM evaluations. This determines the relatively low fluctuations beyond 5 x 10 IS.

At this point, we may examine Fig. 5 in which both informational entropies a and al~

are simultaneously displayed. The comparison is valid since both entropies are defined

relative to the same partition Z. For short time scales 5 x 10 - 7 - 5 × 10 - 4 S, fast-evolving

internal degrees of freedom simulated as torsional oscillators are not yet enslaved or entrained by CP transitions which evolve within typical time scales 10 - 4 - 1 0 3 s. For this

reason, within the range 5 x 10 -7 - 5 x 10 - 4 S, the level of exploration o f conformation

space due to uncorrelated or short-range correlated torsional excitations is vastly larger

than that resulting from an adiabatic process.

However, as soon as the stabilized kinetic intermediate is formed, the long-range

correlations coupling distant-row oscillators in the DSSM, begin to develop, as co-

operative effects occur upon short-range nucleating interactions. This long-range cor-

relations are in turn induced by CP transitions, in consonance with the nature o f the

renormalization operation. Thus, initial structure-nucleating steps involving uncorrelated

or locally-correlated motions do not demand as much enslavement o f fast-evolving tor-

sions as cooperative events, which entail long-range correlations. For this reason, we

expect the adiabatic approximation to fit the rigorous results as soon as long-range correlations governed by CP transitions occur beyond 5 x 10-3s ~ 105 ( n - p )

loops. This is precisely what is observed in Fig. 5, where both the rigorous and adiabatic

plot coincide within the range 10 - 3 - 50 s. This rigorous result supports the validity of

the adiabatic approximation within relevant foldin9 time scales marked by the emer- gence of long-range correlations. Furthermore, the information content as derived from

352 A. Ferndndez, A. Colubri/Physica A 248 (1998) 336 352

the evolution of patterns at the semiempirical microscopic level reaches its absolute maximum within experimentally relevant time scales [2,6].

These results indicate that the expediency of the folding of a natural biopolymer, a central issue in molecular biophysics, stems from the Lagrangian structure of a ju- diciously chosen coarse-grained version of the long-time dynamics of chain torsions. This Lagrangian structure is then inherent to a semiempirical model which, by simpli- fying locally encoded structural signals, enables one to bridge the gap between typical molecular dynamics time scales and the time scales relevant to meaningful folding events.

Acknowledgements

A.F. is the principal investigator of the National Research Council of Argentina (CONICET). Partial financial support for this work was provided by the J.S. Guggen- helm Memorial Foundation through a fellowship awarded to A.F. The authors thank Prof. Jerome Percus for valuable ideas and his hospitality during visits to the Courant Institute of Mathematical Sciences (New York University), where this research was first conceived.

References

[1] T.E. Creighton, Understanding protein folding pathways and mechanisms, in: L.M. Gierasch, J. King (Eds.), Protein Folding, Am. Assoc. Adv. Sci., Washington, 1990, pp. 157-170.

[2] A. Fernandez, Physica A 233 (1996) 226. [3] R.L. Baldwin, Proc. Nat. Acad. Sci. USA 93 (1996) 2627. [4] A. Fernandez, H. Arias, D. Guerin, Phys. Rev. E 52 (1995) R1299. [5] K.A. Dill, K.M. Fiebig, H.S. Chan, Proc. Nat. Acad. Sci. USA 90 (1993) 1942. [6] P. Zarrinkar, J. Williamson, Science 265 (1994) 918. [7] J.A. Jaeger, D.H. Turner, M. Zuker, Proc. Nat. Acad. Sci. USA 86 (1989) 7706. [8] M. Guenza, K.F. Freed, J. Chem. Phys. 105 (1996) 3823. [9] C. Cantor, P. Schimmel, Biophysical Chemistry, vols. I-III, Freeman, New York, 1980.

[10l C. Brooks III, M. Karplus, B. Montgomery Pettitt, Proteins: a theoretical perspective of dynamics, structure and thermodynamics, in: Advances in Chemical Physics, vol. LXXI, Wiley, New York, 1988.

[11] A. Fernandez, Z. Phys. B: Condens. Matter 79 (1990) 255. [12] A. Fernandez, G. Appignanesi, H. Cendra, Chem. Phys. Lett. 242 (1995) 460. [13] A. Fernandez, G. Appignanesi, Phys. Rev. Lett. 78 (1997) 2668. [14] R.F. Gesteland, J.F. Atkins (Eds.), The RNA World, Cold Spring Harbor Press, New York, 1993. [15] F. Michel, E. Westhof, J. Mol. Biol. 216 (1990) 585.