A parallel tabu search for conformational energy optimization of oligopeptides


Post on 06-Jun-2016


<ul><li><p>A Parallel Tabu Search for Conformational Energy Optimization of Oligopeptides</p><p>L. B. MORALES,<sup>1</sup> R. GARDUÑO-JUÁREZ,<sup>2</sup> J. M. AGUILAR-ALVARADO,<sup>3</sup> F. J. RIVEROS-CASTRO<sup>3</sup></p><p><sup>1</sup>IIMAS-UNAM, Apdo. Postal 70221, 04510, México D.F.</p><p><sup>2</sup>CCF-UNAM, Apdo. Postal 483, 62250 Cuernavaca, Morelos, México</p><p><sup>3</sup>FC-UNAM, Circuito Exterior, Ciudad Universitaria, Coyoacán, 04510, México D.F.</p><p>Received 8 June 1998; accepted 16 September 1999</p><p>ABSTRACT: We have developed and implemented a tabu search heuristic (TS) to determine the best energy minimum for oligopeptides. Our test molecule was Met-enkephalin, a pentapeptide that over the years has been used as a validation model for many global optimizers. The test potential energy function was ECEPP/3. Our tabu search implementation is based on assigning integer values to the variables to be optimized, and on facilitating the diversification and intensification of the search. The final output from the TS is treated with a local optimizer, and our best result competes both in quality and CPU time with those reported in the literature. The results indicate that TS is an efficient algorithm for conformational searches. We present a parallel TS version along with experimental results that show that this algorithm allows significant increases in speed. © 2000 John Wiley &amp; Sons, Inc. J Comput Chem 21: 147–156, 2000</p><p>Keywords: protein structure prediction; heuristic algorithms; protein folding; conformational space search; tabu search</p><p>Correspondence to: R. Garduño-Juárez; e-mail: ramon@ce.fis.unam.mx</p><p>Contract/grant sponsor: PAPIIT-DGAPA-UNAM; contract/grant number: IN-102397</p><p>Contract/grant sponsor: CRAY-UNAM Research Fund; contract/grant number: SC-002195</p><p>Contract/grant sponsor: CONACyT; contract/grant number: 25245-E</p><p>Journal of Computational Chemistry, Vol. 21, No. 2, 147–156 (2000) © 2000 John Wiley &amp; Sons, Inc. 
CCC 0192-8651 / 00 / 020147-10</p></li><li><p>MORALES ET AL.</p><p>Introduction</p><p>Peptides are short polymers made up of a few to a few tens of amino acids. Many of these have meaningful roles in biochemistry and biophysics. Some sequences of peptides have a clear tendency to form well-defined three-dimensional structures, that is, to fold. Peptides are also useful as model systems for much larger peptide chains known as proteins. The naturally occurring three-dimensional structure of a protein, its tertiary structure, is believed to be uniquely determined by its primary structure, the sequence of amino acids of which the protein is composed. Anfinsen,<sup>1</sup> in his thermodynamic hypothesis, proposes that the native state of a protein is the structure that minimizes the free energy. By definition, such a state would be at the global minimum of free energy relative to all other states accessible on that time scale. Thus, the conformational search, or folding, can be posed as an optimization problem.</p><p>Conformational search of peptide molecules, to a first approximation, can be thought of as the problem of finding the 3D molecular structure that corresponds to the lowest local minimum of an appropriate mathematical function describing the potential energy of the system. Computer simulations are often used to carry out this task. A major concern in computer simulations is to obtain a set of low-energy conformations with biological significance; that is, to find those conformations that are near the thermodynamic native state.</p><p>Folding a protein from only a knowledge of its amino acid sequence is a formidable task. Because it is computationally impossible to test all possible conformations to determine the global minimum, it is necessary to develop methods that can land upon a global minimum without testing all conformational possibilities. This is a challenging optimization (minimization) task. 
In many cases the detailed properties of the potential function to be minimized are not known. Even if the function is differentiable, one can often encounter nonconvex surfaces, and the local properties of the function can differ between search regions; i.e., the basins can have different sizes or depths, the smoothness can vary, etc.</p><p>Many different force fields for proteins have been designed as a summation of a set of potential energy contributions. Among the most used ones are: ECEPP,<sup>2</sup> MM2,<sup>3</sup> ECEPP/2,<sup>4</sup> CHARMM,<sup>5</sup> DISCOVER,<sup>6</sup> AMBER,<sup>7</sup> GROMOS87,<sup>8</sup> MM3,<sup>9</sup> and ECEPP/3.<sup>10</sup> Most of these have a large number of local minima. In general, protein folding with any force field is an NP-hard problem,<sup>11</sup> where the time needed to locate the lowest minimum grows exponentially when the number of variables grows linearly. A major challenge in this type of global optimization problem is that there is no clear mathematical basis for efficiently reaching the global minimum; thus, finding it accurately and quickly is of general interest.</p><p>To reduce the size of the problem one takes advantage of the fact that under biological conditions some internal motions of protein molecules occur on a time scale much smaller than others. 
Experimentally, the average values of covalent bond distances and covalent bond angles are fairly constant, which leads to the assumption that conformational changes observed in the dihedral angles could fully determine the overall shape of the protein molecule. Thus, if one specifies the position of all atoms in the protein molecule as a function of its internal coordinates, under the assumption of constant bond lengths and bond angles, the number of degrees of freedom of the problem is drastically reduced.</p><p>Although the size of the problem can be reduced when the energy function is written in terms of torsional angles, it is known that in this form the energy function is no longer partially separable, meaning that it is no longer much less expensive to reevaluate the energy if only a few variables change than if they all change. To overcome this effect, a number of workers have devised interesting stochastic and nonstochastic methods, which impose constraints and bias the search towards the region where the global minimum could be found. Among stochastic methods employed to predict oligopeptide 3D structures are Monte Carlo with minimization<sup>12a,12b</sup> (MCM), simulated annealing<sup>13</sup> (SA), threshold accepting<sup>14</sup> (TA), free energy Monte Carlo with minimization<sup>15</sup> (FMCM), multicanonical ensemble<sup>16</sup> (ME), conformational space annealing<sup>17</sup> (CSA), and genetic algorithms<sup>18</sup> (GA). Among nonstochastic methods we find molecular dynamics with minimization<sup>19</sup> (MDM), dynamic programming<sup>20</sup> (DP), the diffusion equation method<sup>21a</sup> (DEM), the mean-field technique<sup>22</sup> (MFT), and a global optimization procedure known as αBB.<sup>23</sup></p><p>In this article we take an approach to minimize the ECEPP/3<sup>10</sup> energy function based on tabu search<sup>24</sup> (TS), a stochastic optimizer developed to treat complex combinatorial optimization tasks. Tabu search has the advantage that only function values are used, differentiability and continuity are not required, and it is characterized by the use of memories during the search. In the operations research literature, tabu search has proven to be better than simulated annealing, both in the CPU time required and in the quality of the solutions found, for many complex problems. Our test molecule is Met-enkephalin, a pentapeptide chosen because it has been used as a validation model for many global optimizers and because its lowest-energy conformation for the ECEPP/3 potential energy function is known.<sup>23</sup> We first present the problem we are dealing with in a mathematical fashion, then we discuss the general principle of the tabu search heuristic and explain how to use tabu search for conformational search. Finally, we present our computational results and compare our best results with those reported by other authors who have employed methods different from TS. We propose the use of a parallel version of TS to reduce the CPU time needed to find the lowest energy structure. This concurrent version was developed for parallel computing on an SGI Origin 2000 computer, with close to ideal speedup.</p><p>The Problem</p><p>As indicated above, the conformation of a protein with a sequence of \(N_{\mathrm{res}}\) amino acid residues in the peptide chain can be described by a set of backbone dihedral angles \(\phi_i, \psi_i, \omega_i\), where \(i = 1, \ldots, N_{\mathrm{res}}\), plus a set of dihedral angles \(\chi_i^{j}\), \(i = 1, \ldots, N_{\mathrm{res}}\), \(j = 1, \ldots, J_i\), where \(J_i\) denotes the number of dihedral angles of the side group on the \(i\)th residue. If one wishes to allow capping of the peptide, then one has to include two more sets of dihedral angles. One could be defined as \(\theta_k^{N}\), \(k = 1, \ldots, K_N\), for those dihedral angles on the amino end group, and the other could be defined as \(\theta_k^{C}\), \(k = 1, \ldots, K_C\), for those dihedral angles on the carbonyl end group.</p><p>In this report the complete ECEPP/3<sup>10</sup> force field was used. 
This force field is built upon the assumptions that the bond lengths and angles are at their equilibrium values, and that the resulting function is in reality a conformational energy surface made of a summation over interactions of types 1–4 and higher. These interactions take into account electrostatic, nonbonded, hydrogen bond, and torsional energies, plus other empirical terms that take into account a loop closing potential in the case that the peptide has intramolecular disulfide bonds, and fixed conformational energies for the prolyl and hydroxyprolyl residues. A condensed description of the ECEPP/3 force field could be written as:</p><p>\[ U = U_{\mathrm{elec}} + U_{\mathrm{nonb}} + U_{\mathrm{hb}} + U_{\mathrm{tor}} + U_{\mathrm{loop}} + U_{\mathrm{S\text{-}S}} \]</p><p>where</p><p>\[ U_{\mathrm{elec}} = \sum_{i \neq j} \frac{332.0\, q_i q_j}{D\, r_{ij}} \]</p><p>\[ U_{\mathrm{nonb}} = \sum_{i \neq j} \left( \frac{F A}{r_{ij}^{12}} - \frac{C}{r_{ij}^{6}} \right) \]</p><p>\[ U_{\mathrm{hb}} = \sum_{h} \sum_{x} \left( \frac{A_{hx}}{r_{hx}^{12}} - \frac{B_{hx}}{r_{hx}^{10}} \right) \]</p><p>\[ U_{\mathrm{tor}} = \sum_{k} \frac{U_0}{2.0} \left( 1 \pm \cos n_k \theta_k \right) \]</p><p>\[ U_{\mathrm{loop}} = \sum_{l} B_l \sum_{i=1}^{3} \left( r_{il} - r_{i0} \right)^2 \]</p><p>\[ U_{\mathrm{S\text{-}S}} = \sum_{s} A_s \left( r_{4s} - r_{40} \right)^2 \]</p><p>All constants are estimated by fitting of experimental data.<sup>10,25</sup></p><p>Given these definitions, the potential energy minimization problem can be summarized as follows:</p><p>\[ \text{minimize } U\left( \phi_i, \psi_i, \omega_i, \chi_i^{j}, \theta_k^{N}, \theta_k^{C} \right) \]</p><p>subject to the particular constraints:</p><p>\[ -180^\circ \le \phi_i, \psi_i &lt; +180^\circ, \quad i = 1, \ldots, N_{\mathrm{res}} \]</p><p>\[ -10^\circ \le (\omega_i - 180^\circ) \le +10^\circ, \quad i = 1, \ldots, N_{\mathrm{res}} \]</p><p>\[ -180^\circ \le \chi_i^{j} &lt; +180^\circ, \quad i = 1, \ldots, N_{\mathrm{res}}, \; j = 1, \ldots, J_i \]</p><p>\[ -180^\circ \le \theta_k^{N}, \theta_k^{C} &lt; +180^\circ, \quad k = 1, \ldots, K_N, \; k = 1, \ldots, K_C \]</p><p>We have also included anticorrelations in the angles \(\psi_i, \phi_{i+1}\), as defined in a previous article,<sup>13</sup> to further reduce the search space.</p><p>Tabu Search</p><p>The tabu search heuristic is a search procedure for solving complex combinatorial optimization problems of a general type. 
It has been applied successfully to vehicle routing,<sup>26</sup> large traveling salesman problems,<sup>27</sup> job shop scheduling,<sup>28</sup> bridge club scheduling,<sup>29</sup> evaluation of chemical distance,<sup>30</sup> and protein conformation on a lattice model.<sup>31</sup> This procedure has also been extended to the optimization of continuous-valued functions,<sup>32,33</sup> one extension of which has been applied to molecular docking<sup>34</sup> of small molecules.</p><p>Generally speaking, TS can be designed to perform the following task: minimize f(x), subject to \(x \in X\), where f is a cost function and X is a set of feasible solutions.</p></li><li><p>Tabu search is an iterative process. It starts from an initial feasible solution and tries to reach a global minimum by moving from one solution to another. To accomplish this, we must define a set M of simple modifications that can be applied to a given solution to move to another solution. These modifications are called moves. The notation \(x' = m(x)\), \(m \in M\), indicates that m transforms x into \(x'\). This leads us to the definition of the neighborhood N(x), an ingredient common to most heuristic and algorithmic procedures for optimization. For each feasible solution x, the neighborhood N(x) is the set of all feasible solutions directly reachable from x by a simple move m in M. At each step of the iterative process, we generate a subset V of N(x) with j elements, and we move from x to the best solution \(x'\) in V, whether or not \(f(x')\) is better than f(x). If N(x) is not large, it is possible to take V = N(x). The method of examining the entire neighborhood N(x) is best for writing a parallel tabu code because it allows a comfortable balancing of the work load between the processors. 
To reduce the sampling size of V one can take the first move that improves the current solution (however, if there is no move that improves the current solution, then one has to examine all neighbors in V). In this way one can speed up the search, because the mean calculation time of a step is less than the one needed in the previous method; however, the evaluation time is not constant, and the steps taken are not as good.</p><p>Up to this point, the algorithm is close to a local improvement technique, except that we may move from x to a worse solution \(x'\), and, thus, we may escape from any local minima of f. To prevent cycling, a queue called the tabu list T, of length |T| = t, is provided. Its aim is to forbid moves between solutions that reinstate certain attributes of past solutions. After t iterations these attributes are removed from the list and are free to be reinstated. The tabu list is also called short-term memory, because it stores information about the t most recent moves.</p><p>In many TS implementations the short-term memory is complemented with a long-term memory, whose purpose is to diversify the search and to move to unexplored regions; its function is usually based on a frequency criterion. Other rules of diversification and intensification have been proposed in the literature to improve the search.<sup>35,36</sup></p><p>Unfortunately, the tabu list may forbid certain interesting moves, such as those that will lead to a better solution than the best found so far. To cancel the tabu status of a move when this move is judged useful, an aspiration criterion is also introduced.</p><p>Stopping rules must also be defined. In many cases a lower bound \(f^{*}\) of f is known in advance. As soon as we are close enough to this bound, or we have reached it, we may interrupt the algorithm. 
In general \(f^{*}\) is not available with sufficient accuracy, as is the case in this study; thus, the stop criterion is met either when a fixed maximum number of iterations is reached, or when a given maximum number of iterations has been performed without improving the best solution obtained so far.</p><p>Adaptation of Tabu Search</p><p>The most important points in the implementation of a general TS to our particular application are: the search space X, the cost function f, the neighborhood N(x), the method of choosing an initial solution, the length of the tabu list T, the aspiration criterion, and the stop criterion.</p><p>Let k denote the number of amino acid residues in the molecule. Let \(\phi_i\), \(\psi_i\), and \(\omega_i\) denote the dihedral angles in the skeleton corresponding to the \(i\)th amino acid residue, where \(i = 1, \ldots, k\), and let \(\chi_1, \ldots, \chi_m\) be the dihedral angles in the lateral chains. If \(\Theta\) denotes a set of integer angle values between \(-180\) and \(+180\), then any vector \(x = (x_1, \ldots, x_n) = (\phi_1, \psi_1, \omega_1, \ldots, \phi_k, \psi_k, \omega_k, \chi_1, \ldots, \chi_m) \in \Theta^{3k+m}\) determines a three-dimensional conformation of the molecule, and n is the total number of variables (in this case it is equal to 3k + m). For the intensification process (vide infra), the values contained in \(\Theta\) are multiples of 0.5 degrees.</p><p>Now we define the search space as:</p><p>\[ X = \left\{ x = (x_1, \ldots, x_n) \in \Theta^{3k+m} \;\middle|\; 180 - \varepsilon_0 \le \omega_i \le 180 + \varepsilon_0 \text{ and } -\varepsilon_1 \le \psi_i + \phi_{i+1} \le \varepsilon_1, \; i = 1, \ldots, k \right\}, \]</p><p>\(\varepsilon_0\) an...</p></li></ul>
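
The generic tabu search loop described in the transcript above (full-neighborhood evaluation, a fixed-length tabu list, an aspiration criterion, and the two stopping rules) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The cost function `toy_energy` is a hypothetical stand-in for ECEPP/3, and the names `wrap`, `random_start`, and `tabu_search`, the move set (changing one integer angle by `step` degrees), the tabu attribute (a variable index paired with a value it recently held), and all parameter defaults are assumptions made only for illustration.

```python
import math
import random
from collections import deque


def wrap(angle):
    """Map an integer angle in degrees into the interval [-180, 180)."""
    return (angle + 180) % 360 - 180


def toy_energy(x):
    """Hypothetical rugged cost function (NOT ECEPP/3); its minimum is 0 at all-zero angles."""
    return sum((1 - math.cos(math.radians(a))) + 0.5 * (1 - math.cos(math.radians(3 * a)))
               for a in x)


def random_start(n, seed=0):
    """Draw a reproducible random initial solution of n integer angles."""
    rng = random.Random(seed)
    return [rng.randrange(-180, 180) for _ in range(n)]


def tabu_search(f, x0, step=10, tabu_len=7, max_iter=500, max_no_improve=100):
    """Minimize f over vectors of integer angles; all defaults are illustrative assumptions."""
    x = list(x0)
    best_x, best_f = list(x), f(x)
    # Short-term memory: (index, value) pairs that may not be reinstated for tabu_len steps.
    tabu = deque(maxlen=tabu_len)
    no_improve = 0
    for _ in range(max_iter):  # stopping rule 1: fixed maximum number of iterations
        candidates = []
        # Examine the entire neighborhood N(x): one angle changed by +/- step degrees.
        for i in range(len(x)):
            for delta in (-step, step):
                new_val = wrap(x[i] + delta)
                y = x[:i] + [new_val] + x[i + 1:]
                fy = f(y)
                # Aspiration criterion: a tabu move is allowed only if it beats the best so far.
                if (i, new_val) in tabu and fy >= best_f:
                    continue
                candidates.append((fy, i, new_val))
        if not candidates:
            break
        # Move to the best admissible neighbor, even if it is worse than x.
        fy, i, new_val = min(candidates)
        tabu.append((i, x[i]))  # forbid restoring the old value of variable i
        x[i] = new_val
        if fy < best_f:
            best_x, best_f = list(x), fy
            no_improve = 0
        else:
            no_improve += 1
            if no_improve >= max_no_improve:  # stopping rule 2: no recent improvement
                break
    return best_x, best_f
```

A call such as `tabu_search(toy_energy, random_start(6))` returns the best vector found together with its cost. Because every neighbor is evaluated independently at each step, the inner loops are the natural place to split work across processors, which is the property the text cites in favor of examining the whole neighborhood.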
