# A Parallel Tabu Search for Conformational Energy Optimization of Oligopeptides



L. B. MORALES,¹ R. GARDUÑO-JUÁREZ,² J. M. AGUILAR-ALVARADO,³ F. J. RIVEROS-CASTRO³

¹IIMAS-UNAM, Apdo. Postal 70221, 04510, México D.F.

²CCF-UNAM, Apdo. Postal 483, 62250 Cuernavaca, Morelos, México

³FC-UNAM, Circuito Exterior, Ciudad Universitaria, Coyoacan, 04510, México D.F.

Received 8 June 1998; accepted 16 September 1999

ABSTRACT: We have developed and implemented a tabu search heuristic (TS) to determine the lowest energy minimum of oligopeptides. Our test molecule was Met-enkephalin, a pentapeptide that over the years has been used as a validation model for many global optimizers. The test potential energy function was ECEPP/3. Our tabu search implementation is based on assigning integer values to the variables to be optimized and on facilitating the diversification and intensification of the search. The final output from the TS is treated with a local optimizer, and our best result competes in both quality and CPU time with those reported in the literature. The results indicate that TS is an efficient algorithm for conformational searches. We present a parallel TS version along with experimental results showing that this algorithm allows significant increases in speed. © 2000 John Wiley & Sons, Inc. J Comput Chem 21: 147-156, 2000

Keywords: protein structure prediction; heuristic algorithms; protein folding; conformational space search; tabu search

Correspondence to: R. Garduño-Juárez; e-mail: ramon@ce.fis.unam.mx

Contract/grant sponsor: PAPIIT-DGAPA-UNAM; contract/grant number: IN-102397

Contract/grant sponsor: CRAY-UNAM Research Fund; contract/grant number: SC-002195

Contract/grant sponsor: CONACyT; contract/grant number: 25245-E

Journal of Computational Chemistry, Vol. 21, No. 2, 147-156 (2000) © 2000 John Wiley & Sons, Inc. CCC 0192-8651/00/020147-10


## Introduction

Peptides are short polymers made up of a few to a few tens of amino acids. Many of them play meaningful roles in biochemistry and biophysics. Some peptide sequences have a clear tendency to form well-defined three-dimensional structures, that is, to fold. Peptides are also useful as model systems for much larger peptide chains known as proteins. The naturally occurring three-dimensional structure of a protein, its tertiary structure, is believed to be uniquely determined by its primary structure, the sequence of amino acids of which the protein is composed. In his thermodynamic hypothesis, Anfinsen [1] proposed that the native state of a protein is the structure that minimizes the free energy. By definition, such a state lies at the global minimum of free energy relative to all other states accessible on the relevant time scale. Thus, the conformational search, or folding, can be posed as an optimization problem.

To a first approximation, conformational search of peptide molecules can be thought of as the problem of finding the 3D molecular structure that corresponds to the lowest minimum of an appropriate mathematical function describing the potential energy of the system. Computer simulations are often used to carry out this task. A major concern in these simulations is to obtain a set of low-energy conformations with biological significance, that is, conformations that are near the thermodynamic native state.

Folding a protein from knowledge of its amino acid sequence alone is a formidable task. Because it is computationally infeasible to test all possible conformations to determine the global minimum, it is necessary to develop methods that can land upon the global minimum without testing all conformational possibilities. This is a challenging optimization (minimization) task. In many cases the detailed properties of the potential function to be minimized are not known. Even if the function is differentiable, one often encounters nonconvex surfaces, and the local properties of the function can differ between search regions; for example, basins can differ in size or depth, and the smoothness can vary.
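To make the scale of exhaustive enumeration concrete, consider a hypothetical problem with 24 dihedral angles, each sampled on a coarse grid of 36 values (10° steps), and an assumed rate of 10⁹ energy evaluations per second. Both numbers are illustrative assumptions, not values from this work; the point is that every added angle multiplies the count by 36.

$$
\begin{aligned}
N_{\text{conf}} &= 36^{24} = 6^{48} \approx 2.2 \times 10^{37} \\
t &\approx \frac{2.2 \times 10^{37}}{10^{9}\ \text{s}^{-1}} = 2.2 \times 10^{28}\ \text{s} \approx 7 \times 10^{20}\ \text{years}
\end{aligned}
$$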

Many different force fields for proteins have been designed as a sum of potential energy contributions. Among the most widely used are ECEPP [2], MM2 [3], ECEPP/2 [4], CHARMM [5], DISCOVER [6], AMBER [7], GROMOS87 [8], MM3 [9], and ECEPP/3 [10]. The energy surfaces defined by most of these force fields have a large number of local minima. In general, protein folding with any force field is an NP-hard problem [11], for which the time that known exact methods need to locate the lowest minimum grows exponentially as the number of variables grows linearly. A major challenge in this type of global optimization problem is that there is no clear mathematical basis for efficiently reaching the global minimum; finding it accurately and quickly is therefore of general interest.

To reduce the size of the problem, one takes advantage of the fact that under biological conditions some internal motions of protein molecules occur on a much shorter time scale than others. Experimentally, the average values of covalent bond lengths and covalent bond angles are fairly constant, which leads to the assumption that conformational changes in the dihedral angles fully determine the overall shape of the protein molecule. Thus, if one specifies the positions of all atoms in the protein molecule as functions of its internal coordinates, with bond lengths and bond angles held constant, the number of degrees of freedom is drastically reduced.

Although the size of the problem is reduced when the energy function is written in terms of torsional angles, in this form the energy function is known to be no longer partially separable: reevaluating the energy after only a few variables change is no longer much less expensive than reevaluating it after all of them change. To overcome this effect, a number of workers have devised interesting stochastic and nonstochastic methods that impose constraints and bias the search toward the region where the global minimum could be found. Stochastic methods employed to predict oligopeptide 3D structures include Monte Carlo with minimization [12a, 12b] (MCM), simulated annealing [13] (SA), threshold accepting [14] (TA), free energy Monte Carlo with minimization [15] (FMCM), the multicanonical ensemble [16] (ME), conformational space annealing [17] (CSA), and genetic algorithms [18] (GA). Nonstochastic methods include molecular dynamics with minimization [19] (MDM), dynamic programming [20] (DP), the diffusion equation method [21a] (DEM), the mean-field technique [22] (MFT), and a global optimization procedure known as BB [23].
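Why torsion space destroys partial separability can be seen on a toy chain: changing one dihedral rotates every atom beyond that bond rigidly, so every pairwise term linking the two sides of the bond may change at once. The sketch below, an illustration with a hypothetical 8-atom chain rather than a peptide, applies Rodrigues' rotation formula to the downstream atoms and lists the atom pairs whose distances changed.

```python
import math
from itertools import combinations


def rotate_about_bond(coords, k, delta):
    """Return new coordinates after changing the dihedral about bond (k, k+1)
    by `delta` radians: atoms k+2, k+3, ... rotate rigidly about that bond axis
    (Rodrigues' rotation formula); atoms 0..k+1 stay fixed."""
    p, q = coords[k], coords[k + 1]
    axis = [q[i] - p[i] for i in range(3)]
    length = math.sqrt(sum(t * t for t in axis))
    u = [t / length for t in axis]
    c, s = math.cos(delta), math.sin(delta)
    out = [list(atom) for atom in coords]
    for j in range(k + 2, len(coords)):
        v = [coords[j][i] - q[i] for i in range(3)]  # relative to a point on the axis
        uxv = [u[1] * v[2] - u[2] * v[1],
               u[2] * v[0] - u[0] * v[2],
               u[0] * v[1] - u[1] * v[0]]
        udv = sum(u[i] * v[i] for i in range(3))
        out[j] = [q[i] + v[i] * c + uxv[i] * s + u[i] * udv * (1.0 - c)
                  for i in range(3)]
    return out


def changed_pairs(before, after, tol=1e-9):
    """Atom pairs whose distance, and hence whose pair-energy term, changed."""
    return [(i, j) for i, j in combinations(range(len(before)), 2)
            if abs(math.dist(before[i], before[j])
                   - math.dist(after[i], after[j])) > tol]


# Hypothetical helical 8-atom chain; rotate about bond (3, 4) by 0.7 rad.
chain = [[1.5 * j, 0.9 * math.cos(1.1 * j), 0.9 * math.sin(1.1 * j)] for j in range(8)]
moved = rotate_about_bond(chain, 3, 0.7)
print(changed_pairs(chain, moved))
```

Only pairs with one atom in {0, 1, 2} and the other in {5, 6, 7} can change here (at most 9 of 28 pairs), but a single dihedral near the start of a long backbone moves everything downstream of it, so the corresponding energy update touches a large fraction of all pair terms.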

In this article we minimize the ECEPP/3 energy function [10] with tabu search [24] (TS), a stochastic optimizer developed to treat complex combinatorial optimization tasks. Tabu search has the advantage that only function values are used, so differentiability and continuity are not required, and it is characterized by the use of memories during the search. In the operations research literature, tabu search has proven to be better than simulated annealing, in both the CPU time required and the quality of the solutions found, for many complex problems. Our test molecule is Met-enkephalin, a pentapeptide that has been used as a validation model for many global optimizers and whose lowest-energy conformation for the ECEPP/3 potential energy function is known [23]. We first state the problem mathematically, then discuss the general principle of the tabu search heuristic and explain how to use it for conformational search. Finally, we present our computational results and compare our best results with those reported by other authors who employed methods other than TS. We also propose a parallel version of TS to reduce the CPU time needed to find the lowest-energy structure. This concurrent version was developed for parallel computing on an SGI ORIGIN 2000 computer and achieves close to ideal speedup.
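The role of memory in tabu search can be illustrated with a minimal sketch over integer-coded angles, in the spirit of the integer encoding mentioned in the abstract. Everything specific here is an assumption for illustration: the 36-bin grid, the ±1-step neighborhood, the reversal-based tabu list with fixed tenure, the aspiration rule, and especially `toy_energy`, a hypothetical stand-in for ECEPP/3. It is not the authors' implementation and omits their diversification and intensification strategies.

```python
import math

N_BINS = 36  # hypothetical grid: each dihedral coded as an integer 0..35 (10-degree steps)


def toy_energy(x):
    """Hypothetical stand-in for ECEPP/3, NOT the real force field.
    Per angle: global minimum at 180 degrees, local minima at 60 and 300 degrees."""
    e = 0.0
    for b in x:
        t = math.radians(b * 360.0 / N_BINS)
        e += 1.0 + math.cos(3.0 * t) + 0.5 * (1.0 + math.cos(t))
    return e


def tabu_search(energy, x0, n_iter=100, tenure=5):
    """Minimal tabu search. A move shifts one angle by +/-1 grid step; the
    reverse of each accepted move stays tabu for `tenure` iterations unless it
    would give a new best energy (aspiration criterion)."""
    x = list(x0)
    best_x, best_e = list(x), energy(x)
    tabu_until = {}  # (angle index, step) -> last iteration at which the move is tabu
    for it in range(n_iter):
        move, move_e = None, math.inf
        for i in range(len(x)):
            for step in (+1, -1):
                y = list(x)
                y[i] = (y[i] + step) % N_BINS
                e = energy(y)
                is_tabu = tabu_until.get((i, step), -1) >= it
                if is_tabu and e >= best_e:
                    continue  # tabu and fails the aspiration criterion
                if e < move_e:
                    move, move_e = (i, step), e
        if move is None:
            break  # every move is tabu: stop
        i, step = move
        x[i] = (x[i] + step) % N_BINS          # accept best admissible move, even uphill
        tabu_until[(i, -step)] = it + tenure   # forbid undoing it for a while
        if move_e < best_e:
            best_x, best_e = list(x), move_e
    return best_x, best_e
```

Started from bin 0, a pure descent would stop in the local minimum at 60° or 300°; here the tabu list forbids stepping back, so the search is pushed uphill across the barrier and records the global minimum at bin 18 (180°). A multi-angle run looks like `tabu_search(toy_energy, [0, 0, 0], n_iter=300)`.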

## The Problem

As indicated above, the conformation of a protein with a sequence of $N_{res}$ amino acid residues in the peptide chain can be described by a set of backbone dihedral angles $\phi_i, \psi_i, \omega_i$, where $i = 1, \dots, N_{res}$, plus a set of side-chain dihedral angles $\chi_i^j$, $i = 1, \dots, N_{res}$, $j = 1, \dots, J_i$, where $J_i$ is the number of side-chain dihedral angles on the $i$th residue. To allow capping of the peptide, two more sets of dihedral angles must be included: one with $K_N$ angles ($k = 1, \dots, K_N$) on the amino end group, and the other with $K_C$ angles ($k = 1, \dots, K_C$) on the carbonyl end group.
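An optimizer works on a flat vector of variables, so the angle sets above have to be packed into one list and unpacked again. The dataclass below is a minimal sketch of that bookkeeping; the class name, field names and the flattening order (N-terminal cap, then φ, ψ, ω and the χ angles residue by residue, then C-terminal cap) are hypothetical choices, not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Conformation:
    """Dihedral-angle description of a peptide (angles in degrees)."""
    phi: list
    psi: list
    omega: list
    chi: list                                   # chi[i] holds the J_i side-chain angles of residue i
    n_cap: list = field(default_factory=list)   # K_N amino end-group angles
    c_cap: list = field(default_factory=list)   # K_C carbonyl end-group angles

    def to_vector(self):
        """Flatten: N cap, then (phi, psi, omega, chi...) per residue, then C cap."""
        v = list(self.n_cap)
        for i in range(len(self.phi)):
            v += [self.phi[i], self.psi[i], self.omega[i]]
            v += self.chi[i]
        v += self.c_cap
        return v

    @classmethod
    def from_vector(cls, v, j_counts, k_n=0, k_c=0):
        """Inverse of to_vector(); j_counts[i] is J_i for residue i."""
        expected = k_n + sum(3 + j for j in j_counts) + k_c
        if len(v) != expected:
            raise ValueError(f"expected {expected} angles, got {len(v)}")
        pos = 0

        def take(m):
            nonlocal pos
            out = list(v[pos:pos + m])
            pos += m
            return out

        n_cap = take(k_n)
        phi, psi, omega, chi = [], [], [], []
        for j in j_counts:
            a, b, c = take(3)
            phi.append(a)
            psi.append(b)
            omega.append(c)
            chi.append(take(j))
        c_cap = take(k_c)
        return cls(phi, psi, omega, chi, n_cap, c_cap)
```

The vector length is $3N_{res} + \sum_i J_i + K_N + K_C$; for example, a hypothetical three-residue chain with $J = (1, 0, 2)$ and no caps has 12 variables.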

In this work the complete ECEPP/3 force field [10] was used. This force field assumes that bond lengths and bond angles are at their equilibrium values, so that the resulting function is a conformational energy surface made of a sum over interactions of type 1-4 and higher. These interactions account for electrostatic, nonbonded, hydrogen-bond, and torsional energies, plus empirical terms for a loop-closing potential when the peptide has intramolecular disulfide bonds, and fixed conformational energies for the prolyl and hydroxyprolyl residues. A condensed description of the ECEPP/3 force field can be written as:

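$$
U = U_{\text{elec}} + U_{\text{nonb}} + U_{\text{hb}} + U_{\text{tor}} + U_{\text{loop}} + U_{\text{S-S}}
$$

where

$$
\begin{aligned}
U_{\text{elec}} &= \sum_{i} \sum_{j \neq i} \frac{332.0\, q_i q_j}{D\, r_{ij}} \\
U_{\text{nonb}} &= \sum_{i} \sum_{j \neq i} \left( F \frac{A}{r_{ij}^{12}} - \frac{C}{r_{ij}^{6}} \right) \\
U_{\text{hb}} &= \sum_{h} \sum_{x} \left( \frac{A_{hx}}{r_{hx}^{12}} - \frac{B_{hx}}{r_{hx}^{10}} \right) \\
U_{\text{tor}} &= \sum_{k} \frac{U_0}{2.0} \left( 1 \pm \cos n_k \theta_k \right) \\
U_{\text{loop}} &= \sum_{l} B_l \sum_{i=1}^{3} \left( r_{il} - r_{i0} \right)^2 \\
U_{\text{S-S}} &= \sum_{s} A_s \left( r_{4s} - r_{40} \right)^2
\end{aligned}
$$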

All constants were estimated by fitting experimental data [10, 25].
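Three of the six terms can be written directly from the formulas above. The sketch below evaluates $U_{\text{elec}}$, $U_{\text{hb}}$ and $U_{\text{tor}}$ for caller-supplied parameters; it assumes each unordered atom pair is counted once in $U_{\text{elec}}$, leaves the choice of interacting pairs (1-4 and higher only) to the caller, and does not reproduce the actual ECEPP/3 parameter tables. Function names are hypothetical.

```python
import math
from itertools import combinations

COULOMB = 332.0  # factor in U_elec: kcal/mol for charges in e and distances in angstroms


def u_elec(coords, charges, dielectric):
    """U_elec: 332.0 q_i q_j / (D r_ij), each unordered pair counted once (assumption)."""
    e = 0.0
    for i, j in combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        e += COULOMB * charges[i] * charges[j] / (dielectric * r)
    return e


def u_hb(pairs):
    """U_hb: sum of A/r^12 - B/r^10 over (A, B, r) hydrogen/acceptor pairs."""
    return sum(a / r**12 - b / r**10 for a, b, r in pairs)


def u_tor(terms):
    """U_tor: sum of (U0/2)(1 +/- cos(n*theta)); each term is (U0, sign, n, theta in radians)."""
    return sum(u0 / 2.0 * (1.0 + sign * math.cos(n * theta))
               for u0, sign, n, theta in terms)
```

For example, charges of +1 and −1 at 1 Å with $D = 2$ give $332.0 \times (-1)/(2 \times 1) = -166.0$ kcal/mol, and a threefold torsion with $U_0 = 2$ and the "+" sign contributes 2.0 at $\theta = 0$ and 0 at $\theta = 60°$.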

Given these definit
