
Artificial Intelligence in Medicine (2005) 35, 135–145

http://www.intl.elsevierhealth.com/journals/aiim

Application of tabu search strategy for finding low energy structure of protein

Jacek Błazewicz a,b, Piotr Łukasiak a,b, Maciej Miłostan a,*

a Institute of Computing Science, Poznan University of Technology, Piotrowo 3a, 60-965 Poznan, Poland
b Institute of Bioorganic Chemistry, Polish Academy of Science, Noskowskiego 12, 61-704 Poznan, Poland

Received 18 November 2004; received in revised form 24 January 2005; accepted 22 February 2005

KEYWORDS
Tabu search; Meta-heuristic; Hydrophobic-hydrophilic lattice model; Protein structure prediction

Summary

Objective: Understanding protein functionality would mean understanding the basics of life. This functionality follows from the three-dimensional structure of proteins. Unfortunately, until now it has not been possible to obtain these structures artificially. This article offers a survey on the use of meta-heuristic methods in the context of simplified models of protein folding.

Methods: The tabu search (TS) strategy is one of the most successful meta-heuristics and has been applied to a large number of optimization problems. In the paper, the application of TS for finding low energy conformations of proteins in a simplified lattice model is proposed.

Results: The algorithm has been extensively tested and the tests showed its good performance. It compares well with the other heuristic approaches.

Conclusions: The approach presented is competitive as compared with other methods and, due to its low computation time, can be used as a complementary tool for an analysis of three-dimensional protein structures.

© 2005 Elsevier B.V. All rights reserved.

1. Introduction

Understanding protein functionality would mean understanding the basics of life. This functionality follows from the three-dimensional structure of proteins. Unfortunately, until now it has not been possible to obtain these structures artificially, which would revolutionize medicine, biology, chemistry, etc.

Anfinsen et al. [1,2] have shown the reversibility of the folding process. Moreover, high-resolution X-ray structure determination of several hundred proteins has confirmed that a specific polypeptide sequence has only a single, compact, biologically active fold in the native state [3]. The native conformation appears to be the one with significantly lower free energy than the others. For some globular proteins the native conformations are at the global minima of their accessible free energies.

* Corresponding author. Tel.: +48 61 665 2826; fax: +48 61 877 1525.
E-mail address: maciej.milostan@cs.put.poznan.pl (M. Miłostan).
0933-3657/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.artmed.2005.02.001


Figure 1 Different types of conformations in the HP-model (H: black, P: white): (A) a native conformation on the two-dimensional lattice, (B) a conformation on the three-dimensional lattice and (C) a conformation on the honey-bee lattice.

In 1968 the biophysicist Levinthal formulated the so-called "Levinthal's Paradox" [4]; its conclusion leads to the thesis that the folding process must somehow be driven through kinetic pathways. The rules for driving the protein through unstable conformations are still not fully known. Thus, it is necessary to get more information about the conformational space of proteins.

Due to the large complexity of the protein folding problem, a large number of scientists have switched to simplified models. They use a mathematical abstraction of proteins that hides many aspects of the folding process and exaggerates the effects of others. Even for such models it is hard to formulate fast and realistic folding rules that lead the protein to the native state.

The main idea of the experiment described was to check the usability of the tabu search meta-heuristic strategy [5] for finding low energy structures while walking efficiently through the conformational space. It has to be clearly stated that the aim of the method was not to model the true folding pathways during the search process, but to create a strategy that can find a large number of optimal and suboptimal structures in a reasonably short time. A meta-heuristic strategy was chosen because of several NP-hardness (or NP-completeness) results related to protein folding in the simplified models [6–8]. (Basically, this means that no algorithm is known that solves the considered problem optimally in reasonable, i.e. polynomial, time, and none is expected to exist unless P = NP.)

For testing purposes one of the most successful and well-studied simplified models was used: the HP-model (H: hydrophobic amino acid, P: polar) proposed by Dill [9]. The goal of this model is to investigate the way hydrophobic interactions influence protein folds, and whether a delocalized "solvation" code is essential or not. The HP-model was proposed on the basis of the observations that most hydrophobic amino acids are buried in the core of the protein and, moreover, that there are very few conformations of the full chain that can bury non-polar amino acids to the greatest possible degree [9–11].

In the basic HP-model each amino acid is represented as a bead (black bead: hydrophobic, white: polar) and the connecting bonds are represented as lines. The whole conformation is embedded in a two-dimensional (or higher-dimensional) lattice. The background lattice simply divides the space into monomer-sized units. A lattice site may be either empty or filled with one bead. The bond angles have only a few discrete values, dictated by the structure of the lattice [10]. For each conformation one can compute the value of an energy function, which models the free energies of protein folds. One of the simplest forms of the energy function counts the HH-contacts and multiplies their number by a constant lower than zero. Two hydrophobic amino acids create an HH-contact if they are topological neighbors (they occupy adjacent lattice sites) and they are not connected by a bond (see Fig. 1).
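The contact-counting energy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `hh_contacts` and `energy`, the string encoding of the sequence ("H"/"P") and the representation of a conformation as a list of integer lattice coordinates are choices made here, not taken from the paper.

```python
from typing import Sequence, Tuple

Coord = Tuple[int, ...]

def hh_contacts(seq: str, coords: Sequence[Coord]) -> int:
    """Count H-H pairs that sit on adjacent lattice sites but are not bonded."""
    count = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # j >= i + 2 skips chain neighbors
            if seq[i] == "H" and seq[j] == "H":
                # Manhattan distance 1 means adjacent sites on a square/cubic lattice
                if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
                    count += 1
    return count

def energy(seq: str, coords: Sequence[Coord], weight: int = -1) -> int:
    """Energy = negative constant (assumed -1 here) times the number of HH-contacts."""
    return weight * hh_contacts(seq, coords)

# Four residues folded into a unit square: the two chain ends touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(energy("HPPH", square))
```

The same functions work unchanged for three-dimensional coordinates, because adjacency is tested through the Manhattan distance.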

The presented algorithm differs from the one proposed by Lesh et al. [12] mainly in the definition of the moves that transform one solution into another. It was proposed independently at the same time and first presented during the poster session at the same conference [13]. A wider spectrum of sequence lengths and lattice types has been considered than that presented by Pardalos et al. in 1995 [14]. An overview of the other methods applied to the HP-model can be found in [15].

Extensive computational tests showed very good performance of the proposed algorithm for the HP-model in two- and three-dimensional space.

The paper is organized as follows: the basics of the HP-model are recalled in Section 2. Section 3 presents and defines the main building blocks of the tabu search approach. The results of the computational experiments are discussed in Section 4. Conclusions are given in Section 5.

2. Problem definition

The native conformation of the protein is the one with significantly lower energy than the others. Thus the problem of finding the native protein structure can be defined as an energy function minimization problem.

2.1. Two-dimensional HP-model

In the two-dimensional HP-model one can define the minimization problem as follows:

$$\min_{\vec{a}} E(\vec{s}, \vec{a}), \tag{1}$$

where

$$E(\vec{s}, \vec{a}) = j \cdot HHc(\vec{s}, \vec{a}), \tag{2}$$

where $\vec{s}$ is a sequence of amino acids containing $n$ elements; $s_i = 1$ if the amino acid on the $i$th position in the sequence is hydrophobic; $s_i = 0$ if the amino acid on the $i$th position is polar. $\vec{a}$ is a vector of $(n - 2)$ angles defined by consecutive triples of amino acids in the sequence. $HHc$ is a function that counts the pairs of hydrophobic amino acids that are not neighbors in the sequence but are neighbors on the lattice (they are topological neighbors). $j$ is a constant lower than zero that defines the influence ratio of hydrophobic contacts on the value of the conformational free energy; in most cases one can assume that this ratio is equal to $-1$ ($j = -1$).

Additionally, one has to define the distances between lattice nodes (the size of the lattice sites) on which amino acids are placed; here it is assumed that they are equal to 1. Obviously, one cannot place two amino acids in the same place in space, so the Euclidean distance between two amino acids ($dist_{i,j}(\vec{a})$) for a valid conformation fulfills the following condition:

$$\forall (i \neq j)\ \{ dist_{i,j}(\vec{a}) > 0 \}, \quad \text{where } i, j \in \{1, 2, \ldots, n\}. \tag{3}$$

Moreover, the distance between consecutive pairs of amino acids is equal to the size of the lattice site; here it can be assumed that it is equal to 1.
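To make the angle-vector representation concrete, the sketch below decodes a vector of $n - 2$ relative angles into lattice coordinates on the square lattice and checks condition (3). It is a minimal illustration, assuming that the first bond points along the x-axis, that $+90^\circ$ means a left turn and $-90^\circ$ a right turn; the names `decode_2d` and `is_valid` are hypothetical.

```python
def decode_2d(angles):
    """Turn n-2 relative angles (0, 90 or -90 degrees) into n lattice points.

    Assumed convention: the chain starts at the origin, the first bond points
    along +x, +90 is a left turn and -90 a right turn. Every bond has length 1.
    """
    coords = [(0, 0), (1, 0)]
    d = (1, 0)  # direction of the last bond
    for a in angles:
        if a == 90:
            d = (-d[1], d[0])
        elif a == -90:
            d = (d[1], -d[0])
        elif a != 0:
            raise ValueError(f"angle {a} is not allowed on the square lattice")
        x, y = coords[-1]
        coords.append((x + d[0], y + d[1]))
    return coords

def is_valid(coords):
    """Condition (3): no two amino acids share a lattice site."""
    return len(set(coords)) == len(coords)

print(decode_2d([90, 90]))             # a unit square
print(is_valid(decode_2d([90, 90, 90])))  # the chain runs back into its start
```

Because each step moves exactly one lattice unit, the unit bond length holds by construction and only self-avoidance has to be checked.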

2.2. Three-dimensional HP-model

A formal definition of the three-dimensional HP-model is a little more complicated than for the two-dimensional model. The number of possible values of the angles among consecutive triples of amino acids in the sequence is quite small and equal to 5: each node of the three-dimensional (cubic) lattice has six other nodes in its neighborhood. To define these angles one has to give a Euclidean plane, by specifying its normal vector (the normal vector to a surface, often simply called the "normal", is a vector perpendicular to it) and one of its points, and define the angles among triples of amino acids according to that plane.

The problem definition for the three-dimensional model is similar to the definition already given for the two-dimensional model. The main difference is the way of computing the position of each amino acid in space. The considered problem can be defined as follows:

$$\min_{\vec{a}} E(\vec{s}, \vec{a}), \tag{4}$$

$$E(\vec{s}, \vec{a}) = j \cdot HHc(\vec{s}, \vec{a}), \tag{5}$$

where $\vec{s}$ is a sequence of $n$ amino acids; $s_i = 1$ if the amino acid on the $i$th position in the sequence is hydrophobic; $s_i = 0$ in the opposite case. $\vec{a}$ is a vector of $(n - 2)$ angles defined by consecutive triples of amino acids in the sequence; $a_i \in \{(0^\circ, 90^\circ), (0^\circ, -90^\circ), (90^\circ, 0^\circ), (-90^\circ, 0^\circ), (0^\circ, 0^\circ)\}$.

The position of the $i$th amino acid depends on $i - 2$ elements of the angle vector $\vec{a}$. Each element of $\vec{a}$ is a pair of angles. The first angle is the value of the angle among three consecutive amino acids on the base plane (OXY), and the second one is the value of the angle of three consecutive amino acids in the OYZ plane. All angles are defined relatively, so it is necessary to transform (rotate) the coordinate system, or the already structured part of the conformation, in order to compute the positions of consecutive amino acids. The base plane for the $i$th amino acid is described by a normal vector and the point that corresponds to the position of the $(i-1)$th amino acid. The direction vector ($\vec{d}_{i-1} = [(x_i - x_{i-1}), (y_i - y_{i-1}), (z_i - z_{i-1})]$) defines the direction of the OX-axis on the base plane; the normal vector defines the OZ-direction; the orientation of the coordinate system is consistent with the right-hand rule. According to Euler's rotation theorem, any rotation may be described using three angles $(e_1, e_2, e_3)$, where the first rotation is by an angle about the z-axis, the second is by an angle about the x-axis, and the third is by an angle about the z-axis (again).

$HHc$ is a function that counts the pairs of hydrophobic amino acids that are not neighbors in the sequence but are neighbors on the lattice (they are topological neighbors). $j$ is a constant lower than zero; it defines the influence ratio of hydrophobic contacts on the value of the conformational free energy; in most cases one can assume that this ratio is equal to $-1$ ($j = -1$).

For the HP-model with the three-dimensional cubic lattice one has to make the same assumptions that were made for the two-dimensional HP-model: the distances between lattice nodes (the size of the lattice sites) are equal to 1, the Euclidean distance between two amino acids ($dist_{i,j}(\vec{a})$) for a valid conformation fulfills Eq. (3), and the distance between consecutive pairs of amino acids is equal to 1.
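One way to compute positions from relative angle pairs on the cubic lattice is to carry a small moving frame along the chain instead of composing explicit Euler rotations. The sketch below is an illustration only: the mapping of the five angle pairs to "straight, left, right, up, down", the initial frame, and the names `decode_3d` and `cross` are assumptions made here, not the paper's implementation.

```python
ALLOWED = {(0, 0), (90, 0), (-90, 0), (0, 90), (0, -90)}

def cross(a, b):
    """Cross product of two integer 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def decode_3d(angles):
    """Turn n-2 relative angle pairs into n points of the cubic lattice.

    Assumed convention: h is the heading (local OX), u the normal of the base
    plane (local OZ); the first pair component turns within the base plane,
    the second one tilts the heading out of it.
    """
    h, u = (1, 0, 0), (0, 0, 1)
    coords = [(0, 0, 0), (1, 0, 0)]
    for pair in angles:
        if pair not in ALLOWED:
            raise ValueError(f"angle pair {pair} is not allowed")
        in_plane, out_of_plane = pair
        if in_plane == 90:            # left turn, normal unchanged
            h = cross(u, h)
        elif in_plane == -90:         # right turn, normal unchanged
            h = cross(h, u)
        elif out_of_plane == 90:      # tilt up: old normal becomes the heading
            h, u = u, tuple(-c for c in h)
        elif out_of_plane == -90:     # tilt down
            h, u = tuple(-c for c in u), h
        coords.append(tuple(p + d for p, d in zip(coords[-1], h)))
    return coords

print(decode_3d([(90, 0), (0, 90)]))
```

Keeping `h` and `u` orthogonal unit vectors at every step guarantees unit bond lengths; self-avoidance (Eq. (3)) still has to be checked separately.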

3. The algorithm

The proposed algorithm is based on the tabu search (TS) strategy, which was proposed by Glover [16–18,5]. One has to define some main elements of this strategy to adapt it to the considered problem. The following paragraphs give the framework of the algorithm and detailed definitions of each element of the strategy. The key idea of using the TS strategy was to check whether such a method can improve on the performance of existing methods.

3.1. The tabu search general framework

The tabu search (TS) begins by marching to a local minimum. To avoid retracing the steps used, the method records recent moves in one or more tabu lists. The original intent of the list was not to prevent a previous move from being repeated, but to ensure that it was not reversed. Normally, the TS strategy chooses a new, better solution if it exists and is not tabu, but if such a choice is impossible then a worse solution can be chosen.

If a move is on the tabu list but leads to a new solution that improves the best value found so far, such a move can still be made, provided that an additional condition, the so-called aspiration level, is met. TS contains two other important elements: an initialization procedure and a diversification (or intensification) procedure. The calculations start with the generation of a starting solution, using some problem domain knowledge. It is also necessary to define the neighborhood: solution x is a neighbor of another solution y if there exists one move that transforms y into x.

A short description of the TS strategy is as follows:

Algorithm 1 (Tabu search [16,5]).
begin
    Initialize(x, tabu list TL, moves list RL, aspiration level A(D, k));
    // x: the starting solution in the first iteration and the current solution in all other iterations
    xbest := x;
    k := 1;
    Compute tabu list length l_k for iteration k;
    RL := ∅;
    TL := ∅;
    a := 1;
    repeat
        repeat
            Generate neighbor solution y which is in the neighborhood of current solution x, y ∈ N(x);
            D := g(x) − g(y);
            Compute values of aspiration levels A(D, k);
        until A(D, k) < a or D = max {g(x) − g(y) : y not forbidden};
        Update moves list RL, e.g. RL := RL ∪ {some attributes of y};
        TL := {last l_k moves generated, from RL, that are forbidden};
        if A(D, k) < a then a := A(D, k);
        x := y;
        if g(y) < g(xbest) then xbest := y;
        k := k + 1;
    until some stop criteria fulfilled;
end;
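The general scheme can be illustrated with a deliberately small, problem-independent Python sketch. It is a simplification, not a transcription of Algorithm 1: the aspiration level is reduced to "a tabu move is allowed if it beats the best solution found so far", the tabu list has a fixed length, and all names (`tabu_search`, `moves`, `apply_move`, `inverse`, the toy `landscape`) are hypothetical.

```python
from collections import deque

def tabu_search(x0, cost, moves, apply_move, inverse, iterations=50, tenure=3):
    """Minimize `cost`, always stepping to the best admissible neighbor."""
    x, best = x0, x0
    tabu = deque(maxlen=tenure)  # short-term memory of forbidden moves
    for _ in range(iterations):
        candidates = []
        for m in moves(x):
            y = apply_move(x, m)
            # simplified aspiration: a tabu move is allowed if it beats the best
            if m not in tabu or cost(y) < cost(best):
                candidates.append((cost(y), m, y))
        if not candidates:
            break  # every move is forbidden
        _, m, x = min(candidates, key=lambda c: c[0])  # may be worse than before
        tabu.append(inverse(m))  # forbid reversing the move just made
        if cost(x) < cost(best):
            best = x
    return best

# Toy landscape over positions 0..6: a local minimum at 1, the global one at 5.
landscape = [5, 3, 4, 6, 2, 1, 7]
result = tabu_search(
    x0=0,
    cost=lambda i: landscape[i],
    moves=lambda i: [s for s in (+1, -1) if 0 <= i + s < len(landscape)],
    apply_move=lambda i, s: i + s,
    inverse=lambda s: -s,
)
print(result)
```

A plain descent from position 0 would stop in the local minimum at position 1; forbidding the reverse step forces the search uphill and lets it reach position 5.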

3.2. Tabu search for the three-dimensional HP-model

The following paragraphs give definitions of the main elements of the strategy in question: neighborhood, move, stop condition, tabu list and aspiration levels. A detailed description of the algorithm for the two-dimensional HP-model with a rectangular lattice can be found in [19].

Move $r$ transforms each solution $x$ from the set of all solutions $V$ into another solution $x' \in V$: $x \overset{r}{\mapsto} x'$.

Several variants of the move are defined:

• Variant 1. A move is defined as a change of a single element (a pair of angles) from vector $\vec{a}$ in the conformation of the protein sequence s to one of the nearest values.
• Variant 2. A move is defined as a change of a single element from vector $\vec{a}$ in the conformation of the protein sequence s to one of the valid values.
• Variant 3. A move is defined as a change of one or two consecutive elements from vector $\vec{a}$, describing the protein conformation, to the valid values.
• Variant 4. A move is defined as a change of one, two or three consecutive elements from vector $\vec{a}$ to the valid values.

The proposed set of moves allows a fast search among different conformations of proteins, because each move might have wide-ranging effects. Changes of angles occur only on consecutive positions, and in that sense the moves are local.

The set of solutions $N(x)$ is called the neighborhood of $x$ if for each $x' \in N(x)$ there exists a move $r$ such that $x \overset{r}{\mapsto} x'$.

The tabu list is a short-term memory (cyclic list) that contains a limited number of forbidden moves. If the list is full, then adding a new move removes the oldest one.
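A neighborhood generator for Variants 2 to 4 can be sketched as follows. This is only an illustration: the parameter `max_block` (1, 2 or 3 for Variants 2, 3 and 4), the requirement that every element of the changed block really takes a new value, and the name `neighbors` are assumptions; Variant 1 is omitted because "nearest values" is not defined precisely enough here. Feasibility (self-avoidance) is not checked at this stage.

```python
from itertools import product

# The five admissible angle pairs of the three-dimensional model.
VALUES = [(0, 0), (90, 0), (-90, 0), (0, 90), (0, -90)]

def neighbors(angles, max_block=1):
    """Yield (start, new_block, new_vector) for each change of
    1..max_block consecutive elements of the angle vector."""
    n = len(angles)
    for size in range(1, max_block + 1):
        for start in range(n - size + 1):
            old = tuple(angles[start:start + size])
            for block in product(VALUES, repeat=size):
                # assumed: every element of the block must actually change
                if all(b != o for b, o in zip(block, old)):
                    yield (start, block,
                           angles[:start] + list(block) + angles[start + size:])

straight = [(0, 0)] * 3
print(len(list(neighbors(straight, max_block=1))))
print(len(list(neighbors(straight, max_block=2))))
```

The neighborhood grows quickly with the block size (four alternatives per changed element), which is the price paid for moves with wider-ranging effects.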

The searching procedure can make a transition from solution $x$ into $x'$ by making a forbidden move (from the tabu list) if two conditions are simultaneously fulfilled:

• the value of energy $E$ (defined by (2)) of solution $x'$ is lower than the energy $E_{min}$ of the current best solution,
• there is no solution $x'' \in N(x)$ such that move $r \notin R_z$ ($R_z$ is the set of forbidden and infeasible moves), $x \overset{r}{\mapsto} x''$, and energy $E_{x''} \le E$.
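The cyclic tabu list and the two aspiration conditions can be sketched as below. The representation of a move as a hashable tuple, the helper name `is_admissible` and its parameters are illustrative assumptions; `best_free_energy` stands for the lowest energy among the non-forbidden, feasible neighbors (or `None` if there are none).

```python
from collections import deque

def is_admissible(move, e_new, e_best, tabu, best_free_energy):
    """Return True if `move` may be taken.

    A non-tabu move is always admissible. A tabu move is admissible only if
    (1) it leads to an energy below the best found so far, and
    (2) no non-forbidden neighbor reaches an energy that is equal or lower.
    """
    if move not in tabu:
        return True
    beats_best = e_new < e_best
    no_free_rival = best_free_energy is None or best_free_energy > e_new
    return beats_best and no_free_rival

# Cyclic list: appending to a full deque silently drops the oldest entry.
tabu = deque(maxlen=3)
for move in [(0, "L"), (1, "R"), (2, "U"), (3, "D")]:
    tabu.append(move)
print(list(tabu))
```

Using `deque(maxlen=...)` keeps the list length bounded without explicit bookkeeping; membership tests are linear in the list length, which is acceptable for the short lists used in tabu search.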

The algorithm executes the searching procedure $l_{runs}$ times (each execution is called a run), each time starting from a different solution (e.g. the best solution of the previous run) and using the same or different variants of moves. Each run ends if one of the stop conditions is fulfilled:

• a given number $l_{it}$ of iterations has been reached,
• a given lower bound of energy $E_{min}$ has been reached,
• a computed lower bound of energy $E_{min}$ has been reached (computed by the sequence analysis),
• all possible moves are forbidden and the aspiration criteria are not fulfilled for any of them,
• a given number $l_{itmax}$ of iterations has been made after reaching the upper bound.

A lower bound of energy is defined as a value such that the energy of the optimal solution is equal to or greater than it. In fact, it is impossible to compute the optimal energy only by analyzing an amino acid sequence, without considering conformations. In most cases the lower bound is underestimated (note that the considered problem is a minimization one), i.e. it is lower than the energy of the optimal conformation. One can compute the lower bound on the basis of the observation that in the optimal conformations the shape of the core is the most compact one.

Some conformational motifs can be found quite often in optimal conformations, so the idea of creating the starting solutions is to use this knowledge to create some desirable conformational fragments at the beginning of the search. During the search these conformational fragments can be changed. Such fragments can be determined by statistical analysis of optimal conformations.

The neighborhood $N(x)$ of a given solution $x$ can consist of many solutions with equally good energies. In such a case one has to decide which one to choose for the next step. The choice strategy consists of choosing two elements:

• turning point(s): the element(s) of the angle vector $\vec{a}$ that should be changed,
• angle value(s): the value(s) to which they are changed.

Turning points can be chosen randomly, or the next element from the vector of angles can be preferred. Additionally, the choice of a particular angle value can be related to the frequency of its occurrence on a given position. One can also search for particular motifs in the sequence where changes make sense, e.g. one can assume that the most preferred turning points are triplets hph in the sequence. In fact, the choice of a particular method or motif is greatly dependent on the problem instance.

The aim of the diversification procedure is to change the focus of the searching procedure from the current fragment of the conformational space to another one, probably not yet explored. One of the diversification methods can be rewriting $\vec{s}$ in the reverse order, without making any changes to the vector $\vec{a}$ of the angles.
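The effect of this diversification step can be seen on a tiny example. The sketch below uses the two-dimensional square lattice for brevity and assumes a left/right turn convention and an energy weight of $-1$; the helper names `coords_2d` and `energy` and the example sequence are illustrative only.

```python
def coords_2d(angles):
    """Decode relative angles (0, 90, -90) on the square lattice; first bond along +x."""
    pts, d = [(0, 0), (1, 0)], (1, 0)
    for a in angles:
        if a == 90:
            d = (-d[1], d[0])    # left turn
        elif a == -90:
            d = (d[1], -d[0])    # right turn
        pts.append((pts[-1][0] + d[0], pts[-1][1] + d[1]))
    return pts

def energy(seq, angles):
    """Minus the number of non-bonded H-H pairs on adjacent lattice sites."""
    pts = coords_2d(angles)
    contacts = sum(
        1
        for i in range(len(seq))
        for j in range(i + 2, len(seq))
        if seq[i] == seq[j] == "H"
        and abs(pts[i][0] - pts[j][0]) + abs(pts[i][1] - pts[j][1]) == 1
    )
    return -contacts

seq, angles = "HPPHP", [90, 90, 0]
print(energy(seq, angles))        # original sequence on this shape
print(energy(seq[::-1], angles))  # reversed sequence, same angle vector
```

The shape of the chain is unchanged, but the residues land on different sites, so the search continues from a solution with a different energy and a different set of useful moves.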

The solution space can easily be constrained on the basis of the observation that each conformation with a non-zero angle vector $\vec{a}$ has a mirror image. Additionally, one can apply some simple conditions to constrain the solution space more efficiently; for example, one can add definitions of some forbidden combinations of angle values (e.g. the values of three consecutive angles cannot all be equal to $90^\circ$) in order not to compute the energy of unrealistic conformations.

The algorithm that incorporates all the elements defined above can be found below (see Algorithm 2).

Algorithm 2 (Tabu search algorithm for three-dimensional HP-model).

• Step 1. Initialization (tabu list $LT = \emptyset$, a number of iterations in one run $l_{it}$, a number of runs $L_{runs}$, bounds $E_{min}$, $E_{max}$, a number of iterations in one run $l_{itmax}$ after reaching $E_{max}$); $L_{runs} := L_{runs} - 1$.
• Step 2. Compute bounds $E_{max}$, $E_{min}$, if not explicitly given.
• Step 3. Generate starting solution $x$.
• Step 4. Set $x$ as the best solution $x_{min}$ found so far.
• Step 5. Generate set $N_{best} \subseteq N(x)$ of the best solutions in neighborhood $N(x)$, such that $x \overset{r}{\mapsto} x' \in N_{best}$, and $r \notin R_z$, and $\nexists\, r \notin R_z;\ x' \in N_{best}: (x \overset{r}{\mapsto} x'') \wedge (E(x'') < E(x'))$.
• Step 6. Generate set of solutions $N_{aspir}$, such that $x \overset{r}{\mapsto} x' \in N_{aspir}$ and $r \in R_z$ and $\forall x' \in N_{aspir}\ \nexists\, x \in N_{best}: E(x) \le E(x')$.
• Step 7. If $(N_{aspir} \neq \emptyset)$, then choose solution $x'$ from set $N_{aspir}$ (randomly or using the choice strategy), set $x_{best} := x'$ and go to Step 9.
• Step 8. If $(N_{best} = \emptyset)$, then go to Step 11; otherwise choose solution $x'$ from set $N_{best}$ (randomly or using the choice strategy), set $x_{best} := x'$.
• Step 9. Update the tabu list: remove the first element from the list if the length of the tabu list is exceeded, put elements $\{[a_i, \ldots, a_{i+m}], i\}$ at the end of the list, where $m$ depends on the chosen variant of moves; vector $[a_i, \ldots, a_{i+m}]$ is the vector of modified elements in $\vec{a}$ (by move $r: x \overset{r}{\mapsto} x_{best}$) from solution $x = \{\vec{a}, \vec{s}\}$.
• Step 10. Update values of variables: $x := x_{best}$;
    If $(E(x_{best}) \le E(x_{min}))$, then $x_{min} := x$.

    If $(E(x_{best}) \le E_{min})$, then return solution $x_{best}$ as the result and end the algorithm.
    If $(E(x_{best}) \le E_{max}) \wedge (it_{max} = 0)$, then $it_{max}$ := the number of the current iteration.
• Step 11. Checking the stop condition (see the stop conditions in Section 3.2):
    If (any of the stop conditions is fulfilled and $L_{runs} \le 0$), then return solution $x_{min}$ as the result and end the algorithm.
    If (no stop condition is fulfilled), then return to Step 5.
    Generate a new starting solution $x$ on the basis of $x_{min}$ (Diversification). Initialize (tabu list $LT = \emptyset$, number of iterations $l_{it}$; $L_{runs} - 1$). Return to Step 5.

Table 1 Test sequences for the three-dimensional HP-model from [15]

| Number | Length | Sequence | Eopt | E1000TS |
|---|---|---|---|---|
| 1 | 48 | HPHHPPHHHH PHHHPPHHPP HPHHHPHPHH PPHHPPPHPP PPPPPPHH | -32 | -22 |
| 2 | 48 | HHHHPHHPHH HHHPPHPPHH PPHPPPPPPH PPHPPPHPPH HPPHHHPH | -34 | -20 |
| 3 | 48 | PHPHHPHHHH HHPPHPHPPH PHHPHPHPPP HPPHHPPHHP PHPHPPHP | -34 | -20 |
| 4 | 48 | PHPHHPPHPH HHPPHHPHHP PPHHHHHPPH PHHPHPHPPP PHPPHPHP | -33 | -21 |
| 5 | 48 | PPHPPPHPHH HHPPHHHHPH HPHHHPPHPH PHPPHPPPPP PHHPHHPH | -32 | -22 |
| 6 | 48 | HHHPPPHHPH PHHPHHPHHP HPPPPPPPHP HPPHPPPHPP HHHHHHPH | -32 | -22 |
| 7 | 48 | PHPPPPHPHH HPHPHHHHPH HPHHPPPHPH PPPHHHPPHH PPHHPPPH | -32 | -20 |
| 8 | 48 | PHHPHHHPHH HHPPHHHPPP PPPHPHHPPH HPHPPPHHPH PHPHHPPP | -31 | -20 |
| 9 | 48 | PHPHPPPPHP HPHPPHPHHH HHHPPHHHPH PPHPHHPPHP HHHPPPPH | -34 | -21 |
| 10 | 48 | PHHPPPPPPH HPPPHHHPHP PHPHHPPHPP HPPHHPPHHH HHHHPPHH | -33 | -20 |

H: hydrophobic, P: polar. E1000TS is the energy found by tabu search after two runs and 1000 iterations in each run.

Figure 2 A conformation for sequence no. 1; E = -22.

Figure 3 A conformation for sequence no. 2; E = -20.
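The steps of Algorithm 2 can be put together in a compact Python sketch. It is a simplified illustration under several assumptions, not the authors' C implementation: only single-element moves (Variant 2) are used, the aspiration rule is reduced to "a tabu move is allowed if it beats the best energy so far", ties are broken by taking the first best candidate, the energy weight is $-1$, and energy bounds, restarts and diversification are left out. The angle-pair convention and all names are hypothetical.

```python
from collections import deque

VALUES = [(0, 0), (90, 0), (-90, 0), (0, 90), (0, -90)]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def decode(angles):
    """Relative angle pairs -> cubic lattice coordinates, or None on a collision."""
    h, u = (1, 0, 0), (0, 0, 1)          # heading and normal of the base plane
    pts = [(0, 0, 0), (1, 0, 0)]
    for in_plane, out_of_plane in angles:
        if in_plane == 90:
            h = cross(u, h)
        elif in_plane == -90:
            h = cross(h, u)
        elif out_of_plane == 90:
            h, u = u, tuple(-c for c in h)
        elif out_of_plane == -90:
            h, u = tuple(-c for c in u), h
        pts.append(tuple(p + d for p, d in zip(pts[-1], h)))
    return pts if len(set(pts)) == len(pts) else None

def energy(seq, pts):
    """Minus the number of non-bonded H-H pairs on adjacent lattice sites."""
    return -sum(1 for i in range(len(seq)) for j in range(i + 2, len(seq))
                if seq[i] == seq[j] == "H"
                and sum(abs(a - b) for a, b in zip(pts[i], pts[j])) == 1)

def tabu_search_hp(seq, iterations=200, tenure=5):
    x = [(0, 0)] * (len(seq) - 2)              # Step 3: straight starting chain
    best, e_best = x, energy(seq, decode(x))   # Step 4
    tabu = deque(maxlen=tenure)                # Step 1: empty cyclic tabu list
    for _ in range(iterations):
        candidates = []
        for i, old in enumerate(x):
            for v in VALUES:
                if v == old:
                    continue
                y = x[:i] + [v] + x[i + 1:]
                pts = decode(y)
                if pts is None:
                    continue                   # infeasible move
                e = energy(seq, pts)
                # Steps 5-7 merged: non-tabu moves, plus tabu moves that
                # improve on the best energy (simplified aspiration)
                if (i, v) not in tabu or e < e_best:
                    candidates.append((e, i, v, y))
        if not candidates:
            break                              # Step 8: nothing admissible
        e, i, v, y = min(candidates, key=lambda c: c[0])
        tabu.append((i, x[i]))                 # Step 9: forbid restoring the old value
        x = y                                  # Step 10
        if e < e_best:
            best, e_best = y, e
    return best, e_best

best, e = tabu_search_hp("HPPH", iterations=10)
print(best, e)
```

Storing the pair (position, old value) as the tabu attribute forbids undoing a change for a few iterations while leaving all other values at that position available; the paper's variant stores the modified elements together with their index, which serves the same purpose.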

The main difference between the two-dimensional [19] and three-dimensional models lies in the number of degrees of freedom that a sequence of amino acids can have on the two types of lattices. In the case of the three-dimensional model with a cubic lattice the number of degrees of freedom is equal to (n − 2) multiplied by 5, where n is the number of amino acids in the sequence. Each backbone angle can have one of five possible values. In other words, each amino acid placed in the three-dimensional cubic lattice can have at most five H-H contacts with other amino acids. In the case of the two-dimensional model with a rectangular lattice the domain of possible angle values, as well as the maximal number of H-H contacts for one amino acid, is restricted to three.

Figure 4 A conformation for sequence no. 3; E = -20.

Figure 5 A conformation for sequence no. 4; E = -21.

Figure 6 A conformation for sequence no. 5; E = -22.

Figure 7 A conformation for sequence no. 6; E = -22.

Figure 8 A conformation for sequence no. 7; E = -20.

Figure 9 A conformation for sequence no. 8; E = -20.

Figure 10 A conformation for sequence no. 9; E = -21.

Figure 11 A conformation for sequence no. 10; E = -20.

4. Results of computational experiments and their discussion

The algorithm was tested for a wide range of parameters on several benchmark sequences commonly found in the literature [15,12,20,21] (Table 1). In this section the results for the three-dimensional HP-model are presented. The results for the two-dimensional HP-model can be found in [19]. The algorithm was implemented in the C language. Most of the tests were done on a PC with an AMD Duron 700 MHz processor, under Linux OS.

Figure 12 A relation between the tabu list size and the energy for sequence no. 1; 1000 iterations per run.

Figure 13 A relation between the tabu list size and time for sequence no. 1; 1000 iterations per run.

Figure 14 A relation between the tabu list size and the energy for sequence no. 2; 1000 iterations per run.

Figure 15 A relation between the tabu list size and time for sequence no. 2; 1000 iterations per run.

Figure 16 A relation between the tabu list size and the energy for sequence no. 3; 1000 iterations per run.

Figure 17 A relation between the tabu list size and time for sequence no. 3; 1000 iterations per run.

Figure 18 A relation between the tabu list size and the energy for sequence no. 4; 1000 iterations per run.

Figure 19 A relation between the tabu list size and time for sequence no. 4; 1000 iterations per run.

Figure 20 A relation between the tabu list size and the energy for sequence no. 5; 1000 iterations per run.

Figure 21 A relation between the tabu list size and time for sequence no. 5; 1000 iterations per run.

Figure 24 A relation between the tabu list size and the energy for sequence no. 7; 1000 iterations per run.

Figure 25 A relation between the tabu list size and time for sequence no. 7; 1000 iterations per run.

4.1. Results for the three-dimensional model

Benchmark sequences used in the experiment with the three-dimensional model are presented in Table 1.

In Figs. 2–11 the tabu search (TS) results for the strategy with the number of iterations in the stop conditions restricted to 1000 are presented. The start conformation was the straight one; the diversification procedure used the latest solution found in the previous part.

Computation times for 1000 iterations are below 30 s, and the quality of the constructed structures is quite good (Figs. 2–11), although it is hard to compare the results with other methods commonly found in the literature (see Table 1). It is obvious that for the three-dimensional model the number of iterations necessary for finding the optimal solution has to be much larger than in the case of the two-dimensional model [19], due to the increased complexity of the problem. The usefulness of the diversification strategy has been demonstrated by the large improvement of the solutions obtained as a result of performing it. The relation between tabu list sizes, time and energy is presented in Figs. 12–31. It is important to note that the obtained structures have most of the hydrophobic amino acids buried inside the molecules. Thus, the quality of these suboptimal structures is good.

Figure 22 A relation between the tabu list size and the energy for sequence no. 6; 1000 iterations per run.

Figure 23 A relation between the tabu list size and time for sequence no. 6; 1000 iterations per run.

Figure 26 A relation between the tabu list size and energy for sequence no. 8; 1000 iterations per run.

Figure 27 A relation between the tabu list size and time for sequence no. 8; 1000 iterations per run.

Figure 28 A relation between the tabu list size and energy for sequence no. 9; 1000 iterations per run.

Figure 29 A relation between the tabu list size and time for sequence no. 9; 1000 iterations per run.

Figure 30 A relation between the tabu list size and energy for sequence no. 10; 1000 iterations per run.

Figure 31 A relation between the tabu list size and time for sequence no. 10; 1000 iterations per run.

Table 2 Performance of the TS method in comparison to MC, HZ, CHCC and TS in finding low energy minima of HP-model 48-mer proteins on three-dimensional cubic lattices (based on [15])

| Number | EMC | EHZ | ECG | Eopt (a) | ETS | tHZ (b) (min) | tCG (b) (min) | tCHCC (b) (min) | tTS (c) (min) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -30 | -31 | -32 | -32 | -22 | 1.5 × 10^4 | 9.4 × 10^0 | 3.0 × 10^1 | 1.4 × 10^-1 |
| 2 | -30 | -32 | -34 | -34 | -20 | 7.1 × 10^4 | 3.5 × 10^1 | 2.3 × 10^0 | 3.4 × 10^-1 |
| 3 | -31 | -31 | -34 | -34 | -20 | 8.2 × 10^4 | 6.2 × 10^1 | 3.0 × 10^1 | 2.0 × 10^-1 |
| 4 | -30 | -30 | -33 | -33 | -21 | 1.6 × 10^6 | 2.9 × 10^1 | 7.1 × 10^1 | 1.2 × 10^-1 |
| 5 | -30 | -30 | -32 | -32 | -22 | 1.1 × 10^5 | 1.2 × 10^1 | 3.2 × 10^1 | 3.8 × 10^-1 |
| 6 | -30 | -29 | -32 | -32 | -22 | 1.8 × 10^5 | 4.6 × 10^2 | 8.0 × 10^1 | 1.6 × 10^-1 |
| 7 | -31 | -29 | -32 | -32 | -20 | 2.2 × 10^4 | 6.4 × 10^1 | 1.1 × 10^1 | 3.4 × 10^-1 |
| 8 | -31 | -29 | -31 | -32 | -20 | 1.4 × 10^4 | 3.8 × 10^1 | 5.3 × 10^1 | 3.4 × 10^-1 |
| 9 | -30 | -31 | -33 | -34 | -21 | 1.6 × 10^4 | 2.6 × 10^1 | 8.3 × 10^0 | 2.0 × 10^-1 |
| 10 | -30 | -33 | -33 | -33 | -20 | 4.1 × 10^3 | 1.1 × 10^0 | 4.8 × 10^0 | 2.9 × 10^-1 |

H: hydrophobic, P: polar.
(a) Optimal energy found by CHCC method.
(b) The algorithm performed on SUN SPARC1.
(c) The algorithm performed on AMD Duron 700 MHz.

The quality of the solutions for three-dimensional space (Table 2) obtained by the presented method was not as high as for the two-dimensional model [19], due to the increased complexity. The large number of degrees of freedom makes it impossible to obtain perfect solutions in a short time. Figs. 2–11 show that some interesting conformations have been generated in a very short time. Thus the method is a good starting point for further investigation. Possibly, it can be improved by adding more robust search schemes and additional elements such as cycle detection mechanisms. Additionally, solutions obtained by the tabu search strategy can be successfully applied as a starting point for methods like genetic algorithms or Monte Carlo.

5. Conclusions

The method proposed in the paper tries to follow nature in an artificial environment and gives impressive results for sequences containing up to 100 amino acids. It is a variant of the tabu search strategy adapted to the considered problem. It uses problem domain knowledge, such as conformational motifs. The proposed algorithm has good performance and finds low energy conformation values for subsequences of protein chains, and therefore compares well with the other heuristic approaches.

The condition that the native conformation should be stable is not the only one: the protein has to be able to find this conformation in a short time, starting from a denatured state characterized by a random population of unfolded conformations. A folding pathway is built into the structure by natural selection. The approach presented is competitive as compared with other methods and, due to its low computation time, can be used as a complementary tool for an analysis of three-dimensional protein structures.

References

[1] Anfinsen CB, Haber E, Sela M, White Jr FH. The kinetics of formation of native ribonuclease during oxidation of the reduced polypeptide chain. Proc Natl Acad Sci USA 1961;47:1309–14.

[2] Anfinsen CB. Principles that govern the folding of protein chains. Science 1973;181:223–30.

[3] Branden C, Tooze J. Introduction to protein structure. 2nd ed. Garland Science Publishing; 1999. p. 89–120.

[4] Levinthal C. Are there pathways for protein folding? Chem Phys 1968;65:44–5.

[5] Glover F, Laguna M. Tabu search. Boston, USA: Kluwer Academic Publishers; 1997. p. 1–357.

[6] Unger R, Moult J. Finding the lowest free energy conformation of a protein is an NP-hard problem: proof and implications. Bull Math Biol 1993;55(6):1183–98.

[7] Crescenzi P, Goldman D, Papadimitriou C, Piccolboni A, Yannakakis M. On the complexity of protein folding. In: Proceedings of the 1998 STOC. J Comput Biol 1998;5(3):409–22.

[8] Berger B, Leighton T. Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete. J Comput Biol 1998;5(1):27–40.

[9] Dill KA. Theory for the folding and stability of globular proteins. Biochemistry 1985;24:1501–9.

[10] Dill KA, Bromberg S, Yue K, Fiebig KM, Yee DP, Thomas PD, et al. Principles of protein folding: a perspective from simple exact models. Protein Sci 1995;4:561–602.

[11] Dill KA. Polymer principles and protein folding. Protein Sci 1999;8:1166–80.

[12] Lesh N, Mitzenmacher M, Whitesides S. A complete and effective move set for simplified protein folding. RECOMB Proc 2003;188–95.

[13] Miłostan M, Łukasiak P, Dill KA, Błazewicz. A tabu search strategy for finding low energy structures of proteins in HP-model. Curr Comput Mol Biol RECOMB 2003;205–6.

[14] Pardalos PM, Liu X, Xue G. Protein conformation of a lattice model using tabu search. J Global Opt 1997;11(1):55–68.

[15] Beutler TC, Dill KA. A fast conformational search strategy for finding low energy structures of model proteins. Protein Sci 1996;5:2037–43.

[16] Glover F. Tabu search, Part I. ORSA J Comput 1989;1:190–206.

[17] Glover F. Tabu search and adaptive memory programming: advances, applications and challenges. In: Interfaces in computer science and operations research. Kluwer Academic Publishers; 1996. p. 1–75.

[18] Glover F, Laguna M. Tabu search. Modern heuristic techniques for combinatorial problems. Oxford: Blackwell Scientific Publishing; 1993. p. 70–141.

[19] Błazewicz J, Dill KA, Łukasiak P, Miłostan M. A tabu search strategy for finding low energy structures of proteins in HP-model. Comput Meth Sci Technol 2004;10(1):7–19.

[20] Toma L, Toma S. Contact interactions method: a new algorithm for protein folding simulations. Protein Sci 1996;5:147–53.

[21] Unger R, Moult J. Genetic algorithms for protein folding simulations. J Mol Biol 1993;231:75–81.
