

Lattice Models of Peptide Aggregation: Evaluation of Conformational Search Algorithms

MARK T. OAKLEY,1 JONATHAN M. GARIBALDI,2 JONATHAN D. HIRST1

1School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, United Kingdom

2School of Computer Science and Information Technology, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham NG8 1BB, United Kingdom

Received 16 June 2005; Accepted 22 July 2005
DOI 10.1002/jcc.20306

Published online in Wiley InterScience (www.interscience.wiley.com).

Abstract: We present a series of conformational search calculations on the aggregation of short peptide fragments that form fibrils similar to those seen in many protein mis-folding diseases. The proteins were represented by a face-centered cubic lattice model with the conformational energies calculated using the Miyazawa–Jernigan potential. The searches were performed using algorithms based on the Metropolis Monte Carlo method, including simulated annealing and replica exchange. We also present the results of searches using the tabu search method, an algorithm that has been used for many optimization problems, but has rarely been used in protein conformational searches. The replica exchange algorithm consistently found more stable structures than the other algorithms, and was particularly effective for the octamers and larger systems.

© 2005 Wiley Periodicals, Inc. J Comput Chem 26: 1638–1646, 2005

Key words: lattice models; peptide aggregation; conformational search algorithms

Introduction

Many soluble proteins can aggregate to form insoluble deposits, and the presence of these deposits is associated with Alzheimer’s disease, Parkinson’s disease, and Creutzfeldt–Jakob disease, among others.1,2 Aggregation occurs via a mis-folded form of the protein, which can be formed directly after the protein is synthesized or by partial unfolding of the protein from its native state. One striking feature of protein aggregation is that, although the proteins involved have a wide range of structures in their native states, they all form aggregates with very similar structures. They take the form of amyloid fibrils, which typically have diameters of 7–10 nm and are several hundred nanometers long. X-ray diffraction3–5 and solid-state NMR6,7 experiments show that the structures mainly comprise β-sheets with the chains running perpendicular to the fibril. In a recent X-ray diffraction study, the fibrils formed by a seven-residue peptide contained paired β-sheets.8

Small soluble aggregates may be more toxic than large fibrils.9–13 If this is the case, then the most biologically relevant species are also the most computationally accessible. Even if they are not the main cause of the diseases, small aggregates still play an important role in the nucleation of fibrils. However, there are no high-resolution structures of these smaller soluble aggregates, and a computational method for predicting these structures is highly desirable. Searching for the most stable conformation of a single protein chain is a computationally demanding problem, and one would expect conformational searches on aggregates containing several chains to be orders of magnitude more difficult. However, there are several short fragments (of four or more residues) that form aggregates with similar structures to those formed by large proteins,14–18 and conformational searches involving these systems are feasible.

There have been several all-atom molecular dynamics (MD) studies on the unfolding19,20 and aggregation21–26 of short polypeptides, but the high computational cost of these calculations limits both the size of the systems that can be studied and the sampling of conformational space. These studies generally show that parallel or antiparallel β-sheet structures are stable enough to remain largely unchanged over the course of an MD simulation of up to a few hundred nanoseconds, which is several orders of magnitude shorter than the time taken for aggregates to form, which can be seconds to hours, depending on the peptide and the conditions.27,28 Clearly, a simpler model is needed if larger systems are to be studied.

Correspondence to: J. D. Hirst; e-mail: [email protected]

Contract/grant sponsor: the EU Framework 6 Biopattern Network of Excellence

Contract/grant sponsor: EPSRC for a Joint Research Equipment Initiative Grant; contract/grant number: GR/62052

The complexity of these systems can be reduced using united atom representations, where a number of atoms are grouped together. Aggregation has been studied using models where each residue is represented by two to four beads.29–31 These studies used discrete molecular dynamics algorithms, which can access longer time scales than conventional MD simulations. The system can be simplified further by representing each residue by a single bead centered on the α-carbon atom. A lattice-based model reduces the conformational space even further. In these models, each residue is constrained to lie at a point on a regular lattice and is separated from the adjacent residues in the chain by a single step along one of the lattice directions. Aggregation has previously been modelled using the square32–35 and cubic36–39 lattices.

The cubic lattice is rather coarse. In this study, we explore the development of minimalist models of aggregation using the finer 12-coordinate face-centered cubic (FCC) lattice. This lattice can fit the α-carbon atoms of a protein with a root mean square error of 1.78 Å, in comparison to 2.84 Å for the cubic lattice.40 Also, the packing of residues in protein crystal structures closely resembles the FCC arrangement.41,42 The FCC lattice can be superimposed onto a cubic lattice with the moves available being the 12 cyclic permutations of (±1, ±1, 0). The α-carbon atoms of neighboring peptide residues are always separated by 3.8 Å so each lattice unit is equivalent to 2.7 Å. Even though the FCC lattice provides a good model of proteins, it has only rarely been used in studies of protein folding43–45 and has not previously been used to study aggregation.
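The FCC move set described above can be written down directly. The following is a minimal sketch, not the authors' code: the function and constant names are hypothetical, and it only enumerates the 12 step vectors and checks the quoted lattice unit under the assumption that one step (length √2 in cubic-grid units) spans the 3.8 Å α-carbon separation.

```python
import math
from itertools import product

def fcc_neighbor_vectors():
    """Return the 12 nearest-neighbour steps of an FCC lattice embedded in a cubic grid."""
    vectors = set()
    for zero_axis in range(3):                    # which coordinate is 0
        for a, b in product((1, -1), repeat=2):   # signs of the other two
            step = [a, b]
            step.insert(zero_axis, 0)
            vectors.add(tuple(step))
    return sorted(vectors)

VECTORS = fcc_neighbor_vectors()

# Each step has length sqrt(2) grid units and must span the 3.8 angstrom
# separation of neighbouring alpha-carbons quoted in the text.
CA_CA_DISTANCE = 3.8
LATTICE_UNIT = CA_CA_DISTANCE / math.sqrt(2)

print(len(VECTORS), round(LATTICE_UNIT, 1))  # prints "12 2.7"
```

Dividing 3.8 Å by √2 gives about 2.69 Å, which matches the 2.7 Å lattice unit stated above.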

A fragment of the Alzheimer’s amyloid-β protein, KLVFFAE, has been the subject of several computational investigations. All-atom MD simulations23 of large aggregates showed that antiparallel β-sheet structures are more stable than parallel β-sheets, but these simulations only considered β-sheet initial structures and only lasted for a few nanoseconds. Other MD simulations46 of the trimer followed the conversion of a disordered aggregate through an α-helical intermediate to a β-sheet. Such a transition does not normally occur on the time scales accessible by MD simulations so a small force holding the aggregate together was added to accelerate the simulation. Conformational searches of the dimer47 and trimer48 using a simplified off-lattice model also found β-sheets to be the most stable structures and also provided insight into the aggregation mechanism. Conformational searches49 of the trimer and hexamer also gave antiparallel β-sheets as the most stable structures. The model used for these searches did not include Coulomb interactions between charged residues, so the preference for antiparallel structures is not caused by the interactions between the oppositely charged K and E terminal residues.

In conformational investigations of aggregation, there are two main problems to be solved: selection of a potential energy function that provides a good model of real proteins and selection of an algorithm capable of searching through the accessible conformational space for the global energy minimum. In this article we have used a simple potential based on pairwise contact energies and have focused on the selection of a search algorithm.

Computational Methods

Some previous studies34,35 have used energy functions based on the hydrophobic-polar (HP) model, in which all residues are assigned as either hydrophobic or polar, and pairwise contact energies are assigned to HH, HP, and PP contacts.50 This model reproduces some of the features of protein folding, but is a severe simplification. The HCPC model expands the HP model by adding positively and negatively charged residues,32,33 but is still not very detailed. Another widely used contact potential is the Gō51 model, which makes all contacts present in the native state of the protein favorable and any other contacts unfavorable. This model reproduces some of the features of proteins such as two-state folding.52

Aggregation has been studied using Gō model proteins restricted to a cubic lattice.36,37 The high sequence-specificity of the interactions causes the aggregates formed by Gō model proteins to be less stable than the aggregates formed by other models. The Gō model can only be used when the native structure of the protein is known and is therefore inappropriate for our study.

Two previous studies38,39 have looked at the aggregation of proteins restricted to cubic lattices using the Miyazawa–Jernigan (MJ) potential.53 This potential includes pairwise interaction energies between all 20 naturally occurring amino acids and an additional repulsive term for any residues with a large number of contacts. All of the energy terms are derived from the frequency of occurrence of contacts in 1168 protein crystal structures and provide a relatively detailed model of proteins. All of the potentials described here are based only on the number of contacts and do not take into account the relative orientations of the residues. Despite this limitation, the MJ potential is one of the best available potentials involving only two-body interactions and hence was used in this study.

The MJ potential was designed for off-lattice structures, so it contains some distance-dependent terms. In this study, we consider any residues in neighboring lattice sites to be in contact and any that are further apart to be noninteracting; so the distance-dependent terms are ignored. Each residue has an attractive energy, $E_p^c$, and a repulsive energy, $E_p^r$, and the total energy of the polypeptide, $E$, is the sum of these over all residues:

$$E = \sum_p \left( E_p^c + E_p^r \right). \tag{1}$$

The attractive term for each residue is the sum of all the pairwise interactions with the neighboring residues:

$$E_p^c = \frac{1}{2} \sum e_{ij}\, n_{ij}^c. \tag{2}$$

The pairwise contact energies, $e_{ij}$, are taken from the literature53 and $n_{ij}^c$ is the number of contacts of this type. The repulsive part of the MJ potential contains two parts: a hard-core potential term to prevent residues overlapping and a packing density term to prevent overpacking. The hard-core term was removed in our study, because the constraints of the lattice impose a minimum residue–residue distance. This leaves the overpacking energy, which is added to the energy for any residue with a number of contacts, $n_p^c$, greater than $q_{i_p}$, with the values of $q_{i_p}$ for each type of residue taken from the literature:53

$$E_p^r = \left( \frac{q_{i_p}}{n_p^c} \right) E_p^c - \ln\left( \frac{N(i_p, n_p^c) + \varepsilon}{N(i_p, q_{i_p}) + \varepsilon} \right). \tag{3}$$

The first term removes the extra attractive energy gained from having more than $q_{i_p}$ contacts. The values of $N(i_p, n_p^c)$ are taken from the literature53 and the values of $N(i_p, q_{i_p})$ are generated by interpolation of these values. A small positive number, $\varepsilon = 10^{-6}$, is added to prevent divergence of the logarithm. Previous studies38,39 used an older version of the MJ potential54 that did not include the repulsive overpacking term, but the coordination number of the cubic lattice is small enough that the number of contacts required to add this term could never be reached. It is known that the strength of the attraction between hydrophobic residues is too high.38,54 Leonhard et al. have dealt with this problem by adding a solvent–amino acid interaction energy,38 but this involves the introduction of two adjustable parameters to describe the nature of the solvent and has not been included in our study.
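As an illustration of the attractive contact term of eq. (2) only, the sketch below sums pairwise contact energies over residues on neighbouring FCC sites. Several things here are assumptions rather than details from the paper: the energy values are made-up placeholders (not the published MJ parameters), residues bonded along a chain are assumed not to count as contacts, and all names are hypothetical. The repulsive overpacking term is not included.

```python
from itertools import combinations

# Placeholder contact energies in RT units, for illustration only.
# These are NOT the published Miyazawa-Jernigan values.
TOY_CONTACT_ENERGY = {
    ("L", "L"): -3.0,
    ("L", "V"): -2.5,
    ("V", "V"): -2.0,
}

def are_fcc_neighbors(p, q):
    """Sites are in contact when they differ by a permutation of (+-1, +-1, 0)."""
    return sorted(abs(a - b) for a, b in zip(p, q)) == [0, 1, 1]

def attractive_energy(chains, contact_energy):
    """Sum of the attractive terms over all residues.

    Counting each contact pair once is equivalent to the 1/2-weighted
    per-residue sums of eq. (2). Bonded neighbours are skipped (an assumption).
    """
    residues = [
        (c, k, aa, pos)
        for c, chain in enumerate(chains)
        for k, (aa, pos) in enumerate(chain)
    ]
    total = 0.0
    for (c1, k1, a1, p1), (c2, k2, a2, p2) in combinations(residues, 2):
        bonded = c1 == c2 and abs(k1 - k2) == 1
        if not bonded and are_fcc_neighbors(p1, p2):
            total += contact_energy[tuple(sorted((a1, a2)))]
    return total

# Two hypothetical two-residue chains packed as a tetrahedron of FCC sites.
chains = [
    [("L", (0, 0, 0)), ("V", (1, 1, 0))],
    [("L", (1, 0, 1)), ("V", (0, 1, 1))],
]
total = attractive_energy(chains, TOY_CONTACT_ENERGY)
```

In this toy arrangement every residue touches both residues of the other chain, giving one L-L, two L-V, and one V-V contact.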

Ignoring excluded volume, the maximum number of conformations accessible by a polymer chain on a lattice is $(c - 1)^{n-2}$, where c is the coordination number of the lattice and n is the number of residues. Thus, for a nine-residue oligopeptide (the longest monomer in this study), the number of conformations that must be searched is $11^7 \approx 1.9 \times 10^7$. This is small enough to enumerate exhaustively, using methods such as a backtracking algorithm.55 If we assume that the same expression gives a lower bound for aggregates, the number of conformations of the dimer of a nine-residue oligopeptide is $4.6 \times 10^{16}$. This is too many for exhaustive enumeration and a search algorithm must be used to find the most stable structures.
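The two counts quoted above can be checked with a few lines of arithmetic. This is only a sketch of the counting expression; the function name is hypothetical, and the dimer is treated as a single 18-residue chain, as the lower-bound argument in the text assumes.

```python
def max_conformations(coordination, n_residues):
    """Upper bound (c - 1)^(n - 2) on chain conformations, ignoring excluded volume."""
    return (coordination - 1) ** (n_residues - 2)

FCC_COORDINATION = 12

monomer = max_conformations(FCC_COORDINATION, 9)      # nine-residue chain
dimer = max_conformations(FCC_COORDINATION, 2 * 9)    # two chains treated as 18 residues

print(f"{monomer:.1e} {dimer:.1e}")  # prints "1.9e+07 4.6e+16"
```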

Methods based on the Metropolis Monte Carlo (MC) procedure56 have been widely used in studies of protein folding.57 In this method, a candidate structure is generated by random modification of a previous structure. If the new structure is more stable, then it is accepted as a new starting structure; if it is less stable, then it is accepted with a probability, p, given by eq. (4), where the energy difference between the candidate structure and the current structure is ΔE. Thus, the search tends to move towards more stable structures, but is capable of overcoming energy barriers.

$$p = e^{-\Delta E / RT}. \tag{4}$$
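The acceptance rule of eq. (4) is short enough to sketch directly. The function below is an illustrative version with hypothetical names; the random-number source is passed in so that the example is reproducible, and R defaults to 1 to match the arbitrary units used in this study.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng=random, gas_constant=1.0):
    """Return True if a candidate with energy change delta_e is accepted."""
    if delta_e <= 0.0:
        return True                      # not less stable: always accept
    p = math.exp(-delta_e / (gas_constant * temperature))
    return rng.random() < p              # uphill move accepted with probability p

# Example: an uphill move of 1.5 RT units at T = 1.50 should be accepted
# roughly exp(-1), or about 37%, of the time.
rng = random.Random(0)
trials = 100_000
accepted = sum(metropolis_accept(1.5, 1.5, rng) for _ in range(trials))
fraction = accepted / trials
```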

The temperature, T, is an adjustable parameter that can be used to modify the probability of accepting a structure. The energy and temperature scales used in this study are arbitrary so the value of R is set to 1. At low temperatures, the search cannot surmount the barriers between energy minima and becomes stuck in a small region of conformational space. At high temperatures, the search is capable of exploring all of the conformational space, but the global potential energy minimum is no longer the global free energy minimum. In studies of aggregation a high temperature can also lead to dissociation of the aggregate. This is a desirable feature in studies of the thermodynamics and mechanism of aggregation, but it leads to a large increase in the size of the search space, and therefore slows the search for the global minimum. The temperature of the MC run must be a compromise between these two extremes.

Table 1. Parameters Used for Each Search Algorithm.

| Method | Parameter | Value |
| --- | --- | --- |
| Monte Carlo | Temperature | 1.50 |
| Simulated annealing | Maximum temperature | 2.00 |
| | Minimum temperature | 1.00 |
| Replica exchange | Number of replicas | 5 |
| | Temperatures | 1.00, 1.25, 1.50, 1.75, 2.00 |
| | Frequency of exchange | 1000 steps |
| TS | Tabu list length | 10,000 |
| MC/TS | Temperature | 1.50 |
| | Tabu list length | 10 |

Table 2. Energies (in RT Units) of the Most Stable Structure of Each Aggregate and the Method(s) That Found Them.

| Number of chains | KLVFFAE | LMVGGVVIA | AGAAAAGA |
| --- | --- | --- | --- |
| 2 | −132.0 (all) | −191.8 (MC) | −84.4 (all) |
| 3 | −232.9 (all but TS) | −318.8 (MC/TS) | −142.7 (MC, SA, RE) |
| 4 | −329.5 (MC/TS) | −446.7 (SA) | −206.1 (RE) |
| 5 | −425.9 (RE) | −570.5 (SA) | −261.1 (TS) |
| 6 | −522.2 (MC) | −701.9 (SA) | −318.5 (SA) |
| 7 | −615.7 (SA) | −824.0 (MC) | −379.4 (TS) |
| 8 | −713.1 (RE) | −951.9 (RE) | −438.2 (RE) |
| 9 | −808.9 (RE) | −1080.8 (MC) | −504.0 (RE) |
| 10 | −899.2 (RE) | −1208.2 (RE) | −561.0 (RE, TS) |

1640 Oakley, Garibaldi, and Hirst • Vol. 26, No. 15 • Journal of Computational Chemistry

Several modifications of the MC method combine the good optimization of local minima in low-temperature searches and the coverage of conformational space at high temperature. Simulated annealing (SA) optimizations58 start at a high temperature, which is then slowly reduced as the search proceeds via a cooling schedule. Possible cooling schedules include linear ($T_{new} = T_{old} - dT$), exponential ($T_{new} = C \times T_{old}$) and several more elaborate ones. In this study, a simple linear cooling schedule was used. In the limiting case of an infinitely slow cooling schedule, SA is guaranteed to find the global energy minimum.
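The two cooling schedules named above can be sketched as follows. The function names and the choice of five temperature points are illustrative assumptions; the paper states only that a linear schedule between the maximum and minimum temperatures of Table 1 was used, not how many temperature updates were made.

```python
def linear_schedule(t_max, t_min, n_points):
    """T_new = T_old - dT, with dT chosen so the last point is t_min."""
    d_t = (t_max - t_min) / (n_points - 1)
    return [t_max - k * d_t for k in range(n_points)]

def exponential_schedule(t_max, t_min, n_points):
    """T_new = C * T_old, with 0 < C < 1 chosen so the last point is t_min."""
    c = (t_min / t_max) ** (1.0 / (n_points - 1))
    return [t_max * c ** k for k in range(n_points)]

# Illustration with the SA temperature range from Table 1 (2.00 down to 1.00).
linear = linear_schedule(2.0, 1.0, 5)            # [2.0, 1.75, 1.5, 1.25, 1.0]
exponential = exponential_schedule(2.0, 1.0, 5)
```

For the same end points the exponential schedule spends more of its steps at lower temperatures than the linear one.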

Replica exchange (RE) simulations59,60 involve a number of simultaneous MC simulations, each at a different temperature. At regular intervals, the structures of replicas i and j are exchanged, with a probability given by eq. (5). Thus, the high-temperature runs provide good coverage of conformational space and the low-temperature runs provide good optimization of local minima. It is relatively straightforward to implement RE to take advantage of parallel processing, although this has not been necessary in this study. Gront et al. used RE to study the conformations of long polypeptides on the FCC lattice, and it performed significantly better than standard MC.43 RE has also been used to study the aggregation of polymers on a cubic lattice.36

$$p = e^{-\left[ (1/RT_j) - (1/RT_i) \right] (E_i - E_j)}. \tag{5}$$
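A swap test in the spirit of eq. (5) can be sketched as below. This uses the standard replica exchange criterion, capping the probability at 1; the function name and the example energies are hypothetical, and which pairs of replicas the authors attempted to swap is not stated in the text.

```python
import math

def swap_probability(e_i, e_j, t_i, t_j, gas_constant=1.0):
    """Probability of exchanging the structures held by replicas i and j."""
    beta_i = 1.0 / (gas_constant * t_i)
    beta_j = 1.0 / (gas_constant * t_j)
    exponent = (beta_i - beta_j) * (e_i - e_j)
    return min(1.0, math.exp(exponent))

# The colder replica (T = 1.00) holds the less stable structure:
# the swap moves the better structure to low temperature and is always accepted.
p_favorable = swap_probability(e_i=-130.0, e_j=-132.0, t_i=1.00, t_j=1.25)

# The colder replica already holds the more stable structure:
# the swap is accepted only with probability exp(-0.4).
p_unfavorable = swap_probability(e_i=-132.0, e_j=-130.0, t_i=1.00, t_j=1.25)
```

The expression is symmetric under relabelling i and j, so it does not matter which replica is called i.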

There are many search algorithms, such as the tabu search (TS),61 that are not based on MC. In a TS, all of the structures accessible by making one step from the starting structure are generated, and the most stable candidate is selected as the new starting structure. The starting structure is then added to the tabu list, a list of structures that cannot be accepted. This forces the search away from structures that have already been visited into new areas of conformational space. The tabu list cannot grow unchecked, because the algorithm will spend an ever-increasing amount of time comparing new structures to those on the tabu list. Therefore, there is a limit to the number of structures on the tabu list and, after this has been reached, the oldest structures are overwritten. Choosing an appropriate length for the tabu list is important. If it is too short, then the algorithm repeatedly searches a small area of conformational space; if it is too long, then the time taken to check candidate structures against the tabu list slows down the search.

Figure 1. The two most stable KLVFFAE dimer structures.

Figure 2. The most stable KLVFFAE octamer structure.

Table 3. Mean Minimum Energies in RT Units in KLVFFAE Searches.

| Number of chains | MC | SA | RE | TS | MC/TS |
| --- | --- | --- | --- | --- | --- |
| 2 | −131.9 | −132.0 | −132.0 | −131.8 | −132.0 |
| 3 | −232.8 | −232.8 | −232.8 | −232.1 | −232.8 |
| 4 | −327.4 | −327.3 | −327.3 | −321.9 | −327.4 |
| 5 | −424.8 | −424.9 | −425.1 | −418.4 | −424.7 |
| 6 | −520.1 | −519.5 | −520.4 | −510.0 | −519.5 |
| 7 | −612.9 | −613.8 | −613.8 | −607.4 | −613.2 |
| 8 | −710.5 | −711.3 | −711.8 | −692.0 | −710.5 |
| 9 | −803.9 | −805.4 | −805.7 | −769.8 | −802.9 |
| 10 | −896.0 | −896.9 | −897.5 | −881.3 | −895.0 |

An algorithm that combines the MC and TS methods might perform well. One such algorithm is Azizi and Zolfaghari’s adaptive SA/TS method.62 Test calculations using adaptive temperature control were not promising, so a simpler algorithm was used. This consisted of a fixed temperature MC search coupled to a short tabu list so that any moves to recently visited structures were automatically rejected.
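The tabu search with a bounded tabu list described above can be sketched generically. This is a toy illustration, not the authors' implementation: the state space is a hypothetical one-dimensional landscape rather than a lattice aggregate, all names are invented for the example, and a `deque` with `maxlen` stands in for the fixed-length list whose oldest entries are overwritten.

```python
from collections import deque

def tabu_search(start, neighbors, energy, n_steps, tabu_length):
    """Move to the best non-tabu neighbour each step; return the best state seen."""
    tabu = deque(maxlen=tabu_length)    # oldest entries are dropped automatically
    current = best = start
    for _ in range(n_steps):
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break                       # every neighbour is tabu
        tabu.append(current)            # the starting structure becomes tabu
        current = min(candidates, key=energy)
        if energy(current) < energy(best):
            best = current
    return best

# Toy landscape: a local minimum at index 1 and the global minimum at index 7.
LANDSCAPE = [5, 3, 4, 6, 7, 2, 1, 0, 3]

def toy_energy(x):
    return LANDSCAPE[x]

def toy_neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(LANDSCAPE)]

best = tabu_search(0, toy_neighbors, toy_energy, n_steps=10, tabu_length=3)
```

A plain steepest-descent search from index 0 would stop at the local minimum at index 1; because recently visited states are forbidden, the tabu search is pushed over the barrier and reaches index 7. The membership test `s not in tabu` is linear in the list length, which mirrors the cost of long tabu lists noted in the text.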

Three polypeptide fragments were studied: KLVFFAE,18 LMVGGVVIA17 (from the Alzheimer’s amyloid-β protein), and AGAAAAGA15,16 (from a prion protein). The MC, SA, RE, TS, and MC/TS algorithms were all used for conformational searches. The move set for all algorithms consisted of two-bond moves and reptation moves.63 A spherical boundary was added to the system, with the size of the sphere such that the aggregate occupied about 1% of the available lattice points. This boundary was imposed to prevent complete dissociation of the aggregate, but was large enough to allow the aggregate to explore diffuse structures.

All of the search algorithms have one or more adjustable parameters, and selecting good values for each of these is critical in maximizing the performance of the algorithms. For each aggregate, the parameters that produce the best results are different. However, rather than retuning the parameters for each new system, the values that gave the best overall results in a series of test calculations on various aggregates were chosen (Table 1).

Figure 3. The most stable KLVFFAE decamer structure.

Figure 4. The most stable LMVGGVVIA dimer structure.

Table 4. Mean Minimum Energies in RT Units in LMVGGVVIA Searches.

| Number of chains | MC | SA | RE | TS | MC/TS |
| --- | --- | --- | --- | --- | --- |
| 2 | −191.6 | −191.6 | −191.6 | −191.1 | −191.6 |
| 3 | −316.6 | −316.8 | −317.0 | −313.6 | −317.1 |
| 4 | −444.8 | −445.0 | −445.1 | −438.6 | −444.5 |
| 5 | −568.7 | −568.4 | −568.2 | −557.1 | −567.8 |
| 6 | −695.4 | −695.8 | −696.2 | −683.0 | −693.6 |
| 7 | −820.3 | −819.8 | −820.4 | −812.4 | −818.9 |
| 8 | −948.6 | −949.2 | −948.3 | −933.6 | −946.8 |
| 9 | −1074.9 | −1073.8 | −1075.8 | −1061.4 | −1071.7 |
| 10 | −1199.2 | −1201.3 | −1201.8 | −1189.7 | −1198.0 |

The results presented here are from 10 independent runs of $10^9$ steps for each of the MC-based procedures ($2 \times 10^8$ for each of the five replicas in RE). The TS results are from five runs of $2 \times 10^6$ steps. The initial structure for each search was generated by adding a randomly oriented chain to the most stable structure of the previous aggregate. These calculations were performed in serial on a cluster of 48 dual AMD Athlon 1900+ processor computers.

Results

Finding the most stable conformation of a single short polypeptide chain on a lattice is readily tractable, and all of the algorithms found the global energy minima of all three sequences in less than 2 min. These were confirmed as the global minima by reference to exhaustive searches using the backtracking algorithm. The sequences LMVGGVVIA and KLVFFAE both had a single global energy minimum, whereas the more homogeneous sequence AGAAAAGA had 20 degenerate most stable structures.

Table 2 summarizes the energies of the most stable structures of each aggregate. The mean minimum energy across all runs was chosen as the best measure of the relative performance of the algorithms and these results are presented in more detail (Tables 3–5).

Aggregation of KLVFFAE

Two degenerate lowest energy structures were found for the KLVFFAE dimer. Almost all of the runs of all of the search algorithms found one or both of these structures so they are probably the global energy minima. With the trimers and larger aggregates the differences between the methods become apparent (Table 3). The RE method produced the most consistent results, with the SA method also performing well. The TS algorithm performed rather poorly; the best structures produced by TS were always much less stable than the best structures from any of the other algorithms. The TS searches also scaled poorly with the size of the system. The ratio of the time taken by a TS run to the time taken by an MC run increased from 1.7 for the dimer to 11.0 for the decamer.

Both of the dimer structures had both chains in identical compact conformations (Fig. 1). In the larger aggregates, the peptide chains tend to be more unfolded (Fig. 2). The addition of an extra chain leads to significant reorganization of the aggregate and repeating substructures are not seen. The most stable structures have a compact core of hydrophobic residues, caused by the strong attractive interactions between the hydrophobic residues. The attractive interactions involving charged K and E residues are weaker than those involving the hydrophobic residues so these groups lie on the outside of the aggregate. In the decamer, the most stable structure found had a single unoccupied lattice site at the center of the aggregate (Fig. 3). This empty site reduces the repulsive overpacking energy of the residues in the core of the aggregate. None of the aggregates adopted the β-sheet structures seen in the higher resolution computational studies.

Figure 5. The most stable LMVGGVVIA hexamer structure.

Figure 6. The most stable LMVGGVVIA octamer structure.

Aggregation of LMVGGVVIA

In the searches of the LMVGGVVIA dimer, the most stable structure was only found by one run of the MC algorithm and by none of the other algorithms, and it is difficult to tell whether this represents the global energy minimum. This could be the result of a much larger search space because the increase from seven residues in each chain to nine multiplies the total number of accessible conformations of the dimer by a factor of $11^4 \approx 1.5 \times 10^4$. Alternatively, LMVGGVVIA could have a more rugged energy landscape than KLVFFAE, leading to a more difficult search. The most stable structure found comprises two partly unfolded chains (Fig. 4).
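For reference, the factor quoted above follows from the chain-counting expression $(c - 1)^{n-2}$ with c = 12, under the same assumption made earlier that a dimer can be counted as a single chain of 2 × 9 = 18 or 2 × 7 = 14 residues:

```latex
\frac{(c-1)^{18-2}}{(c-1)^{14-2}}
  = \frac{11^{16}}{11^{12}}
  = 11^{4}
  = 14\,641
  \approx 1.5 \times 10^{4}
```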

In all of the aggregates from the trimer to the decamer, there was little difference between the results of the MC, SA, and RE algorithms (Table 4). Again, the TS algorithm found less stable structures than the MC-based algorithms, and the computer time required scaled poorly with the size of the aggregate. The most stable structures of the aggregates up to and including the hexamer are more compact than the KLVFFAE aggregates because there are no charged groups that prefer to lie on the outside of the aggregate (Fig. 5). In the larger systems the repulsive overpacking energy becomes more important, and the most stable structures have some unoccupied lattice points in the center of the aggregate (Fig. 6).

Aggregation of AGAAAAGA

The more homogeneous sequence produces rather different results from the other sequences. All of the dimer searches produced the same lowest energy and, in most cases, each run found several structures with this energy. Over all of the searches a total of 32 different structures with the global minimum energy were found. All of these structures had the same arrangement of A and G residues (Fig. 7), with different connections between these residues. Each run of an MC-based algorithm found an average of 5.8 of these structures, but the TS runs only found 2.6.

In the searches of the larger aggregates, the RE and TS algorithms consistently produced the best results (Table 5). The good performance of the TS method for this peptide is probably due to the large number of degenerate low-energy structures. In the other systems the global minimum may be too far from the starting structure for the TS method to find it. However, when there are several global minima, there is a greater probability that one of them will be close enough to the starting structure to be found by TS.

In aggregates up to the hexamer the most stable structures are planar structures with a thickness of two residues (Fig. 8). In this type of structure the maximum number of contacts for any residue is seven (eight for terminal residues), so there is only a small repulsive overpacking energy. In the hexamer searches, globular structures (Fig. 9) occur at slightly higher energy than the planar structures. These structures leave a lattice point unoccupied to reduce the overpacking energy in the same way that the larger aggregates of the other peptides do. In systems larger than the hexamer, no disc structures were found among the low-energy structures, and the most stable aggregates had globular structures, with increasing numbers of unoccupied sites (Fig. 10).

Figure 7. Two of the 32 degenerate most stable structures of the AGAAAAGA dimer.

Table 5. Mean Minimum Energies in RT Units in AGAAAAGA Searches.

| Number of chains | MC | SA | RE | TS | MC/TS |
| --- | --- | --- | --- | --- | --- |
| 2 | −84.4 | −84.4 | −84.4 | −84.4 | −84.4 |
| 3 | −142.4 | −142.5 | −142.5 | −142.3 | −142.4 |
| 4 | −200.6 | −200.7 | −200.9 | −200.3 | −200.7 |
| 5 | −259.1 | −259.7 | −260.1 | −260.3 | −258.6 |
| 6 | −315.1 | −316.9 | −317.2 | −316.9 | −315.5 |
| 7 | −375.1 | −377.4 | −377.8 | −378.0 | −374.4 |
| 8 | −434.8 | −437.1 | −437.6 | −437.4 | −434.1 |
| 9 | −497.0 | −501.8 | −502.0 | −501.2 | −496.8 |
| 10 | −554.1 | −558.8 | −559.6 | −559.5 | −552.9 |

Figure 8. The most stable AGAAAAGA hexamer structure.

Conclusions

The RE algorithm consistently produced good results across all three sequences and in all sizes of aggregate. RE worked better as the size of the system increased and outperformed the other algorithms in the octamers and larger aggregates. The SA algorithm generally performed almost as well, and in the medium-sized aggregates (4–7 chains) it frequently found the best overall structures. The MC algorithm produced comparable results in some systems, but performed poorly in others. Thus, the addition of multiple temperatures to the MC method substantially improves the effectiveness of the search. The poor performance of the TS method is due, in part, to the small number of moves made, and therefore the poor coverage of conformational space, compared to the MC-based simulations. This is caused by the large number of energy evaluations needed for each move; energy evaluation is by far the most time-consuming part of any of the searches. The MC searches perform only one energy evaluation for each candidate structure, with an acceptance ratio of approximately 0.4. The TS algorithm performs several energy evaluations for each move, and the number of evaluations per move increases with the size of the aggregate, with more than 100 evaluations needed for the largest aggregates. The results of the AGAAAAGA searches suggest that the TS algorithm is much better at searching the local neighborhood than it is at searching all of the accessible conformational space. The addition of a tabu list to the MC method did not lead to any improvement, and in many cases, the MC/TS algorithm performed worse than the standard MC algorithm.
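The difference in cost per move between the two families of search can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the bit-string state, the `toy_energy` function, the single-flip neighbourhood, and the tabu list length of 5 are all assumptions chosen only to make the counting visible. A Metropolis trial evaluates one candidate, whereas a tabu move evaluates every non-tabu neighbour before choosing the best.

```python
import math
import random
from collections import deque


class CountingEnergy:
    """Wrap an energy function and count how many times it is called."""

    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, state):
        self.calls += 1
        return self.func(state)


def toy_energy(state):
    # Hypothetical stand-in for a contact energy: adjacent equal bits are favorable.
    return -sum(1 for a, b in zip(state, state[1:]) if a == b)


def neighbours(state):
    # Every state reachable by flipping exactly one bit.
    return [state[:i] + (1 - state[i],) + state[i + 1:] for i in range(len(state))]


def metropolis_step(state, e_state, energy, temperature, rng):
    """One Metropolis trial: a single energy evaluation for the candidate."""
    candidate = rng.choice(neighbours(state))
    e_cand = energy(candidate)
    delta = e_cand - e_state
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate, e_cand
    return state, e_state


def tabu_step(state, energy, tabu):
    """One tabu move: evaluate all non-tabu neighbours and take the best."""
    allowed = [n for n in neighbours(state) if n not in tabu]
    e_best, best = min((energy(n), n) for n in allowed)
    tabu.append(state)  # forbid an immediate return to the state just left
    return best, e_best


def evaluations_per_move(n_bits=12, n_steps=200, temperature=1.0, seed=0):
    """Return (evaluations per MC trial, evaluations per tabu move)."""
    rng = random.Random(seed)
    start = tuple(rng.randint(0, 1) for _ in range(n_bits))

    mc_energy = CountingEnergy(toy_energy)
    state, e_state = start, toy_energy(start)  # initial energy is not counted
    for _ in range(n_steps):
        state, e_state = metropolis_step(state, e_state, mc_energy, temperature, rng)

    ts_energy = CountingEnergy(toy_energy)
    tabu = deque(maxlen=5)  # shorter than the neighbourhood, so moves always exist
    state = start
    for _ in range(n_steps):
        state, _ = tabu_step(state, ts_energy, tabu)

    return mc_energy.calls / n_steps, ts_energy.calls / n_steps


if __name__ == "__main__":
    mc, ts = evaluations_per_move()
    print(f"MC: {mc:.1f} evaluations per trial; TS: {ts:.1f} evaluations per move")
```

In this toy setting the neighbourhood grows with the length of the bit string, so the cost of a tabu move grows with system size while the cost of a Metropolis trial stays fixed at one evaluation, which mirrors the scaling argument made above.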

In the smaller systems studied here, compact structures that maximize the number of contacts are favored. As the size of the aggregates increases, the fraction of buried residues increases. Buried residues have a greater number of contacts, so the repulsive part of the MJ potential becomes more important. In systems larger than 50–70 residues (depending on the sequence) the effect of the repulsive term becomes so large that empty sites must be introduced. Previous studies47,48 have shown that even small aggregates containing two or three chains form β-sheets, but the systems studied here did not have any β-sheet character. This is probably because a potential that is based only on contacts, like the MJ potential, is not detailed enough,64 and multibody terms that model properties such as hydrogen bonding are needed. The development of a potential including such terms is the focus of further work.

Acknowledgments

We thank Matt Wood for technical assistance.

Figure 9. The most stable globular AGAAAAGA hexamer structure.

Figure 10. The most stable AGAAAAGA octamer structure.

References

1. Dobson, C. M. Nature 2003, 426, 884.
2. Selkoe, D. J. Nature 2003, 426, 900.
3. Serpell, L. C.; Blake, C. C. F.; Fraser, P. E. Biochemistry 2000, 39, 13269.
4. Balbirnie, M.; Grothe, R.; Eisenberg, D. S. Proc Natl Acad Sci USA 2001, 98, 2375.
5. Sikorski, P.; Atkins, E. D. T.; Serpell, L. C. Structure 2003, 11, 915.
6. Tycko, R. Biochemistry 2003, 42, 3151.
7. Jaroniec, C. P.; MacPhee, C. E.; Bajaj, V. S.; McMahon, M. T.; Dobson, C. M.; Griffin, R. G. Proc Natl Acad Sci USA 2004, 101, 711.
8. Nelson, R.; Sawaya, M. R.; Balbirnie, M.; Madsen, A. O.; Riekel, C.; Grothe, R.; Eisenberg, D. Nature 2005, 435, 773.
9. Bucciantini, M.; Giannoni, E.; Chiti, F.; Baroni, F.; Formigli, L.; Zurdo, J. S.; Taddei, N.; Ramponi, G.; Dobson, C. M.; Stefani, M. Nature 2002, 416, 507.
10. Cleary, J. P.; Walsh, D. M.; Hofmeister, J. J.; Shankar, G. M.; Kuskowski, M. A.; Selkoe, D. J.; Ashe, K. H. Nat Neurosci 2005, 8, 79.
11. Caughey, B.; Lansbury, P. T. Annu Rev Neurosci 2003, 26, 267.
12. Kirkitadze, M. D.; Bitan, G.; Teplow, D. B. J Neurosci Res 2002, 69, 567.
13. Walsh, D. M.; Klyubin, I.; Fadeeva, J. V.; Cullen, W. K.; Anwyl, R.; Wolfe, M. S.; Rowan, M. J.; Selkoe, D. J. Nature 2002, 416, 535.
14. Tjernberg, L.; Hosia, W.; Bark, N.; Thyberg, J.; Johansson, J. J Biol Chem 2002, 277, 43243.
15. Gasset, M.; Baldwin, M. A.; Lloyd, D. H.; Gabriel, J. M.; Holtzman, D. M.; Cohen, F.; Fletterick, R.; Prusiner, S. B. Proc Natl Acad Sci USA 1992, 89, 10940.
16. Blondelle, S. E.; Forood, B.; Houghten, R. A.; PerezPaya, E. Biochemistry 1997, 36, 8393.
17. Halverson, K.; Fraser, P. E.; Kirschner, D. A.; Lansbury, P. T. Biochemistry 1990, 29, 2639.
18. Balbach, J. J.; Ishii, Y.; Antzutkin, O. N.; Leapman, R. D.; Rizzo, N. W.; Dyda, F.; Reed, J.; Tycko, R. Biochemistry 2000, 39, 13748.
19. Parchment, O. G.; Essex, J. W. Proteins 2000, 38, 327.
20. Armen, R. S.; DeMarco, M. L.; Alonso, D. O. V.; Daggett, V. Proc Natl Acad Sci USA 2004, 101, 11622.
21. Ma, B. Y.; Nussinov, R. Protein Sci 2002, 11, 2335.
22. Zanuy, D.; Ma, B. Y.; Nussinov, R. Biophys J 2003, 84, 1884.
23. Ma, B. Y.; Nussinov, R. Proc Natl Acad Sci USA 2002, 99, 14126.
24. Tsai, H. H.; Zanuy, D.; Haspel, N.; Gunasekaran, K.; Ma, B. Y.; Tsai, C. J.; Nussinov, R. Biophys J 2004, 87, 146.
25. Sekijima, M.; Motono, C.; Yamasaki, S.; Kaneko, K.; Akiyama, Y. Biophys J 2003, 85, 1176.
26. Paci, E.; Gsponer, J.; Salvatella, X.; Vendruscolo, M. J Mol Biol 2004, 340, 555.
27. Chirita, C. N.; Kuret, J. Biochemistry 2004, 43, 1704.
28. Wilkins, D. K.; Dobson, C. M.; Gross, M. Eur J Biochem 2000, 267, 2609.
29. Urbanc, B.; Cruz, L.; Ding, F.; Sammond, D.; Khare, S.; Buldyrev, S. V.; Stanley, H. E.; Dokholyan, N. V. Biophys J 2004, 87, 2310.
30. Peng, S.; Ding, F.; Urbanc, B.; Buldyrev, S. V.; Cruz, L.; Stanley, H. E.; Dokholyan, N. V. Phys Rev E 2004, 69, 041908.
31. Nguyen, H. D.; Hall, C. K. Proc Natl Acad Sci USA 2004, 101, 16180.
32. Dima, R. I.; Thirumalai, D. Protein Sci 2002, 11, 1036.
33. Harrison, P. M.; Chan, H. S.; Prusiner, S. B.; Cohen, F. E. Protein Sci 2001, 10, 819.
34. Giugliarelli, G.; Micheletti, C.; Banavar, J. R.; Maritan, A. J Chem Phys 2000, 113, 5072.
35. Gupta, P.; Hall, C. K.; Voegler, A. C. Protein Sci 1998, 7, 2642.
36. Bratko, D.; Blanch, H. W. J Chem Phys 2003, 118, 5185.
37. Bratko, D.; Blanch, H. W. J Chem Phys 2001, 114, 561.
38. Leonhard, K.; Prausnitz, J. M.; Radke, C. J. Phys Chem Chem Phys 2003, 5, 5291.
39. Broglia, R. A.; Tiana, G.; Pasquali, S.; Roman, H. E.; Vigezzi, E. Proc Natl Acad Sci USA 1998, 95, 12930.
40. Park, B. H.; Levitt, M. J Mol Biol 1995, 249, 493.
41. Bagci, Z.; Jernigan, R. L.; Bahar, I. J Chem Phys 2002, 116, 2269.
42. Bagci, Z.; Jernigan, R. L.; Bahar, I. Polymer 2002, 43, 451.
43. Gront, D.; Kolinski, A.; Skolnick, J. J Chem Phys 2000, 113, 5065.
44. Zhang, Y.; Skolnick, J. J Chem Phys 2001, 115, 5027.
45. Pokarowski, P.; Kolinski, A.; Skolnick, J. Biophys J 2003, 84, 1518.
46. Klimov, D. K.; Thirumalai, D. Structure 2003, 11, 295.
47. Santini, S.; Wei, G. H.; Mousseau, N.; Derreumaux, P. Structure 2004, 12, 1245.
48. Santini, S.; Mousseau, N.; Derreumaux, P. J Am Chem Soc 2004, 126, 11509.
49. Favrin, G.; Irback, A.; Mohanty, S. Biophys J 2004, 87, 3657.
50. Lau, K. F.; Dill, K. A. Macromolecules 1989, 22, 3986.
51. Gō, N. Annu Rev Biophys Bioeng 1983, 12, 183.
52. Kaya, H.; Chan, H. S. Proteins 2000, 40, 637.
53. Miyazawa, S.; Jernigan, R. L. J Mol Biol 1996, 256, 623.
54. Miyazawa, S.; Jernigan, R. L. Macromolecules 1985, 18, 534.
55. Hirst, J. D. Protein Eng 1999, 12, 721.
56. Metropolis, N.; Rosenbluth, A. W.; Rosenbluth, M. N.; Teller, A. H.; Teller, E. J Chem Phys 1953, 21, 1087.
57. Vasquez, M.; Nemethy, G.; Scheraga, H. A. Chem Rev 1994, 94, 2183.
58. Kirkpatrick, S.; Gelatt, C. D.; Vecchi, M. P. Science 1983, 220, 671.
59. Swendsen, R. H.; Wang, J. S. Phys Rev Lett 1986, 57, 2607.
60. Hansmann, U. H. E. Chem Phys Lett 1997, 281, 140.
61. Glover, F. Comput Oper Res 1986, 13, 533.
62. Azizi, N.; Zolfaghari, S. Comput Oper Res 2004, 31, 2439.
63. Wall, F. T.; Mandel, F. J Chem Phys 1975, 63, 4592.
64. Vendruscolo, M.; Domany, E. J Chem Phys 1998, 109, 11101.
