

The Jigsaw Puzzle Model: Search for Conformational Specificity in Protein Interiors

Rahul Banerjee*, Malabika Sen, Dhananjay Bhattacharya and Partha Saha

Crystallography and Molecular Biology Division, Saha Institute of Nuclear Physics, Sector 1, Block AF, Bidhan Nagar, Calcutta 700 064, India

The jigsaw puzzle model postulates that the predominant factor relating primary sequence to three-dimensional fold lies in the stereospecific packing of interdigitating side-chains within densely packed protein interiors. An attempt has been made to check the validity of the model by means of a surface complementarity function. From a database of 100 highly resolved protein structures, the contacts between buried hydrophobic residues (Leu, Ile, Val, Phe) and their neighbours have been categorized in terms of the extent of side-chain surface area involved in a contact (overlap) and their steric fit (Sm). The results show that the majority of contacts between a buried residue and its immediate neighbours (side-chains) are of high steric fit and that, in the case of extended overlap, at least one of the angular parameters characterizing interresidue geometry deviates markedly from a random distribution, as estimated by χ². The calculations thus tend to support the “jigsaw puzzle” model in that 75–85% of the contacts involving hydrophobic residues are of high surface complementarity, which, coupled to high overlap, exercises fairly stringent constraints over the possible geometrical orientations between interacting residues. These constraints manifest as simple patterns in the distributions of orientational angles. Approximately 60–80% of the buried side-chain surface packs against neighbouring side-chains, the rest interacting with main-chain atoms. The latter partition of the surface maintains an equally high steric fit (relative to side-chain contacts), emphasizing a non-trivial though secondary role played by main-chain atoms in interior packing. The majority of this class of contacts, though of high complementarity, is of reduced overlap. All residues, whether hydrophobic or polar/charged, show similar surface complementarity measures upon burial, indicating comparable competence of all amino acids in packing effectively with their atomic environments. The specificity thus appears to be distributed over the entire network of contacts within proteins. The study concludes with a proposal to classify contacts as specific and non-specific (based on overlap and fit), with the former perhaps contributing more to the specificity between sequence and fold than the latter.

© 2003 Elsevier Ltd. All rights reserved.

Keywords: side-chain; packing; jigsaw puzzle; hydrophobic core; surface complementarity

*Corresponding author

0022-2836/$ - see front matter © 2003 Elsevier Ltd. All rights reserved.

Supplementary data associated with this article can be found at doi: 10.1016/S0022-2836(03)01033-7

E-mail address of the corresponding author: [email protected]

Abbreviation used: VDW, van der Waals.

doi:10.1016/j.jmb.2003.08.013 J. Mol. Biol. (2003) 333, 211–226

Introduction

The polypeptide chain of every functional native protein is specific to a particular three-dimensional fold. A proper understanding of the factors responsible for this specificity, defined as the adoption of a unique fold out of several alternatives, is perhaps one of the core issues in the protein folding problem. One of the first proposals was made by Francis Crick,1 who considered that the key determinant linking sequence to fold lay in the stereospecific packing of amino acid side-chains in the interior of the molecule. The packing density within a protein resembles that of a crystalline solid.2 It was Crick’s view, characteristically called the “jigsaw puzzle” model, that such dense packing could only be achieved by the correct fit of complementary surfaces contributed by interdigitating side-chains. Subsequent experiments have, however, cast doubts on whether this feature of correctly folded proteins is the predominant determinant of specificity.

In general, proteins are quite tolerant to mutations in their interiors provided the hydrophobic composition of the core is not altered. A dramatic demonstration of this fact was in the case of phage T4 lysozyme,3 where the protein retained its overall fold (though with reduced thermal stability and activity) despite the mutation of seven residues to methionine. Crystal structures of this and similar mutants4,5 have shown that proteins adequately compensate for such mutations by structural adjustments of both side and main-chain atoms, thereby preserving their overall shape. In yet another experiment,6 12 out of 13 core residues of the ribonuclease barnase were randomly mutated. Of these mutants, 23% retained their enzymatic activity in vivo. Study of haemoglobin structures from different species also exhibited fold conservation with only 15% sequence identity, the hydrophobic character of the buried residues being preserved in each case.7

These observations have led to an alternative proposal that the pattern of hydrophobicities embedded in the primary sequence plays the key role in determining the fold.8 In this view the second genetic code has an essentially binary character. In order to specify a fold it is thus only necessary to explicitly state the binary patterning of polar (P) and non-polar (H) residues in a polypeptide chain. Such a hypothesis is based on the fact that, compared to other forces, the hydrophobic effect plays the predominant role in folding a protein,9 with secondary structural elements and detailed internal architecture arising simply on account of compaction of the polypeptide chain.

Attempts to experimentally design proteins based on the binary H-P code have yielded significant results. Kamtekar et al.10 have designed four-helix bundles, with polar and non-polar amino acids being positioned in the sequence following the periodicity of an α-helix. No attempt was made to constrain the exact identities of the residues. Of the designed proteins 60% were soluble, indicating a measure of compactness sufficient to escape degradation by cellular proteases. The secondary structures of a few such proteins probed by circular dichroism spectroscopy exhibited the characteristic features of the targeted α-helix. Lattice simulations in two and three dimensions confirmed that it is possible to fold binary H-P sequences into compact structures in silico.11,12 There is thus an increasing consensus that the constraint exercised by detailed packing interactions within a protein may be relatively weak in determining the overall fold, when compared to the pattern of hydrophobicities in the primary sequence.

Even then, the part played by packing in attaining a fully functional, thermally stable protein cannot be discounted. In experiments which randomly mutated the core of a protein molecule, most mutants were thermally unstable when compared to the wild-type.13 Again, a non-negligible fraction of these mutants or designed proteins either failed to express or were wholly inactive, making it probable that in such instances even the basic fold may not have been attained, or that the structures were highly perturbed. Dahiyat & Mayo14 redesigned the core of the streptococcal protein G β1 domain, where a mutant with an overpacked core exhibited all the features of a disordered collapsed globule, whereas an underpacked instance led to the complete unraveling of the structure. The fact that mutants can be wholly inactive despite conserving the hydrophobic composition and core volume (with respect to the wild-type protein) gives a fair measure of the importance of stereospecificity or relative geometry of the interacting residues within the molecular interior. It is thus fairly well established that the relative thermal stabilities of native and mutant proteins cannot be correctly predicted in terms of the relative hydrophobicities of the mutated residues alone.15 For hemoglobins, although the basic fold is maintained in different species with as low as 15% sequence identity, there is yet variability in the relative orientation of the helices, which can be as large as 30° and 7 Å in rotation and translation, respectively. These observations emphasize that the densely packed cores normal to correctly folded proteins cannot be taken for granted, and underline their crucial importance in specifying the thermostable, optimally active form of the molecule. In the design process therefore a deliberate effort might have to be made to construct a tightly packed core of a fully functional molecule. It thus becomes imperative to elucidate the principles by which a protein assembles its correctly packed hydrophobic interior.

To date there exist two hypotheses, the “jigsaw puzzle” and the “nuts and bolts” models, which lie at opposite ends of the spectrum. In the former the buried amino acids have to be brought together in precise three-dimensional geometry to allow the interlocking of side-chain complementary surfaces in order to achieve dense packing. Such is not required of the “nuts and bolts” association, where close packing arises simply on account of compaction within a constrained volume. The two models are further distinguished by the fact that in the jigsaw puzzle model the side-chains remain interlocked on systematic expansion of the chain until a critical point of disjuncture is reached, estimated to be around a 25% increase in volume.16 In contrast there will be no abrupt increase in side-chain entropy for the nuts and bolts case, as substantial conformational freedom will be gained by only a few percentage points increase in volume.17

Although stereospecific geometry is the essence of the jigsaw puzzle model, finding non-random preferred modes of side-chain association within proteins has been quite elusive. Behe et al.18 failed to detect any preferred interactions amongst side-chains. Similarly, no outstanding mode of association was found for two closely packed phenylalanine residues in a protein environment.19 On the other hand, there have also been reports of significant deviations from a random distribution for interplanar and polar angles characterizing the geometry of particular residues engaged in pairwise interactions.20–22 Given the ensuing controversy, the present study re-examines side-chain packing utilizing a surface complementarity function.23 The method of small-probe contact dots confirmed the excellent “goodness-of-fit” between interdigitating side-chains in protein interiors.24 The present enquiry thus attempts to elucidate whether such well-fitted packing interactions are possible due to preferred geometrical orientation of side-chains or otherwise.

Results

The primary aim of the present study is to elucidate the role of surface complementarity in the packing of amino acid side-chains within proteins and to determine the extent to which high surface fit requires specific geometrical orientation between interacting residues. To this end Connolly and van der Waals (VDW) surfaces have been calculated for 100 highly resolved protein structures (Table 1). From this database, contacts between side-chains of completely buried residues within proteins (targets) and their neighbours have been characterized in terms of two parameters: (1) surface complementarity (Sm); and (2) overlap.

A modified version of the original function proposed by Lawrence & Colman23 has been used to measure steric fit between residue surfaces. Overlap estimates the extent of contact between target and neighbour, being defined (see Materials and Methods) as the fraction of the total number of side-chain dot surface points of a target involved in contact with a neighbour. The side-chain surface of a target can thus be partitioned into patches which make contact, or “overlap”, with surrounding residues. The Sm for each patch can be estimated along with the overall complementarity of the target. Although only the side-chain surface was considered for the target, its immediate atomic environment can be assembled from both side and main-chain atoms of neighbouring residues. Both cases were analyzed independently.
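The exact modification applied to the Lawrence & Colman function is given in Materials and Methods and is not reproduced here. As a rough illustration only, the sketch below follows the general form usually described for the original function: each target surface point is paired with its nearest neighbour-surface point, the score is the alignment of the two surface normals damped by the squared separation, and the median over all points is reported. The function name `steric_fit`, the nearest-point pairing, and the weight `w = 0.5` Å⁻² are assumptions made for this example.

```python
import math
from statistics import median

def steric_fit(target_pts, neighbour_pts, w=0.5):
    """Median complementarity of target surface points against a neighbour surface.

    Each point is (position, unit_normal), both 3-tuples. Normals are taken to
    point outward from the atoms owning the surface, so two well-fitted
    surfaces have antiparallel normals. w (1/A^2) is an assumed distance weight.
    """
    scores = []
    for pos, normal in target_pts:
        # nearest neighbour-surface point to this target point
        n_pos, n_normal = min(neighbour_pts, key=lambda q: math.dist(pos, q[0]))
        d = math.dist(pos, n_pos)
        # antiparallel normals give +1, parallel normals give -1
        alignment = -sum(a * b for a, b in zip(normal, n_normal))
        scores.append(alignment * math.exp(-w * d * d))
    return median(scores)

# Two flat patches facing each other at zero separation: a perfect fit.
flat_target = [((float(i), 0.0, 0.0), (0.0, 0.0, 1.0)) for i in range(5)]
flat_neighbour = [((float(i), 0.0, 0.0), (0.0, 0.0, -1.0)) for i in range(5)]
perfect = steric_fit(flat_target, flat_neighbour)  # 1.0
```

Moving the neighbour patch away, or tilting its normals, lowers the score, which is the behaviour the text relies on when it calls a contact one of high or low steric fit.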

A target is said to be in contact with a neighbouring residue when there is an overlap of at least one surface point. The average number of neighbours making contact with a target through their side-chain atoms alone varies between four and ten, depending upon the size of the target (Figure 1(a) and (b)). Bulky side-chains (Phe, Tyr, Trp) typically have nine or ten neighbours, whereas the number varies from seven to nine for targets of intermediate size (Leu, Ile, Met) and from four to six for smaller amino acids (Ala, Ser). Considering only those contacts with an overlap greater than or equal to 10% limits the average number of neighbours to between three and four for almost every amino acid. An average of five to eight neighbours make main-chain contacts (Figure 2(a) and (b)) with the target, and this reduces to less than two on the application of a 10% cutoff on overlap. Thus both Connolly and VDW surfaces confirm an average of three to five neighbours which make extended contact (greater than or equal to 10% overlap) with targets either through their side or main-chain atoms.
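The bookkeeping behind these neighbour counts can be sketched as follows. The example assumes that every contacting dot point of a target has already been assigned to the neighbour residue it touches; the residue labels, the point counts and the function names are hypothetical and serve only to show how overlap and the 10% cutoff interact.

```python
from collections import Counter

def overlaps(contact_owner, n_surface_points):
    """Fraction of a target's side-chain dot points in contact with each neighbour.

    contact_owner: one label per contacting dot point, naming the neighbour
    residue that point touches.
    n_surface_points: total number of side-chain dot points of the target.
    """
    counts = Counter(contact_owner)
    return {res: n / n_surface_points for res, n in counts.items()}

def extended_neighbours(overlap_by_residue, cutoff=0.10):
    """Neighbours whose overlap is greater than or equal to the cutoff."""
    return sorted(r for r, f in overlap_by_residue.items() if f >= cutoff)

# Toy target with 200 dot points, 150 of which touch some neighbour.
labels = (["Ile45"] * 60 + ["Phe12"] * 50 + ["Val80"] * 30
          + ["Ala3"] * 8 + ["Leu99"] * 2)
ov = overlaps(labels, 200)
all_neighbours = len(ov)               # any neighbour with at least one point
extended = extended_neighbours(ov)     # only those with overlap >= 10%
```

In this toy case five residues count as neighbours under the one-point criterion, but only three survive the 10% cutoff, mirroring the drop in neighbour numbers described above.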

Table 1. Protein Data Bank files in the database

α + β
1lkk_A (1.00, 13)   7rsa (1.26, 17)     1lit (1.55, 27)     1mrj (1.60, 69)
1ubi (1.80, 15)     1mkb_A (2.00, 35)   1pne (2.00, 31)     2chs_A (1.90, 23)
1ctf (1.70, 8)      1fkf (1.70, 15)     2ihl (1.40, 25)     2tsc_A (1.97, 69)
1ppd (2.00, 64)     1hfc (1.56, 30)     1erz_A (1.70, 88)   1ako (1.70, 77)
1jh6_A (1.80, 42)   1gif_A (1.90, 17)

All α
2end (1.45, 27)     1cem (1.65, 136)    2pgd (2.00, 126)    1osa (1.68, 12)
1lmb_4 (1.80, 15)   1dsb_A (2.00, 42)   2gst_A (1.80, 51)   2erl (1.00, 2)
2utg_A (1.64, 4)    1rro (1.30, 23)     1poa (1.50, 12)     1axn (1.78, 86)
2wrp_R (1.65, 3)    2lis (1.35, 21)     1ib2_A (1.90, 86)   1bgf (1.45, 23)
1af7 (2.00, 68)

All β
1ifc (1.19, 23)     1amm (1.20, 41)     1cka_A (1.50, 7)    2rhe (1.60, 20)
4fgf (1.60, 23)     2cpl (1.63, 42)     1tta_A (1.70, 24)   1knb (1.70, 48)
1eur (1.82, 123)    1thw (1.75, 47)     1dif_A (1.70, 17)   2bbk_H (1.75, 106)
1hoe (2.00, 11)     1jbc (1.20, 63)     1aac (1.31, 24)     2mcm (1.50, 18)
1stn (1.70, 27)     1nif (1.60, 87)     1smd (1.60, 171)    2ayh (1.60, 57)
1ab9_B (1.60, 12)   1arb (1.20, 86)     1whi (1.50, 29)     1kap_P (1.64, 129)
1czf_A (1.68, 101)  1h6w_A (1.90, 2)    1pgs (1.80, 88)     1tl2_A (2.00, 39)
1hbq (1.70, 30)     1sfp (1.90, 21)     1beh_A (1.75, 52)   1iaz_A (1.90, 47)
1dto_A (1.90, 31)   1amx (2.00, 36)

α/β
1cus (1.25, 58)     1xyz_A (1.40, 112)  2rn2 (1.48, 26)     1tca (1.55, 113)
1dad (1.60, 54)     3chy (1.66, 29)     6xia (1.65, 98)     2dri (1.60, 74)
1tph_1 (1.80, 63)   1rva_A (2.00, 51)   1lam (1.60, 141)    1lau_E (1.80, 68)
1tml (1.80, 82)     1pii (2.00, 143)    4mbp (1.70, 115)    1mla (1.50, 92)
5p21 (1.35, 40)     2ctc (1.40, 100)    1php (1.65, 119)    1mor (1.90, 132)
1d3v_A (1.70, 94)   1chd (1.75, 61)     1pdo (1.70, 27)     1jsr_A (1.70, 96)
1bgv_A (1.90, 143)  1jr2_A (1.84, 63)   1srv_A (1.70, 41)

Small proteins-multidomain
1cnr (1.05, 1)      1ptx (1.30, 3)      3pte (1.60, 122)    2ovo (1.50, 2)

Resolution (Å) and number of buried residues are given in parentheses. The polypeptide chain selected is written next to the pdb code.

Most targets have an overlap of about 60–80% with neighbouring side-chain surface points (Table 2). The average Sm (⟨Sm⟩) calculated on Connolly surfaces, for this partition of the target side-chain, is highest for hydrophobic residues (Leu, 0.60; Ile, 0.60; Val, 0.59; Phe, 0.61) and drops marginally by about 10% for polar and charged amino acids. Tyrosine and tryptophan are two other residues with high ⟨Sm⟩. On the other hand, the patches on the target side-chain making contact with main-chain atoms alone maintain a uniformly high ⟨Sm⟩ (greater than or equal to about 0.60) for all residues with the singular exception of proline (0.46). When both side and main-chain surface points are considered the overall ⟨Sm⟩ is fairly uniform for all amino acids.

The patterns in Sm and overlap are conserved for both Connolly and VDW surfaces. For all residues, the correlation coefficients between the ⟨Sm⟩ values (calculated on VDW and Connolly surfaces) for side-chain only, main-chain only and both types of neighbours are 0.98, 0.90 and 0.99, respectively. For buried targets alone their values (in the same order) are 0.88, 0.67, and 0.89. In absolute terms, Sm on VDW surfaces is depressed by about 10–15% relative to Connolly, probably due to the sharper surface contours in the former.
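For readers who wish to check the kind of comparison made here, the sketch below computes a plain Pearson correlation coefficient between the side-chain ⟨Sm⟩ values listed in Table 2 for Connolly and VDW surfaces. It is an assumption that the quoted coefficients are Pearson coefficients computed over per-residue averages; because the tabulated values are rounded to two decimals, the result should only be expected to lie close to the figure quoted for side-chain neighbours of buried targets.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Side-chain <Sm> from Table 2, residues in table order:
# Ala Leu Ile Val Phe Met Pro Ser Thr Asn Gln Trp Tyr Asp Glu Arg Lys His
connolly = [0.57, 0.60, 0.60, 0.59, 0.61, 0.62, 0.58, 0.54, 0.56,
            0.56, 0.58, 0.61, 0.60, 0.52, 0.56, 0.58, 0.57, 0.57]
vdw = [0.52, 0.52, 0.53, 0.52, 0.55, 0.55, 0.52, 0.51, 0.51,
       0.51, 0.52, 0.55, 0.54, 0.48, 0.51, 0.51, 0.51, 0.53]

r_side = pearson(connolly, vdw)
```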

The marginal variation in ⟨Sm⟩ among the different amino acids (as targets) tends to confirm the view that hydrophobicities exercise a greater constraint in positioning residues with respect to their exposure to solvent rather than their intrinsic ability for better packing. In other words, the difference in hydrophobicities between apolar and polar/charged residues is far more significant than the difference in their complementarity measures.

Figure 1. (a) Frequency of the number of neighbours (side-chain contacts) for leucine targets. (b) Frequency of the number of neighbours (side-chain contacts) for leucine targets which have an overlap greater than or equal to 10%.

Figure 2. (a) Frequency of the number of neighbours (main-chain contacts) for leucine targets. (b) Frequency of the number of neighbours (main-chain contacts) for leucine targets which have an overlap greater than or equal to 10%.

For a particular target, each individual contact with a neighbour can be separately assessed for overlap and steric fit. Every contact was classified according to the residues in the position of target and neighbour (e.g. Leu (target)–Ala (neighbour)) and the Sm averaged over all such contacts in the database. In addition, the pairwise correlation or propensity of contact between the two (Table 3) was also estimated. As expected, the hydrophobic targets on one hand and the polar/charged residues on the other show a marked preference to interact with similar neighbours, in terms of hydrophobicity or charge. For example, the contact propensity of leucine targets lies between 1.08 and 1.20 for Leu, Ile, Val, Phe and Met neighbours but drops to 0.60–1.0 for polar/charged residues. For hydrophobic targets, however, this preference does not appear to correlate with any pronounced or regular variability in ⟨Sm⟩. The standard deviation in ⟨Sm⟩ for all hydrophobic target–neighbour contacts lies between 0.02 and 0.03, with a maximum of 0.04 for methionine. For polar/charged residues σ(Sm) is only marginally higher (0.04–0.05). Charged targets tend to have a slightly lower ⟨Sm⟩ when interacting with neighbours of opposite charge (Asp-Arg: 0.51; Arg-Glu: 0.48; Arg-Asp: 0.48), giving a negative correlation between ⟨Sm⟩ and propensity of contact, indicating thereby that the optimal geometry for electrostatic interactions may marginally decrease the steric fit between oppositely charged partners. However, despite this small effect, the results do not single out any target–neighbour pair for extraordinary steric fit. The results from both Connolly and VDW surfaces are in good agreement.

A similar analysis of target–main-chain interactions did not yield any dominant preferences between amino acids and hence all the main-chain neighbours for a target were grouped together without any further sorting based on residue type.

Categories of contact

A contact between a target and a neighbour can be classified into four categories based on Sm and overlap: (1) low overlap–low Sm; (2) low overlap–high Sm; (3) high overlap–low Sm; and (4) high overlap–high Sm. The highest possible value for Sm is 1.00. For a Connolly surface, an Sm greater than or equal to 0.5 was considered high and an Sm less than 0.5 low. Due to the consistently lower value of Sm obtained on VDW surfaces relative to Connolly, the cutoff for high Sm was scaled by:

$$\left(0.5 / \langle S_m \rangle_{\mathrm{Connolly}}\right) \times \langle S_m \rangle_{\mathrm{van\ der\ Waals}}$$

for analyses on VDW surfaces. Here, ⟨Sm⟩ is the average complementarity for a particular residue (Table 2). Depending on whether side or main-chain contacts were being considered, the appropriate ⟨Sm⟩ was chosen for every target.
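As a worked instance of this scaling, the short sketch below evaluates the VDW cutoff for leucine side-chain contacts from the Table 2 averages (⟨Sm⟩ of 0.60 on Connolly and 0.52 on VDW surfaces). The function name `vdw_cutoff` is introduced here purely for illustration.

```python
def vdw_cutoff(mean_sm_connolly, mean_sm_vdw, connolly_cutoff=0.5):
    """Scale the Connolly high-Sm cutoff to the VDW surface for one residue type."""
    return (connolly_cutoff / mean_sm_connolly) * mean_sm_vdw

# Leucine, side-chain neighbours: <Sm> = 0.60 (Connolly), 0.52 (VDW), Table 2.
leu_side_cutoff = vdw_cutoff(0.60, 0.52)  # about 0.43
```

A leucine side-chain contact on a VDW surface would therefore be counted as high Sm at roughly 0.43 and above, rather than at 0.5.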

Table 2. Average overlap of the target side-chain surface making contact with neighbouring side and main-chain atoms with their corresponding average surface complementarities

                            Average overlap (fraction)   Average complementarity ⟨Sm⟩
Residue  Bur   Surface      Side          Main           Total         Side          Main
Ala      746   Connolly     0.70 (0.19)   0.30           0.59 (0.09)   0.57 (0.11)   0.61 (0.15)
               VDW          0.69 (0.16)   0.31           0.53 (0.08)   0.52 (0.10)   0.54 (0.12)
Leu      909   Connolly     0.78 (0.12)   0.22           0.62 (0.06)   0.60 (0.07)   0.65 (0.11)
               VDW          0.79 (0.10)   0.21           0.53 (0.05)   0.52 (0.06)   0.55 (0.09)
Ile      659   Connolly     0.78 (0.12)   0.22           0.62 (0.06)   0.60 (0.07)   0.64 (0.10)
               VDW          0.78 (0.11)   0.22           0.53 (0.05)   0.53 (0.06)   0.55 (0.09)
Val      761   Connolly     0.77 (0.12)   0.23           0.61 (0.06)   0.59 (0.07)   0.62 (0.11)
               VDW          0.76 (0.10)   0.24           0.53 (0.05)   0.52 (0.06)   0.54 (0.09)
Phe      427   Connolly     0.78 (0.12)   0.22           0.63 (0.05)   0.61 (0.06)   0.66 (0.10)
               VDW          0.79 (0.10)   0.21           0.56 (0.04)   0.55 (0.05)   0.57 (0.08)
Met      192   Connolly     0.73 (0.13)   0.27           0.64 (0.06)   0.62 (0.07)   0.67 (0.09)
               VDW          0.75 (0.12)   0.25           0.55 (0.05)   0.55 (0.06)   0.56 (0.09)
Pro      169   Connolly     0.66 (0.14)   0.34           0.57 (0.07)   0.58 (0.08)   0.46 (0.26)
               VDW          0.70 (0.14)   0.30           0.51 (0.06)   0.52 (0.07)   0.45 (0.13)
Ser      318   Connolly     0.65 (0.20)   0.35           0.57 (0.08)   0.54 (0.10)   0.61 (0.14)
               VDW          0.67 (0.16)   0.33           0.51 (0.07)   0.51 (0.09)   0.52 (0.11)
Thr      288   Connolly     0.69 (0.15)   0.31           0.58 (0.07)   0.56 (0.09)   0.61 (0.11)
               VDW          0.69 (0.13)   0.31           0.51 (0.06)   0.51 (0.08)   0.52 (0.10)
Asn      137   Connolly     0.63 (0.18)   0.37           0.58 (0.08)   0.56 (0.10)   0.60 (0.13)
               VDW          0.65 (0.16)   0.35           0.51 (0.06)   0.51 (0.07)   0.51 (0.10)
Gln       85   Connolly     0.67 (0.13)   0.33           0.59 (0.06)   0.58 (0.07)   0.60 (0.09)
               VDW          0.69 (0.12)   0.31           0.51 (0.05)   0.52 (0.06)   0.52 (0.08)
Trp      126   Connolly     0.73 (0.11)   0.27           0.63 (0.05)   0.61 (0.05)   0.67 (0.07)
               VDW          0.75 (0.10)   0.25           0.55 (0.04)   0.55 (0.04)   0.57 (0.08)
Tyr      250   Connolly     0.74 (0.12)   0.26           0.61 (0.05)   0.60 (0.06)   0.65 (0.09)
               VDW          0.75 (0.11)   0.25           0.54 (0.04)   0.54 (0.05)   0.56 (0.08)
Asp      134   Connolly     0.68 (0.17)   0.32           0.57 (0.07)   0.52 (0.09)   0.59 (0.11)
               VDW          0.70 (0.15)   0.30           0.50 (0.06)   0.48 (0.07)   0.53 (0.11)
Glu       81   Connolly     0.71 (0.14)   0.29           0.59 (0.07)   0.56 (0.08)   0.64 (0.09)
               VDW          0.74 (0.12)   0.26           0.51 (0.06)   0.51 (0.51)   0.53 (0.09)
Arg       63   Connolly     0.67 (0.14)   0.33           0.59 (0.05)   0.58 (0.06)   0.59 (0.09)
               VDW          0.71 (0.13)   0.29           0.51 (0.05)   0.51 (0.06)   0.50 (0.08)
Lys       36   Connolly     0.65 (0.15)   0.35           0.58 (0.07)   0.57 (0.08)   0.59 (0.12)
               VDW          0.68 (0.14)   0.32           0.50 (0.06)   0.51 (0.07)   0.50 (0.11)
His       66   Connolly     0.69 (0.16)   0.31           0.58 (0.06)   0.57 (0.07)   0.60 (0.13)
               VDW          0.72 (0.13)   0.28           0.53 (0.05)   0.53 (0.06)   0.51 (0.11)

The ⟨Sm⟩ of the entire side-chain of the target is also tabulated (Total). The first and second row of every residue are results calculated on Connolly and van der Waals surfaces, respectively. The standard deviations are given in parentheses and Bur refers to the number of buried residues.

As overlaps higher than 30% are uncommon (Figure 3), an overlap greater than or equal to 10% was stipulated to be high. The same criterion for overlap was preserved for both surfaces. In the case of preferential interactions between two apolar residues due to the specific association of their complementary side-chain surfaces, it could be expected that most contacts between the two should be of significant overlap with a fairly high measure of steric fit. The statistical distribution of geometrical parameters should also show significant deviations from randomness. Thus a “jigsaw puzzle”-like fit due to specific orientation of amino acid side-chains will probably have the maximum number of contacts in the fourth category, subject to the relative surface areas of target and neighbour.
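The four contact categories can be expressed compactly. The sketch below is a minimal illustration using the thresholds stated in the text (Sm of 0.5 for Connolly surfaces, overlap of 10%); the function names and the toy contact list are assumptions, and for VDW surfaces the residue-specific scaled Sm cutoff would be passed in place of 0.5.

```python
from collections import Counter

def contact_category(overlap, sm, sm_cutoff=0.5, overlap_cutoff=0.10):
    """Return 1-4: (1) low overlap-low Sm, (2) low overlap-high Sm,
    (3) high overlap-low Sm, (4) high overlap-high Sm."""
    high_overlap = overlap >= overlap_cutoff
    high_sm = sm >= sm_cutoff
    return 1 + (2 if high_overlap else 0) + (1 if high_sm else 0)

def occupancy(contacts, sm_cutoff=0.5, overlap_cutoff=0.10):
    """Percentage of (overlap, Sm) contacts falling in each of the four categories."""
    counts = Counter(
        contact_category(o, s, sm_cutoff, overlap_cutoff) for o, s in contacts
    )
    total = len(contacts)
    return {c: 100.0 * counts.get(c, 0) / total for c in (1, 2, 3, 4)}

# Toy list of (overlap, Sm) pairs for one target-neighbour residue pair.
toy_contacts = [(0.05, 0.60), (0.20, 0.60), (0.20, 0.62), (0.05, 0.40)]
toy_occupancy = occupancy(toy_contacts)
```

Tabulating `occupancy` for every target–neighbour pair gives a percentage breakdown of the same kind as the one reported in Table 4.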

Distribution of contacts in the four categories

Table 3. The average surface complementarity ⟨Sm⟩ and the contact propensity for every target–neighbour pair calculated on Connolly surfaces

                                                      Neighbour
Target       Ala   Leu   Ile   Val   Phe   Met   Pro   Tyr   Trp   Ser   Thr   Asn   Gln   Asp   Glu   Arg   Lys   His
Ala   ⟨Sm⟩   0.54  0.58  0.59  0.57  0.59  0.60  0.51  0.56  0.58  0.55  0.56  0.56  0.55  0.53  0.53  0.57  0.59  0.56
      Prop.  0.82  1.07  0.98  1.06  1.08  1.21  0.93  1.03  1.06  0.75  1.12  1.04  0.79  0.96  0.92  0.79  0.97  0.86
Leu   ⟨Sm⟩   0.56  0.59  0.59  0.58  0.62  0.61  0.51  0.61  0.62  0.57  0.59  0.58  0.59  0.57  0.60  0.61  0.56  0.59
      Prop.  1.03  1.08  1.16  1.14  1.20  1.10  0.79  0.95  1.05  0.81  0.89  0.77  0.76  0.58  0.73  0.79  0.92  0.62
Ile   ⟨Sm⟩   0.57  0.59  0.60  0.59  0.61  0.60  0.52  0.62  0.63  0.58  0.55  0.57  0.59  0.58  0.58  0.60  0.57  0.59
      Prop.  1.08  1.19  0.98  1.12  1.09  1.15  0.90  1.05  0.82  0.82  0.83  0.80  0.90  0.87  0.66  0.83  0.82  0.76
Val   ⟨Sm⟩   0.56  0.59  0.60  0.58  0.61  0.61  0.52  0.60  0.61  0.54  0.57  0.56  0.59  0.54  0.55  0.57  0.57  0.58
      Prop.  1.09  1.09  1.18  1.02  0.98  1.06  0.88  0.98  1.07  0.88  0.94  0.79  1.01  0.66  0.84  0.76  0.92  0.93
Phe   ⟨Sm⟩   0.57  0.61  0.61  0.60  0.61  0.61  0.56  0.60  0.62  0.55  0.58  0.58  0.60  0.52  0.56  0.59  0.53  0.58
      Prop.  1.07  1.16  1.06  1.00  1.03  1.07  0.82  0.99  1.11  0.99  0.81  0.85  0.95  0.71  0.86  0.75  0.71  0.82
Met   ⟨Sm⟩   0.58  0.62  0.60  0.62  0.61  0.58  0.50  0.62  0.61  0.58  0.62  0.57  0.63  0.56  0.58  0.58  0.58  0.67
      Prop.  1.14  1.13  1.17  1.06  1.05  0.76  0.84  1.11  0.87  0.98  0.86  0.64  1.04  0.94  0.57  0.69  0.64  0.97
Pro   ⟨Sm⟩   0.50  0.55  0.57  0.55  0.63  0.54  0.44  0.58  0.65  0.56  0.51  0.51  0.56  0.50  0.52  0.56  0.51  0.62
      Prop.  0.87  0.75  0.82  0.93  0.73  0.81  1.21  1.18  1.61  1.21  1.35  1.39  0.87  1.55  1.25  1.20  0.93  0.87
Tyr   ⟨Sm⟩   0.56  0.60  0.62  0.59  0.61  0.58  0.53  0.60  0.60  0.50  0.58  0.53  0.52  0.48  0.53  0.59  0.58  0.57
      Prop.  0.99  0.94  1.02  0.89  0.90  1.05  1.24  0.78  0.98  0.87  0.90  0.97  1.05  1.18  1.41  1.47  1.30  1.26
Trp   ⟨Sm⟩   0.56  0.60  0.61  0.60  0.62  0.64  0.62  0.61  0.61  0.51  0.51  0.60  0.56  0.52  0.58  0.58  0.58  0.54
      Prop.  1.14  1.05  0.83  1.02  1.08  0.82  1.23  0.94  0.70  1.12  0.68  0.96  1.19  1.29  0.83  1.07  1.04  1.31
Ser   ⟨Sm⟩   0.56  0.58  0.59  0.56  0.56  0.61  0.49  0.56  0.58  0.53  0.51  0.49  0.56  0.46  0.47  0.57  0.49  0.56
      Prop.  0.76  0.73  0.80  0.87  0.88  0.82  1.18  1.02  1.00  1.11  1.13  1.51  1.29  1.94  1.60  1.23  1.44  1.71
Thr   ⟨Sm⟩   0.54  0.60  0.56  0.56  0.58  0.62  0.51  0.58  0.58  0.53  0.55  0.49  0.59  0.45  0.54  0.58  0.52  0.53
      Prop.  1.07  0.79  0.85  0.84  0.79  0.97  1.39  0.90  0.82  1.16  1.23  1.32  1.27  1.22  1.65  1.36  1.13  1.03
Asn   ⟨Sm⟩   0.51  0.62  0.59  0.56  0.61  0.61  0.49  0.54  0.63  0.44  0.51  0.47  0.47  0.45  0.52  0.54  0.56  0.55
      Prop.  0.96  0.68  0.63  0.71  0.75  0.73  1.45  1.10  0.75  1.29  1.37  1.19  1.48  1.82  2.10  1.19  1.52  1.41
Gln   ⟨Sm⟩   0.53  0.61  0.58  0.59  0.59  0.56  0.53  0.57  0.59  0.54  0.56  0.52  0.57  0.49  0.51  0.51  0.57  0.52
      Prop.  0.68  0.55  0.91  0.84  0.73  0.77  0.92  1.30  1.26  1.30  1.52  1.74  1.28  1.94  1.27  1.81  1.30  1.51
Asp   ⟨Sm⟩   0.48  0.58  0.60  0.52  0.57  0.58  0.52  0.53  0.58  0.45  0.49  0.52  0.50  0.49  0.47  0.53  0.53  0.54
      Prop.  0.95  0.38  0.77  0.52  0.54  0.61  1.66  1.09  1.06  2.04  1.37  1.38  1.59  0.97  1.25  2.45  2.61  2.24
Glu   ⟨Sm⟩   0.54  0.63  0.58  0.58  0.58  0.57  0.61  0.56  0.61  0.45  0.54  0.53  0.50  0.46  0.51  0.51  0.53  0.46
      Prop.  0.75  0.62  0.56  0.68  0.84  0.75  1.02  1.33  0.89  1.93  1.82  1.86  1.00  0.60  0.84  2.51  1.81  1.81
Arg   ⟨Sm⟩   0.55  0.63  0.57  0.59  0.57  0.58  0.65  0.60  0.65  0.55  0.56  0.55  0.55  0.48  0.48  0.55  0.54  0.47
      Prop.  0.64  0.62  0.69  0.66  0.66  0.65  1.30  1.04  0.92  2.07  1.38  1.30  1.80  2.70  2.26  1.09  0.88  1.46
Lys   ⟨Sm⟩   0.59  0.62  0.57  0.63  0.58  0.50  0.42  0.61  0.59  0.48  0.61  0.54  0.57  0.50  0.50  0.50  0.56  0.63
      Prop.  0.76  0.59  0.62  0.77  0.69  0.35  1.62  1.26  1.27  0.72  0.99  2.64  1.45  2.64  3.12  0.61  0.55  1.29
His   ⟨Sm⟩   0.53  0.62  0.62  0.58  0.62  0.65  0.51  0.58  0.57  0.52  0.55  0.43  0.56  0.52  0.47  0.53  0.62  0.57
      Prop.  0.78  0.53  0.48  0.85  0.79  0.79  1.01  1.54  1.13  2.05  1.05  1.90  1.39  1.93  1.50  1.30  1.15  1.66

The amino acid residues as targets are sorted in the first column, whereas the neighbours are in the first row. For each target, the first row gives ⟨Sm⟩ and the second row (Prop.) gives the contact propensity.

Analysis of the distribution of contacts amongst the four categories, and statistical analysis of the target–neighbour geometry in terms of deviation from a random distribution, were performed for the hydrophobic residues leucine, isoleucine, valine, and phenylalanine. Alanine was considered only as a neighbour. For almost all pairs (excluding target–alanine) approximately 75–85% of the contacts are of high steric fit (Table 4). Of these, comparable fractions are found in categories 2 and 4, with the exception of Phe, which has a higher frequency in the latter category. The remaining 20–25% of the contacts divide more or less equally between bins 1 (low overlap–low Sm) and 3 (high overlap–low Sm). Target–alanine interactions preferentially populate the low overlap categories (1 and 2), with approximately double the number of contacts in 2. So great is the asymmetry in the distribution of contacts between categories of low and high surface complementarity that a predicted random frequency of less than 5 for angular bins in the low fit categories (1 and 3) is quite common. The statistical distributions for both surfaces are in good agreement, though there is a systematic tendency in VDW to have a larger number of contacts.
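The remark about angular bins with a predicted random frequency below 5 matters because a χ² comparison becomes unreliable when expected counts are that small. The sketch below shows the standard Pearson χ² statistic for binned counts together with a count of such sparse bins. How the random expectation for each angular bin was derived is not described in this section, so the expected counts are simply taken as an input; the example numbers are invented for illustration.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for binned counts.

    Returns (statistic, number of bins whose expected count is below 5).
    The expected counts must come from whatever random model is adopted
    for the angle in question; they are not computed here.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    sparse_bins = sum(1 for e in expected if e < 5)
    return stat, sparse_bins

# Well-populated category: 60 contacts over three angular bins.
stat_dense, sparse_dense = chi_square([10, 20, 30], [20.0, 20.0, 20.0])

# Sparsely populated category: 12 contacts, every expected count below 5.
stat_sparse, sparse_sparse = chi_square([1, 2, 9], [4.0, 4.0, 4.0])
```

In the second case all three bins fall below the conventional threshold, which is the situation the text reports for the low fit categories.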

For interactions between target and main-chain atoms, the distribution is dominated by the low overlap categories (similar to target–alanine), with 65–75% of the contacts in 2 and the rest in category 1, the fraction of contacts in 3 being negligible. Upon removal of contacts between target and main-chain atoms of adjacent residues, there is a reduction of contacts in 1 to less than 20% (Table 4), with practically no change in the occupancies of bins with high overlap. Although the overall patterns are quite similar, the two surfaces display systematic differences. For VDW surfaces, a larger fraction of contacts (by about 10–15%) is found in bin 2 relative to Connolly, with a concomitant reduction of contacts in 4. The fraction of contacts in bin 1 is also systematically lower in VDW. Though initially the total number of contacts in VDW exceeds Connolly for all pairs, the situation reverses upon excluding contacts between target and main-chain atoms of adjacent residues from both surfaces.

Thus the results unambiguously demonstrate that the majority of contacts (75–85%) for all target–neighbour pairs are of high surface complementarity, with comparable fractions in categories of low and high overlap in the case of interactions between side-chains. The question then arises as to the extent and nature of the constraints imposed upon interresidue geometry by steric fit.

Geometry of target–neighbour contacts

For each target–neighbour pair, the contacts in a particular category were separately assessed for deviations from a random geometry by means of χ². Following Singh & Thornton,25 an internal cartesian frame of reference was defined for every residue based on the atomic positions of its side-chain atoms (Table 5). For target–main-chain

Table 4. Percentage occupancy in the four contact categories for every target–neighbour pair

Target  Neighbour  Surface   Total no. of contacts   Percentage occupancy in the four categories
                                                     1      2      3      4
Leu     Ala        Connolly   566                    25.3   59.5    6.0    9.2
Leu     Ala        VDW        603                    15.2   66.2    8.5   10.1
Leu     Leu        Connolly  1492                    12.8   37.3   12.5   37.4
Leu     Leu        VDW       1520                     8.8   42.3   15.5   33.4
Leu     Ile        Connolly  1032                    11.9   37.5   11.9   38.7
Leu     Ile        VDW       1050                     9.4   41.5   15.0   34.1
Leu     Val        Connolly  1049                    15.6   41.2   11.0   32.2
Leu     Val        VDW       1069                    12.1   46.1   13.8   28.0
Leu     Phe        Connolly   801                     8.4   34.4    9.6   47.6
Leu     Phe        VDW        806                     6.7   35.7   11.3   46.3
Ile     Ala        Connolly   411                    24.1   56.7    5.8   13.4
Ile     Ala        VDW        449                    16.3   62.1    8.9   12.7
Ile     Leu        Connolly  1072                    11.8   37.2   12.2   38.7
Ile     Leu        VDW       1094                     9.5   42.7   13.1   34.7
Ile     Ile        Connolly   709                     9.4   39.8   12.0   38.8
Ile     Ile        VDW        714                     8.3   42.2   13.0   36.5
Ile     Val        Connolly   735                    12.1   42.9   11.7   33.3
Ile     Val        VDW        754                     8.4   48.5   14.1   29.0
Ile     Phe        Connolly   553                     8.5   34.2   11.6   45.7
Ile     Phe        VDW        558                     7.3   37.6   13.8   41.2
Val     Ala        Connolly   443                    21.0   51.7    9.9   17.4
Val     Ala        VDW        478                    11.7   59.8   12.3   16.1
Val     Leu        Connolly  1016                    10.4   31.5   13.1   45.0
Val     Leu        VDW       1048                     8.3   35.7   15.7   40.3
Val     Ile        Connolly   713                     6.9   34.6   12.9   45.6
Val     Ile        VDW        733                     5.6   37.8   14.5   42.1
Val     Val        Connolly   809                    11.4   35.4   15.8   37.4
Val     Val        VDW        834                     7.7   39.9   18.2   34.2
Val     Phe        Connolly   482                     7.9   29.5   12.4   50.2
Val     Phe        VDW        489                     7.4   31.1   13.7   47.8
Phe     Ala        Connolly   279                    27.6   63.4    3.2    5.7
Phe     Ala        VDW        288                    19.8   70.1    3.1    7.0
Phe     Leu        Connolly   732                    11.1   47.7    7.2   34.0
Phe     Leu        VDW        743                    11.3   50.2    9.0   29.5
Phe     Ile        Connolly   476                    13.2   45.8    7.1   33.8
Phe     Ile        VDW        479                    12.7   45.5    9.6   32.2
Phe     Val        Connolly   472                    15.3   50.2    4.7   29.9
Phe     Val        VDW        491                    14.9   53.0    7.7   24.4
Phe     Phe        Connolly   438                     7.8   38.6    8.4   45.2
Phe     Phe        VDW        444                     9.0   43.0    6.8   41.2
Leu     Main       Connolly  3951                    19.0   70.3    1.4    9.3
Leu     Main       VDW       3807                     9.9   84.0    1.8    4.4
Ile     Main       Connolly  3012                    19.3   70.7    1.6    8.4
Ile     Main       VDW       2921                     9.6   85.0    1.5    3.9
Val     Main       Connolly  3249                    19.0   67.6    2.2   11.2
Val     Main       VDW       3154                    11.4   80.0    2.1    6.4
Phe     Main       Connolly  2160                    18.6   76.2    0.5    4.7
Phe     Main       VDW       2093                    12.1   85.4    0.7    1.8

The four categories are: 1, low overlap–low complementarity; 2, low overlap–high complementarity; 3, high overlap–low complementarity; and 4, high overlap–high complementarity.

For every target–neighbour pair the first row gives results calculated on Connolly surfaces and the second on van der Waals (VDW) surfaces (subsequent to scaling). Main-chain neighbours are designated as Main. Contacts between target and main-chain atoms from residues adjacent to the target have not been included.


contacts, two adjacent peptide planes were considered on either side of the neighbour’s centrally located Cα atom. The coordinates of both target and neighbour were then transformed to the internal reference frame of the target and the origin of the neighbour’s coordinate system assigned spherical polar coordinates (r, θ, φ). The average radial distance was calculated for all contacts in a category, along with the distribution in polar angle θ and azimuthal angle φ in appropriate bins whose angular ranges were determined by the residue type in the positions of target and neighbour (see Materials and Methods). In addition, two more angles, ψ1 (angle between the z-axes of target and neighbour) and ψ2 (angle between their x-axes), were defined to characterize interresidue geometry. The distribution in the angle subtended by two randomly oriented vectors has a probability density given by sin θ′ dθ′/2, where θ′ is the angle between the vectors. Thus for a random distribution the probability of θ, ψ1 and ψ2 falls off as a function of sin θ, whereas each bin should be equally populated for φ. The χ² was used to estimate the deviation from a random distribution, and was calculated twice for target–main-chain contacts, once for each peptide plane preceding and following the neighbour. The results are shown in Tables 6 and 7.
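The expected bin occupancies for a random distribution follow directly from integrating the density sin θ′/2 over each angular bin. The short Python sketch below is an illustration only, not the authors' program; the function name `random_bin_percentages` is hypothetical. It renormalises over the sampled range, so the same routine covers both the six-bin (0–180°) and the three-bin (0–90°) models.

```python
import math

def random_bin_percentages(edges_deg):
    """Percentage occupancy per bin for an angle with density sin(t)/2,
    renormalised over the sampled range given by edges_deg (degrees)."""
    # The integral of sin(t)/2 from a to b is (cos a - cos b) / 2.
    weights = [
        (math.cos(math.radians(a)) - math.cos(math.radians(b))) / 2.0
        for a, b in zip(edges_deg[:-1], edges_deg[1:])
    ]
    total = sum(weights)
    return [100.0 * w / total for w in weights]

six_bin = random_bin_percentages([0, 30, 60, 90, 120, 150, 180])
three_bin = random_bin_percentages([0, 30, 60, 90])

print([round(p, 1) for p in six_bin])    # [6.7, 18.3, 25.0, 25.0, 18.3, 6.7]
print([round(p, 1) for p in three_bin])  # [13.4, 36.6, 50.0]
```

These percentages agree with the rows labelled Random in the tables of angular occupancies for θ and ψ1; for φ a random distribution simply populates every bin equally.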

For every target–neighbour pair, categories 1 and 2 on the one hand, and 3 and 4 on the other, have comparable average radial distances (⟨r⟩). The difference in ⟨r⟩ between the categories of low (1 and 2) and high overlap (3 and 4) can be as high as 2.0 Å.

For target–side-chain interactions some characteristic patterns in χ² are consistently observed for almost all pairs. Contacts with high fit and overlap (4) exhibit pronounced deviation from a random distribution for at least one of the angles (θ, φ, ψ1, ψ2) characterizing interresidue geometry, marked by exceptionally high χ². In general, the lowest χ² are found for contacts in category 1, with values in categories 2 and 3 lying intermediate between 1 and 4.

All contacts involving Leu, Val and Ile targets have exorbitantly high χ² for φ (category 4), due to preferential occupation of the lower angular ranges, 0–60° (Leu, Val) and 300–0–60° for Ile, beyond that warranted by a random distribution (Tables 6 and 8). In addition, χ² in θ (excluding Leu-Ala, Ile-Ala) also tends to be high for the same targets as a consequence of increased frequency from 0° to 60°, at

Table 5. Coordinate systems for the amino acid residues defined on the side-chain atoms

Residue     Definition of coordinate system
Ala         Origin: CB
Val         Origin: CB; X = (CG1 + CG2)/2 − CB, Y = Z × X, Z = (CG1 − CB) × (CG2 − CB)
Leu         Origin: CG; X = (CD1 + CD2)/2 − CG, Y = Z × X, Z = (CD1 − CG) × (CD2 − CG)
Ile         Origin: CB; X = (CG1 + CG2)/2 − CB, Y = Z × X, Z = (CG2 − CB) × (CG1 − CB)
Phe         Origin (O): centroid of CG, CD1, CD2, CE1, CE2, CZ; X = CG − O, Y = (CD2 + CE2)/2 − O, Z = X × Y
Main-chain  Origin: C; X = N − C, Z = (O − C) × (N − C), Y = Z × X

The atomic positions are indicated by the pdb code for the atom type. The vectors defined above have been normalized to unity.

Figure 3. Plot of frequency of contacts versus overlap for interactions involving leucine as both target and neighbour.


Table 6. The χ² in θ, φ, ψ1, ψ2 along with the average radial distance ⟨r⟩ (σ in parentheses) for the subset of contacts in the four categories, for every target–neighbour pair

Cat. no.  Target  Neighbour  ⟨r⟩ (Å)     χ²(θ)    χ²(φ)    χ²(ψ1)   χ²(ψ2)
1         Leu     Ala        5.2 (0.6)   13.9     21.3     –        –
2         Leu     Ala        5.3 (0.5)   14.6     80.7     –        –
3         Leu     Ala        4.7 (0.3)   2.7*     30.1     –        –
4         Leu     Ala        4.7 (0.4)   8.6*     30.5     –        –
1         Leu     Leu        6.4 (0.8)   10.0     50.9     40.6     11.4
2         Leu     Leu        6.4 (0.7)   12.5     115.7    81.4     32.5
3         Leu     Leu        5.7 (0.6)   11.6     95.6     3.6      12.3
4         Leu     Leu        5.6 (0.6)   53.2     171.2    45.0     29.9
1         Leu     Ile        6.6 (1.0)   4.7      4.7      5.4      6.4
2         Leu     Ile        6.7 (0.9)   15.6     32.4     26.8     16.9
3         Leu     Ile        6.1 (0.8)   13.9     36.9     10.9     9.4
4         Leu     Ile        5.9 (0.8)   21.8     88.2     21.3     17.4
1         Leu     Val        6.3 (0.8)   11.7     3.5      5.9      19.6
2         Leu     Val        6.2 (0.7)   11.8     30.8     24.0     35.1
3         Leu     Val        5.6 (0.6)   6.3      43.0     15.5     10.2
4         Leu     Val        5.6 (0.6)   31.0     87.4     8.3      11.1
1         Leu     Phe        6.9 (0.7)   8.1*     19.7     9.7*     4.5*
2         Leu     Phe        6.5 (0.7)   7.1      75.5     7.6      4.0
3         Leu     Phe        5.9 (0.5)   1.4      41.6     5.1      4.3
4         Leu     Phe        5.5 (0.7)   28.2     151.1    30.6     9.5
1         Ile     Ala        5.5 (0.8)   4.0      12.0     –        –
2         Ile     Ala        5.6 (0.8)   13.5     79.1     –        –
3         Ile     Ala        5.0 (0.5)   13.1*    26.5*    –        –
4         Ile     Ala        5.0 (0.5)   23.3*    42.2     –        –
1         Ile     Leu        6.7 (0.9)   28.4     12.8     1.7      3.8
2         Ile     Leu        6.7 (0.9)   21.8     128.6    27.4     34.5
3         Ile     Leu        6.2 (0.8)   14.4     104.6    11.7     19.9
4         Ile     Leu        5.9 (0.7)   108.3    159.3    19.0     14.6
1         Ile     Ile        7.2 (1.3)   10.9*    22.3     16.2*    10.3*
2         Ile     Ile        6.9 (1.1)   33.3     58.7     61.4     44.7
3         Ile     Ile        6.4 (1.0)   16.8     60.8     20.5     18.3
4         Ile     Ile        6.2 (0.8)   42.2     107.3    44.0     16.1
1         Ile     Val        6.3 (1.2)   6.4      21.5     59.3     34.2
2         Ile     Val        6.6 (0.9)   21.6     71.5     171.6    59.9
3         Ile     Val        5.8 (0.8)   8.6      31.5     17.7     17.6
4         Ile     Val        6.0 (0.7)   49.0     121.8    34.0     15.6
1         Ile     Phe        7.1 (0.9)   4.2*     17.5     4.4*     1.4*
2         Ile     Phe        7.0 (1.0)   9.5      62.3     3.5      5.3
3         Ile     Phe        6.3 (0.7)   13.9*    32.8     2.8*     6.0*
4         Ile     Phe        5.8 (0.9)   68.5     110.9    14.9     5.0
1         Val     Ala        5.2 (0.7)   8.5      6.3      –        –
2         Val     Ala        5.4 (0.6)   26.1     57.2     –        –
3         Val     Ala        4.9 (0.4)   7.8*     25.1     –        –
4         Val     Ala        4.8 (0.5)   47.5     70.5     –        –
1         Val     Leu        6.4 (0.8)   3.7      25.0     5.0      5.8
2         Val     Leu        6.3 (0.7)   6.4      66.3     19.1     36.0
3         Val     Leu        5.7 (0.6)   22.1     103.4    14.8     9.2
4         Val     Leu        5.7 (0.6)   104.6    234.6    15.4     13.5
1         Val     Ile        6.5 (1.2)   2.3*     1.8      19.9*    26.5*
2         Val     Ile        6.6 (1.0)   16.7     52.2     145.5    32.6
3         Val     Ile        5.8 (0.9)   15.4     38.9     38.2     33.5
4         Val     Ile        6.0 (0.8)   54.3     139.0    56.6     25.3
1         Val     Val        6.0 (0.8)   5.7      11.6     85.0     36.6
2         Val     Val        6.3 (0.7)   27.4     65.2     126.3    48.1
3         Val     Val        5.6 (0.7)   30.4     77.9     6.4      7.5
4         Val     Val        5.7 (0.6)   82.3     152.5    52.5     16.1
1         Val     Phe        6.7 (0.6)   6.4*     10.3     3.7*     7.7*
2         Val     Phe        6.9 (0.9)   13.3     45.7     10.0     30.8
3         Val     Phe        5.9 (0.5)   14.0*    30.7     9.0*     2.3*
4         Val     Phe        5.5 (0.7)   77.0     94.9     12.0     3.1
1         Phe     Ala        5.6 (0.9)   9.9      5.5      –        –
2         Phe     Ala        5.2 (0.8)   0.9      14.3     –        –
3         Phe     Ala        4.6 (0.6)   5.2*     4.7*     –        –
4         Phe     Ala        4.2 (0.4)   15.7*    1.6      –        –
1         Phe     Leu        6.7 (0.8)   7.5      0.3      1.8      3.5
2         Phe     Leu        6.4 (0.7)   14.4     21.1     5.2      2.9
3         Phe     Leu        5.8 (0.5)   23.1     13.7     0.5      3.9
4         Phe     Leu        5.3 (0.6)   102.9    10.8     26.7     8.5
1         Phe     Ile        6.9 (1.0)   18.7     0.3      10.5     5.9
2         Phe     Ile        6.9 (0.9)   15.3     0.5      8.0      13.1
3         Phe     Ile        6.3 (0.6)   11.8*    5.7      7.9*     6.2*
4         Phe     Ile        5.6 (0.8)   40.6     15.9     26.3     3.8
1         Phe     Val        6.5 (0.7)   16.9     2.3      1.2      5.2
2         Phe     Val        6.4 (0.9)   5.0      3.9      6.7      12.4
3         Phe     Val        5.8 (0.5)   1.6*     10.2     3.1*     3.9*
4         Phe     Val        5.2 (0.7)   66.2     9.2      14.2     2.7
1         Phe     Phe        7.0 (0.7)   5.9*     4.6      5.8*     6.1*
2         Phe     Phe        6.4 (0.8)   5.2      17.1     1.6      7.9
3         Phe     Phe        5.9 (0.5)   3.9*     8.8      1.6*     11.4*
4         Phe     Phe        5.4 (0.5)   18.3     1.6      9.6      11.1
1         Leu     Main1      7.1 (1.2)   21.2     3.2      24.6     6.2
2         Leu     Main1      6.5 (1.3)   30.0     15.5     35.4     66.7
3         Leu     Main1      5.8 (1.2)   12.9*    6.4      11.3*    10.0*
4         Leu     Main1      5.7 (1.1)   24.1     26.8     17.9     11.6
1         Leu     Main2      6.1 (0.9)   7.4      0.7      28.2     15.8
2         Leu     Main2      5.7 (0.8)   69.0     50.2     71.3     87.2
3         Leu     Main2      5.1 (0.5)   7.1*     7.6      18.2*    13.8*
4         Leu     Main2      5.2 (0.7)   11.0     21.2     21.7     15.0
1         Ile     Main1      7.1 (1.4)   33.1     29.4     11.8     17.6
2         Ile     Main1      6.6 (1.4)   186.3    223.4    30.7     36.9
3         Ile     Main1      5.7 (1.2)   32.9*    21.4     18.5*    15.8*
4         Ile     Main1      6.2 (1.2)   62.1     96.8     16.0     32.8
1         Ile     Main2      6.1 (1.1)   71.5     44.2     8.9      23.3
2         Ile     Main2      5.8 (1.0)   231.6    172.1    21.0     23.0
3         Ile     Main2      5.2 (1.0)   47.7*    25.8     22.9*    37.2*
4         Ile     Main2      5.3 (0.9)   92.7     71.8     27.2     23.2
1         Val     Main1      6.8 (1.2)   63.4     18.6     17.0     46.7
2         Val     Main1      6.4 (1.4)   92.4     77.5     44.7     16.9
3         Val     Main1      6.1 (1.3)   18.0*    39.3     10.5*    6.2*
4         Val     Main1      6.0 (1.1)   87.6     98.0     33.9     4.8
1         Val     Main2      5.8 (1.0)   82.5     15.4     35.8     19.5
2         Val     Main2      5.6 (0.8)   144.8    41.4     27.5     42.6
3         Val     Main2      5.3 (0.6)   24.8*    36.3     4.5*     7.6*
4         Val     Main2      5.1 (0.8)   146.8    71.9     19.6     14.2
1         Phe     Main1      7.3 (1.3)   18.0     7.4      3.4      13.3
2         Phe     Main1      6.9 (1.4)   89.1     13.8     11.3     11.0
3         Phe     Main1      5.8 (0.5)   7.4*     0.7      4.0*     2.0*
4         Phe     Main1      5.7 (0.9)   0.3      8.9      2.0      1.8
1         Phe     Main2      6.5 (0.9)   47.4     12.8     12.1     4.2
2         Phe     Main2      6.1 (0.9)   136.1    22.5     12.5     12.5
3         Phe     Main2      5.4 (0.5)   4.7*     1.3      4.6*     7.1*
4         Phe     Main2      5.2 (0.6)   0.9      1.6      9.2      5.4

The χ²(0.05) for a three-bin and a six-bin model are 5.991 and 11.071, respectively. Only results obtained from Connolly surfaces have been tabulated. For target–main-chain interactions the first and second peptide planes are designated Main1 and Main2, respectively. All target–neighbour pairs which have a predicted frequency of less than 5 for a particular angular bin, assuming a random distribution, are marked with an asterisk (*).


the expense of 120–180° (Tables 6 and 7). The χ² values in θ and φ are comparable for categories 2 and 3. However, the orientation between target and neighbour appears more constrained in 2 relative to 3, due to increased χ² for ψ1 or ψ2 in several instances (Leu-Leu, Val-Leu, Val-Val, Val-Ile, Ile-Val and Ile-Ile). Different distributions are observed for high χ² in ψ1 (Table 9). Generally, the lower angular range from 0° to 30° exhibits a higher count (Val-Val, Val-Ile, Ile-Ile, Ile-Val) in both categories 2 and 4, though 150–180° may also be a preferred angular range (Leu-Leu: 4). Val-Val has a significant reduction in counts from 120° to 150°.

Phe targets have distinctly different patterns in χ² from Leu, Ile and Val. No pronounced deviation is observed for φ, though θ shows an abrupt rise in χ² (4) due to the increased incidence of contacts in the 0–30° angular bin. Of all interactions involving Phe as a target, Phe-Phe contacts show minimum constraints in angular orientation, χ² in θ being 18.3 and 34.4 for Connolly and VDW surfaces, respectively. Phe-Leu and Phe-Ile tend to have significantly

Table 7. The distribution in θ for selected target–neighbour pairs which exhibit exceptionally high χ² in the category of contacts with high overlap and complementarity (4)

% Occupancy in bins with θ (deg.) range

Target   Neighbour  Cat.  χ²      0–30   30–60  60–90  90–120  120–150  150–180
Random (0–180°)     –     0.0     6.7    18.3   25.0   25.0    18.3     6.7
Random (0–90°)      –     0.0     13.4   36.6   50.0   –       –        –
Leu      Leu        4     53.2    13.8   19.7   22.2   18.8    19.2     6.3
Val      Leu        4     104.6   13.8   20.8   30.4   28.0    6.8      0.2
Val      Val        4     82.3    14.2   26.1   24.4   29.4    5.3      0.7
Val      Ile        4     54.3    10.5   25.5   29.5   25.2    8.9      0.3
Val      Phe        4     77.0    15.3   27.7   29.7   21.1    5.8      0.4
Ile      Leu        4     108.3   12.5   28.4   28.2   25.1    5.3      0.5
Ile      Phe        4     68.5    13.0   28.9   28.1   23.7    6.3      0.0
Ile      Val        4     49.0    7.3    27.8   31.8   26.1    5.7      1.2
Phe      Leu        4     102.9   33.7   39.0   27.3   –       –        –
Phe      Ile        4     40.6    27.3   44.1   28.6   –       –        –
Phe      Val        4     66.2    35.5   37.6   26.9   –       –        –
Phe      Phe        4     18.3    23.7   32.8   43.4   –       –        –
Val      Main1      2     92.4    4.7    17.4   32.6   20.8    16.3     8.2
Val      Main2      2     144.8   3.2    19.2   30.6   22.3    14.1     10.6
Ile      Main1      2     186.3   3.9    19.5   35.8   20.7    12.4     7.7
Ile      Main2      2     231.6   3.8    22.5   33.8   20.7    9.9      9.3
Phe      Main1      2     89.1    8.6    30.1   61.3   –       –        –
Phe      Main2      2     136.1   6.6    29.9   63.5   –       –        –

The row Random (0–90°) gives the predicted percentage occupancies for a random distribution when sampling from 0° to 90°. Only results obtained from Connolly surfaces have been tabulated. Target–main-chain contacts are from category 2; the two planes are designated Main1 and Main2.

Table 8. The distribution in φ for selected target–neighbour pairs which exhibit exceptionally high χ² in the category of contacts with high overlap and complementarity (4)

% Occupancy in bins with φ (deg.) range

Target   Neighbour  Cat.  χ²      0–60   60–120  120–180  180–240  240–300  300–360
Random (0–360°)     –     0.0     16.67  16.67   16.67    16.67    16.67    16.67
Random (0–180°)     –     0.0     33.33  33.33   33.33    –        –        –
Leu      Leu        4     171.2   52.2   39.6    8.2      –        –        –
Leu      Ile        4     88.2    49.9   37.8    12.3     –        –        –
Leu      Val        4     87.4    53.3   34.9    11.8     –        –        –
Leu      Phe        4     151.1   57.5   36.2    6.3      –        –        –
Val      Ala        4     70.5    77.9   16.9    5.2      –        –        –
Val      Leu        4     234.6   64.1   30.0    5.9      –        –        –
Val      Ile        4     139.0   61.8   29.2    8.9      –        –        –
Val      Val        4     152.5   64.7   27.7    7.6      –        –        –
Val      Phe        4     94.9    60.3   30.2    9.5      –        –        –
Ile      Leu        4     159.3   25.8   20.5    2.2      5.3      15.4     30.8
Ile      Ile        4     107.3   27.3   19.6    5.1      5.8      10.2     32.0
Ile      Val        4     121.8   31.0   17.6    2.9      4.5      11.4     32.6
Ile      Phe        4     110.9   34.4   15.4    4.0      3.5      17.0     25.7
Ile      Main1      2     223.4   19.5   23.3    18.1     7.0      12.4     19.7
Ile      Main2      2     172.1   20.1   20.6    18.3     8.0      12.7     20.3

The row Random (0–180°) gives the predicted percentage occupancies for a random distribution when sampling from 0° to 180°. The same conventions apply for target–main-chain pairs from category 2 as in the previous Tables.


higher χ² in ψ1, though the deviations in the interplanar angle (ψ1) for Phe-Phe are relatively low. The patterns in χ² and the distributions of contacts in the angular bins are in agreement for Connolly and VDW surfaces.

Variability in the torsion angle χ2 changes the shape of the isoleucine side-chain. The distribution for χ2 in isoleucine, both as target and as neighbour, is in agreement with the one reported in the rotamer library.26 The statistics given in the Tables are for all isoleucine rotamers taken together.

Target–main-chain contacts have exceptionally high χ² for selected angles in category 2 rather than 4. As previously mentioned, contacts between target and main-chain atoms of adjacent residues were removed prior to analysis, which led to a substantial decrease in contacts with low overlap and complementarity (1). For every target, planes 1 and 2 were analysed separately. The average radial distance ⟨r⟩ is slightly higher for the first plane relative to the second. The χ² in θ for both planes is high for every target in category 2, due to increased frequency in θ from 60° to 90° for Ile, Val and Phe. The distribution of contacts in this case thus follows a pattern quite different from that obtained for target–side-chain interactions. Reduced occupancies in the 180–240° bin for both planes lead to extreme χ² in φ for Ile. Although instances of high χ² for ψ1 and ψ2 are observed (Leu), they do not appear to give a sustained and coherent pattern. Despite scaling, the differences in the number of contacts between VDW and Connolly surfaces for category 4 occasionally give significant differences in χ², though the overall patterns are conserved.

Discussion

Both the “jigsaw puzzle” and the “nuts and bolts” models agree that there is a considerable degree of compaction within proteins, leading to densely packed interiors. The former model further asserts that such tight packing is possible due to the stereospecific association of interacting side-chains. The present calculations indeed show that the majority (75–85%) of the side-chain contacts within proteins involving hydrophobic residues (Leu, Ile, Val, Phe) are of high steric fit. On the other hand, frequent occurrence of extended overlap (>10%) between side-chains with low complementarity is definitely ruled out, as category 3 in all targets is sparsely populated, with only 10–15% of the total number of contacts. The calculations thus tend to disfavour the “nuts and bolts” model, if by the term “nuts and bolts” is implied the possibility of dense compaction in proteins without some degree of match between complementary surfaces as an essential requirement. However, contacts with high fit divide more or less equally between categories of low (2) and high (4) overlap, with specific angles in category 4 characterizing target–neighbour geometry maintaining the largest deviation from a random distribution. Such deviations persist in category 2, though to a much lesser extent than in 4. The average radial distance ⟨r⟩ between target and neighbour is also significantly reduced in 4 relative to 2. The majority of hydrophobic contacts within proteins thus primarily fall into two equally populated classes: interactions with high fit and overlap, perhaps likened to adjacent pieces of a three-dimensional jigsaw puzzle, and those with high fit but increased separation between target and neighbour, leading to both lower overlap and reduced geometrical constraints. Both categories 2 and 3 show high χ² in φ (Leu, Ile, Val). In addition, elevated χ² for ψ1 or ψ2 in 2 for several target–neighbour pairs indicates that more constraints are perhaps necessary for steric fit (albeit with low overlap) than for simple compaction (3). The results thus suggest that, of the two (Sm and overlap), steric fit probably exercises a more stringent control over possible interresidue orientations than overlap, manifesting in fairly regular patterns in the distributions of the orientational angles θ, φ and ψ1. Therefore, to sum up, the present study tends to side with the jigsaw puzzle model, given that most interactions between buried hydrophobic residues are characterized by high surface complementarity and the heightened

Table 9. Distribution in ψ1 for target–neighbour pairs which exhibit exceptionally high χ² in categories 2 and 4

% Occupancy in bins with ψ1 (deg.) range

Target   Neighbour  Cat.  χ²      0–30   30–60  60–90  90–120  120–150  150–180
Random (0–180°)     –     0.0     6.7    18.3   25.0   25.0    18.3     6.7
Random (0–90°)      –     0.0     13.4   36.6   50.0   –       –        –
Leu      Leu        2     81.4    13.1   17.2   18.0   17.1    24.4     10.2
Val      Val        2     126.3   16.8   33.9   23.1   19.6    2.8      3.8
Val      Ile        2     145.5   23.9   27.9   16.6   17.0    10.9     3.6
Ile      Val        2     171.6   23.5   26.3   18.4   17.8    11.1     2.9
Leu      Leu        4     45.0    8.8    21.0   21.7   16.5    20.6     11.5
Val      Val        4     52.5    14.9   18.5   29.0   23.4    7.3      6.9
Val      Ile        4     56.6    16.3   20.0   25.2   21.2    11.1     6.2
Ile      Val        4     34.0    15.1   19.2   26.9   20.4    11.8     6.5
Ile      Ile        4     44.0    14.9   20.0   30.2   17.8    11.6     5.5

The category (2 or 4) of each set of contacts is given in the Cat. column.


constraints in interresidue geometry exhibited by these contacts compared to those of lower steric fit.

There is a certain degree of arbitrariness involved in the criterion (Sm greater than or equal to 0.50) used to distinguish low from high fit in Connolly surfaces. Scaling VDW to Connolly reduces its cutoff on Sm to around 0.44. By suitably selecting criteria for low/high fit and overlap it should be possible to preferentially fill up any one of the categories from 1 to 4. The physically relevant fact, however, is that an overwhelming majority of hydrophobic contacts within proteins maintain some measure of surface complementarity, which restricts the possible orientations between interacting residues, as there can be no doubt that constraints on target–neighbour geometry are indeed a function of steric fit.

Analysis of target–main-chain interactions emphasizes the non-trivial role played by main-chain atoms in interior packing. About 20–30% of the side-chain surface of every buried residue packs against main-chain atoms with high ⟨Sm⟩, made possible by 70–80% of these contacts being of high steric fit. Thus the contribution of main-chain atoms, though of less consequence than that of side-chains, is nevertheless significant.

No target–neighbour pair appears to be preferentially selected on account of a specific match between complementary surfaces, as all residues when buried show more or less equal capacity to pack tightly against their immediate atomic environment, indicated by comparable ⟨Sm⟩ (Table 2). Although there are minor variations, no pair or set of pairs seems to dominate the list of contacts by a combination of exorbitantly high Sm and propensity of contact. Given that most buried residues have three or four neighbours (with extended overlap), the specificity between the primary sequence and fold appears to be distributed over the whole network of contacts in the protein interior. The study thus tends to support the conclusion that even if the hydrophobic effect is the predominant force in protein folding, there are nevertheless definite constraints on network geometry in terms of surface complementarity which have to be satisfied in order to achieve stable and densely packed hydrophobic clusters.

It is obvious that not all interactions contribute equally to the specificity between sequence and fold. The present calculations give a well-defined algorithm to identify the subset of “specific” contacts (those with high fit and overlap) which exercise a relatively stringent control over interresidue geometry. Without a fairly sophisticated scheme to classify contacts, the statistics may well average out the constraints restricting the orientations between interacting residues. The surface complementarity is not between residue pairs, but rather between a residue and its full complement of neighbours. The next phase of the study will be to develop theoretical and graphical methods to analyze the full network of specific contacts stabilizing the molecular structure.

Materials and Methods

A data set of 100 non-homologous protein X-ray crystal structures with resolution better than 2.00 Å was selected from the Protein Data Bank27 (Table 1). Extensive use was made of the SCOP database28 to ensure that no two proteins belonged to the same family. Four pairs of structures, (2ayh, 1jbc), (1ab9, 1arb), (1nif, 1acc) and (1rro, 1osa), belong to the same superfamily (distinct for each pair); 1xyz, 6xia, 1pii and 1tph are classified under the same fold, though with different superfamilies. The rest belong to different folds. Care was taken to represent every class and avoid irregularities in the polypeptide chain. Proteins with deeply embedded and spatially extended cofactors (e.g. hemoglobin) were not included in the database.

The ratio of the solvent-accessible areas2 (probe radius 1.4 Å) of a residue in the protein to that of the same residue in the fully extended Gly-Xaa-Gly fragment was used to decide burial. A residue was considered buried if this ratio was less than or equal to 0.05.
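The burial criterion amounts to a single ratio test. The Python sketch below is a minimal illustration, assuming the two solvent-accessible areas have already been computed by some external program; the function name `is_buried` and the numerical areas in the usage lines are hypothetical and not taken from the paper.

```python
def is_buried(asa_in_protein, asa_in_gly_xaa_gly, cutoff=0.05):
    """Return True if the residue counts as buried.

    asa_in_protein      : solvent-accessible area of the residue in the protein
    asa_in_gly_xaa_gly  : area of the same residue in the extended Gly-Xaa-Gly fragment
    cutoff              : burial threshold on the ratio (0.05 in the text)
    """
    if asa_in_gly_xaa_gly <= 0.0:
        raise ValueError("reference area must be positive")
    return asa_in_protein / asa_in_gly_xaa_gly <= cutoff

# Hypothetical areas (in square angstroms), for illustration only.
print(is_buried(4.0, 180.0))   # ratio of about 0.02
print(is_buried(40.0, 180.0))  # ratio of about 0.22
```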

The entire calculation has been performed on Connolly surfaces generated using the MS29 program with a probe radius of 1.4 Å. Connolly surfaces consist of two types of surface points: (a) contact points, each of which is essentially the point of contact between the probe sphere and a single atom; and (b) reentrant points, defined as the inward surface of the probe in touch with two or three atoms. The basic principle employed to compute the surface was to preserve as far as possible the individual surface identities of the residues constituting the polypeptide chain. The greater the length of the peptide fragments used to generate the surface, the greater the distortion in the surfaces of constituent residues. On the other hand, any truncation of the chain (to calculate the surface of a selected peptide) leaves ambiguous the status of reentrant points at the region of truncation. To resolve and balance these conflicting claims the following algorithm was used.

1. Initially, a particular residue i was selected from the polypeptide chain and its surface calculated in isolation. All surface points were retained barring contact and reentrant points of (C=O)i and (N–H)i, which were filtered out.

2. The surface of the same residue was then computed again, this time with atoms (Cα–N–H)i+1 and (Cα–C=O)i−1 attached, from adjacent residues. All contact and reentrant points of (C=O)i were selected from this set. All surface points of (N–H)i were also retained, excluding the reentrant points defined by the simultaneous contact of the probe sphere to (N–H)i and (Cα–C=O)i−1, as reentrants of (C=O) from the previous residue would have already sampled these points. All other dot points were rejected.

3. The selected surface points from both sets were then merged and the surface of the protein completed by moving down the chain one residue at a time. Suitable alterations were made to the program to deal with N and C-terminal residues.


Ideally, removal of reentrant points reduces a Connolly surface to a VDW sub-surface, computed on the same molecule. To distinguish physically meaningful results from artefacts of surface generation, the calculations were repeated on a set of VDW surfaces. The outer envelope of VDW spheres was computed for each residue i with attached atoms (Cα–C=O)i−1 and (Cα–N–H)i+1. Both surfaces were viewed in the graphics display program RasMol and were found to be adequate.

All hydrogen atoms were included for both the Connolly and VDW surfaces30 and the atomic radii assigned from the all-atom molecular mechanics force field.31 Both surfaces were sampled at 10 dots/Å². Hydrogen atoms bonded to Cα of glycine were labeled as side-chain atoms in the Connolly surface but considered as main-chain atoms for VDW.

The surface complementarity, Sm, was estimated using a truncated version of the function Sc originally proposed by Lawrence & Colman.23 The goodness-of-fit was computed only for buried residues, henceforth referred to as targets. The surface complementarity of the target was estimated with respect to its immediate neighbourhood contributed by the rest of the polypeptide chain. Buried residues either coordinated to a bound metal ion or interacting with a ligand (≤4 Å) were not considered as targets, in addition to cysteine by virtue of its ability to form disulphide bridges. For multiple side-chain conformations the conformer with the higher occupancy was retained. In the case of equal occupancies, the atomic positions of the first conformer were used. Every dot surface point on the target had its nearest neighbour calculated from surface points contributed by other residues. Then following Lawrence & Colman:23

$$S(x_a) = (n_a \cdot n'_a)\,\exp\left(-w\,|x_a - x'_a|^2\right)$$

where $x_a$, $x'_a$ are the coordinates of the dot point on the target and neighbour, respectively, $n_a$, $n'_a$ being the corresponding normals ($w = 0.5$). The median of the distribution of S(x) for all the surface points on the target was defined as its surface complementarity measure Sm with respect to its neighbourhood.
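A minimal Python sketch of this measure is given below, purely as an illustration. It assumes that each surface dot is supplied as a (position, unit normal) pair and that the neighbour normal is already oriented so that perfectly complementary patches give a dot product of +1; the text does not spell out this sign convention. The names `s_value`, `nearest_dot` and `surface_complementarity` are hypothetical, the nearest-neighbour search is brute force, and the 3.5 Å distance cutoff described in the Methods is omitted.

```python
import math
from statistics import median

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def s_value(x_t, n_t, x_n, n_n, w=0.5):
    """S(x) = (n . n') exp(-w |x - x'|^2) for one target dot and its neighbour dot."""
    n_dot = sum(a * b for a, b in zip(n_t, n_n))
    return n_dot * math.exp(-w * squared_distance(x_t, x_n))

def nearest_dot(x_t, neighbour_dots):
    """Brute-force nearest neighbour among (position, normal) pairs."""
    return min(neighbour_dots, key=lambda dot: squared_distance(x_t, dot[0]))

def surface_complementarity(target_dots, neighbour_dots, w=0.5):
    """Median of S(x) over all target dots (the Sm of the text)."""
    values = []
    for x_t, n_t in target_dots:
        x_n, n_n = nearest_dot(x_t, neighbour_dots)
        values.append(s_value(x_t, n_t, x_n, n_n, w))
    return median(values)

# Toy input: one target dot facing one neighbour dot 1 A away.
target = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))]
neighbour = [((0.0, 0.0, 1.0), (0.0, 0.0, 1.0)), ((5.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
sm = surface_complementarity(target, neighbour)
print(round(sm, 3))  # exp(-0.5), about 0.607
```

Taking the median rather than the mean keeps Sm insensitive to a few poorly matched dots, which is the role the median plays in the definition above.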

For every target, only surface points (both contact and reentrant) contributed by the side-chain atoms alone were considered in estimating its steric fit with its immediate atomic environment. The neighbour nearest to a surface point on a target can, however, belong to either a side-chain or a main-chain atom. Initially, therefore, all side-chain surface points of a buried residue were sorted into two bins based on whether the neighbour was from a side-chain or a main-chain atom, after which all dot points on the target making contact with the same residue were grouped together. Failure to find a neighbour within 3.5 Å of a surface point (target) led to its exclusion from subsequent calculations. This distance cutoff was fairly superfluous, as almost all surface points on the target were able to identify neighbours within 3.5 Å. Neighbours which were reentrants involving a side-chain and a main-chain atom were considered to be contributed by the main-chain.

Let the total number of dot points of a buried residue ($R_T$) be $N$ and let $n'$ be a subset of $N$ which have for their nearest neighbours surface points of a particular residue ($R_N$). Then the overlap of $R_T$ with $R_N$ is defined as:

$$(n'/N) \times 100.0$$

The overlap was also estimated by the summation of corresponding area elements, the rms deviation between the two measures of overlap being effectively zero. The surface complementarity between $R_T$ and $R_N$ is defined as the median of the distribution of S(x) over the points $n'$. Thus the packing for every target (buried residue) was analysed in terms of overlaps with its neighbours and their corresponding Sm.
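The overlap and the four contact categories used in the Results can be sketched in a few lines of Python. This is an illustration under stated assumptions: the 10% boundary between low and high overlap is taken from the phrase "extended overlap (>10%)" in the Discussion, and the cutoff Sm ≥ 0.50 is the Connolly-surface criterion quoted there; the function names `overlap_percent` and `contact_category` are hypothetical.

```python
def overlap_percent(n_prime, n_total):
    """Overlap of target with neighbour: (n'/N) x 100."""
    if n_total <= 0:
        raise ValueError("target must have at least one surface dot")
    return 100.0 * n_prime / n_total

def contact_category(overlap, sm, overlap_cut=10.0, sm_cut=0.50):
    """Return the category number 1-4 used in the text.

    1: low overlap, low Sm     2: low overlap, high Sm
    3: high overlap, low Sm    4: high overlap, high Sm
    The cutoffs are assumptions taken from the Discussion (Connolly surfaces).
    """
    high_overlap = overlap > overlap_cut
    high_fit = sm >= sm_cut
    if high_overlap:
        return 4 if high_fit else 3
    return 2 if high_fit else 1

# Hypothetical contact: 30 of 200 target dots have this neighbour as nearest.
ov = overlap_percent(30, 200)
print(ov, contact_category(ov, 0.62))  # 15.0 4
```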

A target is said to be in contact with its neighbour when there is an overlap of at least one point. The propensity of contact between a residue x (target) and a residue y (neighbour) is defined as:32

$$\frac{\sum_k w_k\, n_{xy}(k)\, n(k)}{\sum_k w_k\, n_x(k)\, n_y(k)}$$

where $n(k)$ is the total count of target–neighbour doublets of the kth protein in the database, $n_{xy}(k)$ the count of occurrences of residue x as target and residue y as neighbour, $n_x(k)$ the count of occurrences of residue x as target irrespective of neighbour and $n_y(k)$ the count of residue y as neighbour irrespective of target. The summation is over the proteins in the database and $w_k$ is an appropriate weight, taken to be 1. All target–neighbour pairs were counted separately, as the surface complementarity function between two surfaces is, in general, non-commutative, given that the neighbourhood of every side-chain surface is unique.
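The propensity is a ratio of two sums over proteins, which the following Python sketch evaluates from per-protein doublet counts. It is an illustration only: the data layout (one dictionary per protein, keyed by ordered (target, neighbour) pairs) and the function name `contact_propensity` are assumptions, and the counts in the example are invented.

```python
def contact_propensity(per_protein_counts, x, y, weights=None):
    """Propensity of residue x (target) to contact residue y (neighbour).

    per_protein_counts : list of dicts, one per protein, mapping the ordered
                         pair (target, neighbour) to its number of doublets
    weights            : optional per-protein weights w_k (all 1 if omitted)
    """
    numerator = 0.0
    denominator = 0.0
    for k, counts in enumerate(per_protein_counts):
        w_k = 1.0 if weights is None else weights[k]
        n_total = sum(counts.values())                               # n(k)
        n_xy = counts.get((x, y), 0)                                 # n_xy(k)
        n_x = sum(c for (t, _), c in counts.items() if t == x)       # n_x(k)
        n_y = sum(c for (_, nb), c in counts.items() if nb == y)     # n_y(k)
        numerator += w_k * n_xy * n_total
        denominator += w_k * n_x * n_y
    if denominator == 0.0:
        raise ValueError("residue pair never occurs in the data set")
    return numerator / denominator

# Invented counts for a single protein, for illustration only.
protein = {("Leu", "Phe"): 3, ("Leu", "Val"): 1, ("Val", "Phe"): 1, ("Val", "Val"): 3}
print(contact_propensity([protein], "Leu", "Phe"))  # 1.5
```

A value above 1 indicates that the ordered pair occurs more often than expected from the separate frequencies of x as target and y as neighbour; because target and neighbour are counted separately, the propensity for (x, y) need not equal that for (y, x).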

The methodology outlined by Singh & Thornton25 wasfollowed to characterize the relative geometry between atarget and its neighbour. For every residue an internalframe of reference was defined, based on the atoms con-stituting the amino acids (Table 5). To analyse targetmain-chain contacts, two adjacent peptide planes wereconsidered on either side of the Ca atom of the residuein contact with the target, with a coordinate systemlocated on each plane. The coordinates of both targetand neighbour were then transformed to the internalframe of the target. The origin of the coordinate systemlocated on the neighbour was assigned spherical polarcoordinates ðr; u;wÞ; where u (polar) is the angle sub-tended by the vector (from the origin of the target tothat of the neighbour) on the z axis of the target and w(azimuthal) the angle between the x axis (target) and theprojection of the same vector onto the x– y plane. Theaverage radial distance was calculated for all contacts ina category, along with the distribution in u and w. u wassampled in 308 bins from 08 to 1808 for Leu, Ile, Val and0–908 for Phe. Barring Ile (0–3608), w was transformedinto the angular range 0–1808, divided into 608 bins.The distribution in the angle subtended by two ran-domly oriented vectors has a probability density givenby sin u0 du0=2; where u0 is the angle between thevectors.19 For two planar amino acid residues in contactone natural choice of vectors would be the normal tothe ring planes subtending the interplanar angle. ForPhe and main-chain peptide planes the coordinate sys-tems were so chosen that the z-axis was coincident withtheir respective plane normals. For non-planar residues(Leu, Ile, Val) the z-axis was chosen normal to a planedefined by a subset of side-chain atoms (e.g. CG, CD1,CD2 for Leu). For contacts with Phe both as target andneighbour the distribution of interplanar angles was ana-lysed in three 308 bins from 08 to 908, otherwise a six-binmodel was utilized from 08 to 1808. 
In addition, the angle between the two x-axes of the target and neighbour reference frames was sampled from 0° to 180° in six bins (spanning 30° each). Thus the four angles whose distributions were analysed statistically were the azimuthal angle $\varphi$, the polar angle $\theta$, $\psi_1$ (the angle between the z-axes of target and neighbour) and $\psi_2$ (the angle between the x-axes). For a random distribution the probability density of the last three varies as the sine of the angle, whereas each bin should be equally populated for $\varphi$. The $\chi^2$ statistic was used to estimate the deviation from a random distribution, and was calculated twice for target–main-chain contacts, once for each of the peptide planes preceding and following the neighbour. The statistics were computed separately for each set of planes. Target–neighbour pairs for which main-chain atoms belonged to N- and C-terminal residues were not included in the calculations. Such cases constituted a negligible fraction (generally less than 0.5%) of the total set of contacts.
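The coordinate transformation and angular binning described above can be sketched as follows. This is a minimal illustration, assuming the target frame is supplied as an origin plus three orthonormal axis vectors; the function names `spherical_in_target_frame` and `bin_index` are hypothetical, and the folding of the azimuthal angle into 0–180° for residues other than Ile is not attempted here.

```python
import math

def spherical_in_target_frame(origin_t, x_axis, y_axis, z_axis, origin_n):
    """Spherical polar coordinates (r, polar, azimuthal) of the neighbour origin.

    Angles are returned in degrees. The axes are assumed to be orthonormal
    (an assumption of this sketch, not checked here).
    """
    d = [b - a for a, b in zip(origin_t, origin_n)]      # target -> neighbour
    x = sum(di * ai for di, ai in zip(d, x_axis))        # projections on the axes
    y = sum(di * ai for di, ai in zip(d, y_axis))
    z = sum(di * ai for di, ai in zip(d, z_axis))
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("target and neighbour origins coincide")
    polar = math.degrees(math.acos(max(-1.0, min(1.0, z / r))))  # 0..180
    azimuthal = math.degrees(math.atan2(y, x)) % 360.0           # 0..360
    return r, polar, azimuthal

def bin_index(angle_deg, width_deg, n_bins):
    """Index of the fixed-width bin holding angle_deg; the upper edge joins the last bin."""
    return min(int(angle_deg // width_deg), n_bins - 1)

# Example: neighbour origin 3 units along the target y-axis.
r, polar, azimuthal = spherical_in_target_frame(
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.0, 3.0, 0.0)
)
polar_bin = bin_index(polar, 30.0, 6)   # six 30-degree bins over 0..180
```

Clamping the cosine before `acos` guards against rounding error pushing the ratio marginally outside [-1, 1].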

The deviation from a random distribution was estimated using $\chi^2$, defined as

$$\chi^2 = \sum \frac{(E - O)^2}{E}$$

where $E$ and $O$ are the expected and observed values, respectively.
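A minimal sketch of this estimate is given below, assuming that the expected counts for the sine-weighted angles are obtained by integrating the density $\sin\theta'/2$ over each bin and rescaling to the observed total. The function names and the observed counts are hypothetical and serve only as illustration; renormalizing the bin fractions when the bins cover 0–90° is likewise an assumption of the sketch.

```python
import math

def sine_bin_fractions(edges_deg):
    """Fraction of randomly oriented vector pairs falling in each angular bin.

    Integrating sin(t)/2 between edges a and b gives (cos a - cos b) / 2.
    """
    cosines = [math.cos(math.radians(e)) for e in edges_deg]
    return [(cosines[i] - cosines[i + 1]) / 2.0 for i in range(len(cosines) - 1)]

def chi_square(observed, fractions):
    """Sum of (E - O)^2 / E, with E scaled to the total observed count."""
    total = sum(observed)
    norm = sum(fractions)   # renormalise when the bins span only part of 0..180
    expected = [total * f / norm for f in fractions]
    return sum((e - o) ** 2 / e for e, o in zip(expected, observed))

six_bins = sine_bin_fractions(range(0, 181, 30))   # six 30-degree bins, 0..180
uniform_three = [1.0 / 3.0] * 3                    # three equally populated bins

observed_counts = [12, 30, 41, 44, 28, 9]          # made-up counts, illustration only
chi2_value = chi_square(observed_counts, six_bins)
```

The same `chi_square` function serves for the azimuthal angle by passing equal fractions, since each bin is expected to be equally populated in that case.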

Acknowledgements

The authors acknowledge the Council of Scientific and Industrial Research, Government of India, for funding the project, along with the Computer Centre, Saha Institute of Nuclear Physics, for making available adequate computing facilities. Mr Gautam Garai of the Computer Centre was especially kind and cooperative at all stages of the study. The study has been inspired by Professor Raghavan Varadarajan of the Molecular Biophysics Unit, Indian Institute of Science, Bangalore.

References

1. Crick, F. H. C. (1953). The packing of α-helices: simple coiled coils. Acta Crystallog. 6, 689–697.

2. Richards, F. M. (1974). The interpretation of protein structures: total volume, group volume distributions and packing density. J. Mol. Biol. 82, 1–14.

3. Gassner, N. C., Baase, W. A. & Matthews, B. W. (1996). A test of the “jigsaw puzzle” model for protein folding by multiple methionine substitutions within the core of T4 lysozyme. Proc. Natl Acad. Sci. USA, 93, 12155–12158.

4. Eriksson, A. E., Baase, W. A., Zhang, X.-J., Hienz, D. W., Blaber, M., Baldwin, E. P. & Matthews, B. W. (1992). Response of a protein structure to cavity creating mutations and its relation to the hydrophobic effect. Science, 255, 178–183.

5. Buckle, A. M., Cramer, P. & Fersht, A. R. (1996). Structural and energetic responses to cavity creating mutations in hydrophobic cores: observation of a buried water molecule and the hydrophilic nature of such hydrophobic cavities. Biochemistry, 35, 4298–4305.

6. Axe, D. D., Foster, N. W. & Fersht, A. R. (1996). Active barnase variants with completely random hydrophobic cores. Proc. Natl Acad. Sci. USA, 93, 5590–5594.

7. Lesk, A. M. & Chothia, C. (1980). How different amino acid sequences determine similar protein structures: the structure and evolutionary dynamics of the globins. J. Mol. Biol. 136, 225–270.

8. Beasley, J. R. & Hecht, M. H. (1997). Protein design: the choice of de novo sequences. J. Biol. Chem. 272, 2031–2034.

9. Dill, K. A. (1990). Dominant forces in protein folding. Biochemistry, 29, 7133–7155.

10. Kamtekar, S., Schiffer, H. X., Babik, J. M. & Hecht, M. H. (1993). Protein design by binary patterning of polar and non-polar amino acids. Science, 262, 1680–1685.

11. Lau, K. F. & Dill, K. A. (1990). Theory of protein mutability and biogenesis. Proc. Natl Acad. Sci. USA, 87, 638–642.

12. Sikorski, A. & Skolnick, J. (1989). Monte Carlo simulation of equilibrium globular protein folding: α-helical bundles with long loops. Proc. Natl Acad. Sci. USA, 86, 2668–2672.

13. Lim, W. A. & Sauer, R. T. (1991). The role of internal packing interactions in determining the structure and stability of a protein. J. Mol. Biol. 219, 359–376.

14. Dahiyat, B. I. & Mayo, S. L. (1997). Probing the role of packing specificity in protein design. Proc. Natl Acad. Sci. USA, 94, 10172–10177.

15. Terwilliger, T. C. (1995). Engineering the stability and function of Gene V protein. Advan. Protein Chem. 46, 177–215.

16. Shakhnovitch, E. I. & Finkelstein, A. V. (1989). Theory of cooperative transitions in protein molecules. Why denaturation of protein molecules is a first-order phase transition. Biopolymers, 28, 561–602.

17. Dill, K. A., Bromberg, S., Yue, K., Feibig, K. M., Yee, D. P., Thomas, P. D. & Chan, H. S. (1995). Principles of protein folding: a perspective from simple exact models. Protein Sci. 4, 561–602.

18. Behe, M. J., Lattman, E. E. & Rose, G. D. (1991). The protein folding problem: the native fold determines packing, but does packing determine the native fold? Proc. Natl Acad. Sci. USA, 88, 4195–4199.

19. Singh, J. & Thornton, J. M. (1985). The interaction between phenylalanine rings in proteins. FEBS Letters, 191, 1–6.

20. Samanta, U., Pal, D. & Chakrabarti, P. (1999). Packing of aromatic rings against tryptophan residues in proteins. Acta Crystallog. sect. D, 55, 1421–1427.

21. Brocchieri, L. & Karlin, S. (1994). Geometry of interplanar residue contacts in protein structures. Proc. Natl Acad. Sci. USA, 91, 9297–9301.

22. Mitchell, J. B. O., Laskowski, R. A. & Thornton, J. M. (1997). Non-randomness in side-chain packing: the distribution of interplanar angles. Proteins: Struct. Funct. Genet. 29, 359–376.

23. Lawrence, M. C. & Colman, P. M. (1993). Shape complementarity at protein/protein interfaces. J. Mol. Biol. 234, 946–950.

24. Word, J. M., Lovell, S. C., LaBean, T. H., Taylor, H. C., Zalis, M. E., Presley, B. K. et al. (1999). Visualising and quantifying molecular goodness-of-fit: small probe contact dots with explicit hydrogen atoms. J. Mol. Biol. 285, 1711–1733.

25. Singh, J. & Thornton, J. M. (1990). SIRIUS: an automated method for the analysis of the preferred packing arrangement between protein groups. J. Mol. Biol. 211, 595–615.

26. Dunbrack, R. L., Jr & Cohen, F. E. (1997). Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci. 6, 1661–1681.

27. Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F., Brice, M. D., Rodgers, J. R. et al. (1977). The protein data bank: a computer based archival file for macromolecular structures. J. Mol. Biol. 112, 535–542.

28. Murzin, A. G., Brenner, S. E., Hubbard, T. & Chothia, C. (1995). SCOP: a structural classification of protein database for the investigation of sequences and structures. J. Mol. Biol. 247, 536–540.

29. Connolly, M. L. (1985). Computation of molecular volume. J. Am. Chem. Soc. 107, 1118–1124.

30. Word, J. M., Lovell, S. C., Richardson, J. S. & Richardson, D. C. (1999). Asparagine and glutamine: using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735–1747.

31. Cornell, W. D., Cieplak, P., Bayly, C. I., Gould, I. R., Merz, K. M., Jr, Ferguson, D. M. et al. (1995). A second generation force field for the simulation of proteins, nucleic acids and organic molecules. J. Am. Chem. Soc. 117, 5179–5197.

32. Karlin, S., Bucher, P. & Brendel, V. (1991). Statistical methods and insights for protein and DNA sequences. Annu. Rev. Biophys. Chem. 20, 175–203.

Edited by J. Thornton

(Received 5 June 2003; received in revised form 8 August 2003; accepted 11 August 2003)

Supplementary Material for this paper comprising one Table is available on Science Direct
