the surface of β‐sheet proteins contains amphiphilic regions which may provide clues about...


PROTEINS: Structure, Function, and Genetics 25:253-260 (1996)

The Surface of β-Sheet Proteins Contains Amphiphilic Regions Which May Provide Clues About Protein Folding

William Parker¹ and John J. Stezowski²

¹Department of Surgery, Duke University Medical Center, Durham, North Carolina 27707; ²Department of Chemistry, University of Nebraska, Lincoln, Nebraska 68588-0304

ABSTRACT

A major bottleneck in the field of biochemistry is our limited understanding of the processes by which a protein folds into its native conformation. Much of the work on this issue has focused on the conserved core of the folded protein. However, one might imagine that a ubiquitous motif for unaided folding or for the recognition of chaperones may involve regions on the surface of the native structure. We explore this possibility by an analysis of the spatial distribution of regions with amphiphilic α-helical potential on the surface of β-sheet proteins.

All proteins, including β-sheet proteins, contain regions with amphiphilic α-helical potential. That is, any α-helix formed by that region would be amphiphilic, having both hydrophobic and hydrophilic surfaces. In the three-dimensional structure of all β-sheet proteins analyzed, we have found a distinct pattern in the spatial distribution of sequences with amphiphilic α-helical potential. The amphiphilic regions occur in ring-shaped clusters approximately 20 to 30 Å in diameter on the surface of the protein. In addition, these regions have a strong preference for positively charged amino acids and a lower preference for residues not favorable to α-helix formation. Although the purpose of these amphiphilic regions which are not associated with naturally occurring α-helix is unknown, they may play a critical role in highly conserved processes such as protein folding. © 1996 Wiley-Liss, Inc.

Key words: protein structure, protein folding, chaperone, folding path, amphiphilic sequences, β-sheet proteins

INTRODUCTION

Proteins fold in a manner which is still poorly understood. Little effort has been directed toward finding possible motifs present on the surface of proteins which might be involved in the formation of a correctly folded protein from an unfolded precursor. Exposed structural features which are involved in chaperone recognition or in folding might be expected for several reasons. First, chaperones are likely to interact with proteins at a point in the folding process after the conserved core has been buried. Second, ubiquitous motifs which facilitate folding are perhaps more easily envisioned on the surface of a wide variety of proteins than in the conserved cores of those proteins, since the core region may be necessary for unique protein function and a stable folded state. We have identified a possible motif on the surface of β-sheet proteins which may be important in protein folding. This motif was identified by an analysis of amphiphilic α-helical potential in β-sheet proteins.

Many α-helices are amphiphilic, having both a hydrophobic and a hydrophilic side to the protein structure.¹ Methods have been developed to quantify the amphiphilicity of α-helices which may be formed by any given sequence.²,³ This quantity, termed amphiphilic potential, is useful in predicting the presence of α-helical segments in a variety of helical proteins.⁴ The prediction method works because a high amphiphilic potential in a given sequence is caused by the occurrence of certain amino acids with a periodicity (3.6 residues/turn) which is specific for an α-helix. Thus, high amphiphilic potentials should be a good indication of α-helix in any protein. Contrary to this logic, the predictive scheme fails dramatically when used with proteins which are not predominately α-helical. The failure results because very large amphiphilic α-helical potential exists for sequences which are not α-helical in the native conformation.⁴,⁵ These nonhelical regions have large (non-random) amphiphilic α-helical potential and are found in a wide variety of proteins.⁴,⁵ The function of these sequences has not been explained. We report an observation that these regions are present in distinct arrangements on the surface of all β-sheet proteins which we have analyzed to date. Concluding that these arrangements do not arise from pure coincidence, we suggest that these regions of amphiphilic α-helical potential may be important in conserved processes such as protein folding, protein degradation, or interaction with other molecules such as chaperones.

Received May 15, 1995; revision accepted December 7, 1995.
Address reprint requests to William Parker, Department of Surgery, Duke University Medical Center, Box 2605, Durham, NC 27710.
© 1996 WILEY-LISS, INC.

METHODS

Proteins Selected for Study

Twenty proteins which contain predominantly β-sheet were selected for study. The proteins selected were as follows (Protein Data Bank identification in parentheses): porcine pepsin (3pep), papaya papain (1ppd), N-terminal and C-terminal domains of Aspergillus TAKA amylase (2taa), chicken liver dihydrofolate reductase (8dfr), bovine chymosin B (1cms), human erythrocyte carbonic anhydrase B (2cab), Penicillium acid proteinase (3app), Streptomyces proteinase A (2sga), bovine pancreas ribonuclease A (1rn3), rat fatty acid binding protein (1ifb), human retinol binding protein (1rbp), human interleukin-1β (7i1b), Fab fragment of a mouse antibody (1dbj), Drosophila neural adhesion molecule (1cfi), human poliovirus subunits (2plv; three proteins), flu neuraminidase (1ivc), and Erwinia pectate lyase C (2pec). Structures included representatives of common protein fold families (β-Greek key, β-jelly roll, α/β-TIM barrel, α/β-doubly wound, α + β-sandwich, and multidomain) as well as structures in the β-complex sandwich, β-trefoil, and β-propeller protein fold families. In addition, a random sequence was used for study. The random sequence was generated using the random numbers provided with the IBM BASIC program, with the assumption that these numbers are truly random with respect to α-helical periodicity. The mapping of random numbers to amino acids was varied randomly after every 20,000 numbers to avoid repeated use of the same random sequence.

Selection of Regions With High α-Helical Potential

The helical hydrophobic moment, a quantitative measure of helical amphiphilicity, was calculated according to Eisenberg et al.² by the following equation:

$\mu_H = \frac{1}{N}\left\{\left[\sum_{n=1}^{N} H_n \sin(\delta n)\right]^2 + \left[\sum_{n=1}^{N} H_n \cos(\delta n)\right]^2\right\}^{1/2}$

where n is a specific amino acid in a segment of N residues, δ is the angle between residues as viewed down the helical axis (100 degrees for an α-helix), and H_n is the hydrophobic value assigned to residue n. Hydrophobic values were assigned to each amino acid according to Eisenberg et al.³ A segment length (N) of 11 residues was selected for all calculations, and a moving window algorithm was used for all sequences as described by Rose.⁶ Values for “high” amphiphilic α-helical potential were defined as those which indicate α-helix in α-helical proteins. Since three consecutive amphiphilic potential values greater than 0.35 are a good indication of helix in predominately α-helical proteins (89% of helices located⁴), the same cutoff values were used for the present analysis of β-sheet proteins.

Probability of Random Occurrence of Regions With Amphiphilic α-Helical Potential

The probability of occurrence of regions with high amphiphilic α-helical potential by chance was estimated by tabulating the number of regions with high amphiphilic α-helical potential (defined above) in a randomly generated sequence 10,000 amino acids long. Calculations of amphiphilic α-helical potential were performed on the random sequence as described above, and the fraction of residues in an amphiphilic region of a given length was determined. No continuous sequences of high amphiphilic potential greater than six amino acids in length were found in the random sequence of 10,000 amino acids. As extrapolation of the present results to larger segments of amphiphilicity might introduce large error, the fraction of amino acids occurring in regions of continuous amphiphilic potential longer than six residues was taken to be 1/10,000, resulting in a very conservative estimate (overestimate) of probability. The probability of an amphiphilic region of given length occurring randomly at least once in a test sequence was then calculated, based on probabilities derived from the random sequence analysis, using a summation function. The probability of the occurrence of all regions with amphiphilic α-helical potential in a test sequence was approximated as the product of the probabilities of the occurrence of each individual amphiphilic region within that sequence.

Amino Acid Preference of Regions With Amphiphilic α-Helical Potential

The relative frequencies of amino acids occurring in amphiphilic regions were quantitated using the “conformational parameter” (P) described by Chou and Fasman.⁷ In our case, the amphiphilic conformational parameter is $P_{am} = f_{am}/\langle f_{am}\rangle$, where $f_{am}$ is the frequency of a given residue occurring in amphiphilic regions, and $\langle f_{am}\rangle$ is the average frequency of that residue in the primary sequence.
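The hydrophobic moment and moving-window selection described under "Selection of Regions With High α-Helical Potential" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the zero-based indexing, and the two-letter `TOY_SCALE` are hypothetical, and a real analysis would supply the per-residue hydrophobic values of Eisenberg et al. through the `scale` argument.

```python
import math


def hydrophobic_moment(values, delta_deg=100.0):
    """Mean helical hydrophobic moment of one window of hydrophobic values H_n."""
    delta = math.radians(delta_deg)
    sin_sum = sum(h * math.sin(delta * n) for n, h in enumerate(values, start=1))
    cos_sum = sum(h * math.cos(delta * n) for n, h in enumerate(values, start=1))
    # sqrt(sin_sum^2 + cos_sum^2) / N, as in the equation of the Methods
    return math.hypot(sin_sum, cos_sum) / len(values)


def moment_profile(sequence, scale, window=11, delta_deg=100.0):
    """Moving-window profile: one moment per window start (zero-based)."""
    h = [scale[residue] for residue in sequence]
    return [hydrophobic_moment(h[i:i + window], delta_deg)
            for i in range(len(h) - window + 1)]


def high_potential_runs(profile, cutoff=0.35, min_run=3):
    """Return (start, length) for each run of >= min_run consecutive values > cutoff."""
    runs, start = [], None
    # A sentinel below any cutoff closes a run that reaches the end of the profile.
    for i, value in enumerate(list(profile) + [float("-inf")]):
        if value > cutoff:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs


# Hypothetical two-residue scale, for illustration only (not the Eisenberg values).
TOY_SCALE = {"L": 1.0, "K": -1.0}
profile = moment_profile("LKKLLKKLLKKLLKK", TOY_SCALE)
print(high_potential_runs(profile))
```

Passing the scale in as a dictionary keeps the sketch independent of any particular hydrophobicity table; the window length (11), angle (100 degrees), cutoff (0.35), and minimum run (3) default to the values stated in the Methods.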

RESULTS

Patterns of High Amphiphilic Potential in the Primary Structure

Table I shows the location and length of regions with amphiphilic α-helical potential in 20 β-sheet proteins. Fifteen out of 20 sequences contain a region of high amphiphilic α-helical potential within 30 residues of the N-terminus. Fourteen out of the 20 sequences contain a region of high amphiphilic α-helical potential within 30 residues of the C-terminus. Although regions of high amphiphilic α-helical potential appear to be widely distributed rather than grouped in clusters, no other patterns of the occurrence of the amphiphilic regions in the primary sequence were identified.


TABLE I. Location, Length, and Number of Regions With High Amphiphilic α-Helical Potential*

| Protein | No. of total residues | No. of amphiphilic regions | P value | Location and length of amphiphilic regions (residue, length) |
|---|---|---|---|---|
| Pepsin | 326 | 5 | <10⁻⁴ | Phe27,4; Pro135,8; Asp215,4; Asp253,6; Ile306,14 |
| Papain | 212 | 6 | <10⁻⁸ | Leu54,5; Glu89,10; Val110,7; Val130,3; Leu134,7; Tyr186,9 |
| Amylase | 478 | 14 | <10⁻⁸ | Thr16,8; Gly44,5; Ala102,3; Tyr113,3; Thr179,4; Trp190,5; His210,10; Met246,3; Leu258,3; Asp271,7; Ser303,3; Gly398,4; Gln432,4; Gly457,9 |
| Dihydrofolate reductase | 189 | 3 | <10⁻⁴ | Arg28,10; Val74,25; Arg137,6 |
| Chymosin B | 323 | 4 | <0.001 | Lys53,5; Asn68,4; Asp140,3; Ile303,13 |
| Carbonic anhydrase B | 261 | 8 | <10⁻⁶ | Pro14,4; Glu59,5; Asn74,6; Asp87,9; Ala133,3; Asn155,12; Ile168,4; Ser229,4 |
| Acid proteinase | 323 | 3 | <0.001 | Gln103,8; Thr143,4; Phe208,7 |
| Proteinase A | 181 | 5 | <10⁻⁶ | Thr8,3; Arg46,6; His61,5; Ala92,5; Tyr166,7 |
| Ribonuclease A | 124 | 2 | <0.006 | Phe8,5; Pro93,4 |
| Fatty acid binding protein | 131 | 6 | <10⁻⁶ | Trp6,3; Lys20,9; Glu43,3; Asn57,8; Asn87,8; Ala124,3 |
| Retinol binding protein | 182 | 6 | <0.001 | Val61,3; Tyr118,4; Pro141,4; Pro146,4; Val152,4; Arg163,6 |
| Interleukin-1β | 153 | 2 | <0.001 | Leu6,5; Thr137,12 |
| Fab light chain | 216 | 4 | <0.003 | Ser26,4; Val56,10; Thr77,4; Pro209,3 |
| Fab heavy chain | 219 | 2 | <0.001 | Gly8,9; Gly33,5; (sequence truncated) |
| Neural adhesion molecule | 205 | 4 | <0.002 | Pro6,3; Ala73,5; Asp109,3; Val175,5 |
| Polio virus protein 1 | 302 | 8 | <10⁻⁷ | Leu6,12; Ser76,6; Arg120,4; Gln153,3; Tyr205,9; Gly238,6; Lys256,3; Val268,10 |
| Polio virus protein 2 | 272 | 5 | <10⁻⁴ | Tyr9,3; Arg37,6; Lys81,12; Gly102,3; Ser169,6 |
| Polio virus protein 3 | 238 | 3 | <0.001 | Glu102,6; Asp182,4; Ser221,11 |
| Neuraminidase | 388 | 10 | <10⁻⁶ | Ser7,3; Val131,14; Arg168,4; Val206,15; Pro245,11; Gln266,3; Thr284,12; Asn321,3; Cys336,3; Arg349,4 |
| Pectate lyase C | 353 | 10 | <10⁻¹¹ | Ala23,10; Cys72,6; Glu86,7; Ile107,11; Ile185,9; Asp200,6; Asn231,3; Phe257,8; Gly274,4; Leu344,5 |

*The probability value (P) shown in column 4 was calculated as described in the Methods section and is an estimate of the probability that the regions of amphiphilic α-helical potential found in these proteins occur on a random basis. The estimate is very conservative in cases where proteins have continuous regions of amphiphilic potential longer than six amino acids. The amino acids listed in column 5 are the N-terminal amino acids of a sequence containing high amphiphilic α-helical potential and are followed by the length of the amphiphilic region. Amino acids are numbered sequentially as they are listed in the corresponding PDB file.

Evaluation of the position of regions of high amphiphilic α-helical potential with respect to the position of β-strands indicates that regions of amphiphilic α-helical potential occur randomly with respect to the occurrence of β-strands and β-turns. On average, one third of the amino acids which occur in regions of amphiphilic α-helical potential also occur in β-strand, but large variability is seen from protein to protein. This indicates that high amphiphilic α-helical potentials are not a result of a harmonic of β-strand periodicity (periodicity of two residues/turn). Further evidence that amphiphilic α-helical potentials are not a result of a harmonic of β-strand periodicity is provided by a recent study which shows that regions of β-strand contain periodicity of α-helices rather than (not in addition to) the expected periodicity of β-strands.⁸

Patterns of High Amphiphilic Potential in the Three-Dimensional Structure

We have examined the position of regions with high amphiphilic α-helical potential in the three-dimensional structures of a representative sample (20 structures) of β-sheet proteins. In Figure 1, all regions which have high amphiphilic α-helical potential are shown as green CPK models. Some other regions with moderately high amphiphilic α-helical potential are shown as yellow CPK models. Since these proteins are predominately β-sheet proteins, few of the regions with high amphiphilic α-helical potential correspond to native helical structure.

The first 10 structures we examined are shown in Figure 1. A pattern in the spatial distribution of the regions containing amphiphilic α-helical potential in the native structure is evident (Fig. 1). The patterns shown in Figure 1 are typical of all β-sheet proteins which we have studied to date. In general, the potentially amphiphilic α-helical regions form somewhat planar “rings” or “U” shapes. The rings are 20 to 35 Å in diameter, with the majority being approximately 25 to 30 Å in diameter, depending on where they are measured. Each ring is composed of four to as many as ten amphiphilic elements. These rings are depicted in column 2 of Figure 1. Column 1 (Fig. 1) depicts the proteins turned approximately 90 degrees with respect to column 2 such that they have a minimum profile. The localization of these “rings” of amphiphilic potential is especially evident in the larger proteins (Fig. 1A,C,D,F,G,H). The 30 Å “rings” engulf smaller proteins (Fig. 1B,E,I) almost entirely. We have performed calculations to determine the location of regions with amphiphilic α-helical potential in several small β-sheet proteins (<160 residues) such as interleukin-1 receptor antagonist (PDB file 2irt), carboxylic ester hydrolase (PDB file 3bp2), and ribonuclease A (Fig. 1J). These small proteins contain a limited number (two or three) of regions with amphiphilic α-helical potential, and thus may provide a simple starting point for further study of regions with amphiphilic α-helical potential in β-sheet proteins. However, since two or three segments will always lie in a plane, the analysis of the three-dimensional distribution of regions with amphiphilic α-helical potential in these proteins is less interesting.

Fig. 1. Distribution of regions with amphiphilic α-helical potential in predominately β-sheet proteins. The following proteins are each pictured (column 1) with the plane of amphiphilic regions horizontal and (column 2) rotated 90 degrees with respect to the images in column 1 so that the plane of amphiphilic regions is parallel with the paper: (A) porcine pepsin, (B) papaya papain, (C) C-terminal domain of Aspergillus TAKA amylase A, (D) N-terminal domain of Aspergillus TAKA amylase A, (E) chicken liver dihydrofolate reductase, (F) bovine chymosin B, (G) human erythrocyte carbonic anhydrase B, (H) Penicillium acid proteinase, (I) Streptomyces proteinase A, and (J) bovine pancreas ribonuclease. All regions which have large amphiphilic α-helical potentials⁴ are shown as green CPK models. Some regions with somewhat lower amphiphilic potential values (two consecutive values greater than 0.30) are shown as yellow CPK models. The length and cutoff values of these “weakly amphiphilic” regions are dependent on the protein studied. For example, ribonuclease contains much weaker amphiphilic potential throughout the sequence than does TAKA amylase.

The “cavity” between these amphiphilic rings is often filled with the N-terminal region of the protein (chymosin, papain, and acid proteinase). In other cases, it is filled exclusively with side chains or cofactor binding sites (dihydrofolate reductase, ribonuclease, pepsin, carbonic anhydrase B) or loops which are only partially inserted into the cavity (C-terminal domain of TAKA amylase).

TABLE II. Preferential Occurrence of the Amino Acids in Regions With Amphiphilic α-Helical Potential and in α-Helix*

| Amino acid | Total residues | Pam | Pα |
|---|---|---|---|
| Arg | 60 | 2.59 | 0.98 |
| Lys | 243 | 1.51 | 1.16 |
| Gln | 214 | 1.17 | 1.11 |
| Asp | 327 | 1.14 | 1.01 |
| Leu | 342 | 1.12 | 1.21 |
| Asn | 313 | 1.10 | 0.67 |
| Ile | 286 | 1.05 | 1.08 |
| Cys | 114 | 0.97 | 0.70 |
| Glu | 205 | 0.97 | 1.51 |
| Phe | 206 | 0.96 | 1.13 |
| Trp | 92 | 0.96 | 1.08 |
| Tyr | 246 | 0.93 | 0.69 |
| His | 89 | 0.91 | 1.00 |
| Val | 372 | 0.89 | 1.06 |
| Met | 85 | 0.86 | 1.45 |
| Pro | 258 | 0.83 | 0.57 |
| Thr | 395 | 0.80 | 0.83 |
| Ser | 486 | 0.74 | 0.77 |
| Ala | 339 | 0.73 | 1.42 |
| Gly | 433 | 0.71 | 0.57 |

*The number of residues used in the calculation is listed in column two and the amphiphilic “conformational parameter” (Pam) for each amino acid is listed in column three. Column four lists the conformational parameter for α-helix (Pα) as determined by Chou and Fasman.⁷ All of the 20 predominately β-sheet structures which we have analyzed to date were used in the calculation.

Amino Acid Composition of Regions With High Amphiphilic α-Helical Potential

The tendency of an amino acid to occur in amphiphilic regions may be measured using methods which have been used successfully to determine which amino acids are preferentially found in α-helices, β-sheets, and other structures.⁷ Using this system, conformational parameters are assigned such that 1.0 indicates no preference of an amino acid for a given type of structure, lower numbers indicate that the amino acid is not often found, and larger numbers indicate a tendency for a given amino acid to be found. The tendency for each amino acid to be found in regions of amphiphilic α-helical potential (Pam) is shown in Table II. The tendency for amino acids to be found in α-helical conformations (Pα) is given in Table II for comparison. The amphiphilic regions show a striking preference for the positively charged amino acids Arg and Lys (Pam = 2.59 and 1.51, respectively). The “helix breaking” residues Pro, Gly, Ser, and Thr (Pα < 1.0) have low amphiphilic conformational parameters (Pam < 1.0).
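The conformational parameter defined in the Methods (frequency of a residue within amphiphilic regions divided by its average frequency in the primary sequence) can be illustrated with a short Python sketch. The function name, the zero-based `(start, length)` region format, and the toy sequence are assumptions made for this example; they are not taken from the original analysis, and overlapping regions would be counted twice.

```python
from collections import Counter


def amphiphilic_parameter(sequences, regions):
    """Pam per residue type, pooled over all sequences.

    sequences: list of one-letter amino acid strings.
    regions:   for each sequence, a list of (start, length) pairs, zero-based.
    """
    total = Counter()       # residue counts over the whole primary sequences
    in_regions = Counter()  # residue counts inside amphiphilic regions only
    for seq, spans in zip(sequences, regions):
        total.update(seq)
        for start, length in spans:
            in_regions.update(seq[start:start + length])
    n_total = sum(total.values())
    n_regions = sum(in_regions.values())
    # Ratio of the in-region frequency to the overall frequency of each residue.
    return {aa: (in_regions[aa] / n_regions) / (total[aa] / n_total)
            for aa in total}


# Toy input: Lys makes up half of the region but only a quarter of the sequence.
pam = amphiphilic_parameter(["KKAAAAAA"], [[(0, 4)]])
print(pam["K"], pam["A"])
```

A value of 1.0 means the residue is neither enriched nor depleted in the selected regions, which matches the interpretation of the conformational parameters given above.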

Probability of Occurrence of High Amphiphilic α-Helical Potential in a Random Sequence

Although it is unlikely that discrete patterns of amphiphilic regions apparently independent of the native fold could occur randomly, it is of interest to determine to what extent high amphiphilic α-helical potential may occur on a random basis. For this reason, the frequency of the occurrence of regions of high amphiphilic α-helical potential in random sequences was tabulated. Three consecutive amphiphilic potential values greater than 0.35 occurred randomly about once every 100 residues. Four consecutive high amphiphilic potential values occurred about once every 500 residues, and five or six consecutive high amphiphilic potential values occurred once every 5,000 residues. Seven or more consecutive high amphiphilic potential values were not observed in the random sequence. Based on this random sequence analysis, the probability of the random occurrence of regions of amphiphilic α-helical potential in the β-sheet proteins studied was calculated (Table I). The probabilities of the amphiphilic regions occurring by chance ranged from P < 0.006 (ribonuclease) to P < 1 × 10⁻¹¹ (pectate lyase C). Further, in the protein sequences studied, runs of high amphiphilic α-helical potential which were greater in length than any consecutive run of high potential observed in the entire random sequence occurred on average once every 134 residues. Thus, while some shorter regions of amphiphilic potential might occur on a random basis, the larger regions with amphiphilic α-helical potential observed in the β-sheet proteins studied do not occur on a random basis.
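To make the probability estimate concrete, the sketch below combines the random-sequence frequencies quoted above into a per-protein value. It rests on assumptions that the text does not confirm: the quoted frequencies are treated as independent per-residue rates, the chance of at least one occurrence in a sequence of L residues is taken as 1 − (1 − r)^L (the paper refers only to "a summation function"), and all names are hypothetical.

```python
# Per-residue rates of random runs, taken from the frequencies quoted in the text.
RATE_PER_RESIDUE = {3: 1 / 100, 4: 1 / 500, 5: 1 / 5000, 6: 1 / 5000}
# Conservative rate assigned to runs longer than six (none were seen in 10,000 residues).
LONG_RUN_RATE = 1 / 10000


def p_at_least_once(run_length, n_residues):
    """Assumed chance that a run of this length occurs at least once in n_residues."""
    if run_length < 3:
        raise ValueError("regions are defined as three or more consecutive high values")
    rate = RATE_PER_RESIDUE.get(run_length, LONG_RUN_RATE)
    return 1.0 - (1.0 - rate) ** n_residues


def p_all_regions(run_lengths, n_residues):
    """Product of the individual probabilities, as described in the Methods."""
    p = 1.0
    for length in run_lengths:
        p *= p_at_least_once(length, n_residues)
    return p


# Ribonuclease A in Table I: 124 residues, regions of length 5 and 4.
print(p_all_regions([5, 4], 124))
```

Under these assumptions the ribonuclease A example evaluates to roughly 0.005, which is in line with the bound P < 0.006 listed in Table I; that agreement should be read as a plausibility check on the sketch rather than as a reconstruction of the original calculation.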

DISCUSSION

How the amino acid sequence of a protein determines the native structure of that protein is one of the major unsolved questions of science today. The folding process of a protein is responsible for the selection of a single native conformation from the seemingly infinite number of conformational possibilities. Although folding is energy driven, it is clear that it is the specificity of the protein for certain folding paths which guides the process, rather than free energy.⁹

Interest in protein folding has focused on the conserved hydrophobic “core” of the protein. It is accepted that the formation of this core provides the required energy for driving the folding process, but there is a lack of understanding about the guidance mechanism of the folding process. The surface of the protein as a possible source of information concerning the guidance system of the folding process has been ignored to a large extent. Regions of positive electrostatic potential on the surface of proteins such as chymases, superoxide dismutase, profilin, fibroblast growth factor receptor, and others have been studied because they are thought to be important in interactions with other proteins or substrates. Unfortunately, when such regions have been found in the protease protein C¹⁰ or in the IL and human growth hormone family of proteins,¹¹ they have not been addressed or have been attributed to a “novel substrate recognition exosite.” Analysis of surface charge on proteins seldom reveals a random distribution of such charge. It is possible that these patterns and the patterns described in this work are indicative of a general motif for folding or chaperone recognition. For example, regions of amphiphilic α-helical potential may be key components in a diffusion-collision mechanism of protein folding or may bind to a chaperone, maintaining a conformationally restricted folding intermediate.

While we observe that the rings of amphiphilic structure are conserved, the sequences that encode these regions, and probably the exact locations of these regions in the primary sequence, are not highly conserved. In this way, these sequences are analogous to the positively charged face of superoxide dismutase, where it is proposed by Desideri and coworkers that “coordinated mutations” have occurred in the protein, thus maintaining the electrostatic pattern on the surface of the protein.¹²

At present, there is little experimental evidence that these regions of high amphiphilic α-helical potential are important during any in vivo process such as protein folding. Preliminary evidence was presented recently demonstrating that tumor necrosis factor-α, a β-sheet protein, has a large amount of α-helix in a partially folded state.¹³ This finding may provide support for the importance of α-helical intermediates in the folding of β-sheet proteins. The differences in folding between chymotrypsin and chymotrypsinogen may provide further indication of such involvement. Chymotrypsinogen is capable of fully refolding after thermal denaturation, while chymotrypsin is not.¹⁴ One of the modifications made during the production of chymotrypsin from chymotrypsinogen is the disruption of a region which contains very high amphiphilic α-helical potential, but no α-helix in the native structure. This region, the far N-terminal portion of chymotrypsinogen, contains amphiphilic α-helical potential larger than any observed in a random sequence 10⁵ amino acids long and is lost upon removal of residues 14 and 15 during production of chymotrypsin. While these observations are interesting, considerable work, both experimental and nonexperimental, will be needed to evaluate the possible importance of these amphiphilic regions in any in vivo process.

REFERENCES

1. Schiffer, M., Edmundson, A.B. Use of helical wheels to represent the structure of proteins and to identify segments with helical potential. Biophys. J. 7:121-135, 1967.
2. Eisenberg, D., Weiss, R.M., Terwilliger, T.C. The helical hydrophobic moment: A measure of the amphiphilicity of a helix. Nature 299:371-374, 1982.
3. Eisenberg, D., Schwartz, E., Komaromy, M., Wall, R. Analysis of membrane and surface protein sequences with hydrophobic moment plot. J. Mol. Biol. 179:125-142, 1984.
4. Parker, W., Song, P.-S. Location of helical regions in tetrapyrrole containing proteins by a helical hydrophobic moment analysis. J. Biol. Chem. 265:17568-17575, 1990.
5. Parker, W., Song, P.-S. Protein structure in SDS micelle protein complexes. Biophys. J. 61:1435-1439, 1992.
6. Rose, G.D. Prediction of chain turns in globular proteins on a hydrophobic basis. Nature 272:586-590, 1978.
7. Chou, P.Y., Fasman, G.D. Empirical predictions of protein conformation. Ann. Rev. Biochem. 47:251-276, 1978.
8. West, M.W., Hecht, M.H. Binary patterning of polar and nonpolar amino acids in the sequences and structures of native proteins. Protein Sci. 4:2032-2039, 1995.
9. Lattman, E.E., Rose, G.D. Protein folding - What's the question? Proc. Natl. Acad. Sci. USA 90:439-441, 1993.
10. Fisher, C.L., Greengard, J.S., Griffin, J.H. Models of the serine protease domain of the human antithrombotic plasma factor activated protein C and its zymogen. Protein Sci. 3:588-599, 1994.
11. Demchuk, E., Mueller, T., Oschkinat, H., Sebald, W., Wade, R. Receptor binding properties of four-helix bundle growth factors deduced from electrostatic analysis. Protein Sci. 3:920-935, 1994.
12. Desideri, A., Falconi, M., Polticelli, F., Bolognesi, M., Djinovic, K., Rotilio, G. Evolutionary conservativeness of electric field in the Cu,Zn superoxide dismutase active site. Evidence for co-ordinated mutation of charged amino acid residues. J. Mol. Biol. 223:337-342, 1992.
13. Narhi, L.O., Philo, J., Li, T., Arakawa, T. Tumor necrosis factor-α, a β-sheet protein, can exist in an α-helical conformation. Protein Sci. 4, suppl 2:72, 1995.
14. Brandts, J., Lumry, R. Rotatory dispersion changes during the thermal denaturation of chymotrypsinogen and chymotrypsin. J. Am. Chem. Soc. 83:4290-4292, 1961.