
Synthesis, expression and characterisation of peptides comprised of perfect repeat motifs based on a wheat seed storage protein

Kevin A. Feeney, Arthur S. Tatham, Simon M. Gilbert, Roger J. Fido, Nigel G. Halford, Peter R. Shewry *

IACR-Long Ashton Research Station, Department of Agricultural Sciences, University of Bristol, Long Ashton, Bristol BS41 9AF, UK

Received 30 October 2000; received in revised form 23 January 2001; accepted 25 January 2001

Abstract

We have developed a novel method for constructing synthetic genes that encode a series of peptides comprising perfect repeat motifs based on a high molecular weight subunit (HMW glutenin subunit), a highly repetitive storage protein from wheat seed. A series of these genes of sequentially increasing size was produced, four of which (called R3, R4, R5 and R6) were expressed in Escherichia coli. Activity of the synthetic genes in E. coli was confirmed by Northern blot analysis but SDS-PAGE of crude protein extracts failed to show any expressed peptides when stained using Coomassie brilliant blue R250. However, Western blots probed with a HMW glutenin subunit-specific polyclonal antibody showed the presence of the R6 peptide (Mr 22 005) in the crude cell extracts and both this and the R3 peptide (Mr 12 202) were subsequently purified by extraction with hot aqueous ethanol followed by precipitation with acetone and separated by RP-HPLC. The R4 and R5 peptides were not purified. The purified R3 and R6 peptides absorbed Coomassie brilliant blue R250 or other protein stains only weakly and this was considered to account for their failure to be revealed by staining of separations of the crude protein extracts. Circular dichroism spectroscopy showed that both peptides had β-turn rich structures similar to those of the repetitive sequences present in the whole HMW glutenin subunits. We conclude that expression of perfect repeat peptides in E. coli is a suitable system for the study of structure-function relationships in wheat gluten proteins and other highly repetitive proteins. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: High molecular weight glutenin subunit; Repetitive sequence; Heterologous expression

1. Introduction

Repeated sequences based on one or several short peptide motifs are present in a number of animal proteins, notably fibrous proteins such as silks, elastin and collagen. These sequences confer precise biomechanical properties, such as elasticity, which are crucial for the operation of the organism, and adopt unusual but highly organised structures, often based on β-reverse turn or β-sheet structures [1].

Repeated sequences also occur in some plant proteins, the best-characterised example being the gluten proteins of wheat grain and related proteins from other cereals. These proteins constitute a class of storage proteins called prolamins, which are present only in cereal seeds and are characterised by insolubility in water or dilute salt solutions but solubility in alcohol-water mixtures [2]. The sole function of the prolamins is to act as a store of nitrogen, carbon and sulphur for mobilisation during germination and their structure may have evolved to allow for tight packing and high stability. However, in wheat dough they form an extended network, called gluten, which confers the visco-elastic properties that allow wheat flour to be processed into bread and a range of other food products [3]. Wheat gluten comprises over 50 individual proteins, all of which contain domains based on repeated sequences. However, recent attention has focused on one group, called the high molecular weight (HMW) subunits, which appear to be the main determinants of gluten elasticity [4,5].

0167-4838/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0167-4838(01)00155-8

Abbreviations: CD, circular dichroism; BSA, bovine serum albumin; DTT, dithiothreitol; E. coli, Escherichia coli; HMW, high molecular weight; IPTG, isopropylthiogalactoside; NMR, nuclear magnetic resonance; PVDF, polyvinyl difluoride; RP-HPLC, reversed-phase high performance liquid chromatography; SDS-PAGE, sodium dodecylsulphate-polyacrylamide gel electrophoresis; TFE, trifluoroethanol; TTBS, 25 mM Tris-HCl, pH 7.5, containing 0.5 NaCl and 0.1% (v/v) Triton 400

* Corresponding author. Fax: +44-1275-394-299; E-mail: [email protected]

BBAPRO 36391 29-3-01

Biochimica et Biophysica Acta 1546 (2001) 346-355 www.bba-direct.com

Two HMW glutenin subunit genes are present on each group 1 chromosome of wheat, encoding one high Mr x-type subunit and one low Mr y-type subunit [4]. The HMW glutenin subunits encoded by these genes comprise between about 630 and 830 amino acid residues with Mr ranging from about 67 500 to 87 700 [5]. All of the HMW glutenin subunits have similar structures, consisting of a central repetitive domain flanked by short non-repetitive domains at the N- and C-termini (of 81-104 and 42 residues, respectively). Variation in length of the repetitive domain is largely responsible for the differences in size of the whole proteins. This domain is based on repeating units made up from three repeat motifs: a hexapeptide (consensus Pro Gly Gln Gly Gln Gln), a tripeptide (consensus Gly Gln Gln) and a nonapeptide (consensus Gly Tyr Tyr Pro Thr Ser Leu/Pro Gln Gln). Whereas the hexapeptide occurs in tandem arrays, the tripeptide and nonapeptide motifs only occur interspersed with hexapeptides. There are also some differences between the x-type and y-type subunits, tripeptides being absent from y-type subunits and leucine being favoured at position 7 of the nonapeptide in y-type subunits compared with proline in the x-type [6]. In addition there is considerable degeneracy within the repeat sequences, although the length of the repeats and the residues at certain positions (notably 3, 5 and 6 of the hexapeptide, 1-3 of the tripeptide and 1, 6 and 8 of the nonapeptide) are tightly conserved [6].
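The motif organisation described above can be illustrated with a small tokeniser. The following Python sketch is not part of the original study: the function name `split_repeats` and the toy sequence are hypothetical, and only the three consensus motifs quoted in the text are recognised (with Leu or Pro allowed at position 7 of the nonapeptide), so any degenerate repeat is reported as an error rather than parsed.

```python
import re

# Consensus motifs quoted in the text, in one-letter code.
# Nonapeptide position 7 may be Leu (y-type) or Pro (x-type).
MOTIFS = [
    ("nonapeptide", "GYYPTS[LP]QQ"),
    ("hexapeptide", "PGQGQQ"),
    ("tripeptide", "GQQ"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in MOTIFS))


def split_repeats(seq):
    """Split a perfect-repeat sequence into (motif name, residues) pairs.

    Raises ValueError if a position does not start a consensus motif,
    which is what a degenerate repeat would cause.
    """
    tokens, pos = [], 0
    while pos < len(seq):
        m = PATTERN.match(seq, pos)
        if m is None:
            raise ValueError(f"no consensus motif at position {pos + 1}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Hypothetical x-type-like stretch: hexa, nona, hexa, tri, nona.
toy = "PGQGQQ" + "GYYPTSPQQ" + "PGQGQQ" + "GQQ" + "GYYPTSPQQ"
names = [name for name, _ in split_repeats(toy)]
print(names)
```

Because the alternatives are tried in a fixed order at each position, the hexapeptide is matched before the tripeptide that its last three residues would otherwise satisfy.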

The HMW glutenin subunits are assembled in vivo to form high Mr polymers, which appear to be stabilised by inter-chain disulphide bonds formed between cysteine residues present in the N- and C-terminal domains [7] and by extensive hydrogen bonding between glutamine residues which constitute about 40 mol% of the repetitive domains [8]. Glutamine is present at six of the nine highly conserved positions within the repeat motifs [6], possibly indicating that substitutions at these positions would decrease the extent of hydrogen bonding and thereby destabilise the structure.

In addition to their importance in food processing properties, the structures and biophysical properties of the repeated sequences present in plant and animal proteins are also of interest in relation to their biological roles and to the development of novel biomaterials [1]. However, direct analysis of these properties is often difficult because of the presence of accompanying non-repetitive domains and the assembly of the proteins into highly cross-linked insoluble polymers. These drawbacks can be overcome by the expression of protein subunits or domains in heterologous systems, but the situation may still not be ideal as the peptide repeat motifs may be poorly conserved, restricting the application and interpretation of biophysical analyses.

The aim of the present study was to devise a novel, flexible method for constructing synthetic genes encoding varying lengths of peptides based on perfect HMW glutenin subunit repeat sequences. This would allow us to show proof of concept, that expression of synthetic peptides in Escherichia coli is a viable strategy for studying cereal storage protein structure (gene constructs comprising repeat sequences from other HMW glutenin subunits and related cereal storage protein genes have been produced before but successful expression and purification of protein and analysis of protein structure have not been reported). It would also enable us to determine the structure adopted by perfect repeats without interference from degenerate repeats that might destabilise it, and compare the behaviour of 'perfect' peptides with peptides derived from real HMW glutenin subunits, which contain perfect and degenerate repeats, and with whole HMW glutenin subunit proteins, in which the repetitive domain is flanked by globular regions. Furthermore, the ability to synthesise peptides of varying length would enable us to determine the relationship between peptide size and properties such as conformation and stability.

2. Materials and methods

2.1. Construction of synthetic genes encoding perfect repeat peptides

Four oligonucleotides were synthesised (Genosys, UK), with the following nucleotide sequences (each given 5′-3′):

1. CATGGCTCCA GGGCAAGGGC AATGCGGGTA TTACCCGACT TCACTGCAGT GCCCGGGACA GGGACAGCAA TAG (73-mer)
2. GATCCTATTGC TGTCCCTGTC CCGGGCACTG CAGTGAAGTC GGGTAATACC CGCATTGCCC TTGCCCTGCA GC (73-mer)
3. ACAACCAGGA CAAGGACAAC AAGGGTACTA CCCAACTTCT CTGCAGCAAC CGGGGCAGGG GCAGCAGGGA TATTATCCGA CGTCATTGCA (90-mer)
4. ATGACGTCGG ATAATATCCC TGCTGCCCCT GCCCCGGTTG CTGCAGAGAA GTTGGGTAGT ACCCTTGTTG TCCTTGTCC TGGTTGTTGC A (90-mer)

The oligonucleotides were annealed (1 with 2; 3 with 4) by mixing in equimolar amounts, heating briefly to 100°C and cooling slowly to below 37°C. The double-stranded DNA molecules produced in this way were used to construct synthetic genes in plasmid pUCBM20 (Boehringer, discontinued) using standard molecular biology techniques [9]. Plasmid pUCBM20 and its derivatives were maintained in E. coli SURE cells (Stratagene). The nucleotide sequence of each synthetic gene was checked using a Sequenase 2 kit (Amersham, UK). Once this was completed the synthetic genes were transferred to plasmid pET3d or pET28 (Novagen) [10].

2.2. Expression of synthetic genes

The pET plasmids containing the synthetic genes were transformed into E. coli BLR(DE3)pLysS cells and grown at 37°C on 2YT medium [9] containing 100 µg/ml ampicillin (pET3d) or 50 µg/ml kanamycin (pET28), 25 µg/ml chloramphenicol and 10 µg/ml tetracycline in 'baffled' flasks in a shaking incubator. When the cell density was equivalent to an OD600 of 0.6, protein expression was induced by adding IPTG to a final concentration of 0.4 mM.

2.3. Purification and characterisation of the perfect repeat peptides

E. coli cells were harvested by centrifugation (6500×g for 10 min at 4°C), resuspended in 90% (v/v) ethanol containing 2% (w/v) DTT and disrupted by sonication with a microprobe. The cell lysate was incubated at 60°C for 3 h with occasional agitation using a vortex mixer. Cell debris was removed by centrifugation (10 000×g for 10 min at 4°C) and the supernatant mixed with 4 vols. of acetone. After standing for 30 min at -20°C, the precipitate was removed by centrifugation (10 000×g for 10 min), allowed to dry in air and then redissolved in 8 M urea containing 2% (w/v) DTT. After centrifugation, dissolved proteins and peptides were separated by RP-HPLC on a Vydac C18 column (10 mm × 250 mm, 5 µm particle size). Elution was achieved with a linear gradient of 0-40% (v/v) aqueous acetonitrile in 0.05% (v/v) TFA over 50 min at a flow rate of 2.5 ml/min and the eluates were monitored at 280 nm. Major peaks eluting between 32 and 36% (v/v) acetonitrile concentration were collected and freeze dried.
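As a rough guide to where the collected fraction falls in the run, the gradient quoted above can be converted into programmed times. This is only a sketch under stated assumptions: it treats the gradient as ideal and ignores system dwell volume and column dead time, neither of which is given in the text, and the function name is hypothetical.

```python
def time_at_percent(pct, start_pct=0.0, end_pct=40.0, duration_min=50.0):
    """Programmed time (min) at which a linear gradient reaches pct % acetonitrile."""
    if not min(start_pct, end_pct) <= pct <= max(start_pct, end_pct):
        raise ValueError("percentage outside the gradient range")
    return (pct - start_pct) / (end_pct - start_pct) * duration_min


FLOW_ML_PER_MIN = 2.5  # flow rate quoted in the text

t_start = time_at_percent(32.0)  # start of the collected window
t_end = time_at_percent(36.0)    # end of the collected window
window_ml = (t_end - t_start) * FLOW_ML_PER_MIN

print(f"collect from {t_start:.1f} to {t_end:.1f} min, about {window_ml:.1f} ml")
```

On a real instrument the observed retention times would be later than these programmed times by the delay of the system, so the numbers are a planning aid rather than a prediction.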

The Tris-Tricine system of Schägger and von Jagow [11] was used for SDS-PAGE and samples were transferred either to Pro-blot for N-terminal amino acid sequence analysis at the John Innes Centre or to PVDF membrane for Western blot analysis [12]. In each case, electrophoretic transfer was for 30 min at a constant 50 V. Western blots were probed with antibodies raised in rabbits to the repetitive consensus sequence Gly Tyr Tyr Pro Thr Ser Pro Gln Gln Pro Gly Gln of HMW glutenin subunits [13]. The primary antibody was diluted 1/2500 in TTBS containing 1% (w/v) BSA and then incubated for 16 h at 21°C in the presence of immobilised peptides. The secondary, alkaline phosphatase-labelled goat anti-rabbit antibody (Sigma) was diluted 1/6250 in TTBS and incubated at 37°C, together with the substrate and colour development reagent according to the supplier's instructions, until the immunoreactive components were visualised.

3. Results

3.1. Construction and expression of synthetic genes encoding repetitive peptides

The aim of the study was to construct and express in E. coli synthetic genes encoding peptides comprising different numbers of perfect hexapeptide plus nonapeptide repeating units (Pro Gly Gln Gly Gln Gln and Gly Tyr Tyr Pro Thr Ser Leu Gln Gln) of the y-type HMW glutenin subunits. Four oligonucleotides (1-4, see Section 2) were synthesised and annealed (1 with 2; 3 with 4) to produce two double-stranded DNA molecules which were named insert A and insert B (Fig. 1). Insert A was cloned into plasmid pUCBM20 after digestion of the plasmid with the restriction enzymes NcoI and BamHI. It encodes a protein of 23 amino acids (Fig. 1), comprising methionine and alanine residues at the N-terminus followed by a hexapeptide in which the glutamine residue at position 6 of the motif is replaced with a cysteine residue. This is followed by a perfect nonapeptide except that the glutamine at position 9 of the motif is also replaced with a cysteine residue. There is a perfect hexapeptide repeat at the C-terminal end. The cysteines were incorporated eight residues from the N-terminal end and seven from the C-terminal end to allow for the assembly of the peptides into polymers by disulphide bond formation.

Fig. 1. Nucleotide sequences of insert A, insert B and synthetic gene R1, and derived amino acid sequences of the peptides that they encode. Insert A has overhanging ends (shown in bold), the one on the 'top' strand compatible with those produced by NcoI digestion, the one on the 'bottom' strand compatible with those produced by BamHI digestion. Insert A was cloned into plasmid pUCBM20 and also contains an internal PstI restriction site. Insert B has overhanging ends compatible with those produced by PstI digestion and an internal PstI restriction site. Insertion of insert B into the PstI site of insert A in pUCBM20, as shown by the arrows, resulted in the production of the synthetic gene R1. R1 encodes a 53 amino acid peptide, and retains a unique PstI site, since PstI sites were not regenerated at either end of the insertion. The nucleotides of R1 that originated from insert A are underlined. The non-underlined nucleotides in the middle originated from the plasmid. The presence of the unique PstI site in R1 meant that another insert B sequence could be added to produce sequentially larger genes, increasing in length by 90 bp, encoding peptides of 83 (R2), 113 (R3), 143 (R4), 173 (R5) and 203 (R6) amino acids.

Insert B encodes a sequence of 30 residues comprising perfect hexapeptide and nonapeptide repeats (Fig. 1). This was cloned into the PstI site in the middle of insert A in plasmid pUCBM20 to create a synthetic gene, R1, that encoded a protein of 53 amino acids (Fig. 1). Sequentially larger genes were made by cloning additional copies of insert B into the PstI site in the middle of the central insert B sequence. This was possible because each cloning step produced a gene with a unique PstI site in the central insert B sequence.
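The stepwise cloning scheme can be followed in silico. The sketch below is an illustration, not the authors' procedure: it takes the top strands of oligonucleotides 1 and 3 as printed in Section 2.1, models ligation into the PstI site as a simple string insertion after CTGCA, and uses the standard genetic code. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Top strands as printed in Section 2.1 (oligonucleotides 1 and 3).
INSERT_A = ("CATGGCTCCAGGGCAAGGGCAATGCGGGTATTACCCGACT"
            "TCACTGCAGTGCCCGGGACAGGGACAGCAATAG")
INSERT_B = ("ACAACCAGGACAAGGACAACAAGGGTACTACCCAACTTCT"
            "CTGCAGCAACCGGGGCAGGGGCAGCAGGGATATTATCCGA"
            "CGTCATTGCA")
PSTI = "CTGCAG"  # PstI cuts CTGCA^G, leaving a 3' TGCA overhang


def insert_at_psti(gene, insert):
    """Insert a PstI-compatible fragment at the unique PstI site of gene."""
    if gene.count(PSTI) != 1:
        raise ValueError("expected exactly one PstI site")
    cut = gene.index(PSTI) + 5  # position just after CTGCA
    return gene[:cut] + insert + gene[cut:]


def translate(gene):
    """Translate from the first ATG up to the first stop codon."""
    start = gene.index("ATG")
    protein = []
    for i in range(start, len(gene) - 2, 3):
        aa = CODON_TABLE[gene[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Build R1 to R6 by repeated insertion of insert B.
genes = {}
gene = INSERT_A
for n in range(1, 7):
    gene = insert_at_psti(gene, INSERT_B)
    genes[f"R{n}"] = gene

for name, g in genes.items():
    print(name, len(translate(g)), "residues;", g.count(PSTI), "PstI site")
```

In this model neither junction regenerates CTGCAG, so each product carries a single PstI site inside the most recently added copy of insert B, which is why the cycle can be repeated.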

The amount of repetition within the sequences was minimised by alternative codon usage. For example, the first hexapeptide in insert B was encoded by the sequence CCA GGA CAA GGA CAA CAA, whereas the second was encoded by the sequence CCG GGG CAG GGG CAG CAG. This was done in preference to using the codons favoured by E. coli, even though the latter may have resulted in more efficient expression, to maximise the stability of the synthetic genes. Even so, we experienced considerable problems with stability until we used the SURE strain, which has mutations in the recA, recB and recJ genes, as host.
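The effect of the alternative codon usage can be checked directly on the two hexapeptide codings quoted above. This is a minimal sketch; the helper names are hypothetical and the codon table contains only the six codons that appear in the example.

```python
# Only the codons used in the two quoted hexapeptide codings.
CODONS = {"CCA": "P", "CCG": "P", "GGA": "G", "GGG": "G", "CAA": "Q", "CAG": "Q"}

first = "CCA GGA CAA GGA CAA CAA".split()
second = "CCG GGG CAG GGG CAG CAG".split()


def encoded_peptide(codons):
    """One-letter peptide encoded by a list of codons."""
    return "".join(CODONS[c] for c in codons)


def nucleotide_identity(a, b):
    """Fraction of identical positions between two equal-length codings."""
    x, y = "".join(a), "".join(b)
    if len(x) != len(y):
        raise ValueError("codings differ in length")
    return sum(p == q for p, q in zip(x, y)) / len(x)


print(encoded_peptide(first), encoded_peptide(second))
print(f"nucleotide identity: {nucleotide_identity(first, second):.2f}")
```

Both codings give the same hexapeptide, but every codon differs at its third position, so the two DNA stretches share only 12 of 18 nucleotides and offer less uninterrupted homology for recombination.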

In each case, the addition of one insert of the B sequence increased the size of the synthetic gene by 90 bp, adding 30 amino acids to the size of the encoded protein. The nucleotide sequences of genes encoding peptides of 113 (R3), 143 (R4), 173 (R5) and 203 (R6) amino acids were confirmed. All four genes were then excised using the NcoI and BamHI restriction sites at either end and transferred to the vector pET3d for expression of the encoded peptides, although only the R3 and R6 peptides were subsequently studied in detail. Their structures are shown in Fig. 2. The R3 peptide comprises the Met-Ala N-terminus followed by eight hexapeptides alternating with seven nonapeptides. The R6 peptide comprises the same N-terminus followed by 14 hexapeptides alternating with 13 nonapeptides. Each also has a cysteine residue at position 8 with respect to the N-terminus and -7 with respect to the C-terminus to allow for polymer formation. Their predicted molecular weights are 12 202 (R3) and 22 005 (R6).
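The sizes and predicted molecular weights quoted above can be reproduced from the repeat structure alone. The sketch below is an assumption-laden illustration: it uses standard average residue masses, treats both cysteines as free thiols, assumes no post-translational changes, and the function names are hypothetical.

```python
# Standard average residue masses (Da) for the residues present in the peptides.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "T": 101.1051,
    "C": 103.1388, "L": 113.1594, "Q": 128.1307, "M": 131.1926, "Y": 163.1760,
}
WATER = 18.0153  # one water per free peptide chain

HEXA = "PGQGQQ"
NONA = "GYYPTSLQQ"


def perfect_repeat_peptide(n_hexa):
    """Met-Ala, then n_hexa hexapeptides alternating with n_hexa - 1 nonapeptides.

    Cys replaces Gln at position 8 from the N-terminus and 7 from the C-terminus.
    """
    residues = list("MA" + NONA.join([HEXA] * n_hexa))
    residues[7] = "C"
    residues[-7] = "C"
    return "".join(residues)


def average_mass(seq):
    """Average molecular mass (Da) of a linear peptide."""
    return sum(AVG_RESIDUE_MASS[r] for r in seq) + WATER


for name, n_hexa in (("R3", 8), ("R6", 14)):
    pep = perfect_repeat_peptide(n_hexa)
    gln = 100 * pep.count("Q") / len(pep)
    print(f"{name}: {len(pep)} residues, Mr about {average_mass(pep):.0f}, {gln:.1f} mol% Gln")
```

With these assumptions the calculated masses agree with the predicted values given in the text to within about 1 Da.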

Plasmid pET3d carries a gene imparting resistance to ampicillin. Later in the study we used an alternative vector, pET28, which carries a gene imparting resistance to kanamycin. We found that the pET28 system gave significantly higher yields of protein, possibly because selection with kanamycin is more efficient than with ampicillin. The host strain used for protein expression, BLR(DE3)pLysS, is preferred for expression of repetitive sequences as it is recA- but recB+ and recJ+. Given the problems that we had experienced previously with stability of the synthetic genes, every new batch was initiated by a new transformation.

3.2. Purification and characterisation of the perfect repeat peptides

Fig. 2. Structure of peptides R3 and R6. Each has Met Ala at the N-terminus, followed by alternating hexapeptides (Pro Gly Gln Gly Gln Gln) (light grey) and nonapeptides (Gly Tyr Tyr Pro Thr Ser Leu Gln Gln) (dark grey). Each repeat motif is 'perfect' with the exception of a cysteine residue (C) at position 8 with respect to the N-terminus and -7 with respect to the C-terminus to allow for polymer formation.

After induction of expression with IPTG, the cells containing the peptide expression vectors were grown overnight at 37°C and harvested by centrifugation. Preliminary analyses of the total cell proteins by SDS-PAGE failed to show appreciable amounts of any of the peptides although Northern blotting (Fig. 3) revealed that mRNA was produced by all four constructs. Assuming that translation was unaffected, this indicated that our failure to detect the peptides could have resulted from instability of the expressed proteins or from failure of the protein to be fixed and stained (the latter could have occurred because the proteins completely lack charge [14]). We therefore resorted to immunodetection of the protein using a polyclonal antibody (anti-HMW R2) raised against a synthetic peptide, the sequence of which (Gly Tyr Tyr Pro Thr Ser Pro Gln Gln Pro Gly Cys) corresponds to the HMW glutenin subunit nonapeptide repeat motif and the first two residues of a hexapeptide motif, linked to a cysteine residue [13]. Reaction of this antibody with a Western blot of total cell proteins showed the presence of an immunoreactive band in the induced cells expressing the R6 construct, which migrated with an apparent Mr of about 30 000. This mobility is consistent with that expected of the R6 peptide (true Mr 22 000) allowing for the fact that cereal prolamins usually show anomalously high Mr by SDS-PAGE [15,16]. The identity was subsequently confirmed by N-terminal amino acid sequencing (see below). In addition, several endogenous E. coli proteins present in both non-induced and induced cells also reacted with the antibody (not shown).

Immunodetection was clearly the most reliable way of detecting the peptide and it was, therefore, used to monitor the purification of the putative R6 peptide by ethanol extraction, acetone precipitation and RP-HPLC, the peptide eluting between 32 and 36% (v/v) acetonitrile. The same protocol was then used to purify the R3 peptide, although the antibody failed to react with any major bands in the crude extract of induced cells (not shown). The reason for the failure of the antibody to bind to the R3 peptide in the crude cell extracts is not known but it could relate to the smaller size of the peptide compared with R6. A similar procedure could be used to purify the R4 and R5 peptides but this was not done as it was considered that structural differences would be more apparent between the R3 and R6 peptides.

The identities of the R3 and R6 peptide preparations were confirmed by automated N-terminal amino acid sequencing. In both cases single clean sequences were obtained which were identical for ten residues to those encoded by the constructs: Met Ala Pro Gly Gln Gly Gln Cys Gly Tyr.

Fig. 3. Northern blot analysis of mRNA fraction from E. coli cells expressing synthetic genes encoding the R3 (lane a), R4 (b), R5 (c) and R6 (d) peptides. The blot was probed with insert B radiolabelled with 32P.

Fig. 4. SDS-PAGE (A) and Western blotting using the anti-HMW R2 antibody (B) of the purified R3 (lanes b) and R6 (lanes c) peptides. The arrows in A indicate putative dimeric forms, lane a in A shows standard proteins of known Mr: 1, cytochrome c (12 300); 2, myoglobin (17 200); 3, carbonic anhydrase (30 000); 4, ovalbumin (42 700); 5, albumin (66 250); 6, ovotransferrin (76 000-78 000). m, d and t in B indicate monomeric and putative dimeric and trimeric forms of the peptide, respectively. 10.5 µg of the R3 peptide and 5.0 µg of the R6 peptide were loaded in A and B.

The purified R3 and R6 peptides were both stained by Coomassie blue when separated by SDS-PAGE (Fig. 4A, lanes b, c), although the R3 peptide stained less intensely. The major bands also migrated more slowly than would be expected based on their known masses, between the Mr 12 000 and Mr 17 000 marker proteins for R3 (Mr 12 202) and close to the Mr 30 000 marker protein for R6 (Mr 22 005). This is consistent with the behaviour of the whole HMW glutenin subunits, which exhibit anomalously high apparent Mr when separated by SDS-PAGE [15,16]. The R6 peptide comprised 203 amino acid residues and exhibited an Mr by SDS-PAGE about 8000 greater than the true Mr. It can, therefore, be calculated that the mass of each amino acid residue was overestimated by about 40. This compares well with data reported by D'Ovidio et al. [17] who compared the apparent and true Mr of subunit 1Dx5 and modified forms of the subunit in which the repetitive domain had been either decreased (by 17.2% or 36.6%) or increased (by 22.5%) in size. In this case the Mr of each individual amino acid residue in the repetitive domain was overestimated by about 45 by SDS-PAGE.
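The per-residue figure for R6 follows from simple arithmetic on the values given in the text. In the worked step below, the symbols \(\Delta M_r\) (apparent minus true Mr) and \(n_{\mathrm{res}}\) (number of residues) are introduced here for illustration only, and the apparent Mr is taken as the approximate 30 000 of the nearby marker protein.

```latex
\[
\Delta M_r \approx 30\,000 - 22\,005 \approx 8000,
\qquad
\frac{\Delta M_r}{n_{\mathrm{res}}} \approx \frac{8000}{203} \approx 39.4
\]
```

This rounds to the overestimate of about 40 per residue stated above, close to the value of about 45 reported for the repetitive domain of subunit 1Dx5.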

The SDS-PAGE separations showed that both the R3 and R6 preparations also contained bands of about twice the apparent Mr of major peptide components (see arrows in Fig. 4A, lanes b, c). These may be dimers of the peptides stabilised by disulphide bonds formed by re-oxidation during separation. Immunodetection of the R6 peptide on Western blots with the anti-HMW R2 antibody also revealed bands consistent in size with the monomeric and dimeric forms of the peptide (bands m and d, respectively, in Fig. 4B, lane c). Additionally, a further band of higher mass was visualised which was presumably trimeric (band t in Fig. 4B, lane c). It is of interest that the putative dimeric and trimeric forms of the R6 peptide reacted more strongly with the antibody than the monomeric form. A similar effect was also observed with the R3 peptide. The monomeric form (band m in Fig. 4B, lane b) reacted only weakly with the antibody while putative dimeric and trimeric bands (d and t, respectively, in Fig. 4B, lane b) reacted more strongly.

The yields of the R3 and R6 peptides were about 6-7 mg/l and 8-10 mg/l, respectively. The R3 and R6 peptides were also found to be readily soluble in water, in contrast to the whole subunits.

3.3. Conformational analysis of the synthetic peptides

Circular dichroism (CD) spectroscopy has been used extensively to determine the secondary structure contents of HMW glutenin subunits. Early studies indicated that the repetitive sequences in these proteins formed β-turns but this structure was difficult to identify by spectroscopy of intact subunits due to the contribution of α-helix in the N- and C-terminal domains to the spectra. More detailed studies were, therefore, carried out on an Mr 58 000 repetitive fragment which was expressed in E. coli using a subclone from the HMW glutenin subunit 1Dx5 gene [18]. This confirmed the presence of β-reverse turns but showed that these were in equilibrium with poly-L-proline II conformation, the latter being favoured at low temperatures. However, although the Mr 58 000 peptide consists wholly of repeated sequences, the individual motifs contain a range of amino acid substitutions resulting in an 'imperfect' structure.

The CD spectra of the R3 and R6 perfect repeat peptides were determined in trifluoroethanol (TFE) and compared with the spectrum of the Mr 58 000 peptide in 70% TFE (the latter being insoluble in 100% TFE). The spectra of the three peptides were highly similar (Fig. 5), with the R3 and Mr 58 000 peptides showing minima around 201-202 nm and 220 nm. A similar minimum around 220 nm was also seen in the spectrum of the R6 peptide but the minimum at 201-202 nm was shifted to about 204-205 nm. TFE is known to promote the formation of α-helical structure [19], but this is unlikely to be formed by the three peptides because of their high content and regular distribution of proline residues. The spectra are, however, consistent with the presence of β-reverse turns. Previous studies using structure prediction [20], CD spectroscopy [21] and NMR spectroscopy [22,23] have indicated that the predominant structural feature in the repetitive domains is β-turns in equilibrium with poly-L-proline-II like structure [18]. This is consistent with the minima of around 201-202 and 220 nm observed for the R3 and Mr 58 000 peptides. The shift of the minimum in R6 to 204-205 nm could be accounted for by an alteration in the ratio of type I/III to type II turns and/or an alteration in the proportions of β-turns and poly-L-proline-II like structure.

Fig. 5. Far-UV circular dichroism spectroscopy of the R3 and R6 peptides dissolved in trifluoroethanol, subunit 1Dy10 dissolved in trifluoroethanol and the Mr 58 000 peptide dissolved in 70% (v/v) aqueous trifluoroethanol.

In contrast, when HMW glutenin subunit 1Dy10 from wheat grain was dissolved in TFE the CD spectrum was typical of a protein containing α-helix, with negative maxima at about 207-208 nm and at 222 nm [24]. This may result from the formation of α-helix by the N- and C-terminal domains as discussed previously for HMW glutenin subunit 1Dx5 [18]. This difference emphasises the importance of removing the N- and C-terminal domains when studying the repetitive sequences which comprise most of the HMW glutenin subunit proteins.

4. Discussion

We have reported a novel method for producing sequentially larger synthetic genes encoding peptides comprising multiple perfect repeats of the hexapeptide and nonapeptide motifs present in y-type HMW glutenin subunits. This method meant that we were able to make genes encoding sequentially larger proteins in steps of 30 amino acid residues comprising two hexapeptide and two nonapeptide repeats. In theory there is no limit to the size of gene produced in this way. Furthermore, our results have shown for the first time that high yields can be achieved by expression of perfect repeat peptides as non-fusion proteins, and that these peptides can be readily purified using standard procedures, thus providing proof of concept that the construction of synthetic genes and their expression in E. coli is a viable method for producing peptides to be used in the study of cereal seed protein structure.

In contrast to whole HMW glutenin subunits, the perfect repeat peptides were readily soluble in water. Although this could relate to their small size, higher Mr repetitive peptides, including an Mr 58 000 peptide expressed in E. coli [18] and a peptide released by cleavage of a whole subunit [25] are also water soluble. This solubility presumably results from hydrogen bonding of the glutamine residues to water. In contrast, the insolubility of the whole proteins as reduced monomers may result from inter-chain hydrogen bonding of aligned proteins. Atomic force microscopy in the hydrated solid state has also demonstrated that the whole HMW glutenin subunits and repetitive peptides interact differently [26]. Both the whole subunit and the Mr 58 000 peptide aligned side-by-side to form fibrils. However, whereas the Mr 58 000 peptide formed linear rods the whole protein formed a branched network. This suggests that the non-repetitive N- and C-terminal domains form specific interactions between individual subunits. These interactions could facilitate alignment and hydrogen bond formation between the repetitive domains resulting in larger, more stable and less soluble polymers. It is also notable that the R3 and R6 peptides were only stained efficiently with Coomassie brilliant blue R250 or other commonly used protein stains when in high concentration and were undetected when present in the crude extracts of E. coli cells.

Analysis of the secondary structures of the perfect repeat peptides by CD spectroscopy confirmed that they formed β-turns, which is consistent with previous analyses of whole proteins and synthetic peptides. However, differences were observed between the spectra of the R3 and R6 peptides. The shorter peptide, R3, appeared to have a structure similar to that of an Mr 58 000 peptide which was derived from a naturally occurring subunit [18] and contained degenerate as well as perfect repeat sequences. In contrast, the longer R6 peptide appeared to have a structure with a higher content of class II β-turns and/or poly-L-proline II structure. This could be related to the presence in R6 of a longer array of perfect repeats than are present in the Mr 58 000 peptide or in any of the whole HMW glutenin subunits characterised to date.

Two previous reports have described the expression of synthetic gluten protein genes in E. coli. Anderson et al. [27] reported the expression of a synthetic HMW glutenin subunit gene, in which 32 perfect repeats of a 15 residue sequence (Pro Gly Gln Gly Gln Gln Gly Tyr Tyr Pro Thr Ser Pro Gln Gln) were flanked by synthetic N- and C-terminal domains. More recently, Elmorjani et al. [28] have described the construction and expression of peptides comprising eight, 16 and 32 copies of the pentapeptide Pro Gln Gln Pro Tyr which is a consensus motif in a different wheat seed storage protein family, the gliadins. These were expressed as fusion proteins with thioredoxin, only small amounts being obtained as free peptides.

Anderson et al. [27] reported that the synthetic glutenin accounted for about 10–20% of the total bacterial protein with a yield of 15–30 mg/l, but did not report any purification and characterisation. Similar yields were reported by Elmorjani et al. [28], about 20% of the total proteins for the smaller peptides and 15% for the larger one. The authors speculated that the lower yield of the large peptide could have resulted from less efficient expression or losses during solubilisation from inclusion bodies. Elmorjani et al. [28] also purified the fusion proteins by affinity chromatography on a nickel-chelation column, and released the repetitive peptides by proteolytic digestion. The eight-motif peptide was then purified by RP-HPLC and shown to have similar UV absorbance spectra to a synthetic decapeptide (Pro Gln Gln Pro Tyr Pro Gln Gln Pro Ala) but detailed analyses were not reported.

We are not aware of the reasons why these studies were not taken to the point where purified protein could be characterised, although we can speculate that the low yields reported for the non-fusion peptides by Elmorjani et al. [28] may be partially accounted for by poor staining, assuming that the peptides used in that study had similar staining characteristics to R3 and R6. Whatever the reasons, it is clearly important for future studies that we summarise why the present study was successful: the method for constructing the synthetic genes was simple and readily repeatable; the nucleotide sequences were designed to reduce repetition as much as possible by exploiting the degeneracy of the genetic code; the nucleotide sequence of each synthetic gene was checked before use; construction of the synthetic genes was carried out in a pUC-based plasmid using a host strain (SURE) that carries multiple rec mutations; expression was performed using a T7-based system, giving high yields; kanamycin selection was used to maintain the expression plasmid in the host cell (this was not essential but gave higher yields); each expression batch was initiated with a fresh transformation of the expression host cells, even though the cells were recA−; expression levels and purification were determined and monitored by immunodetection as well as staining, since the uncharged proteins stained very poorly; and excellent purification was achieved using solvent separation followed by reversed-phase HPLC.
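The idea of exploiting codon degeneracy to reduce nucleotide-level repetition can be sketched as follows. This is a minimal illustration, not the design procedure used in this study: the round-robin codon choice, the function names and the repeat-length measure are all assumptions made for the example, and the motif is the 15 residue sequence quoted above from Anderson et al. [27], used only because it is spelled out in the text. The codon assignments are from the standard genetic code.

```python
from itertools import cycle

# Synonymous codons (standard genetic code) for the residues in the motif.
CODONS = {
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "Q": ["CAA", "CAG"],
    "Y": ["TAT", "TAC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
}
DECODE = {codon: aa for aa, codons in CODONS.items() for codon in codons}

# Pro Gly Gln Gly Gln Gln Gly Tyr Tyr Pro Thr Ser Pro Gln Gln (one-letter code).
MOTIF = "PGQGQQGYYPTSPQQ"


def encode_fixed(peptide: str) -> str:
    """Encode every residue with the same codon each time (maximally repetitive DNA)."""
    return "".join(CODONS[aa][0] for aa in peptide)


def encode_rotating(peptide: str) -> str:
    """Encode each residue by cycling through its synonymous codons in turn.

    Hypothetical round-robin scheme, chosen only to illustrate the principle.
    """
    pickers = {aa: cycle(codons) for aa, codons in CODONS.items()}
    return "".join(next(pickers[aa]) for aa in peptide)


def translate(dna: str) -> str:
    """Translate an in-frame DNA string back to one-letter amino acid codes."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def longest_repeat(dna: str) -> int:
    """Length of the longest substring that occurs at least twice (overlaps allowed)."""
    best = 0
    for k in range(1, len(dna)):
        seen = set()
        repeated = False
        for i in range(len(dna) - k + 1):
            piece = dna[i:i + k]
            if piece in seen:
                repeated = True
                break
            seen.add(piece)
        if not repeated:
            break  # no repeat of length k, so none longer exists either
        best = k
    return best


peptide = MOTIF * 4
naive = encode_fixed(peptide)
varied = encode_rotating(peptide)
print(longest_repeat(naive), longest_repeat(varied))
```

Both DNA strings encode the same perfect repeat peptide, but the rotated version contains much shorter exact nucleotide repeats, which is the property expected to help during gene assembly and maintenance in a recombination-deficient host.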

In conclusion, we have developed a novel strategy for the routine synthesis of a series of perfect repeat peptides in E. coli. The peptides synthesised using this system had similar structures to the related sequences present in whole proteins and peptides derived from these, with differences that may be related to length and degree of sequence conservation. The system is, therefore, ideal to study structure-function relationships in wheat gluten and in other highly repetitive proteins.

Acknowledgements

IACR receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the UK. Part of the work was supported by European Union FAIR Grant CT96-1170: Improving the Quality of EU Wheats for Use in the Food Industry, 'Eurowheat'. We would like to thank the Engineering and Physical Science Research Council, National Chiroptical Centre, King's College London, for the use of facilities and Dr Sandra Denery-Papini (INRA, Nantes) for providing the anti-HMW R2 antibody.

References

[1] A.S. Tatham, P.R. Shewry, Trends Biochem. Sci. 25 (2000) 567–571.

[2] P.R. Shewry, M.J. Miles, A.S. Tatham, Prog. Biophys. Mol. Biol. 16 (1994) 37–59.

[3] P.R. Shewry, A.S. Tatham, F. Barro, P. Barcelo, P. Lazzeri, Biotechnology 13 (1995) 1185–1190.

[4] P.I. Payne, Annu. Rev. Plant Physiol. 38 (1987) 141–153.

[5] P.R. Shewry, N.G. Halford, A.S. Tatham, J. Cereal Sci. 15 (1992) 105–120.

[6] P.R. Shewry, N.G. Halford, A.S. Tatham, in: Oxford Surveys in Plant Cell and Molecular Biology, Oxford University Press, Oxford, 1989, pp. 163–219.

[7] P.R. Shewry, A.S. Tatham, J. Cereal Sci. 25 (1997) 207–227.

[8] P.S. Belton, J. Cereal Sci. 29 (1999) 103–107.

[9] J. Sambrook, F.E. Fritsch, T. Maniatis, Molecular Cloning: a Laboratory Manual, 2nd edn., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, 1989.

[10] F.W. Studier, J. Mol. Biol. 219 (1991) 37–44.


[11] H. Schägger, G. von Jagow, Anal. Biochem. 166 (1987) 368–379.

[12] R.J. Fido, A.S. Tatham, P.R. Shewry, in: H. Jones (Ed.), Methods in Molecular Biology, vol. 49, Humana Press, Totowa, NJ, 1995, pp. 399–422.

[13] S. Denery-Papini, Y. Popineau, L. Quillien, M.H.V. VanRegenmortel, J. Cereal Sci. 23 (1996) 133–144.

[14] E.M. Davis, Am. Biotechnol. Lab. 6 (1988) 28–37.

[15] A.P. Goldsbrough, N.J. Bulleid, R.B. Freedman, R.B. Flavell, Biochem. J. 263 (1989) 837–842.

[16] N.A.C. Bunce, R.P. White, P.R. Shewry, J. Cereal Sci. 3 (1985) 131–142.

[17] R. D'Ovidio, O.D. Anderson, S. Masci, J. Skerritt, E. Porceddu, J. Cereal Sci. 25 (1997) 1–8.

[18] S.M. Gilbert, N. Wellner, P.S. Belton, J.A. Greenfield, G. Siligardi, P.R. Shewry, A.S. Tatham, Biochim. Biophys. Acta 36141 (2000) 1–12.

[19] J.W. Nelson, D. Isaacson, N.R. Kallenbach, Proteins 1 (1986) 211–217.

[20] A.S. Tatham, P.R. Shewry, B.J. Miflin, FEBS Lett. 177 (1984) 205–208.

[21] A.S. Tatham, A.F. Drake, P.R. Shewry, J. Cereal Sci. 11 (1990) 189–200.

[22] A.A. Van Dijk, L.L. VanWijk, A. VanVliet, P. Haris, E. VanSwieten, G.I. Tesser, G.T. Robillard, Protein Sci. 6 (1997) 637–648.

[23] A.A. Van Dijk, E. DeBoef, A. Bekkers, L.L. VanWijk, E. VanSwieten, R.J. Hamer, G.T. Robillard, Protein Sci. 6 (1997) 649–656.

[24] R.W. Woody, Peptides 7 (1985) 15–114.

[25] A.C.A.P.A. Bekkers, E. De Boef, A.A. Van Dijk, R.J. Hamer, J. Cereal Sci. 29 (1999) 109–112.

[26] A. Humphris, T.J. McMaster, M.J. Miles, S.M. Gilbert, P.R. Shewry, A.S. Tatham, Cereal Chem. 77 (2) (2000) 107–110.

[27] O.D. Anderson, J.C. Kuhl, A. Tam, Gene 174 (1996) 51–58.

[28] K. Elmorjani, M. Thievin, T. Michon, Y. Popineau, J.N. Hallet, J. Gueguen, Biochem. Biophys. Res. Commun. 239 (1997) 240–246.


K.A. Feeney et al. / Biochimica et Biophysica Acta 1546 (2001) 346–355