

PREDICTING THE EXPRESSION AND SOLUBILITY OF MEMBRANE PROTEINS

Center for High Throughput Structural Biology

Mark E. Dumont*†, Michael A. White*, Kathy Clark†, Elizabeth J. Grayhack*†, and Eric M. Phizicky*†

Departments of *Biochemistry and Biophysics and †Pediatrics, University of Rochester Medical Center, Rochester, NY 14642

Conclusions

1. Membrane proteins can be overexpressed on a genomic scale, many of them at high levels.

2. Many factors affect overexpression of soluble and membrane proteins similarly.

3. While overall hydrophobicity of membrane proteins is negatively correlated with expression, hydrophobicity of membrane regions is positively correlated with expression.

4. The presence of a predicted signal sequence, topological orientation in the membrane, and normal subcellular localization do not appear to affect the ability of yeast membrane proteins to be overexpressed.

5. The majority of yeast membrane proteins can be solubilized using a small set of detergents.

6. Solubility in shorter-chain detergents is dependent on specific protein properties.

7. Increasing polarity of protein TM segments tends to decrease efficiency of solubilization by short-chain detergents.

Summary

The challenge of overexpression and solubilization of eukaryotic integral membrane proteins is one of the most significant obstacles to structure determination of this important class of proteins. To identify properties of membrane proteins that may be predictive of successful overexpression, we analyzed expression levels of the genomic complement of over 1,000 predicted membrane proteins in a recently completed Saccharomyces cerevisiae protein expression library.¹ We detected statistically significant positive and negative correlations between high membrane protein expression and protein properties such as size, overall protein hydrophobicity, number of transmembrane helices, and amino acid composition of transmembrane segments. Expression levels of membrane and soluble proteins exhibited a nearly identical negative correlation with protein size and overall hydrophobicity. However, high-level membrane protein expression was positively correlated with the hydrophobicity of predicted transmembrane segments.

To further characterize yeast membrane proteins as potential targets for structure determination, we tested the solubility of 123 of the highest expressed yeast membrane proteins in six commonly used detergents. Over 75% of our test proteins could be classified into just four detergent solubility patterns. Protein size, number of transmembrane segments, and the hydrophobicity of predicted transmembrane segments all showed significant correlations with solubility in some detergents. These results suggest that bioinformatic approaches may be capable of identifying certain classes of membrane proteins most likely to be amenable to high-level recombinant expression and efficient detergent solubilization, facilitating structural genomics approaches to membrane protein structure determination.

¹ Gelperin DM, White MA, Wilkinson ML, Kon Y, Kung LA, Wise KJ, Lopez-Hoyo N, Jiang L, Piccirillo S, Yu H, Gerstein M, Dumont ME, Phizicky EM, Snyder M, and Grayhack EJ. (2005) Genes Dev. 19:2816-2826.

Prediction of Yeast Transmembrane Proteins

Two different transmembrane helix prediction programs were used to identify and classify membrane proteins in the yeast genome. We used TMHMM v. 2.0² (http://www.cbs.dtu.dk/services/TMHMM/) to predict 1,155 integral membrane proteins in the MORF collection. From this set of 1,155 proteins, we removed 63 that were predicted by the Phobius program³ (http://phobius.binf.ku.dk/) to have only a signal peptide and no transmembrane segments. This left a total of 1,092 proteins predicted to have one or more transmembrane helices. Since TMHMM may not be the best tool for determining the actual topology of a membrane protein⁴, we used HMMTOP⁵ (http://www.enzim.hu/hmmtop/) to predict the topology of the membrane proteins identified as such by TMHMM. In the very few cases where we were aware of good experimental data suggesting a topology different from the HMMTOP prediction, we used the experimentally determined topology in our analysis.
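The selection step described above (keep TMHMM-positive ORFs, then drop those for which Phobius predicts only a signal peptide) can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the `Prediction` record, its field names, and the ORF names are hypothetical, and real TMHMM and Phobius output would first have to be parsed into this form.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prediction:
    """Hypothetical per-ORF summary of parsed predictor output."""
    orf: str
    tmhmm_helices: int            # helices predicted by TMHMM
    phobius_helices: int          # helices predicted by Phobius
    phobius_signal_peptide: bool  # Phobius predicts a signal peptide

def select_membrane_orfs(predictions):
    """Keep ORFs with at least one TMHMM helix, unless Phobius predicts
    a signal peptide and no transmembrane segments."""
    selected = []
    for p in predictions:
        if p.tmhmm_helices < 1:
            continue  # TMHMM does not call this a membrane protein
        if p.phobius_signal_peptide and p.phobius_helices == 0:
            continue  # Phobius sees only a signal peptide
        selected.append(p.orf)
    return selected

# Placeholder records for illustration only (not data from the poster).
examples = [
    Prediction("ORF_A", 7, 7, False),
    Prediction("ORF_B", 1, 0, True),
    Prediction("ORF_C", 0, 0, False),
    Prediction("ORF_D", 2, 1, True),
]
print(select_membrane_orfs(examples))
```

Applied to the counts reported above, the same rule takes the 1,155 TMHMM-positive proteins, removes 63, and leaves 1,092.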

² Krogh A, Larsson B, von Heijne G, and Sonnhammer EL. (2001) J Mol Biol. 305:567-80.
³ Kall L, Krogh A, and Sonnhammer EL. (2004) J Mol Biol. 338:1027-36.
⁴ Lehnert U, Xia Y, Royce TE, Goh CS, Liu Y, Senes A, Yu H, Zhang ZL, Engelman DM, and Gerstein M. (2004) Q Rev Biophys. 37:121-46.
⁵ Tusnady GE, and Simon I. (2001) Bioinformatics 17:849-50.

The MORF Yeast Protein Overexpression Library

The yeast MORF library is a genomic collection of Saccharomyces cerevisiae strains expressing C-terminally tagged proteins under Gal control.¹

The MORF library contains 5,574 sequence-verified clones tested for protein expression by Western blot.

Factors evaluated for correlations with membrane protein expression and solubilization

Codon usage, codon adaptation index
Molecules per cell under chromosomal expression
Percentage of total protein residues that are aromatic
Isoelectric point
Size (kDa)
GRAVY score (overall protein hydrophobicity)
Homolog in yeast or other organism
Percentage of protein in transmembrane segments
Percentage of transmembrane residues that are hydrophobic (WFLIVMY)
Percentage of transmembrane residues that are charged/polar (EDKRHNQST)
Percentage of transmembrane residues that are aromatic (WYF)
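Several of the factors listed above are simple sequence statistics. The sketch below shows one way they could be computed. It assumes that the GRAVY score is the mean Kyte-Doolittle hydropathy over all residues (the standard definition) and that transmembrane segments are supplied as 0-based, end-exclusive coordinates; the function names and the example sequence are hypothetical, and the poster does not describe how the authors computed these values.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Residue classes exactly as listed among the evaluated factors.
HYDROPHOBIC = set("WFLIVMY")
CHARGED_POLAR = set("EDKRHNQST")
AROMATIC = set("WYF")

def gravy(sequence):
    """Mean Kyte-Doolittle hydropathy over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def tm_composition(sequence, tm_segments):
    """Composition of predicted TM segments.

    tm_segments: list of (start, end) pairs, 0-based and end-exclusive
    (an assumed convention; predictor output is usually 1-based).
    """
    tm_residues = "".join(sequence[start:end] for start, end in tm_segments)
    if not tm_residues:
        raise ValueError("no transmembrane residues supplied")
    n = len(tm_residues)

    def percent(group):
        return 100.0 * sum(aa in group for aa in tm_residues) / n

    return {
        "percent_in_tm": 100.0 * n / len(sequence),
        "percent_hydrophobic": percent(HYDROPHOBIC),
        "percent_charged_polar": percent(CHARGED_POLAR),
        "percent_aromatic": percent(AROMATIC),
    }

# Toy sequence for illustration only; residues 2-7 are the "TM segment".
toy = "KKLLFFSTKK"
print(round(gravy(toy), 3))
print(tm_composition(toy, [(2, 8)]))
```

Because tyrosine and tryptophan appear in both the hydrophobic and aromatic classes, the three TM percentages are not expected to sum to 100.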

Testing Solubilization of High-Expressing Yeast Proteins in Six Different Detergents

Detergents used: Triton X-100 (TX-100), lauryldimethylamine-N-oxide (LDAO), Fos-choline 12 (FC-12, dodecylphosphocholine), tetraethyleneglycol monooctyl ether (C8E4), n-octyl-β-D-glucoside (OG), and n-dodecyl-β-D-maltoside (DDM).

Procedure: 9 µl of yeast whole-cell lysate (about 4.2 µg protein) were solubilized by addition of 141 µl of 1% TX-100, LDAO, FC-12, or DDM, or 2% OG or C8E4, in 20 mM Hepes pH 7.5, 500 mM NaCl, and 10% glycerol, followed by centrifugation at 109,000 × g for 1 hour at 21 °C. A portion of the supernatant was diluted in loading buffer for SDS-PAGE and then analyzed by immunoblotting using anti-HA antibodies.
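The volumes in the procedure imply a small dilution of each detergent stock. The arithmetic can be checked with the short sketch below, which assumes that the stated percentages refer to the added detergent solution, that the lysate itself contains no detergent, and that the lost unit prefixes in the transcript were micro (µl, µg). The function and constant names are hypothetical.

```python
LYSATE_UL = 9.0       # volume of whole-cell lysate
DETERGENT_UL = 141.0  # volume of detergent solution added
PROTEIN_UG = 4.2      # approximate protein in the lysate aliquot

def final_percent(stock_percent, added_ul=DETERGENT_UL, lysate_ul=LYSATE_UL):
    """Detergent concentration after mixing, assuming the lysate
    contributes volume but no detergent."""
    return stock_percent * added_ul / (added_ul + lysate_ul)

total_ul = LYSATE_UL + DETERGENT_UL
protein_ug_per_ul = PROTEIN_UG / total_ul

for name, stock in [("TX-100", 1.0), ("LDAO", 1.0), ("FC-12", 1.0),
                    ("DDM", 1.0), ("OG", 2.0), ("C8E4", 2.0)]:
    print(f"{name}: {final_percent(stock):.2f}% final")
print(f"protein: {protein_ug_per_ul:.3f} ug/ul in {total_ul:.0f} ul")
```

Under these assumptions the 1% stocks end up at 0.94%, the 2% stocks at 1.88%, and the protein at about 0.028 µg/µl in a 150 µl reaction.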

MORF library vector insert region (Gateway cloning). Diagram labels: PGAL, ORF, ATT site (two), His6, HA, 3C, ZZ.

Transmembrane proteins can be expressed almost as well as soluble proteins in yeast

95% of cloned soluble proteins in the MORF library are expressed.

88% of cloned predicted membrane proteins in the MORF library are expressed.

Expression was detected by immunoblotting of whole-cell lysates using antibodies against the HA-epitope tag (“ND”, not detected). Panels: Membrane Proteins; All MORF Proteins.

Disagreement between TMHMM and HMMTop in predictions of Transmembrane Proteins in the Yeast Proteome

Cellular membrane localization
Predicted to contain a signal peptide
Membrane protein characteristics
Number of predicted transmembrane segments
N- and C-terminal orientation across membrane
Average transmembrane segment length

Figure panels: Number of transmembrane segments; Percent of protein in TM segments; Percent charged and polar residues in TM segments; Percent hydrophobic residues in TM segments. Legend: marked series = membrane proteins.

Factors such as size, overall hydrophobicity, and pI have similar effects on soluble and membrane protein expression.

However, there are membrane-specific factors.

Bars = Number of ORFs per bin

Solubilization Efficiency (123 total proteins)

Solid bars: Effective solubilization
Hatched bars: Partial solubilization

Detergent solubilization of membrane proteins: Correlations with number and polarity of TM segments

Figure panels: DDM, TX-100, OG, and C8E4, shown for two properties: number of TM segments, and percent charged and polar residues in TM segments.

Total TM ORFs (HMMTop): 2037
Total TM ORFs (TMHMM): 1168
ORFs where predictions disagree: 1375
ORFs where one program predicts 0 TM segments and the other predicts >0: 920
ORFs with >1 predicted TM segment (HMMTop): 1101
ORFs with >1 predicted TM segment (TMHMM): 707
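Counts of this kind can be tallied from per-ORF segment predictions. The sketch below is illustrative only: it assumes that "disagree" means the two programs predict different numbers of TM segments, which the poster does not state, and the dictionaries hold placeholder ORF names and values, not the authors' data.

```python
def compare_predictions(hmmtop, tmhmm):
    """Tally agreement categories for ORFs present in both inputs.

    hmmtop, tmhmm: dicts mapping ORF name to predicted number of
    TM segments (hypothetical, pre-parsed predictor output).
    """
    orfs = hmmtop.keys() & tmhmm.keys()
    return {
        "tm_orfs_hmmtop": sum(hmmtop[o] > 0 for o in orfs),
        "tm_orfs_tmhmm": sum(tmhmm[o] > 0 for o in orfs),
        # Assumed definition: segment counts differ between programs.
        "disagree": sum(hmmtop[o] != tmhmm[o] for o in orfs),
        # Exactly one program predicts zero segments.
        "zero_vs_nonzero": sum((hmmtop[o] == 0) != (tmhmm[o] == 0) for o in orfs),
        "multi_segment_hmmtop": sum(hmmtop[o] > 1 for o in orfs),
        "multi_segment_tmhmm": sum(tmhmm[o] > 1 for o in orfs),
    }

# Placeholder predictions for illustration only.
hmmtop_counts = {"ORF_A": 7, "ORF_B": 1, "ORF_C": 0, "ORF_D": 3}
tmhmm_counts = {"ORF_A": 7, "ORF_B": 0, "ORF_C": 0, "ORF_D": 2}
summary = compare_predictions(hmmtop_counts, tmhmm_counts)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Restricting the comparison to ORFs present in both dictionaries avoids counting a missing prediction as a disagreement.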

Venn diagram of Proteins Solubilized by Different Detergents