2009 NIGMS Workshop
Enabling Technologies for Structural Biology
March 4th-6th, 2009
Extra-Cellular Mammalian Proteins As Structural Genomics Targets
Steve Anderson
Quote from NESG PSI-2 Center Grant Application, Nov. 2004:
“…Table II-1 indicates the relative success rate, in terms of solubility and structure depositions, of targets from key organisms when expressed in E. coli. The data indicate that eukaryotic targets are some three-fold less likely to be expressed and soluble in our E. coli expression systems. Moreover, although structures of some of these eukaryotic targets will be determined in the coming months, the soluble eukaryotic proteins (particularly human proteins) are some three-fold less likely to result in NMR or crystal structures. Despite the tremendous overall success of bacterial expression systems, success rates with eukaryotic proteins in these systems are limited, and eukaryotic protein sample production remains a major challenge for structural genomics.”
Table II-1. PDB submissions based on target organism
Organism           Cloned   % Soluble   In PDB   % PDB/cloned
E. coli            237      82          21       9.0
B. subtilis        216      50          7        3.3
S. cerevisiae      490      17          5        1.0
D. melanogaster    113      17          1        0.9
Human              970      25          7        0.7
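The "three-fold" figures in the quote can be checked against Table II-1 with a little arithmetic. The sketch below is illustrative only: the function and variable names are made up here, and the choice of E. coli versus human as the comparison pair is an assumption, not something the grant text specifies.

```python
# Rows transcribed from Table II-1:
# organism -> (targets cloned, percent soluble, structures in PDB)
TABLE_II_1 = {
    "E. coli": (237, 82, 21),
    "B. subtilis": (216, 50, 7),
    "S. cerevisiae": (490, 17, 5),
    "D. melanogaster": (113, 17, 1),
    "Human": (970, 25, 7),
}


def pdb_per_cloned(organism):
    """Percent of cloned targets that reached the PDB."""
    cloned, _, in_pdb = TABLE_II_1[organism]
    return 100.0 * in_pdb / cloned


def pdb_per_soluble(organism):
    """Percent of soluble targets that reached the PDB."""
    cloned, pct_soluble, in_pdb = TABLE_II_1[organism]
    soluble = cloned * pct_soluble / 100.0
    return 100.0 * in_pdb / soluble


# Hypothetical comparison pair: E. coli targets versus human targets.
solubility_fold = TABLE_II_1["E. coli"][1] / TABLE_II_1["Human"][1]
structure_fold = pdb_per_soluble("E. coli") / pdb_per_soluble("Human")

if __name__ == "__main__":
    for name in TABLE_II_1:
        print(f"{name:16s} PDB/cloned = {pdb_per_cloned(name):.1f}%")
    print(f"solubility fold (E. coli / human): {solubility_fold:.1f}")
    print(f"structure-per-soluble fold (E. coli / human): {structure_fold:.1f}")
```

Both ratios come out a little above three for this pair, which is consistent with the "some three-fold" wording of the quote.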
Major Roadblock:
Many interesting proteins (e.g., eukaryotic proteins) -- especially eukaryotic secreted proteins -- do not express well in commonly-used E. coli systems.
We are especially interested in eukaryotic secreted proteins.
What about E. coli secretion vectors?
Finding a suitable medium for enrichment with 15N and 13C (or labeling with SeMet) is the key issue, because minimal medium appears to strongly inhibit secretion.
Lu et al (1997) J. Mol. Biol. 266, 441.
Protein A secretion vector for producing samples for structural biology
Would this enable isotope enrichment, too?
[Gel image: osmotic shockate (lanes 1-4), M, IgG-purified (lanes 1-4)]
1 - Celtone + 0.2% glucose
2 - Spectra + 0.2% glucose
3 - MJ media (standard media for labeling protein)
4 - 2XTY + 0.2% glucose (optimal rich media, but can’t be used for labeling)
Test expression of ZZ-OM3D-P1Lys
Expression in Celtone media is better than expression in MJ media
Cloning sites of new pEZZ318 ZZ-fusion vector

Version 1.0 (TEV cleavage):
Peptide:   A P K V D G L E G S G E N L Y F Q S L E *
Top:       …GGCGCCGAAAGTAGAC GGTCTAGAA GGTAGCGGT GAAAACCTGTATTTTCAGAGC CTCGAG TAA GGGCCCAAGCTTGAATTC…
Bottom:    …CCGCGGCTTTCATCTG CCAGATCTT CCATCGCCA CTTTTGGACATAAAAGTCTCG GAGCTC ATT CCCGGGTTCGAACTTAAG…
Sites:     Nar I, Xba I, Xho I, Hind III, EcoRI
Features:  C-term of Z domain coding, TEV cleavage site
• Synthetic genes with N- or C-terminal His tags inserted as Xho I / Hind III fragments

Version 2.0 (His-tag / TEV cleavage):
Peptide:   L E G S G H H H H H H H H G S G E N L Y F Q S S *
Top:       …TCTAGAA GGCTCTGGT CATCACCATCACCATCACCATCAC GGCAGCGGT GAAAACCTGTATTTTCAGAGCTCTTAA GGGCCCAAGCTT…
Bottom:    …AGATCTT CCGAGACCT GTAGTGGTAGTGGTAGTGGTAGTG CCGTCGCCA CTTTTGGACATAAAAGTCTCGAGAATT CCCGGGTTCGAA…
Sites:     Xba I, Sac I, Hind III
Features:  Octa-His tag, TEV cleavage site
• Synthetic genes without His tags inserted as Sac I / Hind III fragments
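As a sanity check on the cloning-site slide, the top strands can be translated and scanned for the restriction sites used for insertion. This is a minimal sketch: the helper names are hypothetical, the reading-frame offset of 1 is inferred from the peptide annotation on the slide, and only the sequence shown between the ellipses is used.

```python
# Standard genetic code, with codons enumerated over bases T, C, A, G
# at each of the three positions.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Top strands as shown on the slide (spaces removed, through the stop codon).
V1_TOP = ("GGCGCCGAAAGTAGAC" "GGTCTAGAA" "GGTAGCGGT"
          "GAAAACCTGTATTTTCAGAGC" "CTCGAG" "TAA")
V2_TOP = ("TCTAGAA" "GGCTCTGGT" "CATCACCATCACCATCACCATCAC"
          "GGCAGCGGT" "GAAAACCTGTATTTTCAGAGCTCTTAA")

# Recognition sequences of the enzymes named on the slide.
SITES = {"Nar I": "GGCGCC", "Xba I": "TCTAGA", "Xho I": "CTCGAG", "Sac I": "GAGCTC"}


def translate(dna, offset=0):
    """Translate complete codons of `dna`, starting at index `offset`."""
    end = len(dna) - (len(dna) - offset) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(offset, end, 3))


def find_sites(dna):
    """Return {enzyme name: first index} for each recognition site present."""
    return {name: dna.find(seq) for name, seq in SITES.items() if seq in dna}


if __name__ == "__main__":
    # Offset 1 is an assumption inferred from the annotated peptide.
    print(translate(V1_TOP, 1), find_sites(V1_TOP))
    print(translate(V2_TOP, 1), find_sites(V2_TOP))
```

Both translations contain the TEV recognition sequence ENLYFQS. The Xho I site appears only in the version 1.0 strand and the Sac I site only in the version 2.0 strand, matching the fragment types listed for each version.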
Our source of targets: Human Cancer Protein Interaction Network (HCPIN)
Systematically complete structural coverage of pathways and interaction networks
Study structures of complexes
Pathway-Interaction Subnet
Huang, Montelione, et al (2008) Molec. Cell. Proteomics 7, 2048.
Human Cancer Pathway Interaction Network*
• Cell cycle progression
• Apoptosis
• Toll-like receptor pathway
• Interferon alpha/beta
• JAK-STAT pathway
• TGF-beta pathway
• PI3K pathway
• MAPK pathway
Huang, Montelione, et al (2008) Molec. Cell. Proteomics 7, 2048.
*For further information see Janet Huang (posters 49 & 50).
[Figure: values shown are 2328, 658, 2971, 506, 1160, 136]
Target Selection
~1100 human proteins/domains are selected as NESG targets
http://nmr.cabm.rutgers.edu:9090/PLIMS/
Approximately 1/3 of the HCPIN targets not selected are predicted to be secreted or membrane-bound proteins.
FSTL3:
FSTL3.E3  -SCDGVECGPGKACRMLG-GRPRC-EC-APDCSGL-PARLQVCGSDGATYRDECELRAARCRGHPDLSVMYRGRCRK 72
FSTL3.E4  -SCEHVVCPRPQSCVVDQTGSAHCVVCRAAPCPVPSSPGQELCGNNNVTYISSCHMRQATCFLGRSIGVRHAGSCAG 76

FHR1:
FHR1.E4   TS--CVNPPTVQNAHILS----RQMSKYPSGERVRYECRSPYEMFGD---EEVMCLNGNWTEPPQCKD-- 59
FHR1.E5   STGKCGPPPPIDNGDITS----FPLSVYAPASSVEYQCQNLYQLEGN---KRITCRNGQWSEPPKCLH-- 61
FHR1.E2   TF--CDFP-KINHGILYDEEKYKPFSQVPTGEVFYYSCEYNFVSPSKSFWTRITCTEEGWSPTPKCLR-- 65
FHR1.E3   ---LCFFP-FVENGHSES-----SGQTHLEGDTVQIICNTGYRLQNNE--NNISCVERGWSTPPKCRSTD 59

FBLN4:
FBLN4.E4  VNECLTIPEACKGEMKCINHYGGYLCLPRSAAVINDLHGEGPPPPVPPAQHPNPCPPGYEPDDQ---------DSCVD 69
FBLN4.E5  VDECAQALHDCRPSQDCHNLPGSYQCT---------------------------CPDGYRKIG----------PECVD 41
FBLN4.E6  IDECRYR--YCQHR--CVNLPGSFRCQ---------------------------CEPGFQLGPNN--------RSCVD 39
FBLN4.E8  IDECSYSSYLCQYR--CINEPGRFSCH---------------------------CPQGYQLLAT---------RLCQD 40
FBLN4.E7  VNECDMG-APCEQR--CFNSYGTFLCR---------------------------CHQGYELHRDG--------FSCSD 40
FBLN4.E9  IDECESGAHQCSEAQTCVNFHGGYRCVDTN-----------------------RCVEPYIQVSENRCLCPASNPLCRE 55

CEAM1:
CEAM1.E3  ---ELPKPSISSNNSNPVEDKDAVAFTCEPETQD-TTYLWWINNQSLPVSPRLQLSNGNRTLTLLSVTRNDTGPYECEIQNPVS-ANRSDPVTLNVTY 93
CEAM1.E5  LSPVVAKPQIKASKTTVTGDKDSVNLTCSTNDTG-ISIRWFFKNQSLPSSERMKLSQGNTTLSINPVKREDAGTYWCEVFNPIS-KNQSDPIMLNVNY 96
CEAM1.E4  ---GPDTPTISPSDTYYRPGAN-LSLSCYAASNPPAQYSWLING---------TFQQSTQELFIPNITVNNSGSYTCHANNSVTGCNRTTVKTIIVTE 85

FCGR1:
FCGR1.E3  TTKAVITLQPPWVSVFQEETVTLHCE---VLHLPGSSS-TQWFLNGTAT--QTSTPSYRITSASVNDSGEYRCQRGLS-G---RSDP-IQLEIHRG 85
FCGR1.E5  LFPAPVLNASVTSPLLEGNLVTLSCETKLLLQRPGLQLYFSFYMGSKTLRGRNTSSEYQILTARREDSGLYWCEAATEDGNVLKRSPELELQVL-G 95
FCGR1.E4  ----WLLLQVSSRVFTEGEPLALRCH----AWKDKLVYNVLYYRNGKAFKFFHWNSNLTILKTNISHNGTYHCSGMGKHR---YTSAGISVTVKE- 84
Predicted Extra-cellular Domains
For secreted HCPIN proteins exhibiting evidence of multiple, reiterated domain modules bounded by phase one intron insertion positions [Patthy (1999) Gene 238, 103], multiple sequence alignments of the intervening exons were prepared.*
*See Chiang et al. poster (#43) for further information.
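The phase one intron criterion can be sketched in a few lines. An intron's phase is the number of nucleotides of an interrupted codon that lie upstream of it, so it equals the cumulative coding length modulo 3. The function names and the exon lengths below are hypothetical and are not taken from any HCPIN gene; this is an illustration of the rule, not the pipeline used for target selection.

```python
from itertools import accumulate


def intron_phases(exon_cds_lengths):
    """Phase (0, 1 or 2) of the intron following each exon except the last.

    Lengths must count coding nucleotides only, starting at the start codon.
    """
    cumulative = list(accumulate(exon_cds_lengths))
    return [total % 3 for total in cumulative[:-1]]


def phase1_bounded_exons(exon_cds_lengths):
    """1-based numbers of internal exons flanked by phase 1 introns on both sides."""
    phases = intron_phases(exon_cds_lengths)
    return [
        i + 1
        for i in range(1, len(exon_cds_lengths) - 1)
        if phases[i - 1] == 1 and phases[i] == 1
    ]


if __name__ == "__main__":
    # Hypothetical coding lengths for a five-exon gene.
    lengths = [61, 120, 231, 100, 90]
    print(intron_phases(lengths))
    print(phase1_bounded_exons(lengths))
```

Exons flanked by phase 1 introns on both sides have coding lengths divisible by three, which is what lets such modules be duplicated or shuffled without breaking the reading frame.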
Osmotic shockates of targets 401, 601, 801/803, & 901/902
Expressed in ZZ vector with TEV-cleavable linker
(38 hr. culture in 15N-Celtone)
[Gel image: bands marked with •; lysozyme band labeled]
TEV cleavage of 15N-enriched targets
[Gel image: targets 601, 801, 901; lanes P (purified) and C (TEV cleaved) for each; ZZ and TEV protease bands labeled]
Progress Report:
Some case studies with individual domains
Example 1:
Human follistatin-like protein 3_domain 1 (exon 3)
- Paolo Rossi
• TGF antagonist
• binds activin A
• implicated in glucose & fat homeostasis
Example 2:
Sushi domain from human complement factor H-related 1 protein.
>>> examination of spectra of 15N-labeled material led to the conclusion that this domain was relatively unstructured.
Is this due to the fact that some domains may need to be packed against adjacent domains for stability’s sake?
Herbert et al (2006) J. Biol. Chem. 281, 16512
There is some evidence that sushi domains are close-packed in holoproteins -- see structure of a pair of sushi domains from human complement factor H (left).
Purified ZZ fusion of recombinant Sushi domain from human complement factor H-related protein 1, run on reducing and non-reducing gels
[Gel image: reducing and non-reducing gels; lanes M, -, +, +, + (incubation with TEV buffer*); full-length fusion band labeled]
Conclusion 1: Multimeric disulfide cross-linked concatamers can form from recombinant proteins in the periplasmic space.
Conclusion 2: Thiol-disulfide exchange is promoted by the redox character* of the TEV protease cleavage buffer, allowing breakage of inter-molecular disulfides and refolding to the monomer species.
*includes 3 mM GSH and 0.3 mM GSSG
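To put a rough number on the redox character of a 3 mM GSH / 0.3 mM GSSG buffer, the Nernst equation for the GSSG/2GSH couple can be evaluated. This is a back-of-the-envelope sketch: the standard potential of -240 mV (pH 7) and the temperature of 25 °C are assumptions taken from commonly quoted textbook values, not from the slides, and the function name is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J / (mol K)
F = 96485.33212   # Faraday constant, C / mol


def glutathione_potential(gsh_molar, gssg_molar, e0_volts=-0.240, temp_k=298.15):
    """Half-cell potential (V) of GSSG + 2 H+ + 2 e- -> 2 GSH.

    e0_volts and temp_k are ASSUMED defaults (pH 7, 25 C), not slide values.
    """
    return e0_volts - (R * temp_k / (2 * F)) * math.log(gsh_molar ** 2 / gssg_molar)


# Concentrations quoted for the TEV cleavage buffer: 3 mM GSH, 0.3 mM GSSG.
tev_buffer_potential = glutathione_potential(3e-3, 0.3e-3)

if __name__ == "__main__":
    print(f"Estimated potential: {tev_buffer_potential * 1000:.0f} mV")
```

Under these assumptions the buffer sits somewhat above the assumed standard potential, meaning mildly reducing conditions in which disulfides can both break and re-form.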
Connectivity map showing completeness of assignments
Example 3: Assignments for human fibulin-4 (FBLN4) domain 6 (exon 9)
- Swapna Gurla
• predicted to be a Ca++-binding EGF-like domain
• binds to extracellular matrix proteins
• dysregulated in colon cancer
• involved in embryonic development & remodelling
Potential disulfide scrambling issues with FBLN4_domain 6 motivated us to improve the purification protocol by adding an ion exchange step.
Mono Q purification of FBLN4_domain 6
We then checked for incorrect disulfide bond formation. FBLN4_domain 6 was purified in the presence of oxidized glutathione (GSSG), which should reversibly cap any exposed thiols, and the purified sample was then treated with iodoacetamide (IAM) in the presence of 6 M Gdn-HCl, which should irreversibly alkylate any buried thiols.
Result: Based on MALDI-TOF MS, >90% of the protein appeared to be of the correct molecular weight and fully disulfide bonded.
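The logic of the mass check is that each free thiol leaves a characteristic mass adduct, whereas a fully disulfide-bonded domain shows no shift after IAM treatment. A minimal sketch of the expected shifts follows. The sequence is the FBLN4.E9 exon from the alignment with gaps removed; the expressed construct may differ in boundaries or residual linker residues, so only cysteine counts and mass differences are computed, and the function name is hypothetical.

```python
# Monoisotopic mass added per modified cysteine thiol (Da).
CARBAMIDOMETHYL = 57.02146   # alkylation by iodoacetamide (IAM)
GLUTATHIONYL = 305.06816     # mixed disulfide formed with glutathione

# FBLN4 exon 9 sequence taken from the alignment, gaps removed.
# ASSUMPTION: the expressed construct may not match this exactly.
FBLN4_E9 = "IDECESGAHQCSEAQTCVNFHGGYRCVDTNRCVEPYIQVSENRCLCPASNPLCRE"


def expected_shift(n_alkylated=0, n_glutathionylated=0):
    """Mass shift (Da) relative to the fully disulfide-bonded protein."""
    return n_alkylated * CARBAMIDOMETHYL + n_glutathionylated * GLUTATHIONYL


if __name__ == "__main__":
    n_cys = FBLN4_E9.count("C")
    print(f"cysteines: {n_cys} (up to {n_cys // 2} disulfides)")
    for free_thiols in (0, 2, 4):
        shift = expected_shift(n_alkylated=free_thiols)
        print(f"{free_thiols} buried free thiols -> +{shift:.2f} Da after IAM")
```

A species with one unformed disulfide would therefore appear about 114 Da heavier in the + IAM spectrum, and a glutathione-capped thiol about 305 Da heavier in either spectrum.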
MALDI-TOF of FBLN4_domain 6 (no IAM)
- Haiyan Zheng
MALDI-TOF of FBLN4_domain 6 (+ IAM)
- Haiyan Zheng
[Structure figure: N and C termini labeled]
Human fibulin-4 (FBLN4) domain 6 (exon 9)
Preliminary (Further structure calculations are in progress…)
- Swapna Gurla
Summary of results so far (still in research phase):
6 - targets cloned
3 - expressed
2 - 3D structural information (one expressed domain was soluble but disordered)
>>> The numbers are small but promising!
Conclusion:
• Facile expression of extracellular human proteins as structural genomics targets looks promising. This effort may even result in lower attrition (cloned --> 3D structure) than “classic” expression approaches.
• Prospective domain parsing of larger extracellular human proteins is possible using the phase 1 intron rule.
Mission Statement. The long-range goal of the Protein Structure Initiative is to make the three-dimensional atomic-level structures of most proteins easily obtainable from knowledge of their corresponding DNA sequences.
Huang, Montelione, et al (2008) Molec. Cell. Proteomics 7, 2048.
“Holy Grail” of structural genomics (cf. Mission Statement of PSI): Complete structural coverage of some domain families in an organism?
For example, the EGF domain family
Yi-Wen Chiang
Davis Anderson
Jung B. Seo
Yushen Qian
Paolo Rossi
Swapna Gurla
Guy Montelione
Haiyan Zheng
Peter Lobel
Thanks also to
Tom Acton
Li Chung Ma
Rong Xiao
John Everett
Mike Baran