Structural Genomics and the Protein Folding Problem George N. Phillips, Jr. University of Wisconsin-Madison February 15, 2006


Page 1:

Structural Genomics and the Protein Folding Problem

George N. Phillips, Jr.

University of Wisconsin-Madison

February 15, 2006

Page 2:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

From DNA to biological function

Page 3:

Developing a gene model

Glimmer (Gene Locator and Interpolated Markov ModelER)

GlimmerHMM for eukaryotic genomes (more advanced)

Genome sequencing

Genome assembly

Regulatory elements

Identification of ORFs

All but the simplest genomes are works in progress. It is estimated that 80% of gene models have errors at present! Comparative genomics should help the process, as will sequencing of expressed sequence tags and other genomics projects.

Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. W.H. Majoros, M. Pertea, and S.L. Salzberg. Bioinformatics 21:9 (2005), 1782-88.
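As a rough illustration of the "identification of ORFs" step, the sketch below scans the three forward reading frames for ATG...stop open reading frames. It is not Glimmer, which scores candidate genes with interpolated Markov models; the function name find_orfs, the min_codons cutoff, and the toy sequence are assumptions made for this example only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=30):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons counts codons from ATG up to (not including) the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence (hypothetical): ATG + five GCT codons + TAA, offset by two bases.
toy_dna = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_orfs(toy_dna, min_codons=3))  # [(2, 23, 2)]
```

A naive scan like this ignores the reverse strand, introns, and regulatory signals, which is one reason real gene models for eukaryotes need the more elaborate tools named on the slide.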

Page 4:

The “sequence-space” of proteins

Universe of all protein sequences

[Figure: repeated clusters of aligned sequence fragments: HYSIELNASLLERGV…, HLNIEDNPSCNAMGV…, PLNIELNASLNEPGV…, WERIELNASLNER--…, HQRIEL--SLMMRG-…]

Pfam

Many others…

PSI-BLAST

HMM
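To make the idea of grouping sequences into families a little more concrete, here is a minimal sketch that turns the toy aligned fragments from this slide into per-column residue frequencies and a consensus string. Pfam itself uses profile hidden Markov models; a plain frequency profile is a much simpler stand-in, and the function names column_profile and consensus are assumptions for illustration.

```python
from collections import Counter

# Aligned fragments as shown on the slide ("-" marks a gap).
alignment = [
    "HYSIELNASLLERGV",
    "HLNIEDNPSCNAMGV",
    "PLNIELNASLNEPGV",
    "WERIELNASLNER--",
    "HQRIEL--SLMMRG-",
]


def column_profile(seqs):
    """Per-column residue frequencies, ignoring gap characters."""
    profile = []
    for column in zip(*seqs):
        counts = Counter(c for c in column if c != "-")
        total = sum(counts.values())
        if total:
            profile.append({aa: n / total for aa, n in counts.items()})
        else:
            profile.append({})  # column made only of gaps
    return profile


def consensus(profile):
    """Most frequent residue per column ("-" for all-gap columns)."""
    return "".join(max(col, key=col.get) if col else "-" for col in profile)


profile = column_profile(alignment)
print(consensus(profile))
print(profile[3])  # a fully conserved column: {'I': 1.0}
```

Conserved columns (frequency near 1.0) are the positions that profile-based searches such as PSI-BLAST and HMMs weight most heavily when recruiting new family members.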

Page 5:

PFAM “domains”

Alex Bateman, Lachlan Coin, Richard Durbin, Robert D. Finn, Volker Hollich, Sam Griffiths-Jones, Ajay Khanna, Mhairi Marshall, Simon Moxon, Erik L. L. Sonnhammer, David J. Studholme, Corin Yeats and Sean R. Eddy. Nucleic Acids Research (2004) Database Issue 32:D138-D141

Page 6:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

Flow of information from DNA to functional understanding

Page 7:

X-ray Laboratory

Page 8:

Crystallography reveals locations of electron ‘clouds’ of the atoms:

And the polypeptide chain can be traced through space

Page 9:

SCOP

CATH

The “fold-space” of proteins

Universe of all protein structures

Page 10:

Murzin et al. http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html

Page 11:

Glimpses of the “fold space” of proteins

Hou, Sims, Zhang, and Kim, PNAS 100:2386 (2003)

Page 12:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

Flow of information from DNA to functional understanding

Page 13:

Connections between sequence and structure

Universe of sequences Universe of structures

Page 14:

Connections between sequence and structure

Universe of sequences Universe of structures

?

Page 15:

At what level of homology can one trust a structural inference?

Redfern, Orengo et al., J. Chromatography B 815:97 (2005)

Page 16:

What is structural genomics?

• Experimental determination of key structures (target selection is a key part of the idea)

• Modeling of family members

• Inferring function (note “infer”)

• Making direct use of the new structures

Page 17:

Protein Sequences and Folds

• ~100,000 families of proteins that cannot be reliably modeled at present (modeling families: <30% identity over large fraction to a known structure)

• ~50% of all domain families can be assigned to a structure under CATH
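The 30% identity threshold quoted above can be checked with a few lines of code once two sequences have been aligned. The sketch below is one possible convention, not a standard: it assumes identity is counted over columns where both sequences have a residue, and the function names percent_identity and can_model are hypothetical. It also leaves out the "over a large fraction" coverage requirement mentioned on the slide.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def can_model(a, b, threshold=30.0):
    """True if identity to the known structure reaches the threshold (assumed 30%)."""
    return percent_identity(a, b) >= threshold


# Two toy aligned fragments: 4 identical positions out of 6.
print(round(percent_identity("HYSIEL", "HLNIEL"), 1))  # 66.7
print(can_model("AAAA", "AGGG"))  # False (25% identity)
```

Other denominators (alignment length, shorter sequence length) are also in use, so any cutoff should be read together with the definition that produced it.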

Page 18:

Protein Structure Initiative (PSI)

Mission Statement

“To make the three-dimensional atomic level structures of most proteins easily available from knowledge of their corresponding DNA sequences.”

Page 19:

Generation of new structures

Chandonia and Brenner, Science 311:347 (2006)

Page 20:

Center for Eukaryotic Structural Genomics

Exclusively eukaryotic targets

• 60% fold-space targets (emphasis on eukaryote-only families)

• 20% disease relevant

• 20% outreach – targets from the community

Overall goals are to reduce the costs of determining structures of proteins from eukaryotes by refining all steps in the pipeline

Supported by National Institutes of Health

John Markley, PI; George Phillips and Brian Fox, Co-PIs

Page 21:

University of Wisconsin’s Center for Eukaryotic Structural Genomics

(~75 total, 3/4 unique)

Page 22:

How does one clone, express, purify, and solve structures not previously studied?

An industry-style pipeline

Page 23:

Pipeline details: cell-based and cell-free protein production for X-ray and NMR

Construct design

PCR cloning -> DNA

Flexi® Vector plasmids

Protein from E. coli cells

Protein from cell-free

Screening: Yield, MS

Functional assays

1-5 mg scale

Fluidigm chip crystallization screening (+)

NMR 15N-1H HSQC or 1H screening (+)

10-100 mg scale: 13C,15N for NMR, Se-Met for X-ray

2-10 mg scale: 13C,15N for NMR, Se-Met for X-ray

Note: project involves sequencing, which aids gene modeling!

Page 24:

Sesame—integrated LIMS in use at CESG

Open access to the public—structures, protocols, reagents, progress… http://www.uwstructuralgenomics.org

Zolnai et al., J. Struct. Func. Genomics 4:11 (2003)

Page 25:

At5g18200

Mis-annotated prior to our work, but structure led to discovery of function.

Page 26:

>>Alignment of GalP_UDP_transf vs 1Z84:A|PDBID|CHAIN|SEQUENCE/15-196

            *->kkfsplDhvhrrynpLtlvwilVsphrakRPikqsqsLidlkkeLwq
               ++ ++ + +r p t +w+ sp+rakRP
1Z84:A|PDB  15 GDSVENQSPELRKDPVTNRWVIFSPARAKRP---------------- 45

               gavetpkvptdplhdp.dcysakLcpg........atratgevNPdyest
               + ++k p+ p p++c+ c g++++ ++ r++ ++ P +
1Z84:A|PDB  46 -TDFKSKSPQNPNPKPsSCP---FCIGreqecapeLFRVP-DHDPNWKLR 90

               yvLkspkkftndFyalseDnpyikvsvSNeaIaknplfqlksvrGhelci
               + +n ++als+ +++ +++++ G +++
1Z84:A|PDB  91 VI-------ENLYPALSRN---LETQ------------STQPETG--TSR 116

               VI...CF......SKPehDptlpalakeeirevvdaWqlcteelGyegre
               +I + F++ +S P h+ l + i+ ++ a + +
1Z84:A|PDB 117 TIvgfGFhdvvieS-PVHSIQLSDIDPVGIGDILIAYKKRINQIA----- 160

               nhpayqnvqIFEmNkGaemGcsnpHPYaYFnEHGQvwatsfiP<-*
               h + + q+F N Ga G s H H Q a++ +P
1Z84:A|PDB 161 QHDSINYIQVFK-NQGASAGASMSHS------HSQMMALPVVP 196

Pfam B: 13 and 136 matches to #’s 7198 and 11634

http://www.sanger.ac.uk/Software/Pfam/

Page 27:

Blind prediction of structure:CASP and At5g18200

Page 28:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

Flow of information from DNA to functional understanding

Page 29:

Function space of proteins

KEGG = Kyoto Encyclopedia of Genes and Genomes

The Gene Ontology project (GO)

Metabolism

Cellular Processes

Signal Processing

Enzymes

Don’t forget protein-protein interactions exist also!

Page 30:

At2g17340

Related to a human protein associated with Hallervorden-Spatz syndrome, a neurological disorder?

Page 31:

Parallel Enzyme Activity Testing (Collaboration with University of Toronto)

81 protein samples sent to Toronto: 8 solved CESG structures, 73 randomly chosen

Generalized assays for: phosphatase, esterase, phosphodiesterase, protease, amino acid dehydrogenase, alcohol dehydrogenase, organic acid dehydrogenase, amino acid oxidase, alcohol oxidase, organic acid oxidase, beta-lactamase, beta-galactosidase, arylsulfatase, lipase.

Results:

- Solid hits: 3 phosphatases, 5 esterases

- Weaker hits: 9 more esterases, 6 phosphodiesterases

- No hits: all others

A. Yakunin et al., Current Opinion in Chemical Biology 8:42 (2004)

Page 32:

Initial Assay: Wide-spectrum

Target: At2g17340/JR5670

Activity Assay       Substrate       JR5670
Phosphodiesterase    bis-pNPP         0.016
Dehydrogenase        Amino Acids      0.032
Dehydrogenase        Acids            0.016
Dehydrogenase        Alcohols         0.022
Dehydrogenase        Aldehyde        -0.045
Dehydrogenase        Sugars           0.003
Thioesterase         palmitoyl-CoA    0.108
Oxidase              NAD(P)H Ox      -0.115
Protease             Protease Mix     0.118
Phosphatase          pNPP             > 1

• Absorbance >0.25 is a tentative signal, >0.5 is a strong signal.
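The absorbance cutoffs on this slide translate directly into a small classification rule. The sketch below applies them to the readings listed for At2g17340/JR5670; the function name classify and the dictionary layout are assumptions, and the phosphatase reading, reported only as "> 1", is entered as its lower bound of 1.0.

```python
# Readings for target At2g17340/JR5670, keyed by (assay, substrate).
readings = {
    ("Phosphodiesterase", "bis-pNPP"): 0.016,
    ("Dehydrogenase", "Amino Acids"): 0.032,
    ("Dehydrogenase", "Acids"): 0.016,
    ("Dehydrogenase", "Alcohols"): 0.022,
    ("Dehydrogenase", "Aldehyde"): -0.045,
    ("Dehydrogenase", "Sugars"): 0.003,
    ("Thioesterase", "palmitoyl-CoA"): 0.108,
    ("Oxidase", "NAD(P)H Ox"): -0.115,
    ("Protease", "Protease Mix"): 0.118,
    ("Phosphatase", "pNPP"): 1.0,  # slide reports "> 1"; lower bound used here
}


def classify(absorbance, tentative=0.25, strong=0.5):
    """Label a reading using the slide's cutoffs (>0.25 tentative, >0.5 strong)."""
    if absorbance > strong:
        return "strong"
    if absorbance > tentative:
        return "tentative"
    return "none"


# Keep only the assays that produced any signal.
hits = {key: classify(value) for key, value in readings.items()
        if classify(value) != "none"}
print(hits)  # {('Phosphatase', 'pNPP'): 'strong'}
```

With these cutoffs the phosphatase assay is the only signal for this target; every other reading falls below the tentative threshold.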

Page 33:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

Flow of information from DNA to functional understanding

Page 34:

At2g17340

Enzyme of unknown specificity.

Page 35:

A functional annotation lesson

Page 36:

Functional Annotation by Inference

From raw DNA sequences, one looks for genomic features such as promoters, alternative splicing of mRNAs, retrotransposons, pseudogenes, tandem duplications, synteny, and homology.

It is homology, both from sequence and from structure, that allows functional inferences to be made.

Prosite, Dali, VAST, FFAS03

Some tools integrate knowledge from many sources into one place, acting as meta-servers of clues.

Page 37:

Connections between structure and function

Universe of structures

Universe of functions

Page 38:

Connections between structure and function

Universe of structures

Universe of functions

Convergent evolution

Page 39:

Connections between structure and function

Universe of structures

Universe of functions

Divergent evolution

Page 40:

At5g18200

Misleading annotation prior to our work, but structure led to discovery of function.

Page 41:

High-throughput DNA Sequencing

Gene Model

Functional Assignments

Basic Understanding/Applications (e.g. therapeutics)

Structure Determination & Experimental Analysis

Modeling & Inference

Flow of information from DNA to functional understanding

Page 42:

Summary

Structural genomics efforts are gaining momentum and helping to assign new functions to ORFs and to fill in the space of all possible protein folds.

Page 43:

The Center for Eukaryotic Structural Genomics (supported by NIH GM64598 and GM074901)

Administration: Madison (Primm, Troestler, Markley, Phillips, Fox)

Cloning/sequencing pipeline: Madison (Wrobel, Fox)

Expression pipeline: Madison (Frederick, Fox, Riters)

E. coli cell growth pipeline: Madison (Sreenath, Burns, Seder, Fox)

Cell-Free System: Madison (Vinarov, Markley, Newman)

Protein purification pipeline: Madison (Vojtik, Phillips, Fox, Ellefson, Jeon)

Mass spectrometry: Madison (Aceti, Sabat, Sussman)

NMR spectroscopy: Madison NMRFAM (Song, Tyler, Cornilescu, Markley); Milwaukee MCW (Peterson, Volkman, Lytle)

Crystallization / crystallography: Madison (Bingman, Phillips, Bitto, Han, Bae, Meske); Argonne (Advanced Photon Source)

Bioinformatics: Madison (Bingman, Sun, Phillips, Wesenberg); Indianapolis (Dunker); Milwaukee MCW (Twigger, de la Cruz)

Computational support: Madison (Bingman, Ramirez, Phillips)

Sesame: Madison (Zolnai, Markley, Lee)