Page 1: Jarek Meller Division of Biomedical Informatics,

JM - http://folding.chmcc.org 1

Knowledge-based protocols for protein structure prediction: from protein threading to solvent accessibility prediction and back to protein structure prediction by threading

Jarek Meller

Division of Biomedical Informatics, Children’s Hospital Research Foundation & Department of Biomedical Engineering, UC

Page 2: Jarek Meller Division of Biomedical Informatics,

Outline of the talk

Protein structure and complexity of conformational search: from de novo structure prediction to similarity based methods

Protein structure prediction by sequence-to-structure matching (threading and fold recognition)

Secondary structure and solvent accessibility prediction

Improving fold recognition and de novo simulations with accurate solvent accessibility prediction

A story from our backyard: predicting interaction between pVHL and RNA Pol II

Page 3: Jarek Meller Division of Biomedical Informatics,

Polypeptide chains: backbone and side-chains

[Figure labels: C-ter, N-ter]

Page 4: Jarek Meller Division of Biomedical Informatics,

Distinct chemical nature of amino acid side-chains

[Figure labels: ARG, PHE, GLU, VAL, CYS, C-ter, N-ter]

Page 5: Jarek Meller Division of Biomedical Informatics,

Hydrogen bonds and secondary structures

[Figure labels: helix, strand]

Page 6: Jarek Meller Division of Biomedical Informatics,

Tertiary structure and long range contacts: annexin

Page 7: Jarek Meller Division of Biomedical Informatics,

Domains, interactions, complexes: VHL

[Figure labels: HIF-1, Elongin B, Elongin C, VHL]

Page 8: Jarek Meller Division of Biomedical Informatics,

Multiple alignment and PSSM

Page 9: Jarek Meller Division of Biomedical Informatics,

Protein folding problem

The protein folding problem consists of predicting the three-dimensional structure of a protein from its amino acid sequence

Hierarchical organization of protein structures helps to break the problem into secondary structure, tertiary structure and protein-protein interaction predictions

Computational approaches for protein structure prediction: similarity based and de novo methods

Page 10: Jarek Meller Division of Biomedical Informatics,

Ab initio (or de novo) folding simulations

Ab initio folding simulations consist of a conformational search that maximizes (or minimizes) an empirical scoring function (“force field”)

Computational bottleneck: exponential search space and sampling problem (global optimization!)

Fundamental problem: inaccuracy of empirical force fields and scoring functions (folding potentials)

Importance of mixed protocols, such as Rosetta by D. Baker and colleagues (Monte Carlo fragment assembly)
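
As a rough illustration of the conformational search described on this slide, the sketch below runs a Metropolis Monte Carlo search on a toy one-dimensional energy function. The function `toy_energy`, the step size, and the temperature are assumptions chosen for illustration only; this is not Rosetta and not a real force field.

```python
import math
import random


def toy_energy(x):
    # Hypothetical rugged 1D "energy landscape": a broad well plus ripples.
    return 0.1 * x * x + math.sin(5.0 * x)


def metropolis_search(energy, x0, steps=5000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Random-walk search that keeps track of the lowest energy visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, accept uphill
        # moves with probability exp(-dE / T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


start = 4.0
best_x, best_e = metropolis_search(toy_energy, start)
print(f"start energy {toy_energy(start):.3f}, best energy found {best_e:.3f}")
```

Accepting some uphill moves is what lets the walk escape local minima. In a real protein the number of variables grows with chain length, which is where the exponential search space comes from.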

Page 11: Jarek Meller Division of Biomedical Informatics,

Similarity based approaches to structure prediction: from sequence alignment to fold recognition

High level of redundancy in biology: sequence similarity is often sufficient to use the “guilt by association” rule: if similar sequence, then similar structure and function

Multiple alignments and family profiles can detect evolutionary relatedness at much lower sequence similarity, which is hard to detect with pairwise sequence alignments: PSI-BLAST by S. Altschul et al.

Many structures are already known (see PDB) and one can match sequences directly with structures to enhance structure recognition: fold recognition (not for new folds!)

For both fold recognition and de novo simulation, prediction of intermediate attributes such as secondary structure or solvent accessibility helps to achieve better sensitivity and specificity

Page 12: Jarek Meller Division of Biomedical Informatics,

Why “fold recognition”?

Divergent (common ancestor) vs. convergent (no ancestor) evolution

PDB: virtually all proteins with 30% seq. identity have similar structures; however, most of the similar structures share only up to 10% seq. identity!

Page 13: Jarek Meller Division of Biomedical Informatics,

Going beyond sequence similarity: threading and fold recognition

When sequence similarity is not detectable, use a library of known structures to match your query with target structures.

One needs a scoring (“energy”) function that measures compatibility between sequences and structures.

Page 14: Jarek Meller Division of Biomedical Informatics,

Scoring alternative conformations with empirical (knowledge-based) folding potentials

[Figure labels: misfolded, native, E]

Ideally, each misfolded structure should have an energy higher than the native energy, i.e.:

$E_{\mathrm{misfolded}} - E_{\mathrm{native}} > 0$
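
The condition above can be checked directly once energies are available. The following is a minimal sketch; the function names and the energy values are hypothetical and only illustrate the inequality, not any particular folding potential.

```python
def energy_gaps(native_energy, decoy_energies):
    """Return E_misfolded - E_native for every misfolded (decoy) structure."""
    return [e - native_energy for e in decoy_energies]


def native_is_recognized(native_energy, decoy_energies):
    """True when every decoy scores strictly above the native structure."""
    return all(gap > 0 for gap in energy_gaps(native_energy, decoy_energies))


# Hypothetical energies in arbitrary units, for illustration only.
native = -12.5
decoys = [-9.0, -11.75, -3.5]
print(energy_gaps(native, decoys))
print(native_is_recognized(native, decoys))
```

A potential for which this check fails on some decoy would rank that misfolded structure ahead of the native one.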

Page 15: Jarek Meller Division of Biomedical Informatics,

Simple contact model for protein structure prediction

Each amino acid is represented by a point in 3D space and two amino acids are said to be in contact if their distance is smaller than a cutoff distance, e.g. 7 Å.
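
The contact definition on this slide translates into a few lines of code. This is a sketch under assumptions: one point per residue, the 7 Å cutoff from the slide, and a hypothetical `min_separation` parameter that controls whether sequence neighbours count as contacts (the slide does not say).

```python
import math


def contact_pairs(coords, cutoff=7.0, min_separation=1):
    """Return index pairs (k, l), k < l, whose points lie closer than cutoff (in Å)."""
    pairs = []
    for k in range(len(coords)):
        for l in range(k + min_separation, len(coords)):
            if math.dist(coords[k], coords[l]) < cutoff:
                pairs.append((k, l))
    return pairs


# Four hypothetical residue positions (Å), for illustration only.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 5.0, 0.0)]
print(contact_pairs(coords))
```

Only the pair (0, 2), at 7.6 Å, falls outside the cutoff in this toy example.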

Page 16: Jarek Meller Division of Biomedical Informatics,
Page 17: Jarek Meller Division of Biomedical Informatics,
Page 18: Jarek Meller Division of Biomedical Informatics,

Sequence-to-structure matching with contact models

Generalized string matching problem: aligning a string of amino acids against a string of “structural sites” characterized by other residues in contact

Finding an optimal alignment with gaps using inter-residue pairwise models, $E = \sum_{k<l} \epsilon_{kl}$, is NP-hard because of the non-local character of scores at a given structural site (identity of the interaction partners may change depending on location of gaps in the alignment)

R.H. Lathrop, Protein Eng. 7 (1994)
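
To make the non-local character of pairwise scores concrete, the sketch below sums a contact energy over pairs of structural sites that are in contact, using whatever residues the alignment places at those sites. The contact list, the two-letter H/P alphabet, and the table `EPSILON` are hypothetical and chosen only for illustration.

```python
# Hypothetical pair energies: only H-H contacts are rewarded.
EPSILON = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "H"): 0.0, ("P", "P"): 0.0}


def pair_energy(residues_at_sites, contacts, epsilon=EPSILON):
    """Sum epsilon over contacting site pairs; sites holding a gap (None) are skipped."""
    total = 0.0
    for k, l in contacts:
        a, b = residues_at_sites[k], residues_at_sites[l]
        if a is not None and b is not None:
            total += epsilon[(a, b)]
    return total


# Contacts between structural sites of a hypothetical 6-site template.
contacts = [(0, 2), (1, 4), (2, 5)]

# The same sequence HPHPP placed with the gap at the last or at the first site.
gap_last = ["H", "P", "H", "P", "P", None]
gap_first = [None, "H", "P", "H", "P", "P"]
print(pair_energy(gap_last, contacts), pair_energy(gap_first, contacts))
```

Moving a single gap changes which residues sit at contacting sites, so the score of a site cannot be computed without knowing the rest of the alignment.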

Page 19: Jarek Meller Division of Biomedical Informatics,

Hydrophobic contact model and sequence-to-structure alignment

[Figure label: HPHPP-]

Solutions to yet another instance of the global optimization problem:

a) Heuristic (e.g. frozen environment approximation)

b) “Profile” or local scoring functions (folding potentials)
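
One reason local (“profile”) scoring functions are attractive: when the score of placing a residue at a site depends only on that residue and that site, an optimal gapped alignment can be found by dynamic programming. The sketch below is a minimal global alignment under stated assumptions. The site labels `b` (buried) and `e` (exposed), the +1/-1 scores, and the gap penalty are hypothetical, and here a higher score is better (the opposite sign convention from an energy).

```python
def hp_site_score(residue, site):
    # Hypothetical local score: H prefers buried sites, P prefers exposed ones.
    match = (residue == "H" and site == "b") or (residue == "P" and site == "e")
    return 1.0 if match else -1.0


def align_score(seq, sites, site_score=hp_site_score, gap=-1.0):
    """Best global alignment score of seq against sites with a linear gap penalty."""
    n, m = len(seq), len(sites)
    # dp[i][j] holds the best score for aligning seq[:i] with sites[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + site_score(seq[i - 1], sites[j - 1]),  # residue on site
                dp[i - 1][j] + gap,   # residue left unaligned
                dp[i][j - 1] + gap,   # site left empty
            )
    return dp[n][m]


print(align_score("HPHPP", "bebee"))
print(align_score("HPHPP", "bebeee"))
```

The table has (n+1) x (m+1) cells, so the cost is polynomial, in contrast to the NP-hard pairwise problem.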

Page 20: Jarek Meller Division of Biomedical Informatics,

Implementing threading protocols: LOOPP

LOOPP in CAFASP4

• About average for all fold recognition targets (missing some easy targets, recognized by PsiBlast)

• Third best server in the category of difficult targets

• Best predictions among the servers for 3 difficult targets

• Further improvements necessary to make the predictions more robust

Joint work with Ron Elber

Page 21: Jarek Meller Division of Biomedical Informatics,

Using sequence similarity, predicted secondary structures and contact potentials: fold recognition protocols

In practice fold recognition methods are often mixtures of sequence matching and threading, with compatibility between a sequence and a structure measured by:

i) sequence alignment

ii) contact potentials

iii) predicted secondary structures (compared to the secondary structure of a template)

Page 22: Jarek Meller Division of Biomedical Informatics,

Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

SABLE server: http://sable.cchmc.org

POLYVIEW server: http://polyview.cchmc.org

a) Multiple alignment and family profiles improve prediction of local structural propensities

b) Use of advanced machine learning techniques, such as Neural Networks or Support Vector Machines, improves results as well

B. Rost and C. Sander were first to achieve more than 70% accuracy in three-state (H, E, C) classification, applying a) and b).
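
For reference, three-state accuracy is commonly computed as the fraction of residues whose predicted state (H, E, or C) matches the observed one. The sketch below assumes that definition; the two strings are made-up examples, not output of any server named on this slide.

```python
def q3_accuracy(predicted, observed):
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


# Hypothetical 13-residue example with two mispredicted positions.
observed = "CHHHHHCCEEEEC"
predicted = "CHHHHCCCEEECC"
print(f"{q3_accuracy(predicted, observed):.3f}")
```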

Page 23: Jarek Meller Division of Biomedical Informatics,

Predicting 1D protein profiles from sequences: secondary structures and solvent accessibility

[Figure labels: PDB, Sable, PsiPred, Prof]

Relative solvent accessibility prediction is typically cast as a classification problem
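
A minimal sketch of the classification framing: a real-valued relative solvent accessibility (RSA, in percent) is mapped to two classes by a threshold. The 25% threshold appears in the comparison table later in the talk; the class names and the choice of `>=` at the boundary are assumptions made here for illustration.

```python
def rsa_to_class(rsa_percent, threshold=25.0):
    """Map a relative solvent accessibility value (0-100) to 'buried' or 'exposed'."""
    return "exposed" if rsa_percent >= threshold else "buried"


# Hypothetical RSA values; 24 and 26 are nearly identical yet get different labels.
for rsa in (3.0, 24.0, 26.0, 80.0):
    print(rsa, rsa_to_class(rsa))
```

The pair 24 and 26 shows the information a hard threshold discards, which is part of the motivation for treating RSA prediction as regression.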

Page 24: Jarek Meller Division of Biomedical Informatics,

Variability in surface exposure for structurally equivalent residues does not support classification

Page 25: Jarek Meller Division of Biomedical Informatics,

Neural Network-based regression for relative solvent accessibility (RSA) prediction

[Figure: network diagram with input layer, hidden layers, context units (Elman), and an output layer producing a value in [0,1]]

$SSE(z) = \sum_i \left( y_i(z) - o_i \right)^2$

Page 26: Jarek Meller Division of Biomedical Informatics,

Accuracy of predictions depends on the level of surface exposure: error measures and fine tuning

Page 27: Jarek Meller Division of Biomedical Informatics,

Overall accuracy of different regression models

Method | S163 (cc / MAE / RMSE) | S156 (cc / MAE / RMSE) | S135 (cc / MAE / RMSE) | S149 (cc / MAE / RMSE)
SABLE-a | 0.65 / 15.6 / 20.8 | 0.64 / 15.9 / 21.0 | 0.66 / 15.3 / 20.5 | 0.64 / 16.0 / 21.0
SABLE-wa | 0.66 / 15.5 / 21.2 | 0.64 / 15.7 / 21.3 | 0.67 / 15.3 / 20.9 | 0.65 / 15.8 / 21.4
LS | 0.63 / 16.3 / 21.0 | 0.62 / 16.5 / 21.1 | 0.65 / 15.9 / 20.5 | 0.62 / 16.5 / 21.2
SVR1 | 0.62 / 15.9 / 21.3 | 0.61 / 16.1 / 21.4 | 0.64 / 15.6 / 20.8 | 0.62 / 16.2 / 21.5
SVR2 | 0.62 / 16.6 / 22.8 | 0.61 / 16.7 / 22.7 | 0.64 / 16.4 / 22.5 | 0.61 / 16.9 / 23.0

Non-linear models: Rafal Adamczak; Linear models: Michael Wagner; Datasets and servers: Aleksey Porollo and Rafal Adamczak
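
The three measures reported for each dataset are taken here to be the standard ones: Pearson correlation coefficient (cc), mean absolute error (MAE), and root mean square error (RMSE). The sketch below computes them with those standard definitions; the input values are hypothetical and unrelated to the datasets in the table.

```python
import math


def regression_metrics(predicted, observed):
    """Return (cc, MAE, RMSE) for two equally long lists of numbers."""
    n = len(observed)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    cc = cov / math.sqrt(var_p * var_o)
    mae = sum(abs(p - o) for p, o in zip(predicted, observed)) / n
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)
    return cc, mae, rmse


# Hypothetical RSA values in percent, for illustration only.
observed_rsa = [10.0, 40.0, 70.0]
predicted_rsa = [20.0, 40.0, 60.0]
cc, mae, rmse = regression_metrics(predicted_rsa, observed_rsa)
print(f"cc={cc:.2f} MAE={mae:.2f} RMSE={rmse:.2f}")
```

In this toy case the predictions are perfectly correlated with the observations yet still carry a nonzero error, which is why correlation and error measures are reported side by side.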

Page 28: Jarek Meller Division of Biomedical Informatics,

Regression vs. two-class classification

Method | S163 | S156 | S135 | S149
ACCpro server 25% | 70.4% / 0.41 | 69.8% / 0.41 | 70.6% / 0.42 | 71.1% / 0.43
SABLE-wa BS62 | 71.7% / 0.43 | 71.1% / 0.42 | 72.2% / 0.44 | 72.2% / 0.44
SABLE-wa binary | 71.4% / 0.42 | 70.9% / 0.41 | 71.9% / 0.43 | 72.1% / 0.44
SABLE-2c 25% | 76.7% / 0.53 | 75.8% / 0.52 | 77.1% / 0.54 | 76.4% / 0.53
SABLE-wa | 77.3% / 0.54 | 76.5% / 0.52 | 77.3% / 0.54 | 76.6% / 0.53

Page 29: Jarek Meller Division of Biomedical Informatics,

Predicting transmembrane domains

Page 30: Jarek Meller Division of Biomedical Informatics,

Predicting transmembrane domains

Page 31: Jarek Meller Division of Biomedical Informatics,

Now back to threading and folding simulations

Applications in filtering out incorrect models in both de novo simulations and fold recognition

Domain structure prediction, protein-protein interactions

Better sensitivity in finding correct matches in threading: one story as an example

Page 32: Jarek Meller Division of Biomedical Informatics,

Modeling the RNA Polymerase II Interaction with the von Hippel-Lindau Protein: from experimental clues to structure prediction and back to experiment.

Jarek Meller

Children’s Hospital Research Foundation

Joint work with M. Czyzyk-Krzeska and her group,

College of Medicine, University of Cincinnati

Page 33: Jarek Meller Division of Biomedical Informatics,

A play of life (script and beyond):

Stage: protein society or proteome

Rules of life: proteins are assembled and degraded: nursery (ribosome) vs. police and guillotine (ubiquitination and proteasome)

Social order: one look at the equilibrium in the system:

Holy scriptures (DNA)

Army of scribes (middle class proteins)

Temple priests (selected proteins)

Transcription

Translation

“I think we need to adjust the interpretation of the script …” (regulation of replication and transcription)

Law and oppression

Page 34: Jarek Meller Division of Biomedical Informatics,

Hypoxia-induced stabilization of Hif-1a

Graphics from R.K. Bruick and S.L. McKnight, Science 295

Page 35: Jarek Meller Division of Biomedical Informatics,

Experimental clues:

Observation: correlation between pVHL levels and transcript elongation of the tyrosine hydroxylase gene (M. Czyzyk-Krzeska)

Could pVHL influence the transcription by interaction with elongation complex co-factors?

Where to start? Experiment without a model is usually not a very good idea. Could in silico study and bioinformatics help?

Page 36: Jarek Meller Division of Biomedical Informatics,

Searching for pVHL interaction targets:

Hif-1a ODD interacts with pVHL – other pVHL targets should have domains structurally resembling that of Hif-1a ODD

Use the Hif-1a ODD sequence as a query in order to find other structures that are compatible with it

[Figure labels: Rpb1, Rpb6, Hif-1a ODD, Pro-OH, pVHL]

Page 37: Jarek Meller Division of Biomedical Informatics,

RNA Polymerase II in the act of transcription, Gnatt, Kornberg et al., Science 292 (2001)

Page 38: Jarek Meller Division of Biomedical Informatics,

[Figure labels: C-ter Rpb1, Rpb6]

The C-terminus of Rpb1 and Rpb6 form a pocket on the surface of the RNA Polymerase II complex. C-ter of Rpb1 and Rpb6 represented by cartoons.

Page 39: Jarek Meller Division of Biomedical Informatics,

Could the Hif ODD fragment resemble the C-terminal fragment of RNA Polymerase II?

A motif similar to that of ODD found, but that could occur by chance. We used sequence alignments and threading to measure similarity between these fragments.

Sequences about 25% identical for a short fragment of about 50 aa – not significant.

Predicted secondary structures similar.

Suggestive but still not significant similarity.

However, a weak match between the adjacent Rpb6 and the consecutive part of the Hif-1a sequence was observed in threading (3D-PSSM, LOOPP).

Prediction: the ODD shares 3D structure with C-ter fragment of Rpb1 and Rpb6.

Implication: VHL is likely to interact with Rpb1/Rpb6!
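
For readers unfamiliar with figures such as “about 25% identical”, the sketch below shows one common way to compute percent identity from a pairwise alignment. The denominator convention (columns where neither sequence has a gap) is an assumption, since several conventions exist, and the two aligned strings are invented for illustration; they are not the Hif-1a or Rpb1 fragments.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which the two residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments, for illustration only.
frag_a = "ACDEFGHK"
frag_b = "ACNEYG-K"
print(f"{percent_identity(frag_a, frag_b):.1f}% identical")
```

For short fragments a modest identity can arise by chance, which is why the slide treats the sequence match alone as not significant and turns to predicted secondary structure and threading.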

Page 40: Jarek Meller Division of Biomedical Informatics,

Experimental results (MCK):

RNA Pol II peptides suggested by computational analysis do bind to pVHL and this binding is controlled by hydroxylation of the critical PRO residue.

Co-immunoprecipitations of hyper-phosphorylated RNA Pol II and pVHL observed: interaction confirmed.

Ubiquitination of Rpb1 confirmed. Biological meaning?