computational protein design

Upload: aliza

Post on 23-Feb-2016

DESCRIPTION

Computational Protein Design. Aka, The Inverse Folding Problem. Topic 18. Chapter 39, Du and Bourne “Structural Bioinformatics”. Protein Design is an Inverse Problem of Structure Prediction. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Computational Protein Design

Aka, The Inverse Folding Problem

Topic 18, Chapter 39, Du and Bourne, “Structural Bioinformatics”

Page 2: Computational Protein Design

Protein Design is an Inverse Problem of Structure Prediction

MDVGQAVIFLGPPGAGKGTQASRLAQELGFKKLSTGDILRDHVARGTPLGERVRPIMERGDLVPDDLILELIREELAERVIFDGFPRTLAQAEALDRLLSETGTRLLGVVLVEVPEEELVRRIL…

Biology

Adapted from Amy Keating’s slides at MIT.

Page 3: Computational Protein Design

Different Types of Protein Design

Protein design

Grand challenge: de novo design

Immediate practical applications

Design of new proteins
-- novel protein folds
-- binding interfaces
-- enzymatic activities
-- etc.

Redesign of existing proteins
-- increased thermostability
-- altered binding specificity
-- improved binding affinity
-- enhanced enzymatic activity
-- altered substrate specificity

Current Opinion in Biotechnology 2007, 18:1-7.

Page 4: Computational Protein Design

Protein Design Problems

Annu. Rev. Biochem. 2008. 77:363-382.

Page 5: Computational Protein Design

Goal: design a protein that adopts a given structure

Open problems with assessment:
-- What resolution is required? (fold, sidechain, loop, etc.?)
-- Stability of the designed protein
-- Structural uniqueness
-- Must solve the structure to know how you did!

There are typically many sequences that adopt the fold, so you must try to find the one that is most stable.

That is, minimize the quantity:

$\Delta G_{\text{fold}} = G_{\text{folded}} - G_{\text{unfolded}}$

Search through many possible sequences, and then pick the one with the best (most negative) $\Delta G_{\text{fold}}$.

[Figure: design target and designed protein]
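The "search through many sequences, then pick the best" recipe can be sketched by brute force on a toy problem. This is a minimal illustration, not a design method: `toy_dg_fold`, the burial pattern `BURIED`, and the four-letter alphabet are hypothetical stand-ins for a real $\Delta G_{\text{fold}}$ model and the 20 amino acids.

```python
from itertools import product

# Hypothetical stand-ins: a burial pattern for a 5-residue target fold and a
# crude hydrophobic/polar score. This is NOT a physical free-energy function.
HYDROPHOBIC = set("AILMFVW")
BURIED = [True, False, True, True, False]

def toy_dg_fold(seq):
    """Toy surrogate for ΔG_fold: lower (more negative) is better."""
    score = 0.0
    for aa, buried in zip(seq, BURIED):
        if buried:
            score += -1.0 if aa in HYDROPHOBIC else 1.0   # reward buried hydrophobics
        else:
            score += -0.5 if aa not in HYDROPHOBIC else 0.5  # reward exposed polars
    return score

def exhaustive_design(alphabet, n):
    """Score every sequence of length n and keep the lowest-scoring one."""
    best_seq, best_score = None, float("inf")
    for seq in product(alphabet, repeat=n):
        s = toy_dg_fold(seq)
        if s < best_score:
            best_seq, best_score = "".join(seq), s
    return best_seq, best_score

best, score = exhaustive_design("AKLE", len(BURIED))
print(best, score)
```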

Page 6: Computational Protein Design

The big challenges

Search

The search space is astronomical: $20^n$ sequences for a protein of $n$ residues.

Except in rare subspace search problems, this is computationally intractable.
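To make "astronomical" concrete, the sketch below computes $20^n$ and how long a brute-force search would take. The throughput of $10^9$ sequence evaluations per second is an arbitrary, hypothetical figure; the conclusion does not depend on it. Counting rotamers as well as amino acids makes the space larger still.

```python
import math

def design_space_size(n_residues, n_amino_acids=20):
    """Number of distinct sequences of length n_residues over n_amino_acids letters."""
    return n_amino_acids ** n_residues

def years_to_enumerate(n_residues, evals_per_second=1e9):
    """Brute-force time in years at a hypothetical throughput (default 1e9 evaluations/s)."""
    seconds = design_space_size(n_residues) / evals_per_second
    return seconds / (365.25 * 24 * 3600)

for n in (10, 50, 100):
    size = design_space_size(n)
    print(f"n = {n:3d}: 20^{n} ~ 10^{math.log10(size):.1f} sequences, "
          f"~{years_to_enumerate(n):.1e} years")
```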

Energy

It is practically impossible to compute $\Delta G_{\text{fold}}$ because…

-- What is the structure of the folded state? (sidechain and loop positions)

-- How do we model the unfolded state?

-- Entropy?!

Instead, we focus on the energy of the folded protein, that is, the interactions present in the native structure: replace $\Delta G_{\text{fold}}$ with $\Delta E_{\text{fold}}$ computed using molecular mechanics (MM) force fields.

Page 7: Computational Protein Design

Sidechain packing

[Figure: design target and designed protein]

As we did with structure prediction in homology modeling, we will typically use a rotamer library-based approach.

Page 8: Computational Protein Design

Search algorithms for large spaces

Exhaustive search – too slow!

Stochastic methods
-- Monte Carlo
-- Genetic algorithms

Pruning algorithms (which are deterministic)
-- Branch and Bound
-- Dead End Elimination

For all-atom protein design, some amount of stochasticity is generally required. Purely deterministic approaches rarely succeed in designing complete proteins.
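As one example of the stochastic methods listed above, here is a minimal Metropolis Monte Carlo sketch over rotamer choices. It assumes a hypothetical three-position toy problem with made-up energy tables, a fixed temperature `kT`, and single-position moves. Production codes typically add simulated annealing and recompute only the energy terms touched by a move; this sketch recomputes the full energy for clarity.

```python
import math
import random
from itertools import combinations

# Hypothetical toy problem: 3 positions, 2 rotamers each (arbitrary energy units).
# self_energy[i][r] = E(i_r); pair_energy[(i, j)][r][s] = E(i_r, j_s) for i < j.
self_energy = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.2]]
pair_energy = {
    (0, 1): [[-1.0, 0.0], [0.3, 0.1]],
    (0, 2): [[0.0, 0.4], [0.2, -0.2]],
    (1, 2): [[0.1, 0.0], [-0.3, 0.5]],
}

def total_energy(assignment, self_energy, pair_energy):
    """Self energies plus all pairwise energies for one rotamer choice per position."""
    e = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for i, j in combinations(range(len(assignment)), 2):
        e += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return e

def monte_carlo_packing(self_energy, pair_energy, n_steps=5000, kT=0.6, seed=0):
    """Metropolis Monte Carlo over rotamer assignments; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    current = [rng.randrange(len(rots)) for rots in self_energy]
    e_cur = total_energy(current, self_energy, pair_energy)
    best, e_best = list(current), e_cur
    for _ in range(n_steps):
        # Propose: re-draw the rotamer at one randomly chosen position.
        i = rng.randrange(len(current))
        trial = list(current)
        trial[i] = rng.randrange(len(self_energy[i]))
        e_trial = total_energy(trial, self_energy, pair_energy)
        d_e = e_trial - e_cur
        # Metropolis rule: accept downhill moves always, uphill with probability exp(-dE/kT).
        if d_e <= 0 or rng.random() < math.exp(-d_e / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best

print(monte_carlo_packing(self_energy, pair_energy))
```

Because acceptance is probabilistic, a run is not guaranteed to find the global minimum on a realistic problem; it only returns the best state it happened to visit.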

Page 9: Computational Protein Design

Dead End Elimination

Eliminate, one at a time, rotamer choices that cannot under any circumstances be part of the minimum energy solution.

From Wikipedia: DEE is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends", i.e., "bad" combinations of variables that cannot possibly yield the global minimum and to refrain from searching such combinations further. Hence, dead-end elimination is a mirror image of dynamic programming techniques in which "good" combinations are identified and explored further.

Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins.

Page 10: Computational Protein Design

Dead End Elimination

There is a global minimum energy conformation (GMEC) in which each residue has a unique rotamer; that is, the GMEC is the set of rotamers that has the lowest energy.

The energy is assumed to be pairwise decomposable, meaning the total energy is broken down into pairwise interactions between rotamers plus the energy of each rotamer interacting with the backbone.

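$$E_{\text{total}} = \underbrace{\sum_{i} E(i_r)}_{\text{self energy}} + \underbrace{\sum_{i}\sum_{j>i} E(i_r, j_u)}_{\text{pairwise energy}}$$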
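A minimal sketch of a pairwise-decomposable energy (self energies plus pairwise energies), assuming the energies are stored in plain Python lists and dicts. The layout of `self_energy` and `pair_energy` and all numbers are hypothetical. `gmec_bruteforce` enumerates every rotamer combination, which is feasible only for toy problems but gives a reference answer for the pruning methods that follow.

```python
from itertools import combinations, product

# Hypothetical toy problem: 3 positions, 2 rotamers each (arbitrary energy units).
# self_energy[i][r]         = E(i_r): rotamer r at position i with the fixed backbone
# pair_energy[(i, j)][r][s] = E(i_r, j_s), stored only for i < j
self_energy = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.2]]
pair_energy = {
    (0, 1): [[-1.0, 0.0], [0.3, 0.1]],
    (0, 2): [[0.0, 0.4], [0.2, -0.2]],
    (1, 2): [[0.1, 0.0], [-0.3, 0.5]],
}

def total_energy(assignment, self_energy, pair_energy):
    """Sum of self energies plus every pairwise energy, one rotamer per position."""
    e = sum(self_energy[i][r] for i, r in enumerate(assignment))
    for i, j in combinations(range(len(assignment)), 2):
        e += pair_energy[(i, j)][assignment[i]][assignment[j]]
    return e

def gmec_bruteforce(self_energy, pair_energy):
    """Enumerate all rotamer combinations; return (GMEC assignment, its energy)."""
    choices = [range(len(rots)) for rots in self_energy]
    best = min(product(*choices),
               key=lambda a: total_energy(a, self_energy, pair_energy))
    return best, total_energy(best, self_energy, pair_energy)

print(gmec_bruteforce(self_energy, pair_energy))
```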

Page 11: Computational Protein Design


In this example, assuming a fixed background (black), the rotamer that has the lower energy is chosen.

Page 12: Computational Protein Design

Dead End Elimination

However, the other rotamers are not fixed. (Nor is it realistic to assume that the backbone is fixed, but we’ll brush that issue aside for now.)

If the blue rotamer always has lower energy than the red, for example, then we can eliminate the red rotamer from consideration in all future configurations.

Iterate till completion.

Put otherwise, if the “worst case scenario” for blue is better than the “best case scenario” for red, then you always choose blue.
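The "worst case for blue beats best case for red" rule can be sketched as an iterative pruning loop. This assumes a hypothetical toy problem (made-up energies) and a `pair` helper, introduced here for illustration, that looks up $E(i_r, j_s)$ in either index order. It implements the original best-case/worst-case comparison, not the stronger Goldstein variant.

```python
# Hypothetical toy problem: 3 positions, 2 rotamers each (arbitrary energy units).
self_energy = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.2]]
pair_energy = {
    (0, 1): [[-1.0, 0.0], [0.3, 0.1]],
    (0, 2): [[0.0, 0.4], [0.2, -0.2]],
    (1, 2): [[0.1, 0.0], [-0.3, 0.5]],
}

def pair(pair_energy, i, r, j, s):
    """Look up E(i_r, j_s) whether or not i < j."""
    if i < j:
        return pair_energy[(i, j)][r][s]
    return pair_energy[(j, i)][s][r]

def dee_original(self_energy, pair_energy):
    """Iteratively prune rotamers with the original best-case vs worst-case DEE test."""
    n = len(self_energy)
    alive = [set(range(len(rots))) for rots in self_energy]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                if len(alive[i]) == 1:
                    break  # only one rotamer left at position i
                # Best case for r: its most favorable surviving partner at every other position.
                best_r = self_energy[i][r] + sum(
                    min(pair(pair_energy, i, r, j, s) for s in alive[j]) for j in others
                )
                for t in alive[i] - {r}:
                    # Worst case for t: its least favorable surviving partner everywhere.
                    worst_t = self_energy[i][t] + sum(
                        max(pair(pair_energy, i, t, j, s) for s in alive[j]) for j in others
                    )
                    if best_r > worst_t:  # r can never beat t, so r is a dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

print(dee_original(self_energy, pair_energy))
```

On this toy problem the loop prunes down to one rotamer per position, matching the brute-force GMEC (0, 1, 0); on real proteins DEE often leaves several survivors, which must then be searched by other means.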

Page 13: Computational Protein Design

Dead End Elimination in Words

Dead-end elimination algorithms provide a deterministic approach to finding the global minimum energy conformation (GMEC) of a set of amino acid side chains anchored to specified backbone coordinates. All of the rotamers at a particular residue position are essentially in competition for inclusion in the GMEC. The idea underlying DEE algorithms is that, by comparing the energy contributions of different candidate rotamers at a given position, it is possible to identify certain rotamers which cannot exist in the GMEC. These dead-ending rotamers can be eliminated from future consideration, thus decreasing the combinatorial size of the problem.

To follow this approach, the potential function used to evaluate the conformational energy must be expressed solely in terms of pairwise interactions. The relative merits of candidate rotamers at a given position can then be ascertained without having to evaluate the total energy of all conformations using each of the candidates. Instead, only the portion of the total energy that arises from pairwise interactions with the position in question need be considered. By comparing the relative size of the pairwise energy contributions using each of the candidate rotamers at this position, it is possible to identify incompatibility with the GMEC without knowledge of the actual minimum energy. The combinatorial cost of this procedure is far less than the cost of complete enumeration of the energy of each conformation.

Pierce et al., 2000, J Comp Chem, 999-1009.

Page 14: Computational Protein Design

Dead End Elimination

This condition implies that $i_r$ can be eliminated if the net energy contribution resulting from its best-case pairwise interactions with rotamers at all other positions (spanned by $j_u$) is still worse than that produced by the worst-case pairwise interactions of some other candidate rotamer, $i_t$, at the same position.

Pierce et al., 2000, J Comp Chem, 999-1009.

Different dead-end elimination criteria for sample energy profiles. The abscissa represents all possible conformations of the protein and the ordinate describes the net energy contribution produced by interactions with specific rotamers at position $i$. (a) Original DEE: $i_r$ is eliminated by $i_{t1}$, but not by $i_{t2}$.
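In symbols, with $E(i_r)$ the self energy and $E(i_r, j_u)$ the pairwise energy, the original condition described above takes the following standard form, written here in this section's notation:

$$E(i_r) + \sum_{j \neq i} \min_{u} E(i_r, j_u) \;>\; E(i_t) + \sum_{j \neq i} \max_{u} E(i_t, j_u) \quad\Longrightarrow\quad \text{eliminate } i_r$$

The left side is the best case for $i_r$ (each other position takes its most favorable rotamer $j_u$); the right side is the worst case for $i_t$.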

Page 15: Computational Protein Design

Dead End Elimination

Given two rotamers, red and blue, at residue position $i$, identify and eliminate rotamers that cannot be part of the best solution. Here, the red rotamer is eliminated by the blue.

Note: Cannot afford to calculate energies for all of these configurations!

[Figure: most favorable interaction of red with the conformational background vs. most unfavorable interaction of blue with the conformational background]

Page 16: Computational Protein Design

DEE algorithm applied to protein design

Page 17: Computational Protein Design

Dead End Elimination: Goldstein criterion

Energy is composed of two parts: interaction with the template and pairwise interactions between residues.

What is the least energy it would cost to replace $i_t$ with $i_r$? We use the simple Goldstein criterion, which eliminates rotamer $r$ at position $i$ if, when compared to some other rotamer $t$ at the same position, the following inequality is satisfied:
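Written out in the notation used above, the standard form of the Goldstein criterion is ($u$ ranges over the rotamers still allowed at position $j$):

$$\Delta E = E(i_r) - E(i_t) + \sum_{j \neq i} \min_{u} \bigl[ E(i_r, j_u) - E(i_t, j_u) \bigr] > 0$$

$\Delta E$ is a lower bound on the cost of swapping $i_t$ for $i_r$ in any background. Because $\min_u [a_u - b_u] \ge \min_u a_u - \max_u b_u$, every rotamer removed by the original best-case/worst-case test is also removed by this one.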

Page 18: Computational Protein Design

DEE algorithm applied to protein design

If $\Delta E > 0$, then eliminate $i_r$.

Apply iteratively to all rotamer pairs.

The energy profile changes as rotamers are eliminated, leading to the elimination of further rotamers.
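The iterate-until-nothing-changes procedure can be sketched as below on a hypothetical three-position toy problem with made-up energies. The `pair` helper and the data layout are assumptions for illustration; the test inside the loop is the Goldstein $\Delta E > 0$ condition, evaluated only over rotamers that are still alive.

```python
# Hypothetical toy problem: 3 positions, 2 rotamers each (arbitrary energy units).
self_energy = [[0.0, 1.0], [0.5, -0.5], [0.0, 0.2]]
pair_energy = {
    (0, 1): [[-1.0, 0.0], [0.3, 0.1]],
    (0, 2): [[0.0, 0.4], [0.2, -0.2]],
    (1, 2): [[0.1, 0.0], [-0.3, 0.5]],
}

def pair(pair_energy, i, r, j, s):
    """Look up E(i_r, j_s) whether or not i < j."""
    if i < j:
        return pair_energy[(i, j)][r][s]
    return pair_energy[(j, i)][s][r]

def goldstein_dee(self_energy, pair_energy):
    """Repeat Goldstein elimination over all rotamer pairs until nothing more is removed."""
    n = len(self_energy)
    alive = [set(range(len(rots))) for rots in self_energy]
    changed = True
    while changed:  # each elimination can enable further eliminations
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]
            for r in sorted(alive[i]):
                if len(alive[i]) == 1:
                    break  # only one rotamer left at position i
                for t in alive[i] - {r}:
                    # Lower bound on the cost of replacing t with r in any surviving background.
                    delta = self_energy[i][r] - self_energy[i][t] + sum(
                        min(pair(pair_energy, i, r, j, s) - pair(pair_energy, i, t, j, s)
                            for s in alive[j])
                        for j in others
                    )
                    if delta > 0:  # r always costs more than t: eliminate r
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

print(goldstein_dee(self_energy, pair_energy))
```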

Page 19: Computational Protein Design

Coiled-coil design (Mayo et al.)

Page 20: Computational Protein Design

Biosensor design (Hellinga et al.)

The Hellinga lab has designed many different receptors based on the bPBP fold.

Page 21: Computational Protein Design

Protein-protein interface design (Love, Mayo, et al.)

Page 22: Computational Protein Design

Rosetta Design

[Flowchart]
-- Input: sketch of the structure (the fold)
-- Initial sequence selection (primarily 12-6, HB, and Born terms)
-- Monte Carlo minimization (both at rotamer and backbone levels). Note: this step is analogous to structure prediction!
-- Sequence optimization
-- Repeat until convergence
-- Output: final structure
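The "12-6" term refers to the Lennard-Jones potential. A minimal sketch in the common $r_{\min}$ form is below; the parameter values are hypothetical, and this is not Rosetta's actual energy function.

```python
def lennard_jones_12_6(r, epsilon, r_min):
    """12-6 Lennard-Jones energy: epsilon * [(r_min/r)^12 - 2 (r_min/r)^6].

    The minimum, -epsilon, occurs at r = r_min; the energy crosses zero at
    r = r_min / 2**(1/6) and rises steeply (the r^-12 repulsion) at shorter range.
    """
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)

# Hypothetical parameters for a single atom pair (kcal/mol, angstroms).
EPS, RMIN = 0.2, 4.0
for r in (3.0, 3.6, 4.0, 5.0, 8.0):
    print(f"r = {r:.1f} A  E = {lennard_jones_12_6(r, EPS, RMIN):+.4f}")
```

The steep repulsive wall is what penalizes rotamer choices that clash, while the shallow attractive well rewards tight packing.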

Page 23: Computational Protein Design

Top7 (Baker, Kuhlman, et al.)

Page 24: Computational Protein Design

Conformational switch (Kuhlman, et al.)

[Figure: folded-to-unfolded transition as zinc is titrated in; states labeled “unfolded” and “folded”]

Page 25: Computational Protein Design

The ideal: designed sequences that meet both criteria

[Figure labels: “state of the art”; “The Holy Grail”]

Page 26: Computational Protein Design
Page 27: Computational Protein Design

TS: transition state

Page 28: Computational Protein Design

Design model: purple; X-ray crystal structure: green

Page 29: Computational Protein Design

The dirty little secret of protein design…

For every high impact success in the protein design literature, there are dozens (perhaps hundreds) of spectacular failures that go unreported.

Paraphrased from S. Mayo (Protein Society Meeting, 2006).

Page 30: Computational Protein Design

Scientific misconduct?

Design of a novel triosephosphate isomerase

Page 31: Computational Protein Design

DEE repacking around catalytic site

Scientific misconduct?

Design of a novel triosephosphate isomerase

Page 32: Computational Protein Design

Lineweaver-Burk plots

As do I!

Scientific misconduct?

Design of a novel triosephosphate isomerase
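A Lineweaver-Burk (double-reciprocal) plot linearizes the Michaelis-Menten equation $v = V_{\max}[S]/(K_m + [S])$ as $1/v = (K_m/V_{\max})(1/[S]) + 1/V_{\max}$, so a straight-line fit gives $V_{\max}$ from the intercept and $K_m$ from the slope. A minimal sketch using noise-free synthetic data; the values $V_{\max} = 10$ and $K_m = 2$ and the substrate concentrations are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity v = Vmax [S] / (Km + [S])."""
    return vmax * s / (km + s)

def lineweaver_burk_fit(substrate, velocity):
    """Least-squares line through (1/[S], 1/v); returns (Vmax, Km)."""
    xs = [1.0 / s for s in substrate]
    ys = [1.0 / v for v in velocity]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals Km / Vmax
    intercept = mean_y - slope * mean_x  # equals 1 / Vmax
    vmax = 1.0 / intercept
    return vmax, slope * vmax

substrate = [0.5, 1.0, 2.0, 4.0, 8.0]  # hypothetical [S] values
velocity = [michaelis_menten(s, vmax=10.0, km=2.0) for s in substrate]
vmax, km = lineweaver_burk_fit(substrate, velocity)
print(f"Vmax = {vmax:.3f}, Km = {km:.3f}")
```

With real, noisy measurements the reciprocal transform heavily weights the low-[S] points, which is one reason nonlinear fitting of the untransformed data is generally preferred.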