Computational Protein Design

Aka, The Inverse Folding Problem

Topic 18

Chapter 39, Gu and Bourne “Structural Bioinformatics”

Protein Design is an Inverse Problem of Structure Prediction

MDVGQAVIFLGPPGAGKGTQASRLAQELGFKKLSTGDILRDHVARGTPLGERVRPIMERGDLVPDDLILELIREELAERVIFDGFPRTLAQAEALDRLLSETGTRLLGVVLVEVPEEELVRRIL…

Biology

Adapted from Amy Keating’s slides at MIT.

Different Types of Protein Design

Protein design

Grand challenge: De novo design

Immediate Practical applications

Design of new proteins
-- novel protein folds
-- binding interfaces
-- enzymatic activities
-- etc.

Redesign of existing proteins
-- increased thermostability
-- altered binding specificity
-- improved binding affinity
-- enhanced enzymatic activity
-- altered substrate specificity

Current Opinion in Biotechnology 2007, 18:1-7.

Protein Design Problems

Annu. Rev. Biochem. 2008. 77:363-382.

Goal: design a protein that adopts a given structure

Open problems with assessment:
-- What resolution is required? (fold, sidechain, loop, etc.?)
-- Stability of the designed protein
-- Structural uniqueness
-- Must solve the structure to know how you did!

There are typically many sequences that adopt the fold, so you must try to find the one that is most stable.

That is, minimize the quantity:

ΔG_fold = G_folded − G_unfolded

Search through many possible sequences, and then pick the one with the lowest ΔG_fold.

Design target Designed protein

The big challenges

Search

The search space is astronomical: 20^n sequences for a protein of n residues.

Except in rare subspace search problems, this is computationally intractable.
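To give a sense of scale, the 20^n count can be evaluated directly. This is a minimal sketch; the chain length of 100 residues and the rate of one billion sequences per second are illustrative assumptions, not figures from the slides.

```python
# Size of sequence space: 20 amino acid choices at each of n positions.
def sequence_space_size(n_residues: int, alphabet_size: int = 20) -> int:
    return alphabet_size ** n_residues

n = 100                      # assumed chain length, for illustration only
rate_per_second = 10**9      # assumed evaluation rate, for illustration only
seconds_per_year = 365 * 24 * 3600

size = sequence_space_size(n)
years = size // (rate_per_second * seconds_per_year)

print(f"20^{n} is a number with {len(str(size))} digits")
print(f"Exhaustive enumeration would take a number of years with {len(str(years))} digits")
```

Even small proteins are far beyond exhaustive enumeration, which is why the search methods on the following slides are needed.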

It is practically impossible to calculate ΔG_fold because…

-- What is the structure of the folded state? (sidechain and loop positions)

-- How do we model the unfolded state?

-- Entropy?!

Instead, we focus on the energy of the folded protein, meaning native structure interactions. That is, replace ΔG_fold with ΔE_fold using molecular mechanics (MM) force fields.

Energy

Sidechain packing

Design target Designed protein

As we did with structure prediction in homology modeling, we will typically use a rotamer library-based approach.

Search algorithms for large spaces

Exhaustive search – too slow!

Stochastic methods
-- Monte Carlo
-- Genetic algorithms

Pruning algorithms (which are deterministic)
-- Branch and Bound
-- Dead End Elimination

For all-atom protein design, some amount of stochasticity is generally required. Purely deterministic approaches rarely succeed in designing complete proteins.
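As an illustration of the stochastic option, here is a minimal Metropolis Monte Carlo sketch over discrete choices (for example, one rotamer index per position). The function names, the temperature `kT`, the step count, and the toy energy function are all hypothetical placeholders; a real design run would score trial states with a force field.

```python
import math
import random

def metropolis_search(energy, n_choices, steps=2000, kT=0.5, seed=0):
    """Metropolis Monte Carlo over discrete choices; returns the best state seen."""
    rng = random.Random(seed)
    current = [rng.randrange(k) for k in n_choices]
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        pos = rng.randrange(len(n_choices))           # pick one position
        trial = list(current)
        trial[pos] = rng.randrange(n_choices[pos])    # propose a new choice there
        e_trial = energy(trial)
        delta = e_trial - e_cur
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best

def toy_energy(assignment):
    """Hypothetical score: neighbors with the same choice clash; higher indices cost more."""
    clash = sum(1.0 for a, b in zip(assignment, assignment[1:]) if a == b)
    bias = sum(0.1 * a for a in assignment)
    return clash + bias

best, e_best = metropolis_search(toy_energy, n_choices=[3] * 6)
print(best, e_best)
```

Unlike the pruning algorithms, this search gives no guarantee that the returned state is the global minimum; it only reports the best state it happened to visit.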

Dead End Elimination

Eliminate, one at a time, rotamer choices that cannot under any circumstance be part of the minimum energy solution.

From Wikipedia: DEE is a method for minimizing a function over a discrete set of independent variables. The basic idea is to identify "dead ends", i.e., "bad" combinations of variables that cannot possibly yield the global minimum and to refrain from searching such combinations further. Hence, dead-end elimination is a mirror image of dynamic programming techniques in which "good" combinations are identified and explored further. Although the method itself is general, it has been developed and applied mainly to the problems of predicting and designing the structures of proteins.

Dead End Elimination

There is a global minimum energy conformation (GMEC) in which each residue has a unique rotamer; that is, the GMEC is the set of rotamers that has the lowest energy.

Energy is defined as pairwise decomposable, meaning the total energy is broken down into pairwise interactions between rotamers plus the energy of each rotamer interacting with the backbone.

Self energy Pairwise energy
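Written out, a pairwise-decomposable energy takes the standard form below. The notation is an assumption chosen to match the DEE slides: i_r is rotamer r at position i, E(i_r) is the self energy (rotamer with backbone), and E(i_r, j_s) is the pairwise energy between the rotamers chosen at positions i and j.

```latex
E_{\text{total}} = \underbrace{\sum_{i} E(i_r)}_{\text{self energy}}
                 + \underbrace{\sum_{i} \sum_{j > i} E(i_r, j_s)}_{\text{pairwise energy}}
```

Because every term depends on at most two positions, the contribution of a single rotamer can be bounded without enumerating whole conformations.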

Dead End Elimination

In this example, assuming a fixed background (black), the rotamer that has the lower energy is chosen.

Dead End Elimination

However, the other rotamers are not fixed. (Nor is it realistic to assume the backbone is, but we’ll brush that issue aside for now.)

If the blue rotamer is always lower in energy than the red, for example, then we can eliminate the red rotamer from all future configurations.

Iterate till completion.

Put otherwise, if the “worst case scenario” for blue is better than the “best case scenario” for red, then you always choose blue.
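The worst-case versus best-case comparison can be sketched in a few lines. This is a toy illustration under assumed data structures: `self_e[i][r]` holds the self energy of rotamer r at position i, `pair_e[i][j][r][u]` holds the pairwise energy between rotamer r at i and rotamer u at j, and all numbers are made up.

```python
def is_dead_end(i, r, t, self_e, pair_e, alive):
    """True if the best case for rotamer r at position i is worse than the worst case for t."""
    best_case_r = self_e[i][r]
    worst_case_t = self_e[i][t]
    for j in range(len(self_e)):
        if j == i:
            continue
        # Best (lowest) partner energy for r, worst (highest) partner energy for t.
        best_case_r += min(pair_e[i][j][r][u] for u in alive[j])
        worst_case_t += max(pair_e[i][j][t][u] for u in alive[j])
    return best_case_r > worst_case_t

# Toy system: two positions, two rotamers each. At position 0, red = 0 and blue = 1.
RED, BLUE = 0, 1
self_e = [[5.0, 0.0], [0.0, 0.0]]
pair_01 = [[-1.0, 1.0], [-2.0, 2.0]]      # pair_01[r][u]: rotamer r at 0 with u at 1
pair_10 = [[-1.0, -2.0], [1.0, 2.0]]      # transpose of pair_01
pair_e = [[None, pair_01], [pair_10, None]]
alive = [{0, 1}, {0, 1}]                  # rotamers still under consideration

red_eliminated = is_dead_end(0, RED, BLUE, self_e, pair_e, alive)
blue_eliminated = is_dead_end(0, BLUE, RED, self_e, pair_e, alive)
print(red_eliminated, blue_eliminated)    # True False
```

Here the best case for red (5 − 1 = 4) is still worse than the worst case for blue (0 + 2 = 2), so red can be discarded without evaluating any full conformation.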

Dead End Elimination in Words

Dead-end elimination algorithms provide a deterministic approach to finding the global minimum energy conformation (GMEC) of a set of amino acid side chains anchored to specified backbone coordinates. All of the rotamers at a particular residue position are essentially in competition for inclusion in the GMEC. The idea underlying DEE algorithms is that, by comparing the energy contributions of different candidate rotamers at a given position, it is possible to identify certain rotamers which cannot exist in the GMEC. These dead-ending rotamers can be eliminated from future consideration, thus decreasing the combinatorial size of the problem.

To follow this approach, the potential function used to evaluate the conformational energy must be expressed solely in terms of pairwise interactions. The relative merits of candidate rotamers at a given position can then be ascertained without having to evaluate the total energy of all conformations using each of the candidates. Instead, only the portion of the total energy that arises from pairwise interactions with the position in question need be considered. By comparing the relative size of the pairwise energy contributions using each of the candidate rotamers at this position, it is possible to identify incompatibility with the GMEC without knowledge of the actual minimum energy. The combinatorial cost of this procedure is far less than the cost of complete enumeration of the energy of each conformation.

Pierce et al., 2000, J Comp Chem, 999-1009.

Dead End Elimination

This condition implies that i_r can be eliminated if the net energy contribution resulting from its best-case pairwise interactions with rotamers at all other positions (spanned by j_u) is still worse than that produced by the worst-case pairwise interactions of some other candidate rotamer, i_t, at the same position.

Pierce et al., 2000, J Comp Chem, 999-1009.
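For reference, the original DEE condition described in words above is usually written as the inequality below. The equation image from the slide is not reproduced in this text, so this is the standard form with notation assumed to match the description: E(i_r) is the self energy of rotamer r at position i, and E(i_r, j_u) is its pairwise energy with rotamer u at position j.

```latex
E(i_r) + \sum_{j \neq i} \min_{u} E(i_r, j_u)
\;>\;
E(i_t) + \sum_{j \neq i} \max_{u} E(i_t, j_u)
```

The left side is the best case for i_r and the right side is the worst case for i_t; when the inequality holds, i_r cannot be part of the GMEC.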

Different dead-end elimination criteria for sample energy profiles. The abscissa represents all possible conformations of the protein and the ordinate describes the net energy contribution produced by interactions with specific rotamers at position i. (a) Original DEE: i_r is eliminated by i_t1, but not by i_t2.

Dead End Elimination

Given two rotamers (red and blue) at residue position i, identify and eliminate the rotamer that cannot be part of the best solution. Here, the red rotamer is eliminated by the blue.

Note: Cannot afford to calculate energies for all of these configurations!

Most favorable interaction of red with conformational background

Most unfavorable interaction of blue with conformational background

DEE algorithm applied to protein design

Dead End Elimination: Goldstein criterion

Energy is composed of two parts: interaction with template and pairwise interactions between residues.

What is the least energy it would cost to replace i_t with i_r? We use the simple Goldstein criterion, which eliminates rotamer r at position i if, when compared to some other rotamer t at the same position, the following inequality is satisfied:
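The inequality from the slide is not reproduced in this text. The standard form of the Goldstein criterion, in notation assumed here to match the surrounding slides, is:

```latex
\Delta E = E(i_r) - E(i_t)
         + \sum_{j \neq i} \min_{u} \left[ E(i_r, j_u) - E(i_t, j_u) \right] > 0
```

The minimum is taken over the difference at each neighboring position j, so ΔE is the smallest possible cost of swapping i_t for i_r. This makes the test at least as powerful as comparing the best case of i_r against the worst case of i_t.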


If ΔE > 0, then eliminate i_r.

Apply iteratively to all rotamer pairs.

The energy profile changes as rotamers are eliminated, leading to elimination of further rotamers.
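The iterative procedure can be sketched as follows. This is a toy illustration, not the implementation used in any of the cited work: the energy tables are random numbers, the layout `pair_e[i][j][r][u]` and all function names are assumptions, and a brute-force search is included only to check that the true minimum survives pruning.

```python
import itertools
import random

def toy_tables(n_pos=4, n_rot=3, seed=0):
    """Random self and (symmetric) pairwise energy tables, for illustration only."""
    rng = random.Random(seed)
    self_e = [[rng.uniform(-2, 2) for _ in range(n_rot)] for _ in range(n_pos)]
    pair_e = [[None] * n_pos for _ in range(n_pos)]
    for i in range(n_pos):
        for j in range(i + 1, n_pos):
            m = [[rng.uniform(-1, 1) for _ in range(n_rot)] for _ in range(n_rot)]
            pair_e[i][j] = m
            pair_e[j][i] = [list(col) for col in zip(*m)]  # transpose
    return self_e, pair_e

def goldstein_delta(i, r, t, self_e, pair_e, alive):
    """Least energy it would cost to replace rotamer t with rotamer r at position i."""
    delta = self_e[i][r] - self_e[i][t]
    for j in range(len(self_e)):
        if j != i:
            delta += min(pair_e[i][j][r][u] - pair_e[i][j][t][u] for u in alive[j])
    return delta

def dee_goldstein(self_e, pair_e):
    """Apply the Goldstein criterion to all rotamer pairs until nothing more is eliminated."""
    alive = [set(range(len(rots))) for rots in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(len(self_e)):
            for r in sorted(alive[i]):
                if any(goldstein_delta(i, r, t, self_e, pair_e, alive) > 0
                       for t in sorted(alive[i]) if t != r):
                    alive[i].remove(r)   # r can never be in the minimum energy solution
                    changed = True
    return alive

def brute_force_gmec(self_e, pair_e):
    """Exhaustive search; feasible only for toy problems."""
    n = len(self_e)
    def energy(a):
        e = sum(self_e[i][a[i]] for i in range(n))
        return e + sum(pair_e[i][j][a[i]][a[j]] for i in range(n) for j in range(i + 1, n))
    return min(itertools.product(*(range(len(r)) for r in self_e)), key=energy)

self_e, pair_e = toy_tables()
alive = dee_goldstein(self_e, pair_e)
gmec = brute_force_gmec(self_e, pair_e)
print("surviving rotamers per position:", [sorted(a) for a in alive])
print("brute-force GMEC:", gmec)
```

The outer `while` loop is what implements the repeated passes: each elimination shrinks the sets that the minimum is taken over, which can make the criterion succeed for rotamers that survived an earlier pass.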

Coiled-coil design (Mayo et al.)

Biosensor design (Hellinga et al.)

The Hellinga lab has designed many different receptors based on the bPBP fold.

Protein-protein interface design (Love, Mayo, et al.)

Rosetta Design

Initial sequence selection

(primarily 12-6, HB, and Born terms)

Monte Carlo minimization

(both at rotamer and backbone levels)

Sequence optimization

Sketch input structure (the fold) Final structure

Note: this step is analogous to structure prediction!

Repeat till convergence

Top7 (Baker, Kuhlman, et al.)

Conformational switch (Kuhlman, et al.)

[Figure labels: unfolded; folded; folded to unfolded transition as zinc is titrated in]

The ideal: Designed sequences that meet both criteria

state of the art

The Holy Grail

TS: transition state

Design model: purple

X-ray crystal structure: green

The dirty little secret of protein design…

For every high-impact success in the protein design literature, there are dozens (perhaps hundreds) of spectacular failures that go unreported.

Paraphrased from S. Mayo (Protein Society Meeting, 2006).

Scientific misconduct?

Design of a novel triosephosphate isomerase

DEE repacking around catalytic site



Lineweaver-Burk plots

As do I!


