protein design

Protein Design CS273: Final Project Charles Kou [email protected] du Crystal structure of top7 A novel protein structure created with RosettaDesign. http://rosettadesign.med.unc.edu/


Post on 23-Mar-2016


Page 1: Protein Design

Protein Design

CS273: Final Project
Charles Kou [email protected]

Crystal structure of top7 – A novel protein structure created with RosettaDesign.

http://rosettadesign.med.unc.edu/

Page 2: Protein Design

What is Protein Design

• Opposite of structure prediction: determine a low-energy sequence that yields a given structure.
• Computationally difficult: the search space has 20^n sequences, where n is the sequence length (20 amino acids per position).
• Major algorithms: dead-end elimination, genetic algorithms, Monte Carlo, branch & bound.

http://www.stanford.edu/class/cs273/project/project.html
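To make the 20^n search space concrete, the short Python sketch below counts the sequences for a few chain lengths. The function names are hypothetical and only illustrate the arithmetic.

```python
import math

def design_space_size(n_residues: int, n_amino_acids: int = 20) -> int:
    """Exact number of distinct sequences of the given length."""
    return n_amino_acids ** n_residues

def log10_design_space(n_residues: int, n_amino_acids: int = 20) -> float:
    """Order of magnitude (base 10) of the sequence space."""
    return n_residues * math.log10(n_amino_acids)

if __name__ == "__main__":
    for n in (10, 50, 100):
        # Print the size as a power of ten, since the exact integer is huge.
        print(f"n = {n:3d}: about 10^{log10_design_space(n):.1f} sequences")
```

Even a 100-residue protein has roughly 10^130 candidate sequences, which is why exhaustive enumeration is not an option.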

Page 3: Protein Design

Major Algorithms

• Trade-off between thoroughness and computational speed.
• Monte Carlo / Genetic Algorithm:
  – Can sample a space with an infinite number of solutions.
  – Side-chain identity, side-chain orientation and backbone structure can be varied continuously.
  – No guarantee of reaching the global energy minimum.
• Dead-End Elimination:
  – Allows only discrete conformations.
  – Rejection criteria are used to prune the search space.

Desjarlais JR, Clarke ND. Computer search algorithms in protein modification and design. Curr Opin Struct Biol. 1998 Aug;8(4):471-5. PubMed

Page 4: Protein Design

Review: Energy Landscape

[Figure: energy landscape defined over a high-dimensional conformation space with coordinates q1, q2, ..., qi, qj, ..., qN-1, qN]

JC Latombe, Energy2.ppt

Page 5: Protein Design

Review: Example Energy Function

• E = bonded terms + non-bonded terms + solvation terms
• $E = (E_S + E_Q + E_{S\text{-}B} + E_{Tor}) + (E_{vdW} + E_{dipole})$
• Bonded terms
  – Relatively few
• Non-bonded terms
  – Depend on distances between pairs of atoms
  – O(n^2): expensive to compute
• Solvation terms
  – May require computing the molecular surface

JC Latombe, Energy2.ppt
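The O(n^2) cost of the non-bonded terms comes from visiting every pair of atoms. The sketch below shows this with a Lennard-Jones 12-6 potential standing in for the van der Waals term. The choice of potential, the parameters `epsilon` and `sigma`, and the function names are assumptions for illustration, not taken from the slides.

```python
import itertools
import math

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones 12-6 pair energy at distance r (assumed stand-in for E_vdW)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def nonbonded_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all n*(n-1)/2 atom pairs: O(n^2) work."""
    total = 0.0
    for a, b in itertools.combinations(coords, 2):
        total += lennard_jones(math.dist(a, b), epsilon, sigma)
    return total

if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.0, 1.2, 0.0)]
    print(nonbonded_energy(atoms))
```

Practical force fields usually apply a distance cutoff or neighbor lists so that most of the pairs are skipped.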

Page 6: Protein Design

Review: Monte Carlo Simulation (MCS)

• Random walk through conformation space.
• At each cycle:
  – Perturb the current conformation at random.
  – Accept the step with probability $P(\mathrm{accept}) = \min\left(1, e^{-\Delta E / kT}\right)$ (Metropolis acceptance criterion).
• The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in V ~ $\int_V e^{-E/kT}\, dV$

JC Latombe, Energy2.ppt
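A minimal Python sketch of the Metropolis cycle described on this slide follows. The `energy` and `perturb` callables, the one-dimensional toy energy, and the bookkeeping of the best state seen are assumptions added for illustration.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)

def monte_carlo(energy, perturb, x0, kT=1.0, n_steps=10000, seed=0):
    """Random walk with Metropolis acceptance; returns the lowest-energy state visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        candidate = perturb(x, rng)          # random perturbation of the current state
        e_candidate = energy(candidate)      # energy is re-evaluated at every step
        if metropolis_accept(e_candidate - e, kT, rng):
            x, e = candidate, e_candidate
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

if __name__ == "__main__":
    # Toy one-dimensional "conformation" with a single minimum at x = 2.
    toy_energy = lambda x: (x - 2.0) ** 2
    toy_perturb = lambda x, rng: x + rng.uniform(-0.5, 0.5)
    print(monte_carlo(toy_energy, toy_perturb, x0=10.0, kT=0.1))
```

Lower values of `kT` make uphill moves rarer, which speeds descent but makes it harder to escape a local minimum.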

Page 7: Protein Design

Monte Carlo Simulation

• Tends to waste time in local minima.
• May consist of millions of steps.
• Energy must be evaluated frequently (computationally expensive).
• Use ChainTree to improve performance.

Koehl, P and Levitt, M. De novo protein design. I. In search of stability and specificity. Journal of Molecular Biology, 293, 1161-1181 (1999).

Lotan, I., Schwarzer, F., Halperin, D., Latombe, J.C.: Efficient maintenance and self-collision testing for kinematic chains. In: Symposium on Computational Geometry (2002) 43–52

Page 8: Protein Design

Genetic Algorithm

• Starts with a first-generation pool.
• Iteratively applies genetic operators (selection, recombination, mutation).
• Evolves toward a better solution (a lower value of the energy function).

S. M. Larson, J. England, J. DesJarlais, and V. S. Pande. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Science 11 2804-281 (2002). Protein Science

Page 9: Protein Design

Selection

• The selection function takes into account the value of the fitness function. This gives priority to "fit" organisms but also gives "less fit" organisms a chance.

http://en.wikipedia.org/wiki/Genetic_algorithm

Page 10: Protein Design

Selection Method

• Roulette method: the probability of selection is proportional to the value of the fitness function.
• Tournament: pick k individuals at random (k is the tournament size), then choose the best of them with probability p, the second best with probability p*(1-p), the third best with probability p*(1-p)^2, and so on.
• A larger k gives weaker individuals less chance of being selected.

http://en.wikipedia.org/wiki/Tournament_selection

http://en.wikipedia.org/wiki/Roulette_wheel_selection
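Both selection methods can be sketched in a few lines of Python. The function names and the convention that a larger fitness is better are assumptions. When minimizing energy, the fitness would need to be a non-negative decreasing function of the energy.

```python
import random

def roulette_select(population, fitness, rng):
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    weights = [fitness(ind) for ind in population]
    return rng.choices(population, weights=weights, k=1)[0]

def tournament_select(population, fitness, k, p, rng):
    """Draw k individuals, then take the best with probability p,
    the next best with probability p*(1-p), and so on."""
    contestants = sorted(rng.sample(population, k), key=fitness, reverse=True)
    for ind in contestants:
        if rng.random() < p:
            return ind
    return contestants[-1]  # nobody was chosen: fall back to the weakest contestant

if __name__ == "__main__":
    rng = random.Random(0)
    pool = [1, 2, 3, 4, 5]
    print(roulette_select(pool, fitness=float, rng=rng))
    print(tournament_select(pool, fitness=float, k=3, p=0.8, rng=rng))
```

With p = 1 the tournament is deterministic and always returns the fittest of the k contestants.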

Page 11: Protein Design

Recombination, Mutation

• Recombination: different segments of the structure that are optimized in parallel can be recombined into the same model. Recombination occurs with a set probability; otherwise, the individual is propagated unchanged to the next generation.
• Mutation: helps avoid local minima by mutating the child with a set probability.
• Similar to MC: there is no guarantee of convergence to the global minimum.

http://en.wikipedia.org/wiki/Genetic_operator
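The three genetic operators can be combined into a small loop, sketched below for sequences over the 20 amino acids. The one-point crossover, the tournament of size 3, the elitism step, the parameter values, and the toy energy are all assumptions chosen for illustration. They are not the settings of any published design protocol.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def crossover(a, b, rng):
    """One-point recombination of two equal-length sequences (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(seq, rate, rng):
    """Replace each position with a random amino acid with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else c for c in seq)

def evolve(energy, length, pop_size=50, generations=100,
           p_cross=0.7, p_mut=0.02, seed=0):
    """Evolve a pool of sequences toward low energy; returns the best final sequence."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Selection: the lowest-energy sequence among 3 random contestants.
            return min(rng.sample(pop, 3), key=energy)
        next_pop = [min(pop, key=energy)]  # elitism (assumption): keep the current best
        while len(next_pop) < pop_size:
            parent = pick()
            # Recombine with a set probability, otherwise propagate the parent.
            child = crossover(parent, pick(), rng) if rng.random() < p_cross else parent
            next_pop.append(mutate(child, p_mut, rng))
        pop = next_pop
    return min(pop, key=energy)

if __name__ == "__main__":
    # Toy energy: number of positions that differ from a hypothetical target sequence.
    target = "ACDEFGHI"
    toy_energy = lambda s: sum(x != y for x, y in zip(s, target))
    best = evolve(toy_energy, len(target))
    print(best, toy_energy(best))
```

As with Monte Carlo, the result depends on the random seed and is not guaranteed to be the global minimum.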

Page 12: Protein Design

Genome@home

• Genome@home uses distributed computing and a genetic algorithm.
• It also incorporates backbone flexibility using Monte Carlo (random perturbations with RMSD < 1.0 Å), which improves the result.

http://www.stanford.edu/group/pandegroup/genome/

Page 13: Protein Design

Dead-end Elimination

• Discrete conformational search.
• Functionally equivalent to exhaustive search.
• Uses rejection criteria to prune the search space.
• Robustness depends on the discreteness and the rejection criteria used.
• Guaranteed convergence to the global minimum.
• Initially used for side-chain placement. More difficult for protein design because of the high number of degrees of freedom.

Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol. 2001 Mar 16;307(1):429-45. PubMed

Page 14: Protein Design

Energy of conformation

• Reformulation of the side-chain placement problem: amino acid identity is used instead of rotamer.
• The generalized DEE can handle up to 300 residues.
• The energy of a conformation is defined as the sum of the interactions among side chains plus the sum of the interactions between each side chain and the backbone.
• The rejection criteria are applied iteratively until no more rotamers can be eliminated. Either the search converges, or the problem is reduced sufficiently for exhaustive search.

Page 15: Protein Design

DEE filter: Rejection Criteria

Simple Criterion: if the lowest-energy structure that can be found using a given side-chain rotamer (a low-energy side-chain conformation) is higher in energy than the highest-energy structure with a different rotamer at the same position, the first rotamer is eliminated.
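In symbols, the simple criterion can be written as follows. The notation is an assumption introduced here, not taken from the slides: $i_r$ is rotamer $r$ at position $i$, $E(i_r)$ is its interaction with the backbone, and $E(i_r, j_s)$ is the pairwise side-chain interaction.

```latex
% Total energy of a conformation (one rotamer chosen per position)
E_{\mathrm{total}} = \sum_i E(i_r) + \sum_{i<j} E(i_r, j_s)

% Simple DEE criterion: i_r is eliminated if, for some other rotamer i_t,
E(i_r) + \sum_{j \neq i} \min_s E(i_r, j_s)
  \;>\;
E(i_t) + \sum_{j \neq i} \max_s E(i_t, j_s)
```

The left-hand side is the best case for $i_r$ and the right-hand side is the worst case for $i_t$, so when the inequality holds $i_r$ cannot be part of the global minimum.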

Page 16: Protein Design

DEE filter: Rejection Criteria

Goldstein Criterion: if the energy of a structure containing one rotamer is always lowered by changing to a second one, the first one is eliminated.
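The Goldstein test can be applied repeatedly until nothing more is eliminated. The Python sketch below assumes a table `self_e[i][r]` of side-chain/backbone energies and a symmetric callable `pair_e(i, r, j, s)` for side-chain pair energies. Both names and the data layout are hypothetical.

```python
def goldstein_eliminate(self_e, pair_e):
    """Iterate the Goldstein DEE criterion until no rotamer can be removed.

    self_e[i][r]       -- energy of rotamer r at position i with the backbone
    pair_e(i, r, j, s) -- interaction energy between rotamers i_r and j_s
    Returns one set of surviving rotamer indices per position.
    """
    n = len(self_e)
    alive = [set(range(len(rotamers))) for rotamers in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                for t in list(alive[i]):
                    if t == r:
                        continue
                    # Smallest possible energy change when switching from t to r.
                    gap = self_e[i][r] - self_e[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_e(i, r, j, s) - pair_e(i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:  # r is always worse than t: dead end
                        alive[i].discard(r)
                        changed = True
                        break
    return alive

if __name__ == "__main__":
    self_e = [[0.0, 5.0], [0.0, 5.0]]
    no_pairs = lambda i, r, j, s: 0.0
    print(goldstein_eliminate(self_e, no_pairs))
```

If every position ends with a single surviving rotamer, the search has converged. Otherwise the remaining combinations can be handed to another search method.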

Page 17: Protein Design

DEE filter: Rejection Criteria

Generalized Criterion: residues are treated in groups. Clusters of rotamers in a group that have already been eliminated may be excluded from the minimum operator, in addition to those that form dead-end clusters with c.

Page 18: Protein Design

Mean Field Theory

• Reduces the search space.
• Self-consistency is sought by placing amino acids at pre-selected positions in a given structure.
• The energy function is minimized by mean field.

Voigt CA, Mayo SL, Arnold FH, Wang ZG. Computational method to reduce the search space for directed protein evolution. Proc Natl Acad Sci U S A. 2001 Mar 27;98(7):3778-83. PNAS
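One common way to seek self-consistency is to give every option at every position a probability, compute each option's average energy in the field of the other positions, and update the probabilities with Boltzmann weights until they stop changing. The sketch below follows that generic scheme. It is not claimed to be the exact procedure of the cited paper, and the names `self_e`, `pair_e`, `kT`, and `tol` are assumptions.

```python
import math

def mean_field(self_e, pair_e, kT=1.0, max_iters=100, tol=1e-9):
    """Self-consistent mean-field probabilities p[i][r] for option r at position i.

    self_e[i][r]       -- energy of option r at position i on its own
    pair_e(i, r, j, s) -- interaction energy between options i_r and j_s
    """
    n = len(self_e)
    # Start from uniform probabilities at every position.
    p = [[1.0 / len(opts)] * len(opts) for opts in self_e]
    for _ in range(max_iters):
        max_change = 0.0
        for i in range(n):
            # Effective energy of each option, averaged over the other positions.
            eff = []
            for r in range(len(self_e[i])):
                e = self_e[i][r]
                for j in range(n):
                    if j != i:
                        e += sum(p[j][s] * pair_e(i, r, j, s)
                                 for s in range(len(self_e[j])))
                eff.append(e)
            lowest = min(eff)  # shift energies for numerical stability
            weights = [math.exp(-(e - lowest) / kT) for e in eff]
            z = sum(weights)
            updated = [w / z for w in weights]
            max_change = max(max_change,
                             max(abs(a - b) for a, b in zip(updated, p[i])))
            p[i] = updated
        if max_change < tol:  # probabilities are self-consistent
            break
    return p

if __name__ == "__main__":
    probs = mean_field([[0.0, 1.0], [0.5, 0.0]], lambda i, r, j, s: 0.0)
    print(probs)
```

Options whose converged probability is very low can then be dropped, which is one way such a method shrinks the search space.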

Page 19: Protein Design

Review: Branch & Bound

• Set of solutions can be partitioned into subsets (branch)

• Upper limit on a subset’s solution can be computed fast (bound)

Branch & Bound

1. Select the subset with the best possible bound.
2. Subdivide it, and compute a bound for each subset.

S.Batzoglou, Threading2.ppt
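The two steps can be sketched for a rotamer-assignment problem in which energy is minimized. In that setting the bound is a lower bound on the energy of every completion of a partial assignment, whereas an upper limit is the analogous bound when a score is maximized. The energy tables, the particular bound, and all names below are assumptions for illustration.

```python
import heapq

def branch_and_bound(self_e, pair_e):
    """Best-first branch and bound over one choice per position.

    self_e[i][r]       -- energy of option r at position i on its own
    pair_e(i, r, j, s) -- interaction energy between options i_r and j_s
    Returns (minimum energy, tuple of chosen options).
    """
    n = len(self_e)

    def lower_bound(assign):
        k = len(assign)
        # Exact energy of the positions assigned so far.
        bound = sum(self_e[i][assign[i]] for i in range(k))
        bound += sum(pair_e(i, assign[i], j, assign[j])
                     for i in range(k) for j in range(i))
        # Optimistic (never too high) estimate for each unassigned position.
        for i in range(k, n):
            best = float("inf")
            for r in range(len(self_e[i])):
                e = self_e[i][r]
                e += sum(pair_e(i, r, j, assign[j]) for j in range(k))
                e += sum(min(pair_e(i, r, j, s) for s in range(len(self_e[j])))
                         for j in range(k, i))
                best = min(best, e)
            bound += best
        return bound

    heap = [(lower_bound(()), ())]
    while heap:
        bound, assign = heapq.heappop(heap)      # 1. subset with the best bound
        if len(assign) == n:
            return bound, assign                 # bound is exact for a full assignment
        i = len(assign)
        for r in range(len(self_e[i])):          # 2. subdivide and bound each subset
            child = assign + (r,)
            heapq.heappush(heap, (lower_bound(child), child))
    return None

if __name__ == "__main__":
    self_e = [[0.0, 1.0], [2.0, 0.5], [1.0, 0.0]]
    pair_e = lambda i, r, j, s: 0.3 if r == s else -0.2
    print(branch_and_bound(self_e, pair_e))
```

Because the bound never overestimates, the first complete assignment taken from the queue is a global minimum. How much of the tree is skipped depends on how tight the bound is.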

Page 20: Protein Design

Rosetta Design

• Initial backbone designed without regard to side-chain packing.
• Iterates between sequence design and backbone optimization using Monte Carlo.
• Each perturbation is either a random change in the torsional angles of 1-5 random residues, or a substitution of the backbone torsional angles of 1-3 consecutive residues with torsional angles from a structure in the PDB. It is followed by side-chain optimization and an accept/reject step using the Metropolis criterion.
• 1.17 Å backbone-atom RMSD between the design model and the crystal structure.

Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science. 2003 Nov 21;302(5649):1364-8. PubMed

Crystal structure of top7 – A novel protein structure created with RosettaDesign.

http://rosettadesign.med.unc.edu/
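The RMSD quoted for the model can be computed from two lists of matched atom coordinates, as in the sketch below. It assumes the two structures are already superimposed. A real comparison would normally perform an optimal superposition first, which is not shown here, and the function name is hypothetical.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between matched atoms of two superimposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    crystal = [(0.0, 0.1, 0.0), (1.5, -0.1, 0.0), (3.0, 0.1, 0.0)]
    print(f"{rmsd(model, crystal):.2f}")
```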

Page 21: Protein Design

Using Rosetta Design

• Red: PDB 1A1M: MHC class I molecule B*5301 complexed with peptide TYPDINQML from Gag protein of HIV2.
• Blue: structure designed with RosettaDesign.
• Visualized with Deep View / Swiss-PdbViewer.

http://www.rcsb.org/pdb/cgi/explore.cgi?pid=195321117535569&pdbId=1A1M

http://us.expasy.org/spdbv/

Page 22: Protein Design

b.e.a.n.s.

• A simple OpenGL-based program was developed to test Monte Carlo and genetic algorithms on designing a "chain of jelly beans."
• The user can vary the initial structure of the "beans" and compare the efficiency of the algorithms via a built-in timer.