Computational Protein Design. 2. Computational Protein Design Techniques

Post on 11-May-2015

TRANSCRIPT

<ul><li>Computational Protein Design. 2. Computational Protein Design Techniques. Pablo Carbonell, pablo.carbonell@issb.genopole.fr. iSSB, Institute of Systems and Synthetic Biology, Genopole, University of Évry-Val d'Essonne, France. mSSB: December 2010.</li></ul> <p>Outline
1 Introduction
2 Computational Protein Descriptors
3 Sequence-based CPD
4 Structure-based CPD
5 Search Algorithms in CPD
6 De Novo Design
7 Challenges in Sequence and Structure-Based CPD

Computational Protein Design

A Blueprint of CPD Approaches
RS: research studies
Molecular Signature Descriptors
- A 2D representation of molecular graphs as undirected colored graphs G(V, E, C), with V: atoms, E: bonds, C: atom types.
- The signature is a systematic codification of the molecular graph [Faulon et al., 2004].
- Atomic signature: the signature descriptor of height h of atom x in the molecular graph G, or ${}^{h}\sigma(x)$, is a canonical representation of the subgraph of G containing all atoms that are at distance h from x.
- The molecular signature is the sum of the atomic signatures:
$$ {}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x) \tag{1} $$

Example: σ(methylcyclopropane) =
1 [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
2 [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
1 [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
1 [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
4 [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
3 [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))

Molecular Signature of Reactions and Proteins
Signature of a reaction. The signature of the reaction
$$ R: S_1 + S_2 + \dots + S_n \rightarrow P_1 + P_2 + \dots + P_m \tag{2} $$
that transforms n substrates into m products is given by the difference between the signature of the products and the signature of the substrates:
$$ {}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s) \tag{3} $$
Signature of protein sequences. The protein P is represented by the linear chain given by its collapsed graph at residue level, a reduced molecular graph representation G(V, E, C) known as the string signature, where V: residues a ∈ A, E: contiguous in sequence, C: amino acid type:
$$ {}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a) \tag{4} $$
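The string signature of Eq. (4) and the reaction signature of Eq. (3) can be sketched in a few lines. This is only an illustration under stated assumptions: the height-h signature of a residue is taken to be the window of up to h neighbours on each side, and molecules are passed in as precomputed counts of atomic signatures. The function names `string_signature` and `reaction_signature` are hypothetical, not part of any signature library.

```python
from collections import Counter

def string_signature(sequence, h):
    """Sum of per-residue signatures along a linear chain.

    Assumption: the signature of height h of a residue is the substring
    holding the residue and up to h neighbours on each side (windows are
    truncated at the chain ends).
    """
    signature = Counter()
    for i in range(len(sequence)):
        window = sequence[max(0, i - h): i + h + 1]
        signature[window] += 1
    return signature

def reaction_signature(substrates, products):
    """Signature of the products minus signature of the substrates.

    Each molecule is a Counter mapping atomic signature -> occurrences.
    Entries that cancel out are dropped from the result.
    """
    total = Counter()
    for molecule in products:
        total.update(molecule)
    for molecule in substrates:
        total.subtract(molecule)
    return {key: count for key, count in total.items() if count != 0}

if __name__ == "__main__":
    print(string_signature("MKVLAA", 1))
```

Signatures that appear on both sides of a reaction cancel, so only the atoms whose environment changes contribute to the result.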
Protein Contact Maps
- The protein contact map is a graph representation of the 3D interactions at residue level G(V, E, C), where V: residues, E: contacts, C: amino acid type.
- Two residues are considered to interact when atoms of the two residues are closer than a predetermined threshold (typically 4.5 to 5 Å).
- Contact maps can account for long-range interactions and conformational states.
Song et al. [2010]

Sequence and Structure-Based CPD
Sequence-based CPD methods are in some cases a good trade-off between complexity of the model and accuracy of the predictions.

Sequence-based Knowledge-based Potentials
- The simplest way to score a protein and to identify active regions is through amino acid scales or indexes.
- AAindex is a database of 544 amino acid indexes, 94 amino acid matrices and 47 amino acid pair-wise contact potentials.
- Examples: hydrophobicity, accessibility, van der Waals volume, secondary structure propensity, flexibility.
- This approach is widely used when analyzing conserved motifs and correlated mutations in protein fold families through multiple alignments.
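Both descriptors above reduce to short computations. The sketch below scores a sequence with one amino acid scale (the Kyte-Doolittle hydropathy scale, chosen here only as a familiar example of the kind of index AAindex collects) and builds a residue-level contact map from atom coordinates. The function names, the window size and the 5 Å default are illustrative assumptions.

```python
import math

# Kyte-Doolittle hydropathy values, used here as one example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def scale_profile(sequence, scale, window=3):
    """Average a per-residue scale over a sliding window centred on each
    residue (windows are truncated at the chain ends)."""
    half = window // 2
    profile = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        profile.append(sum(scale[aa] for aa in segment) / len(segment))
    return profile

def contact_map(residue_atoms, cutoff=5.0):
    """Return the set of residue pairs (i, j), i < j, having at least one
    inter-residue atom pair closer than `cutoff` (same units as the
    coordinates, assumed to be angstroms)."""
    contacts = set()
    for i in range(len(residue_atoms)):
        for j in range(i + 1, len(residue_atoms)):
            if any(math.dist(a, b) < cutoff
                   for a in residue_atoms[i] for b in residue_atoms[j]):
                contacts.add((i, j))
    return contacts

if __name__ == "__main__":
    print(scale_profile("MKVLAAGI", KYTE_DOOLITTLE))
    print(contact_map([[(0, 0, 0)], [(3, 0, 0)], [(10, 0, 0)]]))
```

The contact map is stored as an edge set rather than a dense matrix, which matches the graph view G(V, E, C) used on the slide.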
Quantitative Structure-Activity Relationship (QSAR) Techniques
- The goal is to model causal relationships between structures of interacting molecules and measurable properties of scientific or commercial interest, such as ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of drugs.
- QSAR is a statistical method used extensively by the chemical and pharmaceutical industries in small-molecule and peptide optimization.

QSAR Model Evaluation
- Model predictability is generally evaluated through the leave-one-out (LOO) cross-validation correlation coefficient $q^2$.
- Partial least-squares (PLS) regression is commonly used.
- Additional nonlinear terms can be added through the use of nonlinear regression or machine learning techniques (kernel methods, random forests, etc.).

QSAR Modeling Workflow

The ProSAR Algorithm
- An extension of SAR-based approaches to CPD.
- It formalizes the decision-making process about which mutations to include in combinatorial libraries.
$$ y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij}\, x_{ij} \tag{5} $$
- $y$: the predicted function (activity) of the protein sequence
- $c_{ij}$: the regression coefficients corresponding to the mutational effect of having residue j, among the 20 amino acids A, at position i
- $x_{ij}$: binary variable indicating the presence or absence of residue j at position i

Improving Catalytic Function by ProSAR-driven Enzyme Evolution
- Statistical analysis of protein sequence-activity relationships
- Bacterial biocatalysis of Atorvastatin (Lipitor), a cholesterol-lowering drug
- Codexis Inc.
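The ProSAR sum in Eq. (5) and the leave-one-out $q^2$ can be written out as follows. This is a minimal sketch, not the published ProSAR implementation: coefficients are assumed to be already fitted and stored in a dictionary keyed by (position, residue), $q^2$ is taken in its usual form 1 − PRESS / SS_total, and a one-variable least-squares line stands in for the PLS regression mentioned on the slides. All function names are hypothetical.

```python
def prosar_activity(sequence, coeffs):
    """Eq. (5): y = sum_i sum_j c_ij * x_ij.

    x_ij is 1 only for the residue actually present at position i, so the
    double sum reduces to one coefficient lookup per position. Missing
    (position, residue) keys are treated as a zero effect.
    """
    return sum(coeffs.get((i, aa), 0.0) for i, aa in enumerate(sequence))

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept

def q2_loo(xs, ys, fit, predict):
    """Leave-one-out q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for k in range(len(xs)):
        model = fit(xs[:k] + xs[k + 1:], ys[:k] + ys[k + 1:])
        press += (ys[k] - predict(model, xs[k])) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total

if __name__ == "__main__":
    coeffs = {(0, "A"): 1.0, (1, "G"): -0.5, (1, "L"): 2.0}
    print(prosar_activity("AL", coeffs))
    print(q2_loo([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8], fit_line, predict_line))
```

Passing `fit` and `predict` as arguments keeps the cross-validation loop independent of the regression method, so a PLS or nonlinear model could be substituted without changing `q2_loo`.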
Structure-based CPD
- Energy functions and molecular force fields
- Local conformational restrictions
- Predicting entropic factors
- Protein topological properties
From Narasimhan et al. [2010]

Energy Functions and Molecular Force Fields
- In structure-based CPD, folds are usually represented by the spatial coordinates of the backbone atoms, or design scaffold.
- Protein design is done by placing amino acid side chains along the scaffold.
- Side chains are only permitted to assume a discrete set of statistically preferred conformations: rotamers.
- Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
- These potential energies can then be approximated by using any of the standard force fields: CHARMM, AMBER, GROMOS.

Molecular Force Fields
AMBER: a classical force field for energy and MD calculations:
$$ V(r^N) = \sum_{\text{bonds}} \tfrac{1}{2} k_b (l - l_0)^2 + \sum_{\text{angles}} \tfrac{1}{2} k_a (\theta - \theta_0)^2 + \sum_{\text{torsions}} \tfrac{1}{2} V_n \left[ 1 + \cos(n\omega - \gamma) \right] + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} \right\} \tag{6} $$
1. $\sum_{\text{bonds}}$: energy between covalently bonded atoms.
2. $\sum_{\text{angles}}$: energy due to the geometry of electron orbitals involved in covalent bonding.
3. $\sum_{\text{torsions}}$: energy for twisting a bond due to bond order (e.g. double bonds) and neighboring bonds or lone pairs of electrons.
4. $\sum_{j=1}^{N-1} \sum_{i=j+1}^{N}$: non-bonded energy between all atom pairs: (1) van der Waals energies, (2) electrostatic energies.
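The non-bonded double sum of Eq. (6) is the part most often re-implemented, so a small sketch may help. It is not AMBER code: the function name, the pairwise parameter tables `eps` and `r0`, and the Coulomb prefactor (about 332.06 kcal·Å/(mol·e²), which fixes the unit system to kcal/mol, angstroms and elementary charges) are assumptions made for illustration.

```python
import math

# Assumed value of 1 / (4 * pi * eps0) in kcal * angstrom / (mol * e^2).
COULOMB_K = 332.0637

def nonbonded_energy(coords, charges, eps, r0):
    """Sum van der Waals (12-6) and electrostatic terms over all atom pairs.

    coords  : list of (x, y, z) tuples, in angstroms
    charges : list of partial charges, in units of e
    eps     : eps[i][j], well depth for the pair (i, j), in kcal/mol
    r0      : r0[i][j], distance of the energy minimum for the pair
    """
    n = len(coords)
    energy = 0.0
    for j in range(n - 1):
        for i in range(j + 1, n):
            r = math.dist(coords[i], coords[j])
            ratio = r0[i][j] / r
            van_der_waals = eps[i][j] * (ratio ** 12 - 2.0 * ratio ** 6)
            electrostatic = COULOMB_K * charges[i] * charges[j] / r
            energy += van_der_waals + electrostatic
    return energy

if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
    eps = [[0.0, 0.1], [0.1, 0.0]]
    r0 = [[0.0, 3.5], [3.5, 0.0]]
    print(nonbonded_energy(coords, [0.0, 0.0], eps, r0))
```

With this form of the 12-6 term, two neutral atoms placed exactly at their `r0` separation contribute minus the well depth, which is a convenient sanity check.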
Structure-based Knowledge-based Potentials
They are built by performing a large-scale statistical study of structural databases such as the PDB (Protein Data Bank):
- Rotamer libraries (150 rotameric states)
- Binary patterning: only some types of amino acids are allowed, based on the hydrophobic environment
- An implicit solvation model
- Secondary structure propensity
- Frequency of small segments in the PDB
- Pairwise potentials
- van der Waals interactions
- Hydrogen bonding
- Electrostatics
- Entropy-based penalties for flexible side chains
From Boas and Harbury [2007]

Energy Functions
- Design along the backbone or scaffold
- Rotamer/backbone and rotamer/rotamer interaction energies are tabulated
- Precomputed from molecular force fields: CHARMM, AMBER, GROMOS
Total energy of the protein:
$$ E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \tag{7} $$
- $N$: length of the protein
- $r_k$: the rotamer of the kth side chain
- $E_k(r_k)$: the self-energy of a particular rotamer $r_k$
- $E_{kl}(r_k, r_l)$: the pair energy of rotamers $r_k$, $r_l$

The Role of Dynamics
- Besides protein structure, protein dynamics can play a direct role in molecular recognition.
- Flexible proteins recognize their targets through induced fit or conformational selection, likely showing promiscuity.
- Binding is commonly enthalpy-driven, but in some cases entropy is important, for instance for proteins with multiple binding sites and for small hydrophobic molecules.
- Two types of sources of protein motion: protein flexibility, i.e. intraconformational dynamics (fast time scale motions), and conformational heterogeneity, i.e. interconformational dynamics.
Gibbs free energy:
$$ \Delta G = \Delta H - T \Delta S \tag{8} $$
$$ \Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt} \tag{9} $$
- $\Delta S_{conf}$: conformational entropy of protein and ligand
- $\Delta S_{rt}$: rotational and translational degrees of freedom
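The tabulated-energy formulation of Eq. (7) can be illustrated with a toy example. The sketch assumes that each unordered pair of positions is counted once (one common reading of the sum over k ≠ l) and uses exhaustive enumeration, which is only feasible for very small problems. The table layout and the names `total_energy` and `best_assignment` are hypothetical.

```python
from itertools import product

def total_energy(rotamers, self_e, pair_e):
    """Total energy of one rotamer assignment.

    rotamers : tuple with the chosen rotamer index at each position k
    self_e   : self_e[k][r], self-energy of rotamer r at position k
    pair_e   : dict mapping (k, l), k < l, to a table [r_k][r_l] of pair
               energies; each unordered pair is counted once (assumption)
    """
    energy = sum(self_e[k][r] for k, r in enumerate(rotamers))
    for (k, l), table in pair_e.items():
        energy += table[rotamers[k]][rotamers[l]]
    return energy

def best_assignment(self_e, pair_e):
    """Exhaustively enumerate all rotamer combinations and return the one
    with the lowest total energy (exponential cost, toy sizes only)."""
    choices = [range(len(energies)) for energies in self_e]
    return min(product(*choices),
               key=lambda rot: total_energy(rot, self_e, pair_e))

if __name__ == "__main__":
    self_e = [[1.0, 0.0], [0.5, 0.2]]
    pair_e = {(0, 1): [[0.0, 2.0], [1.0, -1.0]]}
    best = best_assignment(self_e, pair_e)
    print(best, total_energy(best, self_e, pair_e))
```

Because the number of combinations grows exponentially with the number of positions, real design problems replace the enumeration with dedicated search algorithms.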
Predicting Side-chain Dynamics from Structural Descriptors
- The Lipari-Szabo model-free approach quantifies motions from NMR experiments by computing the generalized order parameter $S^2$.
- Protein backbone dynamics: $^{15}$N-H and $^{13}$C-H NMR relaxation methods.
- Protein side-chain methyl dynamics: $^{13}$C-H NMR relaxation methods (side-chain motions in the picosecond-to-nanosecond time regime).
- From the BMRB we compiled $S^2$ data for 18 proteins, including 10 proteins in 2 or more different states: calmodulin, barnase, PDZ, MUP, DHFR, staphylococcal nuclease, Pin1, SH3 domain, MSG.
- This technique only provides measurements for the carbons of methyl groups in side chains: ALA, LEU, ILE, MET, THR, VAL.

Structural Descriptors of Methyl Dynamics
We consider the following parameters influencing side-chain dynamics:
- Packing density at the methyl site i and its neighboring residues j within a sphere of r = 5 Å:
$$ P_i = \sum_{j} C_j\, e^{-r_{ij}} = \sum_{j} \Big( \sum_{k} e^{-r_{jk}} \Big) e^{-r_{ij}} \tag{10} $$
$r_{ij}$ …</p>
