Structure-based biologics protein drug design using BioLuminate
David A. Pearlman, Schrödinger, Inc., Cambridge, MA
Schrödinger Webinar Series, 3 December 2015
Schrödinger biologics
Software Platform (BioLuminate)
End user tool (computational chemists, bench scientists)
Collaborations and contract research
Next generation research (aggregation, immunogenicity, solubility, etc.)
Requires a platform intuitive enough for bench scientists and sophisticated enough for expert users
BioLuminate: A biologics design toolkit
• Comfortable learning curve supports use across disciplines
– Focus on workflows and tasks
– Feature rich
– GUI modeled on PyMOL
Experiment is great, but sometimes it’s not enough
Growing embrace of structure-based computational approaches in biologics discovery
Obtain Protein Structure → Theoretical Calculations → Improved Proteins
Structure-based protein biologics modeling
• Predict changes in stability/affinity
• Binding site/Epitope ID (protein docking)
• In silico affinity maturation / library design
• Antibody humanization
• Liability ID / removal (aggregation, reactivity, immunogenicity, solubility)
• Stabilization (cysteine disulphides)
• Enzyme improvement
• ADC site ID
• Formulation and delivery
Xray or Predicted from Sequence
• Homology modeling
• Specialized antibody prediction tools
Structure-based antibody modeling
Predicting Fv from sequence
Predicting antibody CDR: The H3 loop is difficult
For antibodies, homology models of the L1 through L3, H1 and H2 loops are usually “pretty good”. But H3 is a problem:
              Framework  L1    L2   L3   H1   H2   H3
Length range  -          6-13  3-3  7-8  7-9  5-6  10-13
RMS (Å)       0.9        1.0   0.5  1.4  1.3  1.1  3.3
Based on blinded prediction of 9 antibody structures, using four different “best practice” approaches. Almagro et al. (2011) Proteins 79 3050-3066
• H3 structure is very important
– Antigen recognition
– Frequent mutations during affinity maturation
• But H3 is most problematic
– Rules/homology often don’t work
– Large variation in length (5-26)
• Prime: de novo approach to H3 prediction
– Based on physics + knowledge-based terms
– Friesner lab, Columbia; Jacobson lab, UCSF
• Proven state-of-the-art method for loop prediction
Dealing with H3 loop prediction
• Kai Zhu & Tyler Day: “Ab Initio Structure Prediction of the Antibody Hypervariable Loop”
– Proteins: Struct. Funct. Bioinf. (2013) 81 1081-1089.
• Prediction in native crystal structure
– Remove H3 from crystal coordinates, build H3 de novo
• Benchmark set of 53 structures
– 10 of length 4-6 (short)
– 29 of length 7-11 (medium)
– 14 of length 12-22 (long)
– Sivasubramanian et al. (2009) Proteins 74 497
• Calculations run in BioLuminate program
Prime Applied to Antibody H3 Prediction
Antibody H3 Loop Predictions using Prime
Average RMS deviations from x-ray for H3 (Å)

H3 Loop Length  4-6  7-9  10-11  12-14  >17
Prime*          0.2  0.5  0.5    0.9    3.7
Rosetta±        1.7  1.5  1.7    2.9    4.0

*Zhu & Day Proteins: Struct. Funct. Bioinf. (2013) 81 1081
±Sivasubramanian, Sircar, Chaudhury & Gray (2009) Proteins 74 497
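The RMS deviations quoted in these tables measure how far the predicted loop atoms sit from their x-ray positions. A minimal sketch of that metric follows, assuming the two structures have already been superimposed on the framework and that atoms are listed in the same order. The function name and the coordinates are hypothetical and are not part of BioLuminate or Prime.

```python
import math

def rmsd(predicted, reference):
    """Root-mean-square deviation (Å) between two equally ordered
    lists of (x, y, z) atom coordinates. No superposition is performed."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
    return math.sqrt(squared / len(predicted))

# Hypothetical three-atom loop fragment, for illustration only.
xray = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(f"H3 RMSD = {rmsd(model, xray):.2f} Å")
```

Because every atom in the toy model is displaced by exactly 1 Å, the reported value is 1 Å; real comparisons would use backbone or heavy atoms of the H3 loop.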
Crystal Symmetry Improves Predictions
Average RMS deviations from x-ray for H3 (Å)

H3 Loop Length   4-6  7-9  10-11  12-14  >17
Prime*           0.2  0.5  0.5    0.9    3.7
Prime* with sym  0.2  0.4  0.5    0.8    2.0

*Zhu & Day Proteins: Struct. Funct. Bioinf. (2013) 81 1081
±Sivasubramanian, Sircar, Chaudhury & Gray (2009) Proteins 74 497
The 2nd Blinded Antibody Modeling Assessment (AMA-II) 2013
Volume 82, Issue 8 August 2014
• 7 Participants:
– Schrödinger; CCG; Accelrys; Rosetta (Jeff Gray, Johns Hopkins); Macromoltek; Astellas Pharma + Osaka U; PIGS server
• Predict 10 unpublished structures (4 human Ab, 6 mouse Ab)
• Two stages:
– Stage 1: Predict full Fv from sequence
– Stage 2: Predict H3 given xray coordinates of remainder of structure
Method               Fv RMSD     Framework RMSD  All loops RMSD (excl. H3)  H3 RMSD
Schrödinger          1.1 ± 0.2Å  0.8 ± 0.2Å      1.1 ± 0.4Å                 2.7 ± 0.8Å
Accelrys             1.1 ± 0.3Å  0.9 ± 0.3Å      1.1 ± 0.5Å                 3.0 ± 1.1Å
CCG                  1.1 ± 0.2Å  0.9 ± 0.3Å      1.0 ± 0.3Å                 3.3 ± 0.9Å
Rosetta (Jeff Gray)  1.1 ± 0.2Å  0.8 ± 0.2Å      1.1 ± 0.4Å                 2.6 ± 0.9Å
Macromoltek          1.4 ± 0.2Å  1.2 ± 0.2Å      1.2 ± 0.3Å                 3.0 ± 1.0Å
Astellas + Osaka U   1.1 ± 0.2Å  0.8 ± 0.2Å      1.0 ± 0.2Å                 2.3 ± 0.6Å
PIGS server          1.2 ± 0.1Å  0.9 ± 0.2Å      0.9 ± 0.4Å                 3.1 ± 1.1Å
Average              1.1 ± 0.2Å  0.9 ± 0.2Å      1.1 ± 0.4Å                 2.8 ± 0.9Å
AMA-II : Overall results for Round 1: Full Fv from sequence
• All methods are generally producing decent models
• H3 is the recurrent problem
Method               H3 RMSD (Round 1)  H3 RMSD (Round 2)
Schrödinger          2.7 ± 0.8Å         1.4 ± 1.1Å
Accelrys             3.0 ± 1.1Å         2.3 ± 1.0Å
CCG                  3.3 ± 0.9Å         2.5 ± 1.6Å
Rosetta (Jeff Gray)  2.6 ± 0.9Å         2.1 ± 1.1Å
Macromoltek          3.0 ± 1.0Å         3.3 ± 1.2Å
Astellas + Osaka U   2.3 ± 0.6Å         1.4 ± 1.9Å
PIGS server          3.1 ± 1.1Å         -
Average              2.8 ± 0.9Å         2.2 ± 0.9Å
AMA-II : Overall results for Round 2: Predict H3, given xray structure of remainder of Fv
• Impressive automated prediction using Prime
Blinded H3 predictions: Prime versus other methods
Model  H3 Length  Prime Prediction  Prime rank versus other methods  RMSD of best method (if not Prime)
AM-2   11         3.2               2                                3.0
AM-3   8          0.5               1                                N/A
AM-4   8          1.1               4                                1.0
AM-5   8          3.2               6                                0.9
AM-6   14         3.1               1                                N/A
AM-7   8          0.4               1                                N/A
AM-8   11         1.8               1                                N/A
AM-9   10         0.6               1                                N/A
AM-10  11         1.0               1                                N/A
AM-11  10         0.5               1                                N/A

RMSD distances in Å
Legend: Best in competition; Not best, but very close; Miss
Blinded H3 predictions: Loop lookup versus Prime
Model    H3 Length  Best in Database  Homology Prediction  Prime Prediction
AM-2     11         1.7               4.3                  3.2
AM-3     8          0.9               1.5                  0.5
AM-4     8          0.7               2.2                  1.1
AM-5     8          1.0               2.4                  3.2
AM-6     14         2.6               3.1                  3.1
AM-7     8          1.5               2.3                  0.4
AM-8     11         1.7               3.3                  1.8
AM-9     10         0.7               1.9                  0.6
AM-10    11         1.5               2.8                  1.0
AM-11    10         0.4               2.6                  0.5
Average             1.3               2.7                  1.4

RMSD distances in Å
Legend: Prime prediction better than ANY loop model in database
BioLuminate provides tools for antibody humanization
Automated framework replacement
Homology + mutation calculations to optimize loop region
• Aggregation / viscosity
• Immunogenicity
• Solubility
• Post-translational modifications (glycosylation, etc.)
• Reactive hot spots
• Thermal stability
• They can’t “fix it in formulation”
• → Build out liabilities without destroying affinity or stability
• Need to calculate affinity & stability changes with sequence
You might be a biologics designer if…
…the following keep you up at night…
ID liabilities → Evaluate possible mutations → Select and create mutants
It is critical to engineer out protein liabilities early
• Aggregation/Viscosity propensity
• Hotspots: Deamidation, oxidation, glycosylation, proteolysis
• Immunogenicity
• Solubility
• Thermal stability
• IP avoidance
• Calculate energy changes
– Affinity
– Stability
• Mutations:
– Remove liabilities
– Retain affinity
– Retain stability
Liability ID using BioLuminate
Aggregation hot spots
Reactive residue ID
Immunogenicity prediction
Titration curve / Isoelectric point
• How does this residue change affect:
– Stability
– Affinity (to other molecules)
Removing Liabilities: Residue mutation studies
ABC → AXC ?
• For liability reduction:
– Mutate out liability
– Maintain stability
– Maintain affinity (if applicable)
• Empirical scoring methods
– Approximate
– Fast
– Can only predict parameterized moieties
• MM-GBSA
– Approximate (implicit solvent)
– Fast (< 1 minute per calculation)
• FEP (Free Energy Perturbation)
– Precise (mean unsigned error ~1 kcal/mol)
– Computationally more expensive, explicit solvent
– ~1-2 calculations per GPU processor/day
• Requires a huge amount of conformational sampling
Predicting free energy changes: Affinity/Stability of Residue A to Residue B
ABC → AXC ?
Physics-based methods (Can predict non-standard AA)
Free energy calculations (ΔG)
Question: What is ΔG between two structurally similar molecules?
Method:
Molecular Dynamics → thermodynamic ensemble of conformations → statistical mechanics → ΔG
Result: changes in affinity/binding and stability
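One textbook way in which statistical mechanics converts an MD ensemble into ΔG is the Zwanzig exponential-averaging relation, ΔG = −kT ln⟨exp(−ΔU/kT)⟩, where ΔU is the energy difference between the two end states evaluated on snapshots of the reference ensemble. The sketch below assumes that estimator and uses hypothetical ΔU samples. The slides do not state which estimator the production FEP workflow uses, so this is illustrative only.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def zwanzig_delta_g(delta_u, temperature=298.15):
    """Free energy difference (kcal/mol) from per-snapshot energy
    differences delta_u (kcal/mol) via exponential averaging."""
    if not delta_u:
        raise ValueError("need at least one sample")
    kt = K_B * temperature
    # Shift by the minimum so the exponentials stay well scaled.
    shift = min(delta_u)
    mean_boltzmann = sum(math.exp(-(du - shift) / kt) for du in delta_u) / len(delta_u)
    return shift - kt * math.log(mean_boltzmann)

# Hypothetical energy differences from five snapshots (kcal/mol).
samples = [0.8, 1.1, 0.9, 1.3, 1.0]
print(f"ΔG ≈ {zwanzig_delta_g(samples):.2f} kcal/mol")
```

The estimate always lies between the smallest sample and the plain average of ΔU, which is why low-energy snapshots dominate and why extensive conformational sampling matters.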
FEP: Technologies Facilitate a Robust Solution
Improved force field: OPLS3
Enhanced sampling: REST
Hardware acceleration: GPU
Automated setup: FEP Mapper
Error estimates: Cycle Closure
Faster computers and GPU blast off
CPU speeds during FEP era increased by ~8000x
GPU: 50-100x through parallelism
Schrödinger’s GPU Cluster
400 GPUs: Two 200 GPU racks
Roughly the same processing power as the total of every home PC in the USA in 1987
Year  Relative amount sampling  How long would it take?   Sufficient for accuracy?
1987  1x                        1 month on supercomputer  No
2015  3000x                     1 day on GPU              Yes
FEP for affinity validated for small molecule ligands
[Figure: scatter plot of ΔG FEP (kcal/mol) versus ΔG Expt. (kcal/mol), both axes from -15 to -4, for BACE, CDK2, JNK1 and MCL1]
[Figure: histogram of |ΔΔG FEP – ΔΔG Expt.| (kcal/mol)]

|ΔΔG FEP – ΔΔG Expt.| (kcal/mol)  Percentage
< 0.6                             46.2%
0.6-1.2                           24.8%
1.2-1.8                           15.4%
1.8-2.4                           7.4%
> 2.4                             6.2%

• Over 500 perturbations tested for 17 systems w/ identical automated protocol
– RMSE ≈ 1.2 kcal/mol
Wang et al. (2015) JACS 137 2695-2703
What does RMSE ≈ 1.2 kcal/mol mean for predictions?
• The probability the sign of ∆∆G is correct depends on the size of |∆∆G|
Derived from Shirts, M. R., Mobley, D. L., & Brown, S. P. (2010) Free-energy calculations in structure-based drug design.

Predicted |ΔΔG| (kcal/mol)  Probability predicted sign agrees with experiment
0.6                         69.1%
1.2                         84.1%
1.4                         87.8%
1.8                         93.3%
2.4                         97.7%
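The tabulated probabilities are consistent with a simple error model: if the prediction error is assumed to be Gaussian with zero mean and a standard deviation equal to the RMSE (1.2 kcal/mol), the chance that the experimental sign matches is the standard normal CDF evaluated at |ΔΔG|/RMSE. The sketch below works through that arithmetic; the Gaussian assumption and the function name are illustrative, not taken from the slides.

```python
import math

def sign_agreement_probability(predicted_ddg, rmse=1.2):
    """Probability that the experimental ΔΔG has the same sign as the
    prediction, assuming zero-mean Gaussian error with std dev = rmse."""
    z = abs(predicted_ddg) / rmse
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

for ddg in (0.6, 1.2, 1.4, 1.8, 2.4):
    probability = sign_agreement_probability(ddg)
    print(f"|ΔΔG| = {ddg:.1f} kcal/mol -> {100 * probability:.1f}%")
```

Under this model a predicted change of one RMSE (1.2 kcal/mol) is a one-sigma event, so the sign is right about 84% of the time, and confidence rises quickly for larger predicted changes.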
• 7 projects
• Protein:small molecule ligand binding
• 158 prospective FEP predictions
• Experimentally tested AFTER prediction
• 111 predictions within 1 log unit
• Only 9 predictions off by > 2 log units
FEP validated for small molecule ligands: Performance in Prospective Drug Discovery Collaborations
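To relate the "log unit" thresholds to the kcal/mol errors quoted elsewhere in the talk, recall that a tenfold change in binding constant corresponds to RT ln 10. The sketch below assumes a temperature of 298.15 K, which the slides do not state, and uses the prediction counts given above.

```python
import math

R = 0.0019872041  # gas constant in kcal/(mol*K)

def log_units_to_kcal(log_units, temperature=298.15):
    """Convert a difference in log10 of a binding constant to kcal/mol."""
    return R * temperature * math.log(10.0) * log_units

# Counts quoted on the slide.
total, within_one, off_by_two = 158, 111, 9

print(f"1 log unit  = {log_units_to_kcal(1):.2f} kcal/mol")
print(f"2 log units = {log_units_to_kcal(2):.2f} kcal/mol")
print(f"within 1 log unit:    {100 * within_one / total:.1f}%")
print(f"off by > 2 log units: {100 * off_by_two / total:.1f}%")
```

At this temperature one log unit is roughly 1.4 kcal/mol, so "within 1 log unit" is a slightly looser criterion than the ~1.2 kcal/mol RMSE reported for the retrospective benchmark.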
How do we calculate free energy changes for affinity?
ΔΔG_binding = ΔG_1 – ΔG_2 = ΔG_A – ΔG_B
[Thermodynamic cycle diagram: vertical processes 1 and 2, horizontal processes A and B]
• Experimental: Measure vertical processes 1 and 2.
• Theoretical: Calculate horizontal processes A and B
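The equality between the experimental and theoretical routes follows from free energy being a state function: going around the closed cycle must sum to zero. The sketch below checks this with hypothetical free energies for the four corners. It assumes process 1 is binding of the wild type, process 2 is binding of the mutant, process A is the mutation in the unbound state and process B is the mutation in the bound state; the slide does not spell out this labelling.

```python
# Hypothetical free energies (kcal/mol) of the four corners of the cycle.
G = {
    ("wild-type", "unbound"): 0.0,
    ("wild-type", "bound"): -10.0,
    ("mutant", "unbound"): 3.0,
    ("mutant", "bound"): -8.5,
}

# Vertical legs (measured experimentally): binding of each variant.
dG_1 = G[("wild-type", "bound")] - G[("wild-type", "unbound")]
dG_2 = G[("mutant", "bound")] - G[("mutant", "unbound")]

# Horizontal legs (computed, e.g. by FEP): the mutation in each state.
dG_A = G[("mutant", "unbound")] - G[("wild-type", "unbound")]
dG_B = G[("mutant", "bound")] - G[("wild-type", "bound")]

ddG_experimental_route = dG_1 - dG_2
ddG_computed_route = dG_A - dG_B
print(ddG_experimental_route, ddG_computed_route)
```

Whatever corner values are chosen, the two routes agree, which is why the computationally tractable horizontal legs can stand in for the physically slow binding events.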
FEP delta affinity results across all protein systems
223 mutations R2 = 0.76 Mean Unsigned Err = 1.7 kcal/mol Slope = 0.98
158 mutations R2 = 0.52 Mean Unsigned Err = 2.8 kcal/mol Slope = 1.08
How do we calculate free energy changes for stability?
ΔΔG_stability = ΔG_1 – ΔG_2 = ΔG_A – ΔG_B
[Thermodynamic cycle diagram: vertical processes 1 and 2, horizontal processes A and B]
• Experimental: Measure vertical processes 1 and 2.
• Theoretical: Calculate horizontal processes A and B
Protein stability predictions using FEP
Applied to systems from Fold-X Test Set

System                      PDB ID  # Mutations  R2-value      MUE        ΔΔG Sign correct
T4-Lysozyme                 2LZM    66           0.67          1.2        92%
Human Lysozyme              1REX    45           0.66          1.3        80%
Peptostrept. Magn. Prot. L  1HZ6    44           0.59          1.1        89%
B1 IG binding protein G     1PGA    24           0.37          1.1        79%
Fibronectin II domain       1TEN    32           0.33* / 0.68  1.6 / 1.3  88% / 93%
FK506 BP                    1FKB    27           0.4           1.6        85%
All                                 238          0.57          1.2        87%

Errors in kcal/mol; *: Result strongly affected by terminal outliers
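For readers who want to reproduce the three columns of this table on their own data, the sketch below computes R2, mean unsigned error (MUE) and the fraction of mutations with the correct ΔΔG sign. It assumes R2 means the squared Pearson correlation, which the slide does not define, and the ΔΔG values are hypothetical.

```python
def evaluate(predicted, experimental):
    """Return (R2, MUE, fraction with correct sign) for paired ΔΔG values."""
    n = len(predicted)
    if n != len(experimental) or n < 2:
        raise ValueError("need two equally sized lists with at least 2 values")
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    r_squared = cov ** 2 / (var_p * var_e)  # squared Pearson correlation
    mue = sum(abs(p - e) for p, e in zip(predicted, experimental)) / n
    same_sign = sum(1 for p, e in zip(predicted, experimental) if p * e > 0)
    return r_squared, mue, same_sign / n

# Hypothetical ΔΔG values in kcal/mol, for illustration only.
pred = [1.2, -0.5, 2.8, 0.4, -1.1]
expt = [0.9, 0.3, 2.0, 1.0, -0.7]
r2, mue, sign_fraction = evaluate(pred, expt)
print(f"R2 = {r2:.2f}, MUE = {mue:.2f} kcal/mol, sign correct = {100 * sign_fraction:.0f}%")
```

The sign metric treats a mutation as correctly classified only when prediction and experiment are both stabilizing or both destabilizing, so values of exactly zero count as misses in this sketch.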
FEP performance compared to other methods
Software     R2-value achieved*  Stabilizing/destabilizing % correct
CC/PBSA      0.31                79%
EGAD         0.35                71%
FoldX        0.25                70%
Hunter       0.20                69%
I-Mutant2.0  0.29                78%
Rosetta      0.07                73%
FEP          0.57                87%

• FEP: Appreciably better R2
• FEP: Better correct stabilizing/destabilizing classification
• (Non FEP results from Potapov, 2009, Prot. Eng. Des. Sel., 22, 553)
Introducing mutations to create stable disulphide bonds
• ID residues close enough to be candidates
• Apply weighted scoring function
– A) Function inferred from geometries in 11399 PDB structures, plus
– B) Implicit solvent energy function (MM-GBSA)
• Create triaged list of mutations likely to create stable disulphides
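The first step, finding residue pairs close enough to bridge, can be sketched as a simple distance filter on Cβ atoms. The 4.5 Å cutoff, the sequence-gap filter, the function name and the coordinates below are all illustrative assumptions; the slides do not give the criteria BioLuminate uses, and the weighted scoring stage is not reproduced here.

```python
import math
from itertools import combinations

def disulphide_candidates(cb_coords, max_cb_distance=4.5, min_seq_gap=2):
    """Return residue pairs whose Cβ atoms lie within max_cb_distance (Å).

    cb_coords maps residue number -> (x, y, z) of its Cβ atom.
    Both the cutoff and the sequence-gap filter are illustrative assumptions.
    """
    pairs = []
    for (res_i, xyz_i), (res_j, xyz_j) in combinations(sorted(cb_coords.items()), 2):
        if res_j - res_i < min_seq_gap:
            continue  # skip residues adjacent in sequence
        distance = math.dist(xyz_i, xyz_j)
        if distance <= max_cb_distance:
            pairs.append((res_i, res_j, round(distance, 2)))
    return sorted(pairs, key=lambda pair: pair[2])  # closest pairs first

# Hypothetical Cβ coordinates (Å) for four residues.
cb = {
    12: (0.0, 0.0, 0.0),
    13: (3.8, 0.0, 0.0),
    57: (0.0, 4.0, 0.0),
    90: (10.0, 10.0, 10.0),
}
print(disulphide_candidates(cb))
```

In this toy input only residues 12 and 57 survive: 12 and 13 are close but adjacent in sequence, and every other pair is farther apart than the cutoff. A real workflow would pass the surviving pairs to the scoring step described above.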
Application of cys scanning to find stable construct
LPA + ONO agonist. Not stable enough to be crystallized
ID 5 sets of mutation candidates to introduce new disulphide using BioLuminate
(Asp204, Val282) -> (Cys204…Cys282) works
Substantially better thermal stability
Crystals suitable for a structure
Other BioLuminate features…
• Automated protein-peptide docking: I Tubert-Brohman, W Sherman, M Repasky & T Beuming (2013) J. Chem. Inf. Model. 53 1689-1699
• Protein crosslink design: Design peptide linkers of any composition between protein domains
• Non-standard amino acids: Non-standard amino acids supported in residue scanning/affinity maturation
• Protein quality report: Automatically evaluates structure quality on a large number of criteria
• Interactive Protein Interaction Table: Quickly ID, characterize and visualize residues at protein interface
• Protein quality visualizer: Identify structural problems and visualize them in the workspace
• Homology Modeling: Homology modeling, chimeric models, advanced loop modeling
• Sequence viewer: Advanced sequence viewer (work in progress)
• BioLuminate offers an extensive set of foundation tools that facilitate
– A suitably low learning curve for new users and non-expert users
– Collaboration
– Advanced exploration of new approaches to fundamental problems in biologics
• BioLuminate features first-in-class approaches to
– Antibody modeling
– Protein engineering, including cysteine scanning and FEP
Future work…
Acknowledgements
Antibody predictions
• Kai Zhu
• Tyler Day
• Dora Warshaviak
• Colleen Murrett
• Richard Friesner
Protein FEP Calculations
• Thomas Steinbrecher
• Fiona McRobb
• Jeffrey Sanders
Drug discovery collaborations
• Woody Sherman
• Robert Abel
Immunogenicity
• Tyler Day
Aggregation patch analyzer
• Johannes Maier