TRANSCRIPT
Friedrich Rippmann, Computational Chemistry & Biology, Merck KGaA, Darmstadt, Germany
AI-PI, San Francisco, 27 February 2019
Turning methodological progress into operational benefit
Practical impact of AI on drug discovery
Merck KGaA, Darmstadt, Germany is the oldest pharmaceutical and chemical company worldwide
1668: Friedrich Jacob Merck (1621-1678) buys the Angel Pharmacy (Engel-Apotheke)
1827: Emanuel Merck (1794-1855) starts production on an industrial scale
North America / Rest of the world
1914 WWI
Deep Learning & Machine Learning are at “Peak” Hype
See: Gartner Hype Cycle 2017, gartner.com
So: where is the benefit?
Text Mining for Competitive Intelligence
HTS evaluation
Potential AI contributions to the Drug Discovery Process
HTS Image Analysis
Side effect prediction
HTS deconvolution
Cpd discovery & optimization without HTS
Cpd optimization
Pre-HTS supplementation by Virtual Screening
Drug/cpd repositioning/repurposing
Computer-aided synthesis
cryoEM and tomography image analysis
Patient selection
Main benefit, so far, is in Predictive Models, based on advanced Machine Learning
Applications are numerous (next slides)…
Where is the benefit of AI?
Predictive models must be easy to generate
Deep Learning applied in Chemoinformatics
Collaboration is key
• Research collaboration with leader in the field of deep learning (Prof. Sepp Hochreiter, Uni Linz, winner of Tox21 challenge)
Neural Network: image recognition, activity prediction
Deep learning technology
Machinery known from image recognition
Applied to drug discovery
Generation of Predictive Models: example kinase models to predict selectivity
Achievements so far
• 277 novel kinase models generated
• Data basis: 4,800 compounds measured in 277 kinase assays
high predictivity
good predictivity
reasonable predictivity
36 122 200
What’s in it for the chemist?
• Prediction of kinase selectivity for newly designed molecules
Benefit: in silico profiling of compounds & cpd. ideas
Neural Networks & Hyperparameters
NN-Architectures
• Layer-Type
• Number of Layers
• Neurons per Layer
• Activation-Functions
Training-Parameters
• Optimiser
• Learning-Rate
• Weight-Decay
• Batch-Size
• Loss-Function
• …
Hyperparameters: millions of unique combinations possible
Genetic Algorithm for Hyperparameter Optimisation
(Diagram: genetic algorithm workflow with steps 1, 2, 3, 4, 5.1 and 5.2)
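A minimal sketch of how a genetic algorithm could search such a hyperparameter space. The slides name the hyperparameter categories but not their values, so the search space, the population settings and `toy_fitness` are all hypothetical; in a real run the fitness would be a validation score obtained by training a network with the given settings.

```python
import random

# Hypothetical search space; the slides list categories, not concrete values.
SPACE = {
    "n_layers": [1, 2, 3, 4],
    "neurons": [128, 256, 512, 1024],
    "activation": ["relu", "selu", "tanh"],
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [32, 64, 128],
}

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each hyperparameter is taken from one of the two parents.
    return {k: rng.choice([a[k], b[k]]) for k in SPACE}

def mutate(ind, rng, rate=0.2):
    # Each hyperparameter is resampled with probability `rate`.
    return {k: (rng.choice(SPACE[k]) if rng.random() < rate else v)
            for k, v in ind.items()}

def evolve(fitness, generations=10, pop_size=12, n_elite=4, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:n_elite]  # the best individuals survive unchanged
        children = [mutate(crossover(*rng.sample(elite, 2), rng), rng)
                    for _ in range(pop_size - n_elite)]
        pop = elite + children
    return max(pop, key=fitness)

def toy_fitness(ind):
    # Stand-in for a real validation score (hypothetical).
    return (-abs(ind["n_layers"] - 3)
            - abs(ind["neurons"] - 512) / 512
            + (ind["activation"] == "selu"))

best = evolve(toy_fitness)
```

Because the elite is carried over unchanged, the best score in the population never decreases from one generation to the next.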
Influence of Hyperparameters
Legend for "1_activation (344)": "1" is the first hidden layer, "activation" is the activation function of this layer, and "(344)" is the number of contributing pairs
Contributing pairs only differ by the shown parameter
Boxplots are based on the absolute difference of both inner-kappa values of all contributing pairs
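The pairing described in the legend can be sketched as follows. This is an illustration under assumptions: the run records, the parameter names other than `1_activation`, and the kappa values are invented, and every run is assumed to carry the same set of configuration keys.

```python
from itertools import combinations

def contributing_pairs(runs, param):
    """Return |kappa_a - kappa_b| for all run pairs that differ only in `param`."""
    diffs = []
    for a, b in combinations(runs, 2):
        ca, cb = a["config"], b["config"]
        differing = [k for k in ca if ca[k] != cb[k]]
        if differing == [param]:
            diffs.append(abs(a["kappa"] - b["kappa"]))
    return diffs

# Hypothetical training runs with their inner-kappa scores.
runs = [
    {"config": {"1_activation": "relu", "lr": 0.001}, "kappa": 0.61},
    {"config": {"1_activation": "selu", "lr": 0.001}, "kappa": 0.66},
    {"config": {"1_activation": "relu", "lr": 0.01}, "kappa": 0.55},
    {"config": {"1_activation": "selu", "lr": 0.01}, "kappa": 0.58},
]
diffs = contributing_pairs(runs, "1_activation")
```

Here two of the six possible pairs differ only in the first-layer activation function, so `diffs` holds two values; a boxplot per parameter would then be drawn from lists like this one.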
User-Interface
Benefit: semi-automatic generation of high-quality Machine Learning models
Predictions must be easy to access
MOCCA: Merck Online Computational Chemistry Analyzer
Available models
Global models: PhysChem, Kinase selectivity model
Project-specific models
Great variety, e.g.
- local hERG models
- AOX
- Biliary excretion (literature)
- Various target-specific activity models
- Nucleus permeability
- ... many more
Benefit: predicted bad compounds will NOT be made
Reduction of assay requests (2 months after discontinuation of comprehensive assaying of ALL newly synthesized compounds)
(Bar chart, y-axis 0 to 100)
Assay   Change in requests
log P   -86%
log D   -30%
Ksol    -19%
There is potential for further reductions (especially for kinetic solubility)
• Biological assays have sizeable experimental variability
• Re-testing triggered automatically if reliable in silico prediction differs from experimental result
• Leads to better data quality, and subsequently to better models
• Virtuous circle: in silico models improve in vitro models and vice versa
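A minimal sketch of such a re-test trigger. The tolerance, the reliability flag and the record layout are hypothetical; the slide does not state how disagreement or reliability is defined.

```python
def needs_retest(predicted, measured, reliable, tolerance=0.5):
    """Flag a measurement when a reliable prediction disagrees with it
    by more than `tolerance` (both values in the same units)."""
    return reliable and abs(predicted - measured) > tolerance

# Hypothetical assay results with in silico predictions.
results = [
    {"id": "cpd-1", "predicted": 1.2, "measured": 1.3, "reliable": True},
    {"id": "cpd-2", "predicted": 0.4, "measured": 1.9, "reliable": True},
    {"id": "cpd-3", "predicted": 0.4, "measured": 1.9, "reliable": False},
]
retest_queue = [r["id"] for r in results
                if needs_retest(r["predicted"], r["measured"], r["reliable"])]
```

Only `cpd-2` is queued: `cpd-1` agrees with its prediction, and the prediction for `cpd-3` is not considered reliable enough to question the experiment.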
Automatic re-testing improves data quality
Benefit: improvement of data quality & better models
Applied In Vitro Toxicology
Building In Silico Models to Trigger Retesting: A Strategy on How to Use Predictive Models to Identify Potentially Incorrect In Vitro Intrinsic Clearance Results
Fabian P. Steinmetz, Carl Petersson, Ugo Zanelli, Paul Czodrowski
Published Online: 9 Nov 2018, https://doi.org/10.1089/aivt.2018.0018
Supporting complex analyses by AI => achieving objectiveness
Big Data waiting to be analyzed by AI
HTSEval: millions of molecules, thousands of actives, a lot of additional information
Current status: Standardized, half-automated analysis of HTS runs
Future status: Fully automated analysis of HTS runs (achieving OBJECTIVENESS) – but final selection remains with the chemist!
Benefit: objective series generation and prioritization
Expert Systems supported by AI
Simulation of (human) Dose and Clearance
Assessment of PK parameters in early project phases (HD, HO, LO, ED)
automated data retrieval
standardized (mechanistic) estimation of missing data
(human) Clearance extrapolation with visual confidence check
easy comparison of compound profiles
interactive parameter modeling
“AI inside” in a complex Expert System
Huge compound libraries based on feasible reactions
(Venn diagram of ELAB and REAXYS; values shown: 70008, 571423, 21459, 23%)
Merck AcceSSible InVentory
BUILDING BLOCKS
CHEMICAL REACTIONS LOOK-UP
Tailored libraries
MASSIV space
look-up space (10^20 per reference)
…
10^20
in silico synthesis
(Bar chart: # of CLASSCODES (total, singular, unique) at BROAD, MEDIUM and NARROW levels for ELAB and REAXYS; y-axis 0 to 1400000)
novel chemical matter
10^6
10^4
10^20
Benefit: access to large novel compound spaces
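To illustrate why combining building blocks with a reaction look-up yields very large spaces, here is a toy enumeration. The building block identifiers, functional group classes and reaction names are hypothetical placeholders; a real system would work on chemical structures and reaction rules rather than string labels.

```python
from itertools import product

# Hypothetical building blocks, grouped by reactive functional group.
building_blocks = {
    "amine": ["A1", "A2", "A3"],
    "carboxylic_acid": ["C1", "C2"],
    "aryl_halide": ["H1", "H2"],
    "boronic_acid": ["B1"],
}

# Hypothetical reaction look-up: reaction name -> the two reactant classes it joins.
reactions = {
    "amide_coupling": ("amine", "carboxylic_acid"),
    "suzuki_coupling": ("aryl_halide", "boronic_acid"),
    "buchwald_amination": ("amine", "aryl_halide"),
}

def enumerate_products(blocks, rxns):
    """Yield one virtual product per reaction and compatible pair of building blocks."""
    for name, (class_a, class_b) in rxns.items():
        for a, b in product(blocks[class_a], blocks[class_b]):
            yield (name, a, b)

virtual_library = list(enumerate_products(building_blocks, reactions))
```

With 8 building blocks and 3 one-step reactions this already gives 14 virtual products; the count grows multiplicatively with the size of the inventory and with every additional reaction step.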
Accurate prediction of binding constants
Technological advances allow for application of FEP in an industry setting
GPU is 100x faster than CPU
In-house cluster with 110 GPUs
FEP is based on molecular dynamics simulations with a detailed energy function, full flexibility and explicit solvent.
FEP prediction: Large-scale prospective benchmarking
Predictive performance evaluated over 10 validated projects
FEP                   validation   prediction
RMSE [kcal/mol]       1.05         1.83
R2                    0.78         0.35
% within 1 kcal/mol   69           47
% within 2 kcal/mol   93           74
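The benchmark statistics in the table can be computed from paired predicted and experimental binding free energies as sketched below. The example values are invented, and R2 is taken here as the squared Pearson correlation, which is an assumption; the slide does not say which definition was used.

```python
import math

def fep_metrics(predicted, experimental):
    """RMSE, squared Pearson correlation and share of predictions within 1 and 2 kcal/mol."""
    n = len(predicted)
    errors = [p - e for p, e in zip(predicted, experimental)]
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    r2 = cov ** 2 / (var_p * var_e)

    def pct_within(threshold):
        return 100.0 * sum(abs(err) <= threshold for err in errors) / n

    return {"rmse": rmse, "r2": r2,
            "pct_within_1": pct_within(1.0), "pct_within_2": pct_within(2.0)}

# Hypothetical binding free energies in kcal/mol.
predicted = [-9.1, -8.4, -10.2, -7.0]
experimental = [-9.5, -8.0, -8.9, -7.2]
metrics = fep_metrics(predicted, experimental)
```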
FEP+ in drug discovery at Merck
10 validated targets
>25,000 perturbations
5,000 final predictions
360 compounds synthesized
25 evaluated targets
Broad application across multiple targets and series
Integration of AI Tools for Compound Optimization
Merck Example
1 MASSIV: Enumeration of synthetically accessible chemical space
2 MOCCA: Application of predictive models based on Deep Learning
3 FEP: Binding constant prediction
4 CHEMATICA/SYNTHIA: Retrosynthetic evaluation & prioritization
Virtual Screening as 1st filter
Manual inspection
Benefit: gaining speed in cpd. optimization
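The integrated workflow can be read as a filter cascade in which each tool narrows the candidate set for the next, more expensive one. The sketch below assumes precomputed scores per candidate; the field names and cut-offs are hypothetical and do not come from the slides.

```python
def run_cascade(candidates, stages):
    """Apply (name, keep_fn) stages in order; return survivors and the count after each stage."""
    counts = [("start", len(candidates))]
    for name, keep in stages:
        candidates = [c for c in candidates if keep(c)]
        counts.append((name, len(candidates)))
    return candidates, counts

# Hypothetical enumerated candidates with scores standing in for each tool's output.
candidates = [
    {"id": "v1", "ml_score": 0.9, "fep_ddg": -1.5, "synth_steps": 3},
    {"id": "v2", "ml_score": 0.4, "fep_ddg": -2.0, "synth_steps": 2},
    {"id": "v3", "ml_score": 0.8, "fep_ddg": 0.3, "synth_steps": 4},
    {"id": "v4", "ml_score": 0.7, "fep_ddg": -0.8, "synth_steps": 9},
]
stages = [
    ("predictive models", lambda c: c["ml_score"] >= 0.6),
    ("FEP", lambda c: c["fep_ddg"] < 0.0),
    ("retrosynthesis", lambda c: c["synth_steps"] <= 5),
]
survivors, counts = run_cascade(candidates, stages)
```

Ordering the stages from cheapest to most expensive keeps the costly calculations for the few candidates that pass the earlier filters.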
Mega trend: Sharing
Federated and privacy-preserving learning, multi-task learning across partners
activity data and corresponding models remain under respective owner control
(Diagram: compounds, with assays from partner 1, partner 2 and partner 3)
IMI2 Call 14
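The slides do not say which federated algorithm is used. As a generic illustration only, the sketch below uses plain federated averaging on a one-parameter least-squares model: each partner updates the shared weight on its own data, and only the updated weight is exchanged. The partner data are invented, and privacy mechanisms such as secure aggregation are not shown.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """A partner refines the shared weight on its private (x, y) data; the data never leave."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w, partners):
    """Average the partners' locally updated weights, weighted by data set size."""
    updates = [(local_update(w, data), len(data)) for data in partners]
    total = sum(n for _, n in updates)
    return sum(wi * n for wi, n in updates) / total

# Three hypothetical partners whose private data all follow y = 2x.
partners = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0), (3.0, 6.0)],
    [(2.5, 5.0)],
]
w = 0.0
for _ in range(20):
    w = federated_round(w, partners)
```

After a few rounds `w` approaches 2.0 although no partner has seen another partner's data points.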
What’s in it for NBEs?
Antibody Hit Discovery
From Diversity to Candidate Hits
(Figure labels: Functionality; Affinity; Competition/MoA; Diversity & Functionality; Binding & Selectivity)
NCE Hit Optimization
Diversity and Potency
Higher Potency
Hit
(Figure labels: Nearby Chemical Space; Distant Chemical Space; Theoretical Chemical Space; High / Low; Non-binders; No pathway; Too unstable; Too large)
Antibody Affinity Maturation
Diversity and Affinity
Higher Affinity
Hit 1
(Figure labels: Nearby Sequence Space; Distant Sequence Space; Theoretical Sequence Space; High / Low; Non-binders; Too large; Non-canonical; Too unstable)
• Deep sequencing for heavy & light genes
• In silico pipeline for antibody V-gene annotation & clustering by sequence
Capturing diversity to improve affinity
Is there a higher affinity variant in the same cluster as the reference hit?
IGHV3-30
Reference 1
Reference 2
IGHV4-31
Cluster   Type               VH variants   VL variants   VHi:VLj variant pairs
X         Hybridoma Fusion   9             17            152
Y         Spleen+LN          66            20            1320
Z         Spleen+LN          156           37            5772
testable
test of patience
Capturing diversity to improve affinity
The modular nature of antibodies results in additive effects on affinity
VH variant: 5 fold better
VL variant: 2 fold better
VH/VL variant: 10 fold better (5x and 2x combined)
Vanita D. Sood - CONFIDENTIAL | 15.03.2018
The additivity model shows good correlation between calculated and measured affinities
• Experimental determination of kd of native pairings
Cluster   Type        VH variants   VL variants   VHi:VLj variant pairs   VHi:VLN, VHN:VLj native pairs
Y         Spleen+LN   66            20            1320                    85
Z         Spleen+LN   156           37            5772                    192
(Figure: Predicted Fold Affinity Change of VHi:VLj; panels Anti-Y and Anti-Z; labels VHN:VLN, VHi:VLN, VLN : VLj)
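A minimal sketch of the additivity model and of the measurement saving it implies. The function names are hypothetical. The assumption that each chain variant is measured once against the native partner, with the native pair itself counted once, is an interpretation; it is consistent with the pair counts given for clusters Y and Z.

```python
def predict_pair_fold(vh_fold, vl_fold):
    """Additivity model: fold changes of independent VH and VL variants multiply
    (equivalently, their logarithms add)."""
    return vh_fold * vl_fold

def n_native_pairings(n_vh, n_vl):
    """Pairings to measure when every variant chain is paired with the native partner chain."""
    return n_vh + n_vl - 1

def n_variant_pairs(n_vh, n_vl):
    """All VHi:VLj combinations whose affinity can then be predicted."""
    return n_vh * n_vl

fold = predict_pair_fold(5, 2)
measured_y, predicted_y = n_native_pairings(66, 20), n_variant_pairs(66, 20)
measured_z, predicted_z = n_native_pairings(156, 37), n_variant_pairs(156, 37)
```

For cluster Y this means 85 measured pairings stand in for 1320 combinations, and for cluster Z 192 stand in for 5772.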
Capturing diversity to improve affinity
The additivity model applied to nearby sequence space
testable
testable
Cluster i
Cluster j
Cluster   Type        VH variants   VL variants   VHi:VLj variant pairs   VHi:VLN, VHN:VLj native pairs
Y         Spleen+LN   66            20            1320                    85
Z         Spleen+LN   156           37            5772                    192
VH VL
Predictive Models for affinity using Machine Learning
Output prediction of binding affinity
DATA
TRAINING SET
FEATURE SELECTION
TRAIN MODEL
TEST MODEL
IMPROVE MODEL
MODEL
            SEQUENCE                          BINDING   FOLD
reference   A K G V K A R L K E A S I K G Y   YES       1
MUT 1       A P G V K A R L K E A R I K G I   NO        -1
MUT 2       M R G V K A R L K E A S I K G I   NO        -2
MUT 3       V P G V K A R L K E A S I K G Y   YES       3
MUT 4       V R G V K A R L K E A L I K G I   YES       2
MUT 5       V R G V K A R P K E A L I K G I   YES(NO)   2(-1)
MUT 6       A R G V K A R L K E A L I K G Y   YES       1
MUT 7       V P G V K A R P K E A S I K G I   YES       2
MUT 8       M R G V K A R L K E A S R K G S   NO        -1
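One simple way to turn sequences like those in the table into model input is to keep only the positions that vary and one-hot encode the residues found there. This is an assumption for illustration; the slides do not specify the features or the model. The sequences are copied from the table, and the function names are hypothetical.

```python
# Sequences from the illustrative table, written without spaces.
sequences = {
    "reference": "AKGVKARLKEASIKGY",
    "MUT 1": "APGVKARLKEARIKGI",
    "MUT 2": "MRGVKARLKEASIKGI",
    "MUT 3": "VPGVKARLKEASIKGY",
    "MUT 4": "VRGVKARLKEALIKGI",
    "MUT 5": "VRGVKARPKEALIKGI",
    "MUT 6": "ARGVKARLKEALIKGY",
    "MUT 7": "VPGVKARPKEASIKGI",
    "MUT 8": "MRGVKARLKEASRKGS",
}

def variable_positions(seqs):
    """Indices of alignment columns that contain more than one residue."""
    return [i for i, column in enumerate(zip(*seqs)) if len(set(column)) > 1]

def one_hot(seqs, positions):
    """One binary feature per (position, residue) combination observed in the data."""
    vocab = sorted({(i, s[i]) for s in seqs for i in positions})
    rows = [[1 if s[i] == aa else 0 for i, aa in vocab] for s in seqs]
    return vocab, rows

seqs = list(sequences.values())
positions = variable_positions(seqs)
vocab, X = one_hot(seqs, positions)
```

The rows of `X`, together with the BINDING or FOLD column as label, would form the training set for a classifier or regressor.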
Predictive Models for affinity using Machine Learning
Predict affinity class (better/worse) with accuracy & specificity
Binary Response: Anti-Y (rows: TRUE CLASS, columns: PREDICTED CLASS)
         BETTER   WORSE   TOTAL COUNT
BETTER   94%      6%      250
WORSE    9%       91%     354
Binary Response: Anti-Z (rows: TRUE CLASS, columns: PREDICTED CLASS)
         BETTER   WORSE   TOTAL COUNT
BETTER   12.5%    87.5%   88
WORSE    1%       99%     721
                      Anti-Y   Anti-Z
Sensitivity (TP/PV)   0.89     0.65
Specificity           0.96     0.90
Precision (TP/RP)     0.94     0.13
Accuracy              0.92     0.89
But sensitivity & precision are not good with unbalanced training set
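The effect of class imbalance can be seen with a small calculation. The counts below are hypothetical and the formulas are the standard textbook definitions, which may differ from the convention behind the slide's abbreviations (TP/PV, TP/RP).

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard definitions computed from the four cells of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives that are found
        "specificity": tn / (tn + fp),  # share of true negatives that are found
        "precision": tp / (tp + fp),    # share of positive calls that are correct
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Hypothetical unbalanced case: only 20 of 1000 variants are truly "better".
m = binary_metrics(tp=10, fn=10, fp=40, tn=940)
```

Accuracy is 0.95 here because the many "worse" variants dominate, while sensitivity is only 0.5 and precision only 0.2.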
Predictive Models for affinity using Machine Learning
Predict affinity classes (fold change bins) with accuracy & specificity
Rows: TRUE CLASS, columns: PREDICTED CLASS
                  kd > 20x   kd in [10x,20x]   kd in [5x,10x]   kd in [-5x,5x]   kd < 5x   TOTAL COUNT   RECALL
kd > 20x          83.33%     0.00%             16.67%           0.00%            0.00%     6             83%
kd in [10x,20x]   9.09%      81.82%            9.09%            0.00%            0.00%     11            82%
kd in [5x,10x]    0.00%      0.00%             48.48%           51.52%           0.00%     33            48%
kd in [-5x,5x]    0.00%      0.79%             1.06%            95.77%           2.38%     378           96%
kd < 5x           0.00%      0.00%             0.00%            20.45%           79.55%    176           76%
PRECISION         83%        75%               72%              87%              93%
Overall Accuracy: 88% (anti-Y)
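Per-class recall, per-class precision and overall accuracy follow directly from the confusion matrix counts. In the sketch below the counts are reconstructed by multiplying each row percentage of the table above by its row total and rounding to whole variants, which is an assumption; the slide shows only percentages.

```python
# Rows: true class, columns: predicted class, in the order of the table.
labels = ["kd > 20x", "kd in [10x,20x]", "kd in [5x,10x]", "kd in [-5x,5x]", "kd < 5x"]
counts = [
    [5, 0, 1, 0, 0],
    [1, 9, 1, 0, 0],
    [0, 0, 16, 17, 0],
    [0, 3, 4, 362, 9],
    [0, 0, 0, 36, 140],
]

def per_class_metrics(matrix):
    """Recall per row, precision per column, and overall accuracy."""
    n = len(matrix)
    recall = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    precision = [matrix[i][i] / sum(row[i] for row in matrix) for i in range(n)]
    accuracy = sum(matrix[i][i] for i in range(n)) / sum(map(sum, matrix))
    return recall, precision, accuracy

recall, precision, accuracy = per_class_metrics(counts)
```

With these reconstructed counts, 532 of 604 variants lie on the diagonal, which gives an overall accuracy of about 0.88.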
Benefit: gaining speed in antibody optimization
Acknowledgements
The Team
An Qi
Yves Fomekong Nanfack
Qingyong Ji
David Nannemann
John Wesolowski
Jinyang Zhang
Youbin Wang
Shruti Pratapa
Jukka Konola
Xiubin Gu
Maria Soloviev
Xinyan Zhao
Christel Iffland
Mireille Krier
Tim Knehans
Vanita Sood
Fabian Steinmetz
Paul Czodrowski
Wolf-Guido Bolick
Christina Schindler
Thomas Clarke
Jim Yang
Alex Rolfe
1 Discovery Technologies, 2 Research Bioinformatics
Questions?
Simulation done with NMSim/RCNMA; Gohlke, Rippmann et al.