“found in translation” –neural machine translation …...“found in translation” –neural...
TRANSCRIPT
“Found in Translation” – Neural machine translation models for chemical reaction prediction
Philippe Schwaller (@phisch124)
Theophile Gaudin, Riccardo Pisoni, David Lanyi, Costas Bekas & Teodoro Laino
IBM Research Zurich, Switzerland
Alpha LeeUniversity of Cambridge
Chem. Sci., 2018, 9, 6091-6098
Exploring the nearly endless chemical space
O
O
OHO
N
S
O
NH
OO
OH
O
O-Na+
Cl
Cl
ClCl
Cl
N
OH O
OH
HO
S
O
O
OH
Design Make
Test
De Novo MoleculesVAE / GAN / RL
Synthetic route planningReaction outcome prediction
Experimental verificationCredits to Marwin Segler
Chemical reaction prediction
Data
US patents
Lowe (2012,2017)
text-mining
Cl:1C:2
C:3
C:4 C:8
N:5N:6
C:7OH:14
N+:15O-:16
O:17
S:9O:10
O:11
OH:12
OH:13
Cl:1
C:2C:3
N+:15
O:14
O-:16
C:4
C:8
N:5
N:6C:7+ +
Jin et. al.(NIPS 2017)
filtering
SMARTS- reactants>>products- partly correct atom-mapping- >1M reactions
Benchmark dataset(USPTO_500k)
Representing molecules for MLMolecular fingerprints000010000….0100
Text-based representations- SMILES / INCHI - CN1C=NC2=C1C(=O)N(C(=O)N2C)C
Graph-based
1D 2D 3D
3D structure
N
N
O
N
O
N
N
N
O
N
O
N
Atoms as letters, molecules as words
SMILES to SMILES prediction with sequence-2-sequence models Schwaller et. al. Chem. Sci., 2018, 9, 6091-6098 / Nam & Kim arXiv:1612.09529
How do Seq-2-Seq models work?
Step-by-step
Inspired by Hendrik’s talk in Zurich
Step-by-stepwith attention
Link to reaction prediction?- Reactants to products
Br . C C = C 1 C …Source
C
Prediction
Attention weights
Plotting the attention weights at every decoder time step.
Attention weightsChloro Buchwald-Hartwig amination:US20170240534A1
Source: Reactants
Targ
et: P
rodu
ct
SMILES-2-SMILES
Trained end-2-endTemplate-free
Fully data-driven
Nochemicalknowledge incorporated
No atom-mappingrequired
Attention weightsand confidence score
Less than 0.5%invalid SMILES
Limitations
Only as good asthe training data
Difficult to includenegative data
Chemically unreasonablepredictions possible
What other template-free approaches exist?Graph neural networks• Jin et al. (MIT, 2017): Weisfeiler-Lehman Networks (WLDN)• Network 1: Reaction center identification• Network 2: Product candidate ranking• Trained separately • Outperforms template-based by 10% on USPTO_500k dataset
• Bradshaw et al. (Cambridge, 2018): Electron path prediction• Gated Graph Neural Networks• Outperforms WLDN on USPTO_350k dataset
(subset without more difficult reactions, e. g. cycloadditions)Fundamental limitation: require atom-mapped training sets
Data set Jin et. al., Schwaller et. al. Our new model MIT_500k 74.0 % <74.0 % 87.3 %
How do we perform compared to others? Reactants > reagents Products
Reactants & reagents mixed Products
Data set Jin et. al., WLDN
Schwaller et. al., Seq-2seq
Bradshaw et al., GGNN
MIT_500k 79.6 % 80.3 %
Top-1 accuracy:
On USPTO_MIT benchmark dataset.Schwaller et al.: Chem. Sci., 2018, 9, 6091-6098Jin et al.: NIPS, 2017, 30, 2607-2616Bradshaw et al.: arXiv:1805.10970
-subset 350k 84.0 % 87.0 %
Our new model 88.8 %
What else can we do?Reaction scoring
CC=C1CCCCC1.Cl>>CCC1(Cl)CCCCC1
CC=C1CCCCC1.Cl>>CC(Cl)C1CCCCC1
Score: 0.99
When we provide the modelwith reactants>>products
Score: 0.001
Markovni
kov
Anti-Markovnikov
IBM RXNfor ChemistryFreely available now:research.ibm.com/ai4chemistry
#RXNFORCHEMISTRY
Booth #530
[email protected] / @phisch124
IBM RXN
Chemistry
for
research.ibm.com/ai4chem
istry
Questions?
Philippe Schwaller