“found in translation” –neural machine translation …...“found in translation” –neural...

“Found in Translation” – Neural machine translation models for chemical reaction prediction

Philippe Schwaller (@phisch124)

Theophile Gaudin, Riccardo Pisoni, David Lanyi, Costas Bekas & Teodoro Laino

IBM Research Zurich, Switzerland

Alpha LeeUniversity of Cambridge

Chem. Sci., 2018, 9, 6091-6098

Exploring the nearly endless chemical space

O

O

OHO

N

S

O

NH

OO

OH

O

O-Na+

Cl

Cl

ClCl

Cl

N

OH O

OH

HO

S

O

O

OH

Design Make

Test

De Novo MoleculesVAE / GAN / RL

Synthetic route planningReaction outcome prediction

Experimental verificationCredits to Marwin Segler

Chemical reaction prediction

Data

US patents

Lowe (2012,2017)

text-mining

Cl:1C:2

C:3

C:4 C:8

N:5N:6

C:7OH:14

N+:15O-:16

O:17

S:9O:10

O:11

OH:12

OH:13

Cl:1

C:2C:3

N+:15

O:14

O-:16

C:4

C:8

N:5

N:6C:7+ +

Jin et. al.(NIPS 2017)

filtering

SMARTS- reactants>>products- partly correct atom-mapping- >1M reactions

Benchmark dataset(USPTO_500k)

Representing molecules for MLMolecular fingerprints000010000….0100

Text-based representations- SMILES / INCHI - CN1C=NC2=C1C(=O)N(C(=O)N2C)C

Graph-based

1D 2D 3D

3D structure

N

N

O

N

O

N

N

N

O

N

O

N

Atoms as letters, molecules as words

SMILES to SMILES prediction with sequence-2-sequence models Schwaller et. al. Chem. Sci., 2018, 9, 6091-6098 / Nam & Kim arXiv:1612.09529

https://arxiv.org/abs/1612.09529

How do Seq-2-Seq models work?

Step-by-step

Inspired by Hendrik’s talk in Zurich

Step-by-stepwith attention

Link to reaction prediction?- Reactants to products

Br . C C = C 1 C …Source

C

Prediction

Attention weights

Plotting the attention weights at every decoder time step.

Attention weightsChloro Buchwald-Hartwig amination:US20170240534A1

Source: Reactants

Targ

et: P

rodu

ct

SMILES-2-SMILES

Trained end-2-endTemplate-free

Fully data-driven

Nochemicalknowledge incorporated

No atom-mappingrequired

Attention weightsand confidence score

Less than 0.5%invalid SMILES

Limitations

Only as good asthe training data

Difficult to includenegative data

Chemically unreasonablepredictions possible

What other template-free approaches exist?Graph neural networks• Jin et al. (MIT, 2017): Weisfeiler-Lehman Networks (WLDN)• Network 1: Reaction center identification• Network 2: Product candidate ranking• Trained separately • Outperforms template-based by 10% on USPTO_500k dataset

• Bradshaw et al. (Cambridge, 2018): Electron path prediction• Gated Graph Neural Networks• Outperforms WLDN on USPTO_350k dataset

(subset without more difficult reactions, e. g. cycloadditions)Fundamental limitation: require atom-mapped training sets

Data set Jin et. al., Schwaller et. al. Our new model MIT_500k 74.0 % <74.0 % 87.3 %

How do we perform compared to others? Reactants > reagents Products

Reactants & reagents mixed Products

Data set Jin et. al., WLDN

Schwaller et. al., Seq-2seq

Bradshaw et al., GGNN

MIT_500k 79.6 % 80.3 %

Top-1 accuracy:

On USPTO_MIT benchmark dataset.Schwaller et al.: Chem. Sci., 2018, 9, 6091-6098Jin et al.: NIPS, 2017, 30, 2607-2616Bradshaw et al.: arXiv:1805.10970

-subset 350k 84.0 % 87.0 %

Our new model 88.8 %

https://arxiv.org/abs/1805.10970

What else can we do?Reaction scoring

CC=C1CCCCC1.Cl>>CCC1(Cl)CCCCC1

CC=C1CCCCC1.Cl>>CC(Cl)C1CCCCC1

Score: 0.99

When we provide the modelwith reactants>>products

Score: 0.001

Markovni

kov

Anti-Markovnikov

IBM RXNfor ChemistryFreely available now:research.ibm.com/ai4chemistry

#RXNFORCHEMISTRY

Booth #530

[email protected] / @phisch124

IBM RXN

Chemistry

for

research.ibm.com/ai4chem

istry

Questions?

Philippe Schwaller

mailto:[email protected]

“found in translation” –neural machine translation …...“found in translation” –neural...

Documents