Evaluating the Quality and Performance of Automatic Atom Mapping Algorithms

ACS National Meeting, Philadelphia, USA 20th August 2012

Daniel Lowe and Roger Sayle

NextMove Software

Cambridge, UK

What is Atom-Mapping?

Mapping algorithm
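
To make the idea concrete: in reaction SMILES, an atom map is written as `:n` inside a bracket atom, and atoms that share a number on both sides are the same atom before and after the reaction. The sketch below is not any vendor's mapping algorithm; it only reads a reaction that is already mapped. The esterification example and the helper name `map_numbers` are illustrative assumptions.

```python
import re

# Bracket atom carrying an atom-map number, e.g. [CH3:1] or [O:6]
MAPPED_ATOM = re.compile(r"\[([^\]:]+):(\d+)\]")


def map_numbers(side: str) -> dict[int, str]:
    """Return {map number: atom text} for one side of a reaction SMILES."""
    return {int(num): atom for atom, num in MAPPED_ATOM.findall(side)}


# Hypothetical mapped esterification: acetic acid + methanol -> methyl acetate
rxn = "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]>>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
reactants, products = rxn.split(">>")

left = map_numbers(reactants)
right = map_numbers(products)

# Map numbers on both sides identify atoms that survive into the product
kept = sorted(left.keys() & right.keys())
# Map numbers only on the left belong to atoms that leave in an unlisted by-product
lost = sorted(left.keys() - right.keys())

print("kept:", kept)
print("lost:", lost)
```

Here atom 4 (the acid's hydroxyl oxygen) is the only mapped reactant atom with no counterpart in the product, which matches the water lost in an esterification.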

Why Perform Atom-Mapping?

• Assigning roles to reagents

• Normalization of reactions for registration

Why Perform Atom-Mapping?

• More precise database searches

– Solvents/catalysts can be distinguished from reactants

– Allows the relationship between the reactant atoms and product atoms to be made explicit
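
One way to see how a mapping separates reactants from solvents and catalysts is to check which left-hand components contribute mapped atoms to the product. This is a minimal sketch under stated assumptions: the input is already mapped, components are separated by `.`, and `assign_roles` is a hypothetical helper name rather than part of any of the tools discussed.

```python
import re

# Atom-map number at the end of a bracket atom, e.g. the "4" in [NH2:4]
MAP_NUMBER = re.compile(r":(\d+)\]")


def assign_roles(mapped_rxn: str) -> list[tuple[str, str]]:
    """Label each left-hand component as 'reactant' or 'reagent'.

    Assumption: a component is a reactant if at least one of its map
    numbers also appears in the products; anything else (solvent,
    catalyst, ...) is labelled a reagent.
    """
    left, _, right = mapped_rxn.partition(">>")
    product_maps = set(MAP_NUMBER.findall(right))
    roles = []
    for component in left.split("."):
        component_maps = set(MAP_NUMBER.findall(component))
        role = "reactant" if component_maps & product_maps else "reagent"
        roles.append((component, role))
    return roles


# Hypothetical example: acetyl chloride + methylamine in dichloromethane
rxn = "[CH3:1][C:2](=[O:3])Cl.[NH2:4][CH3:5].ClCCl>>[CH3:1][C:2](=[O:3])[NH:4][CH3:5]"
for component, role in assign_roles(rxn):
    print(f"{role:8}  {component}")
```

Splitting on `.` treats each fragment of a salt as a separate component, so a real pipeline would need to group fragments that belong to one species first.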

Example

• I want to find reactions converting an alkene to a cyclopropane so I search for C=C>>C1CC1

Why Perform Atom-Mapping?

• Identifying suspect reactions:

Qualities to look for in an atom mapping algorithm

• Chemically plausible atom mappings

• Ability to distinguish genuine reactants from solvents/catalysts

• Support for unbalanced reactions

– Side product not specified

– Reactant stoichiometry > 1

• Fast run-time
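
The two kinds of unbalanced reaction listed above can be told apart by comparing heavy-atom counts on each side. The sketch below is an assumption-laden illustration, not a method from the talk: it uses a small regular-expression tokenizer that covers only the organic subset and ordinary bracket atoms (no wildcards), ignores hydrogens, and the function names are hypothetical.

```python
import re
from collections import Counter

# Bracket atoms, then two-letter halogens, then one-letter organic-subset atoms
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number
BRACKET_SYMBOL = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count non-hydrogen atoms per element in a SMILES string."""
    counts = Counter()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            symbol = BRACKET_SYMBOL.match(token).group(1)
        else:
            symbol = token
        symbol = symbol.capitalize()  # aromatic 'c' is counted as 'C'
        if symbol != "H":
            counts[symbol] += 1
    return counts


def imbalance(rxn: str) -> tuple[dict, dict]:
    """Return (atoms only accounted for on the left, atoms only on the right)."""
    fields = rxn.split(">")
    left = heavy_atom_counts(fields[0])
    right = heavy_atom_counts(fields[-1])
    # Counter subtraction keeps positive counts only
    return dict(left - right), dict(right - left)


# Side product not specified: esterification written without water
print(imbalance("CC(=O)O.CO>>CC(=O)OC"))
# Reactant stoichiometry > 1: phosgene + methylamine giving 1,3-dimethylurea
print(imbalance("ClC(Cl)=O.CN>>CNC(=O)NC"))
```

A surplus on the left points to an unlisted side product; a surplus on the right means some reactant has to be used more than once (or is missing altogether).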

Algorithms Evaluated

Vendor: Program               Version
Accelrys: Pipeline Pilot      8.5.0.200
ChemAxon: Marvin              5.10.1
GGA: Indigo                   1.1
InfoChem: ICMAP               5.10
PerkinElmer: ChemDraw Ultra   12.0

Methodology

Test set                                                        Reactions
Pharmaceutical ELN subset                                       18,244
ChemReact68 database                                            67,926
SPRESI database subset                                          5,230
Reactions extracted from 2008-2011 USPTO patent applications*   562,872

* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.

Methodology (cont.)

• Reaction SMILES were used as input and output for all algorithms except ICMAP

• Input and output were converted to and from RDF for use with ICMAP

• Indigo was run with its default configuration and with more lenient settings for matching valences, charges and bond orders

• Marvin was configured to use its best quality mapping strategy

Ability to map all product atoms
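
A completeness measure of this kind can be computed from each tool's mapped reaction SMILES output. The sketch below is one plausible reading of the slide title, not the authors' actual scoring code: it assumes every atom written in the product SMILES (including any explicit hydrogens) must carry a map number, and the function names are hypothetical.

```python
import re

# Bracket atoms, then two-letter halogens, then one-letter organic-subset atoms
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")
# A bracket atom that ends in an atom-map number, e.g. [CH3:1]
HAS_MAP = re.compile(r":\d+\]$")


def mapped_fraction(mapped_rxn: str) -> float:
    """Fraction of product atoms that carry an atom-map number."""
    products = mapped_rxn.split(">")[-1]
    atoms = ATOM_TOKEN.findall(products)
    if not atoms:
        return 0.0
    mapped = sum(1 for atom in atoms if HAS_MAP.search(atom))
    return mapped / len(atoms)


def fully_mapped_rate(mapped_rxns: list[str]) -> float:
    """Share of reactions in which every product atom is mapped."""
    complete = sum(1 for rxn in mapped_rxns if mapped_fraction(rxn) == 1.0)
    return complete / len(mapped_rxns)


# Hypothetical tool outputs for the same esterification
complete = "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]>>[CH3:1][C:2](=[O:3])[O:6][CH3:5]"
partial = "[CH3:1][C:2](=[O:3])[OH:4].CO>>[CH3:1][C:2](=[O:3])OC"

print(mapped_fraction(complete), mapped_fraction(partial))
print(fully_mapped_rate([complete, partial]))
```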

C-C bonds broken

Speed Comparison

[Figure: speed comparison chart. Recoverable annotation: "Average reagents per reaction" with values 1.7, 3.6, 1.6 and 4.0.]
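
An "average reagents per reaction" statistic can be derived directly from reaction SMILES. How the authors counted is not stated, so this sketch assumes that every dot-separated species in the reactant and agent fields counts as one reagent; the function names and sample reactions are illustrative only.

```python
def species_before_products(rxn: str) -> int:
    """Count dot-separated species in the reactant and agent fields."""
    reactants, agents, _products = rxn.split(">")
    return sum(
        1
        for field in (reactants, agents)
        for species in field.split(".")
        if species  # skip the empty agent field in "A>>B"
    )


def average_reagents(rxns: list[str]) -> float:
    """Mean number of species written before the products."""
    return sum(species_before_products(rxn) for rxn in rxns) / len(rxns)


sample = [
    "CC(=O)O.CO>>CC(=O)OC",        # two reactants, no agents
    "CC(=O)Cl.CN>ClCCl>CC(=O)NC",  # two reactants plus one agent
]
print(average_reagents(sample))
```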

Simple mappings

Marvin/ChemDraw/Indigo/ICMAP

Simple mappings

Marvin/ChemDraw/Indigo/ICMAP

More complicated mappings

ChemDraw

Marvin

More complicated mappings

ICMAP

Indigo

Reuse of reactants

Reuse of reactants

Marvin

Reuse of reactants

ChemDraw

Reuse of reactants

Indigo

Reuse of reactants

ICMAP

Single Atom Mapping

ICMAP/Marvin

ChemDraw/Indigo

Bugs and quirks

• Marvin

– Two unsuccessful mappings threw unchecked exceptions rather than checked exceptions

• ChemDraw

– Hydrogen on aromatic atoms missing in SMILES output

• Indigo

– Calculation of valency fails for aromatic sulfur

Bugs and quirks

• ICMAP

– Single atom products are interpreted as empty molecules or occasionally replaced by a product from a previous reaction (bug reported)

– Input files must be < 2 GB and use DOS line endings
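
The two input constraints above are easy to enforce in a pre-processing step. The sketch below is a generic file utility, not part of ICMAP: it rewrites a file with CRLF line endings and rejects results at or above the size limit. The function name is hypothetical, and reading "2 GB" as 2 × 1024³ bytes is an assumption.

```python
import os

# Assumption: "< 2 GB" is taken as 2 * 1024**3 bytes
MAX_BYTES = 2 * 1024**3


def write_with_dos_line_endings(src: str, dst: str) -> int:
    """Copy src to dst using CRLF line endings; return the size of dst in bytes."""
    # latin-1 round-trips every byte value, so no decoding errors can occur.
    # newline=None normalises any input line ending to "\n" on read;
    # newline="\r\n" writes every "\n" back out as CRLF.
    with open(src, "r", encoding="latin-1", newline=None) as fin, \
            open(dst, "w", encoding="latin-1", newline="\r\n") as fout:
        for line in fin:
            fout.write(line)
    size = os.path.getsize(dst)
    if size >= MAX_BYTES:
        raise ValueError(f"{dst} is {size} bytes; split the input into smaller files")
    return size
```

A call such as `write_with_dos_line_endings("reactions.rdf", "reactions_dos.rdf")` (file names hypothetical) would produce the converted copy. Splitting an oversized RDF file has to respect record boundaries, which this sketch does not attempt.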

Conclusions

• ICMAP produced the best quality mappings on the tested sets

• Atom mapping isn’t as simple as finding a maximum common subgraph mapping

• In all the algorithms there were aspects that could be improved to yield appreciable benefits

Acknowledgements

• Ed Griffen and Nick Tomkinson, AstraZeneca

• Andrew Wooster, GSK

• Hans Kraut, InfoChem

• Thank you for your time.
