Transcript
Page 1: Evaluating Automatic Atom Mapping Algorithms · 2015-04-21 · •ICMAP produced the best quality mappings on the tested sets •Atom mapping isn’t as simple as finding a maximum

Evaluating the Quality and Performance of Automatic Atom

Mapping Algorithms

ACS National Meeting, Philadelphia, USA 20th August 2012

Daniel Lowe and Roger Sayle

NextMove Software

Cambridge, UK

Page 2

What is Atom-Mapping?

Mapping algorithm

Page 3

Why Perform Atom-Mapping?

• Assigning roles to reagents

• Normalization of reactions for registration

Page 4

Why Perform Atom-Mapping?

• More precise database searches

– Solvents/catalysts can be distinguished from reactants

– Allows the relationship between the reactant atoms and product atoms to be made explicit

Page 5

Example

• I want to find reactions converting an alkene to a cyclopropane so I search for C=C>>C1CC1
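Without atom maps, the query C=C>>C1CC1 only asks for an alkene somewhere in the reactants and a cyclopropane somewhere in the products. A mapped form such as [C:1]=[C:2]>>[C:1]1[C:2]C1 also requires that the two alkene carbons are the ones that end up in the ring. The following is a minimal Python sketch of reading map numbers out of such a string. The function names are hypothetical, and the regular expression only recognises map numbers inside bracket atoms; it is not a SMILES parser.

```python
import re

# Matches the atom-map number at the end of a bracket atom, e.g. "[C:1]" or "[CH2:12]".
MAP_RE = re.compile(r"\[[^\]]*?:(\d+)\]")

def map_numbers(smiles_part: str) -> set:
    """Return the set of atom-map numbers found in one side of a reaction SMILES."""
    return {int(n) for n in MAP_RE.findall(smiles_part)}

def shared_maps(reaction: str) -> set:
    """Map numbers that occur on both the reactant side and the product side."""
    # Reaction SMILES has the form reactants>agents>products.
    reactants, _agents, products = reaction.split(">")
    return map_numbers(reactants) & map_numbers(products)
```

An unmapped string such as C=C>>C1CC1 yields an empty set, which is exactly the information a search engine lacks when deciding whether the alkene carbons became ring atoms.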

Page 6

Why Perform Atom-Mapping?

• Identifying suspect reactions:

Page 7

Qualities to look for in an atom mapping algorithm

• Chemically plausible atom mappings

• Ability to distinguish genuine reactants from solvents/catalysts

• Support for unbalanced reactions

– Side product not specified

– Reactant stoichiometry > 1

• Fast run-time

Page 8

Algorithms Evaluated

Vendor        Program          Version
Accelrys      Pipeline Pilot   8.5.0.200
ChemAxon      Marvin           5.10.1
GGA           Indigo           1.1
InfoChem      ICMAP            5.10
PerkinElmer   ChemDraw Ultra   12.0

Page 9

Methodology

Test set                                                         Reactions
Pharmaceutical ELN subset                                        18,244
ChemReact68 database                                             67,926
SPRESI database subset                                           5,230
Reactions extracted from 2008-2011 USPTO patent applications*    562,872

* Lowe, D. M. Automated Extraction of Reactions from the Patent Literature. 243rd ACS National Meeting & Exposition, San Diego, CA, March 27, 2012.

Page 10

Methodology (continued)

• Reaction SMILES were used as input and output for all algorithms bar ICMAP

• Input and output were converted to and from RDF for use with ICMAP

• Indigo was run with its default configuration and with more lenient settings for matching valences, charges and bond orders

• Marvin was configured to use its best quality mapping strategy

Page 11

Ability to map all product atoms
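The chart for this slide is not reproduced in the transcript. As an illustration only, and not necessarily the measure used by the authors, the share of product atoms that received a map number can be estimated from a mapped reaction SMILES. The sketch below is Python with hypothetical names; its tokenizer is simplified (bracket atoms plus the organic subset) and counts explicit [H] atoms as atoms.

```python
import re

# Simplified SMILES atom tokenizer: bracket atoms first, then two-letter
# halogens, then one-letter organic-subset atoms (aromatic forms in lower case).
ATOM_RE = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# A bracket atom is mapped if it ends in ":<digits>]".
MAPPED_RE = re.compile(r":\d+\]$")

def product_mapping_coverage(reaction: str) -> float:
    """Fraction of product atoms that carry an atom-map number."""
    products = reaction.split(">")[2]
    atoms = ATOM_RE.findall(products)
    if not atoms:
        return 0.0
    mapped = sum(1 for atom in atoms if MAPPED_RE.search(atom))
    return mapped / len(atoms)
```

A reaction counts as fully mapped under this sketch when the function returns 1.0; anything lower means at least one product atom has no stated origin in the reactants.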

Page 12

C-C bonds broken

Page 13

Speed Comparison

Average reagents per reaction: 1.7, 3.6, 1.6, 4.0

Page 14

Simple mappings

Marvin/ChemDraw/Indigo/ICMAP

Page 15

Simple mappings

Marvin/ChemDraw/Indigo/ICMAP

Page 16

More complicated Mappings

ChemDraw

Marvin

Page 17

More complicated Mappings

ICMAP

Indigo

Page 18

Reuse of reactants

Page 19

Reuse of reactants

Marvin

Page 20

Reuse of reactants

ChemDraw

Page 21

Reuse of reactants

Indigo

Page 22

Reuse of reactants

ICMAP

Page 23

Single Atom Mapping

ICMAP/Marvin

ChemDraw/Indigo

Page 24

Bugs and quirks

• Marvin

– 2 unsuccessful mappings produced unchecked exceptions rather than checked exceptions

• ChemDraw

– Hydrogen on aromatic atoms missing in SMILES output

• Indigo

– Calculation of valency fails for aromatic sulfur

Page 25

Bugs and quirks

• ICMAP

– Single atom products are interpreted as empty molecules or occasionally replaced by a product from a previous reaction (bug reported)

– Input files must be < 2 GB and use DOS line endings
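The input constraints above can be handled with a small preprocessing step. The sketch below, in Python, normalises line endings to CRLF and checks the size of the result. The function names are hypothetical, and treating the limit as 2 GiB (2 * 1024**3 bytes) is an assumption, since the slide only states "< 2gb".

```python
# Assumed limit: the slide says "< 2gb"; the exact byte count is not given.
MAX_BYTES = 2 * 1024**3

def to_dos_line_endings(data: bytes) -> bytes:
    """Convert LF, CR or mixed line endings to CRLF without doubling existing CRs."""
    # Collapse everything to LF first so existing CRLF pairs are not expanded twice.
    normalized = data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return normalized.replace(b"\n", b"\r\n")

def fits_size_limit(data: bytes, limit: int = MAX_BYTES) -> bool:
    """True if the payload is strictly smaller than the size limit."""
    return len(data) < limit
```

Because CRLF conversion adds one byte per line, the size check is best applied after conversion rather than before.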

Page 26

Conclusions

• ICMAP produced the best quality mappings on the tested sets

• Atom mapping isn’t as simple as finding a maximum common subgraph mapping

• In all the algorithms there were aspects that could be improved to yield appreciable benefits

Page 27

Acknowledgements

• Ed Griffen and Nick Tomkinson, AstraZeneca.

• Andrew Wooster, GSK.

• Hans Kraut, InfoChem

• Thank you for your time.
