EUGM15 - Alexandre Varnek (Université de Strasbourg): Towards an expert system for predicting...
Posted on 26-Jul-2015
Prediction of Reaction Conditions
for Michael Additions
G. Marcou, J. Aires de Sousa, A. de Luca, D. Latino, V. Rietsch,
D. Horvath, A. Varnek
ChemAxon UGM, Budapest 19-20 May 2015
• Different scenarios of structure-reactivity modeling
• Descriptors for reactions
- Condensed Graph of Reaction / ISIDA descriptors
- Electron Effect Descriptors
• Michael reaction case
OUTLINE
Chemical reactions are difficult objects
- many species;
- two types of species: reactants and products;
- multi-step reactions;
- dependent on experimental conditions.
• How can I synthesize this structure?
• How can I estimate the yield of a given reaction and its
kinetic and thermodynamic parameters?
• Which reaction conditions should I choose in order
to obtain the desired product selectively?
Chemical reactions in Chemoinformatics
$\text{Property} = f(\text{Descriptors})$
• substructural fragments
• topological indices
• physico-chemical parameters
• etc.
Parameters directly derived from molecular structure
• Support Vector Machine (SVM)
• Multi-Linear Regression (MLR)
• Artificial Neural Networks
• etc…
Mathematical relationship established with machine learning methods
Quantitative Structure-Activity/Property Relationship (QSAR/QSPR)
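As a minimal illustration of $\text{Property} = f(\text{Descriptors})$ with the simplest of the listed methods, multi-linear regression (MLR), the sketch below fits a linear model by ordinary least squares with NumPy. The descriptor matrix and property values are synthetic, made up for the example; real QSPR work would use computed descriptors and validate the model on held-out data.

```python
import numpy as np


def fit_mlr(X: np.ndarray, y: np.ndarray):
    """Multi-linear regression y ~ X @ w + b, fitted by ordinary least squares."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # extra column for the intercept b
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[:-1], coef[-1]


def predict(X: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    return X @ w + b


# Synthetic data: 20 "molecules" x 3 descriptors, property built from known weights
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
true_w, true_b = np.array([1.5, -2.0, 0.5]), 0.3
y = X @ true_w + true_b

w, b = fit_mlr(X, y)
print(np.round(w, 3), round(float(b), 3))
```

Because the toy property is noise-free, least squares recovers the generating weights; with real data the residuals and a cross-validated error would be the quantities to report.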
$\text{Property} = f(\text{structure})$
• $\text{Reactivity} = f(\text{reactants' structure})$
• $\text{Reactivity} = f(\text{products' structure})$
• $\text{Reactivity} = f(\text{reaction structure})$
Quantitative Structure-Reactivity Relationship (QSRR)
Conventional bonds: single, double, aromatic, …
Dynamical bonds: created single, broken single, …
A CGR can be viewed as a pseudo-molecule representing a given reaction.
Condensed Graph of Reaction
• A CGR condenses the structural information about products and reactants: several graphs into a single graph.
• This simplified representation makes it possible to apply to CGRs the methods developed in chemoinformatics for individual molecules.
A. Varnek, D. Fourches, F. Hoonakker, V. P. Solov’ev, J. Computer-Aided Molecular Design, 2005, 19, 693-703
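The condensation step can be sketched as follows, assuming the reactants and products are already atom-mapped and each side is stored as a table of bond orders keyed by pairs of atom-map numbers. Every CGR bond is labeled with its order before and after the reaction (0 meaning "no bond"), so dynamical bonds are simply those whose two orders differ. The data structures, function names and the heavy-atom-only Michael addition (hydrogen transfer ignored) are illustrative assumptions, not the ISIDA implementation.

```python
from typing import Dict, FrozenSet, Tuple

Bond = FrozenSet[int]              # unordered pair of atom-map numbers
BondTable = Dict[Bond, int]        # bond -> bond order (1 single, 2 double, ...)
CGR = Dict[Bond, Tuple[int, int]]  # bond -> (order in reactants, order in products)


def bond(i: int, j: int) -> Bond:
    return frozenset((i, j))


def condensed_graph(reactants: BondTable, products: BondTable) -> CGR:
    """Superimpose atom-mapped reactant and product graphs into one graph.

    A missing bond is written as order 0, so (0, 1) is a created single bond
    and (2, 1) a double bond turned into a single one.
    """
    return {b: (reactants.get(b, 0), products.get(b, 0))
            for b in set(reactants) | set(products)}


def dynamical_bonds(cgr: CGR) -> CGR:
    """Bonds whose order changes during the reaction."""
    return {b: orders for b, orders in cgr.items() if orders[0] != orders[1]}


# Hypothetical Michael addition, heavy atoms only (hydrogen transfer ignored):
# donor carbon C1 adds to the beta carbon C2 of an acceptor C2=C3-C4=O5.
atoms = {1: "C", 2: "C", 3: "C", 4: "C", 5: "O"}
reactants = {bond(2, 3): 2, bond(3, 4): 1, bond(4, 5): 2}
products = {bond(1, 2): 1, bond(2, 3): 1, bond(3, 4): 1, bond(4, 5): 2}

cgr = condensed_graph(reactants, products)
for b, (before, after) in sorted(dynamical_bonds(cgr).items(),
                                 key=lambda kv: sorted(kv[0])):
    i, j = sorted(b)
    print(f"{atoms[i]}{i}-{atoms[j]}{j}: {before} -> {after}")
# prints "C1-C2: 0 -> 1" then "C2-C3: 2 -> 1"
```

The two dynamical bonds found here are the "created single" and "double to single" bonds expected for a Michael addition; all other bonds keep identical labels on both sides.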
ISIDA fragment descriptors
A reaction can be encoded by a descriptor vector,
which can be used in structure-reactivity modeling.
[Figure: condensed graph of reaction → vector of fragment counts (2 1 2 …)]
ISIDA/CGR fragment descriptors
A. Varnek In: "Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
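A simplified sketch of how a CGR becomes a count vector: enumerate short linear fragments (alternating atom and bond labels along simple paths), name each fragment in a direction-independent way, and count occurrences. ISIDA offers several fragment types and its own labeling; this sketch covers only short sequences, and the bond labels "[0>1]" (created single) and "[2>1]" (double to single) are hypothetical notation chosen for readability.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sequence_fragments(atoms: Dict[int, str],
                       bonds: Dict[Tuple[int, int], str],
                       min_atoms: int = 2, max_atoms: int = 3) -> Counter:
    """Count linear atom-bond-atom... fragments (simple paths) of a labeled graph."""
    adj: Dict[int, List[Tuple[int, str]]] = {a: [] for a in atoms}
    for (i, j), label in bonds.items():
        adj[i].append((j, label))
        adj[j].append((i, label))

    counts: Counter = Counter()

    def extend(path: List[int], tokens: List[str]) -> None:
        # Each path is reached from both ends; count it only from its lower-index end.
        if len(path) >= min_atoms and path[0] < path[-1]:
            forward, backward = "".join(tokens), "".join(reversed(tokens))
            counts[min(forward, backward)] += 1  # direction-independent name
        if len(path) == max_atoms:
            return
        for nbr, label in adj[path[-1]]:
            if nbr not in path:
                extend(path + [nbr], tokens + [label, atoms[nbr]])

    for a in atoms:
        extend([a], [atoms[a]])
    return counts


# Hypothetical CGR of a Michael addition: "[0>1]" created single, "[2>1]" double
# to single, "-" and "=" unchanged single and double bonds.
atoms = {1: "C", 2: "C", 3: "C", 4: "C", 5: "O"}
bonds = {(1, 2): "[0>1]", (2, 3): "[2>1]", (3, 4): "-", (4, 5): "="}

counts = sequence_fragments(atoms, bonds)
vocabulary = sorted(counts)               # fixed fragment order
vector = [counts[f] for f in vocabulary]  # descriptor vector of the reaction
for f, n in zip(vocabulary, vector):
    print(n, f)
```

For a data set, the vocabulary would be the union of fragments over all reactions, so that every reaction is mapped onto the same columns.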
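Sums of property-dependent contributions P(i) from remote atoms i, at topological distance τ_iK
from the 'reactive' center K, modulated by various working hypotheses:
$$\mathrm{EED}_{p,e,o,w,c}(K) = \sum_{i=1}^{N} \delta_c(i,K) \times P_p^{\,e}(i) \times \exp\!\left[-\frac{1}{w}\left(\tau_{iK} - o\right)^2\right]$$
$$\tau_{iK} = \begin{cases} 0.5 & \text{if } i = K \\ \text{shortest-path topological distance}(i,K) & \text{otherwise} \end{cases}$$
$$\delta_c(i,K) = \begin{cases} 1 & \text{if } c = 0 \text{ OR the path between } i \text{ and } K \text{ contains only unsaturated atoms} \\ 0 & \text{otherwise} \end{cases}$$
Property type p = 1..11
Property power e = 1..2
Neighborhood control o = 1..4
Neighborhood width w = 2..5
Conjugation toggle c = 0, 1
11 × 2 × 4 × 4 × 2 = 704 EED terms…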
Electron Effect Descriptors
M. Elhabiri, E. Davioud-Charvet, D. Horvath, A. Varnek et al., Chem. Eur. J., 2014, 20, 1–11
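The EED sum can be sketched directly from its definition: breadth-first search gives the topological distances τ_iK, a Gaussian-like weight centered at distance o with width w damps each atomic contribution, and the conjugation toggle c filters atoms. One interpretation is an assumption here: for c = 1, atom i is kept if K reaches i through a path made only of unsaturated atoms (K and i included). The 4-atom chain, its partial charges and the function names are hypothetical.

```python
import math
from collections import deque
from itertools import product
from typing import Dict, List, Optional, Set


def bfs_distances(adj: Dict[int, List[int]], start: int,
                  allowed: Optional[Set[int]] = None) -> Dict[int, int]:
    """Topological (bond-count) distances from start, optionally only through allowed atoms."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def eed(adj, prop, unsaturated, K, e, o, w, c):
    """EED_{p,e,o,w,c}(K) for one atomic property p, given as prop[i] for each atom i."""
    tau_dist = bfs_distances(adj, K)
    # Assumption: for c = 1, atom i counts if K reaches i through unsaturated atoms only.
    conjugated = bfs_distances(adj, K, unsaturated) if K in unsaturated else {}
    total = 0.0
    for i in adj:
        if i not in tau_dist:
            continue  # atom not connected to K: no topological distance
        tau = 0.5 if i == K else tau_dist[i]
        delta = 1 if c == 0 or i in conjugated else 0
        total += delta * prop[i] ** e * math.exp(-((tau - o) ** 2) / w)
    return total


# Full parameter grid: p = 1..11, e = 1..2, o = 1..4, w = 2..5, c = 0, 1
grid = list(product(range(1, 12), (1, 2), range(1, 5), range(2, 6), (0, 1)))
print(len(grid))  # 704

# Hypothetical 4-atom chain 0-1-2-3 with made-up partial charges; atoms 0-2 unsaturated
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
charge = {0: 0.1, 1: -0.2, 2: 0.3, 3: -0.4}
print(eed(adj, charge, {0, 1, 2}, K=1, e=1, o=1, w=2, c=0))
print(eed(adj, charge, {0, 1, 2}, K=1, e=1, o=1, w=2, c=1))
```

With c = 1, the saturated atom 3 drops out of the sum, which is how the toggle separates effects transmitted through conjugation from purely through-bond ones.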
Atom Properties Tracked in EED
Solvent
Hydrophobic (kerosene, benzene, …)
Polar aprotic (THF, DMSO, …)
Polar protic (water, acetic acid, …)
No solvent: the reaction occurs in the neat (pure) reagents
Catalyst
Brønsted acids (hydrochloric acid, …)
Lewis acids (transition metal ions, …)
Bases (pyridine, sodium ethanolate, …)
No catalyst: a catalyst is not needed, or autocatalysis takes place
Reactions occur under different conditions, characterized by solvent and catalyst
[Scheme: Michael reaction between a Michael donor and a Michael acceptor]
For a given Nu and R1–R4, which conditions
(catalyst, solvent) lead to a high reaction yield?
Michael reaction
• To build QSRR models predicting optimal reaction conditions for a query reaction.
Each model answers a specific question such as “Is this process feasible with Brønsted acid catalysts?”, “Is this process feasible in aprotic polar solvents?”, etc.
• To develop a public predictive tool adapted to Michael-type reactions.
Michael reaction: goals of the modeling
The data set consists of 222 reactions:
• 53 polar aprotic solvent
• 52 no solvent
• 103 polar protic solvent
• 93 Lewis acid catalyst
• 61 no catalyst
• 40 hydrophobic solvent
• 57 Brønsted acid catalyst
• 45 basic catalyst
• 24 decoys (Michael reactions that are not observed)
Note that the same reaction may proceed under
different conditions!
Michael reaction: data
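A quick check of the counts above shows why one reaction must carry several condition labels: they sum to 528, well above the 222 reactions. The sketch below also gives the share of reactions in the positive class of each two-class model, which indicates how imbalanced each classification task is. The counts are copied from the slide; the dictionary keys are just descriptive strings.

```python
# Per-condition counts copied from the data slide
counts = {
    "polar aprotic solvent": 53,
    "no solvent": 52,
    "polar protic solvent": 103,
    "Lewis acid catalyst": 93,
    "no catalyst": 61,
    "hydrophobic solvent": 40,
    "Brønsted acid catalyst": 57,
    "basic catalyst": 45,
    "decoys": 24,
}
N_REACTIONS = 222

total_labels = sum(counts.values())
print(f"{total_labels} condition labels for {N_REACTIONS} reactions "
      f"({total_labels / N_REACTIONS:.2f} per reaction on average)")

# Share of reactions in the positive class of each two-class model
for name, n in counts.items():
    print(f"{name:>24}: {n:3d} / {N_REACTIONS} = {n / N_REACTIONS:.1%}")
```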
Non-occurring reaction
Occurring reaction:
condensation of hydroxylamine with an aldehyde
Decoy Michael Addition
The compatibility of each Michael reaction in the database with each of the condition
classes is encoded by a condition bitvector, for example:
0 1 1 0 0 0 1 0 0
- Solvent: hydrophobic = 0
- Solvent: polar aprotic = 1
- Solvent: polar protic = 1
- No solvent = 0
- Catalyst: Brønsted acid = 0
- Catalyst: Lewis acid = 0
- Catalyst: basic = 1
- No catalyst = 0
- Not observed = 0
A bit is set to 1 if the reaction has been observed under this condition.
NOTE: a bit set to 0 means we don't know whether the reaction is feasible under that condition!
An additional 'no-go' bit ("Not observed") is set to 1 if the given Michael addition should never be
observed (because of competing carbonyl addition).
Bitvector of reaction conditions
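The bitvector can be sketched as a fixed tuple of condition names in the slide's bit order, with helpers to encode a set of observed conditions, decode a vector, and extract the label column for one of the two-class models. The identifier names are hypothetical. Treating 0 as the negative class is a simplification made only in this sketch: as the note above stresses, a 0 means "not observed", not "infeasible".

```python
from typing import Iterable, List, Set

# Bit order as on the slide (identifier names are hypothetical)
CONDITIONS = (
    "solvent_hydrophobic", "solvent_polar_aprotic", "solvent_polar_protic",
    "no_solvent", "catalyst_bronsted_acid", "catalyst_lewis_acid",
    "catalyst_basic", "no_catalyst", "not_observed",
)


def encode(observed: Iterable[str]) -> List[int]:
    """Bitvector with 1 for every condition under which the reaction was observed."""
    observed = set(observed)
    unknown = observed - set(CONDITIONS)
    if unknown:
        raise ValueError(f"unknown condition(s): {sorted(unknown)}")
    return [1 if c in observed else 0 for c in CONDITIONS]


def decode(bits: List[int]) -> Set[str]:
    return {c for c, bit in zip(CONDITIONS, bits) if bit}


def labels_for_model(bitvectors: List[List[int]], condition: str) -> List[int]:
    """Class labels of one two-class model: the column of one condition.

    Simplification: 0 is used as the negative class here, although a 0 only
    means "not observed", not "infeasible".
    """
    k = CONDITIONS.index(condition)
    return [bv[k] for bv in bitvectors]


example = [0, 1, 1, 0, 0, 0, 1, 0, 0]
print(sorted(decode(example)))
```

Stacking one such vector per reaction gives a 9-column label matrix; each column feeds one of the nine classifiers.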
Modeling setup
Goal: preparation of 9 two-class classification models:
- 8 models of reaction feasibility, one for each catalyst or solvent type,
- 1 model « Michael / non-Michael »
Descriptors: ISIDA/CGR and EED, and also MOLMAP and CDK
Scenarios: reagent-based, product-based and reaction-based
Performance assessment: ROC AUC in 3-fold cross-validation
Machine-learning methods: Random Forest, SVM, Naïve Bayes
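The evaluation protocol (ROC AUC in 3-fold cross-validation) can be sketched without any ML library. ROC AUC is computed as the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count one half), which equals the area under the ROC curve. The learner is a deliberately simple stand-in (projection on the difference of class means), not the Random Forest, SVM or Naïve Bayes used in the study, and the toy data are made up.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def kfold(n: int, k: int = 3, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def centroid_learner(X: List[Vector], y: List[int]) -> Callable[[Vector], float]:
    """Stand-in classifier: score = projection on (mean positive - mean negative)."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    direction = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
                 for j in range(len(X[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(direction, x))


def cross_validated_auc(X, y, learner, k: int = 3, seed: int = 0) -> float:
    """Pool out-of-fold scores from k-fold cross-validation and compute one ROC AUC."""
    oof = [0.0] * len(y)
    for test in kfold(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        score = learner([X[i] for i in train], [y[i] for i in train])
        for i in test:
            oof[i] = score(X[i])
    return roc_auc(y, oof)


# Made-up, perfectly separable toy data (two descriptors per reaction)
y = [i % 2 for i in range(30)]
X = [[t + 0.1 * (i % 5), 1.0] for i, t in enumerate(y)]
print(cross_validated_auc(X, y, centroid_learner))
```

This sketch pools the out-of-fold scores before computing a single AUC; averaging the three per-fold AUCs is an equally common choice, and the two can differ when fold models produce scores on different scales.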
CGR for Michael reaction
[Figure: CGR of a Michael addition; dynamical bonds: created single, double to single]
[Figure: Reaction descriptors. ROC AUC of each model (Solv:A, Solv:NA, Solv:P, Cat:LA, Cat:NA, Solv:H, Cat:BA, Cat:B, Cat:NO) in two panels, EED and ISIDA; y axis: ROC AUC, from 0.5 to 1]
ROC AUC = 1 corresponds to an ideal model.
G. Marcou, J. Aires de Sousa, D. Latino, A. Deluca, D. Horvath, V. Rietsch, and A. Varnek J. Chem. Inf. Model., 2015, 55, 239−250.
Michael reaction: models performance
[Figure: Reagent descriptors. ROC AUC of each model in two panels, EED and ISIDA]
Michael reaction: models performance
http://infochim.u-strasbg.fr/webserv/VSEngine.html
Web-based expert system predicting Michael addition feasibility
Model validation on a set of 52 reactions extracted from the literature
Model | # in the set | # correctly predicted
Catalyst and solvent | 52 | 8
Aprotic solvent | 12 | 2
Protic solvent | 21 | 21
No catalyst | 26 | 14
Base-catalyzed | 19 | 8
Brønsted acids | 2 | 1
Model failure, or incomplete data for the modeling?
Michael reaction: external validation
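Turning the external-validation table into success rates makes the spread explicit: from about 15% (8 of 52) for the combined catalyst-and-solvent prediction to 100% (21 of 21) for protic solvents, while the Brønsted acid row rests on only 2 reactions and supports no conclusion. The numbers are copied from the table; the per-row rate is simply correct / total.

```python
# (model, reactions in the external set, correctly predicted), copied from the table
rows = [
    ("Catalyst and solvent", 52, 8),
    ("Aprotic solvent", 12, 2),
    ("Protic solvent", 21, 21),
    ("No catalyst", 26, 14),
    ("Base-catalyzed", 19, 8),
    ("Brønsted acids", 2, 1),
]

success_rate = {name: correct / total for name, total, correct in rows}
for name, total, correct in rows:
    print(f"{name:<22}{correct:>3} / {total:<3} = {success_rate[name]:.0%}")
```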
Both ISIDA/CGR and EED descriptors yield acceptable models in cross-validation.
Reagent, product and reaction scenarios perform similarly.
The models cross-validate very well, but external testing is inconclusive: more data are needed!
Conclusions
J. Chem. Inf. Model., 2015, 55, 239−250
Campus France for a Franco-Portuguese grant “PESSOA”
ChemAxon for the license
Thanks