enc07 neural network algorithms 070420
DESCRIPTION
In silico prediction of small-molecule properties is widely used in today's industry and academia. In particular, NMR spectra are predicted by a variety of software packages. Two main approaches are used:
Database-based. Compounds are compared against a database, and the result is calculated from data for close structural relatives found in the dataset.
Regression-based. An experimental database is used to calculate the parameters of a non-linear regression. The chemical shift is represented as a non-linear function of variables that describe characteristic features of the molecule of interest.
The two approaches require different strategies for further improvement. Database-based results are improved by acquiring a larger database and/or by including user-specific data in the calculation.
TRANSCRIPT
Advanced Chemistry Development, Inc. (ACD/Labs)
Advancements in NMR Predictions:
Neural Network vs. HOSE Code Algorithms
Brent Lefebvre, NMR Product Manager
ACD/Labs’ ENC User’s Meeting, April 21, 2007
3
Why Neural Networks?
The Neural Network algorithm offers a very specific advantage:
Speed of calculation is hundreds of times faster
This enables prediction on the fly
For Structure Elucidator, this is a key feature
4
Why Neural Networks?
It is also a fresh approach for ACD/Labs to chemical shift prediction
We are always researching new ways to improve our software
Also see our poster (#150) on our new incremental scheme
5
Realization
The Neural Network algorithm was outperforming our version 9 HOSE code!
Steps were then taken to migrate this algorithm out of Structure Elucidator and into the ACD/CNMR Predictor
6
Implementation
7
Neural Network Algorithm
8
Implementation
Training the Neural Net
Entire database from version 9 used
Additional database of 187,000 shifts used for accuracy testing
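The train/test split on this slide (fit on one database, measure accuracy on a separate set of shifts) can be illustrated with a toy network. This is a minimal sketch under stated assumptions: the two descriptors, the synthetic `toy_shift` target, the single tanh hidden layer and full-batch gradient descent are all hypothetical choices, not a description of ACD/Labs' network.

```python
import math
import random


def toy_shift(x0, x1):
    """Hypothetical 'scaled shift' used only for this illustration."""
    return 0.2 + 0.3 * x0 + 0.4 * x1


# Training set and a separate test set, mirroring the slide's split.
TRAIN_SET = [((a / 4, b / 4), toy_shift(a / 4, b / 4)) for a in range(5) for b in range(5)]
TEST_SET = [((x0, x1), toy_shift(x0, x1)) for x0, x1 in [(0.1, 0.9), (0.6, 0.3), (0.35, 0.65)]]


def forward(params, x):
    """Return (hidden activations, prediction) for one descriptor vector x."""
    W1, b1, w2, b2 = params
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return h, sum(w * hj for w, hj in zip(w2, h)) + b2


def rmse(params, data):
    """Root-mean-square error of the network on (x, y) pairs."""
    return math.sqrt(sum((forward(params, x)[1] - y) ** 2 for x, y in data) / len(data))


def train(data, hidden=4, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer tanh network by full-batch gradient descent
    on half the mean squared error."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    m = len(data)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in data:
            h, pred = forward((W1, b1, w2, b2), x)
            err = pred - y
            gb2 += err
            for j in range(hidden):
                gw2[j] += err * h[j]
                dz = err * w2[j] * (1.0 - h[j] ** 2)  # backpropagate through tanh
                gb1[j] += dz
                for k in range(n_in):
                    gW1[j][k] += dz * x[k]
        # One gradient step using the batch-averaged gradients.
        for j in range(hidden):
            w2[j] -= lr * gw2[j] / m
            b1[j] -= lr * gb1[j] / m
            for k in range(n_in):
                W1[j][k] -= lr * gW1[j][k] / m
        b2 -= lr * gb2 / m
    return W1, b1, w2, b2
```

Calling `train(TRAIN_SET, epochs=0)` returns the untrained initialization, so comparing `rmse(..., TEST_SET)` before and after training shows the held-out error falling. Full-batch gradient descent keeps the sketch short and deterministic; practical networks typically add mini-batches, input scaling and early stopping.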
9
Neural Network Approach
How does this neural net implementation compare to others in the industry?
What is unique about it?
Does this make it better or worse?
10
Neural Network Approach
Our research brought us to some new conclusions
Some implementation details differed from previous industry attempts
11
Neural Network Approach
We found that:
Characteristics of the Neural Net were NOT the most important factor
The structure encoding scheme was most important
Size and accuracy of the training set are key
Our huge quality-checked database gave us a tremendous advantage
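The point that the structure encoding matters most can be made concrete with a hypothetical encoding: describe an atom by how many atoms of each element sit at topological distance 1, 2 and 3, giving the network a fixed-length input vector. The element vocabulary, the three-sphere limit and `encode_atom` are assumptions for illustration only; the slides do not disclose the encoding ACD/Labs used.

```python
from collections import deque

# Hypothetical element vocabulary; real encodings use far richer atom types.
ELEMENTS = ("C", "N", "O")


def encode_atom(atom, elements, bonds, spheres=3):
    """Return element counts at topological distance 1..spheres from `atom`,
    flattened sphere by sphere into one fixed-length vector.

    elements: element symbol per atom index
    bonds:    adjacency list, atom index -> list of bonded atom indices
    """
    # Breadth-first search gives the bond-count distance to every atom.
    dist = {atom: 0}
    queue = deque([atom])
    while queue:
        a = queue.popleft()
        for b in bonds[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    vector = [0] * (spheres * len(ELEMENTS))
    for other, d in dist.items():
        if 1 <= d <= spheres and elements[other] in ELEMENTS:
            vector[(d - 1) * len(ELEMENTS) + ELEMENTS.index(elements[other])] += 1
    return vector
```

For the heavy atoms of ethanol (C0-C1-O2), `encode_atom(0, ["C", "C", "O"], {0: [1], 1: [0, 2], 2: [1]})` puts one carbon in sphere 1 and one oxygen in sphere 2. Two atoms with different environments get different vectors, which is exactly what a network needs in order to learn different shifts for them.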
12
Using the Neural Network Predictions
How are they accessed in the software?
13
Using the Neural Network Predictions
14
Using the Neural Network Predictions
15
Limitations of the Neural Network Predictions
Predictions are a black box
No calculation protocol, as there is for HOSE code
Training of predictions could be possible
Does not outperform HOSE code training
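By contrast with the black-box network, a database-based (HOSE-code-style) prediction can report which relatives it used. A minimal sketch, assuming a toy database keyed by per-sphere environment strings: it looks for the most specific environment first, falls back to broader ones, and returns the matched sphere depth and shifts as a simple "calculation protocol". The codes, placeholder values and `predict_shift` are hypothetical, and real predictors weight and filter matches in ways this sketch does not attempt.

```python
from statistics import mean

# Toy database of (environment code, 13C shift in ppm) pairs. Each code is a
# tuple of hypothetical per-sphere strings, innermost sphere first; the
# values are placeholders, not real reference data.
DATABASE = [
    (("C", "C/CC", "C/CC/OH"), 62.1),
    (("C", "C/CC", "C/CC/CH3"), 23.4),
    (("C", "C/CC", "C/CC/OH"), 61.7),
    (("C", "C/CO"), 170.2),
]


def predict_shift(query, database=DATABASE):
    """Average the shifts of database atoms whose environment matches the
    query at the deepest possible sphere, falling back to shallower spheres.

    Returns (predicted shift, sphere depth used, matched shifts); the last
    two items show how the prediction was obtained.
    """
    for depth in range(len(query), 0, -1):
        prefix = query[:depth]
        matches = [shift for code, shift in database if code[:depth] == prefix]
        if matches:
            return mean(matches), depth, matches
    return None, 0, []  # no structural relative at any sphere
```

A query such as `("C", "C/CC", "C/CC/NH2")` finds nothing at sphere 3 and falls back to sphere 2, averaging three entries. The returned depth and match list are the kind of traceability a neural network prediction does not provide.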
16
Statistics
How do NN predictions compare to the old and new HOSE code?
When should I use NN?
What is the new performance?
17
Prediction Accuracy
We calculate our prediction accuracy for HOSE code the same way every year
A “Leave-one-out” analysis of our entire database (2 million chemical shifts)
This allows us to compare year-on-year improvement
A TRUE analysis of how accurate the predictors are
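A leave-one-out analysis predicts every database entry using only the remaining entries, so no shift is ever predicted from its own record. A minimal sketch, assuming records are (environment code, shift) pairs and using a hypothetical exact-match predictor in place of the real HOSE code algorithm:

```python
from statistics import mean


def leave_one_out(records, predict):
    """For each record, predict its shift from all *other* records and
    return (experimental, calculated) pairs for records that got a prediction."""
    pairs = []
    for i, (code, shift) in enumerate(records):
        rest = records[:i] + records[i + 1:]  # hold the current record out
        calc = predict(code, rest)
        if calc is not None:
            pairs.append((shift, calc))
    return pairs


def exact_match_predictor(code, database):
    """Hypothetical predictor: mean shift of entries with an identical code."""
    matches = [s for c, s in database if c == code]
    return mean(matches) if matches else None
```

Because the same procedure is run over the whole database each year, the resulting (experimental, calculated) pairs give comparable inputs for the accuracy statistics from version to version. Records with no remaining relative simply get no prediction.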
18
L-O-O Analysis
Figure: L-O-O plots for Version 8.00 (1,861,611 chemical shifts) and Version 10.05 (1,982,234 chemical shifts); axes: Chemical Shifts: Value (ppm), -40 to 280 ppm
19
Prediction Accuracy
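Standard Error of Prediction Formula:
$$\text{Standard Error of Prediction} = \sqrt{\frac{\sum_{i=1}^{n}(\mathrm{exp}_i - \mathrm{calc}_i)^2}{n-1} - \frac{\left[\sum_{i=1}^{n}(\mathrm{exp}_i - \mathrm{calc}_i)\right]^2}{n(n-1)}}$$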
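The standard error on this slide can be computed directly from paired experimental and calculated shifts. This sketch assumes the statistic is the square root of the variance-like expression (a sum of squared residuals over n - 1, minus the squared sum of residuals over n(n - 1)), which makes it the sample standard deviation of the residuals exp_i - calc_i; `standard_error` is a hypothetical helper name.

```python
import math


def standard_error(exp, calc):
    """Standard error of prediction: sample standard deviation of the
    residuals exp_i - calc_i, in the same units as the shifts (ppm)."""
    n = len(exp)
    if n < 2 or n != len(calc):
        raise ValueError("need two or more paired values")
    d = [e - c for e, c in zip(exp, calc)]
    s2 = sum(x * x for x in d) / (n - 1) - sum(d) ** 2 / (n * (n - 1))
    return math.sqrt(max(s2, 0.0))  # guard against tiny negative rounding error
```

Because the second term subtracts the mean residual, a constant offset between experiment and prediction does not inflate this statistic: `standard_error([1, 2, 3], [0, 1, 2])` is 0.0 even though every prediction is 1 ppm low.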
20
Prediction Accuracy
CNMR Predictor Standard Error
Version 8 - 3.11 ppm
Version 9 - 2.32 ppm
Version 10.00 - 2.26 ppm
Version 10.05 - 1.84 ppm
A 21% increase in accuracy over version 9!
A 41% increase in accuracy over version 8!
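The quoted percentages match the relative reduction in standard error between versions. A quick check, using a hypothetical helper `accuracy_gain`:

```python
def accuracy_gain(old_se, new_se):
    """Relative reduction in standard error, as a whole-number percentage."""
    return round(100 * (old_se - new_se) / old_se)
```

With the values above, `accuracy_gain(2.32, 1.84)` gives 21 (version 9 to 10.05) and `accuracy_gain(3.11, 1.84)` gives 41 (version 8 to 10.05).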
21
Prediction Accuracy
Comparison of HOSE and Neural Network
>187,000 chemical shifts used in test
NN algorithms - 12% accuracy increase over version 9 HOSE code
Version 10 HOSE code - 16% accuracy increase over version 9 HOSE code
HOSE Code is better for now
22
The Future of Neural Nets
What is planned for NMR Predictors?
How do Neural Networks fit into these plans?
23
The Future of Neural Nets
Version 11 will further integrate the Neural Network Algorithm
An intelligent hybrid approach
Much like the use of the incremental scheme today
Stay tuned for more validation results
¹H NMR validation study
24
Acknowledgements
Kirill Blinov
Mikhail Kvasha
Marina Solnetseva and the database team
Ryan Sasaki