bradley opal 2011
DESCRIPTION
Jean-Claude Bradley presents "Pre-competitive Collaboration: Sharing Data to Increase Predictability" at the Opal Events 3rd Annual Drug Discovery Partnership: Filling the Pipeline
TRANSCRIPT
Pre-competitive Collaboration: Sharing Data to Increase Predictability
Jean-Claude Bradley
October 17, 2011
3rd Annual Drug Discovery Partnership: Filling the Pipeline
Associate Professor of Chemistry, Drexel University
Opportunities for Competitive Collaboration
Industry is Sharing More
• Solubility and melting points are critical properties in the drug discovery process
• Data quality is essential for both measurements and predictions based on measurements
• Openness is proving to be a powerful tool for assessing the reliability of data
Solubility prediction for Taxol using Abraham descriptors (predicted vs. experimental)
Predicted temperature-dependent solubility of Taxol in water (M), based on melting point
The Trusted Source Model
Before online databases (early 90s), searching for properties like melting points using ONE “trusted source” was practical and acceptable as part of the chemistry culture.
• CRC Handbook
• Merck Index
• Chemical Vendor Catalogs (e.g. Sigma-Aldrich)
• Peer-Reviewed Journals
Single values don’t tend to be contradicted
Question Assumptions
Using technology, we can begin to replace the “trusted source”
model with one based on transparency and provenance
The Chemical Information Validation Sheet
567 curated and referenced measurements from Fall 2010 Chemical Information Retrieval course
Discovering outliers for melting points (stdev/average)
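A minimal sketch of the stdev/average screen named above, assuming each compound comes with a list of reported melting points in °C. The function names, the 0.05 threshold, and the conversion to kelvin before dividing (so that an average near 0 °C does not inflate the ratio) are assumptions made for illustration and are not taken from the validation sheet. The sample values are the six 4-benzyltoluene melting points that this talk reports from different sources.

```python
from statistics import mean, stdev


def relative_spread(values_c):
    """Sample standard deviation divided by the average, computed in kelvin."""
    kelvin = [v + 273.15 for v in values_c]
    return stdev(kelvin) / mean(kelvin)


def flag_outliers(measurements, threshold=0.05):
    """Return (compound, spread) pairs above the threshold, largest spread first.

    Compounds with fewer than two measurements are skipped because a
    standard deviation cannot be computed for them.
    """
    flagged = []
    for name, values in measurements.items():
        if len(values) < 2:
            continue
        spread = relative_spread(values)
        if spread > threshold:
            flagged.append((name, spread))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Melting points (°C) reported for 4-benzyltoluene by the sources in this talk.
measurements = {
    "4-benzyltoluene": [5.0, -30.0, 125.0, 97.5, -30.0, 4.58],
}

for name, spread in flag_outliers(measurements):
    print(f"{name}: stdev/average = {spread:.2f}")
```

Sorting by the ratio puts the compounds whose sources disagree most at the top, which is where manual investigation is most likely to pay off.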
Investigating the m.p. inconsistencies of EGCG
Investigating the m.p. inconsistencies of cyclohexanone
Most popular data sources
Alfa Aesar donates melting points to the public
Open Melting Point Explorer
(Andrew Lang)
Outliers
MDPI dataset
PhysProp (EPA donated all data to public also)
Outliers for ethanol: Alfa Aesar and Oxford MSDS
Inconsistencies and SMILES problems within MDPI dataset
MDPI Dataset labeled with High Trust Level
Open Melting Point Datasets
Currently 27,000 melting points for 20,000 compounds
What is the melting point of 4-benzyltoluene?
• American Petroleum Institute: 5 °C
• PHYSPROP: -30 °C
• PHYSPROP: 125 °C
• peer-reviewed journal (2008): 97.5 °C
• government database: -30 °C
• government database: 4.58 °C
The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 °C (Evan Curtin)
Open Lab Notebook page measuring the melting point of 4-benzyltoluene
Motivation: Faster Science, Better Science
Ruling out all melting points above -15C?
Oops – 4-benzyltoluene freezes after 16 days at -15C!
Measuring the melting point by slowly heating from -15 C gives 5 C
There are NO FACTS, only measurements embedded
within assumptions
Open Notebook Science maintains the integrity of data
provenance by making assumptions explicit
TRUST
PROOF
Common errors in datasets
• multiple melting points for the same compound in the same database
• stereochemistry issues
• sign inversion
• conversion errors (Kelvin/Celsius, Fahrenheit/Celsius)
• bad SMILES (non-rendering)
• salts associated with SMILES for free base
• using boiling point for melting point
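Two of the errors listed above, sign inversion and unit conversion slips, can be screened for arithmetically. The following is a rough sketch, assuming a reported value and a reference value that are both supposed to be in °C. The function name, the labels, and the 0.5 degree tolerance are hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from any dataset.

```python
def conversion_error_candidates(reported_c, reference_c, tol=0.5):
    """List the slips that would turn the reference value into the reported one.

    Both arguments are nominally in °C. An empty list means the values
    already agree within the tolerance or no listed slip explains the gap.
    """
    if abs(reported_c - reference_c) <= tol:
        return []
    candidates = {
        "sign inversion": -reference_c,
        "kelvin entered as Celsius": reference_c + 273.15,
        "Fahrenheit entered as Celsius": reference_c * 9 / 5 + 32,
    }
    return [
        label
        for label, value in candidates.items()
        if abs(value - reported_c) <= tol
    ]


# Illustrative values only: a reference of 25 °C against three suspect entries.
print(conversion_error_candidates(77.0, 25.0))
print(conversion_error_candidates(-25.0, 25.0))
print(conversion_error_candidates(298.15, 25.0))
```

A match is only a hint about where the discrepancy may come from; the original source still has to be checked before a value is corrected.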
Open Random Forest modeling of Open Melting Point data using CDK descriptors
(Andrew Lang)
R² = 0.78; TPSA and nHdon most important
Melting point prediction service
Melting point predictions and measurements on iPhone/iPad (Andrew Lang and Alex Clark)
Publication of double+ validated melting point dataset to Nature Precedings and LuLu
Crowdsourcing Solubility Data
ONS Challenge Judges
ONS Challenge Award Winners
Web services for summary data
(Andrew Lang)
Reaction Attempts Book
Reaction Attempts Book: Reactants listed Alphabetically
Interactive NMR spectra using JSpecView or ChemDoodle and the Open JCAMP-DX format
Predicting Best Solvent for Imine Formation using solubility and melting point data (Evan Curtin)
Predicting Yield of Imine Formation in Ethanol
(Evan Curtin)
Google Apps Scripts web services
Google Apps Scripts for conveniently exploring melting point data
Straight chain carboxylic acids from 1 to 10 carbons
Straight chain alcohols from 1 to 10 carbons
Comparison of model with triple validated measurements
Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)
Google Apps Scripts for planning reactions and creating schemes
Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)
All ONS web services
Some Initiatives Promoting More Openness in Drug Discovery
Open Primary Research in Drug Design using Web2.0 tools (malaria)
(blogs, wikis, Second Life, mailing lists)
Docking
Synthesis
Testing
Rajarshi Guha, Indiana U
JC Bradley, Drexel U
Phil Rosenthal, UCSF (malaria)
Dan Zaharevitz, NCI (tumors)
Tsu-Soo Tan, Nanyang Inst.
Outcome of Guha-Bradley-Rosenthal collaboration
Conclusions
• For science to progress quickly there is great benefit in moving away from a “trusted source” model to one based on transparency and data provenance
• Open Notebook Science can be a useful tool in this context