cinf 2012 talk recrystallization app

The deployment of an app from Open Data feeds and algorithms: Recommending recrystallization solvents. Jean-Claude Bradley, Associate Professor of Chemistry, Drexel University. ACS-CINF Symposium, December 13, 2012.

Upload: jean-claude-bradley

Post on 10-May-2015


DESCRIPTION

Jean-Claude Bradley presents on a recrystallization app based on Open Data feeds and models.

TRANSCRIPT

Page 1: CINF 2012 talk Recrystallization App

The deployment of an app from Open Data feeds and algorithms: Recommending recrystallization solvents

Jean-Claude Bradley

December 13, 2012

ACS-CINF Symposium

Associate Professor of Chemistry

Drexel University

Page 2: CINF 2012 talk Recrystallization App

The importance of recrystallization

• Generally preferred if there is a known solvent that gives a good yield

• Scales much more easily and cheaply than chromatography

• However, for new compounds much trial and error may be needed

Page 3: CINF 2012 talk Recrystallization App

The Recrystallization App

(Andrew Lang)

Page 4: CINF 2012 talk Recrystallization App

What are good solvents to recrystallize benzoic acid?

(Andrew Lang)

Page 5: CINF 2012 talk Recrystallization App

Click on the solvent to see temp curve

(Andrew Lang)

Page 6: CINF 2012 talk Recrystallization App

Deliver melting point data via App

(Andrew Lang)

Page 7: CINF 2012 talk Recrystallization App

How does it work?

1. Look up the solvent boiling point

2. Look up the room temperature solubility, or predict it from Abraham descriptors that are themselves predicted by a model using the CDK

3. Look up the solute melting point or predict it via a model using the CDK

4. Use the melting point and the solubility at room temperature to predict the solubility at boiling

5. Calculate the predicted recrystallization yield
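The five steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the app's actual code: the slides do not give the extrapolation formula, so the sketch assumes that the logarithm of the mole-fraction solubility is linear in 1/T and reaches a mole fraction of 1 at the solute melting point. The function names, and the numbers used in the usage note, are hypothetical.

```python
import math


def solubility_at(temp_k, x_room, room_k, melt_k):
    """Extrapolate mole-fraction solubility from room temperature to temp_k.

    Assumption (not from the slides): ln(x) is linear in 1/T and x = 1
    at the melting point of the solute.
    """
    if not 0.0 < x_room < 1.0:
        raise ValueError("x_room must be a mole fraction between 0 and 1")
    if melt_k <= room_k:
        raise ValueError("solute must be a solid at room temperature")
    if temp_k >= melt_k:
        return 1.0  # above the melting point the solute is treated as miscible
    scale = (1.0 / temp_k - 1.0 / melt_k) / (1.0 / room_k - 1.0 / melt_k)
    return math.exp(math.log(x_room) * scale)


def predicted_yield(x_room, room_k, boil_k, melt_k):
    """Fraction of solute recovered on cooling a boiling saturated solution."""
    x_boil = solubility_at(boil_k, x_room, room_k, melt_k)
    if x_boil >= 1.0:
        return 1.0  # limit of the formula; in practice the solute may oil out
    # Convert mole fractions to moles of solute per mole of solvent.
    cold = x_room / (1.0 - x_room)
    hot = x_boil / (1.0 - x_boil)
    return max(0.0, 1.0 - cold / hot)


# Illustrative inputs only (hypothetical solute and solvent).
example_yield = predicted_yield(x_room=0.01, room_k=298.15, boil_k=351.5, melt_k=395.5)
```

Under these assumptions a solvent ranks well when the room temperature solubility is low and the solubility at the boiling point is high, which is why both the solvent boiling point and the solute melting point enter the calculation.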

Page 8: CINF 2012 talk Recrystallization App

Openness in Chemistry

WHY?

The Recrystallization App produces and uses Open Data:

• Open Solubility Collection and Models

• Open Melting Point Collection and Models

• Modeling depends mainly on CDK (Open Source Software with Open Descriptors)

• Open Notebook Science

Page 9: CINF 2012 talk Recrystallization App

Open Data Collections are essential for this strategy

[Diagram labels: Open Data (three boxes), transparent transformation]

Transparent chain of provenance

Page 10: CINF 2012 talk Recrystallization App

Open Melting Point Datasets

Currently 20,000 compounds with Open MPs

Page 11: CINF 2012 talk Recrystallization App

What is the melting point of 4-benzyltoluene?

• American Petroleum Institute: 5 C

• PHYSPROP: -30 C

• PHYSPROP: 125 C

• peer reviewed journal (2008): 97.5 C

• government database: -30 C

• government database: 4.58 C
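The disagreement between sources on this slide can be quantified with a short script. The sketch below uses the six reported values as they appear on the slide; the 5 C tolerance and the function name are assumptions chosen for illustration, not settings taken from the Open Melting Point collection.

```python
from statistics import median

# Reported melting points for 4-benzyltoluene, as listed on the slide (in C).
reported_c = [
    ("American Petroleum Institute", 5.0),
    ("PHYSPROP", -30.0),
    ("PHYSPROP", 125.0),
    ("peer reviewed journal (2008)", 97.5),
    ("government database", -30.0),
    ("government database", 4.58),
]


def summarize(values_c, tolerance_c=5.0):
    """Return the median, the spread and a flag for disagreeing sources.

    tolerance_c is a hypothetical threshold: a larger spread between
    the lowest and highest report marks the compound for validation.
    """
    temps = [temp for _, temp in values_c]
    spread = max(temps) - min(temps)
    return {
        "median_c": median(temps),
        "spread_c": spread,
        "needs_validation": spread > tolerance_c,
    }


summary = summarize(reported_c)
```

A spread this large cannot be resolved by averaging, which is the motivation for measuring the value directly in an Open Notebook.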

Page 12: CINF 2012 talk Recrystallization App

Motivation: Faster Science, Better Science

Page 13: CINF 2012 talk Recrystallization App

The quest to resolve the melting point of 4-benzyltoluene: liquid at room temperature and can be frozen below -30 C

Page 14: CINF 2012 talk Recrystallization App

Open Lab Notebook page measuring the melting point of 4-benzyltoluene

Page 15: CINF 2012 talk Recrystallization App

Ruling out all melting points above -15C?

Page 16: CINF 2012 talk Recrystallization App

Oops – 4-benzyltoluene freezes after 16 days at -15C!

Page 17: CINF 2012 talk Recrystallization App

Measuring the melting point by slowly heating from -15 C gives 5 C

Page 18: CINF 2012 talk Recrystallization App

There are NO FACTS, only measurements embedded within assumptions

Open Notebook Science maintains the integrity of data provenance by making assumptions explicit

Page 19: CINF 2012 talk Recrystallization App

Open Random Forest modeling of Open Melting Point data using CDK descriptors

(Andrew Lang)

R² = 0.78; TPSA and nHdon are the most important descriptors

Page 20: CINF 2012 talk Recrystallization App

Melting point prediction service

Page 21: CINF 2012 talk Recrystallization App

Web services for summary data

(Andrew Lang)

Page 22: CINF 2012 talk Recrystallization App

Using a Google Spreadsheet as a “dashboard interface” for reaction planning and analysis

Page 23: CINF 2012 talk Recrystallization App

Calling Google App Scripts

Page 24: CINF 2012 talk Recrystallization App

Calling Google App Scripts

(Andrew Lang and Rich Apodaca)

Page 25: CINF 2012 talk Recrystallization App

Never having to leave the Google Spreadsheet dashboard for access to key info

(Andrew Lang and Rich Apodaca)

Page 26: CINF 2012 talk Recrystallization App

A click away from an interactive NMR display (using JCAMP-DX format and ChemDoodle)

(Andrew Lang)

Page 27: CINF 2012 talk Recrystallization App

Google Apps Scripts for conveniently exploring melting point data

Page 28: CINF 2012 talk Recrystallization App

Straight chain carboxylic acids from 1 to 10 carbons

Straight chain alcohols from 1 to 10 carbons

Comparison of model with triple validated measurements

Page 29: CINF 2012 talk Recrystallization App

Cyclic primary amines from 3 to 6 carbons (cyclobutylamine flagged for validation – only single source available)

Page 30: CINF 2012 talk Recrystallization App

Open Melting Points in Supplementary Data Pages of Wikipedia (Martin Walker)

Page 31: CINF 2012 talk Recrystallization App

Dibenzalacetone derivatives docking against tubulin (paclitaxel site)

(Andrew Lang)

Page 32: CINF 2012 talk Recrystallization App

“Simple” aldol condensation synthesis

Top Hit (no reports of synthesis)

In top ten (a few reports of synthesis)

(Andrew Lang)

Page 33: CINF 2012 talk Recrystallization App

Information from the literature on the target synthesis

Page 34: CINF 2012 talk Recrystallization App

Information from the literature on the target synthesis

Page 35: CINF 2012 talk Recrystallization App

Searching for aldol condensations of acetone in the Reaction Attempts database (about 90% of reactions in Open Notebooks are “not successful”)

(Andrew Lang)

Page 36: CINF 2012 talk Recrystallization App

An example of a “failed experiment” in an Open Notebook with useful information

Page 37: CINF 2012 talk Recrystallization App

A failed experiment reveals the importance of aldehyde solubility

Page 38: CINF 2012 talk Recrystallization App

An example of a successful experiment in an Open Notebook

Page 39: CINF 2012 talk Recrystallization App

A successful synthesis by avoiding water, dramatically increasing the NaOH and using a long reaction time

Page 40: CINF 2012 talk Recrystallization App

Chemical Information Retrieval 2012 property assignment

Page 41: CINF 2012 talk Recrystallization App

Melting Point Outlier List

Page 42: CINF 2012 talk Recrystallization App

Melting Point Outlier example

Page 43: CINF 2012 talk Recrystallization App

Solubility Outlier List

Page 44: CINF 2012 talk Recrystallization App

Solubility of benzoic acid in 1-octanol discrepancies

Page 45: CINF 2012 talk Recrystallization App

Using ChemSpider to ensure all stereocenters are defined before searching for properties

Page 46: CINF 2012 talk Recrystallization App

Using the InChIKey to find single isomers
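The idea of separating single isomers by InChIKey can be sketched as follows. A standard InChIKey has three hyphen-separated blocks: the first 14 letters hash the connectivity, and the first 8 letters of the second block hash the remaining layers (stereochemistry, isotopes). When those layers are empty the second block starts with UHFFFAOY. The helper names are hypothetical, and the example keys are the ones commonly listed for L-, D- and unspecified alanine; verify them against a resolver such as ChemSpider before relying on them.

```python
from collections import defaultdict

# Second-block prefix of a standard InChIKey with no stereo or isotopic layers.
NO_EXTRA_LAYERS = "UHFFFAOY"


def split_inchikey(key):
    """Split an InChIKey into its three blocks, checking the block lengths."""
    parts = key.strip().upper().split("-")
    if len(parts) != 3 or [len(p) for p in parts] != [14, 10, 1]:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    return parts


def has_extra_layers(key):
    """True when the key encodes stereochemistry and/or isotopic layers."""
    return not split_inchikey(key)[1].startswith(NO_EXTRA_LAYERS)


def group_by_skeleton(keys):
    """Group keys that share the same connectivity (first block)."""
    groups = defaultdict(list)
    for key in keys:
        groups[split_inchikey(key)[0]].append(key)
    return dict(groups)


# Example keys (assumed values, see the note above).
keys = [
    "QNAYBMKLOCPYGJ-REOHZTLVSA-N",  # L-alanine
    "QNAYBMKLOCPYGJ-UWTATZPHSA-N",  # D-alanine
    "QNAYBMKLOCPYGJ-UHFFFAOYSA-N",  # alanine, stereochemistry unspecified
]
groups = group_by_skeleton(keys)
single_isomers = [k for k in keys if has_extra_layers(k)]
```

Grouping on the first block collects all isomers of one skeleton, while the second block distinguishes a defined isomer from an unspecified record, so property values are not mixed across isomers.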

Page 47: CINF 2012 talk Recrystallization App

Chemical Information Validation Sheet 2012

Page 48: CINF 2012 talk Recrystallization App

Each entry validated with an image

Page 49: CINF 2012 talk Recrystallization App

Avoiding redundant property data points with a single click within the validation sheet

Page 50: CINF 2012 talk Recrystallization App

Open Chemical Property Matrix (OCPM)

logP

Abraham descriptors

Melting point

Aqueous solubility

Octanol solubility

Vapor pressure

Flash point

Boiling point

Page 51: CINF 2012 talk Recrystallization App

Open Chemical Property Matrix (OCPM)

Page 52: CINF 2012 talk Recrystallization App

OCPM relationships

Page 53: CINF 2012 talk Recrystallization App

OCPM melting point sheet

Page 54: CINF 2012 talk Recrystallization App

Dibenzalacetone libraries are promising for connecting the OCPM with useful applications

Page 55: CINF 2012 talk Recrystallization App

Conclusions

More openness in chemistry can make science more efficient

Provide interfaces that make sense to the end users:

• Open Data, Open Models and Open Source Software to modelers

• Apps (smartphones, Google App Scripts, etc.) for chemists at the bench

Acknowledgements

• Andrew Lang (code, modeling)

• Bill Acree (modeling, solubility data contribution)

• Antony Williams (ChemSpider services, mp data curation)

• Matthew McBride and Rida Atif (recrystallization and synthesis)

• Kayla Gogarty (OCPM)