2012 acs skolnik symposium - chemspotlight

Automated Molecular Data Extraction using Open Babel & ChemSpotlight: The Semantic Desktop Prof. Geoff Hutchison Department of Chemistry University of Pittsburgh [email protected] ACS CINF: Skolnik Symposium 21 August 2012 http://hutchison.chem.pitt.edu

Upload: geoffrey-hutchison

Post on 17-May-2015


TRANSCRIPT

Page 1: 2012 ACS Skolnik Symposium - ChemSpotlight

Automated Molecular Data Extraction using Open Babel & ChemSpotlight:

The Semantic Desktop

Prof. Geoff Hutchison

Department of Chemistry

University of Pittsburgh

[email protected]

ACS CINF: Skolnik Symposium

21 August 2012

http://hutchison.chem.pitt.edu

Page 2: 2012 ACS Skolnik Symposium - ChemSpotlight

“I can plug my iPod into any computer and it will recognize my music and give me all sorts of metadata: artist, title, type of music...

Why can’t I read the chemical metadata off my chemistry files?”

Prof. Henry S. Rzepa (Imperial College), Spring 2005 ACS Meeting, San Diego, CA

Page 3: 2012 ACS Skolnik Symposium - ChemSpotlight

Pre-History: Chem://Dig

Index files, websites

Based on Chem MIME

Find files by extension

Perceive chemistry

Database Store

Search, Filter

Retrieval

H. Rzepa et al. New J. Chem (2002) 26 p. 656
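The "find files on extension" step can be illustrated with a minimal sketch. It is not Chem://Dig's code: the function name `find_chemical_files` is hypothetical, and the table is only a small subset of the commonly listed chemical MIME types.

```python
from pathlib import Path

# Extension -> chemical MIME type (illustrative subset, not a full registry).
CHEMICAL_MIME_TYPES = {
    ".mol": "chemical/x-mdl-molfile",
    ".sdf": "chemical/x-mdl-sdfile",
    ".pdb": "chemical/x-pdb",
    ".xyz": "chemical/x-xyz",
    ".cml": "chemical/x-cml",
    ".cif": "chemical/x-cif",
}


def find_chemical_files(root):
    """Return sorted (path, mime_type) pairs for recognized files under root.

    Hypothetical helper: matching is by file extension only, ignoring case.
    """
    matches = []
    for path in Path(root).rglob("*"):
        mime = CHEMICAL_MIME_TYPES.get(path.suffix.lower())
        if mime is not None and path.is_file():
            matches.append((path, mime))
    return sorted(matches)
```

Matching on extension alone is cheap but says nothing about the file's contents, which is why the next step on the slide is to perceive the chemistry.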

Page 4: 2012 ACS Skolnik Symposium - ChemSpotlight

Open Babel

Open Babel (Started 2001)

http://openbabel.org/

Free, open source chemical toolbox

Cross-platform: Win, Mac, Linux...

Both user-tools & C++ library

Interfaces in Python, Perl, Ruby, Java, C#

Supports chemistry, bioinformatics, solid-state…

100+ file formats and variants

O’Boyle et al. J. Cheminf. 2011, 3:33

Page 5: 2012 ACS Skolnik Symposium - ChemSpotlight

Chemical Database?

1. Some way to store data (Organize it)

2. Index it

3. Search / filter

4. Visualize results

Page 6: 2012 ACS Skolnik Symposium - ChemSpotlight

ChemSpotlight: Indexing Architecture

Spotlight + Open Babel + ~300 lines of code

http://chemspotlight.openmolecules.net/

Page 7: 2012 ACS Skolnik Symposium - ChemSpotlight

ChemSpotlight: “Un” Database

Use the system-wide search database

No (Visible) Database!

Index files in-place

Includes textual data (e.g., chemical names, formulas, etc.)

Multiple retrieval and filtering interfaces (i.e., any third-party search tool works)

http://chemspotlight.openmolecules.net/
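Because the index lives in the system-wide Spotlight store, one retrieval interface is the macOS `mdfind` command. The sketch below only builds the command and does not run it. The helper names are hypothetical, and it is an assumption that the dipole-moment attribute is indexed as a number that supports comparisons.

```python
def spotlight_query(attribute, operator, value):
    """Build a Spotlight query expression such as 'attr > 3.0'."""
    if operator not in {"==", "!=", "<", "<=", ">", ">="}:
        raise ValueError(f"unsupported operator: {operator}")
    if isinstance(value, str):
        # Quote strings and escape embedded backslashes and double quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    else:
        rendered = repr(value)
    return f"{attribute} {operator} {rendered}"


def mdfind_command(query, only_in=None):
    """Return an argv list for mdfind, optionally limited to one directory."""
    argv = ["mdfind"]
    if only_in is not None:
        argv += ["-onlyin", only_in]
    argv.append(query)
    return argv


# Example: files whose indexed dipole moment exceeds 3.0
# (assumes the attribute is stored as a numeric value).
command = mdfind_command(
    spotlight_query("net_sourceforge_chemspotlight_DipoleMoment", ">", 3.0)
)
```

On a Mac with the importer installed, the resulting list could be passed to `subprocess.run`.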

Page 8: 2012 ACS Skolnik Symposium - ChemSpotlight

So What’s Stored / Perceived

Formula, mass, SMILES, InChI

net_sourceforge_openbabel_Formula = C21H36N7O8S

Fingerprints, number of atoms, bonds, residues

PDB, SDF keywords, properties

Calculation keywords:

kMDItemComment = "Gaussian 09 #n B3LYP/6-31G(d) Opt"

Calculation results (HOMO, LUMO, Dipole Moment)

net_sourceforge_chemspotlight_DipoleMoment = 3.5
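To make the stored attributes concrete, here is a minimal standard-library sketch of deriving a mass from the formula on this slide and bundling the attributes into a dictionary. ChemSpotlight relies on Open Babel for perception, so the parser below is only a stand-in. The function names are hypothetical, the atomic weights are rounded, and only the three attribute keys shown on the slide are used.

```python
import re

# Rounded standard atomic weights, only for the elements used here.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat formula like 'C21H36N7O8S' into element counts.

    Parentheses, charges, and isotopes are not supported in this sketch.
    """
    counts = {}
    position = 0
    for match in TOKEN.finditer(formula):
        if match.start() != position:
            raise ValueError(f"cannot parse formula: {formula!r}")
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + int(number or 1)
        position = match.end()
    if position != len(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def molar_mass(counts):
    """Sum atomic weights; raises KeyError for elements not in the table."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


def spotlight_attributes(formula, comment, dipole_moment):
    """Bundle values under the attribute keys shown on the slide."""
    return {
        "net_sourceforge_openbabel_Formula": formula,
        "kMDItemComment": comment,
        "net_sourceforge_chemspotlight_DipoleMoment": dipole_moment,
    }


counts = parse_formula("C21H36N7O8S")
mass = molar_mass(counts)  # roughly 546.6 g/mol with the rounded weights above
```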

Page 9: 2012 ACS Skolnik Symposium - ChemSpotlight

ChemSpotlight “Un” Database

Page 10: 2012 ACS Skolnik Symposium - ChemSpotlight

ChemSpotlight “Un” Database

Page 11: 2012 ACS Skolnik Symposium - ChemSpotlight

How Do We Visualize?

“QuickLook” previews

New code ~800 lines

Generate SDF, PDB, CIF (if needed)

Pass off to ChemDoodle Web Components

Pseudo-3D, interactive JS + HTML5

… or SVG generation from Open Babel

http://web.chemdoodle.com/

Page 12: 2012 ACS Skolnik Symposium - ChemSpotlight

Organic Heterojunction Solar Cells

[Figure: device schematic with labels: p-type material, n-type material, transparent electrode, reflective electrode, light, circuit (+/-)]

Page 13: 2012 ACS Skolnik Symposium - ChemSpotlight

[Figure: energy-level diagram with labels: optical excitation, electron (e-), hole (h+), anode, cathode, effective heterojunction bandgap, hole-conducting polymer, electron conductor (nanoparticle); ΔE ≥ exciton binding energy]

[Figure: Organic Heterojunction Solar Cells device schematic with labels: p-type material, n-type material, transparent electrode, reflective electrode, light, circuit (+/-)]

Page 14: 2012 ACS Skolnik Symposium - ChemSpotlight

Pipeline Model for Finding New Molecules

Monomers

...

>10^6 Possible Structures

Electronic Properties

Optical Properties

Synthetic Score

~9 minutes

J. Phys. Chem. C 2011, vol. 115, p. 16200

Page 15: 2012 ACS Skolnik Symposium - ChemSpotlight

Pipeline Model for Finding New Molecules

Monomers

Fast Screening

Slower

...

>10^6 Possible Structures

Electronic Properties

Optical Properties

Synthetic Score

~9 minutes

J. Phys. Chem. C 2011, vol. 115, p. 16200
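The staged pipeline (cheap screening first, slower evaluation only for survivors) can be sketched as follows. This is an illustration under stated assumptions, not the published workflow. The dimer enumeration, the scoring callables, the cutoff, and all function names are hypothetical, and the slide does not say which properties belong to which stage.

```python
from itertools import combinations_with_replacement


def enumerate_dimers(monomers):
    """Hypothetical structure generator: all unordered monomer pairs."""
    return [a + "-" + b for a, b in combinations_with_replacement(monomers, 2)]


def screen(candidates, fast_score, slow_score, fast_cutoff, keep):
    """Two-stage screen.

    fast_score is applied to every candidate; slow_score is evaluated only
    for candidates whose fast score reaches fast_cutoff. Returns the `keep`
    best survivors, ranked by slow_score (higher is better).
    """
    survivors = [c for c in candidates if fast_score(c) >= fast_cutoff]
    ranked = sorted(survivors, key=slow_score, reverse=True)
    return ranked[:keep]
```

The ordering matters because the expensive step then runs on a small fraction of a search space that can exceed a million structures.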

Page 16: 2012 ACS Skolnik Symposium - ChemSpotlight

New Genetic Algorithm Approach

Rather than driving calculations directly and waiting for the results:

Check Spotlight for new results

“What are top HOMO energies?”

Update GA, generate new candidates, submit new jobs
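One iteration of this loop might look like the sketch below. A plain dictionary stands in for the Spotlight index, mapping each candidate (a tuple of monomer names) to its HOMO energy in eV. Treating "top" as the highest HOMO energy, the crossover and mutation scheme, and all function names are assumptions made for illustration.

```python
import random


def top_candidates(indexed_results, n):
    """Return the n candidates with the highest indexed HOMO energy."""
    ranked = sorted(indexed_results, key=indexed_results.get, reverse=True)
    return ranked[:n]


def next_generation(parents, monomers, size, rng):
    """Create `size` children by one-point crossover plus one mutation.

    Assumes all parents are tuples of equal length, at least 2.
    """
    children = []
    while len(children) < size:
        mother, father = rng.choice(parents), rng.choice(parents)
        cut = rng.randrange(1, len(mother))
        child = list(mother[:cut] + father[cut:])
        child[rng.randrange(len(child))] = rng.choice(monomers)
        children.append(tuple(child))
    return children


def ga_step(indexed_results, monomers, n_parents, size, rng):
    """Poll the index, breed new candidates, and skip ones already computed."""
    parents = top_candidates(indexed_results, n_parents)
    children = next_generation(parents, monomers, size, rng)
    return [c for c in children if c not in indexed_results]


# Usage: the returned candidates would be submitted as new calculation jobs.
results = {("A", "B"): -7.1, ("B", "C"): -8.0, ("C", "C"): -6.9}
to_submit = ga_step(results, ["A", "B", "C", "D"], 2, 5, random.Random(0))
```

Since the GA only reads whatever the index currently holds, jobs can finish in any order and the search does not block on a single calculation.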

Page 17: 2012 ACS Skolnik Symposium - ChemSpotlight

Scaling Up the Polymer Solar Search

[Plot: LUMO Energy (eV), axis ticks from −3 to 0, versus HOMO Energy (eV), axis ticks from −9.5 to −6.5]

2nd Gen. Search:

680 Monomers

2800+ Fragments

Search Space: 500+ million oligomers

~9 minutes per core

Page 18: 2012 ACS Skolnik Symposium - ChemSpotlight

Take-Home Messages

“Big Data” is a Big Headache

ChemSpotlight & Un-Databases Work!

Keep data as native files w/ separate index

Integrate into user-friendly tools

Sell to users: “What’s in it for me?”

Indexing, retrieval

Improved workflows

Page 19: 2012 ACS Skolnik Symposium - ChemSpotlight

Dr. Noel O’Boyle, U.C. Cork, Ireland

Casey Campbell, Pitt (2010)

Marcus Hanwell, Pitt / Kitware