Klaus Gubernator, Craig James, eMolecules Inc.
ACS 232nd National Meeting
Division of Chemical Information
San Francisco, September 14, 2006
Chemical Structure Search Engines in Cyberspace
The web has revolutionized the way we retrieve information
Chemistry is a late participant in this revolution
Chemistry on the Internet
Search Google Images for “Aspirin”
http://scripts.iucr.org/cgi-bin/paper?cnor=a12172&buy=yes
Acta Cryst. (1975). B31, 1427-1429
The crystal structure of 7-amino-2H,4H-vic-triazolo[4,5-c]-1,2,6-thiadiazine 1,1-dioxide (ATT)
C. Foces-Foces, F. H. Cano and S. García-Blanco
Buy online
You may purchase this article in PDF and/or HTML formats. For purchasers in the UK, and for purchasers elsewhere in the European Community who do not have a VAT number, VAT will be added to the price of the article.
Format* PDF (US $40, plus US $7 for EC purchases)
Structure of “triazolo thiadiazine”
Datasets (which are, in contrast to other dataset lists, available in a structural format)
This list will be expanded continuously. Please don't hesitate to make published datasets publicly available here.
Currently available: 44 Datasets
Note: The Briem/Lessel and Hert/Willett datasets are available only as MDDR IDs for licensing reasons. Please contact MDL for further information on the database. The datasets have nonetheless been included here because they are standard datasets for similarity searching. – Andreas Bender
Binary (active/inactive) datasets
QSAR datasets
QSPR datasets
Toxicity datasets
Metabolism datasets
Permeability datasets
Docking datasets
Mechanistic datasets
Mixed/Other datasets
Cheminformatics.org
CS(=O)(=O)Nc1ccc(cc1OC2CCCCC2)N(=O)=O
CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Oc3ccc(F)cc3F
CS(=O)(=O)Nc1cc2CCC(=O)c2cc1Sc3ccc(F)cc3F
CS(=O)(=O)Nc1ccc(cc1Sc2ccc(F)cc2F)C(=O)N
CS(=O)(=O)Nc1ccc(cc1Sc2ccc(Cl)cc2Cl)S(=O)(=O)N
COc1ccc(cc1)c2sc(nc2c3ccc(cc3)S(=O)(=O)C)c4ccccc4Cl
COc1ccc(cc1)c2sc(nc2c3ccc(SC)cc3)c4ccccc4Cl
CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(F)cc3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)n2nc(cc2c3ccc(Br)cc3)C(F)(F)F
Cc1ccc(cc1)c2cc(nn2c3ccc(cc3)S(=O)(=O)N)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)c2snnc2c3ccc(F)cc3
CC(=O)c1nc(c(o1)c2ccc(c(F)c2)S(=O)(=O)N)c3ccccc3
Cc1nc(C2CCCCC2)c(o1)c3ccc(c(F)c3)S(=O)(=O)N
CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2C3CCCCC3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)c2[nH]c(nc2c3ccc(F)cc3)C(F)(F)F
CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CC3)c4ccccc4
CS(=O)(=O)c1ccc(cc1)C2=C(C(=O)OC32CCCC3)c4ccccc4
CS(=O)(=O)c1ccc(cc1)c2cnn(Cc3ccccc3)c(=O)c2c4ccccc4
CS(=O)(=O)c1ccc(cc1)c2nn(Cc3ccccc3)c(c2c4ccc(F)cc4)C(F)(F)F
NS(=O)(=O)c1ccc(cc1)c2c(CO)onc2c3ccccc3
CS(=O)(=O)c1ccc(cc1)c2cc(Cl)nn2c3ccc(F)cc3
NS(=O)(=O)c1ccc(cc1)c2cc(nn2c3ccc(F)cc3)C(F)(F)F
NS(=O)(=O)c1ccc(cc1)n2nc(cc2c3nc4cccc(F)c4s3)C(F)F
Stahl dataset
Unnamed
-MTS- 06200418093D 0 0.00000 0.00000 0
13 13 0 0 0 0 0 0 0 0 1 V2000
0.0180 -0.0030 0.0000 S 0 0 0 0 0 0 0 0 0 0 0 0
1.7880 0.0070 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4880 0.0110 1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
3.8880 0.0200 1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
4.5880 0.0240 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.0030 0.0330 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
6.6610 1.1880 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
6.0400 2.2500 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8.1410 1.1970 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
8.7570 0.1440 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
8.7890 2.3360 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
3.8880 0.0200 -1.2130 C 0 0 0 0 0 0 0 0 0 0 0 0
2.4880 0.0120 -1.2120 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 2 0 0 0 0
3 4 1 0 0 0 0
4 5 2 0 0 0 0
…
M END
> <BIO>
48.00
$$$$
Yokoyama dataset
Search Genbank for “aattccgg”
Why is so little chemistry on the web?
Tradition?
Strong providers of subscription services?
Searching for chemical structures is significantly more difficult than text searching?
Chemical identifiers are not standardized?
Open Access Chemical Search Engines
PubChem – NIH
ChemBank – Harvard
ZINC – UCSF
ChemDB – UC Irvine
ChemExper – Lausanne
ChemFinder – CambridgeSoft
www.emolecules.com
New Chemistry Search Engine
A large database of publicly available molecular structures
Launched November 2005
50,000 searches per month, rapidly growing
www.emolecules.com
Free chemistry search site for publicly available chemical information
Advanced Search
Powerful features:
hit list management
union, intersect, subtract, difference
manual selection
export lists in many formats
persistent hitlists
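The hit list operations named on this slide map naturally onto set algebra. The following is a minimal sketch, assuming a hit list is simply a set of compound identifiers; the identifiers, variable names, and the `export_lines` helper are hypothetical and are not taken from eMolecules.

```python
# Hypothetical hit lists: each one is a set of made-up compound IDs.
hits_a = {"cmpd-1", "cmpd-2", "cmpd-3"}
hits_b = {"cmpd-3", "cmpd-4"}

union = hits_a | hits_b          # in either list
intersection = hits_a & hits_b   # in both lists
subtract = hits_a - hits_b       # in A but not in B
# Assumption: "difference" is read here as the symmetric difference,
# i.e. entries that appear in exactly one of the two lists.
difference = hits_a ^ hits_b


def export_lines(hits):
    """Hypothetical export helper: one ID per line, in sorted order."""
    return "\n".join(sorted(hits))


print(export_lines(union))
```

Sorting before export keeps the output stable from run to run, which matters when a persistent hit list is compared against an earlier export.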
Content: 16M entries, 5.6M structures
Academic and government databases: NIST WebBook, DrugBank, Protein Ligands
Chemical suppliers: 150 electronic catalogs included
Future goal: all publicly available chemical information
Why is it so fast?
Novel chemical search engine technology
Method represents a major departure from previously known algorithms:
- Molecular keys (MDL)
- Fingerprints (Daylight)
- Feature Trees (BioSolveIT)
Search engine technology
Analyze each molecule for distinguishing structural features
Generate all features algorithmically
Normalize features and use them for indexing
Result: very fast searches
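The slide does not say which features eMolecules generates or how they are normalized, so the following is only an illustrative sketch of the general idea: enumerate features, normalize them, and index molecules by them. It assumes short linear atom paths as the features, treats a path and its reverse as the same feature, ignores bond orders and hydrogens, and uses an inverted index for lookup. All function names and the three toy molecules are hypothetical.

```python
from collections import defaultdict


def path_features(atoms, bonds, max_len=3):
    """Enumerate linear atom paths of up to max_len atoms as strings.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    """
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    features = set()

    def walk(path):
        symbols = [atoms[k] for k in path]
        forward = "-".join(symbols)
        backward = "-".join(reversed(symbols))
        # Normalize: keep one spelling for a path and its reverse.
        features.add(min(forward, backward))
        if len(path) < max_len:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in range(len(atoms)):
        walk([start])
    return features


def build_index(molecules):
    """Inverted index: feature string -> set of molecule IDs containing it."""
    index = defaultdict(set)
    for mol_id, (atoms, bonds) in molecules.items():
        for feature in path_features(atoms, bonds):
            index[feature].add(mol_id)
    return index


def candidates(index, query_atoms, query_bonds):
    """IDs of molecules that contain every feature of the query."""
    features = path_features(query_atoms, query_bonds)
    postings = [index.get(f, set()) for f in features]
    return set.intersection(*postings) if postings else set()


# Toy hydrogen-suppressed graphs (hypothetical example data).
molecules = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "acetic acid": (["C", "C", "O", "O"], [(0, 1), (1, 2), (1, 3)]),
    "methylamine": (["C", "N"], [(0, 1)]),
}
index = build_index(molecules)
print(candidates(index, ["C", "O"], [(0, 1)]))
```

In a sketch like this the index lookup only narrows the search: a molecule can share every path with the query without containing it as a substructure, so the candidates would still need an atom-by-atom match to be confirmed. The speed comes from replacing a scan over every molecule with a few set intersections.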
Who is eMolecules?
Klaus Gubernator
Craig A. James
Rashmi Mistry
Summary
Free for depositors and users
Very fast search engine
High quality user interface
Rich functionality
Complementary with other engines