ACS San Diego, March 2012, InChI Symposium

Accessing NCI/CADD Web Resources by InChI
Markus Sitzmann, Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS

Upload: markus-sitzmann

Post on 10-Dec-2014

Page 1: ACS San Diego, March 2012, InChI Symposium

Accessing NCI/CADD Web Resources by InChI

Markus Sitzmann, Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS

Page 2: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov

Page 3: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier.

Page 4: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Representations

chemical structure

NCI/CADD Identifiers

InChI/InChIKey

ChemSpider ID

PubChem SID/CID

chemical names

CAS Registry Number

NSC number

FDA UNII

ChemNavigator SID

SMILES

SD File

Chemical Formula

ChEBI ID

PDB Ligand ID

MRV

CML

SYBYL Line Notation

GIF image

Page 5: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Representations

InChI

NCI/CADD Identifiers

InChI/InChIKey

ChemSpider ID

PubChem SID/CID

chemical names

CAS Registry Number

NSC number

FDA UNII

ChemNavigator SID

SMILES

SD File

Chemical Formula

ChEBI ID

PDB Ligand ID

MRV

CML

SYBYL Line Notation

GIF image

Page 6: ACS San Diego, March 2012, InChI Symposium

many more …

Chemical Structure Databases

InChI

Page 7: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier.

Page 8: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov/chemical/structure

Chemical Identifier Resolver (CIR)

Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier.

SD file:

C7H6O2
APtclcactv03051222202D 0 0.00000 0.00000

15 15 0 0 0 0 0 0 0 0999 V2000
2.8660 -2.0600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -1.5600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 -0.5600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -0.0600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -0.5600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 -1.5600 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 0.9400 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 1.4400 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.0000 1.4400 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
2.8660 -2.6800 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 -1.8700 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
4.2690 -0.2500 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 -0.2500 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1.4631 -1.8700 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
3.7321 2.0600 0.0000 H 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
2 3 1 0 0 0 0
3 4 2 0 0 0 0
4 5 1 0 0 0 0
5 6 2 0 0 0 0
1 6 1 0 0 0 0
4 7 1 0 0 0 0
7 8 1 0 0 0 0
7 9 2 0 0 0 0
1 10 1 0 0 0 0
2 11 1 0 0 0 0
3 12 1 0 0 0 0
5 13 1 0 0 0 0
6 14 1 0 0 0 0
8 15 1 0 0 0 0
M END
$$$$

ChemWriter Editor

WPYMKLBDIGXBTP-FZOZFQFYNA-N

Page 9: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov/chemical/structure

Chemical Identifier Resolver (CIR)

Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier.

names:

benzoic acid
65-85-0
WLN: QVR
Unisept BZA
AIDS018010
Salvo liquid
Benzoic acid-ring-UL-14C
ST5213864
Benzoesaeure
CHEBI:30746
NSC 149
benzenecarboxylic acid
phenylformic acid
Benzoic acid (JP15/USP)
Benzoic acid (TN)
18102_RIEDEL
Aromatic hydroxy acid
Benzoic acid (7CI,8CI,9CI)
Benzoic acid [USAN:JAN]
W213128_ALDRICH
47849_SUPELCO
Acide benzoique [French]
Acido benzoico [Italian]
Benzoate (VAN)
Benzoesaeure [German]
Benzoic acid (natural)
Acide benzoique
Benzeneformic acid
Benzenemethanoic acid
Benzoesaeure GK
Benzoesaeure GV
Benzoic acid, tech.
Carboxybenzene
Kyselina benzoova
Phenylcarboxylic acid

ChemWriter Editor

WPYMKLBDIGXBTP-FZOZFQFYNA-N

Page 10: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov/chemical/structure

Chemical Identifier Resolver (CIR)

Works as a resolver for different chemical structure identifiers. Allows one to convert a given structure identifier into another representation or structure identifier.

InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N
InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)
C1=CC=C(C=C1)C(O)=O

ChemWriter Editor

WPYMKLBDIGXBTP-FZOZFQFYNA-N

InChIKey

InChI

SMILES

Page 11: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

programmatic URL API:

http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

if a request is not successful: HTTP 404 status message

Page 12: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

programmatic URL API:

http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

if a request is not successful: HTTP 404 status message

examples:

http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas
204255-11-8
MIME type: text/plain

http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/image
MIME type: image/gif
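The URL scheme on this slide can be exercised from a script. The following is a minimal Python 3 sketch that assumes only the URL pattern and the HTTP 404 behaviour stated above; the names `cir_url` and `resolve` are illustrative helpers, not part of CIR, and the percent-encoding of the identifier is an assumption about what the server accepts.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a request URL of the form BASE/identifier/representation."""
    # quote() keeps "/" (needed for InChI strings) and escapes spaces etc.
    return "%s/%s/%s" % (BASE, quote(identifier), representation)


def resolve(identifier, representation, opener=urlopen):
    """Return the response text, or None if the server answers HTTP 404."""
    try:
        with opener(cir_url(identifier, representation)) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as err:
        if err.code == 404:
            return None  # the identifier could not be resolved
        raise


# Usage (performs a network request, so it is left commented out):
# print(resolve("PGZUMBJQJWIWGJ-ONAKXNSWSA-N", "cas"))
```

The `opener` argument is only there so that the network call can be swapped out when testing.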

Page 13: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

• access from Unix shell level (e.g., via wget):

shell > wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
204255-11-8

• access by programming libraries/languages (e.g., Python 3):

from urllib.request import urlopen
from urllib.error import HTTPError

url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
try:
    # urlopen raises HTTPError (e.g. 404) if the request is not successful
    response = urlopen(url).read().decode()
except HTTPError:
    raise  # replace with your own error handling
print(response)

204255-11-8

Page 14: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver: InChI/InChIKey

From an InChI/InChIKey, CIR provides:

• (trivial) names
• CAS Registry numbers
• IUPAC names (OPSIN)
• structure images (GIF, PNG)
• chemical properties (MW, formula, …)
• Database RegIDs (PubChem, ZINC, eMolecules, ChemSpider ID)
• structure files (sdf, pdb, cdx, …)
• SMILES

Page 15: ACS San Diego, March 2012, InChI Symposium

CIR

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

"identifier":

chemical names
IUPAC names (OPSIN)
CAS numbers
SMILES strings
IUPAC InChI/InChIKeys
NCI/CADD Identifiers
CACTVS HASHISY
NSC number
PubChem SID
ZINC Code
ChemSpider ID
ChemNavigator SID
eMolecule VID

"representation":

/smiles
/names, /iupac_name
/cas
/inchi, /stdinchi
/inchikey, /stdinchikey
/ficts, /ficus, /uuuuu
/image
/file, /sdf
/mw, /monoisotopic_mass
/formula
/twirl
/urls
/chemspider_id
/pubchem_sid
/chemnavigator_sid

Page 16: ACS San Diego, March 2012, InChI Symposium

Chemical Identifier Resolver (CIR)

[Flowchart]

http request: identifier, representation

→ detection of the identifier type

• identifier is a full structure representation (e.g. SMILES, InChI)
• identifier is a hashed structure representation (e.g. InChIKey), chemical name etc.: database lookup (CSDB)

→ structure

→ calculation of the requested structure representation (e.g. InChI, GIF image) or database lookup (e.g. CAS number, chemical name)

→ http response: MIME type
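The slide does not say how CIR detects the identifier type. As an illustration only, the sketch below guesses the type from the string's shape using the published InChIKey layout (14-10-1 upper-case letters) and the CAS Registry Number check digit; the function names and the returned labels are hypothetical and do not describe CIR's actual implementation.

```python
import re

INCHIKEY = re.compile(r"^(InChIKey=)?[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CAS = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_cas_number(text):
    """Check the CAS number layout and its check digit."""
    match = CAS.match(text)
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weighted sum of the digits, counted from the right, modulo 10.
    total = sum(w * int(d) for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(match.group(3))


def guess_identifier_type(text):
    """Return a coarse, hypothetical type label for an identifier string."""
    text = text.strip()
    if text.startswith("InChI="):
        return "inchi"       # full structure representation
    if INCHIKEY.match(text):
        return "inchikey"    # hashed representation, needs a database lookup
    if is_cas_number(text):
        return "cas"         # registry number, needs a database lookup
    return "name_or_other"   # names, SMILES etc. are not told apart here


for example in ["InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)",
                "WPYMKLBDIGXBTP-UHFFFAOYSA-N", "65-85-0", "benzoic acid"]:
    print(example, "->", guess_identifier_type(example))
```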

Page 17: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (CSDB)

• ChemNavigator iResearch Library: compilation of commercially available screening compounds from ~300 international chemistry suppliers

• PubChem database: including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …

• Commercial sources / others: Asinex, Comgenex, eMolecules, …

[Pie chart] ChemNav. iResearch Lib. ~56%; PubChem ~38%; others ~6%

current status (as of March 2010):

140 chemical structure databases
120 million structure records
84.6 million unique structures by FICuS
110 million Standard InChIKeys for lookup

Page 18: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (Update 2012)

• PubChem Substance & Compound as separate databases (both updated to 2012)

• ChemNavigator iResearch Library: updated to 2012

• new databases, e.g.
  • Therapeutic Target Database (TTD)
  • Human Metabolome Database (HMDB)
  • DrugBank

• "pull" download of databases also available in PubChem, e.g.
  • DSSTox, ZINC 2012/01, ChEBI 2012/01, ChEMBL13, ChemIDplus 2012/01

• to a limited extent, "historic versions" of databases are archived, e.g. a comparison of PubChem Substance 2007 vs. 2012 will be possible

Page 19: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (CSDB)

Chemical Structure Normalization

• calculation of a set of parent structures with different sensitivity to chemical features:

[Diagram]
original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB)
→ structure normalization
→ parent structure
→ hashcode calculation (E_HASHISY)
→ NCI/CADD Identifier (FICTS, FICuS, uuuuu)
→ database (SDF, SMILES)

both the original structure record & the normalized parent structures are archived in the database

Page 20: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (CSDB)

NCI/CADD Identifiers (FICTS, FICuS, uuuuu)

structure normalization: based on CACTVS hashcodes (HASHISY), 16-digit hexadecimal number (64-bit unsigned)

[Figure: structure drawings of histidine and of variants labeled "tautomer", "salt" and "SR", each annotated with its identifiers]

histidine: hashcode 9850FD9F9E2B4E25

identifiers shown for the histidine parent structure:
9850FD9F9E2B4E25-FICTS
9850FD9F9E2B4E25-FICuS
9850FD9F9E2B4E25-uuuuu

further FICTS identifiers shown in the figure:
6C16DE2351F9FF50-FICTS
E92E4BA2869F3611-FICTS
8A7AD1EB498CC76A-FICTS
E5F83F10C5DB080A-FICTS

further FICuS identifiers shown in the figure:
E92E4BA2869F3611-FICuS
8A7AD1EB498CC76A-FICuS
E5F83F10C5DB080A-FICuS

Page 21: ACS San Diego, March 2012, InChI Symposium


Chemical Structure Database (Update 2012)

231 small-molecule databases
367 database releases (full, incremental, "historic versions")
324 million original database records

Unique structure count:

FICTS ~118 million
FICuS ~115 million
uuuuu ~100 million

Page 22: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (Update 2012)

InChI/InChIKey

InChI/InChIKey (Version 1.04) calculated with four InChI flag sets: Standard (Standard InChIKey), Set 1, Set 2, Set 3

Flag list shown for Set 1, Set 2 and Set 3 (the per-set marking of flags on the slide is not preserved in this transcript):

DONOTADDH W0 FIXEDH RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T

Add H:

Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
Set 3: addition of hydrogen atoms by the InChI library

Page 23: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Database (Update 2012)

InChI/InChIKey

• calculation of InChI/InChIKey Standard set, Set 1, Set 2 & Set 3 for all original structure records and normalized parent structures:

[Diagram: original structure record → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier (FICTS, FICuS, uuuuu); InChI/InChIKey Standard, Set 1, Set 2 and Set 3 are attached to the records and parent structures]

Page 24: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Page 25: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

(Partial) InChIKey Lookup

• resolve Standard InChIKey into full structure representation (Ethanol):

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles

CCO

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles

CCO
CC[OH2+]

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles

C(C(O)([2H])[2H])[2H]
CC(O)([2H])[2H]
C(CO)([2H])([2H])[2H]
CC[17OH]
C(CO)[2H]
[14CH3]CO
CCO
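The three lookups on this slide differ only in how many blocks of the InChIKey are sent. A small sketch, assuming the hyphen-separated block layout of a Standard InChIKey and the URL pattern shown above; `partial_inchikey_urls` is an illustrative name, not a CIR function.

```python
BASE = "http://cactus.nci.nih.gov/chemical/structure"


def partial_inchikey_urls(inchikey, representation="smiles"):
    """Return lookup URLs for the full key, the key without its last
    character block, and the first (connectivity) block alone."""
    if inchikey.startswith("InChIKey="):
        inchikey = inchikey[len("InChIKey="):]
    first_block, second_block, last_char = inchikey.split("-")
    candidates = [
        "-".join([first_block, second_block, last_char]),  # exact match
        "-".join([first_block, second_block]),             # last block dropped
        first_block,                                       # first block only
    ]
    return ["%s/%s/%s" % (BASE, key, representation) for key in candidates]


for url in partial_inchikey_urls("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"):
    print(url)
```

Going from the full key to the first block widens the match, which is why the ethanol example returns one, then two, then seven SMILES strings.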

Page 26: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical File Representation

• available file format representations:

alc          Alchemy format
cdxml        CambridgeSoft ChemDraw XML format
cerius       MSI Cerius II format
charmm       Chemistry at HARvard Macromolecular Mechanics file format
cif          Crystallographic Information File
cml          Chemical Markup Language
gjf          Gaussian input data file
gromacs      GROMACS file format
hyperchem    HyperChem file format
jme          Java Molecule Editor format
maestro      Schroedinger MacroModel structure file format
mol          Symyx molecule file
sybyl2/mol2  Tripos Sybyl MOL2 format
mrv          ChemAxon MRV format
pdb          Protein Data Bank
sdf          Symyx Structure Data Format
sdf3000      Symyx Structure Data Format 3000
sln          SYBYL Line Notation
smiles       SMILES
xyz          xyz file format

example (Aspirin):

http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/file?format=sdf

Page 27: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Structure Images (GIF, PNG)

Buckyball:

http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white

Aspirin:

http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
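The image options are ordinary URL query parameters, so they can be assembled with the standard library. A minimal sketch: only the parameter names that appear in the two URLs on this slide (height, width, bgcolor, bondcolor, symbolfontsize, footer) are taken from the source; `image_url` is a hypothetical helper, and the escaping applied by `urlencode` is an assumption about what the server accepts.

```python
from urllib.parse import quote, urlencode

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def image_url(identifier, **options):
    """Build an /image URL; keyword arguments become query parameters."""
    url = "%s/%s/image" % (BASE, quote(identifier))
    query = urlencode(options)  # keeps the keyword order, escapes values
    return url + "?" + query if query else url


print(image_url("XMWRBQBLMFGWIX-UHFFFAOYSA-N",
                height=300, width=300, bgcolor="black", bondcolor="white"))
```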

Page 28: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

3D Chemical Structure Visualization (TwirlyMol)

implemented by Noel O'Boyle (University College Cork, Ireland)

simple JavaScript that allows you to render a rotatable/zoomable 3D representation of a molecule in your web browser

no plugin is needed, only a modern browser:

Chrome Safari FF3.6+ IE9 IE8 IE7 IE6

Page 29: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

3D Chemical Structure Visualization (TwirlyMol)

simple viewer (Restasis):

http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl

embedded into a web page:

<div id="canvas" height="400" width="400"></div>
<script src="http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/twirl_cached/canvas"></script>

Page 30: ACS San Diego, March 2012, InChI Symposium

http://www.coronene.com/blog/

http://chemical-quantum-images.blogspot.com

http://baoilleach.blogspot.com/

Using CIR with InChI/InChIKey

3D Chemical Structure Visualization (TwirlyMol)

Page 31: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Database URLs

• request database URLs (Restasis):

http://cactus.nci.nih.gov/chemical/structure/DDPJWUQJQMKQIF-XPNZOOHZSA-N/urls/xml

<?xml version="1.0" encoding="UTF-8" ?>
<request string="DDPJWUQJQMKQIF-XPNZOOHZSA-N" representation="urls">
  <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
    <item id="1" classification="exact" database="ChemSpider" publisher="ChemSpider">
      http://chemspider.com/structure.4939506
    </item>
    <item id="2" classification="exact" database="ChemSpider" publisher="PubChem">
      http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=43028058
    </item>
    <item id="3" classification="exact" database="NLM ChemIDplus" publisher="NLM">
      http://chem.sis.nlm.nih.gov/chemidplus/direct.jsp?result=advanced&regno=059865133
    […]
  </data>
</request>

Page 32: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Name Lookup

• request (alternative) names (Aspirin):

http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/names/xml

<?xml version="1.0" encoding="UTF-8" ?>
<request string="BSYNRYMUTXBXSQ-UHFFFAOYSA-N" representation="names">
  <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
    <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
    <item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
    <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
    <item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
    <item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
    <item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
    <item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
    <item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
    <item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
    <item id="10" classification="pubchem_substance_synonym">SBB015069</item>
    <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
    <item id="12" classification="pubchem_substance_synonym">D00109</item>
    […]
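An XML response of this shape can be read with Python's standard `xml.etree.ElementTree`. The sketch below parses a shortened copy of the names response from this slide (three of the listed items, closed properly so that it is well-formed) and groups the names by their `classification` attribute; `items_by_classification` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the response shown on the slide.
SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<request string="BSYNRYMUTXBXSQ-UHFFFAOYSA-N" representation="names">
  <data id="1" resolver="stdinchikey" string_class="Standard InChIKey">
    <item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
    <item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
    <item id="11" classification="pubchem_substance_synonym">Aspirin</item>
  </data>
</request>"""


def items_by_classification(xml_text):
    """Map each classification attribute to the list of item texts."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    grouped = {}
    for item in root.iter("item"):
        key = item.get("classification")
        grouped.setdefault(key, []).append((item.text or "").strip())
    return grouped


for classification, names in items_by_classification(SAMPLE).items():
    print(classification, names)
```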

Page 33: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Properties

• request molecular weight (Aspirin):

http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight

180.1598

MIME type: text/plain

available property representations:

/mw                         molecular weight
/formula                    formula
/monoisotopic_mass          monoisotopic mass
/h_bond_donor_count         H bond donor count
/h_bond_acceptor_count      H bond acceptor count
/h_bond_center_count        H bond center count
/rotor_count                number of rotatable bonds
/effective_rotor_count      number of effectively rotatable bonds
/rule_of_5_violation_count  number of Rule-of-5 violations
/xlogp2                     octanol−water partition coefficient XLOGP2
/aromatic                   compound is aromatic
/macrocyclic                compound is macrocyclic
/heteroatom_count           heteroatom count
/hydrogen_atom_count        H atom count
/heavy_atom_count           heavy atom count
/deprotonable_group_count   number of deprotonable groups
/protonable_group_count     number of protonable groups
/ring_count                 number of rings
/ringsys_count              number of ring systems

Page 34: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Name Pattern Search

• Google-like searches on CIR's name index (approx. 70 million names)

based on the open source full text search server Sphinx (http://sphinxsearch.com)

example: all chemical names that contain the words "morphine" and "methyl" (name pattern: '+morphine +methyl'):

http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern

Page 35: ACS San Diego, March 2012, InChI Symposium

Search name pattern '+morphine +methyl': 7 matching names

<request string="+morphine +methyl" representation="stdinchikey">
  <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
    <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
  </data>
  <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
    <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
  </data>
  <data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
    <item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
  </data>
  <data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
    <item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
  </data>
  <data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
    <item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
  </data>
  <data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
    <item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
  </data>
  <data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
    <item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
  </data>
</request>

Page 36: ACS San Diego, March 2012, InChI Symposium

Using CIR with InChI/InChIKey

Chemical Name Pattern Search

example: chemical names that contain the words "morphine" and "methyl" but not "hydroxyl" (name pattern: '+morphine +methyl -hydroxyl'):

http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern

6 matching names

example: chemical names that contain the substring "morphine" somewhere in the name (name pattern: '*morphine*'):

http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern

45 matching names

example: chemical names that contain a single character "m" and the word "benzene" in a maximum distance of 3 words (finds meta-substituted aromatic compounds, name pattern: '"m benzene"~3'):

http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern

22 matching names
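The name patterns contain spaces, plus signs and asterisks, which most HTTP clients will not send unescaped. The sketch below percent-encodes the pattern before placing it in the URL; that the server decodes the escaped form to the same pattern is an assumption, and `name_pattern_url` is an illustrative helper, not part of CIR.

```python
from urllib.parse import quote

BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build a name-pattern search URL with the pattern percent-encoded."""
    # safe="" also escapes "/" so the pattern stays a single path segment.
    encoded = quote(pattern, safe="")
    return "%s/%s/%s/xml?resolver=name_pattern" % (BASE, encoded, representation)


for pattern in ["+morphine +methyl", "+morphine +methyl -hydroxyl", "*morphine*"]:
    print(name_pattern_url(pattern))
```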

Page 37: ACS San Diego, March 2012, InChI Symposium

Structure Normalization(Tautomerism)

Page 38: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Tautomerism

21 SMIRKS transform rules:

rule 1: 1.3 (thio)keto/(thio)enol
rule 2: 1.5 (thio)keto/(thio)enol
rule 3: simple (aliphatic) imine
rule 4: special imine
rule 5: 1.3 aromatic heteroatom H shift
rule 6: 1.3 heteroatom H shift
rule 7: 1.5 (aromatic) heteroatom H shift (1)
rule 8: 1.5 aromatic heteroatom H shift (2)
rule 9: 1.7 (aromatic) heteroatom H shift
rule 10: 1.9 (aromatic) heteroatom H shift
rule 11: 1.11 (aromatic) heteroatom H shift
rule 12: furanones
rule 13: ketene/ynol exchange
rule 14: ionic nitro/aci-nitro
rule 15: pentavalent nitro/aci-nitro
rule 16: oxime/nitroso
rule 17: oxime/nitroso via phenol
rule 18: cyanic/iso-cyanic acids
rule 19: formamidinesulfinic acids
rule 20: isocyanides
rule 21: phosphonic acids

Page 39: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Tautomerism

rule 1: 1.3 (thio)keto/(thio)enol

[O,S,Se,Te;X1:1]=[C;z{1-2}:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][#6;z{1-2}:2]=[C,cz{0-1}R{0-1}:3]

rule 6: 1.3 heteroatom H shift

[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]

[Figure: atom-mapped structure drawings illustrating the 1.3 keto/enol shift (atoms O1, 2, 3, H4) and the 1.3 heteroatom H shift (atoms S1, N2, N3, H4)]

Page 40: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin - Tautomers

[Figure: structure drawings of the warfarin tautomers]

prototropic tautomerism

Page 41: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin - Tautomers

http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/representation

[Figure: structure drawings of the warfarin tautomers]

prototropic tautomerism

Page 42: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin: FICuS Identifier

http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus

[Figure: structure drawings of the warfarin tautomers, each labeled with its FICuS identifier]

prototropic tautomerism (9 structures, all with the same identifier):

D76B88C0354759F1-FICuS

Page 43: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin: FICuS Identifier

http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/ficus

[Figure: structure drawings of the warfarin tautomers, each labeled with its FICuS identifier]

prototropic tautomerism (9 structures, all with the same identifier):

D76B88C0354759F1-FICuS

ring-chain tautomerism (3 structures):

09BB2FAADA1508A7-FICuS
09BB2FAADA1508A7-FICuS
2F505A3FCA434B3C-FICuS

Page 44: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin: Standard InChIKey

http://cactus.nci.nih.gov/chemical/structure/tautomers:warfarin/stdinchikey

[Figure: structure drawings of the warfarin tautomers, each labeled with its Standard InChIKey]

prototropic tautomerism (9 structures):

QTXVAVXCBMYBJW-UHFFFAOYSA-N
VWSXIGYSLWNCBN-VAWYXSNFSA-N
GRAAPKVUSREWIL-UHFFFAOYSA-N
FQEPJUOLUDFINX-UHFFFAOYSA-N
UCKRWKACBKRIKB-VAWYXSNFSA-N
NNLYDNMZCAHUOV-UHFFFAOYSA-N
PJVWKTKQMONHTI-UHFFFAOYSA-N
FVSFCRPKSVCTBA-VAWYXSNFSA-N
BBOSKMPTDUUMKL-UHFFFAOYSA-N

ring-chain tautomerism (3 structures):

LSCYDZJASSKSMJ-UHFFFAOYSA-N
XGIOTBZTMHLTRL-UHFFFAOYSA-N
QUJJIKXCACZKKD-UHFFFAOYSA-N

Page 45: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin: InChIKey (W0 RECMET NEWPS SPXYZ SAsXYZ Fb Fnud KET 15T)

[Figure: structure drawings of the warfarin tautomers, each labeled with its InChIKey calculated with the flags above]

prototropic tautomerism (9 structures):

SAYISSDYYDIVTP-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N
PMOPDASZKFXBOL-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N
PMOPDASZKFXBOL-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N
SAYISSDYYDIVTP-UHFFFAOYNA-N
PMOPDASZKFXBOL-UHFFFAOYNA-N

ring-chain tautomerism (3 structures):

LSCYDZJASSKSMJ-UHFFFAOYNA-N
FQOKLKCGRHFANU-UHFFFAOYNA-N
FQOKLKCGRHFANU-UHFFFAOYNA-N
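How strongly each identifier collapses the twelve warfarin tautomers can be checked by counting distinct values. The sketch below copies the identifiers from the three preceding warfarin slides (FICuS, Standard InChIKey, and InChIKey with the W0 … 15T flag set); the counting itself is plain Python and says nothing about how the identifiers are computed.

```python
# Identifiers copied from the warfarin slides: 9 prototropic tautomers
# followed by 3 ring-chain tautomers in each list.
FICUS = ["D76B88C0354759F1"] * 9 + ["09BB2FAADA1508A7"] * 2 + ["2F505A3FCA434B3C"]

STD_INCHIKEY = [
    "QTXVAVXCBMYBJW-UHFFFAOYSA-N", "VWSXIGYSLWNCBN-VAWYXSNFSA-N",
    "GRAAPKVUSREWIL-UHFFFAOYSA-N", "FQEPJUOLUDFINX-UHFFFAOYSA-N",
    "UCKRWKACBKRIKB-VAWYXSNFSA-N", "NNLYDNMZCAHUOV-UHFFFAOYSA-N",
    "PJVWKTKQMONHTI-UHFFFAOYSA-N", "FVSFCRPKSVCTBA-VAWYXSNFSA-N",
    "BBOSKMPTDUUMKL-UHFFFAOYSA-N", "LSCYDZJASSKSMJ-UHFFFAOYSA-N",
    "XGIOTBZTMHLTRL-UHFFFAOYSA-N", "QUJJIKXCACZKKD-UHFFFAOYSA-N",
]

FLAGGED_INCHIKEY = (
    ["SAYISSDYYDIVTP-UHFFFAOYNA-N"] * 6
    + ["PMOPDASZKFXBOL-UHFFFAOYNA-N"] * 3
    + ["LSCYDZJASSKSMJ-UHFFFAOYNA-N"]
    + ["FQOKLKCGRHFANU-UHFFFAOYNA-N"] * 2
)


def distinct(identifiers):
    """Number of different identifier values in the list."""
    return len(set(identifiers))


for label, ids in [("FICuS", FICUS), ("Standard InChIKey", STD_INCHIKEY),
                   ("InChIKey (W0 ... 15T)", FLAGGED_INCHIKEY)]:
    print(label, distinct(ids), "distinct values for", len(ids), "structures")
```

For these twelve structures the Standard InChIKey yields twelve different values, the flagged InChIKey four, and FICuS three.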

Page 46: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Warfarin

• "normalize" Standard InChIKey by NCI/CADD's business rules:

http://cactus.nci.nih.gov/chemical/structure/normalize:QTXVAVXCBMYBJW-UHFFFAOYSA-N/stdinchikey

InChIKey=FQEPJUOLUDFINX-UHFFFAOYSA-N

MIME type: text/plain

[Figure: structure drawings of the two warfarin forms, labeled QTXVAVXCBMYBJW-UHFFFAOYSA-N (input) and FQEPJUOLUDFINX-UHFFFAOYSA-N (normalized)]

Page 47: ACS San Diego, March 2012, InChI Symposium

Structure Normalization

Chemical Operators

• available operators:

add_hydrogens, remove_hydrogens, normalize, ficts, ficus, uuuuu, scaffold_sequence, nostereo, stereoisomers, tautomers

example:

http://cactus.nci.nih.gov/chemical/structure/scaffold_sequence:FQEPJUOLUDFINX-UHFFFAOYSA-N/stdinchikey

XVYBSGQBRUYLNK-UHFFFAOYSA-N
BQLSCAPEANVCOG-UHFFFAOYSA-N
MERGMNQXULKBCH-UHFFFAOYSA-N

[Figure: structure drawings of the three scaffolds in the sequence]

Schuffenhauer et al., J. Chem. Inf. Model. 2007, 47, 47-58
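Operators are written as a prefix, `operator:identifier`, in the identifier position of the URL. A minimal sketch of building such URLs, assuming the operator list on this slide (with the first operator spelled `add_hydrogens`); `operator_url` is a hypothetical helper name.

```python
BASE = "http://cactus.nci.nih.gov/chemical/structure"

# Operator names as listed on the slide; the spelling "add_hydrogens" is assumed.
OPERATORS = {
    "add_hydrogens", "remove_hydrogens", "normalize", "ficts", "ficus",
    "uuuuu", "scaffold_sequence", "nostereo", "stereoisomers", "tautomers",
}


def operator_url(operator, identifier, representation):
    """Build a URL of the form BASE/operator:identifier/representation."""
    if operator not in OPERATORS:
        raise ValueError("unknown operator: %r" % operator)
    return "%s/%s:%s/%s" % (BASE, operator, identifier, representation)


print(operator_url("normalize", "QTXVAVXCBMYBJW-UHFFFAOYSA-N", "stdinchikey"))
print(operator_url("tautomers", "warfarin", "ficus"))
```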

Page 48: ACS San Diego, March 2012, InChI Symposium

Soon: Chemical File Resolver (CFR)

Page 49: ACS San Diego, March 2012, InChI Symposium

Chemical File Resolver (CFR)

[Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]

• allows conversion of many chemical file formats into another format or other representations

• will have a programmatic URL API & an HTML Web interface

• url'izes all elements of the original file, i.e. provides access to each specific record, field, and any metadata (size, record count, etc.) of the posted file by URLs

• release: Q2/2012 (hopefully)

Page 50: ACS San Diego, March 2012, InChI Symposium

Chemical File Resolver (CFR)

[Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]

• HTTP: post a file (e.g. with curl), CFR replies with an MD5 hash key:

curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/chemical/file
> d85b396ed6ced6348a5b402eb8fcfe8b

• accepted formats:
  • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
  • text files with a list of identifiers …

Page 51: ACS San Diego, March 2012, InChI Symposium

[Diagram: chemical file → HTTP Post → CFR → HTTP Get → chemical file]

• after posting a file, CFR replies with an MD5 hash sum:

curl -F upload=@/your/local/file.sdf http://cactus.nci.nih.gov/TEST/chemical/file
> d85b396ed6ced6348a5b402eb8fcfe8b

• accepted formats:
  • chemical file formats: alc, cdxml, cerius, charmm, cif, cml, jme, maestro, mol, mol2, mrv, pdb, sdf, sdf3000, sln, smiles, xyz, …
  • text files with a list of identifiers:

Post a plain text file, e.g.:

ethanol
aspirin
InChI=1S/C4H10O/c1-3-5-4-2/h3-4H2,1-2H3
CCOCC
InChIKey=RCINICONZNJXQF-MZXODVADSA-N
InChIKey=QTXVAVXCBMYBJW-UHFFFAOYSA-N
204255-11-8
tautomers:guanine
ChemSpider_ID=1234
Pubchem_SID=456

Page 52: ACS San Diego, March 2012, InChI Symposium

Chemical File Resolver (CFR)

• request new file format using the obtained MD5 hash key (here: d85b396ed6ced6348a5b402eb8fcfe8b):

curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}?format={sdf, smi, pdb, cml, …}

Page 53: ACS San Diego, March 2012, InChI Symposium

Chemical File Resolver (CFR)

• request records 2 and 5 as SMILES strings (key here: d85b396ed6ced6348a5b402eb8fcfe8b; the URL is quoted so that the shell does not interpret the "&"):

curl "http://cactus.nci.nih.gov/TEST/chemical/file/{key}?records=2,5&format=smiles"

Page 54: ACS San Diego, March 2012, InChI Symposium

Chemical File Resolver (CFR)

• get field names:

curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/fields

• get a specific field value from record n:

curl http://cactus.nci.nih.gov/TEST/chemical/file/{key}/n/{field_name}
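The GET requests shown on the CFR slides follow a small set of URL templates. The sketch below assembles them from the MD5 key; it assumes only the TEST URLs as presented (the service was announced as not yet released), and the function names are illustrative rather than part of any published CFR client.

```python
from urllib.parse import urlencode

CFR_BASE = "http://cactus.nci.nih.gov/TEST/chemical/file"


def cfr_file_url(key, fmt=None, records=None):
    """URL for fetching a posted file, optionally converted or filtered."""
    params = {}
    if records:
        params["records"] = ",".join(str(r) for r in records)
    if fmt:
        params["format"] = fmt
    url = "%s/%s" % (CFR_BASE, key)
    # safe="," keeps the record list readable, as on the slide.
    return url + "?" + urlencode(params, safe=",") if params else url


def cfr_fields_url(key):
    """URL listing the field names of the posted file."""
    return "%s/%s/fields" % (CFR_BASE, key)


def cfr_field_value_url(key, record, field_name):
    """URL for one field value of record number `record`."""
    return "%s/%s/%d/%s" % (CFR_BASE, key, record, field_name)


KEY = "d85b396ed6ced6348a5b402eb8fcfe8b"  # example key from the slides
print(cfr_file_url(KEY, fmt="smiles", records=[2, 5]))
print(cfr_fields_url(KEY))
```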

Page 55: ACS San Diego, March 2012, InChI Symposium

Chemical Structure Web API

[Architecture diagram] Components shown:

• Chemical Identifier Resolver
• Chemical File Resolver
• NCI/CADD web service
• NCI/CADD Chemical Structure Database (CSDB)
• CACTVS
• OPSIN
• external web services
• other software packages
• access via http

Page 56: ACS San Diego, March 2012, InChI Symposium

IUPAC InChI/InChIKey Resolver

• (hopefully) there will be many resolvers from different providers with different backgrounds:
  • publishers
  • commercial databases
  • free sources and databases: ChemSpider, PubChem, ChEBI, …

• InChI/InChIKey is the perfect tool to interlink the resolvers

• ChemSpider, PubChem and NCI/CADD are working on a test protocol for a federated InChI/InChIKey resolver

Page 57: ACS San Diego, March 2012, InChI Symposium

IUPAC InChI/InChIKey Resolver

IUPAC Root Resolver

Resolver 1

Resolver 2

Resolver 3

Resolver 3.1

Resolver 3.2

Resolver 3.3

Clients

CIR

Resolver 3

Page 58: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov

Page 59: ACS San Diego, March 2012, InChI Symposium

http://cactus.nci.nih.gov/blog

Page 60: ACS San Diego, March 2012, InChI Symposium

Acknowledgments

The InChI Team

Xemistry GmbH, Germany: Wolf-Dietrich Ihlenfeldt

All database providers

ChemNavigator: Scott Hutton, Tad Hurst

University of Cambridge, UK: Daniel Lowe

NCI/CADD Team: Igor Filippov, Marc Nicklaus

University College Cork, Ireland: Noel O'Boyle

Page 61: ACS San Diego, March 2012, InChI Symposium

Acknowledgments - Software

CACTVS

Python Web Framework

ChemWriter

Python SQL Library

Javascript library

Peter Ertl (Novartis)

Fulltext Search Engine