acs meeting new orleans 2013 (cinf)

NCI/CADD Chemical Structure Web Services Markus Sitzmann Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS

Upload: markus-sitzmann

Post on 22-Jun-2015


DESCRIPTION

My presentation during the CINF Public Databases session at the ACS Meeting in New Orleans.

TRANSCRIPT

Page 1: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Chemical Structure Web Services

Markus Sitzmann
Computer-Aided Drug Design Group, Chemical Biology Laboratory, Frederick National Laboratory for Cancer Research, NIH, DHHS

Page 2: ACS Meeting New Orleans 2013 (CINF)

http://cactus.nci.nih.gov

Page 3: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Web API

[Architecture diagram. Components: external web services and other software packages (connected via http); Chemical Structure Web API; NCI/CADD web services; Chemical Identifier Resolver; NCI/CADD Chemical Structure DataBase (CSDB); CACTVS; OPSIN]

Page 4: ACS Meeting New Orleans 2013 (CINF)

Chemical Structures

[Diagram: a chemical structure and the identifiers and representations that refer to it]

• NCI/CADD Identifiers
• InChI/InChIKey
• ChemSpider ID
• PubChem SID/CID
• chemical names
• CAS Registry Number
• NSC number
• FDA UNII
• ChemNavigator SID
• SMILES
• SD File
• Chemical Formula
• ChEBI ID
• PDB Ligand ID
• MRV
• CML
• SYBYL Line Notation
• GIF image

Page 5: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

CIR works as a resolver for different chemical structure identifiers or representations. It allows one to convert a given structure identifier into another representation or structure identifier.

Page 6: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure

• officially released in June 2009
• since then four beta versions (for testing, learning, and gaining experience)
• one larger database update in March 2010
• since early 2012: major internal rewrite (which will allow us to add new services and API functionality while not breaking the existing API)
• major database update and new services planned for 2013

Page 7: ACS Meeting New Orleans 2013 (CINF)

CIR Usage Statistics

[Chart: requests per month since June 2009; y-axis from 0 to 12,000,000]

Typical number of unique IP addresses per month: 4,000 – 8,000

Page 8: ACS Meeting New Orleans 2013 (CINF)

Top Users (US)

Academic/Hospitals
• St. Olaf College
• Carnegie Mellon
• Drexel University
• Princeton
• Mayo

Pharma/Chemical Industry
• Eli Lilly
• Dow Chemical
• Intermune
• Procter & Gamble
• Vertex

U.S. Government
• EPA
• NIH (NIEHS, NCI, NLM...)
• Lawrence Livermore Natl. Lab.
• CDC
• DoD

Other
• Google
• Amazon
• HP
• Agilent
• Symyx

Page 9: ACS Meeting New Orleans 2013 (CINF)

External web services and applications

• CIR node for KNIME, by Talete s.r.l.
• Lab Helper app for Windows Phone
• Avogadro molecule editor
• Jmol/JSmol open-source viewer for chemical structures in 3D
• GChem for Google Spreadsheet
• Bioclipse (CIR plugin)
• Macs in Chemistry
• Accelrys Draw

...and educational tools/sites such as:
• Jmol/JSmol Virtual Molecular Model Kit
• ISU CheMagic
• Caltech Library

Page 10: ACS Meeting New Orleans 2013 (CINF)

Examples using CIR

Page 11: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

SD file:

C7H6O2
APtclcactv03051222202D 0   0.00000     0.00000

 15 15  0  0  0  0  0  0  0  0999 V2000
    2.8660   -2.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -0.0600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -0.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000   -1.5600    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660    0.9400    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.0000    1.4400    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.8660   -2.6800    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.2690   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -0.2500    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    1.4631   -1.8700    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7321    2.0600    0.0000 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  5  6  2  0  0  0  0
  1  6  1  0  0  0  0
  4  7  1  0  0  0  0
  7  8  1  0  0  0  0
  7  9  2  0  0  0  0
  1 10  1  0  0  0  0
  2 11  1  0  0  0  0
  3 12  1  0  0  0  0
  5 13  1  0  0  0  0
  6 14  1  0  0  0  0
  8 15  1  0  0  0  0
M  END
$$$$

ChemWriter Editor

Page 12: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

names:

benzoic acid
65-85-0
WLN: QVR
Unisept BZA
AIDS018010
Salvo liquid
Benzoic acid-ring-UL-14C
ST5213864
Benzoesaeure
CHEBI:30746
NSC 149
benzenecarboxylic acid
phenylformic acid
Benzoic acid (JP15/USP)
Benzoic acid (TN)
18102_RIEDEL
Aromatic hydroxy acid
Benzoic acid (7CI,8CI,9CI)
Benzoic acid [USAN:JAN]
W213128_ALDRICH
47849_SUPELCO
Acide benzoique [French]
Acido benzoico [Italian]
Benzoate (VAN)
Benzoesaeure [German]
Benzoic acid (natural)
Acide benzoique
Benzeneformic acid
Benzenemethanoic acid
Benzoesaeure GK
Benzoesaeure GV
Benzoic acid, tech.
Carboxybenzene
Kyselina benzoova
Phenylcarboxylic acid

ChemWriter Editor

Page 13: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

InChIKey: InChIKey=WPYMKLBDIGXBTP-UHFFFAOYSA-N

InChI: InChI=1S/C7H6O2/c8-7(9)6-4-2-1-3-5-6/h1-5H,(H,8,9)

SMILES: C1=CC=C(C=C1)C(O)=O

ChemWriter Editor

Page 14: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

programmatic URL API:

http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

If a request is not successful, the service responds with an HTTP 404 status message.

Page 15: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

• access from Unix shell level (e.g., via wget):

    shell> wget -qO - http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas
    204255-11-8

• access by programming libraries/languages (e.g., Python):

    # Python 3
    from urllib.request import urlopen
    from urllib.error import HTTPError

    url = "http://cactus.nci.nih.gov/chemical/structure/tamiflu/cas"
    try:
        # urlopen raises HTTPError for unsuccessful requests (e.g., HTTP 404)
        response = urlopen(url).read().decode("utf-8")
    except HTTPError:
        raise  # replace with your own error handling
    print(response)  # output shown on the slide: 204255-11-8

Page 16: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

examples:

http://cactus.nci.nih.gov/chemical/structure/PGZUMBJQJWIWGJ-ONAKXNSWSA-N/cas

204255-11-8 (MIME type: text/plain)

http://cactus.nci.nih.gov/chemical/structure/tamiflu/image

(MIME type: image/gif)

Page 17: ACS Meeting New Orleans 2013 (CINF)

Chemical Identifier Resolver (CIR)

http://cactus.nci.nih.gov/chemical/structure/"identifier"/"representation"

"identifier" (input resolved by CIR):
• chemical names
• IUPAC names (OPSIN)
• CAS numbers
• SMILES strings
• IUPAC InChI/InChIKeys
• NCI/CADD Identifiers
• CACTVS HASHISY
• NSC number
• PubChem SID
• ZINC Code
• ChemSpider ID
• ChemNavigator SID
• eMolecule VID
• UNII

"representation" (output returned by CIR):
• /smiles
• /names, /iupac_name
• /cas
• /inchi, /stdinchi
• /inchikey, /stdinchikey
• /ficts, /ficus, /uuuuu
• /image
• /file, /sdf
• /mw, /monoisotopic_mass
• /formula
• /twirl
• /urls
• /chemspider_id
• /pubchem_sid
• /chemnavigator_sid
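The identifier/representation URL scheme on this slide can be wrapped in a small helper. The following is a minimal sketch for Python 3 and is not part of the original slides: the names `cir_url` and `resolve`, the `opener` parameter, and the decision to return `None` on HTTP 404 are assumptions made for illustration.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation):
    """Build a CIR request URL; the identifier is percent-encoded."""
    return "{}/{}/{}".format(CIR_BASE, quote(identifier, safe=""), representation)


def resolve(identifier, representation, opener=urlopen):
    """Return the response text, or None if the service answers with HTTP 404.

    `opener` is a hypothetical hook (defaults to urllib's urlopen) so the
    function can be exercised without network access.
    """
    try:
        with opener(cir_url(identifier, representation)) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:
            return None  # identifier could not be resolved
        raise
```

With network access, `resolve("tamiflu", "cas")` should correspond to the wget example shown earlier in the talk; binary representations such as `/image` would need `response.read()` without decoding.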

Page 18: ACS Meeting New Orleans 2013 (CINF)

(Partial) InChIKey Lookup

• resolve Standard InChIKey into full structure representation (example: ethanol)

full Standard InChIKey:

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles

CCO

InChIKey without the final (protonation) character:

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ-UHFFFAOYSA/smiles

CCO
CC[OH2+]

first InChIKey block only:

http://cactus.nci.nih.gov/chemical/structure/LFQSCWFLJHTTHZ/smiles

C(C(O)([2H])[2H])[2H]
CC(O)([2H])[2H]
C(CO)([2H])([2H])[2H]
CC[17OH]
C(CO)[2H]
[14CH3]CO
CCO
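The three lookups on this slide differ only in how much of the InChIKey is sent. A minimal sketch of deriving the shorter keys from a full one is given below; the function name `inchikey_lookup_urls` is hypothetical, and the code relies only on the standard InChIKey layout (a 14-character first block, a 10-character second block, and a single protonation character, separated by hyphens).

```python
CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def inchikey_lookup_urls(inchikey, representation="smiles"):
    """Return lookup URLs for the full key and its two shorter prefixes."""
    if inchikey.startswith("InChIKey="):
        inchikey = inchikey[len("InChIKey="):]
    parts = inchikey.split("-")
    if [len(part) for part in parts] != [14, 10, 1]:
        raise ValueError("not a full InChIKey: " + inchikey)
    first_block, second_block, _protonation = parts
    keys = [
        inchikey,                          # full key
        first_block + "-" + second_block,  # protonation character dropped
        first_block,                       # first block only
    ]
    return ["{}/{}/{}".format(CIR_BASE, key, representation) for key in keys]


for url in inchikey_lookup_urls("LFQSCWFLJHTTHZ-UHFFFAOYSA-N"):
    print(url)
```

The shorter the key, the more structures can match, which is consistent with the growing result lists shown for ethanol.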

Page 19: ACS Meeting New Orleans 2013 (CINF)

Chemical File Representation

• available file format representations:

alc          Alchemy format
cdxml        CambridgeSoft ChemDraw XML format
cerius       MSI Cerius II format
charmm       Chemistry at HARvard Macromolecular Mechanics file format
cif          Crystallographic Information File
cml          Chemical Markup Language
gjf          Gaussian input data file
gromacs      GROMACS file format
hyperchem    HyperChem file format
jme          Java Molecule Editor format
maestro      Schroedinger MacroModel structure file format
mol          Symyx molecule file
sybyl2/mol2  Tripos Sybyl MOL2 format
mrv          ChemAxon MRV format
pdb          Protein Data Bank
sdf          Symyx Structure Data Format
sdf3000      Symyx Structure Data Format 3000
sln          SYBYL Line Notation
smiles       SMILES
xyz          xyz file format

example:

http://cactus.nci.nih.gov/chemical/structure/Aspirin/file?format=sdf

Page 20: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Images (GIF, PNG)

Buckyball:

http://cactus.nci.nih.gov/chemical/structure/XMWRBQBLMFGWIX-UHFFFAOYSA-N/image?height=300&width=300&bgcolor=black&bondcolor=white

Aspirin:

http://cactus.nci.nih.gov/chemical/structure/Aspirin/image?height=200&width=200&symbolfontsize=7&footer="Aspirin"
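Both the file and the image representations take their options as URL query parameters. A minimal sketch of assembling such URLs with the Python standard library follows; the helper name `representation_url` is hypothetical, and only parameter names that appear on the slides (`format`, `height`, `width`, `bgcolor`, `bondcolor`) are used.

```python
from urllib.parse import quote, urlencode

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def representation_url(identifier, representation, **params):
    """Build a CIR URL and append optional query parameters."""
    url = "{}/{}/{}".format(CIR_BASE, quote(identifier, safe=""), representation)
    if params:
        # urlencode keeps the keyword order and escapes the values
        url += "?" + urlencode(params)
    return url


print(representation_url("Aspirin", "file", format="sdf"))
print(representation_url("XMWRBQBLMFGWIX-UHFFFAOYSA-N", "image",
                         height=300, width=300,
                         bgcolor="black", bondcolor="white"))
```

Building the query string with `urlencode` rather than by hand avoids escaping mistakes when a value, such as a footer text, contains spaces or quotes.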

Page 21: ACS Meeting New Orleans 2013 (CINF)

Chemical Properties

• request molecular weight (example: Aspirin):

http://cactus.nci.nih.gov/chemical/structure/BSYNRYMUTXBXSQ-UHFFFAOYSA-N/weight

180.1598

MIME type: text/plain

• available property representations:

/mw                         molecular weight
/formula                    formula
/monoisotopic_mass          monoisotopic mass
/h_bond_donor_count         H bond donor count
/h_bond_acceptor_count      H bond acceptor count
/h_bond_center_count        H bond center count
/rotor_count                number of rotatable bonds
/effective_rotor_count      number of effectively rotatable bonds
/rule_of_5_violation_count  number of Rule-of-5 violations
/xlogp2                     octanol−water partition coefficient XLOGP2
/aromatic                   compound is aromatic
/macrocyclic                compound is macrocyclic
/heteroatom_count           heteroatom count
/hydrogen_atom_count        H atom count
/heavy_atom_count           heavy atom count
/deprotonable_group_count   number of deprotonable groups
/protonable_group_count     number of protonable groups
/ring_count                 number of rings
/ringsys_count              number of ring systems

Page 22: ACS Meeting New Orleans 2013 (CINF)

Chemical Name Lookup

• request (alternative) names:

http://cactus.nci.nih.gov/chemical/structure/Aspirin/names/xml

<?xml version="1.0" encoding="UTF-8" ?>
<request string="Aspirin" representation="names">
<data id="1" resolver="name" string_class="Name">
<item id="1" classification="pubchem_iupac_name">2-acetyloxybenzoic acid</item>
<item id="2" classification="pubchem_iupac_openeye_name">2-Acetoxybenzoic acid</item>
<item id="3" classification="pubchem_generic_registry_name">50-78-2</item>
<item id="4" classification="pubchem_generic_registry_name">11126-35-5</item>
<item id="5" classification="pubchem_generic_registry_name">11126-37-7</item>
<item id="6" classification="pubchem_generic_registry_name">2349-94-2</item>
<item id="7" classification="pubchem_generic_registry_name">26914-13-6</item>
<item id="8" classification="pubchem_substance_synonym">NCGC00090977-04</item>
<item id="9" classification="pubchem_substance_synonym">KBioSS_002272</item>
<item id="10" classification="pubchem_substance_synonym">SBB015069</item>
<item id="11" classification="pubchem_substance_synonym">Aspirin</item>
<item id="12" classification="pubchem_substance_synonym">D00109</item>

[…]

Page 23: ACS Meeting New Orleans 2013 (CINF)

Chemical Name Pattern Search

• Google-like searches on CIR’s name index (approx. 70 million names)
• based on the open source full-text search server Sphinx (http://sphinxsearch.com)

example: all chemical names that contain the words “morphine” and “methyl” (name pattern: ‘+morphine +methyl‘):

http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl/stdinchikey/xml?resolver=name_pattern
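Name patterns contain characters such as spaces, `+`, `*` and quotes, which are awkward inside a URL path. The sketch below percent-encodes the pattern before inserting it; the helper name `name_pattern_url` is hypothetical, and the example assumes the server decodes percent-encoded path segments in the usual way, which the slides do not state explicitly.

```python
from urllib.parse import quote

CIR_BASE = "http://cactus.nci.nih.gov/chemical/structure"


def name_pattern_url(pattern, representation="stdinchikey"):
    """Build a name-pattern search URL with a percent-encoded pattern."""
    encoded = quote(pattern, safe="")  # encode spaces, '+', '*', quotes, ...
    return "{}/{}/{}/xml?resolver=name_pattern".format(
        CIR_BASE, encoded, representation
    )


print(name_pattern_url("+morphine +methyl"))
```

Encoding every reserved character (`safe=""`) keeps the pattern unambiguous even when it contains a slash or a question mark.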

Page 24: ACS Meeting New Orleans 2013 (CINF)

Search name pattern ‘+morphine +methyl’: 7 matching names

<request string="+morphine +methyl" representation="stdinchikey">
<data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
<item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
</data>
<data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
<item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
</data>
<data id="3" resolver="name_pattern" notation="Morphine, dihydro-6-methyl-">
<item id="1">InChIKey=NBKVWIJQJMEQLE-NGTWOADLSA-N</item>
</data>
<data id="4" resolver="name_pattern" notation="6-METHYL-MORPHINE ETHER">
<item id="1">InChIKey=FNAHUZTWOVOCTL-UHFFFAOYSA-N</item>
</data>
<data id="5" resolver="name_pattern" notation="Morphine alcoholic methyl ether">
<item id="1">InChIKey=FNAHUZTWOVOCTL-XSSYPUMDSA-N</item>
</data>
<data id="6" resolver="name_pattern" notation="N-Methyl morphine chloride">
<item id="1">InChIKey=MJNCZWBHCFTYFU-SCLAZZCHSA-N</item>
</data>
<data id="7" resolver="name_pattern" notation="Morphine, 7-hydroxy-6,6-dimethoxy-3-O-methyl-">
<item id="1">InChIKey=URFKRBIESURBKC-UHFFFAOYSA-N</item>
</data>
</request>
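An XML response of this shape can be read with the Python standard library. The following is a minimal sketch, assuming the element and attribute names shown on the slide (`request`, `data`, `item`, `notation`); the function name `parse_matches` is hypothetical and the sample is shortened to the first two hits.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<request string="+morphine +methyl" representation="stdinchikey">
  <data id="1" resolver="name_pattern" notation="Morphine 3-methyl ether">
    <item id="1">InChIKey=OROGSEYTTFOCAN-DNJOTXNNSA-N</item>
  </data>
  <data id="2" resolver="name_pattern" notation="6-Methyl-delta(sup 6)-deoxy-morphine">
    <item id="1">InChIKey=CUFWYVOFDYVCPM-GGNLRSJOSA-N</item>
  </data>
</request>"""


def parse_matches(xml_text):
    """Return (notation, [item texts]) pairs, one per <data> element."""
    root = ET.fromstring(xml_text)
    return [
        (data.get("notation"), [item.text for item in data.findall("item")])
        for data in root.findall("data")
    ]


matches = parse_matches(SAMPLE)
for notation, items in matches:
    print(notation, items)
```

Each `<data>` element corresponds to one matching name, so `len(matches)` gives the number of matching names reported on the slide.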

Page 25: ACS Meeting New Orleans 2013 (CINF)

Chemical Name Pattern Search

example: chemical names that contain the words “morphine” and “methyl” but not “hydroxyl” (name pattern: ‘+morphine +methyl -hydroxyl‘):

http://cactus.nci.nih.gov/chemical/structure/+morphine +methyl -hydroxyl/stdinchikey/xml?resolver=name_pattern

example: chemical names that contain the substring “morphine” somewhere in the name (name pattern: ‘*morphine*‘):

http://cactus.nci.nih.gov/chemical/structure/*morphine*/stdinchikey/xml?resolver=name_pattern

example: chemical names that contain a single character “m” and the word “benzene” within a maximum distance of 3 words (finds meta-substituted aromatic compounds, name pattern: ‘“m benzene”~3‘):

http://cactus.nci.nih.gov/chemical/structure/(m benzene)~3/stdinchikey/xml?resolver=name_pattern

6 matching names

45 matching names

22 matching names

Page 26: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Chemical Structure DataBase CSDB 2010

Page 27: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Normalization/Identifier

• stepwise process:

[Flow diagram: original structure record (Molfile, SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure → hashcode calculation (E_HASHISY) → NCI/CADD Identifier; outputs: SDF, SMILES, database]

original structure records, parent structures and identifiers are stored in the database

Page 28: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Normalization/Identifier

• calculation of a set of parent structures with different sensitivity to chemical features:

[Flow diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers (FICTS, FICuS, uuuuu)]

all steps are performed using CACTVS

Page 29: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Identifiers (FICTS, FICuS, uuuuu)

based on CACTVS hashcodes (HASHISY): 16-digit hexadecimal number (64-bit unsigned)

structure normalization - histidine:

[Figure: five histidine structure records (labels: tautomer 1, tautomer 2, salt, S, R) and the parent structures derived from them, each annotated with its identifiers]

• FICTS identifiers shown: 6C16DE2351F9FF50-FICTS, 9850FD9F9E2B4E25-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, E5F83F10C5DB080A-FICTS
• FICuS identifiers shown: E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS, E5F83F10C5DB080A-FICuS, 9850FD9F9E2B4E25-FICuS (twice)
• uuuuu identifier shown: 9850FD9F9E2B4E25-uuuuu (five times)

9850FD9F9E2B4E25
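The identifier strings on this slide follow a simple pattern: a 16-digit hexadecimal hashcode, a hyphen, and the identifier type. A minimal sketch of splitting and comparing them is shown below; the function names are hypothetical, and the regular expression assumes upper-case hexadecimal digits as printed on the slide.

```python
import re

_IDENTIFIER = re.compile(r"^([0-9A-F]{16})-(FICTS|FICuS|uuuuu)$")


def parse_ncicadd_identifier(text):
    """Split an identifier into (hashcode as integer, identifier type)."""
    match = _IDENTIFIER.match(text)
    if match is None:
        raise ValueError("not an NCI/CADD identifier: " + repr(text))
    # 16 hexadecimal digits always fit into an unsigned 64-bit integer
    return int(match.group(1), 16), match.group(2)


def same_hashcode(first, second):
    """True if two identifiers carry the same hashcode, whatever their type."""
    return parse_ncicadd_identifier(first)[0] == parse_ncicadd_identifier(second)[0]


print(parse_ncicadd_identifier("9850FD9F9E2B4E25-FICTS"))
print(same_hashcode("9850FD9F9E2B4E25-FICuS", "9850FD9F9E2B4E25-uuuuu"))
```

Comparing hashcodes this way only checks string equality of the hexadecimal part; which records share an identifier is decided by the CACTVS normalization described on the slides.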

Page 30: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Normalization/Identifier

• calculation of Standard InChIKey from the union set of parent structures

[Flow diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers; union set of parent structures → Standard InChIKey 1.03]

Page 31: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database (CSDB)

current status (released March 2010):

• ChemNavigator iResearch Library (~56%): compilation of commercially available screening compounds from ~300 international chemistry suppliers
• PubChem Substance Database (~38%): including Open NCI database, EPA DSSTox databases, NIAID HIV database, NIST Webbook, NLM ChemIDplus, ChemSpider, …
• Commercial sources / others (~6%): Asinex, Comgenex, eMolecules, …

• 140 chemical structure databases
• 120 million structure records
• 84.6 million unique structures by FICuS
• 110 million Standard InChIKeys for lookup

Page 32: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Chemical Structure DataBase CSDB 2013

Page 33: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database 2013

• >270 small-molecule databases
• >600 database releases (full, incremental, “historic versions”)
• 385 million original database records

unique structure count:

FICTS  ~125.0 million
FICuS  ~121.4 million
uuuuu  ~109.0 million

union set: 141.7 million unique structures

Page 34: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database 2013

InChI/InChIKey (Version 1.04) calculated with four InChI flag sets:

• Standard (Standard InChIKey)
• Set 1
• Set 2
• Set 3

[Table: for each of Set 1, Set 2 and Set 3, the slide shows the InChI options DONOTADDH, W0, FIXEDH, RECMET, NEWPS, SPXYZ, SAsXYZ, Fb, Fnud, KET, 15T]

Standard Set, Set 1 & Set 2: addition of hydrogen atoms by CACTVS
Set 3: addition of hydrogen atoms by the InChI library

Page 35: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database 2013

• calculation of Standard InChIKey

[Flow diagram: original structure record → structure normalization → parent structures (FICTS, FICuS, uuuuu) → hashcode calculation (E_HASHISY) → NCI/CADD Identifiers; union set of parent structures → Standard InChIKey 1.04 and InChIKeys for Set 1, Set 2, Set 3]

Page 36: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database 2013

• database schema is entirely implemented in python/

• supports many different database engines: Oracle, PostgreSQL, MySQL

• SQLAlchemy provides:

  • the communication layer with the database engine

  • an object-oriented data model representation of the database on the Python side

• table relationships:

  • either defined by Foreign Key relationships in the database or specified at the Python level

  • SQLAlchemy creates table joins at the SQL level

Page 37: ACS Meeting New Orleans 2013 (CINF)

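Chemical Structure Database 2013

• SQLAlchemy table definition

    # metadata, schema, Name, name_table, TableRepr and TableInit are defined elsewhere
    from sqlalchemy import Table, Column, Integer, CHAR, Text
    from sqlalchemy.orm import mapper, relationship, backref

    structure_table = Table('structure', metadata,
        Column('id', Integer, primary_key=True, autoincrement=True),
        Column('hash', CHAR(16), unique=True),
        Column('smiles', Text()),
        schema=schema
    )

    class Structure(TableRepr, TableInit):
        __table__ = structure_table

    # classical mapping (SQLAlchemy < 2.0): link Structure to its names
    mapper(Structure, structure_table, properties={
        'name': relationship(Name, backref=backref('structure'),
            primaryjoin=structure_table.c.id == name_table.c.structure_id)
    })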

Page 38: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database 2013

• Query the database

    > s = db.session.query(Structure).filter(Structure.id == 1234).one()
    <object "Structure">
    > s.smiles
    CCO

    > q = select([structure_table.c.smiles]).where(structure_table.c.id == 1234)
    > s = q.execute().fetchone()
    ('CCO',)

• if the object-oriented data model representation creates too much overhead, SQLAlchemy supports writing “almost bare” SQL while still following Python paradigms

Page 39: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Database

• Goals

• index any chemical structure that can be referenced in some way or has a known source
• may also include virtual chemistry or generic structure collections
• collect public datasets/databases/structure collections
• normalize them to our standards
• make them available in our public web interfaces and APIs (if we are allowed to)
• no refusal/deletion of structures: curation is performed by “keep the bad and tag it as bad”

track chemical space

Page 40: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Chemical Web Apps

Page 41: ACS Meeting New Orleans 2013 (CINF)
Page 42: ACS Meeting New Orleans 2013 (CINF)

NCI/CADD Chemical Web Apps

• implemented with jQuery Mobile (1.3.0)
• HTML5
• supports web browsers on major mobile platforms: iOS, Android, BlackBerry, Windows Phone, Windows 8, Palm, Symbian
• supports major desktop web browsers: Google Chrome, Firefox, IE9/10
• WAI-ARIA compliant (W3C specification draft describing accessibility standards of dynamic Web content for people with disabilities)
• services will be optimized for use on tablet-sized touch-screen devices, but not (yet) for smartphone-sized devices (current development is done on an iPad 3)
• all services work on a common platform

Page 43: ACS Meeting New Orleans 2013 (CINF)
Page 44: ACS Meeting New Orleans 2013 (CINF)
Page 45: ACS Meeting New Orleans 2013 (CINF)
Page 46: ACS Meeting New Orleans 2013 (CINF)
Page 47: ACS Meeting New Orleans 2013 (CINF)
Page 48: ACS Meeting New Orleans 2013 (CINF)
Page 49: ACS Meeting New Orleans 2013 (CINF)

chemical structure

prediction of physicochemical properties and activities

Chemical Activity Predictor - GUSAR

Page 50: ACS Meeting New Orleans 2013 (CINF)

Chemical Activity Predictor - GUSAR

GUSAR Software

characteristics:

• chemical structures are represented by QNA descriptors and MNA descriptors
• mathematical algorithm: a unique self-consistent regression algorithm allows selection of the best set of descriptors for a robust and reliable QSAR model
• main developer: Alexey Zakharov

Page 51: ACS Meeting New Orleans 2013 (CINF)

Chemical Activity Predictor - GUSAR

GUSAR Software

comparison was performed on the following data sets:

• ligand–enzyme interactions
• ligand–receptor interactions
• acute toxicity
• interaction with drug-metabolism enzymes

[Bar chart: accuracy (R2 test), axis from 0.00 to 1.00, for CoMFA, CoMSIA, HQSAR, EVA, 2D Cerius2, 3D Cerius2, GOLPE and GUSAR]

Page 52: ACS Meeting New Orleans 2013 (CINF)

Chemical Activity Predictor - GUSAR

• QSAR-based models created by GUSAR can be used separately from the application

• broad spectra of chemical/biological activity and property prediction models for small molecules in development:
  • physicochemical properties
  • assessment of toxicity, metabolism and antineoplastic activities
  • HIV-1-related models

• will be available as Web App and programmatic URL API:

http://cactus.nci.nih.gov/chemical/activity/CCOCC/boiling_point

{in_applicability_domain: True, datatype: 'float', value: 42.660}

Page 53: ACS Meeting New Orleans 2013 (CINF)

Chemical Activities

Category: Physicochemical Properties
Models: Physicochemical Models
Endpoints:
• Boiling point
• Density
• Flash point
• Melting point
• Surface tension
• Thermal conductivity
• Vapor pressure
• Viscosity
• Water solubility

Category: Biological Activities
Models: HIV-Models
Endpoints:
• HIV-1 Integrase (Strand Transfer) Inhibitor
• HIV-1 Reverse Transcriptase Inhibitor

Page 54: ACS Meeting New Orleans 2013 (CINF)
Page 55: ACS Meeting New Orleans 2013 (CINF)
Page 56: ACS Meeting New Orleans 2013 (CINF)
Page 57: ACS Meeting New Orleans 2013 (CINF)
Page 58: ACS Meeting New Orleans 2013 (CINF)

Activity Endpoints

Page 59: ACS Meeting New Orleans 2013 (CINF)

Activity Endpoints

Page 60: ACS Meeting New Orleans 2013 (CINF)

Activity Endpoints

Page 61: ACS Meeting New Orleans 2013 (CINF)

Activity Endpoints

Page 62: ACS Meeting New Orleans 2013 (CINF)

Prediction Results

GUSAR
• value
• unit
• in applicability domain
• quantitative and qualitative models

Page 63: ACS Meeting New Orleans 2013 (CINF)

Chemical Activity Predictor – GUSAR beta

http://cactus.nci.nih.gov/chemical/apps

Page 64: ACS Meeting New Orleans 2013 (CINF)

Chemical Activity Predictor – GUSAR beta

http://cactus.nci.nih.gov/chemical/apps

Page 65: ACS Meeting New Orleans 2013 (CINF)
Page 66: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Lookup Service (CSLS)

• first version was released in 2006, development stalled in 2008
• new version will be based on CSDB
• new release planned for 2013
• allows easy lookup of chemical structures within the constituting databases in CSDB

Page 67: ACS Meeting New Orleans 2013 (CINF)
Page 68: ACS Meeting New Orleans 2013 (CINF)

InChI/InChIKey Resolver

Page 69: ACS Meeting New Orleans 2013 (CINF)

InChI/InChIKey Resolver

• “loose coupling” of InChI resolvers provided by different organizations
• central list of resolvers
• each resolver must provide a specific protocol

Page 70: ACS Meeting New Orleans 2013 (CINF)

InChI/InChIKey Resolver

• Evan Bolton (NCBI, NLM, NIH)
• Valery Tkachenko (RSC/ChemSpider)
• Marc Nicklaus (CADD Group, NCI, NIH)
• Steven Bachrach (Trinity University)
• Antony Williams (RSC/ChemSpider)
• Markus Sitzmann (CADD Group, NCI, NIH)

Page 71: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Web API

[Architecture diagram. Components: external web services and other software packages (connected via http); Chemical Structure Web API; NCI/CADD web services; Chemical Identifier Resolver; NCI/CADD Chemical Structure DataBase (CSDB); CACTVS; OPSIN]

Page 72: ACS Meeting New Orleans 2013 (CINF)

Chemical Structure Web API

[Architecture diagram. Components: external web services and other software packages (connected via http); Chemical Structure Web API; NCI/CADD web services; Chemical Identifier Resolver; NCI/CADD Chemical Structure DataBase (CSDB); CACTVS; OPSIN; GUSAR]

Page 73: ACS Meeting New Orleans 2013 (CINF)

http://cactus.nci.nih.gov/blog

Page 74: ACS Meeting New Orleans 2013 (CINF)

Acknowledgements

NCI/CADD Team
• Alexey Zakharov
• Laura Guasch Pàmies
• Megan Peach
• Marc Nicklaus

Xemistry GmbH, Germany
• Wolf-Dietrich Ihlenfeldt

ChemNavigator
• Scott Hutton
• Tad Hurst

InChI Team

PubChem

All other database providers

Page 75: ACS Meeting New Orleans 2013 (CINF)

Acknowledgments - Software

• CACTVS
• Python Web Framework
• ChemWriter
• Python SQL Library
• Javascript library
• Peter Ertl (Novartis)
• Fulltext Search Engine

Page 76: ACS Meeting New Orleans 2013 (CINF)

http://cactus.nci.nih.gov