ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison
Markus Sitzmann, Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS

Post on 10-May-2015

Page 1: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Comparison Standard InChI/InChIKeys - NCI/CADD Structure Identifiers

InChI/InChIKey vs. NCI/CADD Structure Identifiers: A comparison

Markus Sitzmann

Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS

Page 2: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

The Adaption and Use of the IUPAC InChI/InChIKey

Chemical Structure Lookup Service

• NCI/CADD Identifiers: FICTS, FICuS, uuuuu
• InChI/InChIKey: Std. InChI/InChIKey

74 million structure records – 46 million unique structures

Page 3: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures

• based on hashcodes calculated by the chemoinformatics toolkit CACTVS
• CACTVS hashcodes:
  - represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned)
  - have a high sensitivity to structural features of a compound
  - change if connectivity changes

[Structure drawing: example compound with hashcode 9850FD9F9E2B4E25]

Page 4: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

[Figure: structure drawings of the example compound and related forms, each labeled with its CACTVS hashcode. Forms shown: charged form, tautomers, isotope, stereoisomers, salt, “errors”]

Hashcodes shown: A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9, 9850FD9F9E2B4E25

Page 5: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures

[Workflow diagram]

• input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB)
• structure normalization
• parent structure (MDL SDF, SMILES, database)
• hashcode calculation (E_HASHISY)
• NCI/CADD Identifier

Page 6: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers

• adjustable levels of sensitivity:

Feature | sensitive | un-sensitive (Structure Normalization)
Fragments | sensitive | keep only largest organic fragment
Isotopes | sensitive | ignore isotope labels
Charges | sensitive | uncharge
Tautomers | sensitive | find canonical tautomer
Stereochemistry | sensitive | discard stereo information

[Figure: example structure pairs for each feature]

Page 7: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers

[Figure: sensitivity diagram for Fragments, Isotopes, Charges, Tautomers and Stereochemistry; each feature is either sensitive or un-sensitive (Structure Normalization), with example structure pairs]

Page 8: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers

FICTS identifier: representation of the exact drawing

Fragments (F): sensitive
Isotopes (I): sensitive
Charges (C): sensitive
Tautomers (T): sensitive
Stereochemistry (S): sensitive

[Figure: example structure pairs for each feature, marked ≠ or =]

Page 9: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifiers

FICuS identifier: comes closest to how a chemist perceives a compound

Fragments (F): sensitive
Isotopes (I): sensitive
Charges (C): sensitive
Tautomers (u): un-sensitive
Stereochemistry (S): sensitive

[Figure: example structure pairs for each feature, marked ≠ or =]

Page 10: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifier

uuuuu identifier: closely related forms of the same compound

Fragments (u): un-sensitive
Isotopes (u): un-sensitive
Charges (u): un-sensitive
Tautomers (u): un-sensitive
Stereochemistry (u): un-sensitive

[Figure: example structure pairs for each feature, all marked =]

Page 11: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifier: Structure Normalization

Steps shown in the flowchart, from input structure to parent structures:

• correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
• get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
• define canonical resonance form/protonation state
• discard isotope labels
• define canonical tautomer
• normalize (n) or discard (d) stereo information

Parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
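The five-letter tags on these pages (FICTS, FICuS, uuuuu and so on) can be read position by position: an upper-case letter marks a feature the identifier is sensitive to, and "u" marks a feature that is normalized away. A minimal Python sketch of that reading follows. The function name `decode_tag` and the feature labels are illustrative assumptions, not part of CACTVS or the lookup service.

```python
# Hypothetical helper: decodes the five-letter tag convention read off the slides.
FEATURES = ("fragments", "isotopes", "charges", "tautomers", "stereochemistry")
LETTERS = "FICTS"  # position i is sensitive when it carries LETTERS[i], un-sensitive when "u"


def decode_tag(tag: str) -> dict:
    """Map a tag such as 'FICuS' to {feature name: is_sensitive}."""
    if len(tag) != len(LETTERS):
        raise ValueError(f"tag must have {len(LETTERS)} characters: {tag!r}")
    result = {}
    for feature, letter, char in zip(FEATURES, LETTERS, tag):
        if char == letter:
            result[feature] = True      # feature is kept: identifier is sensitive to it
        elif char == "u":
            result[feature] = False     # feature is normalized away
        else:
            raise ValueError(f"unexpected character {char!r} for {feature}")
    return result


if __name__ == "__main__":
    for tag in ("FICTS", "FICuS", "uuuuu"):
        print(tag, decode_tag(tag))
```

Under this reading, FICuS differs from FICTS only in the tautomer position, which matches the statement that FICuS works on a canonical tautomer while still distinguishing stereoisomers.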

Page 12: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

NCI/CADD Structure Identifier

<CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>

9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27

[Structure drawing: example compound]
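The identifier layout above (hashcode, tag, version, checksum) can be split with a regular expression. The sketch below is only an illustration: the field widths (16 hexadecimal digits, two-digit version, two-digit checksum) are assumptions taken from the three examples on this page, and the checksum is not verified because the slides do not say how it is computed. The names `StructureIdentifier` and `parse_identifier` are hypothetical.

```python
import re
from typing import NamedTuple


class StructureIdentifier(NamedTuple):
    hashcode: int   # 64-bit unsigned value of the 16-digit hexadecimal hashcode
    tag: str        # e.g. "FICTS", "FICuS", "uuuuu"
    version: str    # two digits in the examples on the slide (assumption)
    checksum: str   # two digits in the examples; the algorithm is not given, so not verified


# Pattern inferred from the slide examples, not from an official specification.
_PATTERN = re.compile(
    r"^([0-9A-F]{16})-([Fu][Iu][Cu][Tu][Su])-(\d{2})-(\d{2})$"
)


def parse_identifier(text: str) -> StructureIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its four fields."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognized identifier: {text!r}")
    hashcode_hex, tag, version, checksum = match.groups()
    return StructureIdentifier(int(hashcode_hex, 16), tag, version, checksum)


if __name__ == "__main__":
    ident = parse_identifier("9850FD9F9E2B4E25-FICuS-01-78")
    print(ident.tag, f"{ident.hashcode:016X}", ident.hashcode < 2**64)
```

Storing the hashcode as an integer keeps comparisons cheap; formatting it back with `f"{value:016X}"` restores the 16-digit form used on the slides.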

Page 13: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

[Figure: the example structure and its related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”), each labeled with its FICTS identifier]

FICTS identifiers shown:
9850FD9F9E2B4E25-FICTS (twice)
A3DAE0788050DDE4-FICTS
E5F83F10C5DB080A-FICTS (twice)
B2FDA68AEDA06DB9-FICTS
E92E4BA2869F3611-FICTS
8A7AD1EB498CC76A-FICTS
6C16DE2351F9FF50-FICTS

Page 14: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

[Figure: the example structure and its related forms (charged form, tautomers, isotope, salt, stereoisomers, “errors”), each labeled with its FICuS identifier]

FICuS identifiers shown:
9850FD9F9E2B4E25-FICuS (three times)
A3DAE0788050DDE4-FICuS
E5F83F10C5DB080A-FICuS (twice)
B2FDA68AEDA06DB9-FICuS
E92E4BA2869F3611-FICuS
8A7AD1EB498CC76A-FICuS

Page 15: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

[Figure: the example structure and its related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”), each labeled with its uuuuu identifier]

Identifiers shown:
9850FD9F9E2B4E25-uuuuu (eight times)
9850FD9F9E2B4E25-FICuS (one label)

Page 16: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

[Figure: the example structure and its related forms (charged form, tautomers, isotope, stereoisomers, salt, “errors”), each labeled with its Std. InChIKey]

Std. InChIKeys shown:
HNDVDQJCIGZPNO-UHFFFAOYSA-N (four times)
HNDVDQJCIGZPNO-CDYZYAPPSA-N
HNDVDQJCIGZPNO-RXMQYKEDSA-N
HNDVDQJCIGZPNO-YFKPBYRVSA-N
UHPNKBYGGMJTIM-UHFFFAOYSA-M (twice)
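A standard InChIKey consists of three hyphen-separated blocks, and the first block is derived from the connectivity layer. Grouping the keys shown on this page by their first block makes the pattern explicit. The following is a small Python sketch; the helper names are illustrative and the key list is simply copied from the slide.

```python
from collections import defaultdict


def first_block(inchikey: str) -> str:
    """Return the first (connectivity) block of an InChIKey."""
    return inchikey.split("-")[0]


def group_by_first_block(inchikeys):
    """Collect the distinct full keys that share each first block."""
    groups = defaultdict(set)
    for key in inchikeys:
        groups[first_block(key)].add(key)
    return dict(groups)


# Distinct Std. InChIKeys shown on the slide.
SLIDE_KEYS = [
    "HNDVDQJCIGZPNO-UHFFFAOYSA-N",
    "HNDVDQJCIGZPNO-CDYZYAPPSA-N",
    "HNDVDQJCIGZPNO-RXMQYKEDSA-N",
    "HNDVDQJCIGZPNO-YFKPBYRVSA-N",
    "UHPNKBYGGMJTIM-UHFFFAOYSA-M",
]

if __name__ == "__main__":
    for block, keys in sorted(group_by_first_block(SLIDE_KEYS).items()):
        print(block, len(keys), sorted(keys))
```

Four of the five distinct keys share the first block HNDVDQJCIGZPNO and differ only in the second block, while the UHPNKBYGGMJTIM keys have a different first block altogether.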

Page 17: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomers

[Figure: three tautomeric forms of an example compound; which one is the canonical tautomer?]

Page 18: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomers

• CACTVS: generation of all formal tautomers for a given organic compound (prototropic tautomerism)
• rule set of 21 transforms encoded as (CACTVS-extended) SMIRKS
• types of tautomerism covered:
  - 1.3, 1.5 keto/enol
  - imine/enamine
  - imine/amine
  - lactam/lactim
  - 1.3, 1.5, 1.7, 1.11 hydrogen atom shift on (aromatic) heteroatoms
  - keten/ynol
  - nitro/aci-nitro
  - nitroso/oxime
  - special cases: cyanic/iso-cyanic acid, phosphonic acid, formamidinesulfonic acid, isocyanide, furanones and more …

Page 19: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomers

• 21 SMIRKS transforms, examples:

transform: 1.3 keto-enol

[O,S,Se,Te;X1:1]=[Cx1:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][Cx1,cx1:2]=[C,cx1,cx0:3]

transform: 1.3 heteroatom H shift

[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]

transform: 1.5 heteroatom H shift

[nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]

Page 20: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomers

[Figure: formal tautomers of guanine, each labeled with its FICTS identifier]

FICTS identifiers shown:
A6199E68A788F2F5-FICTS
959B273B619C709F-FICTS
61248C4A7D045A47-FICTS
675R4FCC50F45026-FICTS
0B345B47F6625113-FICTS
181CA9BCE3EF47F4-FICTS
1AD375920BE60DAD-FICTS
67196F0B20B1D934-FICTS
BCCDA7D0CDACF120-FICTS
CE8F480C11DBFC4F-FICTS
D46A1E6500B06AB6-FICTS
D979CF9770AC0BA5-FICTS
56FFE8B5619FB01-FICTS
F802E527EC5C61BF-FICTS
EF060DA9D97091DE-FICTS

guanine:
BCCDA7D0CDACF120-FICuS
UYTPUPDQBNUYGX-UHFFFAOYSA-N

Page 21: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomerism & Stereochemistry

[Figure: Z and E isomers of methyl propenyl ketone]

Page 22: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Tautomerism & Stereochemistry

[Figure: Z and E isomers of methyl propenyl ketone, each connected by an arrow labeled “tautomer” to an OH form]

Page 23: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry

[Figure: Z and E isomers of methyl propenyl ketone, each connected by an arrow labeled “tautomer” to an OH form, plus a fourth structure (O)]

76D03F08ACDF6C0C-FICuS

FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.

Page 24: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry

[Figure: Z and E isomers of methyl propenyl ketone, each connected by an arrow labeled “tautomer” to an OH form, plus a fourth structure (O), labeled with FICuS and Std. InChI/InChIKey]

76D03F08ACDF6C0C-FICuS

FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.

Std. InChI/InChIKeys shown:

InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+
LABTWGUMFABVFG-ONEGZZNKSA-N

InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4-
LYGWZVOQSCPYDG-PLNGDYQASA-N

InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3-
LABTWGUMFABVFG-ARJAWSKDSA-N

InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3
LABTWGUMFABVFG-UHFFFAOYSA-N

Page 25: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Tautomerism & Stereochemistry

[Figure: Z and E isomers of methyl propenyl ketone, each connected by an arrow labeled “tautomer” to an OH form, plus a fourth structure (O), labeled with FICTS identifiers]

FICTS “sees” four different structures:
821D8C17ACE5040E-FICTS
6EB4AA2BAA11965F-FICTS
1677645190718885-FICTS
76D03F08ACDF6C0C-FICTS

Page 26: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Charges in Resonance Systems

[Figure: structure drawings with the labels “different protonation states”, “uncharge” (twice), “canonical resonance structure?” and “problem!”]

Hashcodes shown: F3A27F03AE77A722 (twice), 62FADCB01F197FC9, 2E011EE4519F7920

Page 27: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Charges in Resonance Systems

• generation of all formal resonance structures for a given (charged) organic compound
• rule set of 14 transforms encoded as (CACTVS-extended) SMIRKS:
  - shifting of charges: 5 rules
  - recombination of charges: 5 rules
  - separation of charges: 4 rules

[Figure: example resonance structures]

Page 28: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Structure Normalization: Charges in Resonance Systems

münchnones (no plausible unpolarized resonance structure can be drawn):

[Figure: eight resonance structures connected by transforms labeled 1.2 shift, 1.2 recombination, separation (pentavalent N atom), 1.3 shift and 1.3 recombination]

IUYUGWCTOLFFCL-UHFFFAOYSA-N
F68AC07DE0D3379F-FICuS

Page 29: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: »Chemical Structure Lookup Service« Database

74 million structure records (~46 million unique structures)

• PubChem database (including Open NCI database, EPA DSSTox databases, NIAID HIV databases, NIST Webbook, NLM ChemIDplus, ChemSpider …): ~47%
• ChemNavigator iResearch Library (compilation of commercially available screening compounds from ~250 international chemistry suppliers): ~43%
• Commercial Sources / Others (Asinex, Comgenex, …): ~10%

Page 30: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts

• structure records registered in CSLS: 74.2 million
• successful calculation of:
  - Standard InChI/InChIKey: 73.8 million records
  - NCI/CADD Structure Identifiers: 73.7 million records
• compound sets (unique chemical structure sets):
  - Standard InChI/InChIKey: 48,027,940
  - FICTS Identifier: 48,023,835
  - FICuS Identifier: 46,715,521
  - Standard InChIKey (first block): 43,055,589
  - uuuuu Identifier: 41,671,010

Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records

Page 31: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

[Diagram: from the original structure record set (74.2 million), two sets are derived: the FICuS compound set (46.7 million unique) and the Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)]

Page 32: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

[Diagram: from the original structure record set (74.2 million), two sets are derived: the FICuS compound set (46.7 million unique) and the Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)]

Question 1: conflicts?

Page 33: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

[Diagram: from the original structure record set (74.2 million), two sets are derived: the FICuS compound set (46.7 million unique) and the Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique); in addition, a Standard InChI/InChIKey is calculated by CACTVS from the FICuS compound structure]

Question 1: conflicts?
Question 2: same InChI/InChIKey?

Page 34: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Question 1: no conflicts between Std. InChI/InChIKey and FICuS

structure records (million records):

all structure records | 73.7
FICuS linked to a single InChI/InChIKey | 62.3 (84.5%)
both linked to a single structure record | 34.4 (46.9%)
both linked to multiple structure records | 27.9 (38.0%)

Page 35: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Question 1: conflicts between Std. InChI/InChIKey and FICuS

structure records (million records), percentages as printed on the slide:

all structure records | 73.7
FICuS is linked to multiple InChI/InChIKeys or vice versa | 10.4 (84.5%)
one FICuS is linked to multiple InChI/InChIKeys | 3.6 (4.6%)
one InChI/InChIKey is linked to multiple FICuS | 6.8 (9.3%)

Page 36: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Question 1: conflicts between Std. InChI/InChIKey and FICuS

structure records (million records), percentages as printed on the slide:

all structure records | 73.7
FICuS is linked to multiple InChI/InChIKeys or vice versa | 10.4 (84.5%)
one FICuS is linked to multiple InChI/InChIKeys | 3.6 (4.6%) | number of InChIKeys first block: 0.9 (1.2%)
one InChI/InChIKey is linked to multiple FICuS | 6.8 (9.3%) | number of InChIKeys first block: 2.3 (3.1%)
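The two kinds of conflict counted above can be detected by indexing the records in both directions. The Python sketch below illustrates that bookkeeping only; it is not the procedure used for the talk. The input format (one `(ficus, inchikey)` pair per structure record) and the placeholder identifiers in the demo are hypothetical.

```python
from collections import defaultdict


def find_conflicts(records):
    """Find identifiers linked to more than one identifier of the other kind.

    records: iterable of (ficus, inchikey) pairs, one pair per structure record.
    Returns two dicts:
      - FICuS identifiers linked to multiple InChIKeys
      - InChIKeys linked to multiple FICuS identifiers
    """
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, inchikey in records:
        keys_per_ficus[ficus].add(inchikey)
        ficus_per_key[inchikey].add(ficus)

    ficus_conflicts = {f: ks for f, ks in keys_per_ficus.items() if len(ks) > 1}
    key_conflicts = {k: fs for k, fs in ficus_per_key.items() if len(fs) > 1}
    return ficus_conflicts, key_conflicts


if __name__ == "__main__":
    # Placeholder identifiers for illustration only, not real data.
    demo = [
        ("FICUS_A", "KEY_1"),
        ("FICUS_A", "KEY_1"),   # two records, same pair: no conflict
        ("FICUS_B", "KEY_2"),
        ("FICUS_B", "KEY_3"),   # one FICuS, two InChIKeys
        ("FICUS_C", "KEY_4"),
        ("FICUS_D", "KEY_4"),   # one InChIKey, two FICuS
    ]
    print(find_conflicts(demo))
```

Comparing only the first InChIKey block instead of the full key, as in the last column above, would amount to passing `inchikey.split("-")[0]` in place of the full key.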

Page 37: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Question 2: same InChI/InChIKey?

Identifier | compounds (unique structures, million) | InChI changes (compounds) | InChI changes (structure records)
FICuS | 46.7 | 6.4 (13.7%) | 9.3 (12.7%)
FICTS | 48.0 | 3.8 (7.9%) | 4.6 (6.2%)
uuuuu | 41.6 | 11.9 (28.6%) | 21.9 (29.7%)

all structure records: 73.7 million

Page 38: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

Question 2: same InChI/InChIKey?

Identifier | compounds (unique structures, million) | InChI changes (compounds) | InChI changes (structure records)
FICuS | 46.7 | 6.4 (13.7%) | 9.3 (12.7%)
FICTS | 48.0 | 3.8 (7.9%) | 4.6 (6.2%)
uuuuu | 41.6 | 11.9 (28.6%) | 21.9 (29.7%)

all structure records: 73.7 million

vs. InChIKey first block: 3.2 (7.6%), 6.3 (8.4%)

Page 39: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison

compound classification | occurrence in FICuS set
(formal) tautomer count > 1 | 56.4%
(formal) tautomer count > 3 | 25.4%
(formal) tautomer count > 10 | 5.5%
full stereo | 25.7%
contains metal atoms | 0.8%
metal complexes | 0.2%
salt | 1.0%
has resonance charges | 0.2%
inorganic | 0.1%

occurrence in FICuS subset (InChI changes), values in the order they appear on the slide: 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%

Page 40: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Structure drawing: ChemBlock A3422/0145215]

• FICuS: 12 different structure records linked to this structure
• Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block)
• all 3 of these StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including connectivity layer/first block)

Page 41: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215 and two further drawings of the structure labeled Z and E]

Page 42: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, and a further structure labeled “tautomer:”]

Page 43: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, and a further structure labeled “tautomer:”, with one arrow labeled “tautomeric interconversion?”]

Page 44: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, two drawings labeled S and R, and a further structure labeled “tautomer:”, with two arrows labeled “tautomeric interconversion?”]

Page 45: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, two drawings labeled S and R, and a further structure labeled “tautomer:”, with two arrows labeled “tautomeric interconversion?”]

Page 46: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, two drawings labeled S and R, and a further structure labeled “tautomer:”, with two arrows labeled “tautomeric interconversion?”]

How many structures?

Structure records shown:
ZINC04685909
ChemBlock A3422/0145215
ChemNavigator 47748165
NIST MS-Lib 1967005690
ChemNavigator 34903393
ChemNavigator 65635274

Page 47: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison

[Figure: ChemBlock A3422/0145215, two drawings labeled Z and E, two drawings labeled S and R, and a further structure labeled “tautomer:”, with two arrows labeled “tautomeric interconversion?”]

How many structures?

• Std. InChIKey: InChIKey A, InChIKey B, InChIKey C (same connectivity layer/block)
• FICuS: parent structure

Page 48: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine

[Structure drawing: original structure]

Page 49: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine

[Structure drawings: original structure, best representation]

Page 50: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine

[Structure drawings: original structure, best representation, and the structures as interpreted by InChI and by FICuS, with Z/E labels on the drawings]

Page 51: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

The Adaption and Use of the IUPAC InChI/InChIKey

Chemical Structure Lookup Service

• NCI/CADD Identifiers: FICTS, FICuS, uuuuu
• InChI/InChIKey: Std. InChI/InChIKey

74 million structure records – 46 million unique structures

http://cactus.nci.nih.gov/lookup

Page 52: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Web Service

Chemical Structure REST Service (beta)

URL scheme:

http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}

Examples:

http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image

http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey

• returns plain text/gif image
• if the structure identifier is not resolvable: HTTP 404 status code
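A client for the URL scheme above might look like the following Python sketch. It uses only the standard library; `build_url` and `resolve` are hypothetical helper names, and the service was described as beta at the time of the talk, so its current behavior may differ. The 404 handling follows the statement above that unresolvable identifiers return an HTTP 404 status code.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def build_url(identifier: str, method: str) -> str:
    """Fill the {identifier}/{method} URL scheme shown on the slide."""
    # Keep "=" unescaped so that "InChIKey=..." identifiers stay readable.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method)}"


def resolve(identifier: str, method: str, timeout: float = 10.0):
    """Return the raw response body (text or image bytes), or None on HTTP 404."""
    try:
        with urlopen(build_url(identifier, method), timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        if err.code == 404:   # identifier not resolvable, per the slide
            return None
        raise


if __name__ == "__main__":
    # Only prints the URLs; call resolve(...) to contact the service.
    print(build_url("ethanol", "stdinchikey"))
    print(build_url("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "smiles"))
```

Returning `None` for a 404 keeps "not found" distinct from network failures, which still raise.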

Page 53: ACS Salt Lake City 2009 CINF Talk (InChI Symposium)

Acknowledgments

• ChemNavigator: Scott Hutton, Tad Hurst
• CADD Group, LMC, NCI: Marc Nicklaus, Igor V. Filippov
• CACTVS, Xemistry GmbH: Wolf-Dietrich Ihlenfeldt

Thanks to all database providers

Our web site: http://cactus.nci.nih.gov