ACS Salt Lake City 2009 CINF talk (InChI Symposium)
Comparison Standard InChI/InChIKeys - NCI/CADD Structure Identifiers
InChI/InChIKey vs. NCI/CADD Structure Identifiers: A Comparison
Markus Sitzmann
Computer-Aided Drug Design Group (NCI/CADD), Laboratory of Medicinal Chemistry, NCI-Frederick, NIH, DHHS
The Adaption and Use of the IUPAC InChI/InChIKey
Chemical Structure Lookup Service
• 74 million structure records, 46 million unique structures
• NCI/CADD Identifiers: FICTS, FICuS, uuuuu
• InChI/InChIKey: Std. InChI/InChIKey
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
• based on hashcodes calculated by the chemoinformatics toolkit CACTVS
• CACTVS hashcodes:
  - represent a chemical structure uniquely as a 16-digit hexadecimal number (64-bit unsigned)
  - have a high sensitivity to structural features of a compound
  - change if connectivity changes
[Structure drawing: example compound with hashcode 9850FD9F9E2B4E25]
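A minimal sketch of handling such a hashcode in code, assuming only what the slide states (16 hexadecimal digits, 64-bit unsigned). The function names `parse_hashcode` and `format_hashcode` are illustrative and not part of CACTVS.

```python
import string

def parse_hashcode(text: str) -> int:
    """Convert a 16-digit hexadecimal hashcode string to an unsigned 64-bit integer."""
    if len(text) != 16 or not all(c in string.hexdigits for c in text):
        raise ValueError(f"not a 16-digit hexadecimal hashcode: {text!r}")
    return int(text, 16)

def format_hashcode(value: int) -> str:
    """Render an unsigned 64-bit integer as a 16-digit uppercase hex string."""
    if not 0 <= value < 2**64:
        raise ValueError("hashcode must fit into 64 bits (unsigned)")
    return f"{value:016X}"

code = parse_hashcode("9850FD9F9E2B4E25")
print(format_hashcode(code))
```

Storing the value as an integer rather than a string keeps database keys compact, while the formatter restores the form shown on the slides.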
[Figure: the example structure and its variants (charged form, tautomers, isotope-labeled form, stereoisomers, salt, drawing “errors”), each annotated with a CACTVS hashcode. Hashcodes shown: 9850FD9F9E2B4E25, A3DAE0788050DDE4, 3ECEF579D7DF025A, E92E4BA2869F3611, 8A7AD1EB498CC76A, 6C16DE2351F9FF50, 8F7A1DE5A733F0E0, 60525E1AF41497B6, B2FDA68AEDA06DB9]
NCI/CADD Structure Identifiers: Unique Representation of Chemical Structures
[Workflow diagram: input structure (MDL Molfile, MDL SDF, SMILES, ChemDraw cdx, PDB) → structure normalization → parent structure (MDL SDF, SMILES, database) → hashcode calculation (E_HASHISY) → NCI/CADD Identifier]
NCI/CADD Structure Identifiers: Structure Normalization
• adjustable levels of sensitivity:
  - Fragments: sensitive, or un-sensitive (keep only largest organic fragment)
  - Isotopes: sensitive, or un-sensitive (ignore isotope labels)
  - Charges: sensitive, or un-sensitive (uncharge)
  - Tautomers: sensitive, or un-sensitive (find canonical tautomer)
  - Stereochemistry: sensitive, or un-sensitive (discard stereo information)
[Structure drawings illustrating each feature]
NCI/CADD Structure Identifiers: Structure Normalization
FICTS identifier: representation of the exact drawing
• sensitive to all five features: Fragments (F), Isotopes (I), Charges (C), Tautomers (T), Stereochemistry (S)
[Structure drawings with = / ≠ marks comparing example pairs for each feature]
NCI/CADD Structure Identifiers: Structure Normalization
FICuS identifier: comes closest to how a chemist perceives a compound
• sensitive to Fragments (F), Isotopes (I), Charges (C), and Stereochemistry (S)
• un-sensitive to Tautomers (u)
[Structure drawings with = / ≠ marks comparing example pairs for each feature]
NCI/CADD Structure Identifier: Structure Normalization
uuuuu identifier: closely related forms of the same compound
• un-sensitive to all five features: Fragments, Isotopes, Charges, Tautomers, Stereochemistry
[Structure drawings with = marks comparing example pairs for each feature]
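The five-letter tags can be read position by position: an uppercase letter (F, I, C, T, S) marks a feature the identifier is sensitive to, and `u` marks an un-sensitive one. The following sketch decodes a tag under that reading; the function name `decode_tag` is illustrative, not part of any NCI/CADD tool.

```python
FEATURES = ("Fragments", "Isotopes", "Charges", "Tautomers", "Stereochemistry")
LETTERS = "FICTS"

def decode_tag(tag: str) -> dict:
    """Map each feature to True (sensitive) or False (un-sensitive)."""
    if len(tag) != len(LETTERS):
        raise ValueError(f"expected a 5-letter tag, got {tag!r}")
    result = {}
    for feature, letter, char in zip(FEATURES, LETTERS, tag):
        if char == letter:
            result[feature] = True      # e.g. 'T' in position 4: tautomer-sensitive
        elif char == "u":
            result[feature] = False     # 'u': feature is normalized away
        else:
            raise ValueError(f"unexpected character {char!r} in tag {tag!r}")
    return result

for tag in ("FICTS", "FICuS", "uuuuu"):
    sensitive = [name for name, on in decode_tag(tag).items() if on]
    print(tag, "->", sensitive)
```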
NCI/CADD Structure Identifier: Structure Normalization
[Flowchart: input structure → normalization steps → parent structures]
• correct structure: add hydrogen atoms, correct functional groups, correct metal atom bonds
• get largest fragment & uncharge: delete complex center, get largest organic fragment, delete radical center, uncharge structure
• define canonical resonance form / protonation state
• discard isotope labels
• define canonical tautomer
• normalize or discard stereo information
• parent structures: uuuuu, uuuuS, uuuTu, uuuTS, FICuu, FICuS, FICTS, FICTu
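The eight parent-structure tags on this slide can be enumerated from three independent switches. This sketch assumes, based only on the tags listed, that fragments, isotopes, and charges are switched together as one block (`FIC` or `uuu`), while tautomers and stereochemistry are switched separately.

```python
from itertools import product

def parent_structure_tags() -> list:
    """Enumerate tags: FIC block on/off x tautomers on/off x stereo on/off."""
    return [
        fic + tautomer + stereo
        for fic, tautomer, stereo in product(("uuu", "FIC"), ("u", "T"), ("u", "S"))
    ]

print(parent_structure_tags())
```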
NCI/CADD Structure Identifier
Format: <CACTVS hashcode (E_HASHISY)>-<tag>-<version>-<checksum>
Examples for the same structure:
9850FD9F9E2B4E25-FICTS-01-57
9850FD9F9E2B4E25-FICuS-01-78
9850FD9F9E2B4E25-uuuuu-01-27
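A small parser for the identifier format above could look like the following sketch. It assumes, from the three examples only, that version and checksum are two digits each; the checksum algorithm is not described in the talk, so the sketch does not verify it. `StructureIdentifier` and `parse_identifier` are hypothetical names.

```python
import re
from typing import NamedTuple

class StructureIdentifier(NamedTuple):
    hashcode: str   # 16-digit hexadecimal CACTVS hashcode
    tag: str        # e.g. FICTS, FICuS, uuuuu
    version: str
    checksum: str

# Assumption: two-digit version and checksum, as in the slide's examples.
_PATTERN = re.compile(r"^([0-9A-F]{16})-([FICTSu]{5})-(\d{2})-(\d{2})$")

def parse_identifier(text: str) -> StructureIdentifier:
    """Split '<hashcode>-<tag>-<version>-<checksum>' into its four fields."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a full NCI/CADD identifier: {text!r}")
    return StructureIdentifier(*match.groups())

print(parse_identifier("9850FD9F9E2B4E25-FICuS-01-78"))
```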
FICTS
[Figure: FICTS identifiers for the example structure and its variants (charged form, tautomers, isotope, salt, stereoisomers, “errors”). Identifiers shown: 9850FD9F9E2B4E25-FICTS (twice), E5F83F10C5DB080A-FICTS (twice), A3DAE0788050DDE4-FICTS, B2FDA68AEDA06DB9-FICTS, E92E4BA2869F3611-FICTS, 8A7AD1EB498CC76A-FICTS, 6C16DE2351F9FF50-FICTS]
FICuS
[Figure: FICuS identifiers for the example structure and its variants (charged form, tautomers, isotope, salt, stereoisomers, “errors”). Identifiers shown: 9850FD9F9E2B4E25-FICuS (three times), E5F83F10C5DB080A-FICuS (twice), A3DAE0788050DDE4-FICuS, B2FDA68AEDA06DB9-FICuS, E92E4BA2869F3611-FICuS, 8A7AD1EB498CC76A-FICuS]
uuuuu
[Figure: uuuuu identifiers for the example structure and its variants (charged form, tautomers, isotope, stereoisomers, salt, “errors”). All nine labels carry the hashcode 9850FD9F9E2B4E25: eight read 9850FD9F9E2B4E25-uuuuu and one reads 9850FD9F9E2B4E25-FICuS]
Std. InChIKey
[Figure: Standard InChIKeys for the example structure and its variants (charged form, tautomers, isotope, stereoisomers, salt, “errors”). Keys shown: HNDVDQJCIGZPNO-UHFFFAOYSA-N (four times), HNDVDQJCIGZPNO-CDYZYAPPSA-N, HNDVDQJCIGZPNO-RXMQYKEDSA-N, HNDVDQJCIGZPNO-YFKPBYRVSA-N, UHPNKBYGGMJTIM-UHFFFAOYSA-M (twice)]
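Later slides compare identifiers against the "InChIKey first block". As a sketch of what that means in practice: a Standard InChIKey has three hyphen-separated blocks of 14, 10, and 1 characters, and the first block is the part the talk refers to as the connectivity layer/first block. The helper names below are illustrative.

```python
def split_inchikey(key: str) -> tuple:
    """Split an InChIKey into its three blocks (14, 10, and 1 characters)."""
    parts = key.split("-")
    if [len(part) for part in parts] != [14, 10, 1]:
        raise ValueError(f"unexpected InChIKey layout: {key!r}")
    return tuple(parts)

def same_first_block(key_a: str, key_b: str) -> bool:
    """True if both keys share the first (connectivity) block."""
    return split_inchikey(key_a)[0] == split_inchikey(key_b)[0]

# Keys taken from the slide above.
print(same_first_block("HNDVDQJCIGZPNO-UHFFFAOYSA-N", "HNDVDQJCIGZPNO-YFKPBYRVSA-N"))
print(same_first_block("HNDVDQJCIGZPNO-UHFFFAOYSA-N", "UHPNKBYGGMJTIM-UHFFFAOYSA-M"))
```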
Structure Normalization: Tautomers
[Figure: tautomeric forms of an example structure; which one is the canonical tautomer?]
Structure Normalization: Tautomers
• CACTVS: generation of all formal tautomers for a given organic compound (prototropic tautomerism)
• rule set of 21 transforms encoded as (CACTVS-extended) SMIRKS
• types of tautomerism covered:
  - 1.3, 1.5 keto/enol
  - imine/enamine
  - imine/amine
  - lactam/lactim
  - 1.3, 1.5, 1.7, 1.11 hydrogen atom shift on (aromatic) heteroatoms
  - ketene/ynol
  - nitro/aci-nitro
  - nitroso/oxime
  - special cases: cyanic/iso-cyanic acid, phosphonic acid, formamidinesulfonic acid, isocyanide, furanones and more …
Structure Normalization: Tautomers
• 21 SMIRKS transforms, examples:
transform: 1.3 keto-enol
[O,S,Se,Te;X1:1]=[Cx1:2][CX4R{0-2}:3][#1:4]>>[#1:4][O,S,Se,Te;X2:1][Cx1,cx1:2]=[C,cx1,cx0:3]
transform: 1.3 heteroatom H shift
[N,n,S,s,O,o,Se,Te:1]=[NX2,nX2,C,c,P,p:2][N,n,S,O,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][NX2,nX2,C,c,P,p:2]=[N,n,S,s,O,o,Se,Te:3]
transform: 1.5 heteroatom H shift
[nX2,NX2,S,O,Se,Te:1]=[C,c,nX2,NX2:6][C,c:5]=[C,c,nX2:2][N,n,S,s,O,o,Se,Te:3][#1:4]>>[#1:4][N,n,S,O,Se,Te:1][C,c,nX2,NX2:6]=[C,c:5][C,c,nX2:2]=[NX2,S,O,Se,Te:3]
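These transforms use CACTVS-extended SMIRKS, which other toolkits may not parse. A toolkit-independent sanity check is still possible: each transform only moves a hydrogen atom and shifts bonds, so the atom map numbers on the reactant side should reappear on the product side. The sketch below checks just that with a regular expression; it is not a SMIRKS parser, and `atom_maps` and `maps_are_conserved` are illustrative names.

```python
import re

def atom_maps(side: str) -> list:
    """Collect the atom map numbers (':n]') on one side of a transform."""
    return sorted(int(number) for number in re.findall(r":(\d+)\]", side))

def maps_are_conserved(smirks: str) -> bool:
    """True if reactant and product sides carry the same set of mapped atoms."""
    reactant, product = smirks.split(">>")
    return atom_maps(reactant) == atom_maps(product)

KETO_ENOL_13 = (
    "[O,S,Se,Te;X1:1]=[Cx1:2][CX4R{0-2}:3][#1:4]"
    ">>[#1:4][O,S,Se,Te;X2:1][Cx1,cx1:2]=[C,cx1,cx0:3]"
)
print(atom_maps(KETO_ENOL_13.split(">>")[0]), maps_are_conserved(KETO_ENOL_13))
```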
Structure Normalization: Tautomers
[Figure: 15 tautomers of guanine, each labeled with its FICTS identifier]
FICTS identifiers as shown on the slide:
A6199E68A788F2F5-FICTS
959B273B619C709F-FICTS
61248C4A7D045A47-FICTS
675R4FCC50F45026-FICTS
0B345B47F6625113-FICTS
181CA9BCE3EF47F4-FICTS
1AD375920BE60DAD-FICTS
67196F0B20B1D934-FICTS
BCCDA7D0CDACF120-FICTS
CE8F480C11DBFC4F-FICTS
D46A1E6500B06AB6-FICTS
D979CF9770AC0BA5-FICTS
56FFE8B5619FB01-FICTS
F802E527EC5C61BF-FICTS
EF060DA9D97091DE-FICTS
guanine:
FICuS: BCCDA7D0CDACF120-FICuS
Std. InChIKey: UYTPUPDQBNUYGX-UHFFFAOYSA-N
Structure Normalization: Tautomerism & Stereochemistry
[Figure: methyl propenyl ketone drawn as Z and E isomers, linked by tautomer arrows to an enol form (OH) and a fourth structure]
FICuS disregards stereochemistry on double bonds if the double bond is not located during tautomer generation.
• FICuS: 76D03F08ACDF6C0C-FICuS
• Std. InChI/InChIKey:
  - E isomer: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3+ (LABTWGUMFABVFG-ONEGZZNKSA-N)
  - Z isomer: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3/b4-3- (LABTWGUMFABVFG-ARJAWSKDSA-N)
  - enol tautomer: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4,6H,1H2,2H3/b5-4- (LYGWZVOQSCPYDG-PLNGDYQASA-N)
  - without stereo layer: InChI=1S/C5H8O/c1-3-4-5(2)6/h3-4H,1-2H3 (LABTWGUMFABVFG-UHFFFAOYSA-N)
Structure Normalization: Tautomerism & Stereochemistry
[Figure: methyl propenyl ketone (Z and E isomers) and its tautomers, each with its FICTS identifier]
FICTS “sees” four different structures:
821D8C17ACE5040E-FICTS
6EB4AA2BAA11965F-FICTS
1677645190718885-FICTS
76D03F08ACDF6C0C-FICTS
Structure Normalization: Charges in Resonance Systems
[Figure: different protonation states of an example ring system and the results of the uncharge step. Labels on the slide: different protonation states; uncharge; ≠; problem!; canonical resonance structure? Hashcodes shown: F3A27F03AE77A722 (twice), 62FADCB01F197FC9, 2E011EE4519F7920]
Structure Normalization: Charges in Resonance Systems
• generation of all formal resonance structures for a given (charged) organic compound
• rule set of 14 transforms encoded as (CACTVS-extended) SMIRKS:
  - shifting of charges: 5 rules
  - recombination of charges: 5 rules
  - separation of charges: 4 rules
[Figure: resonance structures of an example compound]
Structure Normalization: Charges in Resonance Systems
münchnones (no plausible unpolarized resonance structure can be drawn):
[Figure: eight resonance structures connected by the transforms 1.2 shift, 1.2 recombination, 1.3 shift, 1.3 recombination, and separation (pentavalent N atom)]
Std. InChIKey: IUYUGWCTOLFFCL-UHFFFAOYSA-N
FICuS: F68AC07DE0D3379F-FICuS
InChI/InChIKey - NCI/CADD Identifier comparison: »Chemical Structure Lookup Service« Database
74 million structure records (~46 million unique structures)
• PubChem database (including Open NCI database, EPA DSSTox databases, NIAID HIV databases, NIST Webbook, NLM ChemIDplus, ChemSpider …): ~47%
• ChemNavigator iResearch Library (compilation of commercially available screening compounds from ~250 international chemistry suppliers): ~43%
• Commercial Sources / Others (Asinex, Comgenex, …): ~10%
InChI/InChIKey - NCI/CADD Identifier comparison: Unique Structure Counts
• structure records registered in CSLS: 74.2 million
  successful calculation of:
  - Standard InChI/InChIKey: 73.8 million records
  - NCI/CADD Structure Identifiers: 73.7 million records
• compound sets (unique chemical structure sets):
  Standard InChI/InChIKey            48,027,940
  FICTS Identifier                   48,023,835
  FICuS Identifier                   46,715,521
  Standard InChIKey (first block)    43,055,589
  uuuuu Identifier                   41,671,010
Standard InChI/InChIKeys were calculated by stdinchi-1 (Linux i-386 executable) from the original SD file records
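To make the table easier to read, the following sketch computes how many fewer unique structures each identifier yields compared with the Standard InChI/InChIKey count. The numbers are the ones from the slide; the dictionary and function names are illustrative. The record sets behind the counts differ slightly (73.8 vs. 73.7 million records), so the differences are indicative only.

```python
UNIQUE_COUNTS = {
    "Standard InChI/InChIKey": 48_027_940,
    "FICTS Identifier": 48_023_835,
    "FICuS Identifier": 46_715_521,
    "Standard InChIKey (first block)": 43_055_589,
    "uuuuu Identifier": 41_671_010,
}

def difference_to(reference: str) -> dict:
    """Unique-structure count of the reference minus that of each identifier."""
    base = UNIQUE_COUNTS[reference]
    return {name: base - count for name, count in UNIQUE_COUNTS.items()}

for name, diff in difference_to("Standard InChI/InChIKey").items():
    share = UNIQUE_COUNTS[name] / UNIQUE_COUNTS["Standard InChI/InChIKey"]
    print(f"{name:34s} {diff:>10,} fewer ({share:.1%} of reference)")
```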
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
[Diagram: the original structure record set (74.2 million) leads to the FICuS compound set (46.7 million unique) and to the Standard InChI/InChIKey set calculated by stdinchi-1 (73.8 million, 48.0 million unique)]
Two questions:
1. Conflicts between the Standard InChI/InChIKey set (stdinchi-1) and the FICuS compound set?
2. Same InChI/InChIKey when the Standard InChI/InChIKey is calculated by CACTVS from the FICuS compound structure?
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
Question 1: no conflicts between Std. InChI/InChIKey and FICuS
                                                structure records (million)
all structure records                           73.7
FICuS linked to a single InChI/InChIKey         62.3 (84.5%)
  both linked to a single structure record      34.4 (46.9%)
  both linked to multiple structure records     27.9 (38.0%)
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
Question 1: conflicts between Std. InChI/InChIKey and FICuS
                                                              structure records (million)
all structure records                                         73.7
no conflict (previous slide)                                  (84.5%)
FICuS is linked to multiple InChI/InChIKeys or vice versa     10.4
  one FICuS is linked to multiple InChI/InChIKeys             3.6 (4.6%)
  one InChI/InChIKey is linked to multiple FICuS              6.8 (9.3%)
number of InChIKeys first block: 0.9 (1.2%), 2.3 (3.1%)
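The conflict categories above can be reproduced for any list of records that carry both identifiers. This is a minimal sketch, not the procedure used for the talk: it assumes each record is a `(FICuS, InChIKey)` pair, and the toy values `F1`, `K1`, and so on are placeholders rather than real identifiers.

```python
from collections import Counter, defaultdict

def classify_links(records) -> Counter:
    """Count records by how their FICuS and InChIKey are linked to each other."""
    records = list(records)
    keys_per_ficus = defaultdict(set)
    ficus_per_key = defaultdict(set)
    for ficus, inchikey in records:
        keys_per_ficus[ficus].add(inchikey)
        ficus_per_key[inchikey].add(ficus)

    counts = Counter()
    for ficus, inchikey in records:
        many_keys = len(keys_per_ficus[ficus]) > 1    # one FICuS, several InChIKeys
        many_ficus = len(ficus_per_key[inchikey]) > 1  # one InChIKey, several FICuS
        if many_keys and many_ficus:
            counts["both directions"] += 1
        elif many_keys:
            counts["one FICuS, multiple InChIKeys"] += 1
        elif many_ficus:
            counts["one InChIKey, multiple FICuS"] += 1
        else:
            counts["no conflict"] += 1
    return counts

toy_records = [("F1", "K1"), ("F1", "K1"), ("F2", "K2"),
               ("F2", "K3"), ("F3", "K4"), ("F4", "K4")]
print(classify_links(toy_records))
```

Counting per record rather than per identifier matches the slide, which reports its figures in structure records.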
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
Question 2: same InChI/InChIKey?
compounds (unique structures), million:
           all compounds    InChI changes
FICuS      46.7             6.4 (13.7%)
FICTS      48.0             3.8 (7.9%)
uuuuu      41.6             11.9 (28.6%)
structure records, million (all records: 73.7):
           InChI changes
FICuS      9.3 (12.7%)
FICTS      4.6 (6.2%)
uuuuu      21.9 (29.7%)
vs. InChIKey first block: 3.2 (7.6%), 6.3 (8.4%)
InChI/InChIKey - NCI/CADD Identifier comparison: Detailed Comparison
[Bar chart: compound classification, occurrence in FICuS set vs. occurrence in FICuS subset (InChI changes)]
compound classification            value (in label order)
(formal) tautomer count > 1        56.4%
(formal) tautomer count > 3        25.4%
(formal) tautomer count > 10       5.5%
full stereo                        25.7%
contains metal atoms               0.8%
metal complexes                    0.2%
salt                               1.0%
has resonance charges              0.2%
inorganic                          0.1%
[Second value series from the chart, row assignment not recoverable: 14.5%, 18.5%, 28.9%, 16.9%, 34.5%, 52.1%, 18.6%, 52.1%, 33.9%]
InChI/InChIKey - NCI/CADD Identifier comparison
Example: ChemBlock A3422/0145215
• FICuS: 12 different structure records linked to this structure
• Std. InChI/InChIKey (stdinchi-1): calculates 3 different strings/keys for these 12 structure records (all have the same connectivity layer/first block)
• all of these 3 StdInChI/InChIKeys differ from the StdInChI/InChIKey calculated after FICuS normalization (including connectivity layer/first block)
[Structure drawing of the example compound]
InChI/InChIKey - NCI/CADD Identifier comparison
[Figure, built up over several slides: the example compound (ChemBlock A3422/0145215) drawn with Z/E double-bond variants, its tautomers, and S/R stereo variants, connected by arrows labeled “tautomeric interconversion?”]
How many structures?
• records shown: ZINC04685909, ChemBlock A3422/0145215, ChemNavigator 47748165, NIST MS-Lib 1967005690, ChemNavigator 34903393, ChemNavigator 65635274
• Std. InChIKey: InChIKey A, InChIKey B, InChIKey C (same connectivity layer/block)
• FICuS: parent structure
InChI/InChIKey - NCI/CADD Identifier comparison: Dithiazinine
[Figure, built up over several slides: Dithiazinine as the original structure and as the best representation, next to the structures as handled by InChI and by FICuS; double bonds are labeled ZE, E, and ZE]
The Adaption and Use of the IUPAC InChI/InChIKey
Chemical Structure Lookup Service
• 74 million structure records, 46 million unique structures
• NCI/CADD Identifiers: FICTS, FICuS, uuuuu
• InChI/InChIKey: Std. InChI/InChIKey
http://cactus.nci.nih.gov/lookup
Web Service
Chemical Structure REST Service (beta)
URL scheme:
http://cactus.nci.nih.gov/chemical/structure/{identifier}/{method}
Examples:
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/smiles
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/names
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/ficus
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/stdinchi
http://cactus.nci.nih.gov/chemical/structure/InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N/image
http://cactus.nci.nih.gov/chemical/structure/ethanol/stdinchikey
http://cactus.nci.nih.gov/chemical/structure/64-17-5/stdinchikey
• returns plain text/gif image
• if the structure identifier is not resolvable: http 404 status code
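A client for this URL scheme might look like the following sketch. It relies only on what the slide states (the `{identifier}/{method}` pattern, plain-text responses, and a 404 status for unresolvable identifiers); the helper names `structure_url` and `resolve` are illustrative, and since the service was labeled beta, the current behavior may differ.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"

def structure_url(identifier: str, method: str) -> str:
    """Build a request URL following the {identifier}/{method} scheme."""
    # Keep '=' unescaped so that 'InChIKey=...' identifiers look as on the slide.
    return f"{BASE_URL}/{quote(identifier, safe='=')}/{quote(method, safe='')}"

def resolve(identifier: str, method: str, timeout: float = 10.0):
    """Return the plain-text response, or None if the identifier is not resolvable.

    Intended for text methods such as 'smiles' or 'stdinchikey', not for 'image'.
    """
    try:
        with urlopen(structure_url(identifier, method), timeout=timeout) as response:
            return response.read().decode("utf-8").strip()
    except HTTPError as error:
        if error.code == 404:   # documented signal for "not resolvable"
            return None
        raise

print(structure_url("ethanol", "stdinchikey"))
print(structure_url("InChIKey=LFQSCWFLJHTTHZ-UHFFFAOYSA-N", "smiles"))
```

Calling `resolve("64-17-5", "stdinchikey")` would issue the last example request from the slide; it is left out here so that the sketch runs without network access.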
Acknowledgments
ChemNavigator
• Scott Hutton
• Tad Hurst
CADD Group, LMC, NCI
• Marc Nicklaus
• Igor V. Filippov
CACTVS, Xemistry GmbH
• Wolf-Dietrich Ihlenfeldt
Thanks to all database providers
Our web site: http://cactus.nci.nih.gov