Getting Past Diversity in Assessing Virtual Library Designs
Bob Clark
Tripos, Inc.
St. Louis, Missouri USA
2001 Tripos, Inc.
Where be the dragons?
Stylized data sets
• pyridine, pyrimidine & cyclohexane libraries
• semi-homologous “series”
Nearest-neighbor profiles
• problems & advantages of subsetting
4-Ureidopiperidine Sulfonamides
• combinatorial sub-libraries, OptiSim™ design
Fingerprint visualization
• horizon NLM
[Figure: Pyr, Pym and Chex scaffold structures with substituent positions R1 to R4]
Position   All libraries   Chex & Pym   Pyr only
R1   F, Br, NO2, Et   H, Cl, CF3   none
     NMe2, Ac, COCF3   Me, iPr, SMe
     SPh, OPh, CH2Ph   Ph
R2   F, Et, CF3, COCF3   Br, NO2, NMe2   Cl, Me, SMe, Ph
     OPh, CH2Ph   Ac, SPh
R3   CF3, Ac, COCF3   F, Br, NO2   CN, CO2Me, CONH2
     Et, NMe2, Ac
     SPh, OPh, CH2Ph
R4   none   none   F, iPr, CF3, SMe
     Ac, COCF3, Ph
     SPh, OPh, CH2Ph
Cyclohexane, Pyrimidine and Pyridine Library Compositions*
*RD Clark. J Chem Inf Comput Sci 1997, 37, 1181-1188.
Nearest Neighbor Database Comparisons (wrt UNITY 2D substructural fingerprints)*
[Figure: histograms of frequency (%) vs. NN similarity]
Chex vs. Pyr: 0.311 ± 0.04
Chex vs. Pym: 0.271 ± 0.05
* RD Clark. Relative and Absolute Diversity Analysis of Combinatorial Libraries. In: Combinatorial Library Design and Evaluation, pp 337-362; AK Ghose & VN Viswanadhan, Eds.; Marcel Dekker, New York, in press.
Asymmetry of Nearest Neighbor Profiles
[Figure: histograms of frequency (%) vs. NN similarity]
Pyr5500 vs. Pyr500: 0.932 ± 0.05
Pyr500 vs. Pyr5500: 0.834 ± 0.08
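A nearest-neighbor profile of this kind can be sketched in a few lines. The example below is a minimal illustration, not the UNITY implementation: fingerprints are assumed to be sets of on-bit indices, Tanimoto similarity is assumed as the measure, and the names `tanimoto`, `nn_profile` and `toy_fp` as well as the toy data are hypothetical. It shows why the direction of comparison matters: each query is scored against its best match in the target set, so swapping query and target sets generally changes the profile.

```python
from statistics import mean, pstdev


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def nn_profile(queries, targets):
    """Similarity of each query to its nearest neighbor in the target set."""
    return [max(tanimoto(q, t) for t in targets) for q in queries]


def toy_fp(i):
    """Hypothetical 3-bit fingerprint; fingerprints with close i share bits."""
    return frozenset({i, i + 1, i + 2})


# Toy, non-overlapping sets: a sparse sample and a denser collection.
sparse = [toy_fp(i) for i in (0, 4, 8)]
dense = [toy_fp(i) for i in range(12) if i % 4 != 0]

sparse_to_dense = nn_profile(sparse, dense)
dense_to_sparse = nn_profile(dense, sparse)

# Report each profile as mean ± standard deviation, as on the slides.
for label, profile in (("sparse vs. dense", sparse_to_dense),
                       ("dense vs. sparse", dense_to_sparse)):
    print(f"{label}: {mean(profile):.3f} ± {pstdev(profile):.2f}")
```

In this toy case every sparse fingerprint has a close neighbor in the dense set, while several dense fingerprints have no close neighbor in the sparse set, so the two profiles differ.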
Nearest Neighbor Profiles Using Maximally Diverse Subsets*
[Figure: panels C and D, histograms of frequency (%) vs. NN similarity]
Pyr* vs. Pyr*: 0.544 ± 0.02
Pyr2K* vs. Pyr2K*: 0.560 ± 0.02
Pyr* vs. Pyr: 0.722 ± 0.08
Pyr2K* vs. Pyr2K: 0.729 ± 0.09
* RD Cramer, DE Patterson, RD Clark, F Soltanshahi & MS Lawless. J Chem Inf Comput Sci 1998, 38, 1010-1023.
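4-Ureidopiperidine Sulfonamide Library*
[Reaction scheme: primary amines (R1CH2NH2) and sulfonyl chlorides (R2SO2Cl) combined on a tBOC-protected piperidine scaffold to give the ureidopiperidine sulfonamide products]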
                 Primary Amines      Sulfonyl chlorides
Property         cut-off   passed    cut-off   passed
structure        --        436       --        178
mol. weight      200       361       350       163
mol. volume      190 Å³    363       255 Å³    165
cLogP            2.6       370       5.0       168
aromatic rings   1         394       2         171
combined         --        308       --        154
*RD Clark, DE Patterson, F Soltanshahi, JF Blake & JB Matthew. J Mol Graph Modelling 2000, 18, 404-411.
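The reagent filtering summarized in the table can be sketched as follows. This is an illustration only: the cut-offs are read here as inclusive upper bounds (an assumption, since the slide does not say), and the `Reagent` class, the `passes` helper and the example reagents with their property values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reagent:
    name: str
    mol_weight: float
    mol_volume: float      # cubic angstroms
    clogp: float
    aromatic_rings: int


# Cut-offs from the table, treated as inclusive upper bounds (an assumption).
AMINE_CUTOFFS = {"mol_weight": 200, "mol_volume": 190,
                 "clogp": 2.6, "aromatic_rings": 1}
SULFONYL_CHLORIDE_CUTOFFS = {"mol_weight": 350, "mol_volume": 255,
                             "clogp": 5.0, "aromatic_rings": 2}


def passes(reagent, cutoffs):
    """True if every filtered property is at or below its cut-off."""
    return all(getattr(reagent, prop) <= limit
               for prop, limit in cutoffs.items())


# Hypothetical reagents with made-up property values, for illustration only.
amines = [
    Reagent("amine_a", 150.0, 140.0, 1.2, 1),
    Reagent("amine_b", 230.0, 210.0, 3.1, 2),   # fails every cut-off
]
kept = [r.name for r in amines if passes(r, AMINE_CUTOFFS)]
print(kept)

# Reagent pairs available after filtering, from the table's "combined" row.
library_size = 308 * 154
print(library_size)
```

With the "combined" counts from the table, 308 amines and 154 sulfonyl chlorides give 308 × 154 = 47,432 possible reagent pairs.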
Ureidopiperidine Sulfonamide Sublibraries
All were constructed using an extension of “standard” OptiSim™ selection technology
• subsample size k = 5
• exclusion radius 0.10
• incremental pivot method
Sublibrary 1: Cherry picked
• 200 diverse representative products
Sublibrary 2: four blocks, 10 x 5 each
• 32 amines + 20 sulfonyl chlorides
Sublibrary 3: single 20 x 10 block
• 20 amines + 10 sulfonyl chlorides
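The sketch below illustrates a basic, cherry-picking form of OptiSim-style selection using the subsample size and exclusion radius listed above. It is an assumption-laden illustration, not the Tripos implementation: the block-building extension and the incremental pivot method are not reproduced, fingerprints are taken to be sets of on-bit indices compared by Tanimoto dissimilarity, and `optisim_select` and the toy data are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def optisim_select(fps, n_select, k=5, radius=0.10, seed=0):
    """Pick up to n_select indices into fps, OptiSim style (simplified)."""
    rng = random.Random(seed)
    pool = list(range(len(fps)))
    rng.shuffle(pool)
    selected = [pool.pop()]          # arbitrary starting compound
    recycle = []                     # unused subsample members, retried later

    def min_dissim(i):
        # Dissimilarity of candidate i to its closest already-selected compound.
        return min(1.0 - tanimoto(fps[i], fps[j]) for j in selected)

    while len(selected) < n_select:
        subsample = []
        while pool and len(subsample) < k:
            cand = pool.pop()
            if min_dissim(cand) >= radius:
                subsample.append(cand)
            # Candidates inside the exclusion radius are dropped for good.
        if not subsample:
            if not recycle:
                break                # nothing left to choose from
            pool, recycle = recycle, []
            rng.shuffle(pool)
            continue
        best = max(subsample, key=min_dissim)   # most dissimilar candidate
        selected.append(best)
        subsample.remove(best)
        recycle.extend(subsample)
    return selected


# Toy fingerprints; the last one duplicates the first (dissimilarity 0).
fps = [frozenset({i, i + 1, i + 2}) for i in range(30)]
fps.append(fps[0])
picked = optisim_select(fps, n_select=10, k=5, radius=0.10, seed=1)
print(picked)
```

In this simplified form, k = 1 reduces to random picking subject to the exclusion radius, while a large k approaches maximum-dissimilarity selection; k = 5 sits between the two.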
OptiSim Design Scheme
[Figure: OptiSim design scheme showing stepwise growth of reagent blocks A1 to A5 and B1 to B5 from candidate reagents a21 to a53 and b21 to b53]
Ureidopiperidine Sulfonamide Nearest Neighbor Profiles
[Figure: histograms of frequency (%) vs. NN similarity]
single block vs. cherry picked: 0.74 ± 0.09 (median 0.72)
cherry picked vs. single block: 0.81 ± 0.09 (median 0.80)
Self-similarity Profiles for Diverse Subsets from Sub-libraries
(20 compound subsets)
[Figure: histograms of frequency (%) vs. NN similarity]
cherry-picked: 0.52 ± 0.02 (median 0.515)
four-block: 0.55 ± 0.02 (median 0.545)
single block: 0.60 ± 0.05 (median 0.615)
Nearest Neighbor Profiles for Diverse Subsets are Symmetric
[Figure: histograms of frequency (%) vs. NN similarity]
cherry picked vs. four block: 0.61 ± 0.09 (median 0.61)
four block vs. cherry picked: 0.62 ± 0.09 (median 0.61)
cherry picked vs. single block: 0.63 ± 0.10 (median 0.58)
single block vs. cherry picked: 0.62 ± 0.11 (median 0.58)
Effect of Horizon Distance (cyclohexanes)
[Figure: PCA (Euclidean) and NLM (Tanimoto) plots, with points labeled 1 to 4]
Homolosine Projection
source: Cartography Laboratory Indiana State University
www.indstate.edu/gga/gga_cart
PCA / NLM with Horizon
[Figure: PCA and NLM-with-horizon panels, annotated with structures labeled 25, 26 and 33 to 37]
PCA / NLM with Horizon
[Figure: PCA and NLM-with-horizon panels, annotated with structures labeled 22 to 24, 27 to 32, 38 and 39]
Comparison of Sub-Libraries
[Figure: plot comparing the cherry picking, four blocks and single block sub-libraries, annotated with structures labeled 42, 45, 46, 48, 51 and 53]
Comparison of Sub-Libraries
[Figure: plot comparing the cherry picking, four blocks and single block sub-libraries, annotated with structures labeled 41 to 51 and 53 to 55]
Comparison of Sub-Libraries
[Figure: plot comparing the cherry picking, four blocks and single block sub-libraries, annotated with structures labeled 40, 42, 44 to 49, 51, 52 and 55]
Acknowledgements
NIH SBIR grant 1R43GM58919
David Patterson
• Sr. Fellow
Fred Soltanshahi
• Technologist
Trevor Heritage, VP Software R&D
1999 Tripos, Inc.
Take-home: fingerprint similarity is biologically relevant (good neighborhood behavior)