Interplay between screening data and properties (Pope)
TRANSCRIPT
Andy Pope
Platform Technology & Science, GlaxoSmithKline, Collegeville PA, USA
MipTec 2011, Basel, Sept. 20-22, 2011
The Interplay between Chemical Properties
and Screening Data
Compound properties aren’t what they used to be…
cLogP (median)
Failed candidate = 3.9
Marketed drug = 2.5
MW (median)
Failed candidate = 432
Marketed drug = 349
Properties vs Phase*
[Figure: ClogP and MW by development phase]
*Adapted from Blake JF, Medicinal Chemistry, 2005, 1, 649-655
And we have known this for a while. …
Drug discovery chemical property space - Some critical factors…
- Chemistry methods
- Chemistry “culture”
- Screening methods
- SAR data
- Hit ID libraries
- Efficiency concepts
- Property guides/rules
- Rigorous property rules
- Fragments
- Lead-like, “Beautiful”
Drug candidates
Does assay data influence discovery chemical property space occupancy?
(…or vice versa)
Large Scale analysis of High Throughput Screening Data
HTS at GSK: 330 screens of >500,000 cpds, 2005-2010
- Single concentration primary data (10 uM) re-analysed
- Compound results binned according to simple compound properties
- Meta-data (e.g. target class, screening technology) curated
Academic screening centers (MLPCN): ~100 screens with >250,000 cpds tested & deposited to PubChem BioAssay from major NIH-funded screening centers (NCGC, Scripps, Broad)
- Single concentration data re-analysed using same methods as GSK data
The GSK HTS Process
Primary Screen (10 uM, singlicate): entire collection (100%), ~2 million compounds
- Statistical separation from null effect population
- Chemical clustering if hit rate >1%
Confirmation (10 uM, duplicate): potential actives (<1%), ~20,000 compounds
- Eliminate false positives from primary
- Chemical clustering for diversity & property sampling
Dose response (11 pt, 3-fold dilution): “real” hits (<0.1%), ~2,000 compounds
HTS Hit Marking Processes
[Figure: % compounds vs. response (% control). Legend: binned raw HTS data; fit with raw mean and std. deviation; fit with robust mean & std. deviation]
Normal distribution:
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
Raw mean = 1.0, raw SD = 11.1
Robust mean = -0.3, robust SD = 5.5
Blue & black curves are normal distribution fits using mean & SD
[Figure: % compounds vs. response (% control) with a 3 x RSD cut-off separating “miss” from “hit”. Annotations: potent hits (& artefacts); weak hits, artefacts, and statistical “noise”]
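The contrast between raw and robust statistics on this slide can be illustrated with a minimal sketch. The slides do not say how the robust mean and robust SD are computed; the median and the scaled median absolute deviation (MAD) used here are an assumption, and the synthetic responses are hypothetical.

```python
import random
import statistics

MAD_TO_SD = 1.4826  # scales the MAD to an SD-equivalent for normally distributed data


def robust_stats(responses):
    """Return (robust mean, robust SD), here taken as median and scaled MAD (assumption)."""
    center = statistics.median(responses)
    mad = statistics.median([abs(r - center) for r in responses])
    return center, MAD_TO_SD * mad


def hit_cutoff(responses, n_sd=3.0):
    """Hit cut-off = robust mean + n_sd x robust SD."""
    center, rsd = robust_stats(responses)
    return center + n_sd * rsd


# Hypothetical screen: mostly inactive compounds plus a small tail of strong actives.
rng = random.Random(0)
responses = [rng.gauss(0.0, 5.0) for _ in range(10_000)]
responses += [rng.uniform(50.0, 100.0) for _ in range(100)]

raw_sd = statistics.stdev(responses)          # inflated by the active tail
robust_mean, robust_sd = robust_stats(responses)
cutoff = hit_cutoff(responses)
hits = [r for r in responses if r > cutoff]
```

In this toy data the raw SD is pulled up by the actives while the robust SD stays close to the spread of the inactive population, which is the same pattern as the raw SD of 11.1 versus robust SD of 5.5 quoted on the slide.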
Typical HTS observed data distribution vs. fit
[Figure: frequency (% cpds/bin) vs. effect (% control). Legend: binned raw HTS data; robust distribution fit; hit cut-off (mean + 3 x RSD); residual (raw – fit)]
Note: representative selection of individual screens from ~330 analyzed
Observed data distribution vs. fit – zoom
[Figure: frequency (% cpds in bin) vs. effect (% control). Legend: binned raw HTS data; robust distribution fit; hit cut-off (mean + 3 x RSD); residual (raw – fit)]
Note: representative selection of individual screens from ~330 analyzed
GSK HTS campaigns 2005-2010
[Figure: screen cut-off (mean + 3 x RSD) vs. average robust Z’ of assay during HTS production]
Looking for property trends in the GSK HTS dataset
e.g. compound total polar surface area
Aggregate results from all 330 campaigns 2005-2010 with >500K tests
[Figure: hit rate (%) vs. polar surface area (tPSA, Å2); each point is the hit rate for compounds in a specific tPSA bin]
Example: compounds with tPSA 80-85 Å2
- 26M measured responses in this bin
- 485k marked as “hit”
- Hit rate = 100*(485k/26M) = 1.86%
The total polar surface area (tPSA) is defined as the surface sum over all polar atoms
- < 60 Å2 predicts brain penetration
- > 140 Å2 predicts poor cell penetration
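The per-bin hit rate described on this slide can be sketched as follows. The record layout, the 5 Å2 bin width, and the toy records are assumptions for illustration, not the GSK implementation.

```python
from collections import defaultdict


def hit_rate_by_bin(records, bin_width=5.0):
    """records: iterable of (property_value, is_hit).

    Returns {bin lower edge: hit rate in %}, where the hit rate is
    100 * (responses marked as hit) / (all measured responses) in that bin.
    """
    tested = defaultdict(int)
    hits = defaultdict(int)
    for value, is_hit in records:
        lower = (value // bin_width) * bin_width
        tested[lower] += 1
        hits[lower] += int(is_hit)
    return {lower: 100.0 * hits[lower] / tested[lower] for lower in sorted(tested)}


# Toy records (hypothetical): tPSA value and whether the response was marked as a hit.
records = [(81.2, True), (83.9, False), (84.0, False), (80.0, False),
           (62.5, False), (64.9, True)]
rates = hit_rate_by_bin(records)

# Slide example, using the rounded counts quoted for the 80-85 Å2 bin.
slide_rate = 100.0 * 485_000 / 26_000_000
```

With the rounded counts from the slide, `slide_rate` comes out at roughly 1.87%, in line with the quoted 1.86% (the slide presumably used unrounded counts).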
Compound shapeliness and flexibility
[Figure: hit rate (%) vs. fraction of carbons that are sp3 (fCsp3)]
fCsp3 captures “shapeliness” of a compound
- Weak positive correlation with MW
- More irregular 3D shape, lower hit probability
[Figure: hit rate (%) vs. flexibility]
Flexibility = percentage of a compound’s bonds that are rotatable
- Slight decrease in HR with flexibility
- No correlation with MW or ClogP
Compound Size (MW)
[Figure: % cpds in MW bin and cumulative % cpds vs. MW; middle 80% of cpds spans MW 270 to 470]
[Figure: hit rate (%) vs. molecular weight (MW); labelled hit rates 1.50%, 2.62%, 4.0%, 1.2%]
Overall hit rate rises 1.7-fold across the middle 80% of the screening deck, i.e. a 70% rise in hit rate from MW = 270 to MW = 470
3.3-fold rise across full MW range
- Only bins containing 1M or more records are shown
HTS hit rates rise significantly with increasing compound MW
Compound Lipophilicity (ClogP)
[Figure: % cpds in ClogP bin and cumulative % cpds vs. ClogP; middle 80% of cpds spans ClogP 1 to 5]
[Figure: hit rate (%) vs. ClogP; labelled hit rates 1.14%, 3.31%, 4.5%, 1.1%]
Overall hit rate rises 2.9-fold across the middle 80% of the screening deck, i.e. from ClogP = 1 to ClogP = 5
4.1-fold rise across full ClogP range
- Only bins containing 1M or more records are shown
HTS hit rates rise sharply with increasing compound lipophilicity
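The fold-rise figures can be checked against the hit rates labelled on the MW and ClogP plots. This sketch assumes that 1.50% and 2.62% (MW) and 1.14% and 3.31% (ClogP) are the hit rates at the two ends of the middle 80% of the deck, and that the remaining labels are the extremes of the full range; the slides do not state this pairing explicitly.

```python
def fold_rise(low_rate, high_rate):
    """Ratio of the hit rate at the high end of a property range to that at the low end."""
    return high_rate / low_rate


# Assumed pairing of the labelled hit rates (%) with the range end points.
mw_middle = fold_rise(1.50, 2.62)      # MW 270 to 470
mw_full = fold_rise(1.2, 4.0)          # full MW range
clogp_middle = fold_rise(1.14, 3.31)   # ClogP 1 to 5
clogp_full = fold_rise(1.1, 4.5)       # full ClogP range

summary = {
    "MW, middle 80%": round(mw_middle, 1),
    "MW, full range": round(mw_full, 1),
    "ClogP, middle 80%": round(clogp_middle, 1),
    "ClogP, full range": round(clogp_full, 1),
}
```

Under that pairing the ratios reproduce the quoted 1.7-fold, 3.3-fold, 2.9-fold and 4.1-fold rises. A k-fold rise corresponds to a (k - 1) x 100% rise, so 1.7-fold is the same statement as the 70% rise quoted for MW.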
Promiscuity v. Molecular Properties
[Figure: % of promiscuous compounds and % rise in promiscuity vs. cLogP]
[Figure: % of promiscuous compounds and % rise in promiscuity vs. molecular weight]
• Hit Frequency Index (HFI) = % of SS (single-shot) HTS campaigns in which a compound gives activity > cut-off
• “Promiscuous” compound: HFI ≥ 10% (having seen at least 50 campaigns)
Across the middle 80% of the screening deck…
• Large compounds are 4-fold more likely to have high HFI than small ones (MW: 270 to 470)
• Lipophilic compounds are 10-fold more likely to have high HFI than polar ones (cLogP: 1 to 5)
The prevalence of promiscuous compounds rises sharply with size and lipophilicity
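The HFI definition and the promiscuity threshold can be written out as a short sketch. The function names and the example compounds are hypothetical; only the 10% threshold and the 50-campaign minimum come from the slide.

```python
def hit_frequency_index(n_hits, n_campaigns):
    """HFI = % of single-shot campaigns in which the compound exceeded the hit cut-off."""
    return 100.0 * n_hits / n_campaigns


def is_promiscuous(n_hits, n_campaigns, hfi_threshold=10.0, min_campaigns=50):
    """True/False once a compound has seen enough campaigns, otherwise None."""
    if n_campaigns < min_campaigns:
        return None  # too few campaigns to classify
    return hit_frequency_index(n_hits, n_campaigns) >= hfi_threshold


# Hypothetical compounds: (campaigns with activity > cut-off, campaigns tested).
compounds = {"cpd_A": (1, 120), "cpd_B": (15, 100), "cpd_C": (8, 30)}
flags = {name: is_promiscuous(h, n) for name, (h, n) in compounds.items()}
```

Here `cpd_B` (HFI of 15%) is flagged, `cpd_A` is not, and `cpd_C` is left unclassified because it has been tested in fewer than 50 campaigns.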
Property distributions vs. promiscuity - cLogP
[Figure: cLogP distribution vs. inhibition frequency index* (%), ranging from compounds hitting ~1 target to compounds hitting >10% of targets]
Note: compounds required to have been run in 50 HTS and yielded > 50% effect in a single screen to be included
*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50
The “Dark” Matter
[Figure: molecular weight (Da) vs. cLogP]
- Compounds which have not yielded >50% effect once in >50 screens
Translation of biases to full-curve follow-up
[Figure: % compounds tested vs. cLogP. Legend: SS testing; FC testing; FC – SS differential]
[Figure: % compounds tested vs. molecular weight. Legend: SS testing; FC testing; FC – SS differential]
Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
Reduced testing of small, polar compounds in the full-curve phase of HTS
Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results
Property biases in primary HTS hit marking are propagated forward to dose-response follow-up
Property Trends; translation to dose response
[Figure: % lift in hit rate vs. ClogP and vs. MW. Legend: standard 3SD SS hits; top 0.1% of SS responses; % of cpds with IC50 <= 10 uM]
Across the middle 80% of the deck:
From ClogP = 1 to 5:
• 3SD: 2.9X rise in hit rate
• Top 0.1%: 2.2X rise
• FC active: 1.5X rise
From MW = 270 to 470:
• 3SD: 1.8X rise in hit rate
• Top 0.1%: 1.3X rise
• FC active: 1.2X rise
Property effects contribute to hits at all effect levels, i.e. not just hits on the statistical margins
Property-dependence decreases through the HTS process
Property response of individual screens is highly variable
e.g. screens with largest response to cLogP
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP]
e.g. screens with smallest response to cLogP
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP]
Property response of individual screens is highly variable
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP, by assay technology, colored by hit rate (%)]
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP, by target class, colored by hit rate (%)]
Improving hit marking - reducing bias towards high cLogP, MW hits
Virtual partitioning of collection according to property
- e.g. sub-collections in different cLogP ranges
Change the hit calling method so that it takes properties as well as % effect into account
- e.g. calculate hit cut-offs based on BEI/LEI etc.
- “scalar” methods based on correcting the observed biases
And… improving assays and the collection based on awareness of these biases
Improving hit marking – Property Biasing
[Figure: hit rate (%) vs. MW and vs. ClogP. Legend: ordinary HTS hit marking; property-biased hit marking]
[Figure: % compounds vs. response (% control) with mean + 3 x RSD cut-off. More attractive properties: promote. Less attractive properties: demote]
Property-biased Hit Marking
[Figure: hit rate (%) vs. MW and vs. ClogP. Legend: response; property-binned stats; property consensus]
Improving hit marking – Property Binning
Bin 1: low MW, cLogP
Bin 2: medium MW, cLogP
Bin 3: high MW, cLogP
Sub-divide screening data into bins of compounds with similar properties
- apply 3 x RSD hit cut-offs to each bin
Consensus method combines approaches; routinely implemented
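Property binning as described on this slide can be sketched as below: split the screen into property bins and apply a 3 x RSD cut-off within each bin rather than across the whole screen. This is a simplified illustration, not the GSK method: it bins on MW only, the bin boundaries are hypothetical, the robust statistics are assumed to be the median and scaled MAD, and the consensus step is not shown.

```python
import statistics
from collections import defaultdict

MAD_TO_SD = 1.4826  # scales the MAD to an SD-equivalent for normal data


def robust_cutoff(values, n_sd=3.0):
    """Median + n_sd x scaled MAD (assumed form of the 'mean + 3 x RSD' cut-off)."""
    center = statistics.median(values)
    mad = statistics.median([abs(v - center) for v in values])
    return center + n_sd * MAD_TO_SD * mad


def property_bin(mw, low=300.0, high=400.0):
    """Assign bin 1 (low), 2 (medium) or 3 (high) by MW; boundaries are hypothetical."""
    if mw < low:
        return 1
    if mw < high:
        return 2
    return 3


def mark_hits_by_bin(compounds):
    """compounds: list of (compound_id, mw, response). Returns the set of hit ids."""
    by_bin = defaultdict(list)
    for cid, mw, response in compounds:
        by_bin[property_bin(mw)].append((cid, response))
    hits = set()
    for members in by_bin.values():
        cutoff = robust_cutoff([r for _, r in members])
        hits.update(cid for cid, r in members if r > cutoff)
    return hits


# Toy screen (hypothetical): small compounds give a tight response distribution,
# large compounds a broad one.
pattern = [-2, -1, 0, 1, 2] * 4
compounds = [(f"low_{i}", 250.0, float(r)) for i, r in enumerate(pattern)]
compounds += [(f"high_{i}", 450.0, 10.0 * r) for i, r in enumerate(pattern)]
compounds += [("small_active", 250.0, 12.0), ("large_marginal", 450.0, 25.0)]

global_cutoff = robust_cutoff([r for _, _, r in compounds])
global_hits = {cid for cid, _, r in compounds if r > global_cutoff}
binned_hits = mark_hits_by_bin(compounds)
```

In this toy data a single global cut-off marks many of the large compounds, including `large_marginal`, whereas the per-bin cut-offs keep only `small_active`, whose response stands out relative to compounds of similar size.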
Evolving the screening collection to smaller, more polar lead-like space
[Figure: ClogP distribution (% of total compounds in HTS). Legend: 2004; 2010; 2010 <> 2004]
[Figure: % compounds exceeding property limit vs. year, including new 2011 compounds. Series: ClogP > 5; MW > 500]
CCE Acquisition, Property Bounds
- 2004-05: Lipinski criteria (MW<500, ClogP<5)
- Most recently: MW<360, ClogP<3
- Inclusion of DPU lead-op cpds: MW<500, ClogP<5
GSK’s Compound Collection Enhancement (CCE) strategy has biased the HTS deck towards decreased size and lipophilicity with the aim of improving chemical starting points
Property trends in MLPCN Screening Data
Primary data from around 100 academic HTS campaigns obtained from PubChem BioAssay
[Figure: hit rate (%) vs. ClogP, compounds tested in HTS; labelled hit rates 1.28% and 3.80%]
[Figure: hit rate (%) vs. MW; labelled hit rates 2.14% and 2.27%; “pretty flat”]
Lipophilicity: similar to GSK HTS
Compound size: little effect
GSK screening deck (>50 HTSs, 2.01M cpds): ClogP = 0.00835*MW – 0.058, R2 = 0.18
PubChem compounds (405k): ClogP = 0.00554*MW + 0.97, R2 = 0.09
MLPCN Screening Data – Property Trends
Example individual screen responses to cLogP
[Figure: 3 x RSD hit rate (%) vs. cLogP, trellis by individual screens]
Small Beautiful Set Screening
SBS = subset of the HTS deck which spans the gap between HTS and fragments
Filtered on:
- size and lipophilicity
• 10 ≤ HAC ≤ 28 and -2 ≤ ClogP ≤ 3, bounded
- “promiscuity”: frequent-hitters are eliminated
• IFI ≤ 3% (IFI = Inhibition Frequency Index, 3SD hit cutoff)
- hit explosion opportunity
• Near Neighbor Count ≥ 20 (in GSK registry)
- “shapeliness”
• fCsp3 ≥ 0.3 (i.e. ≥ 30% of carbon atoms must be sp3)
- acquisition sub-structural filters
- “greedy” diversity selection (no compounds >0.9 similar)
SBS2 = ~75,000 compounds, drawn from the HTS collection (2M)
Tested at higher concentration (e.g. 100-200 uM)
[Figure: SBS2 vs. HTS collection in ClogP and MW space]
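The numeric SBS criteria listed on this slide can be expressed as a simple filter. The `Compound` record and the example compounds are hypothetical, and the sub-structural filters and the greedy diversity selection are not shown, since the slide gives no detail on them.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    hac: int             # heavy atom count
    clogp: float
    ifi: float           # inhibition frequency index, in %
    near_neighbors: int  # near neighbor count in the registry
    fcsp3: float         # fraction of carbons that are sp3


def passes_sbs_property_filters(c):
    """Apply the numeric SBS thresholds quoted on the slide."""
    return (
        10 <= c.hac <= 28
        and -2 <= c.clogp <= 3
        and c.ifi <= 3.0
        and c.near_neighbors >= 20
        and c.fcsp3 >= 0.3
    )


# Hypothetical candidates.
candidates = [
    Compound("small_polar", hac=18, clogp=1.2, ifi=0.5, near_neighbors=45, fcsp3=0.40),
    Compound("too_lipophilic", hac=24, clogp=4.1, ifi=1.0, near_neighbors=60, fcsp3=0.35),
    Compound("frequent_hitter", hac=20, clogp=2.0, ifi=12.0, near_neighbors=30, fcsp3=0.50),
    Compound("too_flat", hac=22, clogp=2.5, ifi=0.0, near_neighbors=25, fcsp3=0.10),
]
selected = [c.name for c in candidates if passes_sbs_property_filters(c)]
```

Only `small_polar` passes all five thresholds in this example; each of the others fails exactly one criterion.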
Conclusions
Standard HTS processes favor the selection of larger, more lipophilic compounds
There are no clear trends between this behavior and assay technology or target class
Methods have been developed which (to some extent) compensate for property biases to ensure that attractive lead-like molecules are selected
- Overall hit rate in relation to downstream triage capacity is also critical
- Aspire to a hit rate as close to the “authentic pharmacology” rate as possible
Changing the trajectory of discovery chemical space requires an interplay between the composition of chemical libraries, assay practice, hit analysis and downstream Hit to Lead and Lead to Candidate chemistry practice
Acknowledgements
Pat Brady, Darren Green, Stephen Pickett, Sunny Hung, Subhas Chakravorty, Nicola Richmond, Jesus Herranz, Gonzalo Colmeranjo-Sanchez
…and numerous others who contributed to the 300+ HTS campaigns run by GSK 2005-2010…
Tony Jurewicz, Glenn Hofmann, Stan Martens, Jeff Gross, Zining Wu, Mehu Patel, Emilio Diez, Julio Martin-Plaza
James Chan, Snehal Bhatt, Amy Quinn, Geoff Quinique, Bob Hertzberg
Screening & Compound Profiling
Backups
[Figure: hit rate as % of HR at cLogP = 3.5 vs. cLogP, by year of screen, colored by hit rate (%)]
Promiscuity v. Molecular Properties - Molecular weight
[Figure: molecular weight (Da) distribution vs. inhibition frequency index* (%), ranging from compounds hitting ~1 target to compounds hitting >10% of targets]
Note: compounds required to have been run in 50 HTS and yielded > 50% effect in a single screen to be included
*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50
GSK HTS campaigns 2005-2010
[Figure: number of screens vs. hit cut-off (% effect @ 10 uM), i.e. mean + 3 * RSD of sample data (% control)]
[Figure: number of screens vs. hit rate (% of compounds > cut-off), i.e. % compounds with effect > mean + 3 * RSD]
Validation and robustness methods cannot detect property biases
[Figure: cLogP and MW distributions]
Compound sets used to test robustness of assays and validate the screening process reflect current compound acquisition practice, not the collection as tested
Dose Response Data – Property Trends
Is the observed size & lipophilicity bias in HTS single-shot testing an artifact of false positives, e.g. experimental “noise”?
[Figure: % of tests yielding pXC50 ≥ 5 and % rise in active rate vs. cLogP]
[Figure: % of tests yielding pXC50 ≥ 5 and % rise in active rate vs. molecular weight]
No: size and lipophilicity dependence is still observed in the rate of identifying compounds with 10 uM activity or better
Molecular Property Correlations in GSKscreen
Across 2.09M cpds in GSKscreen
Table below shows the correlation coefficients (R2) between particular molecular properties and MW/ClogP, along with whether the correlation is positive or negative (i.e. the sign of the slope in a linear regression). This data is computed using the 2.09M compounds comprising GSKscreen.

Property       R2, ± vs MW    R2, ± vs ClogP
MW             1.00, +        0.21, +
ClogP          0.21, +        1.00, +
HAC            0.92, +        0.19, +
fCsp3          0.15, +        0.00
RotBonds       0.36, +        0.04, +
tPSA           0.16, +        0.08, -
Chiral         0.02, +        0.00
HetAtmRatio    0.02, -        0.34, -
Complexity     0.31, +        0.02, +
Flexibility    0.02, +        0.00
AromRings      0.22, +        0.16, +
HBA            0.11, +        0.10, -
HBD            0.01, +        0.02, -
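Each entry in this table pairs an R2 value with the sign of the regression slope. A minimal sketch of how such a pair could be computed for two property vectors is shown below, using ordinary least squares; the property values are made up for illustration and are not GSKscreen data.

```python
def r2_and_sign(x, y):
    """Return (R2, sign of slope) for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    sign = "+" if slope > 0 else "-" if slope < 0 else ""
    return r2, sign


# Hypothetical property values for five compounds.
mw = [250.0, 300.0, 350.0, 400.0, 450.0]
clogp = [1.0, 2.5, 2.0, 3.5, 3.0]
tpsa = [95.0, 70.0, 85.0, 60.0, 65.0]

r2_mw_clogp, sign_mw_clogp = r2_and_sign(mw, clogp)
r2_clogp_tpsa, sign_clogp_tpsa = r2_and_sign(clogp, tpsa)
```

For a single-predictor linear regression, R2 is the square of the Pearson correlation, so it carries no sign by itself; that is why the table reports the slope sign alongside it.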