interplay between screenng_data_and _properties_pope

Andy Pope

Platform Technology & Science, GlaxoSmithKline, Collegeville PA, USA

MipTec 2011, Basel, Sept. 20-22, 2011

The Interplay between Chemical Properties and Screening Data

Compound properties aren’t what they used to be…

cLogP (median)

Failed candidate = 3.9

Marketed drug = 2.5

MW (median)

Failed candidate = 432

Marketed drug = 349

[Figure: Properties vs Phase*; panels for ClogP and MW]

*Adapted from Blake JF, Medicinal Chemistry, 2005, 1, 649-655

And we have known this for a while. …

Drug discovery chemical property space - Some critical factors . …

- Chemistry methods

- Chemistry “culture”

- Screening methods

- SAR data

- Hit ID libraries

- Efficiency concepts

- Property guides/rules

- Rigorous property rules

- Fragments

- Lead-like, “Beautiful”

Drug candidates

Does assay data influence discovery chemical property space occupancy?

(…or vice versa)

Large Scale analysis of High Throughput Screening Data

HTS at GSK: 330 screens of >500,000 cpds, 2005-2010

- Single concentration primary data (10 uM) re-analysed
- Compound results binned according to simple compound properties
- Meta-data (e.g. target class, screening technology) curated

Academic screening centers (MLPCN): ~100 screens with >250,000 cpds tested & deposited to PubChem BioAssay from major NIH funded screening centers (NCGC, Scripps, Broad)

- Single concentration data re-analysed using same methods as GSK data

The GSK HTS Process

[Figure: screening funnel]

- Primary Screen (10 uM, singlicate): entire collection (100%), ~2 million
- Confirmation (10 uM, duplicate): potential actives (<1%), ~20,000
- Dose response (11 pt, 3-fold dilution): “real” hits (<0.1%), ~2,000

Annotations on the funnel:

- Statistical separation from null effect population
- Chemical clustering if hit rate >1%
- Chemical clustering for diversity & property sampling
- Eliminate false positives from primary

HTS Hit Marking Processes

[Figure: % Compounds vs RESPONSE (% control)]

Legend: binned raw HTS data; fit with raw mean and std. deviation; fit with robust mean & std. deviation. Blue & black curves are normal distribution fits using mean & SD.

Normal Distribution: $P(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

- Raw mean = 1.0, Raw SD = 11.1
- Robust mean = -0.3, Robust SD = 5.5

Labels on the plot:

- “hit”
- potent hits (& artefacts)
- weak hits, artefacts, and statistical “noise”
- 3 x RSD cut-off
- “miss”
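The 3 x RSD hit marking described on this slide can be sketched as follows. This is an illustration only: the slides do not state how the robust mean and SD are computed, so the sketch assumes the median and 1.4826 x MAD (a common robust estimator), and the response values are invented.

```python
import statistics


def robust_stats(responses):
    """Return (robust center, robust SD).

    Assumption: center = median, RSD = 1.4826 * median absolute deviation.
    """
    center = statistics.median(responses)
    mad = statistics.median([abs(r - center) for r in responses])
    return center, 1.4826 * mad


def mark_hits(responses, n_sd=3.0):
    """Mark responses above center + n_sd * RSD as hits."""
    center, rsd = robust_stats(responses)
    cutoff = center + n_sd * rsd
    return cutoff, [r > cutoff for r in responses]


# Hypothetical single-concentration responses (% control): mostly null
# effect, plus two strong actives that inflate the raw SD.
responses = [-6.0, -3.0, -1.0, 0.0, 1.0, 2.0, 4.0, 5.0, 60.0, 95.0]

robust_cutoff, hits = mark_hits(responses)
raw_cutoff = statistics.mean(responses) + 3.0 * statistics.stdev(responses)

print(f"robust cut-off: {robust_cutoff:.1f}, hits marked: {sum(hits)}")
print(f"raw cut-off:    {raw_cutoff:.1f}")
```

With these invented values the raw cut-off lies above every response, while the robust cut-off still separates the two strong actives from the null population.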

Typical HTS observed data distribution vs. fit

[Figure: Frequency (% cpds/bin) vs Effect (% control)]

Legend: binned raw HTS data; robust distribution fit; hit cut-off (mean + 3 x RSD); residual (raw – fit)

Note: representative selection of individual screens from ~330 analyzed

Observed data distribution vs. fit – zoom

[Figure: Frequency (% cpds in bin) vs Effect (% control)]

Legend: binned raw HTS data; robust distribution fit; hit cut-off (mean + 3 x RSD); residual (raw – fit)

Note: representative selection of individual screens from ~330 analyzed

GSK HTS campaigns 2005-2010

[Figure: Screen cut-off (mean + 3 x RSD) vs average robust Z’ of assay during HTS production]

Looking for property trends in the GSK HTS dataset

Aggregate results from all 330 campaigns 2005-2010 with >500K tests, e.g. compound total polar surface area:

[Figure: Hit Rate (%) vs Polar Surface Area (tPSA, Å²); each point is the hit rate for compounds in a specific tPSA bin]

Example: compounds with tPSA 80-85 Å²

- 26M measured responses in this bin
- 485k marked as “hit”
- Hit rate = 100*(485k/26M) = 1.86%

The total polar surface area (tPSA) is defined as the surface sum over all polar atoms

- < 60 Å² predicts brain penetration
- > 140 Å² predicts poor cell penetration
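The per-bin hit rate used throughout these slides can be sketched as below. This is a hedged illustration: the record layout, the function name `hit_rate_by_bin` and the 5 Å² bin width are assumptions for the example, and the records are invented.

```python
from collections import defaultdict


def hit_rate_by_bin(records, bin_width=5.0):
    """Aggregate (property_value, is_hit) records into hit rate (%) per bin.

    Returns {bin_start: hit rate in %}, with bins [start, start + bin_width).
    """
    tested = defaultdict(int)
    hits = defaultdict(int)
    for value, is_hit in records:
        bin_start = (value // bin_width) * bin_width
        tested[bin_start] += 1
        hits[bin_start] += int(is_hit)
    return {b: 100.0 * hits[b] / tested[b] for b in sorted(tested)}


# Hypothetical (tPSA, hit?) records pooled across screens.
records = [
    (81.2, True), (83.9, False), (84.0, False), (80.0, False),
    (92.5, True), (93.0, False),
]
rates = hit_rate_by_bin(records)
print(rates)

# The slide's tPSA 80-85 example, using the rounded counts it quotes.
slide_rate = 100 * 485_000 / 26_000_000
print(f"{slide_rate:.2f}%")  # the slide reports 1.86%
```

Each measured response counts once, so a compound tested in many screens contributes many records to its bin.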

Compound shapeliness and flexibility

[Figure: Hit Rate (%) vs fraction of carbons that are sp3 (fCsp3)]

fCsp3 captures “shapeliness” of a compound

- Weak positive correlation with MW
- More irregular 3D shape, lower hit probability

[Figure: Hit Rate (%) vs Flexibility]

Flexibility = percentage of a compound’s bonds that are rotatable

- Slight decrease in HR with Flexibility
- No correlation with MW or ClogP

Compound Size (MW)

[Figure: % cpds in MW bin and cumulative % cpds vs MW; middle 80% of cpds: MW 270 to 470]

[Figure: Hit Rate (%) vs Molecular Weight (MW); labelled hit rates: 1.50%, 2.62%, 4.0%, 1.2%]

- Overall hit rate rises 1.7-fold across the middle 80% of the screening deck, i.e. 70% rise in hit rate from MW = 270 to MW = 470
- 3.3-fold rise across full MW range
- Only bins containing 1M or more records are shown

HTS hit rates rise significantly with increasing compound MW

Compound Lipophilicity (ClogP)

[Figure: % cpds in ClogP bin and cumulative % cpds vs ClogP; middle 80% of cpds: ClogP 1 to 5]

[Figure: Hit Rate (%) vs ClogP; labelled hit rates: 1.14%, 3.31%, 4.5%, 1.1%]

- Overall hit rate rises 2.9-fold across the middle 80% of the screening deck, i.e. from ClogP = 1 to 5
- 4.1-fold rise across full ClogP range
- Only bins containing 1M or more records are shown

HTS hit rates rise sharply with increasing compound lipophilicity
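The fold-rise figures quoted for MW and ClogP can be checked with simple arithmetic. The pairing of each labelled hit rate with a range endpoint is an assumption made here (the plots themselves are not recoverable); it is chosen because it reproduces the quoted fold changes.

```python
def fold_rise(low_rate, high_rate):
    """Ratio of the hit rate at the high end of a range to the low end."""
    return high_rate / low_rate


# Assumed pairing of plot labels (hit rates in %) with range endpoints.
mw_middle = fold_rise(1.50, 2.62)      # MW 270 to 470 (middle 80% of deck)
mw_full = fold_rise(1.2, 4.0)          # full MW range
clogp_middle = fold_rise(1.14, 3.31)   # ClogP 1 to 5 (middle 80% of deck)
clogp_full = fold_rise(1.1, 4.5)       # full ClogP range

for name, value in [
    ("MW, middle 80%", mw_middle),
    ("MW, full range", mw_full),
    ("ClogP, middle 80%", clogp_middle),
    ("ClogP, full range", clogp_full),
]:
    print(f"{name}: {value:.1f}-fold ({(value - 1) * 100:.0f}% rise)")
```

Under this pairing the ratios round to the 1.7-, 3.3-, 2.9- and 4.1-fold rises stated on the slides.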

Promiscuity v. Molecular Properties

[Figure: % of Promiscuous Compounds and % Rise in Promiscuity vs cLogP]

[Figure: % of Promiscuous Compounds and % Rise in Promiscuity vs Molecular Weight]

Across the middle 80% of the screening deck …

• Large compounds are 4-fold more likely to have high HFI than small ones (MW: 270 to 470)

• Lipophilic compounds are 10-fold more likely to have high HFI than polar ones (cLogP: 1 to 5)

The prevalence of promiscuous compounds rises sharply with size and lipophilicity

• Hit Frequency Index (HFI) = % of SS HTS campaigns in which a compound gives activity > cut-off

• “Promiscuous” compound: HFI ≥ 10% (having seen at least 50 campaigns)
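The HFI definition and the promiscuity threshold above translate directly into code. A minimal sketch, with hypothetical function names and invented counts; returning `None` for compounds seen in fewer than 50 campaigns is a design choice of this example, not something the slides specify.

```python
def hit_frequency_index(n_hits, n_campaigns):
    """HFI = % of campaigns in which the compound scored above the cut-off."""
    return 100.0 * n_hits / n_campaigns


def is_promiscuous(n_hits, n_campaigns, min_campaigns=50, hfi_threshold=10.0):
    """Classify a compound as promiscuous (HFI >= 10%).

    Returns None when the compound has been seen in too few campaigns
    to be classified.
    """
    if n_campaigns < min_campaigns:
        return None
    return hit_frequency_index(n_hits, n_campaigns) >= hfi_threshold


print(is_promiscuous(6, 60))   # HFI = 10%, at the threshold
print(is_promiscuous(2, 80))   # HFI = 2.5%
print(is_promiscuous(5, 20))   # too few campaigns to classify
```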

Property distributions vs. promiscuity - cLogP

[Figure: cLogP distribution vs Inhibition Frequency Index* (%); groups range from compounds hitting ~1 target to compounds hitting >10% of targets]

Note: compounds required to have been run in 50 HTS and to have yielded > 50% effect in at least one screen to be included

*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50

The “Dark” Matter

[Figure: Molecular Weight (Da) and cLogP distributions]

– Compounds which have not yielded >50% effect once in >50 screens

Translation of biases to full-curve follow-up

[Figure: % Compounds Tested vs cLogP; series: SS testing, FC testing, FC – SS differential]

[Figure: % Compounds Tested vs Molecular Weight; same series]

- Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
- Reduced testing of small, polar compounds in the full-curve phase of HTS

Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results

Property biases in primary HTS hit marking are propagated forward to dose-response follow-up

Property Trends; translation to dose response

[Figure: % Lift in Hit Rate vs ClogP; series: Standard 3SD SS Hits, Top 0.1% of SS Responses, % of cpds with IC50 <= 10 uM]

[Figure: % Lift in Hit Rate vs MW; same series]

*Across the middle 80% of the deck:

From *ClogP = 1 to 5:

• 3SD: 2.9X rise in Hit Rate

• Top 0.1%: 2.2X rise

• FC Active: 1.5X rise

From *MW = 270 to 470:

• 3SD: 1.8X rise in Hit Rate

• Top 0.1%: 1.3X rise

• FC Active: 1.2X rise

Property effects contribute to hits at all effect levels, i.e. not just hits on the statistical margins

Property-dependence decreases through the HTS process

Property response of individual screens is highly variable

e.g. Screens with largest response to cLogP

[Figure: Hit rate as % of HR at cLogP=3.5 vs cLogP]

e.g. Screens with smallest response to cLogP

[Figure: Hit rate as % of HR at cLogP=3.5 vs cLogP]

Property response of individual screens is highly variable

[Figure: Hit rate as % of HR at cLogP=3.5 vs cLogP, by Assay Technology; colored by hit rate (%)]

[Figure: Hit rate as % of HR at cLogP=3.5 vs cLogP, by Target Class]


Improving hit marking

- reducing bias towards high cLogP, MW hits

Virtual partitioning of collection according to property

- e.g. sub-collections in different cLogP ranges

Change the hit calling method so that it takes properties as well as % effect into account

- e.g. calculate hit cut-offs based on BEI/LEI etc.
- “scalar” methods based on correcting the observed biases

And... improving assays and the collection based on awareness of these biases

Improving hit marking – Property Biasing

[Figure: Hit Rate (%) vs MW; series: Ordinary HTS Hit Marking, Property-biased Hit Marking]

[Figure: Hit Rate (%) vs ClogP; series: Ordinary HTS Hit Marking, Property-biased Hit Marking]

[Figure: % Compounds response distribution with Mean + 3 x RSD cut-off]

- More attractive properties: promote
- Less attractive properties: demote


Property-biased Hit Marking

[Figure: Hit Rate (%) vs MW; series: Response, Property-Binned stats, Property Consensus]

[Figure: Hit Rate (%) vs ClogP; series: Response, Property-Binned stats, Property Consensus]

Improving hit marking – Property Binning

Sub-divide screening data into bins of compounds with similar properties, then apply 3 x RSD hit cut-offs to each bin

- Bin 1: low MW, cLogP
- Bin 2: medium MW, cLogP
- Bin 3: high MW, cLogP

Consensus method combines approaches – routinely implemented
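Property binning, as described above, can be sketched as follows. This is an illustration under stated assumptions, not the GSK implementation: the robust SD is taken as 1.4826 x MAD, binning is on a single property (ClogP) with a hypothetical bin edge at 3, and the compound data are invented. The consensus method is not shown because the slides do not define it.

```python
import statistics
from bisect import bisect_right


def robust_cutoff(responses, n_sd=3.0):
    """Median + n_sd * RSD, with RSD assumed to be 1.4826 * MAD."""
    center = statistics.median(responses)
    mad = statistics.median([abs(r - center) for r in responses])
    return center + n_sd * 1.4826 * mad


def property_binned_hits(compounds, edges):
    """Apply a separate 3 x RSD cut-off within each property bin.

    compounds: list of (compound_id, property_value, response).
    edges: sorted bin edges; an empty list gives one global bin.
    """
    bins = {}
    for cid, prop, resp in compounds:
        bins.setdefault(bisect_right(edges, prop), []).append((cid, resp))
    hits = set()
    for members in bins.values():
        cutoff = robust_cutoff([resp for _, resp in members])
        hits.update(cid for cid, resp in members if resp > cutoff)
    return hits


# Invented data: polar compounds (ClogP 1.0) with a low response background,
# lipophilic compounds (ClogP 4.5) with a higher one.
low = [-2, -1, 0, 1, 2, 9]
high = [6, 8, 10, 12, 14, 40]
compounds = [(f"L{i}", 1.0, r) for i, r in enumerate(low, 1)]
compounds += [(f"H{i}", 4.5, r) for i, r in enumerate(high, 1)]

global_hits = property_binned_hits(compounds, edges=[])
binned_hits = property_binned_hits(compounds, edges=[3.0])
print(sorted(global_hits), sorted(binned_hits))
```

In this toy data the global cut-off marks only the lipophilic compound, while per-bin cut-offs also recover the polar compound that stands out from its own background.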

Evolving the screening collection to smaller, more polar lead-like space

[Figure: ClogP distribution (% of total compounds in HTS); series: 2004, 2010, 2010 <> 2004]

[Figure: % Compounds Exceeding Property Limit vs Year (to “New 2011”); series: ClogP > 5, MW > 500]

CCE Acquisition, Property Bounds

- 2004-05: Lipinski criteria (MW<500, ClogP<5)
- Most recently: MW<360, ClogP<3
- Inclusion of DPU lead-op cpds: MW<500, ClogP<5

GSK’s Compound Collection Enhancement (CCE) strategy has biased the HTS deck towards decreased size and lipophilicity with the aim of improving chemical starting points

Compounds tested in HTS

[Figure: Hit Rate (%) vs ClogP; labelled hit rates: 1.28%, 3.80%]

[Figure: Hit Rate (%) vs MW; labelled hit rates: 2.14%, 2.27%; annotation: “Pretty flat”]

Property trends in MLPCN Screening Data

Primary data from around 100 Academic HTS campaigns obtained from PubChem BioAssay

- Lipophilicity – similar to GSK HTS
- Compound size – little effect

GSK screening deck (>50 HTSs, 2.01M cpds): ClogP = 0.00835*MW – 0.058, R² = 0.18

PubChem Compounds (405k): ClogP = 0.00554*MW + 0.97, R² = 0.09

MLPCN Screening Data – Property Trends

Example individual screen responses to cLogP

[Figure: 3 x RSD hit rate (%) vs cLogP; trellis by individual screens]

Small Beautiful Set Screening

Filtered on:

- size and lipophilicity
  • 10 ≤ HAC ≤ 28 and -2 ≤ ClogP ≤ 3, bounded
- “promiscuity”: frequent-hitters are eliminated
  • IFI ≤ 3% (IFI = Inhibition Frequency Index, 3SD hit cutoff)
- hit explosion opportunity
  • Near Neighbor Count ≥ 20 (in GSK registry)
- “shapeliness”
  • fCsp3 ≥ 0.3 (i.e. ≥ 30% of carbon atoms must be sp3)
- acquisition sub-structural filters
- “greedy” diversity selection (no compounds >0.9 similar)
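The numeric filters listed above can be expressed as a single predicate. A minimal sketch: the `Compound` record and its field names are hypothetical, and the sub-structural filters and the greedy diversity selection are deliberately left out because the slides give no detail on them.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical record holding the properties the SBS filters use."""
    hac: int             # heavy atom count
    clogp: float
    ifi: float           # Inhibition Frequency Index, in %
    near_neighbors: int  # near neighbor count in the registry
    fcsp3: float         # fraction of carbons that are sp3


def passes_sbs_property_filters(c):
    """Apply only the numeric thresholds quoted on the slide."""
    return (
        10 <= c.hac <= 28
        and -2 <= c.clogp <= 3
        and c.ifi <= 3.0
        and c.near_neighbors >= 20
        and c.fcsp3 >= 0.3
    )


# Invented examples.
small_polar = Compound(hac=22, clogp=1.8, ifi=1.0, near_neighbors=45, fcsp3=0.4)
frequent_hitter = Compound(hac=22, clogp=1.8, ifi=12.0, near_neighbors=45, fcsp3=0.4)
too_lipophilic = Compound(hac=27, clogp=4.6, ifi=0.5, near_neighbors=30, fcsp3=0.5)

for c in (small_polar, frequent_hitter, too_lipophilic):
    print(passes_sbs_property_filters(c))
```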

SBS = Subset of the HTS deck which spans the gap between HTS and fragments

SBS2 = ~75,000 compounds

HTS collection (2M)

Tested at higher concentration (e.g. 100-200 uM)

[Figure: axes ClogP and MW]

Conclusions

Standard HTS processes favor the selection of larger, more lipophilic compounds

There are no clear trends between this behavior and assay technology or target class

Methods have been developed which (to some extent) compensate for property biases to ensure that attractive lead-like molecules are selected

- Overall hit rate in relation to downstream triage capacity is also critical
- Aspire to a hit rate as close to the “authentic pharmacology” rate as possible

Changing the trajectory of discovery chemical space requires an interplay between the composition of chemical libraries, assay practice, hit analysis and downstream Hit to Lead and Lead to Candidate chemistry practice

Acknowledgements

Pat Brady, Darren Green, Stephen Pickett, Sunny Hung, Subhas Chakravorty, Nicola Richmond, Jesus Herranz, Gonzalo Colmeranjo-Sanchez

…and numerous others who contributed to the 300+ HTS campaigns run by GSK 2005-2010…

Tony Jurewicz, Glenn Hofmann, Stan Martens, Jeff Gross, Zining Wu, Mehu Patel, Emilio Diez, Julio Martin-Plaza

James Chan, Snehal Bhatt, Amy Quinn, Geoff Quinique, Bob Hertzberg

Screening & Compound Profiling

Backups

[Figure: Hit rate as % of HR at cLogP=3.5 vs cLogP, by Year of Screen]

Promiscuity v. Molecular Properties – Molecular weight

[Figure: Molecular Weight (Da) distribution vs Inhibition Frequency Index* (%); groups range from compounds hitting ~1 target to compounds hitting >10% of targets]

Note: compounds required to have been run in 50 HTS and to have yielded > 50% effect in at least one screen to be included

*Inhibition frequency index (IFI) = % of screens where cpd yielded >50% inhibition, where total screens run ≥ 50

GSK HTS campaigns 2005-2010

[Figure: Hit cut-off (% effect @ 10 uM); Number of Screens vs Mean + 3 * RSD of sample data (% control)]

[Figure: Hit rate (% of compounds) > cut-off; Number of Screens vs % compounds with effect > mean + 3 * RSD]

Validation and robustness methods cannot detect property biases

[Figure: cLogP and MW distributions]

Compound sets used to test robustness of assays and validate screening process reflect current compound acquisition practice, not the collection as tested

Dose Response Data – Property Trends

Is the observed size & lipophilicity bias in HTS single-shot testing an artifact of false positives, e.g. experimental “noise”?

No, size and lipophilicity dependence is still observed in the rate of identifying compounds at 10 uM activity or better

[Figure: % of Tests Yielding pXC50 ≥ 5 and % Rise in Active Rate vs cLogP]

[Figure: % of Tests Yielding pXC50 ≥ 5 and % Rise in Active Rate vs Molecular Weight]

Molecular Property Correlations in GSKscreen

Across 2.09M cpds in GSKscreen

The table below shows the correlation coefficients (R²) between particular molecular properties and MW/ClogP, along with whether the correlation is positive or negative (i.e. the sign of the slope in a linear regression). These data are computed using the 2.09M compounds comprising GSKscreen.

Property       R², ± vs MW    R², ± vs ClogP
MW             1, +           0.21, +
ClogP          0.21, +        1.0, +
HAC            0.92, +        0.19, +
fCsp3          0.15, +        0.00
RotBonds       0.36, +        0.04, +
tPSA           0.16, +        0.08, -
Chiral         0.02, +        0.00
HetAtmRatio    0.02, -        0.34, -
Complexity     0.31, +        0.02, +
Flexibility    0.02, +        0.00
AromRings      0.22, +        0.16, +
HBA            0.11, +        0.10, -
HBD            0.01, +        0.02, -
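Each table entry pairs an R² with the sign of the regression slope. A small sketch of how such a pair could be computed for two property vectors; the function name and the toy data are illustrative, and this is ordinary least squares on one predictor, which may differ in detail from the analysis behind the table.

```python
def r2_and_sign(xs, ys):
    """Return (R^2, sign of slope) for a simple linear regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    r2 = (sxy * sxy) / (sxx * syy)
    sign = "+" if sxy > 0 else "-" if sxy < 0 else ""
    return r2, sign


# Toy data: a perfectly increasing and a perfectly decreasing relationship.
xs = [1.0, 2.0, 3.0, 4.0]
rising = r2_and_sign(xs, [2.0, 4.0, 6.0, 8.0])
falling = r2_and_sign(xs, [8.0, 6.0, 4.0, 2.0])
print(rising, falling)
```

For a single predictor, R² is the square of the Pearson correlation, so the sign has to be reported separately, as the table does.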
