Joining the Dots: Integrating High Throughput Small Molecule and RNAi Screens
Rajarshi Guha NIH Chemical Genomics Center
January 24, 2010 CCMB Seminar Series
Background
• Primarily cheminformatics
– Data mining, algorithm development, software
– QSAR, diversity analysis, virtual screening, fragments, polypharmacology, networks
– Work on a variety of Open Source projects
• Recently started moving into bioinformatics
– Supporting RNAi screens
• Integrate small molecule information & biosystems
– systems chemical biology
NIH Chemical Genomics Center
Biology Chemistry
Informatics ACOM
NCGC
Assay development and optimization
Compound Optimization
Automation, Compound management
SAR analysis, method & tool development
Small Molecules
Genome wide RNAi
Outline
• Small molecule screening at NCGC
• The NCGC RNAi infrastructure
• Making connections
• RNAi challenges
Small Molecules
Target Identification
Lead Discovery
Lead Optimization
Clinical Development
Hunting for Leads
HTS
• Assay Optimization: sensitivity, scaling
• Primary Screening: fluorescence, high content
• Cherry Picking: select subset to follow up, diversity
• Confirmation: counter screen, explore SAR
The qHTS Paradigm
• Traditional single point screens can miss useful hits
• qHTS involves concentration response assays on a high‐throughput scale
• The CRC allows us to categorize hits in a more fine‐grained manner
Inglese, J et al, Proc. Natl. Acad. Sci., 2006, 103, 11473‐11478
Conc. Response Curves
• Heuristic assessment of the significance of a concentration response curve
• We aggregate certain curve classes into “active”, “inconclusive” and “inactive” categories
• Inconclusive is a “catch all” category (i.e., if it is not clearly ‘active’ or ‘inactive’)
[Figure: example concentration response curves labeled Inactive, Active and Inconclusive]
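To make the three-way call concrete, here is a minimal sketch. It assumes a four-parameter Hill model for the curve and uses made-up cut-offs (`efficacy_cut`, `noise_cut`); neither is taken from the slides, and the real NCGC curve classes are more detailed than this.

```python
def hill(log_conc, bottom, top, log_ac50, slope):
    """Four-parameter Hill model: response at a given log10 concentration."""
    return bottom + (top - bottom) / (1.0 + 10 ** ((log_ac50 - log_conc) * slope))


def classify(responses, efficacy_cut=50.0, noise_cut=20.0):
    """Crude three-way call based on the largest absolute response.

    The cut-offs are hypothetical; they are not the NCGC curve classes.
    """
    max_effect = max(abs(r) for r in responses)
    if max_effect >= efficacy_cut:
        return "active"
    if max_effect <= noise_cut:
        return "inactive"
    return "inconclusive"


# Nine log10 concentrations from -9 to -5 in half-log steps.
log_concs = [-9 + 0.5 * i for i in range(9)]

# Three synthetic inhibition curves that differ only in efficacy.
strong = [hill(c, 0.0, -80.0, -7.0, 1.0) for c in log_concs]
weak = [hill(c, 0.0, -35.0, -7.0, 1.0) for c in log_concs]
flat = [hill(c, 0.0, -5.0, -7.0, 1.0) for c in log_concs]

for name, curve in [("strong", strong), ("weak", weak), ("flat", flat)]:
    print(name, classify(curve))
```

A real pipeline would fit the Hill parameters to measured data and also weigh fit quality and the number of supporting points, which is what makes the assessment heuristic.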
Annotations
• NCGC employs a variety of screening libraries
– MLSMR (~ 300K)
– LOPAC (~ 1300)
– Prestwick, Sytravon, …
– Beyond structures and vendor IDs, not a whole lot of annotation
– This is a required step for integration with RNAi
– Obviously not possible for large diverse libraries
• Use target prediction models?
RNAi
Trans‐NIH RNAi Initiative ‐ Mission
• Gene function
• Pathway analysis
• Target ID
• Compound MoA
• Drug antagonist/agonist
To establish a state of the art RNAi screening facility to perform genome-wide RNAi screens with investigators in the intramural NIH community.
Current Status
• Using Qiagen libraries (Kinome & HDG)
– Performing comparisons with other vendors
• Pilot phase, run 38 screens so far, ranging from 3 plates to 100 plates
• All screens are currently reporter based
• Will start up phenotypic screens this summer, with new robotics
[Figure: Z (y‐axis, 0.2 to 0.8) against Plate Index (x‐axis, 0 to 100), one panel per screen: cpt-hdg-20nm, cpt-hdg-5nm, cpt-hdg-followup, cpt-hdg-redo-20nm, cpt-hdg-redo-5nm, cpt-hdg-redo-vo, cpt-hdg-vo, cpt-mirna-20nm, cpt-mirna-5nm, cpt-mirna-vo, indeno-776-10, indeno-776-20, indeno-998-40, indeno-998-80, indeno-vo]
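The slide only labels the plotted quantity "Z". If it is the Z'‐factor commonly used for plate quality (an assumption), a per‐plate value could be computed from the control wells roughly as follows. The control readings below are invented for illustration.

```python
from statistics import mean, stdev


def z_prime(pos, neg):
    """Z'-factor: 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(mean(pos) - mean(neg))
    if separation == 0:
        raise ValueError("positive and negative controls have identical means")
    return 1.0 - 3.0 * (stdev(pos) + stdev(neg)) / separation


# Hypothetical control wells from a single plate.
positive_controls = [10.0, 12.0, 11.0, 9.0]
negative_controls = [100.0, 98.0, 102.0, 100.0]

print(round(z_prime(positive_controls, negative_controls), 3))
```

Values close to 1 indicate well separated controls; plotting one value per plate gives the kind of per‐screen overview described above.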
RNAi Informatics Infrastructure
• QC: summary statistics, corrections
• Normalization: median, quartile, background
• Hit Selection: thresholding, hypothesis testing, sum of ranks
• Hit Triage: GO semantic similarity, pathways, interactions
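As an illustration of the median‐based normalization and thresholding steps, here is a minimal sketch using robust z‐scores (median and MAD). The well names, the readings and the cut‐off of -3 are hypothetical, and this is not the ncgcrnai implementation.

```python
from statistics import median


def robust_z(values):
    """Robust z-scores: (x - median) / (1.4826 * MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the plate shows no spread")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]


def select_hits(wells, cutoff=-3.0):
    """Return the wells whose robust z-score is at or below the cutoff."""
    names = list(wells)
    scores = robust_z([wells[n] for n in names])
    return [n for n, s in zip(names, scores) if s <= cutoff]


# Hypothetical reporter signal per siRNA well on one plate.
plate = {"siA": 100.0, "siB": 98.0, "siC": 102.0,
         "siD": 101.0, "siE": 99.0, "siF": 40.0}

print(select_hits(plate))
```

The median and MAD are used here instead of mean and standard deviation so that a few strong knockdowns do not distort the scale they are judged against.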
RNAi Analysis Workflow
Raw and Processed Data
GO annotations, Pathways, Interactions
Hit List Follow‐up
RNAi Informatics Toolset
• Local databases (screen data, pathways, interactions, etc.)
• Commercial pathway tools
• Custom software for loading, analysis and visualization
Back End Services
• Currently all computational analysis performed on the backend
• R & Bioconductor code
• Custom R package (ncgcrnai) to support NCGC infrastructure
– Partly derived from cellHTS2
– Supports QC metrics, normalization, adjustments, selections, triage, (static) visualization, reports
• Some Java tools for
– Data loading
– Library and plate registration
User Accessible Tools
Deploying Data
• Small molecule HTS results are available via PubChem
– RNAi data is also showing up in PubChem
• But what do we want to make available?
• How do we make it available?
– Standardized format (MIARE)
– cellHTS2 “format”
– Custom viewers
– Raw data? Calls?
Challenge ‐ RNAi & Small Molecule Screens
Goal: Develop systems level view of small molecule activity
• Reuse pre-existing MLI data • Develop new annotated libraries
TACGGGAACTACCATAATTTA
CAGCATGAGTACTACAGGCCA
• Run parallel RNAi screen
What targets mediate activity of siRNA and compound
Pathway elucidation, identification of interactions
Target ID and validation
Link RNAi‐generated pathway perturbations to small molecule activities. Could provide insight into polypharmacology
HTS for NF‐κB Antagonists
• NF‐κB controls DNA transcription
• Involved in cellular responses to stimuli
– Immune response, memory formation
– Inflammation, cancer, auto‐immune diseases
http://www.genego.com
HTS for NF‐κB Antagonists
• ME‐180 cell line
• Stimulate cells using TNF, leading to NF‐κB activation, readout via a β‐lactamase reporter
• Identify small molecules and siRNAs that block the resultant activation
Small Molecule HTS Summary
• 2,899 FDA‐approved compounds screened
• 55 compounds retested active
• Which components of the NF‐κB pathway do they hit?
– 17 molecules have target/pathway information in GeneGO
– Literature searches list a few more
Most Potent Actives
[Figure: concentration response curves (Activity against log Concentration (uM)) for Proscillaridin A, Trabectedin and Digoxin]
Miller, S.C. et al, Biochem. Pharmacol., 2010, ASAP
RNAi HTS Summary
• Qiagen HDG library
– 6886 genes, 4 siRNAs per gene
• A total of 567 genes were knocked down by 1 or more siRNAs
– We consider >= 2 as a “reliable” hit
– 16 reliable hits
– Added in 66 genes for follow up via triage procedure
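The ">= 2 active siRNAs" rule can be sketched as a simple count per gene. The gene and siRNA identifiers below are placeholders, not screen results.

```python
from collections import Counter


def reliable_hits(active_sirnas, min_active=2):
    """Genes supported by at least `min_active` distinct active siRNAs.

    `active_sirnas` is an iterable of (gene, sirna_id) pairs; duplicate
    pairs are counted once.
    """
    counts = Counter(gene for gene, _sirna in set(active_sirnas))
    return sorted(g for g, n in counts.items() if n >= min_active)


# Hypothetical list of siRNAs that scored as active in the screen.
actives = [
    ("GENE1", "siRNA_a"), ("GENE1", "siRNA_b"),   # two distinct siRNAs
    ("GENE2", "siRNA_a"),                          # only one siRNA
    ("GENE3", "siRNA_c"), ("GENE3", "siRNA_c"),   # same siRNA listed twice
]

print(reliable_hits(actives))
```

Requiring agreement between independent siRNAs is a common guard against off‐target effects of any single sequence.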
The Obvious Conclusion
• The active compounds target the 16 hits (at least) from the RNAi screen
– Useful if the RNAi screen was small & focused
• But what if we’re investigating a larger system?
– Is there a way to get more specific?
– Can compound data suggest RNAi non‐hits?
Small Molecule Targets
• Some small molecules interact with core components
Bortezomib (proteasome inhibitor)
Daunorubicin (IκBα inhibitor)
[Figure: concentration response curves (Activity against log Concentration (uM)) for the two compounds]
Small Molecule Targets
• Others are active against upstream targets
• We also get an idea of off‐target effects
Montelukast (LTD4 antagonist)
[Figure: concentration response curve (Activity against log Concentration (uM)) for Montelukast]
Compound Networks ‐ Similarity
• Evaluate fingerprint‐based similarity matrix for the 55 actives
• Connect pairs that exhibit Tc > 0.7
• Edges are weighted by the Tc value
• Most groupings are obvious
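A minimal sketch of the similarity network construction, assuming each fingerprint is represented as the set of its "on" bit positions. The compound names and bit sets are toy values; the slides do not say which fingerprint was used.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient (Tc) between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_edges(fingerprints, cutoff=0.7):
    """Weighted edges (name1, name2, Tc) for all pairs with Tc > cutoff."""
    edges = []
    for (n1, f1), (n2, f2) in combinations(sorted(fingerprints.items()), 2):
        tc = tanimoto(f1, f2)
        if tc > cutoff:
            edges.append((n1, n2, tc))
    return edges


# Toy fingerprints: the sets hold the indices of the bits that are set.
fingerprints = {
    "cpd1": {1, 2, 3, 4, 5},
    "cpd2": {1, 2, 3, 4, 5, 6},
    "cpd3": {7, 8, 9},
}

for n1, n2, tc in similarity_edges(fingerprints):
    print(n1, n2, round(tc, 3))
```

The Tc value stored with each edge can be used directly as the edge weight when the network is drawn.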
A “Dictionary” Based Approach
• Create a small‐ish annotated library
– “Seed” compounds
• Use it in parallel small molecule/RNAi screens
• Use a similarity based approach to prioritize larger collections, in terms of anticipated targets
– Currently, we’d use structural similarity
– Diversity of prioritized structures is dependent on the diversity of the annotated library
Compound Networks ‐ Targets
• Predict targets for the actives using SEA
• Target based compound network maps nearly identically to the similarity based network
• But depending on the predicted target quality we get poor (or no) mappings to the RNAi targeted genes
Keiser, M.J. et al, Nat. Biotech., 2007, 25, 197‐206
Gene Networks ‐ Pathways
• Nodes are 1374 HDG genes contained in the NCI PID
• Edge indicates two genes/proteins are involved in the same pathway
• “Good” hits tend to be very highly connected
Wang, L. et al, BMC Genomics, 2009, 10, 220
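The co‐pathway network can be sketched as follows: two genes are connected if at least one pathway contains both, and the node degree gives a first impression of how highly connected a gene is. Pathway and gene names are placeholders rather than NCI PID content.

```python
from collections import defaultdict
from itertools import combinations


def pathway_network(pathways):
    """Adjacency map: gene -> set of genes sharing at least one pathway."""
    adjacency = defaultdict(set)
    for genes in pathways.values():
        for g1, g2 in combinations(sorted(set(genes)), 2):
            adjacency[g1].add(g2)
            adjacency[g2].add(g1)
    return adjacency


# Hypothetical pathway membership lists.
pathways = {
    "pathway_A": ["G1", "G2", "G3"],
    "pathway_B": ["G3", "G4"],
}

network = pathway_network(pathways)
degree = {gene: len(neighbors) for gene, neighbors in network.items()}
print(sorted(degree.items(), key=lambda item: -item[1]))
```

Genes that belong to many or large pathways end up with a high degree, so degree alone partly reflects how well studied a gene is.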
(Reduced) Gene Networks – Pathways
• Nodes are 526 genes with >= 1 siRNA showing knockdown
• Edge indicates two genes/proteins are involved in the same pathway
Pathway Based Integration
• Direct matching of targets is not very useful
• Try and map compounds to siRNA targets if the compounds’ predicted target(s) and siRNA targets are in the same pathway
– Considering 16 reliable hits, we cover 26 pathways
– Predicted compound targets cover 131 pathways
• For 18 out of 41 compounds
– 3 RNAi‐derived pathways not covered by compound‐derived pathways
• Rhodopsin, alternative NF‐κB, FAS
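The pathway‐level mapping can be sketched like this: a compound is linked to the RNAi hits if any of its predicted targets shares a pathway with a hit gene. All names and memberships below are hypothetical, and the functions are illustrative rather than the code used for the analysis.

```python
def pathways_for(genes, pathway_members):
    """Names of the pathways that contain at least one of `genes`."""
    wanted = set(genes)
    return {name for name, members in pathway_members.items() if members & wanted}


def link_compounds_to_hits(predicted_targets, rnai_hits, pathway_members):
    """Map each compound to the pathways it shares with the RNAi hits."""
    hit_pathways = pathways_for(rnai_hits, pathway_members)
    links = {}
    for compound, targets in predicted_targets.items():
        shared = pathways_for(targets, pathway_members) & hit_pathways
        if shared:
            links[compound] = sorted(shared)
    return links


# Hypothetical pathway membership, RNAi hits and predicted targets.
pathway_members = {"P1": {"G1", "G2"}, "P2": {"G3", "G4"}, "P3": {"G5"}}
rnai_hits = ["G1", "G5"]
predicted_targets = {"cpdA": ["G2"], "cpdB": ["G3"], "cpdC": []}

links = link_compounds_to_hits(predicted_targets, rnai_hits, pathway_members)
covered = set().union(*links.values())
uncovered = pathways_for(rnai_hits, pathway_members) - covered

print(links)       # compounds that share a pathway with an RNAi hit
print(uncovered)   # RNAi-derived pathways that no compound reaches
```

Compounds without a usable target prediction drop out entirely, as `cpdC` does here, so coverage is bounded by the quality of the predictions.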
Pathway Based Integration
• Still not completely useful, as it only handled 18 compounds
• Depending on target predictions is probably not a great idea
Integration Caveats
• Biggest bottleneck is lack of resolution
• Currently, both small molecule and RNAi data are 1‐D
– Active or inactive, high/low signal
– CRCs for small molecules alleviate this a bit
• High content screens can provide significantly more information and so better resolution
– Data size & feature selection are of concern
Integration Caveats
• Compound annotations are key
• More comprehensive pathway data will be required
• RNAi and small molecule inhibition do not always lead to the same phenotype
– Could be indicative of promiscuity
– Could indicate true biological differences
Weiss, W.A. et al, Nat. Chem. Biol., 2007, 12, 739-744
CPT Sensitization & “Central” Genes
TOP1 poisons prevent DNA religation resulting in replication-dependent double strand breaks. Cell activates DNA damage response (e.g. ATR).
Yves Pommier, Nat. Rev. Cancer, 2006.
Screening Protocol
Screen conducted in the human breast cancer cell line MDA-MB-231. Many variables to optimize including transfection conditions, cell seeding density, assay conditions, and the selection of positive and negative controls.
Hit Selection Follow-Up Dose Response Analysis
[Figure: Screen #1 and Screen #2, sensitization ranked by Log2 fold change; follow‐up dose response curves of Viability (%) against CPT (Log M) for ATR (siNeg, siATR-A, siATR-B, siATR-C) and MAP3K7IP2 (siNeg, siMAP3K7IP2-A, siMAP3K7IP2-B, siMAP3K7IP2-C, siMAP3K7IP2-D)]
Multiple active siRNAs for ATR, MAP3K7IP2, and BCL2L1.
Are These Genes Relevant?
• Some are well known to be CPT‐sensitizers
• Consider a HPRD PPI sub‐network corresponding to the Qiagen HDG gene set
• How “central” are these selected genes?
– Larger values of betweenness indicate that the node lies on many shortest paths
– Makes sense ‐ a number of them are stress‐related
– But some of them have very low betweenness values
[Figure: distribution of betweenness in the sub‐network, log Frequency against log Betweenness]
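For reference, betweenness can be computed with Brandes' algorithm. The sketch below handles an undirected, unweighted graph stored as an adjacency map and is run on a four‐node toy path rather than on HPRD data; every neighbor must also appear as a key of the map.

```python
from collections import deque


def betweenness(adjacency):
    """Unnormalized betweenness centrality (undirected, unweighted graph)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search from `source`, counting shortest paths.
        order = []
        preds = {v: [] for v in adjacency}
        sigma = {v: 0 for v in adjacency}
        dist = {v: -1 for v in adjacency}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {v: s / 2.0 for v, s in score.items()}


# Toy path graph A - B - C - D.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
print(betweenness(toy))
```

The two inner nodes lie on every shortest path that crosses the chain and score highest, while the end nodes score zero, which is the sense in which low betweenness marks a peripheral gene.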
Are These Genes Relevant?
• Most selected genes are densely connected
• A few are not
– Generally did not reconfirm
• Network metrics could be used to provide confidence in selections
[Figure: clusters from the HPRD PPI sub‐network, with nodes labeled by gene symbol (for example TP53, ATR, BCL2L1, MAP3K7)]
Challenge ‐ miRNA Target ID
• Screened a set of 885 human miRNAs for CPT sensitization
• Identified 23 sensitizing miRNAs
• But, we don’t have target information
– Predictions aren’t particularly helpful
– Poor overlap with siRNA hits
• Link pathogenic miRNAs to human targets?
miRanda, TargetScan
Challenge – RNAi Meta Analyses
• Building up a collection of screens
– Across cell lines, species, …
– Not necessarily “designed”
• What do we do with this?
– Identify consistent markers
– Characterize differences between cell lines
– Extrapolate from gene knockdown to pathway and higher level differences
– Merge with gene expression data
Challenge – Combinatorial RNAi
• Elegant way to probe gene interactions
• Extend to network interactions
• Requires efficient experimental design
• Could lead to enhanced target identification for polypharmacology
Nir, O. et al, Genome Res., 2010, ASAP
Sahin, O. et al, Proc. Natl. Acad. Sci., 2007, 104, 6579-6584
Tischler, J. et al, Genome Biol., 2006, 7, R69
Conclusions
• Building up a wealth of small molecule and RNAi data
• “Standard” analysis of RNAi screens relatively straightforward
• Challenges involve integrating RNAi data with other sources
• Primary bottleneck is dimensionality of the data
– Simple fluorescence‐based approaches do not provide sufficient resolution
– High‐content is required
The People
• Scott Martin
• Pinar Tuzmen
• Dac Trung Nguyen
• Yuhong Wang
• Ruili Huang
RNAi
Small Molecules