Characterization of Chemical Libraries Using Scaffolds and Network Models

Characterization of Chemical Libraries Using Scaffolds and Network Models. Dac-Trung Nguyen, Rajarshi Guha, NIH NCATS. ACS National Meeting, Boston 2015

Upload: rguha

Post on 15-Apr-2017


TRANSCRIPT

Page 1: Characterization of Chemical Libraries Using Scaffolds and Network Models

Characterization of Chemical Libraries Using Scaffolds and Network Models

Dac-Trung Nguyen, Rajarshi Guha
NIH NCATS

ACS National Meeting, Boston 2015

Page 2: Characterization of Chemical Libraries Using Scaffolds and Network Models

Outline  

OR  

Page 3: Characterization of Chemical Libraries Using Scaffolds and Network Models

Motivations

• Library comparison usually driven by a need to construct or expand a library
  – Often with constraints on resources
• Two classes of features to consider
  – Compound-centric (physchem properties, bioactivity, target preferences)
  – Library-centric (diversity, chemical space coverage)
• Library comparisons generally reduce to
  – Distributions of compound features (univariate)
  – Overlap in some chemical space (multivariate)

Page 4: Characterization of Chemical Libraries Using Scaffolds and Network Models

Comparing Libraries

• Most comparisons employ a reduced (numerical) representation of the structure
  – Fingerprints, BCUTs, physicochemical descriptors
• Perform comparisons in the new space
  – PCA, SOM, MDS, GTM, …

Schamberger et al, DDT, 2011, 16, 636-641; Kireeva et al, Mol. Inf., 2012, 31, 301-312

Page 5: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffolds & Networks

• Scaffolds represent a chemically meaningful reduced representation of the structures
• Can be challenging to define what a (good) scaffold is
• A network representation of the collection of structures allows for novel ways to perform library comparisons
  – How fine grained can such comparisons be?

Page 6: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffold Network Representations

• Scaffolds are generated by exhaustive enumeration of the SSSR (smallest set of smallest rings)
• Scaffolds are nodes, connected by directed edges
• Nodes are labeled by a hash key of the scaffold

[Figure: example scaffold networks for 4 compounds and for 1912 compounds]

Page 7: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffold Network Construction

• A scaffold network is a directed graph
• Edges denote sub/super-structure relationships between scaffolds
• Each node in the network represents a unique scaffold
• Singletons are acyclic molecules
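The construction above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the scaffold strings, the `scaffold_key` helper, the use of a truncated SHA-1 digest as the hash key, and the edge direction (substructure to superstructure) are all hypothetical choices. The sub/super-structure pairs are taken as given, since detecting them needs a cheminformatics toolkit.

```python
import hashlib
from collections import defaultdict


def scaffold_key(scaffold: str) -> str:
    """Hypothetical hash key: first 12 hex digits of the SHA-1 of the scaffold string."""
    return hashlib.sha1(scaffold.encode("utf-8")).hexdigest()[:12]


def build_network(relations):
    """Build a directed graph from (substructure, superstructure) pairs.

    Nodes are hash keys, so a scaffold seen several times maps to one node.
    The edge direction (sub -> super) is an assumption made for this sketch.
    """
    nodes = {}                 # hash key -> scaffold string
    edges = defaultdict(set)   # hash key -> set of successor keys
    for sub, sup in relations:
        k_sub, k_sup = scaffold_key(sub), scaffold_key(sup)
        nodes[k_sub] = sub
        nodes[k_sup] = sup
        edges[k_sub].add(k_sup)
    return nodes, {k: sorted(v) for k, v in edges.items()}


# Hypothetical sub/super-structure pairs written as SMILES-like strings.
relations = [
    ("c1ccccc1", "c1ccc2ccccc2c1"),      # benzene -> naphthalene
    ("c1ccccc1", "c1ccc2[nH]ccc2c1"),    # benzene -> indole
    ("c1cc[nH]c1", "c1ccc2[nH]ccc2c1"),  # pyrrole -> indole
]
nodes, edges = build_network(relations)
```

Keying nodes by hash is what makes each node a unique scaffold: the benzene string appears twice in `relations` but yields a single node with two outgoing edges.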

Page 8: Characterization of Chemical Libraries Using Scaffolds and Network Models

Datasets

CL1420, 31320 compounds

CL886, 3552 compounds

MIPE, 1920 compounds

Natural Products, 5000 compounds

Mathews and Guha et al, PNAS, 2014, 111, 11365; Singh et al, JCIM, 2009, 49, 1010

LOPAC, 1280 compounds

1079 nodes, 115287 edges, 69 trees

2131 nodes, 1843 edges, 129 trees

Approved, investigational drugs, constructed for functional diversity

Diverse library, designed for enrichment of bioactivity

15283 nodes, 13622 edges, 729 trees

5563 nodes, 4832 edges, 239 trees

23716 nodes, 21468 edges, 750 trees

Page 9: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffold Network Representations

• The overall structure of the complete network can characterize the library
• But distributions of vertex-level network metrics may be informative
• We can also consider approaches to identify “important” scaffolds

Page 10: Characterization of Chemical Libraries Using Scaffolds and Network Models

Metrics for the Complete Network

• Examined vertex-level measures of centrality
  – Closeness, betweenness, …
  – High similarity of MIPE & NP and low similarity of LOPAC & NP is surprising (Ertl et al, JCIM, 2008)

[Figure: density of log10(Betweenness), and number of scaffolds vs. log Closeness (in-degree), for CL1420, CL886, LOPAC, MIPE, and NP]
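As a rough illustration of a vertex-level centrality, the sketch below computes closeness on a small directed graph using breadth-first search. The normalization used here (number of reachable nodes divided by the sum of their distances) is one common convention and is an assumption; graph libraries differ in how they treat unreachable nodes, and the slides do not say which variant was used. The toy graph and the `reverse` helper are hypothetical. Betweenness is omitted for brevity.

```python
from collections import deque


def reverse(adj):
    """Flip every edge so that the search follows incoming edges instead."""
    rev = {}
    for u, vs in adj.items():
        rev.setdefault(u, [])
        for v in vs:
            rev.setdefault(v, []).append(u)
    return rev


def closeness(adj):
    """Closeness per node: (reachable nodes) / (sum of shortest-path lengths)."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    scores = {}
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # A node that reaches nothing gets a score of 0.0 in this sketch.
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


adj = {"a": ["b"], "b": ["c"], "c": []}   # toy chain a -> b -> c
out_closeness = closeness(adj)            # follows outgoing edges
in_closeness = closeness(reverse(adj))    # follows incoming edges
```

Collecting these per-node scores over a whole scaffold network gives the kind of distribution shown in the figure.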

Page 11: Characterization of Chemical Libraries Using Scaffolds and Network Models

Metrics for the Complete Network

• Useful to summarize distributions by scalar metrics
• Path length metrics are not discriminatory due to many short paths
• Extent of clustering differs but is quite low overall

[Figure: centralization, CPL, and transitivity values by library (CL1420, CL886, LOPAC, MIPE, NP)]
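The "extent of clustering" is commonly summarized by global transitivity: the fraction of connected triples that close into triangles. The sketch below computes it while treating edges as undirected, which is an assumption; the slides do not say how edge direction was handled.

```python
from itertools import combinations


def transitivity(edges):
    """Global transitivity: closed connected triples / all connected triples."""
    nbrs = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    triples = 0
    closed = 0
    for centre, ns in nbrs.items():
        # Every pair of neighbours of `centre` forms a connected triple.
        for a, b in combinations(ns, 2):
            triples += 1
            if b in nbrs[a]:
                closed += 1  # each triangle is counted once per corner
    return closed / triples if triples else 0.0


triangle = [("a", "b"), ("b", "c"), ("a", "c")]
with_pendant = triangle + [("c", "d")]
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]
```

A tree-like network such as `star` has transitivity 0, consistent with the low clustering values reported for these libraries.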

Page 12: Characterization of Chemical Libraries Using Scaffolds and Network Models

Comparing Complete Networks

• Library overlap is characterized by the set of common scaffolds
• Scaffolds can be ranked (e.g., PageRank)
  – Small fragments have low PR
  – Large frameworks have high PR
  – Interesting scaffolds lie in between?
• Similar libraries will have common scaffolds with similar PageRank values

[Diagram: PageRank vector for each library, each subset to the common fragments, then compared by a normalized dot product]
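The comparison in the diagram can be sketched as follows: compute a PageRank vector per library, restrict both vectors to the shared hash keys, and take the normalized dot product (cosine) of the restricted vectors. This is a minimal sketch under assumptions: the damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing edges, the fixed iteration count, and normalizing over the common subset only are choices made here, not details given in the slides. The toy networks and keys are hypothetical.

```python
import math


def pagerank(adj, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank of dangling nodes is spread uniformly."""
    nodes = sorted(set(adj) | {v for vs in adj.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not adj.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            out = adj.get(u, [])
            for v in out:
                new[v] += damping * rank[u] / len(out)
        rank = new
    return rank


def pagerank_similarity(rank_a, rank_b):
    """Normalized dot product of two PageRank vectors over their common keys."""
    common = sorted(set(rank_a) & set(rank_b))
    if not common:
        return 0.0
    dot = sum(rank_a[k] * rank_b[k] for k in common)
    norm_a = math.sqrt(sum(rank_a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(rank_b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)


# Hypothetical scaffold networks keyed by hash.
lib_a = {"s1": ["s2"], "s2": ["s3"], "s3": []}
lib_b = {"s2": ["s3"], "s3": ["s4"], "s4": []}
lib_c = {"t1": ["t2"], "t2": []}

rank_a, rank_b, rank_c = pagerank(lib_a), pagerank(lib_b), pagerank(lib_c)
similarity_ab = pagerank_similarity(rank_a, rank_b)
```

In this sketch, libraries with no common scaffolds get a similarity of exactly 0, and a library compared with itself gets 1.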

Page 13: Characterization of Chemical Libraries Using Scaffolds and Network Models

Comparing Complete Networks

          CL1420   CL886   LOPAC   MIPE   NP
CL1420    1        0       0       0      0
CL886     0        1       0       0      0
LOPAC     0        0       1       0.2    0.3
MIPE      0        0       0.2     1      0.3
NP        0        0       0.3     0.3    1

Page 14: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffold Recognition

• What is a scaffold?
• Can be addressed through the scaffold network
  – A scaffold is a hub within the scaffold network
• Provide a practical answer to “What are the missing scaffolds in my library”
• Examples of unique scaffolds in MIPE but not in NP
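Both ideas on this slide reduce to simple operations once scaffolds are keyed by hash. The sketch below is illustrative only: treating a "hub" as a node whose total degree meets a threshold, and the threshold itself, are assumptions, and the hash keys are placeholders.

```python
def missing_scaffolds(reference, candidate):
    """Hash keys present in `reference` but absent from `candidate`."""
    return sorted(set(reference) - set(candidate))


def hubs(adj, min_degree):
    """Nodes whose total degree (in + out) is at least `min_degree`."""
    degree = {}
    for u, vs in adj.items():
        degree[u] = degree.get(u, 0) + len(vs)
        for v in vs:
            degree[v] = degree.get(v, 0) + 1
    return sorted(n for n, d in degree.items() if d >= min_degree)


# Placeholder hash keys for two libraries and one small network.
library_x = ["h1", "h2", "h3"]
library_y = ["h2"]
network_x = {"h1": ["h2", "h3", "h4"], "h2": ["h4"]}

only_in_x = missing_scaffolds(library_x, library_y)
hub_nodes = hubs(network_x, min_degree=3)
```

Here `only_in_x` lists the scaffolds that `library_y` lacks, which is the "missing scaffolds" question asked from the point of view of `library_y`.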

 

Page 15: Characterization of Chemical Libraries Using Scaffolds and Network Models

Scaffold  Comparison  

Page 16: Characterization of Chemical Libraries Using Scaffolds and Network Models

Reduced Network Representation

• The complete network can be reduced to a forest of trees
• Order nodes by out-degree
• From each node, traverse network until a terminal node is reached
• Result is a set of spanning trees
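One possible reading of this reduction is sketched below. The slide leaves several details open, so the following are assumptions: nodes are visited in order of decreasing out-degree, the traversal follows outgoing edges depth-first, and a node is attached to the first tree that reaches it so that every node ends up with at most one parent. The toy network is hypothetical.

```python
def reduce_to_forest(adj):
    """Return a parent map (node -> parent, or None for a root) describing a forest."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    # Highest out-degree first; ties broken by name for reproducibility.
    order = sorted(nodes, key=lambda n: (-len(adj.get(n, ())), n))
    parent = {}
    for root in order:
        if root in parent:
            continue  # already absorbed into an earlier tree
        parent[root] = None
        stack = [root]
        while stack:
            u = stack.pop()
            for v in sorted(adj.get(u, ())):
                if v not in parent:
                    # Keep only the first edge that reaches v: one parent per node.
                    parent[v] = u
                    stack.append(v)
    return parent


# Toy network: "d" has two superstructure-like parents, "e" is isolated.
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "e": []}
parent = reduce_to_forest(adj)
roots = sorted(n for n, p in parent.items() if p is None)
```

Because each node receives exactly one parent entry, the result has no cycles, and the number of trees equals the number of roots.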

Page 17: Characterization of Chemical Libraries Using Scaffolds and Network Models

Reduced Network Representation

[Figure: MIPE, 1912 compounds]

Page 18: Characterization of Chemical Libraries Using Scaffolds and Network Models

Network Structure

• A scaffold forest is characterized by
  – Disconnected components
    • structurally related scaffolds, scaffold diversity
  – Singletons
    • scaffolds with no superstructure
  – Branching within connected components
    • scaffold complexity

Page 19: Characterization of Chemical Libraries Using Scaffolds and Network Models

Forest Size vs Library Size

• A large library doesn’t imply a large forest
• Forest size is a function of scaffold diversity

[Figures: CL1420, 31K combinatorial library; MIPE, 1912 (target) diverse library]

Page 20: Characterization of Chemical Libraries Using Scaffolds and Network Models

Summarizing Forests

• A key feature is the nature of branching in individual trees
• Characterized by ID, an information theoretic descriptor of branching derived from the distance matrix

Bonchev & Trinajstić, IJQC, 1978, 14, 293-303

[Figure: example trees with ID = 978, ID = 90794, ID = 3456, and ID = 979252]
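The slides do not state which member of the Bonchev–Trinajstić family of descriptors is meant by ID, nor the logarithm base. As an illustration only, the sketch below assumes the total information on the magnitude of distances, I = W log2 W − Σ_g g·k_g·log2 g, where W is the sum of all pairwise topological distances in the tree and k_g is the number of node pairs at distance g. Treat both the choice of variant and the base-2 logarithm as assumptions.

```python
import math
from collections import Counter, deque


def distance_counts(tree_edges):
    """Map each topological distance g to the number of node pairs at that distance."""
    adj = {}
    for u, v in tree_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        counts.update(d for d in dist.values() if d > 0)
    # Each unordered pair was seen once from each end.
    return {g: c // 2 for g, c in counts.items()}


def information_on_distances(tree_edges):
    """Assumed descriptor: W*log2(W) - sum over g of g*k_g*log2(g)."""
    counts = distance_counts(tree_edges)
    wiener = sum(g * k for g, k in counts.items())
    if wiener == 0:
        return 0.0
    return wiener * math.log2(wiener) - sum(
        g * k * math.log2(g) for g, k in counts.items()
    )


path4 = [("a", "b"), ("b", "c"), ("c", "d")]   # unbranched tree
star4 = [("a", "b"), ("a", "c"), ("a", "d")]   # branched tree, same size
id_path = information_on_distances(path4)
id_star = information_on_distances(star4)
```

Under this assumed definition the unbranched path scores higher than the star of the same size, so the value responds to branching as well as to tree size.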

Page 21: Characterization of Chemical Libraries Using Scaffolds and Network Models

Summarizing Forests

• Distribution of ID distinguishes datasets primarily in the tails
• Aggregating by mean ID still discriminates well
  – Driven by the tails

[Figure: density of log10(ID), and mean log10(ID) per library, for CL1420, CL886, LOPAC, MIPE, and NP]

Page 22: Characterization of Chemical Libraries Using Scaffolds and Network Models

Exploring the Forest

• The metric also allows us to drill down
  – Select scaffolds of given branching complexity
  – Identify scaffolds of given complexity range across different libraries (equivalent to finding holes in scaffold coverage)

[Figure: LOPAC, ID = 10214; MIPE, ID = 10197]

Page 23: Characterization of Chemical Libraries Using Scaffolds and Network Models

Library Comparison via Merging

• … reduces to comparing networks
• We compute a graph union and construct new edges between nodes with the same hash
• How does the network structure of the union differ from the original networks?
• Can be extended to merge more than two networks
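A minimal sketch of the merge step: keep every node labeled with its source library, copy the original edges, and add one bridging edge for each hash that occurs in both forests. The data layout (a dict from hash to child hashes), the library labels, and the edge tags `"original"` and `"bridge"` are assumptions made for illustration.

```python
def all_hashes(forest):
    """Every hash that appears in a forest, as a parent or as a child."""
    return set(forest) | {c for children in forest.values() for c in children}


def merge_forests(forest_a, forest_b, label_a="A", label_b="B"):
    """Union of two forests plus bridging edges between nodes with the same hash.

    Returns (edges, shared) where each edge is (source, target, kind) and
    nodes are (library label, hash) pairs.
    """
    edges = []
    for label, forest in ((label_a, forest_a), (label_b, forest_b)):
        for parent, children in forest.items():
            for child in children:
                edges.append(((label, parent), (label, child), "original"))
    shared = all_hashes(forest_a) & all_hashes(forest_b)
    for h in sorted(shared):
        edges.append(((label_a, h), (label_b, h), "bridge"))
    return edges, shared


# Placeholder hashes: "h2" and "h3" occur in both libraries.
forest_a = {"h1": ["h2"], "h2": ["h3"]}
forest_b = {"h2": ["h5"], "h3": []}
merged_edges, shared = merge_forests(forest_a, forest_b, "LOPAC", "MIPE")
bridges = [e for e in merged_edges if e[2] == "bridge"]
```

Merging more than two networks follows the same pattern, with bridging edges added between every pair of libraries that share a hash.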

Page 24: Characterization of Chemical Libraries Using Scaffolds and Network Models

Source Forests

• Structurally similar networks
• 2659 identical nodes
• Construct union by connecting nodes with identical hash

[Figure panels: LOPAC, MIPE]

Page 25: Characterization of Chemical Libraries Using Scaffolds and Network Models

Merged Network

• Green edges “bridge” the two networks
• Trees can now have two types of nodes
• How can we characterize the
  – Contraction?
  – Degree of mixing?

Page 26: Characterization of Chemical Libraries Using Scaffolds and Network Models

Contraction to Measure Overlap

• Merging very similar libraries should generate a smaller forest compared to the original forests

$$C_{\mathrm{norm}} = \frac{|F_{12}|}{|F_1| + |F_2|}, \qquad \text{where } F_i = \{G_{1i}, G_{2i}, \ldots, G_{ni}\}$$

• But this doesn’t really describe how the individual trees become (more) connected

[Figure: Cnorm (0 to 1) for the library pairs CL886/CL1420, MIPE/CL886, MIPE/LOPAC, and MIPE/NP]
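Reading Cnorm as the number of trees in the merged forest divided by the total number of trees in the two source forests, it can be computed by counting connected components. This reading, the union-find helper, and the toy forests below are assumptions. In the sketch, nodes with the same hash are simply identified with each other, which gives the same component count as adding bridging edges between them.

```python
def count_trees(nodes, edges):
    """Number of connected components, via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def contraction(nodes1, edges1, nodes2, edges2):
    """Cnorm = (trees in merged forest) / (trees in forest 1 + trees in forest 2)."""
    f1 = count_trees(nodes1, edges1)
    f2 = count_trees(nodes2, edges2)
    f12 = count_trees(set(nodes1) | set(nodes2), list(edges1) + list(edges2))
    return f12 / (f1 + f2)


# Placeholder hashes: library 2 links the two trees of library 1 through b-c.
nodes1, edges1 = {"a", "b", "c", "d"}, [("a", "b"), ("c", "d")]
nodes2, edges2 = {"b", "c", "x"}, [("b", "c")]
c_overlap = contraction(nodes1, edges1, nodes2, edges2)
c_disjoint = contraction({"p", "q"}, [("p", "q")], {"r", "s"}, [("r", "s")])
```

With no shared scaffolds nothing contracts and the value is 1; the more trees the merge fuses, the lower the value.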

Page 27: Characterization of Chemical Libraries Using Scaffolds and Network Models

Assortativity to Measure Overlap

• Quantifies the notion that “like connects to like”
• Undefined for trees that only have one type of vertex (i.e., only from a single library)
• The number of trees that are assortative is a global indicator of library similarity

[Figure: % of trees that are assortative vs. not assortative for CL886/CL1420, MIPE/CL886, MIPE/LOPAC, and MIPE/NP]

Newman, Phys. Rev. E., 2003, 026126
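For a categorical vertex attribute such as the source library, Newman's assortativity coefficient is r = (Σ_i e_ii − Σ_i a_i b_i) / (1 − Σ_i a_i b_i), where e_ij is the fraction of edge ends joining type i to type j, and a_i, b_i are its row and column sums. The sketch below applies it to a single undirected tree. The data layout and the choice to return `None` when r is undefined are assumptions for illustration.

```python
from collections import defaultdict


def assortativity(edges, library_of):
    """Newman's categorical assortativity for an undirected graph.

    Returns None when the coefficient is undefined, e.g. when every vertex
    comes from the same library.
    """
    if not edges:
        return None
    m = 2 * len(edges)          # each undirected edge contributes two ends
    e = defaultdict(float)      # mixing matrix, symmetric by construction
    for u, v in edges:
        a, b = library_of[u], library_of[v]
        e[(a, b)] += 1.0 / m
        e[(b, a)] += 1.0 / m
    labels = {lab for pair in e for lab in pair}
    trace = sum(e[(lab, lab)] for lab in labels)
    row = {lab: sum(e[(lab, other)] for other in labels) for lab in labels}
    expected = sum(row[lab] ** 2 for lab in labels)   # a_i == b_i here
    if abs(1.0 - expected) < 1e-12:
        return None
    return (trace - expected) / (1.0 - expected)


# Placeholder tree with two vertices from each of two libraries.
library_of = {"m1": "MIPE", "m2": "MIPE", "n1": "NP", "n2": "NP"}
mixed_tree = [("m1", "m2"), ("n1", "n2"), ("m1", "n1")]
r_mixed = assortativity(mixed_tree, library_of)
r_single = assortativity([("m1", "m2")], library_of)
```

The `None` case corresponds to the second bullet: a tree whose vertices all come from one library has no mixing to measure.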

Page 28: Characterization of Chemical Libraries Using Scaffolds and Network Models

Assortativity to Measure Overlap

• We then examine the distribution of assortativity across assortative trees
• Dissimilar libraries have few assortative trees
  – But they have high values of assortativity
• However, high assortativity doesn’t imply high overlap

[Figure: density of assortativity (0.4 to 1.0) for CL886/CL1420, MIPE/CL886, MIPE/LOPAC, and MIPE/NP]

Page 29: Characterization of Chemical Libraries Using Scaffolds and Network Models

Assortativity to Measure Overlap

[Figure: example trees with assortativity = 0.85 (MIPE & NP) and assortativity = 0.95 (CL886 & CL1420)]

Page 30: Characterization of Chemical Libraries Using Scaffolds and Network Models

Overlap via Tree Complexity

• Similar libraries lead to fewer trees in the merged network, but also denser trees
• Change in density (branching) across the forest can also measure the extent of overlap

[Figure panels: MIPE, LOPAC, Merged]

Page 31: Characterization of Chemical Libraries Using Scaffolds and Network Models

Summarizing via Tree Complexity

• Distributions of ID before and after merging don’t differ very much, visually
• However a KS test does discriminate them

[Figure: density of log10(ID) for individual vs. merged forests. CL886 / CL1420: D = 0.0173, p = 1. MIPE / NP: D = 0.0582, p = 0.0008]
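The D values quoted above are two-sample Kolmogorov–Smirnov statistics: the largest gap between the empirical cumulative distributions of the two samples. A minimal sketch of that statistic follows. The sample values are made up for illustration, and the p-value is not computed here; a statistics package such as SciPy (`scipy.stats.ks_2samp`) provides it.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over observed x."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in set(a) | set(b):
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up log10(ID) values for individual and merged forests.
individual = [1.0, 2.0, 3.0, 4.0]
merged = [3.0, 4.0, 5.0, 6.0]
d_shifted = ks_statistic(individual, merged)
d_same = ks_statistic(individual, individual)
```

Because the statistic looks at the whole cumulative distribution, it can pick up shifts that are hard to see when two density curves are overlaid.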

Page 32: Characterization of Chemical Libraries Using Scaffolds and Network Models

Summary

• Scaffold networks are a relatively objective way to characterize & compare libraries
  – Supports fast comparisons between libraries
• The approach supports multiplexing information into a single data structure
  – Physchem properties, bioactivities, …
• “What is a good comparison?” quickly becomes a philosophical question