George Papadatos, PhD Senior Technical Officer ChEMBL group [email protected] Leveraging public domain bioactivity data with KNIME



EMBL-EBI structure

•  Genomes: Ensembl, Ensembl Genomes, EGA
•  Nucleotide sequence: ENA
•  Functional genomics: ArrayExpress, Expression Atlas
•  Protein sequences: UniProt
•  Protein families, motifs and domains: InterPro
•  Macromolecular: PDBe
•  Protein activity: IntAct, PRIDE
•  Pathways: Reactome
•  Systems: BioModels
•  BioSamples
•  Literature and ontologies: CiteXplore, GO
•  Chemogenomics: ChEMBL
   •  ChEMBL database
   •  Curation
   •  Interface
   •  Research group
•  Chemical entities: ChEBI

07/03/2013   KNIME  UGM  2  

Outline
•  Overview of ChEMBL database
   •  Introduction, contents, access
•  What ChEMBL does with KNIME
•  What KNIME can do with ChEMBL
   •  ChEMBL KNIME nodes
   •  OpenChEMBL
   •  UniChem

The  ChEMBL  Database  


What is ChEMBL?
•  Open access database for drug discovery
•  Freely available – searchable and downloadable
•  Contents:
   •  Bioactivity data manually extracted from the primary medicinal chemistry literature
   •  Deposited data from neglected disease screening (e.g. malaria)
   •  Subset of data from PubChem
•  Bioactivity data is associated with a biological target and a chemical structure
•  Updated regularly with new data

Drug discovery process

[Figure: pipeline running from Discovery through Development to Use]

•  Target Discovery: target identification, microarray profiling, target validation, assay development, biochemistry, clinical/animal disease models
•  Lead Discovery: high-throughput screening (HTS), fragment-based screening, focused libraries, screening collection
•  Lead Optimisation: medicinal chemistry, structure-based drug design, selectivity screens, ADMET screens, cellular/animal disease models, pharmacokinetics
•  Preclinical Development: toxicology, in vivo safety pharmacology, formulation, dose prediction
•  Clinical trials: Phase I (PK, tolerability), Phase II (efficacy), Phase III (safety & efficacy)
•  Launch: indication discovery & expansion

ChEMBL database
•  Medicinal chemistry SAR: >1,200,000 distinct compounds, ~27,000 distinct lead series
•  Clinical candidates: ~12,000 candidates
•  Drugs: ~1,400 drugs

What is in ChEMBL?

[Figure: SAR data. A publication links compounds, assays and targets through bioactivities. Example shown: a compound structure, the target Thrombin (given as a protein sequence), and the measurements Ki = 4.5 nM and APTT = 11 min.]

What  is  in  ChEMBL?  

ChEMBL_15
•  Compounds: 1,254,575
•  Assays: 679,259
•  Targets: 9,570
•  Publications: 48,735
•  Activities: 10,509,572
•  Data sources: 16

Increase  of  >230,000  compounds  from  literature  since  ChEMBL01  

How to access ChEMBL?
1.  Web interface
   •  Intuitive and secure
   •  Compound, assay, target search
2.  SQL dumps and flat files
   •  Oracle, MySQL, PostgreSQL* dumps and .sd file
3.  RESTful web services
   •  Exact, substructure & similarity search
   •  Bioactivities for compound, assay and target id
   •  https://www.ebi.ac.uk/chembldb/index.php/ws
   •  KNIME examples

How KNIME is used in the group
•  I/O, file conversions, data retrieval and manipulation
   •  e.g. format Open Source Drug Discovery malaria data depositions
•  Chemoinformatics, data modelling and visualisation
   •  Ligand-based target fishing
   •  Data quality assurance
   •  Automated data curation
•  Text mining
   •  Chemical named-entity recognition
   •  Document classification
•  ChEMBL-likeness

ChEMBL  KNIME  Nodes  


RESTful Web Services for KNIME
•  Compound, assay and target look-up
   •  CHEMBL_ID (or UniProt Accession for targets) as input
•  Bioactivities for compound, assay and target
   •  CHEMBL_ID as input
•  Compound searching
   •  Molecular structure as input
   •  Exact, similarity and substructure searching
•  Advantages
   •  Tighter integration of KNIME and ChEMBL data
   •  No need for internal ChEMBL database or SQL queries
   •  No need for a chemical cartridge

https://www.ebi.ac.uk/chembldb/index.php/ws
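Outside KNIME, the same three kinds of request (look-up, bioactivities, structure search) can be sketched as plain URL builders. This is a minimal sketch, not the service described on the slide: the base URL, the resource names (`molecule`, `assay`, `target`, `activity`, `similarity`, `substructure`) and the `format=json` parameter are assumptions based on the present-day ChEMBL data API, which differs from the 2013 endpoint above. Exact-match search is left out, and the helper names are hypothetical.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: base URL of the present-day ChEMBL data API, not the 2013
# endpoint listed on the slide.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def lookup_url(resource: str, chembl_id: str) -> str:
    """URL for a 'molecule', 'assay' or 'target' look-up by CHEMBL_ID."""
    return f"{BASE_URL}/{resource}/{quote(chembl_id, safe='')}?format=json"


def bioactivities_url(id_field: str, chembl_id: str) -> str:
    """URL for bioactivities filtered by a compound, assay or target id.

    id_field is assumed to be one of 'molecule_chembl_id',
    'assay_chembl_id' or 'target_chembl_id'.
    """
    query = urlencode({id_field: chembl_id, "format": "json"})
    return f"{BASE_URL}/activity?{query}"


def structure_search_url(kind: str, smiles: str, threshold: int = 70) -> str:
    """URL for a 'similarity' or 'substructure' search with a SMILES query."""
    encoded = quote(smiles, safe="")  # percent-encode '(', '=', '#', etc.
    if kind == "similarity":
        return f"{BASE_URL}/similarity/{encoded}/{threshold}?format=json"
    if kind == "substructure":
        return f"{BASE_URL}/substructure/{encoded}?format=json"
    raise ValueError(f"unsupported search kind: {kind}")
```

The builders only produce strings, so they can be checked offline; fetching a URL (for example with `urllib.request.urlopen`) is deliberately left to the caller.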


ChEMBL  KNIME  nodes  

http://tech.knime.org/book/embl-ebi-nodes

Example: All bioactivities for hERG

Activity value, assay description, compound, reference
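The four columns named above can be pulled out of a web-service response with a few lines of Python. This is a sketch under assumptions: the response is taken to be a JSON object with an `activities` list, and the field names (`standard_type`, `standard_value`, `standard_units`, `assay_description`, `molecule_chembl_id`, `document_chembl_id`) follow the present-day ChEMBL data API rather than anything shown on the slide. The sample record is invented purely to exercise the function; its identifiers are placeholders, not real ChEMBL entries.

```python
from typing import Any

# ASSUMPTION: field names as used by the present-day ChEMBL data API.
COLUMNS = (
    "standard_type",       # e.g. IC50, Ki
    "standard_value",      # activity value
    "standard_units",      # e.g. nM
    "assay_description",   # assay description
    "molecule_chembl_id",  # compound
    "document_chembl_id",  # reference
)


def activity_rows(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Flatten a decoded JSON response into one dict per bioactivity.

    Missing fields become None so every row has the same columns.
    """
    rows = []
    for record in payload.get("activities", []):
        rows.append({column: record.get(column) for column in COLUMNS})
    return rows


# Invented placeholder data, NOT a real ChEMBL record.
SAMPLE_PAYLOAD = {
    "activities": [
        {
            "standard_type": "IC50",
            "standard_value": "100",
            "standard_units": "nM",
            "assay_description": "placeholder assay description",
            "molecule_chembl_id": "CHEMBL_PLACEHOLDER",
            "extra_field": "ignored",
        }
    ]
}

rows = activity_rows(SAMPLE_PAYLOAD)
```

Each row is a plain dictionary, so the result can be written to CSV or loaded into a table node without further reshaping.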

Example:  Compound  searching  in  ChEMBL  

[Workflow figure: Query → List of nearest neighbours (NNs)]
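What a nearest-neighbour list means can be illustrated with a Tanimoto similarity ranking. The slide does not say which similarity measure or fingerprint the ChEMBL service uses, so this is only a generic sketch: fingerprints are represented as sets of "on" bit positions, and the toy fingerprints and compound names are invented.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def nearest_neighbours(query, library, threshold=0.7):
    """Return (name, similarity) pairs at or above the threshold, best first."""
    scored = ((name, tanimoto(query, fp)) for name, fp in library.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Invented toy fingerprints (sets of bit positions), not real molecules.
LIBRARY = {
    "cpd_A": frozenset({1, 2, 3, 4}),
    "cpd_B": frozenset({1, 2, 3, 5}),
    "cpd_C": frozenset({1, 2, 3}),
}
QUERY = frozenset({1, 2, 3, 4})

neighbours = nearest_neighbours(QUERY, LIBRARY, threshold=0.7)
```

With real structures the fingerprints would come from a cheminformatics toolkit; the ranking logic stays the same.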


Example: Polypharmacology profile

[Workflow figure; labels: Compounds; Query; Find NNs; Retrieve bioactivities; Filter, summarise & pivot]
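The last workflow step (filter, summarise and pivot) can be sketched in plain Python. The choices here are assumptions, not taken from the slide: activities are given as (compound, target, pActivity) tuples, values below a cut-off are filtered out, replicates are summarised by their median, and the result is pivoted into a compound-by-target table. All values are invented.

```python
from collections import defaultdict
from statistics import median


def pivot_profile(activities, min_pactivity=5.0):
    """Build {compound: {target: median pActivity}} from activity tuples."""
    grouped = defaultdict(list)
    for compound, target, pactivity in activities:
        # Filter: drop missing values and weak activities.
        if pactivity is not None and pactivity >= min_pactivity:
            grouped[(compound, target)].append(pactivity)

    profile = defaultdict(dict)
    for (compound, target), values in grouped.items():
        # Summarise replicates, then pivot targets into columns.
        profile[compound][target] = median(values)
    return {compound: dict(targets) for compound, targets in profile.items()}


# Invented toy measurements, for illustration only.
ACTIVITIES = [
    ("cpd_A", "hERG", 6.0),
    ("cpd_A", "hERG", 7.0),
    ("cpd_A", "thrombin", 4.0),
    ("cpd_B", "thrombin", 8.5),
    ("cpd_B", "hERG", None),
]

profile = pivot_profile(ACTIVITIES)
```

Each inner dictionary is one row of the polypharmacology profile: the targets a compound hits and how strongly.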

…what next?
•  Chemical space clustering & visualisation
•  (Q)SAR analysis
•  Data modeling, activity cliffs, FW, MMP analysis
•  Bioisosteric replacements mining
•  De novo design
•  Evolutionary compound optimisation
•  Target fishing
•  (Off-)target prediction and ADR analysis
•  Polypharmacology networks
•  Druggability / Drug-likeness

Download the ChEMBL KNIME workflows

…via the KNIME Example Flow Server:

OpenChEMBL  +  KNIME  


OpenChEMBL Virtual Machine
•  Packaged in a VM
   •  ChEMBL db, PostgreSQL, RDKit toolkit and cartridge
   •  Ubuntu 12.04
   •  Exported as OVF
   •  Can be imported by VirtualBox etc.
•  Provides
   •  Direct database connection
   •  Custom web interface
   •  Web services for structure search
•  Available as FTP download soon

OpenChEMBL  Interface  


Using KNIME to connect to the VM

SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> '$${SMolecule}$$'::qmol;
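The same substructure query can be issued from a script instead of a KNIME database node. The sketch below assumes a Python DB-API connection whose driver uses `%s` placeholders (as psycopg2 does); the table and column names come from the query above, while the function name and the connection details in the usage comment are hypothetical. Passing the pattern as a bound parameter replaces the KNIME flow variable and avoids pasting user input into the SQL string.

```python
# Query from the slide, with the KNIME flow variable replaced by a
# DB-API placeholder. ASSUMPTION: the driver uses the %s paramstyle.
SUBSTRUCTURE_SQL = """
SELECT md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> %s::qmol
"""


def substructure_search(connection, pattern: str):
    """Run the substructure query and return all matching rows.

    `connection` is any DB-API connection to a database that has the
    RDKit cartridge installed; `pattern` is the query structure.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(SUBSTRUCTURE_SQL, (pattern,))
        return cursor.fetchall()
    finally:
        cursor.close()


# Hypothetical usage; host, database name and credentials are placeholders:
#   import psycopg2
#   conn = psycopg2.connect(host="...", dbname="...", user="...", password="...")
#   rows = substructure_search(conn, "c1ccccc1")
```

Selecting only the identifier and property columns, rather than `mr.*`, keeps the result small; the molecule column can be added back if the structures are needed downstream.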

UniChem  +  KNIME  


UniChem:  Linking  to  other  sources  


•  All EBI DBs share the benefits of maintained links to internal and external resources
•  The ‘mapping service’ will be opened for use by external users
   http://www.jcheminf.com/content/5/1/3/abstract

[Figure: linked sources, including EU_OPENSCREEN etc.]

KNIME  +  UniChem  


ChEMBL  resources  


ChEMBL blog: http://chembl.blogspot.com

If you would like help: chembl-[email protected]

For ChEMBL news and data releases: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce

Webinars

http://www.slideshare.net/gpapadatos/knime-tutorial

Summary
•  KNIME: democratizing access to data and tools
•  Accessing public domain structure and bioactivity data with KNIME
•  ChEMBL KNIME Nodes
•  OpenChEMBL + KNIME
•  UniChem + KNIME

Acknowledgements
•  ChEMBL group
   •  Edmund Duesbury
   •  Rodrigo Ochoa
   •  Jon Chambers
   •  Mark Davies
   •  Shaun McGlinchey
   •  Anna Gaulton
   •  John Overington
•  Stephan Beisken
•  RDKit
•  KNIME
•  KNIME community
•  All of you for listening
