leveraging public domain bioactivity data with knime
TRANSCRIPT
George Papadatos, PhD Senior Technical Officer ChEMBL group [email protected]
Leveraging public domain bioactivity data with KNIME
EMBL-EBI structure
• Genomes: Ensembl, Ensembl Genomes, EGA
• Nucleotide sequence: ENA
• Functional genomics: ArrayExpress, Expression Atlas
• Protein sequences: UniProt
• Protein families, motifs and domains: InterPro
• Macromolecular: PDBe
• Protein activity: IntAct, PRIDE
• Pathways: Reactome
• Systems: BioModels
• BioSamples
• Literature and ontologies: CiteXplore, GO
• Chemogenomics: ChEMBL
  • ChEMBL database
  • Curation
  • Interface
  • Research group
• Chemical entities: ChEBI
07/03/2013 KNIME UGM 2
Outline
• Overview of ChEMBL database
  • Introduction, contents, access
• What ChEMBL does with KNIME
• What KNIME can do with ChEMBL
  • ChEMBL KNIME nodes
  • OpenChEMBL
  • UniChem
What is ChEMBL?
• Open access database for drug discovery
• Freely available: searchable and downloadable
• Contents:
  • Bioactivity data manually extracted from the primary medicinal chemistry literature
  • Deposited data from neglected disease screening (e.g. malaria)
  • Subset of data from PubChem
• Bioactivity data is associated with a biological target and a chemical structure
• Updated regularly with new data
Drug discovery process
Stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase I, Phase II, Phase III, Launch
• Target Discovery: target identification; microarray profiling; target validation; assay development; biochemistry; clinical/animal disease models
• Lead Discovery: high-throughput screening (HTS); fragment-based screening; focused libraries; screening collection
• Lead Optimisation: medicinal chemistry; structure-based drug design; selectivity screens; ADMET screens; cellular/animal disease models; pharmacokinetics
• Preclinical Development: toxicology; in vivo safety pharmacology; formulation; dose prediction
• Clinical trials: Phase I (PK, tolerability); Phase II (efficacy); Phase III (safety & efficacy)
• Launch: indication discovery & expansion
Discovery, Development, Use
Medicinal chemistry SAR, Clinical candidates, Drugs
ChEMBL database: >1,200,000 distinct compounds; ~27,000 distinct lead series; ~12,000 candidates; ~1,400 drugs
What is in ChEMBL?
SAR Data
Compound
Assay
Ki = 4.5 nM
>Thrombin MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
APTT = 11 min
Targets
Compounds
Bioactivities
[Figure: chemical structure of the example compound]
Publication
What is in ChEMBL?
ChEMBL_15
• Compounds: 1,254,575
• Assays: 679,259
• Targets: 9,570
• Publications: 48,735
• Activities: 10,509,572
• Data sources: 16
Increase of >230,000 compounds from literature since ChEMBL01
How to access ChEMBL?
1. Web interface
  • Intuitive and secure
  • Compound, assay, target search
2. SQL dumps and flat files
  • Oracle, MySQL, PostgreSQL* dumps and .sd file
3. RESTful web services
  • Exact, substructure & similarity search
  • Bioactivities for compound, assay and target id
  • https://www.ebi.ac.uk/chembldb/index.php/ws
  • KNIME examples
How KNIME is used in the group
• I/O, file conversions, data retrieval and manipulation
  • e.g. format Open Source Drug Discovery malaria data depositions
• Chemoinformatics, data modelling and visualisation
  • Ligand-based target fishing
  • Data quality assurance
  • Automated data curation
• Text mining
  • Chemical named-entity recognition
  • Document classification
• ChEMBL-likeness
RESTful Web Services for KNIME
• Compound, assay and target look-up
  • CHEMBL_ID (or UniProt Accession for targets) as input
• Bioactivities for compound, assay and target
  • CHEMBL_ID as input
• Compound searching
  • Molecular structure as input
  • Exact, similarity and substructure searching
• Advantages
  • Tighter integration of KNIME and ChEMBL data
  • No need for internal ChEMBL database or SQL queries
  • No need for a chemical cartridge
https://www.ebi.ac.uk/chembldb/index.php/ws
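Outside KNIME, the same three kinds of request (look-up by ID, bioactivities by ID, structure search) can be sketched as plain URL builders. This is only an illustration: the base URL, the resource names and the `target_chembl_id`-style filter fields are assumptions modelled on the current ChEMBL data API rather than anything shown on the slides, and `CHEMBL0000` is a placeholder ID. Check the web services documentation before relying on them.

```python
from urllib.parse import quote, urlencode

# Assumption: base URL and resource paths follow the current ChEMBL data API.
# They are not taken from the slides; verify them against the documentation.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def lookup_url(resource: str, chembl_id: str) -> str:
    """URL for one record; resource is e.g. 'molecule', 'assay' or 'target'."""
    return f"{BASE_URL}/{resource}/{quote(chembl_id)}.json"


def bioactivities_url(field: str, chembl_id: str, limit: int = 100) -> str:
    """URL for bioactivities filtered by a compound, assay or target ID.

    `field` names the filter, e.g. 'target_chembl_id' (assumed field name).
    """
    query = urlencode({field: chembl_id, "limit": limit})
    return f"{BASE_URL}/activity.json?{query}"


def similarity_url(smiles: str, threshold: int = 70) -> str:
    """URL for a similarity search; the SMILES is percent-encoded for the path."""
    return f"{BASE_URL}/similarity/{quote(smiles, safe='')}/{threshold}.json"


if __name__ == "__main__":
    # CHEMBL0000 is a placeholder, not a real identifier.
    print(lookup_url("target", "CHEMBL0000"))
    print(bioactivities_url("target_chembl_id", "CHEMBL0000"))
    print(similarity_url("CC(=O)O"))
```

The builders only assemble strings, so they can be checked without network access; fetching and paging through results is left to whichever HTTP client is at hand.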
Example: All bioactivities for hERG
Activity value, assay description, compound, reference
Example: Polypharmacology profile
Compounds
Query
• Find nearest neighbours (NNs)
• Retrieve bioactivities
• Filter, summarise & pivot
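The three workflow steps (find nearest neighbours, retrieve bioactivities, then filter, summarise and pivot) can be sketched in a few lines of plain Python. Everything here is hypothetical: the compound and target names, the fingerprints stored as sets of on-bits, the potency values on a negative log scale, and the choice of Tanimoto similarity with a 0.5 cut-off and a median summary. The KNIME workflow on the slide may use different nodes and settings.

```python
from collections import defaultdict
from statistics import median


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def nearest_neighbours(query_fp: set, library: dict, threshold: float = 0.5) -> list:
    """IDs of library compounds whose similarity to the query is >= threshold."""
    return [cid for cid, fp in library.items() if tanimoto(query_fp, fp) >= threshold]


def pivot_profile(activities: list, compound_ids: set, min_value: float = 5.0) -> dict:
    """Filter activity records, then pivot to {compound: {target: median value}}."""
    grouped = defaultdict(lambda: defaultdict(list))
    for rec in activities:
        if rec["compound"] in compound_ids and rec["value"] >= min_value:
            grouped[rec["compound"]][rec["target"]].append(rec["value"])
    return {
        cpd: {tgt: median(vals) for tgt, vals in targets.items()}
        for cpd, targets in grouped.items()
    }


# Hypothetical toy data standing in for a compound library and retrieved bioactivities.
library = {"cpd_A": {1, 2, 3, 4}, "cpd_B": {1, 2, 3, 9}, "cpd_C": {7, 8}}
activities = [
    {"compound": "cpd_A", "target": "T1", "value": 7.0},
    {"compound": "cpd_A", "target": "T1", "value": 8.0},
    {"compound": "cpd_B", "target": "T2", "value": 6.5},
    {"compound": "cpd_B", "target": "T1", "value": 4.0},
    {"compound": "cpd_C", "target": "T2", "value": 9.0},
]

neighbours = nearest_neighbours({1, 2, 3, 4}, library)
profile = pivot_profile(activities, set(neighbours))
print(profile)
```

The result is a compound-by-target table of summarised activities, which is the shape a heat map or pivot view would consume.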
…what next?
• Chemical space clustering & visualisation
• (Q)SAR analysis
• Data modeling, activity cliffs, FW, MMP analysis
• Bioisosteric replacements mining
• De novo design
• Evolutionary compound optimisation
• Target fishing
• (off-)target prediction and ADR analysis
• Polypharmacology networks
• Druggability / Drug-likeness
OpenChEMBL Virtual Machine
• Packaged in a VM
  • ChEMBL db, PostgreSQL, RDKit toolkit and cartridge
  • Ubuntu 12.04
  • Exported as OVF
  • Can be imported by VirtualBox etc.
• Provides
  • Direct database connection
  • Custom web interface
  • Web services for structure search
• Available as ftp download soon
Using KNIME to connect to the VM
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr, molecule_dictionary md, compound_properties cp
WHERE mr.m @> '$${SMolecule}$$'::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno;
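The same substructure query can also be issued from a script. The sketch below rewrites it with explicit joins and a bound parameter in place of the KNIME flow variable. It assumes a Python DB-API driver that uses `%s` placeholders (psycopg2 is one such driver), a database with the RDKit cartridge and the tables named in the query above; the function name is hypothetical and connection details are left out.

```python
# Same tables and columns as the query above; the query molecule is passed
# as a bound parameter instead of being pasted into the SQL text.
SUBSTRUCTURE_SQL = """
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> %s::qmol
"""


def substructure_search(cursor, query_molecule: str):
    """Run the substructure query on an open DB-API cursor and return all rows.

    Assumes the driver uses '%s' placeholders and that `query_molecule`
    is a string the cartridge can cast to qmol (e.g. a SMARTS pattern).
    """
    cursor.execute(SUBSTRUCTURE_SQL, (query_molecule,))
    return cursor.fetchall()
```

Binding the pattern as a parameter avoids quoting problems when the query molecule contains characters that are special in SQL strings.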
UniChem: Linking to other sources
All EBI DBs share the benefits of maintained links to internal and external resources
etc. EU_OPENSCREEN
The ‘mapping service’ will be opened for use by external users
http://www.jcheminf.com/content/5/1/3/abstract
ChEMBL resources
ChEMBL blog: http://chembl.blogspot.com
If you would like help: chembl-[email protected]
For ChEMBL news and data releases: http://listserver.ebi.ac.uk/mailman/listinfo/chembl-announce
Webinars
http://www.slideshare.net/gpapadatos/knime-tutorial
Summary
• KNIME: democratizing access to data and tools
• Accessing public domain structure and bioactivity data with KNIME
  • ChEMBL KNIME Nodes
  • OpenChEMBL + KNIME
  • UniChem + KNIME
Acknowledgements
• ChEMBL group
  • Edmund Duesbury
  • Rodrigo Ochoa
  • Jon Chambers
  • Mark Davies
  • Shaun McGlinchey
  • Anna Gaulton
  • John Overington
• Stephan Beisken
• RDKit
• KNIME
• KNIME community
• All of you for listening