The SureChEMBL KNIME Nodes
TRANSCRIPT
EMBL-EBI
17/02/2014 KNIME UGM 2014 3
Genomes & variation • Ensembl • Ensembl Genomes • Genome-phenome archive • Metagenomics
Nucleotide sequences • European Nucleotide Archive (ENA)
Expression • ArrayExpress • Expression Atlas • PRIDE • R-Workbench
Proteins • The Universal Protein Resource (UniProt) • InterPro
Chemical biology • ChEMBL • ChEBI
Literature & ontologies • Europe PubMed Central • Gene Ontology
Biomolecule structures • Protein Data Bank in Europe • PDBsum • ProFunc
Pathways • IntAct • Reactome • MetaboLights
Systems • BioModels • Enzyme Portal • BioSamples
KNIME at the EBI
• Provide KNIME training to scientists and researchers
• CDK community nodes development
• Access the ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
Overview of EMBL-EBI chemistry resources
UniChem – InChI-based resolver (full + relaxed ‘lenses’)
3rd Party Data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, mcule, FDA SRS, PharmGKB, Selleck, …
ChEMBL: Bioactivity data from literature and depositions
ChEBI: Nomenclature of primary and secondary metabolites. Chemical & Functional Ontology
Atlas: Ligand induced transcript response
PDBe: Ligand structures from structurally defined protein complexes
SureChEMBL: Molecule structures from patent literature
RDF and REST API interfaces
REST API Interface
15K 750 15M 1.5M 24K
65M
Novelty checking with UniChem
https://www.ebi.ac.uk/unichem/
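The novelty-checking idea can be sketched offline as a lookup of candidate InChIKeys against a set of already known keys. This is only an illustration under stated assumptions: the function names are hypothetical, the known keys are a local set rather than a live UniChem query, and the "relaxed" lens is approximated by comparing only the first (connectivity) block of the InChIKey, which may differ from how UniChem itself relaxes matching.

```python
from typing import Dict, Iterable, Set


def connectivity_block(inchikey: str) -> str:
    """Return the first block of an InChIKey (the part before the first hyphen)."""
    return inchikey.split("-")[0]


def novelty_check(candidates: Iterable[str], known: Iterable[str]) -> Dict[str, str]:
    """Classify each candidate key as a full match, a relaxed match, or novel.

    Hypothetical helper: 'known' stands in for keys already present in
    public sources; it is not fetched from any service here.
    """
    known_full: Set[str] = set(known)
    known_relaxed: Set[str] = {connectivity_block(k) for k in known_full}

    report: Dict[str, str] = {}
    for key in candidates:
        if key in known_full:
            report[key] = "known (full match)"
        elif connectivity_block(key) in known_relaxed:
            # Same first block, different later blocks (e.g. another stereo form).
            report[key] = "known (relaxed match)"
        else:
            report[key] = "novel"
    return report


# Illustrative strings in InChIKey format; treat them as placeholders.
known_keys = ["BSYNRYMUTXBXSQ-UHFFFAOYSA-N", "HEFNNWSXXWATRW-UHFFFAOYSA-N"]
candidate_keys = [
    "BSYNRYMUTXBXSQ-UHFFFAOYSA-N",
    "HEFNNWSXXWATRW-JTQLQIEISA-N",
    "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
]
example_report = novelty_check(candidate_keys, known_keys)
```

Keeping the full and relaxed lookups separate lets a workflow flag close relatives of a known compound for review instead of silently treating them as novel.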
SureChEMBL and KNIME
The story
• EMBL-EBI have acquired SureChem – a leading ‘chemistry patent mining’ product from Digital Science, Macmillan Group
• SureChem not aligned with core future academic business
• User base
  • Free (SureChemOpen)
  • Paying (SureChemPro)
• EMBL-EBI will support existing licensees
• Plans to provide an ongoing, free, open resource to entire community
• Rebrand to SureChEMBL
Chemistry patents?
• patere (Latin) = to lay open
• Legal and technical documents
• Disclosure of invention in exchange for exclusive rights
  • Usually 20 years
• Driver for innovation
• Most of the knowledge in (chemical) patents will never appear anywhere else
SureChEMBL System Overview
WO
EP Applications & Granted
US Applications & Granted
JP Abstracts
Patent Offices
Processed patents (service)
Name to Structure (five methods)
Image to Structure (one method)
Database
Chemistry Database
Patent PDFs (service)
Application Server
Entity Recognition
The Cloud - Amazon Web Services
Users
API
SureChem IP
SureChem System
1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-propyl-1H-pyrazolo[4,3-d]pyrimidin-5-yl)phenylsulfonyl]-4-methylpiperazine
Keyword search
Filter by authority
Structure sketch
Filter by document section
help
Paste SMILES, MOL, name
Types of chemistry search
Filter by date
http://www.surechembl.org/
help
Patent number search
Similarity searching
Reviewing the hits
From hits to patent documents
Full text patent document access
SureChEMBL KNIME Nodes
• Developed by Max Recall Information Systems GmbH
• Main functionality
  • Keyword search
    • Lucene syntax and Boolean operators
    • pa:(Bayer OR Genentech OR Merck) AND desc:(chemotherap* AND (Phosphoinositide kinase OR Pi3K))
  • Structure search
    • Additional phys/chem filters
  • Retrieve patent biblio and full text
  • Extract chemistry from patent
• Additional filters
  • Document section counts
  • Chemical corpus counts
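The keyword query on this slide combines field prefixes with Boolean groups. A minimal sketch of assembling such a string programmatically is given below; the helper names (`any_of`, `all_of`, `field`) are hypothetical and not part of the SureChEMBL nodes, while the field prefixes `pa` and `desc` and all search terms are taken from the slide.

```python
from typing import Iterable


def any_of(terms: Iterable[str]) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(clauses: Iterable[str]) -> str:
    """Join clauses with AND and wrap the group in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def field(name: str, expression: str) -> str:
    """Prefix an expression with a field name, e.g. pa:(...)."""
    return f"{name}:{expression}"


# Rebuild the example query from the slide piece by piece.
applicants = field("pa", any_of(["Bayer", "Genentech", "Merck"]))
description = field(
    "desc",
    all_of(["chemotherap*", any_of(["Phosphoinositide kinase", "Pi3K"])]),
)
query = " AND ".join([applicants, description])
```

The resulting `query` string could then be pasted into a keyword search configuration. The terms are reproduced verbatim from the slide; in standard Lucene syntax a multi-word term is usually quoted to search it as a phrase, so whether "Phosphoinositide kinase" needs quoting here is left as an open assumption.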
API authentication key
Node description
Live Demo!
• Exploring the anti-malarial landscape in US patents
More use cases within KNIME
• Chemoinformatics
  • Chemistry landscape for a particular biological target/disease
  • R-group analysis for a particular patent family claimed chemistry
  • Novelty checking
• Competitive intelligence
  • Reporting
  • Patent alerts
  • Per target/disease
• Prior art checking
• Further text-mining and annotation
• Network analysis of citations
Timeframes and plans
• About 2-3 months for full transfer of operations
• Refactor authentication system
• Consider fair use
• Future ideas for development – dependent on funding!
  • Add sequence searching
  • Add disease terms and target indexing
  • Add chemical structure tagging & search to full text content of Europe PMC
Acknowledgements
• ChEMBL group
  • John Overington
  • Mark Davies
• ChEBI group
  • Stephan Beisken
• SureChem
• MaxRecall
  • Michael Digenbach
• KNIME
• KNIME community
Any questions?
George Papadatos
Senior Technical Officer, ChEMBL group
The SureChEMBL KNIME Nodes
17/02/2014