TRANSCRIPT
Coupling AI and network biology
Generate insights for disease understanding and target identification
Alexandr Ivliev, Director of Bioinformatics
Cheng Fang, Scientific Consultant
03.06.2020
Agenda
1. Introduction
2. Coupling AI and network biology
3. High-quality biological networks in microbiome for target ID
4. Key takeaways
Introduction
3© 2020 Clarivate
Genomic revolution of the early 2000s
From individual genes to understanding the entire genome
Artificial intelligence: new ongoing revolution
Computer vision
• Self-driving cars
• Face recognition
• And more
Natural language processing
• Machine translation
• Speech analysis
• And more
Reinforcement learning
• Chess, Go, computer games
• Robotics
• And more
Revolutionizing industries
Artificial intelligence is a hot field
“Deep learning” first appeared in the Gartner Hype Cycle for Emerging Technologies in 2017
Deep learning is a big field
https://www.asimovinstitute.org/neural-network-zoo/
Computer vision and image processing
Esteva et al, Nat Med, 2019
Text mining, e.g. electronic health records
Esteva et al, Nat Med, 2019
Applications in genomics
Esteva et al, Nat Med, 2019
Networks are how biology works
Network by Martin Grandjean
• Disease mechanism understanding
• Target ID
Drug target
Disease genes
Can biological networks be coupled with deep neural networks to enable disease mechanism understanding and target ID?
Target
What approaches are you taking to understand disease mechanisms and identify novel drug targets?
a. Literature searches
b. OMICs data analysis
c. Small scale lab experiments
d. Classical machine learning
e. Deep learning
f. Other or inapplicable
Coupling AI with network biology to enable disease understanding and target ID
Problem: graphs are structurally very different from inputs in other AI solutions
Networks are quite different from texts and images
Text and sequences have linear 1D structure
“To be or not to be, that is the question”
CGT TTA GAA
Images have 2D grid structure
Networks are more complex
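A minimal sketch of why networks are harder to feed into a model than text or images: sequences and images come with a natural ordering, while the same network can be written down as many different adjacency matrices depending on how its nodes are numbered. The toy edge list and the relabelling below are illustrative assumptions, not data from the talk.

```python
import numpy as np

# Text / sequence: a 1D array with a natural left-to-right order.
sequence = np.array(list("CGTTTAGAA"))

# Image: a 2D grid with natural row and column order (toy 3x3 grayscale).
image = np.arange(9).reshape(3, 3)

# Network: no natural node order. The same toy graph (assumed edges)
# yields different adjacency matrices under different node labellings.
edges = [(0, 1), (1, 2), (2, 3)]


def adjacency(edge_list, n):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph."""
    a = np.zeros((n, n), dtype=int)
    for u, v in edge_list:
        a[u, v] = a[v, u] = 1
    return a


relabel = {0: 2, 1: 0, 2: 3, 3: 1}  # arbitrary permutation of node ids
permuted_edges = [(relabel[u], relabel[v]) for u, v in edges]

a1 = adjacency(edges, 4)
a2 = adjacency(permuted_edges, 4)

# The matrices differ even though the graph is structurally identical,
# as the unchanged degree distribution suggests.
same_matrix = np.array_equal(a1, a2)
same_degrees = sorted(a1.sum(axis=0).tolist()) == sorted(a2.sum(axis=0).tolist())
```

A model that consumes the raw adjacency matrix would therefore see two different inputs for one and the same network, which is one motivation for the embedding and graph neural network approaches that follow.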
Images and texts work fine as input for neural networks
0.123 0.224 0.851 0.401 0.486 0.298 0.694 0.887 0.653 0.101 0.696
Prediction
Solution 1: Generating biological network node embeddings using random walks
What is “embedding”?
bagel
Sequences
• Embedding = dense vector that captures important information from the input
• Embeddings can be learned automatically as opposed to feature engineering
• “Similar” objects have close embeddings
• It’s easy to use embeddings as input into AI techniques
king queen
Numeric space
word2vec
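As a small illustration of “similar objects have close embeddings”, the sketch below compares toy vectors with cosine similarity. The three-dimensional vectors are invented for illustration; real word2vec embeddings are learned from data and typically have far more dimensions.

```python
import math

# Hypothetical 3-dimensional embeddings; the values are made up.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "bagel": [0.10, 0.05, 0.95],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_king_queen = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_king_bagel = cosine_similarity(embeddings["king"], embeddings["bagel"])
```

With these toy values, "king" and "queen" point in nearly the same direction while "bagel" does not, which is the property downstream models rely on.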
Generating node embeddings
[Figure: nodes of a biological network (Node 1, Node 2, ..., Node 3,816) are mapped to points in a numeric space. Labels: Sequences / word2vec; Biological network / node2vec.]
General idea
Generating node embeddings using random walks
[Figure: example network with nodes 1, 2, 3, 4, 6, 7, 8]
Examples of random walks starting from node 3 with four steps:
3 -> 4 -> 6 -> 7 -> 8
3 -> 2 -> 3 -> 1 -> 3
3 -> 4 -> 6 -> 3 -> 4
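The walks listed above can be generated with a few lines of code. This is a minimal sketch of a uniform random walk; the adjacency list is an assumption chosen to be consistent with the three example walks, since the slide's figure is not available in the transcript.

```python
import random

# Toy undirected graph; the edge list is an assumption chosen to be
# consistent with the example walks on the slide.
graph = {
    1: [3],
    2: [3],
    3: [1, 2, 4, 6],
    4: [3, 6],
    6: [3, 4, 7],
    7: [6, 8],
    8: [7],
}


def random_walk(graph, start, num_steps, rng):
    """Uniform random walk: at each step move to a random neighbour."""
    walk = [start]
    for _ in range(num_steps):
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


rng = random.Random(0)  # fixed seed so repeated runs give the same walks
walks = [random_walk(graph, start=3, num_steps=4, rng=rng) for _ in range(3)]
for walk in walks:
    print(" -> ".join(str(node) for node in walk))
```

Each walk visits five nodes (the start plus four steps) and may revisit nodes, just as in the examples.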
Random walk sequences
3 4 6 7 8
3 2 3 1 3
3 4 6 3 4
…
Text embedding model
Node graph embeddings
Node 1: -0.01822536, 0.14636423, 0.023379749 …
Node 2: 0.10925472, 0.00750885, -0.019593006 …
…
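To make the link between walks and a text embedding model concrete, here is a sketch of the (center, context) pairs that a word2vec-style skip-gram model would be trained on when node ids are treated as words. Only the pair extraction is shown; the training step itself is omitted, and the window size of 1 is an arbitrary assumption.

```python
# Random walk "sentences" from the slide; node ids are treated as words.
walks = [
    [3, 4, 6, 7, 8],
    [3, 2, 3, 1, 3],
    [3, 4, 6, 3, 4],
]


def skipgram_pairs(walk, window):
    """Yield (center, context) pairs for nodes within `window` steps."""
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, walk[j]


pairs = [pair for walk in walks for pair in skipgram_pairs(walk, window=1)]
```

Nodes that often appear near each other in walks share many pairs and therefore tend to end up with similar vectors, while nodes that never co-occur (here 3 and 8 with a window of 1) contribute no pairs.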
Random walk variant example: Node2Vec
node2vec: Scalable Feature Learning for Networks. Grover, A., & Leskovec, J. (2016).
Breadth first / Depth first
It’s a scale of behavior controlled by two hyperparameters
Nodes in “local communities” are more similar to each other
Nodes having similar “structural roles” are more similar to each other
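A minimal sketch of the biased walk described in the node2vec paper, assuming an unweighted, undirected graph. The two hyperparameters are the return parameter `p` and the in-out parameter `q`; the toy graph and the parameter values are illustrative assumptions, and the reference implementation precomputes transition tables rather than computing weights on the fly as done here.

```python
import random

# Toy undirected graph (assumed for illustration).
graph = {
    1: [3],
    2: [3],
    3: [1, 2, 4, 6],
    4: [3, 6],
    6: [3, 4, 7],
    7: [6, 8],
    8: [7],
}


def node2vec_walk(graph, start, length, p, q, rng):
    """Second-order biased walk in the style of node2vec.

    p is the return parameter, q the in-out parameter.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = graph[current]
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbours:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)  # stay at distance 1 from the previous node
            else:
                weights.append(1.0 / q)  # move further away from it
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


rng = random.Random(0)
local_walk = node2vec_walk(graph, start=3, length=5, p=1.0, q=4.0, rng=rng)
outward_walk = node2vec_walk(graph, start=3, length=5, p=1.0, q=0.25, rng=rng)
```

In the paper's terms, a large `q` keeps the walk close to its starting neighbourhood (breadth-first-like), while a small `q` pushes it outward (depth-first-like).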
Coupling biological networks with deep neural networks to enable disease understanding and target ID
Node embeddings from random walks
Novel predicted targets
Training set of known targets
DEGs
Challenges with simple random walks
• Nodes 1 and 8 are “similar” as they have the same attributes, but they will be far from each other in random walks
• Node 9 is unreachable from any other node, yet it’s “similar” to node 6
How do we capture those non-trivial similarities?
Attributes, e.g.:
• up-regulated = yes
• protein class = kinase
• known target = yes
Nodes, e.g.:
• protein 1
• protein 2
• protein 3
How to incorporate attributes into graph embeddings
Gat2Vec
Structural graph without attributes
Bipartite attributes graph
Generate random walks on each graph independently, and supply both sets of sequences into word embeddings learning
gat2vec: representation learning for attributed graphs. Sheikh, N., Kefato, Z., & Montresor, A. (2019). Computing, 101(3), 187–209.
• Nodes 1 and 8 are close in the bipartite graph
• Node 9 is connected to other nodes in the bipartite graph
Caveats:
- Attributes can only be discrete
- Cannot use complex attributes, like SMILES, amino-acid sequences, etc.
Random walk => node ids as “words”
From structural graph: 1 -> 3 -> 2 => “1 3 2”
From attributes graph: 1 -> a -> 8 -> b -> 2 => “1 8 2”
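A minimal sketch of a walk on the bipartite attribute graph, in which the walker alternates between nodes and attribute values but only the node ids are written into the resulting “sentence”. The attribute memberships below are hypothetical, and this is an illustration of the idea rather than the gat2vec reference implementation.

```python
import random

# Hypothetical bipartite attribute graph: nodes on one side, attribute
# values ("a", "b", "c") on the other. Memberships are assumed.
node_to_attrs = {1: ["a"], 8: ["a", "b"], 2: ["b"], 6: ["c"], 9: ["c"]}

# Invert the mapping so we can step from an attribute back to a node.
attr_to_nodes = {}
for node, attrs in node_to_attrs.items():
    for attr in attrs:
        attr_to_nodes.setdefault(attr, []).append(node)


def attribute_walk(start, num_nodes, rng):
    """Walk node -> attribute -> node ..., recording only the node ids."""
    walk = [start]
    while len(walk) < num_nodes:
        attr = rng.choice(node_to_attrs[walk[-1]])
        walk.append(rng.choice(attr_to_nodes[attr]))
    return walk


rng = random.Random(0)
walk = attribute_walk(1, num_nodes=3, rng=rng)
sentence = " ".join(str(node) for node in walk)
```

Because attribute nodes are skipped in the output, two proteins that share an attribute become neighbours in the sentence even if no path connects them in the structural graph, which is how an otherwise unreachable node can still receive a meaningful embedding.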
Example application
Target prioritization using gat2vec - GuiltyTargets
Inputs:
• Protein-protein interaction network (STRING, HIPPIE)
• Discrete differential gene expression: RNASeq from different cohorts (MSBB, MayoRNASeq, ROSMAP, etc.)
• Known targets for the disease (Open Targets, Therapeutic Targets Database)
Workflow: annotated protein-protein interaction network → features obtained using Gat2Vec → positive-unlabeled learning → rank candidate targets
Best ROC AUC for different diseases ≈ 0.92-0.94
GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. Muslu, Ö., Hoyt, C. T., Hofmann-Apitius, M., & Fröhlich, H. (2019). BioRxiv, 521161.
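To illustrate the positive-unlabeled ranking step, here is a minimal sketch that treats unlabeled proteins as down-weighted negatives in a logistic regression and then ranks them by predicted score. This is a simple stand-in and not necessarily what GuiltyTargets implements; the 2-D embeddings, protein names, the weight of 0.3, and the training settings are all invented for illustration.

```python
import numpy as np

# Toy 2-D node embeddings; all values and protein names are invented.
known_targets = np.array([[2.0, 2.1], [1.8, 2.2], [2.2, 1.9]])
unlabeled = np.array([[-2.0, -1.9], [-1.8, -2.2], [2.1, 1.9], [-2.2, -2.0]])
unlabeled_names = ["protein_A", "protein_B", "protein_C", "protein_D"]

X = np.vstack([known_targets, unlabeled])
y = np.concatenate([np.ones(len(known_targets)), np.zeros(len(unlabeled))])
# Unlabeled examples are treated as negatives with a reduced weight,
# because some of them may be undiscovered targets.
sample_weight = np.where(y == 1, 1.0, 0.3)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):  # plain gradient descent on the weighted logistic loss
    error = sample_weight * (sigmoid(X @ w + b) - y)
    w -= 0.1 * (X.T @ error) / sample_weight.sum()
    b -= 0.1 * error.sum() / sample_weight.sum()

# Score only the unlabeled proteins and rank them, best candidate first.
scores = sigmoid(unlabeled @ w + b)
ranking = [unlabeled_names[i] for i in np.argsort(-scores)]
```

In this toy setup the unlabeled protein whose embedding sits among the known targets ranks first, which is the behaviour a target prioritization method is after.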
Coupling biological networks with deep neural networks to enable disease understanding and target ID
[Figure: node embeddings from random walks in (1) the structural graph and (2) the attribute graph, together with a training set of known targets, lead to novel predicted targets. Attribute labels: Kinases, GWAS hits, DEGs.]
Solution 2: Building artificial neural nets to structurally reflect a biological network of interest
Graph neural networks
[Figure: example inputs (molecule, physical path, text such as “The little cat looks lovely.”) feeding into a graph neural net]
Graph neural networks for target ID – one approach
A node’s neighborhood defines a computational graph
Features on node A
Features on node C
Any differentiable function that aggregates multiple vectors into one
The beauty is: it’s all a differentiable computational graph that can be optimized using backpropagation.
“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”
Node features, e.g.:
• protein class
• druggability
• genetic link
• differential expression
Aggregation functions, e.g.:
• A + C
• A – 0.34 * C
• A * (A + 1.4 * C)
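A minimal sketch of one neighbourhood-aggregation step, assuming mean aggregation followed by a linear transform and a ReLU. The features, the tiny graph, and the randomly initialised weight matrices are illustrative assumptions; in a real model the weights would be learned by backpropagation.

```python
import numpy as np

# Hypothetical node features (e.g. druggability, differential expression).
features = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.0, 1.0]),
    "C": np.array([1.0, 1.0]),
}
neighbours = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}

rng = np.random.default_rng(0)
W_self = rng.normal(size=(2, 2))  # would be trained in a real model
W_neigh = rng.normal(size=(2, 2))  # would be trained in a real model


def gnn_layer(features, neighbours):
    """One message-passing step: mean-aggregate neighbours, then transform."""
    updated = {}
    for node, h in features.items():
        aggregated = np.mean([features[n] for n in neighbours[node]], axis=0)
        # ReLU over the combined self and neighbourhood signals
        updated[node] = np.maximum(0.0, W_self @ h + W_neigh @ aggregated)
    return updated


hidden = gnn_layer(features, neighbours)
```

The mean is only one choice of aggregator; any differentiable function that maps several vectors to one would fit the same slot, and stacking several such layers lets information travel across more than one hop.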
BIOLOGICAL NETWORK
Graph neural network for target ID
• Every node has its own unique computational graph defined by the biological network structure
• These computational graphs are neural networks that can be trained using standard AI techniques
“Deep Learning for Network Biology -- snap.stanford.edu/deepnetbio-ismb -- ISMB 2018”
BIOLOGICAL NETWORK
COMPUTATIONAL GRAPH = ARTIFICIAL NEURAL NETWORK FOR EACH NODE
A B C D E F
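To show how every node gets its own computational graph, the sketch below unrolls each node's neighbourhood into a two-level tree. The edges among nodes A to F are hypothetical, since the slide's figure is not recoverable from the transcript.

```python
# Hypothetical edges among the nodes A-F shown on the slide.
network = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E", "F"],
    "D": ["A"],
    "E": ["C", "F"],
    "F": ["C", "E"],
}


def computation_tree(network, node, depth):
    """Unroll the neighbourhood of `node` into a nested (node, children) tree."""
    if depth == 0:
        return (node, [])
    children = [computation_tree(network, n, depth - 1) for n in network[node]]
    return (node, children)


trees = {node: computation_tree(network, node, depth=2) for node in network}
```

Here node D, with a single neighbour, gets a narrow tree, while node C gets a much wider one. In a graph neural network each level of such a tree corresponds to one aggregation layer, with weights shared across all nodes.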
Example: Decagon algorithm
Modeling polypharmacy side effects with graph convolutional networks
Modeling polypharmacy side effects with graph convolutional networks. Zitnik, M., Agrawal, M., & Leskovec, J. (2018).
These methods open the doors to coupling AI with biological networks
Targets
Indications
Mechanisms
Key challenges
Garbage in – garbage out
• The need for high-quality networks
• And large high-quality training sets
Knowledge bias
• It’s hard to predict completely unknown from the known
Model interpretation
• Opening the “black box”
How optimistic are you that AI will transform pharma R&D in 5 years?
a. It'll revolutionize research
b. It'll yield incremental advances
c. It won't solve any of the major challenges
d. Other
Curating high-quality networks in microbiome for target ID
High-quality biological networks in microbiome for target ID
“The human microbiome and why the solution for all disease lies within our own gut” Nov 2017
Why should we care about the microbiome?
Its importance has long been recognized (first described 1700 years ago!) and used in medical practice: FMT (fecal microbiota transplant). Fecal transplantation is performed as a treatment for recurrent C. difficile colitis infection (CDI). C. difficile colitis, a complication of antibiotic therapy, may be associated with diarrhea, abdominal cramping and sometimes fever.
Adverse effects are poorly understood.
Why should we care about the microbiome: a new era
The microbiome is implicated in health and diseases:
• IBD and Crohn’s disease
• Obesity & Diabetes
• Immune functions & malfunction
• Autoimmune diseases
• Cardiovascular diseases
• Neurological diseases
• Oncology
Cani, P. Nat Rev Gastroenterol Hepatol 14, 321–322 (2017).
Source: Cortellis Drug Discovery Intelligence
Active drug development
• 781 drugs and biologics under development
• 73 in clinical trials
Source: Cortellis Drug Discovery Intelligence
Active drug development
• Vast majority of the microbiome drugs in clinical trials have no specific mechanisms
• Sodium oligo-mannurarate, extracted from algae, developed at Shanghai Green Valley to treat mild to moderate AD.
• Sibofimloc is an inhibitor of type 1 fimbrial adhesin from E. coli. It’s in Phase I for IBD.
Mechanism of action matters in drug development
Drug development pipeline:
Drug Discovery (5,000-10,000 compounds) → Preclinical (~250 compounds) → IND Submitted → Clinical Trial (<5 compounds) → NDA Submitted → Regulatory Review → APPROVAL → Marketing, Manufacturing, Post-market Surveillance (Ph. IV)
Other labels: Translational, Precision Medicine, API Synthesis, Scale-up to MFG
Clinical trial phases:
PHASE I: 20 – 100 volunteers; Safety
PHASE II: 100 – 500 patients; Safety/Dosing
PHASE III: 1,000 – 5,000 patients; Efficacy, Adverse Events
Among 640 novel therapeutics of Phase 3 clinical trials (1998-2008), 344 (54%) failed in clinical development, 230 (36%) were approved by the US Food and Drug Administration (FDA), and 66 (10%) were approved in other countries but not by the FDA. Most products failed due to inadequate efficacy (n = 195; 57%), while 59 (17%) failed because of safety concerns and 74 (22%) failed due to commercial reasons.
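The percentages quoted above can be checked with a few lines of arithmetic. The counts are taken from the slide; note that the three listed failure reasons account for 328 of the 344 failures, so a small remainder is not broken down here.

```python
total = 640
failed = 344
approved_fda = 230
approved_elsewhere = 66

# The three outcomes partition the 640 therapeutics.
assert failed + approved_fda + approved_elsewhere == total


def pct(part, whole):
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


outcome_shares = {
    "failed": pct(failed, total),
    "approved_fda": pct(approved_fda, total),
    "approved_elsewhere": pct(approved_elsewhere, total),
}

failure_reasons = {"efficacy": 195, "safety": 59, "commercial": 74}
# Failure-reason percentages are relative to the 344 failures, not to 640.
reason_shares = {name: pct(n, failed) for name, n in failure_reasons.items()}
unexplained_failures = failed - sum(failure_reasons.values())
```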
Hwang et al. Dec. 2016 JAMA Internal Medicine
How do we leverage AI to understand MoA and identify new targets in microbiome?
Novel predicted targets
Understanding biology is critical for target ID
• A microbe-host interaction network could be used to:
– Uniquely identify potential microbial effectors that target distinct host nodes or interfere with endogenous host interactions
– Determine how mutations on either host or microbial proteins affect the interaction
– Delineate pathogenic mechanisms and thereby help maximize beneficial therapeutics
Types of microbe-host interactions
• Microbe-host protein-protein interactions (1)
• MAMP (microbe-associated molecular pattern) – host protein-protein interactions
• Microbial metabolite – host protein interactions (2)
• Microbe-microbe interactions (protein-protein or protein-metabolite)
(1) approx. 16,000 publications
(2) approx. 10,000 publications
Microbiome publications are growing fast
Source: Clarivate Analytics Web of Science, using title search terms (human microbiota, human microbiome, microbiome, human microbial, human microbes, or gut ecology).
Microbiome publications over time
A database to capture microbial-host interactions is needed for better understanding the biology
Ideal literature curation workflow
1. Define project: define curation template, inclusion/exclusion criteria and prioritization strategy
2. Construct search strings: find relevant articles for review
3. Review and prioritize abstracts: manual review and prioritization based on inclusion/exclusion criteria
4. Acquire data and articles
5. Annotate and curate articles
6. QC and format for delivery
• Experience in biomedical literature monitoring
• Controlled vocabularies and public database IDs
• Knowledge in development of biological databases
Manual curation ensures high quality
How is a database like this constructed?
A solution for interactome reconstruction, data management and integration
Literature curation and database construction
Curator
• Annotates
• Enriches data
• Quality control
Articles and data
• Metabolite-host interactions
• Microbe-microbe interactions
• And more
Administrator
• Design
• Development
• Maintenance
User query
Summary statistics
Interaction networks
• Table of interactions
• Access to related articles
• And more
Public data sources
Proprietary data sources
Database of Microbiome-Host Interactions (DoMI)
User interface
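As a rough illustration of how curated interactions might be stored and queried, here is a sketch using an in-memory SQLite table. The schema, column names, and reference placeholders are hypothetical and do not describe the actual DoMI design; the two rows are well-known interactions used purely as examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE interaction (
           source TEXT, source_type TEXT,
           target TEXT, effect TEXT, reference TEXT)"""
)
# Illustrative rows; reference ids are placeholders, not real citations.
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?, ?, ?)",
    [
        ("LPS", "MAMP", "TLR4", "activation", "REF_PLACEHOLDER_1"),
        ("Butyrate", "metabolite", "GPR109A", "activation", "REF_PLACEHOLDER_2"),
    ],
)


def interactions_for(host_protein):
    """Return (source, effect) rows for interactions targeting a host protein."""
    rows = conn.execute(
        "SELECT source, effect FROM interaction WHERE target = ?", (host_protein,)
    )
    return rows.fetchall()


hits = interactions_for("TLR4")
```

Keeping a reference column on every row mirrors the slide's point that each interaction should remain traceable to the curated article it came from.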
Example interactome reconstruction
[Figure: example interactome between MICROBIOME and HOST. Data type labels: MetaB, MetaG, RNA-Seq, BGC, Taxonomy, KO, COG. Node labels: LPS, TLR4, TRAM, IRF3, IFNB, ACT, IKKE, FHA, CD11B, CD18, TBK1, CASP7, CASP3, Butyrate, GPR109A, IL10, IL6, NOS2, IL12, MYD88, TRAF6, NFKB, TlpA. Legend: activation, inhibition, metabolite, protein.]
The microbial-host interaction database will help leveraging AI
[Figure: nodes of the host-microbiome network (Node 1, Node 2, ..., Node 3,816) are mapped to points in a numeric space, leading to novel predicted targets. Labels: Sequences / word2vec; Host-microbiome network / node2vec.]
Key takeaways
• AI promises significant advances in the data-rich biomedical field
• Biological networks are different from common AI inputs, but approaches have emerged to feed biological networks into AI techniques
• Manual curation remains important for creating high-quality biological networks and training sets for AI
• Time will tell how much transformation versus incremental progress AI will bring to pharma R&D
Q&A
© 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.
Interested in learning more about Clarivate’s drug discovery consulting services? Visit our website to learn more.
Alexandr Ivliev
Cheng Fang