2018 presentation montréal_handouts
TRANSCRIPT
![Page 1: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/1.jpg)
PREDICTING AND UNDERSTANDING NETWORKS USING GRAPH EMBEDDING
Michiel Stock michielstock
1
KERMIT
![Page 2: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/2.jpg)
RATIONALISM VS EMPIRICISM
Rationalism “the view that regards reason as the chief source and test of knowledge”
Historically associated with mathematics and physics.
Deduction: A -> B
Plato René Descartes
Empiricism “a theory that states that knowledge comes only or primarily from sensory experiences”
Historically associated with biology, chemistry, geology…
Experimentation
Aristotle John Locke
2
![Page 3: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/3.jpg)
RATIONALISM: BUILDING THEORIES
3
THEORY
Theory of evolution
Allometric scaling: $Y = Y_0 M^a$
Central dogma of molecular biology
![Page 4: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/4.jpg)
EMPIRICISM: COLLECTING DATA
4
DATA Dataism
![Page 5: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/5.jpg)
GRAPHS AND NETWORKS: THE BEST OF BOTH WORLDS
A graph G=(V, E) consists of a set of vertices V together with a set of edges E representing connections between the vertices.
Can be interpreted as a mechanistic model. Draws from a very rich body of mathematical theory.
Can be experimentally determined. Data structure.
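As a minimal sketch of the "data structure" view, the definition G=(V, E) can be written down directly as two sets. The species names and the helper `adjacency` are hypothetical and only serve as illustration.

```python
# Hypothetical toy pollination network; the species names are illustrative only.
vertices = {"bee", "fly", "clover", "daisy"}
edges = {("bee", "clover"), ("bee", "daisy"), ("fly", "daisy")}


def adjacency(vertices, edges):
    """Build an undirected adjacency list from G = (V, E)."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    return neighbours


neighbours = adjacency(vertices, edges)
# The degree of a vertex is the number of edges it takes part in.
degree = {v: len(n) for v, n in neighbours.items()}
```

The same two sets can come from an experiment (observed interactions) or from a mechanistic model, which is why graphs sit between both worlds.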
5
a graph
![Page 6: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/6.jpg)
NETWORKS IN SCIENCE
systems biology
social networks
food flavour network
6
![Page 7: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/7.jpg)
ECOLOGICAL NETWORKS
7
Networks in ecology:
parasitism, food webs, pollination
Sampling of species interaction networks:
![Page 8: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/8.jpg)
ANALYSIS OF NETWORK DATA
8
![Page 9: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/9.jpg)
MY STORY
➤ Bioscience engineer (cellular biotech)
➤ Into biology, not into experimenting
➤ PhD in machine learning
➤ past focus: molecular biology
➤ current focus: ecological networks
9
![Page 10: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/10.jpg)
MY RESEARCH PROJECT: MACHINE LEARNING FOR ECOLOGICAL NETWORKS
10
Understand ➤ What are species doing?
➤ How do we compare networks?
➤ How can we extract numerical features?
Predict ➤ Find missing interactions.
➤ Changes in time and space.
➤ Uncertainty in predictions?
Control / manage ➤ Effective monitoring.
➤ How to increase productivity/stability?
➤ Encourage/discourage interactions.
![Page 11: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/11.jpg)
MACHINE LEARNING
The branch of computer science concerned with giving computers the ability to learn without being explicitly programmed.
11
The study and development of algorithms that can detect stable patterns in finite data sets.
![Page 12: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/12.jpg)
SUPERVISED LEARNING
Given a data set of labeled examples, find a function f(.) to predict the label of new data points.
12
Example: detecting animals on camera trap images
➤ hog
➤ human
➤ deer
➤ empty
[Figure: regression (y against x) and classification (x1 against x2)]
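A minimal sketch of supervised classification, assuming each image has already been reduced to two numerical features (x1, x2). The feature values, the labels attached to them and the choice of a nearest-neighbour rule are all illustrative assumptions, not the method used in the talk.

```python
import math


def nearest_neighbour(train, x_new):
    """Predict the label of x_new as the label of the closest training point."""
    _, label = min(train, key=lambda pair: math.dist(pair[0], x_new))
    return label


# Hypothetical labeled examples: ((x1, x2), label).
train = [
    ((0.9, 0.1), "deer"),
    ((0.8, 0.2), "deer"),
    ((0.1, 0.9), "hog"),
    ((0.2, 0.7), "hog"),
]

# f(.) applied to a new, unlabeled data point.
prediction = nearest_neighbour(train, (0.85, 0.15))
```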
![Page 13: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/13.jpg)
UNSUPERVISED LEARNING
Find simple structure in a complex data set. Often dimension reduction or clustering.
Finally, it should be noted that the computational complexity, and thus the running time, differs greatly between algorithms. While t-SNE based methods and SPADE are typically only able to process a few tens of thousands of cells, methods such as FlowSOM scale to much larger datasets than the other visualization algorithms, allowing the algorithm to efficiently process millions of cells.
[Figure 2 (Nature Reviews | Immunology), panels: a SPADE, b FlowSOM, c t-SNE, d Scaffold map. Markers shown: MHC class II, CD19, CD64, autofluorescence, CD3, NK1.1, CD11c, CD11b, LY6G. Cell populations labelled: MEP, CMP, GMP, CLP, long-term HSC, short-term HSC, mast cells, monocytes, neutrophils, macrophages, mDCs, pDCs, NK cells, NKT cells, CD4+ T cells, CD8+ T cells, γδ T cells, B cells, basophils, eosinophils.]
Figure 2 | Marker visualization of mouse splenocytes. a–c | Visualization of mouse splenocytes using SPADE (spanning tree progression of density normalized events), FlowSOM (flow cytometry data analysis using self-organizing maps) and t-SNE (t-stochastic neighbour embedding). SPADE uses density-based downsampling and hierarchical clustering to group similar cells, which are visualized in a minimal spanning tree. FlowSOM also uses a minimal spanning tree but does not use subsampling, and it clusters the cells using a self-organizing map. By contrast, methods based on t-SNE (such as viSNE and ACCENSE) do not cluster the cells but show each cell individually in two new dimensions that take similarities in all the original dimensions into account. For SPADE and t-SNE, a subplot is shown for each individual marker, in which the colour is more saturated for higher expression levels. By comparing the different subplots, the cell type can be determined. FlowSOM uses pie charts, combining all markers in a single plot. The height of each part indicates the expression level. Owing to the density-based subsampling, SPADE analysis will even out the distribution of the different cell types. Although FlowSOM does not do this, it is still able to distinguish populations as small as 0.7% (such as neutrophils (which are CD11b+LY6G+) in this dataset), while at the same time running almost two orders of magnitude faster (9 seconds versus 700 seconds on a single-threaded processor). FlowSOM also offers additional visualization options, such as the original self-organizing map grid or a t-SNE mapping of the nodes. All cells were used by SPADE and FlowSOM, but owing to computational limitations only 10,000 cells were processed using t-SNE. d | Visualization of scaffold maps for the mouse immune reference data set from REF. 53. Code to replicate these figures is available at https://github.com/saeyslab/FlowCytometryScripts.
CLP, common lymphoid progenitor; CMP, common myeloid progenitor; GMP, granulocyte–monocyte progenitor; HSC, haematopoietic stem cell; mDC, myeloid dendritic cell; MEP, megakaryocyte–erythroid progenitor; NK, natural killer; NKT, natural killer T; pDC, plasmacytoid dendritic cell.
Saeys et al. (2016) “Computational flow cytometry: helping to make sense of high-dimensional immunology data”, Nature Reviews Immunology, volume 16, July 2016, p. 456, www.nature.com/nri
13
![Page 14: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/14.jpg)
LEARNING FROM/ON GRAPHS
14
VS
typical dataset
![Page 15: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/15.jpg)
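PAIRWISE DATA (TWO OBJECTS AND A LABEL)
15
Each observation is a triplet (person, book, label), with label 0 or 1:
(person, book, 1)
(person, book, 1)
(person, book, 0)
(person, book, 0)
(person, book, 1)
(person, book, 1)
…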
![Page 16: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/16.jpg)
A PAIRWISE MODEL
16
‘Learn’ a function f(person, book) on pairs based on observed data, such that a high score indicates that someone would be interested in a book.
Main idea: combine the ‘description’ of the objects (person and book) into a large pairwise feature vector.
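One common way to build such a pairwise feature vector is the Kronecker product of the two descriptions, followed by a linear scoring function. The slide does not say which combination is used, so this is a sketch under that assumption; the feature values and weights are made up for illustration.

```python
def kronecker_features(u, v):
    """All pairwise products u_i * v_j, flattened into one feature vector."""
    return [ui * vj for ui in u for vj in v]


def score(weights, u, v):
    """Linear pairwise model: f(u, v) = <w, u (x) v>."""
    return sum(w * x for w, x in zip(weights, kronecker_features(u, v)))


person = [1.0, 0.0]      # hypothetical description of a person
book = [0.5, 2.0, 1.0]   # hypothetical description of a book

# Hypothetical weights; in practice these are learned from the labeled pairs.
weights = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]

pair_vector = kronecker_features(person, book)
pair_score = score(weights, person, book)
```

A description of length p and one of length q give a pairwise vector of length p·q, which is why the slide calls it "large".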
![Page 17: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/17.jpg)
FINDING INTERACTING SPECIES = FINDING INTERESTING BOOKS
17
For example: plant-pollinator interactions
pollinators => persons
plants => books
pollination => reading
![Page 18: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/18.jpg)
PAIRWISE LEARNING TO FIND FALSE NEGATIVES IN SPECIES INTERACTIONS
18
$\text{precision} = \dfrac{\#\ \text{interactions}}{\text{size top}}$
Improvement compared to random selection
Stock et al. (2017)
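The evaluation can be sketched as follows, assuming every candidate pair has a model score and a 0/1 label that says whether the interaction turned out to be real. The function names and the toy numbers are hypothetical; the actual protocol of Stock et al. (2017) may differ in its details.

```python
def precision_at_top(scores, labels, k):
    """Fraction of true interactions among the k highest-scoring pairs."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k


def improvement_over_random(scores, labels, k):
    """Precision in the top k divided by the expected precision of a random pick."""
    baseline = sum(labels) / len(labels)
    return precision_at_top(scores, labels, k) / baseline


# Hypothetical scores for six candidate pairs and their true labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 0, 1, 0, 0, 0]

top2 = precision_at_top(scores, labels, 2)
gain = improvement_over_random(scores, labels, 2)
```

A ratio above 1 means that checking the top of the ranked list finds more interactions than checking the same number of randomly selected pairs.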
![Page 19: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/19.jpg)
GRAPH EMBEDDING: ENCODING-DECODING
19
Graphs are complex! Can we extract a numerical representation from them that we can easily use in conventional machine learning?
z
neighbourhood prediction
classification
visualisation
Inspired by Hamilton et al. (2017)
![Page 20: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/20.jpg)
PROBABILISTIC MATRIX FACTORIZATION FOR NETWORKS
20
Find a low-dimensional representation of the species, such that the inner product corresponds to the probability of interacting.
$Y = P \approx \sigma(U V^\top)$
$\sigma$ is the logistic map: squeezes input to [0,1]
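A minimal sketch of the idea, assuming the factors are fitted by stochastic gradient ascent on the Bernoulli log-likelihood with a small L2 penalty. The optimiser, the hyperparameters and the toy matrix are assumptions for illustration, not necessarily the procedure behind the results in the talk.

```python
import math
import random


def sigmoid(x):
    """Logistic map: squeezes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(Y, rank=2, epochs=500, lr=0.1, reg=0.01, seed=0):
    """Fit low-dimensional factors U, V such that sigmoid(U V^T) approximates Y."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in Y]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in Y[0]]
    for _ in range(epochs):
        for i, row in enumerate(Y):
            for j, y in enumerate(row):
                # Gradient of the log-likelihood with respect to the inner product.
                err = y - sigmoid(dot(U[i], V[j]))
                for k in range(rank):
                    u, v = U[i][k], V[j][k]
                    U[i][k] += lr * (err * v - reg * u)
                    V[j][k] += lr * (err * u - reg * v)
    return U, V


def probabilities(U, V):
    """Interaction probability for every pair of species."""
    return [[sigmoid(dot(u, v)) for v in V] for u in U]


# Hypothetical interaction matrix with two modules of species.
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
U, V = factorize(Y)
P = probabilities(U, V)
```

Each row of U and of V is the embedding of one species. Entries of P that are high while the corresponding entry of Y is 0 are candidates for missing interactions.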
![Page 21: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/21.jpg)
PROBABILISTIC MATRIX FACTORIZATION REVEALS STRUCTURES
21
![Page 22: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/22.jpg)
NETWORK PROPERTIES ARE RETAINED
22
![Page 23: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/23.jpg)
DETECTING (MISSING) INTERACTIONS
23
![Page 24: 2018 presentation montréal_handouts](https://reader036.vdocuments.mx/reader036/viewer/2022062523/5a6d62967f8b9af2418b5483/html5/thumbnails/24.jpg)
IN CONCLUSION
24
data = awesome
networks connect different disciplines
algorithms needed to analyse these networks