network genetics
Francesco Gadaleta, University of Liege - Montefiore Institute
Select and Connect: the benefits of building networks in genetics
Francesco Gadaleta, PhD - Montefiore Institute, ULg
OUTLINE
• Genetics and Networks
• Variable selection with penalised regression
• Application to Gene Expression Profiles
• Demo
Given a set of microarray experiments,
HOW do we select covariates and build networks?
EXPLAIN: gene interaction, phenotype
PREDICTOR: best SNP selection, survival
A ? B
ASSOCIATION
CORRELATION
CAUSALITY
REGULATION
PENALISED REGRESSION
least squares, elastic net, ridge regression, fused, group, quadratic programming,
hierarchical, LASSO, nonlinear, penalised, multivariate linear regression, gradient descent
OPTIMISATION PROBLEMS IN MACHINE LEARNING
Def. convex set X
Def. convex function
Optimisation problem
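The formulas for these three items did not survive in the transcript. As a reminder only, and assuming the slide used the usual textbook forms, the standard definitions read as follows (the symbols f, x, y and θ are chosen here for illustration):

```latex
% Convex set X: the segment between any two points of X stays inside X
\forall x, y \in X,\ \forall \theta \in [0, 1]: \quad
  \theta x + (1 - \theta) y \in X

% Convex function f defined on a convex set X
f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)
  \quad \forall x, y \in X,\ \forall \theta \in [0, 1]

% Optimisation problem
\min_{x \in X} f(x)
```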
Where’s the min?
Follow the GRADIENT
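As an illustration of following the gradient to the minimum, here is a minimal gradient descent sketch. The test function, step size and stopping tolerance are assumptions made for the example and do not come from the slides.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimise a differentiable convex function given its gradient."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # Stop when the gradient is numerically zero: for a convex
        # function such a point is a global minimum.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient, i.e. downhill.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical example: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimum at (3, -1)
def grad_f(v):
    x, y = v
    return [2.0 * (x - 3.0), 4.0 * (y + 1.0)]


x_min = gradient_descent(grad_f, [0.0, 0.0])
print(x_min)
```

With a fixed step the iterates contract towards the minimum at a constant rate; on badly conditioned problems that rate can become very slow.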
WHY CONVEXITY?
‣ local minima are also global minima
‣ fast convergence
Introduce gradients, subgradients and epigraphs
…some proofs here
gradient/coordinate descent
OPTIMISATION METHODS FOR
GRADIENT DESCENT: works fine but can be slow
COORDINATE DESCENT: cycle through each predictor in turn, compute residuals
convex AND differentiable / convex
PATHWISE COORDINATE DESCENT: start with large λ (sparse model), apply COORDINATE DESCENT, decrease λ (zero coordinates stay zero!)
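The three methods above can be sketched for the LASSO as follows. This is a minimal illustration, not the speaker's implementation. It assumes the objective (1/2n)||y - Xb||^2 + λ||b||_1 with no intercept, and the function names, the λ grid and the toy data are all hypothetical. The sketch warm-starts each fit from the previous solution but does not force zero coefficients to stay zero.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, beta=None, n_sweeps=200, tol=1e-8):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):  # cycle through each predictor in turn
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of predictor j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * old
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            if beta[j] != old:
                resid -= X[:, j] * (beta[j] - old)  # keep residuals in sync
                max_change = max(max_change, abs(beta[j] - old))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, n_lambdas=20, ratio=1e-2):
    """Pathwise coordinate descent: largest lambda first, warm starts after."""
    n = X.shape[0]
    lam_max = np.max(np.abs(X.T @ y)) / n  # above this, all coefficients are 0
    lambdas = lam_max * np.logspace(0, np.log10(ratio), n_lambdas)
    beta = np.zeros(X.shape[1])
    path = []
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta=beta)  # warm start from previous fit
        path.append(beta.copy())
    return lambdas, np.array(path)


# Hypothetical toy data: 10 predictors, only two of them active
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
true_beta = np.zeros(10)
true_beta[[0, 3]] = [2.0, -1.5]
y = X @ true_beta + 0.1 * rng.standard_normal(100)

lambdas, path = lasso_path(X, y)
print(np.round(path[-1], 2))
```

Each row of `path` holds the coefficients for one value of λ, so the first rows are the sparsest models and later rows let more predictors enter.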
min (objective formula lost in extraction)
annotated terms: response, gene matrix, covariance matrix (association), penalty (sparsity)
MULTICOLLINEARITY
(diagram: nodes y, B, C, D)
lack of independence
presence of interdependence
least squares regression fails
approach singularity
explodes
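A small numerical illustration of why least squares breaks down under multicollinearity. It assumes that the matrix approaching singularity is the Gram matrix X^T X, which the transcript does not state. The two predictors B and C and the noise levels are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
b = rng.standard_normal(n)  # hypothetical predictor B


def gram_condition(noise_sd):
    """Condition number of X^T X when predictor C equals B plus noise."""
    c = b + noise_sd * rng.standard_normal(n)
    X = np.column_stack([b, c])
    return np.linalg.cond(X.T @ X)


# The smaller the noise, the closer C is to a copy of B
for sd in (1.0, 0.1, 0.001):
    print(f"noise sd {sd:>6}: cond(X^T X) = {gram_condition(sd):.3e}")
```

As C becomes a near copy of B, the condition number grows by orders of magnitude, so inverting X^T X amplifies small perturbations in the data into large changes in the estimated coefficients.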
NETWORK CONSTRUCTION
matrix of regression coefficients
symmetric
not symmetric
(diagram: three variants of the edge between A and B)
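One possible way to turn a matrix of regression coefficients into a network is sketched below. The convention (row = response gene, column = predictor gene) and the "and"/"or" symmetrisation rules are assumptions for illustration, since the transcript does not say which rule the talk used.

```python
def directed_edges(B, genes, tol=1e-8):
    """Edge j -> i when gene j has a non-zero coefficient in the model of gene i."""
    return {
        (genes[j], genes[i])
        for i, row in enumerate(B)
        for j, coef in enumerate(row)
        if i != j and abs(coef) > tol
    }


def undirected_edges(B, genes, rule="or", tol=1e-8):
    """Symmetrise: keep A - B if either ('or') or both ('and') directions exist."""
    p = len(genes)
    edges = set()
    for i in range(p):
        for j in range(i + 1, p):
            a, b = abs(B[i][j]) > tol, abs(B[j][i]) > tol
            if (a or b) if rule == "or" else (a and b):
                edges.add((genes[i], genes[j]))
    return edges


# Hypothetical 3-gene coefficient matrix (row = response, column = predictor)
genes = ["A", "B", "C"]
B = [
    [0.0, 0.8, 0.0],  # model of A selects B
    [0.7, 0.0, 0.0],  # model of B selects A
    [0.0, 0.5, 0.0],  # model of C selects B, but B's model does not select C
]

print(directed_edges(B, genes))
print(undirected_edges(B, genes, rule="and"))
print(undirected_edges(B, genes, rule="or"))
```

The "and" rule gives a sparser, more conservative network, while the "or" rule keeps every edge that any single regression selected.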
NETWORK VALIDATION
GeneNetWeaver
‣ generates synthetic microarray (μA) data from a regulatory network
‣ several conditions (simulation of μA noise, network perturbations, time series, generation of samples using multifactorial equations)
‣ Gold Standard (GS): directed, unweighted, signed network, based on the transcription factor network (TFN) of E. coli [1]
SYNTHETIC DATA GENERATION
‣ Used noise-free, multifactorial, GS-based networks:
50 nodes with 3 regulators (TFs)
200 nodes with 10 regulators (TFs)
SELECT AND CONNECT IN ACTION
REAL NETWORK
200 nodes, 212 connections
PREDICTED NETWORK
200 nodes, 360 connections
DEGREE CORRELATION
86%
(figure annotations: false positives; predicted hubs correctly detected)
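The transcript does not say how the degree correlation was computed. One plausible reading, shown here purely as an assumption, is the Pearson correlation between the node degrees of the real and the predicted network. The toy edge lists below are invented for the example.

```python
from collections import Counter
from math import sqrt


def degrees(edges, nodes):
    """Degree of every node in an undirected edge list (isolated nodes get 0)."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return [counts[v] for v in nodes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical toy networks on the same five nodes
nodes = ["g1", "g2", "g3", "g4", "g5"]
real = [("g1", "g2"), ("g1", "g3"), ("g1", "g4")]
predicted = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g4", "g5")]

r = pearson(degrees(real, nodes), degrees(predicted, nodes))
print(f"degree correlation: {r:.2f}")
```

A high value means that nodes with many connections in the real network (hubs) also have many connections in the predicted one, even when individual predicted edges are false positives.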
BETWEENNESS CORRELATION
83%
COMPUTATIONAL COMPLEXITY
:-) LASSO is quadratic programming (polynomial)
:-( Time complexity of iterative convex optimisation is tricky to analyse (it depends on a convergence criterion)
:-| Coordinate descent requires O(np) operations per cycle through the p predictors (n samples)
PARALLEL COMPUTATIONS
(permutation tests)
200 genes, 400 permutations, 5 cpus
100 | 100 | 100 | 100 (permutations split into four blocks)
as fast as 100 permutations
200 genes, 400 permutations, 40 cpus
gene blocks: 1:25 | 25:50 | … | 175:200
each gene block: 100 | 100 | 100 | 100 permutations
as fast as 25 genes and 100 permutations
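The splitting scheme above can be sketched as a grid of independent tasks, one per (gene block, permutation block) pair. This is an illustrative sketch only. The worker below is a placeholder that counts model fits instead of running them, all function names are hypothetical, and blocks are written as 0-based half-open ranges rather than the 1:25 notation of the slide.

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product


def blocks(total, size):
    """Split range(total) into consecutive half-open blocks of `size` items."""
    return [(start, min(start + size, total)) for start in range(0, total, size)]


def make_tasks(n_genes, n_perms, genes_per_task, perms_per_task):
    """One task per (gene block, permutation block) pair."""
    return list(product(blocks(n_genes, genes_per_task),
                        blocks(n_perms, perms_per_task)))


def run_task(task):
    """Placeholder worker: a real one would fit the permuted models here."""
    (g_start, g_end), (p_start, p_end) = task
    return (g_end - g_start) * (p_end - p_start)  # number of model fits


def run_parallel(tasks, workers):
    """Distribute independent tasks over `workers` processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, tasks))


tasks = make_tasks(n_genes=200, n_perms=400, genes_per_task=25, perms_per_task=100)
print(len(tasks), "independent tasks")
```

With 25 genes and 100 permutations per task, the 200 x 400 job splits into 8 x 4 tasks, so each task can run on its own core when enough CPUs are available. Calling `make_tasks(200, 400, 200, 100)` instead reproduces the split into four permutation blocks only.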
NETWORK INTEGRATION
how do we connect networks?
how do we deal with diverse datasets?
“Prediction is very difficult, especially about the future.”
– Niels Bohr
genes
thank you.
www.worldofpiggy.com @worldofpiggy [email protected]