network genetics

Francesco Gadaleta, University of Liege - Montefiore Institute. Select and Connect: the benefits of building networks in genetics. Francesco Gadaleta, PhD - Montefiore Institute, ULg

Upload: francesco-gadaleta

Post on 15-Aug-2015

TRANSCRIPT

Page 1: Network genetics

Francesco Gadaleta, University of Liege - Montefiore Institute

Select and Connect: the benefits of building networks in genetics

Francesco Gadaleta, PhD - Montefiore Institute, ULg

Page 2: Network genetics

Francesco Gadaleta - Select and Connect: the benefits of building networks in genetics

OUTLINE

• Genetics and Networks
• Variable selection with penalised regression
• Application to Gene Expression Profiles
• Demo

Page 3: Network genetics

Given a set of microarray experiments, HOW do we

• select covariates?
• build networks?

Page 4: Network genetics

EXPLAIN: gene interaction, phenotype

PREDICTOR: best SNP selection, survival

Page 5: Network genetics

A ? B

• ASSOCIATION
• CORRELATION
• CAUSALITY
• REGULATION

Page 6: Network genetics

PENALISED REGRESSION

least squares, elastic net, ridge regression, fused, group, quadratic programming, hierarchical, LASSO, nonlinear, penalised, multivariate, linear regression, gradient descent

Page 7: Network genetics

OPTIMISATION PROBLEMS IN MACHINE LEARNING

Def. convex set X

Def. convex function

Optimisation problem
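
The three items above can be written out with the standard textbook definitions. This is a sketch in conventional notation; the symbols X, f and θ are assumptions, since the slide's own formulas are not part of the transcript.

```latex
\begin{align*}
% Convex set: the segment between any two points stays inside the set
&X \subseteq \mathbb{R}^n \text{ is convex if }
  \theta x + (1-\theta)\, y \in X
  \quad \forall\, x, y \in X,\ \theta \in [0,1] \\
% Convex function: the graph lies below every chord
&f : X \to \mathbb{R} \text{ is convex if }
  f(\theta x + (1-\theta)\, y) \le \theta f(x) + (1-\theta) f(y)
  \quad \forall\, x, y \in X,\ \theta \in [0,1] \\
% Convex optimisation problem
&\min_{x \in X} f(x)
\end{align*}
```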

Page 8: Network genetics

Where’s the min?

follow the GRADIENT
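
A minimal sketch of "following the gradient", assuming a differentiable convex function whose gradient is available in closed form. The test function, step size and tolerance are hypothetical choices for illustration, not taken from the talk.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-8, max_iter=10_000):
    """Minimise a differentiable convex function given its gradient."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        # For a convex function, a (numerically) zero gradient
        # marks a global minimum, so we can stop there.
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Move against the gradient, i.e. downhill.
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical example: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2
def grad_f(v):
    x, y = v
    return [2.0 * (x - 3.0), 4.0 * (y + 1.0)]


x_min = gradient_descent(grad_f, [0.0, 0.0])
```

With a fixed step the iterates contract towards the minimiser of this quadratic; a step that is too large would make them diverge, which is one reason plain gradient descent needs tuning.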

Page 9: Network genetics

WHY CONVEXITY?

• local minima are also global minima
• fast convergence

gradient/coordinate descent

Introduce gradients, subgradients and epigraphs

…some proofs here

Page 10: Network genetics

OPTIMISATION METHODS FOR

convex AND differentiable / convex

GRADIENT DESCENT
• works fine but can be slow

COORDINATE DESCENT
• cycle through each predictor in turn
• compute residuals

PATHWISE COORDINATE DESCENT
• start with a large penalty (sparse model)
• apply COORDINATE DESCENT
• decrease the penalty (zero coordinates stay zero!)
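
The pathwise scheme above can be sketched in plain Python for an L1-penalised least squares (lasso) problem. This is an illustration under assumptions: the objective (1/2n)||y - Xb||^2 + lam * ||b||_1, the name `lam` for the penalty weight and the toy data are not taken from the talk.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def coordinate_descent(X, y, lam, beta, n_sweeps=200, tol=1e-10):
    """Minimise (1/2n)||y - X b||^2 + lam * ||b||_1, starting from beta."""
    n, p = len(X), len(X[0])
    beta = list(beta)
    # Residuals for the starting coefficients.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):  # cycle through each predictor in turn
            col = [X[i][j] for i in range(n)]
            norm = sum(c * c for c in col) / n
            if norm == 0.0:
                continue
            # Inner product of predictor j with the partial residual
            # (residual with predictor j's own contribution added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                # Update the residuals incrementally.
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve for decreasing penalties, warm-starting from the previous fit."""
    beta = [0.0] * len(X[0])
    path = []
    for lam in sorted(lambdas, reverse=True):
        beta = coordinate_descent(X, y, lam, beta)
        path.append((lam, beta))
    return path


# Hypothetical toy data: three orthogonal predictors, y = 2*x1 + 0.1*x2.
X = [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]
y = [2.1, 1.9, -1.9, -2.1]
path = lasso_path(X, y, [3.0, 0.5, 0.05])
```

The largest penalty gives the all-zero model, and each smaller penalty starts from the previous solution, so most coordinates need little or no work.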

Page 11: Network genetics

min

Labelled terms:
• penalty (sparsity)
• covariance matrix (association)
• gene matrix
• response
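
The objective itself is not part of the transcript. As an assumption, these labels fit a lasso-type problem, sketched here in hypothetical notation: X is the gene matrix, y the response, β the coefficients and λ the penalty weight. Expanding the squared loss exposes X^T X, which is proportional to the sample covariance of the genes when the columns are centred.

```latex
\begin{align*}
% Assumed form of the problem: squared loss plus L1 penalty (sparsity)
&\min_{\beta}\ \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  + \lambda\,\lVert \beta \rVert_1 \\
% Expansion of the loss; X^T X carries the gene-gene association
&\frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
  = \frac{1}{2n}\left( y^\top y - 2\,\beta^\top X^\top y
  + \beta^\top X^\top X\,\beta \right)
\end{align*}
```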

Page 12: Network genetics

MULTICOLLINEARITY

y, B, C, D

• lack of independence
• presence of interdependence
• least squares regression fails
• approach singularity
• explodes
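
A small numerical sketch of the failure described above, assuming two nearly identical predictors. The data, the perturbation size and the ridge weight are hypothetical. The determinant of X^T X approaches zero, the least squares coefficients blow up, and a ridge penalty on the diagonal keeps them bounded.

```python
def gram(x1, x2):
    """Entries (a, b, d) of the 2x2 matrix X^T X = [[a, b], [b, d]]."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    return a, b, d


def solve_2x2(a, b, d, r1, r2):
    """Solve [[a, b], [b, d]] beta = [r1, r2] by Cramer's rule."""
    det = a * d - b * b
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det


def fit(x1, x2, y, ridge=0.0):
    """Least squares (ridge=0) or ridge regression on two predictors."""
    a, b, d = gram(x1, x2)
    r1 = sum(u * t for u, t in zip(x1, y))
    r2 = sum(v * t for v, t in zip(x2, y))
    return solve_2x2(a + ridge, b, d + ridge, r1, r2)


# Hypothetical data: x2 is x1 plus a tiny perturbation.
eps = 1e-3
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [u + eps * z for u, z in zip(x1, [1.0, -1.0, 1.0, -1.0, 0.0])]
y = [1.1, 1.9, 3.2, 3.9, 5.0]

a, b, d = gram(x1, x2)
det = a * d - b * b          # close to zero: X^T X is nearly singular
ols = fit(x1, x2, y)         # very large coefficients of opposite sign
ridge = fit(x1, x2, y, 1.0)  # coefficients stay moderate
```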

Page 13: Network genetics

NETWORK CONSTRUCTION

Page 14: Network genetics

matrix of regression coefficients

• symmetric
• not symmetric

(diagrams of edges between nodes A and B)
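
One way to read a network off a matrix of regression coefficients is sketched below. The conventions are assumptions for illustration, not confirmed by the talk: `coef[i][j]` is the coefficient of gene j when gene i is the response, a nonzero entry is drawn as a directed edge from regressor to response, and the "and"/"or" rules symmetrise the matrix into an undirected network.

```python
def edges_from_coefficients(coef, names, rule="or", tol=1e-8):
    """Turn a p x p coefficient matrix into a list of edges.

    rule="directed": one edge regressor -> response per nonzero entry.
    rule="or":  undirected edge if either direction is nonzero.
    rule="and": undirected edge only if both directions are nonzero.
    """
    p = len(coef)
    edges = []
    if rule == "directed":
        for i in range(p):
            for j in range(p):
                if i != j and abs(coef[i][j]) > tol:
                    edges.append((names[j], names[i]))
        return edges
    for i in range(p):
        for j in range(i + 1, p):
            fwd = abs(coef[i][j]) > tol
            bwd = abs(coef[j][i]) > tol
            keep = (fwd and bwd) if rule == "and" else (fwd or bwd)
            if keep:
                edges.append((names[i], names[j]))
    return edges


# Hypothetical 3-gene coefficient matrix (not symmetric).
coef = [
    [0.0, 0.8, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.3, 0.0],
]
names = ["A", "B", "C"]
directed = edges_from_coefficients(coef, names, rule="directed")
either = edges_from_coefficients(coef, names, rule="or")
both = edges_from_coefficients(coef, names, rule="and")
```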

Page 15: Network genetics

NETWORK VALIDATION

GeneNetWeaver
‣ generates synthetic microarray (μA) data from a regulatory network
‣ several conditions (simulation of μA noise, network perturbations, time series, generation of samples using multifactorial equations)
‣ Gold Standard (GS): directed, unweighted, signed network, based on the transcription factor network (TFN) of E. coli [1]

Page 16: Network genetics

SYNTHETIC DATA GENERATION

‣ Used noise-free, multifactorial data based on GS networks:
  • 50 nodes with 3 regulators (TFs)
  • 200 nodes with 10 regulators (TFs)

Page 17: Network genetics

SELECT AND CONNECT IN ACTION

Page 18: Network genetics

REAL NETWORK

200 nodes, 212 connections

Page 19: Network genetics

PREDICTED NETWORK

200 nodes, 360 connections
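
A predicted network can be scored against the real one by comparing edge sets. The sketch below treats edges as undirected pairs and reports precision and recall. Both choices are assumptions for illustration, and the toy edges are hypothetical, not the networks from the talk.

```python
def compare_edges(true_edges, predicted_edges):
    """Count true/false positives and false negatives on undirected edges."""
    true_set = {frozenset(e) for e in true_edges}
    pred_set = {frozenset(e) for e in predicted_edges}
    tp = len(true_set & pred_set)   # predicted and real
    fp = len(pred_set - true_set)   # predicted but not real
    fn = len(true_set - pred_set)   # real but missed
    precision = tp / (tp + fp) if pred_set else 0.0
    recall = tp / (tp + fn) if true_set else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Hypothetical toy networks.
real = [("A", "B"), ("A", "C"), ("C", "D")]
predicted = [("B", "A"), ("A", "C"), ("B", "D"), ("A", "D")]
score = compare_edges(real, predicted)
```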

Page 20: Network genetics

DEGREE CORRELATION

86%

Plot annotations: false positives; predicted hubs correctly detected

Page 21: Network genetics

BETWEENNESS CORRELATION

83%
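
How these correlations were computed is not stated. One plausible reading, assumed here, is the Pearson correlation between a per-node statistic in the real network and the same statistic in the predicted network. The sketch uses node degree on hypothetical toy edges; a betweenness score per node could be compared in the same way.

```python
from math import sqrt


def degrees(edges, nodes):
    """Degree of every node, in the order given by `nodes`."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [deg[v] for v in nodes]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


# Hypothetical toy networks on four nodes; A is the hub in both.
nodes = ["A", "B", "C", "D"]
real = [("A", "B"), ("A", "C"), ("A", "D")]
predicted = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
r = pearson(degrees(real, nodes), degrees(predicted, nodes))
```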

Page 22: Network genetics

COMPUTATIONAL COMPLEXITY

:-) LASSO is quadratic programming (polynomial)
:-( Time-complexity of iterative convex optimisation is tricky to analyse (it depends on a convergence criterion)
:-| Coordinate descent requires O(np) operations per cycle through the predictors

Page 23: Network genetics

PARALLEL COMPUTATIONS

(permutation tests)

Page 24: Network genetics

200 genes, 400 permutations, 5 cpus

100 100 100 100

as fast as 100 permutations

Page 25: Network genetics

200 genes, 400 permutations, 40 cpus

100 100 100 100 100 100 100 100 100 100 100 100

1:25 25:50 175:200

as fast as 25 genes and 100 permutations
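
The splitting shown on these two slides can be sketched as a list of independent tasks, each covering a block of genes and a block of permutations. The function names, the block sizes passed in and the placeholder worker are assumptions for illustration. A real worker would fit the penalised regressions on permuted data.

```python
from concurrent.futures import ProcessPoolExecutor


def split_work(n_genes, n_perms, gene_block, perm_block):
    """List (gene_start, gene_stop, n_perm) tasks covering all the work."""
    tasks = []
    for g in range(0, n_genes, gene_block):
        for q in range(0, n_perms, perm_block):
            tasks.append((g, min(g + gene_block, n_genes),
                          min(perm_block, n_perms - q)))
    return tasks


def run_task(task):
    """Placeholder worker: returns how many (gene, permutation) fits
    this task stands for instead of actually fitting models."""
    gene_start, gene_stop, n_perm = task
    return (gene_stop - gene_start) * n_perm


def run_parallel(tasks, n_workers):
    """Run the tasks on a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(run_task, tasks))


# Split over permutations only: 4 tasks of 100 permutations each.
few = split_work(200, 400, gene_block=200, perm_block=100)
# Split over genes too: 8 gene blocks x 4 permutation blocks = 32 tasks.
many = split_work(200, 400, gene_block=25, perm_block=100)
```

`run_parallel(many, n_workers=...)` would then dispatch the tasks; on platforms that spawn rather than fork processes it should be called under an `if __name__ == "__main__":` guard. Wall-clock time is bounded below by the largest single task, which matches the "as fast as" remarks on the slides.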

Page 26: Network genetics

NETWORK INTEGRATION

Page 27: Network genetics

NETWORK INTEGRATION

how do we connect networks?

how do we deal with diverse datasets?

Page 28: Network genetics

“Prediction is very difficult, especially about the future.”
– Niels Bohr

genes

www.worldofpiggy.com @worldofpiggy

Page 29: Network genetics

thank you.

www.worldofpiggy.com @worldofpiggy