uncovering latent structure in valued...

48
Uncovering Latent Structure in Valued Graphs M. Mariadassou , S. Robin UMR AgroParisTech/INRA MIA 518, Paris ECCS07, October 2007 Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 1 / 20

Upload: others

Post on 09-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Uncovering Latent Structure in Valued Graphs

M. Mariadassou, S. Robin

UMR AgroParisTech/INRA MIA 518, Paris

ECCS07, October 2007

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 1 / 20

Page 2: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Outline

1 Motivations

2 An Explicit Random Graph ModelSome NotationsERMG Graph Model

3 Parametric EstimationLog-likelihoods and Variational InferenceIterative algorithmModel Selection Criterion

4 Simulation StudyQuality of the estimatesNumber of Classes

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 2 / 20

Page 3: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Motivations

Outline

1 Motivations

2 An Explicit Random Graph ModelSome NotationsERMG Graph Model

3 Parametric EstimationLog-likelihoods and Variational InferenceIterative algorithmModel Selection Criterion

4 Simulation StudyQuality of the estimatesNumber of Classes

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 3 / 20

Page 4: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Motivations

Motivations for the study of networks

Networks. . .Arise in many fields:→ Biology, Chemistry→ Physics, Internet.

Represent an interaction pattern:→ O(n2) interactions→ between n elements.

Have a topology which:→ reflects the structure/function

relationship

From Barabási website

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 4 / 20

Page 5: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model

Outline

1 Motivations

2 An Explicit Random Graph ModelSome NotationsERMG Graph Model

3 Parametric EstimationLog-likelihoods and Variational InferenceIterative algorithmModel Selection Criterion

4 Simulation StudyQuality of the estimatesNumber of Classes

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 5 / 20

Page 6: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model Some Notations

Some Notations

Notations:→ V a set of vertices in {1, . . . , n};

→ E a set of edges in {1, . . . , n}2;

→ X = (Xij) the adjacency matrix, with Xij the value of the edgebetween i and j.

Random graph definition:→ The joint distribution of the Xij describes the topology of the

network.

Example:

1

2

31

2

4

V = {1, 2, 3}E = {{1, 2}, {2, 3}, {3, 1}}

. 4 1. . 2. . .

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 6 / 20

Page 7: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model Some Notations

Some Notations

Notations:→ V a set of vertices in {1, . . . , n};

→ E a set of edges in {1, . . . , n}2;

→ X = (Xij) the adjacency matrix, with Xij the value of the edgebetween i and j.

Random graph definition:→ The joint distribution of the Xij describes the topology of the

network.

Example:

1

2

31

2

4

V = {1, 2, 3}E = {{1, 2}, {2, 3}, {3, 1}}

. 4 1. . 2. . .

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 6 / 20

Page 8: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (vertices)

Vertices heterogeneity→ Hypothesis: the vertices are distributed among Q classes with

different connectivity;

→ Z = (Zi)i; Ziq = 1{i ∈ q} are indep. hidden variables;

→ α = {αq}, the prior proportions of groups;

→ (Zi) ∼ M(1,α).

Example:→ Example for 8 nodes and 3 classes with α = (0.25, 0.25, 0.5)

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 7 / 20

Page 9: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (vertices)

Vertices heterogeneity→ Hypothesis: the vertices are distributed among Q classes with

different connectivity;

→ Z = (Zi)i; Ziq = 1{i ∈ q} are indep. hidden variables;

→ α = {αq}, the prior proportions of groups;

→ (Zi) ∼ M(1,α).

Example:→ Example for 8 nodes and 3 classes with α = (0.25, 0.25, 0.5)

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 7 / 20

Page 10: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (vertices)

Vertices heterogeneity→ Hypothesis: the vertices are distributed among Q classes with

different connectivity;

→ Z = (Zi)i; Ziq = 1{i ∈ q} are indep. hidden variables;

→ α = {αq}, the prior proportions of groups;

→ (Zi) ∼ M(1,α).

Example:→ Example for 8 nodes and 3 classes with α = (0.25, 0.25, 0.5)

P( )=0.25

P( )=0.25

P( )=0.5

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 7 / 20

Page 11: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 12: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 13: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 14: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 15: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 16: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 17: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG Probabilistic Model (edges)

X distribution→ conditional distribution : Xij|{i ∈ q, j ∈ l} ∼ fθql ;

→ θ = (θql) is the connectivity paramater matrix;

→ ERMG : "Erdös-Rényi Mixture for Graphs".

Example:→ Example for 3 classes with Bernoulli-valued edges;

0 0.9 0.25

. 1 0.5

. . 1

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 8 / 20

Page 18: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 19: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 20: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 21: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 22: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 23: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 24: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 25: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

An Explicit Random Graph Model ERMG Graph Model

ERMG edge values

Classical Distributions:→ fθ can be any probability distribution;

→ Bernoulli: presence/absence of an edge;

→ Multinomial: nature of the connection (friend, lover, colleague);

→ Poisson: in coauthorship networks, number of copublishedpapers;

→ Gaussian: intensity of the connection (airport network);

→ Bivariate Gaussian: directed networks where forward andbackward edges are correlated;

→ Etc.

ERMG is a model to easily generate graphs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 9 / 20

Page 26: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation

Outline

1 Motivations

2 An Explicit Random Graph ModelSome NotationsERMG Graph Model

3 Parametric EstimationLog-likelihoods and Variational InferenceIterative algorithmModel Selection Criterion

4 Simulation StudyQuality of the estimatesNumber of Classes

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 10 / 20

Page 27: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Log-Likelihood of the model

First Idea: Use maximum likelihood estimatorsComplete data likelihood

L(X,Z) =∑

i

∑q

Ziq lnαq +∑i<j

∑q,l

ZiqZjl ln fθql(Xij)

with fθql(Xij) likelihood of edge value Xij under i ∼ q and j ∼ l.Observed data likelihood

L(X) = ln∑

ZexpL(X,Z)

The observed data likelihood requires a sum over Qn terms, and isthus untractable;

EM-like strategies require the knowledge of Pr(Z|X), alsountractable (no conditional independence) and thus also fail.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 11 / 20

Page 28: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Log-Likelihood of the model

First Idea: Use maximum likelihood estimatorsComplete data likelihood

L(X,Z) =∑

i

∑q

Ziq lnαq +∑i<j

∑q,l

ZiqZjl ln fθql(Xij)

with fθql(Xij) likelihood of edge value Xij under i ∼ q and j ∼ l.Observed data likelihood

L(X) = ln∑

ZexpL(X,Z)

The observed data likelihood requires a sum over Qn terms, and isthus untractable;

EM-like strategies require the knowledge of Pr(Z|X), alsountractable (no conditional independence) and thus also fail.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 11 / 20

Page 29: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Log-Likelihood of the model

First Idea: Use maximum likelihood estimatorsComplete data likelihood

L(X,Z) =∑

i

∑q

Ziq lnαq +∑i<j

∑q,l

ZiqZjl ln fθql(Xij)

with fθql(Xij) likelihood of edge value Xij under i ∼ q and j ∼ l.Observed data likelihood

L(X) = ln∑

ZexpL(X,Z)

The observed data likelihood requires a sum over Qn terms, and isthus untractable;

EM-like strategies require the knowledge of Pr(Z|X), alsountractable (no conditional independence) and thus also fail.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 11 / 20

Page 30: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Variational Inference: Pseudo Likelihood

Main Idea: Replace complicated Pr(Z|X) by a simple RX[Z] such thatKL(RX[Z],Pr(Z|X)) is minimal.

Optimize in RX the function J(RX) given by :

J(RX[Z]) = L(X) − KL(RX[Z],Pr(Z|X))

= H(RX[Z]) −∑

ZRX[Z]L(X,Z)

For simple RX, J(RX[Z]) is tractable,

At best, RX = Pr(Z|X) and J(RX[Z]) = L(X).

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 12 / 20

Page 31: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Variational Inference: Pseudo Likelihood

Main Idea: Replace complicated Pr(Z|X) by a simple RX[Z] such thatKL(RX[Z],Pr(Z|X)) is minimal.

Optimize in RX the function J(RX) given by :

J(RX[Z]) = L(X) − KL(RX[Z],Pr(Z|X))

= H(RX[Z]) −∑

ZRX[Z]L(X,Z)

For simple RX, J(RX[Z]) is tractable,

At best, RX = Pr(Z|X) and J(RX[Z]) = L(X).

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 12 / 20

Page 32: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Log-likelihoods and Variational Inference

Variational Inference: Pseudo Likelihood

Main Idea: Replace complicated Pr(Z|X) by a simple RX[Z] such thatKL(RX[Z],Pr(Z|X)) is minimal.

Optimize in RX the function J(RX) given by :

J(RX[Z]) = L(X) − KL(RX[Z],Pr(Z|X))

= H(RX[Z]) −∑

ZRX[Z]L(X,Z)

For simple RX, J(RX[Z]) is tractable,

At best, RX = Pr(Z|X) and J(RX[Z]) = L(X).

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 12 / 20

Page 33: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Iterative algorithm

2 Steps Iterative Algorithm

Step 1 Optimize J(RX[Z]) w.r.t. RX[Z]:→ Restriction to a "comfortable" class of functions;→ RX[Z] =

∏i h(Zi; τi), with h(.; τi) the multinomial distribution;

→ τiq is a variational parameter to be optimized using a fixed pointalgorithm:

τ̃iq ∝ αq

∏j,i

Q∏l=1

fθql (Xij)τ̃jl

Step 2 Optimize J(RX[Z]) w.r.t. (α, θ):→ Constraint:

∑q αq = 1

α̃q =∑

i

τ̃iq/n

θ̃ql = arg maxθ

∑ij

τ̃iqτ̃jl log fθ(Xij)

→ Closed expression of θ̃ql for classical distributions.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 13 / 20

Page 34: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Iterative algorithm

2 Steps Iterative Algorithm

Step 1 Optimize J(RX[Z]) w.r.t. RX[Z]:→ Restriction to a "comfortable" class of functions;→ RX[Z] =

∏i h(Zi; τi), with h(.; τi) the multinomial distribution;

→ τiq is a variational parameter to be optimized using a fixed pointalgorithm:

τ̃iq ∝ αq

∏j,i

Q∏l=1

fθql (Xij)τ̃jl

Step 2 Optimize J(RX[Z]) w.r.t. (α, θ):→ Constraint:

∑q αq = 1

α̃q =∑

i

τ̃iq/n

θ̃ql = arg maxθ

∑ij

τ̃iqτ̃jl log fθ(Xij)

→ Closed expression of θ̃ql for classical distributions.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 13 / 20

Page 35: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Model Selection Criterion

Model Selection Criterion

We derive a statistical BIC-like criterion to select the number ofclasses:

The likelihood can be split: L(X,Z|Q) = L(X|Z,Q) +L(Z|Q).

These terms can be penalized separately:

L(X|Z,Q) → penX|Z =Q(Q + 1)

2log

n(n − 1)2

L(Z|Q) → penZ = (Q − 1) log(n)

ICL(Q) = maxθL(X, Z̃|θ,mQ) − 1

2

(Q(Q+1)

2 log n(n−1)2 − (Q − 1) log(n)

)

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 14 / 20

Page 36: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Parametric Estimation Model Selection Criterion

Model Selection Criterion

We derive a statistical BIC-like criterion to select the number ofclasses:

The likelihood can be split: L(X,Z|Q) = L(X|Z,Q) +L(Z|Q).

These terms can be penalized separately:

L(X|Z,Q) → penX|Z =Q(Q + 1)

2log

n(n − 1)2

L(Z|Q) → penZ = (Q − 1) log(n)

ICL(Q) = maxθL(X, Z̃|θ,mQ) − 1

2

(Q(Q+1)

2 log n(n−1)2 − (Q − 1) log(n)

)

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 14 / 20

Page 37: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study

Outline

1 Motivations

2 An Explicit Random Graph ModelSome NotationsERMG Graph Model

3 Parametric EstimationLog-likelihoods and Variational InferenceIterative algorithmModel Selection Criterion

4 Simulation StudyQuality of the estimatesNumber of Classes

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 15 / 20

Page 38: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ <> 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 16 / 20

Page 39: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ <> 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 16 / 20

Page 40: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ <> 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 16 / 20

Page 41: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ <> 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 16 / 20

Page 42: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Simulation Setup

→ Undirected graph with Q = 3 classes;

→ Poisson-valued edges;

→ n = 100, 500 vertices;

→ αq ∝ aq for a = 1, 0.5, 0.2;a = 1: balanced classes;a = 0.2: unbalanced classes (80.6%, 16.1%, 3.3%)

→ Connectivity matrix of the form

λ γλ γλ

γλ λ γλ

γλ γλ λ

for

γ = 0.1, 0.5, 0.9, 1.5 and λ = 2, 5.γ = 1: all classes equivalent (same connectivity pattern);γ <> 1: classes are different;λ: mean value of an edge;

→ 100 repeats for each setup.

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 16 / 20

Page 43: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Results

Root Mean Square Error (RMSE) =√

Bias2 + VarianceRMSE for the αq RMSE for the λql

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

x-axis: α1, α2, α3 x-axis: λ11, λ22, λ33, λ12, λ13, λ23Top: n = 100, Bottom: n = 500

Left to right: a = 1, 0.5, 0.2Solid line: λ = 5, dashed line: λ = 2

Symbols depend on γ: ◦ = 0.1, O = 0.5, M= 0.9, ∗ = 1.5Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 17 / 20

Page 44: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Quality of the estimates

Quality of the Estimates: Results

Root Mean Square Error (RMSE) =√

Bias2 + VarianceRMSE for the αq RMSE for the λql

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

1 2 30

0.05

0.1

0.15

0.2

0.25

0.3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

2 4 60

0.5

1

1.5

2

2.5

3

x-axis: α1, α2, α3 x-axis: λ11, λ22, λ33, λ12, λ13, λ23Top: n = 100, Bottom: n = 500

Left to right: a = 1, 0.5, 0.2Solid line: λ = 5, dashed line: λ = 2

Symbols depend on γ: ◦ = 0.1, O = 0.5, M= 0.9, ∗ = 1.5Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 17 / 20

Page 45: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Number of Classes

Number of Classes: Simulation Setup

→ Undirected graph with Q? = 3 classes;

→ Poisson-valued edges;

→ n = 50, 100, 500, 1000 vertices;

→ αq = (57.1%, 28, 6%, 14, 3%) (or a = 0.5);

→ λ = 2, γ = 0.5;

→ Retrieve Q that maximizes ICL;

→ 100 repeats for each value of n;

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 18 / 20

Page 46: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Number of Classes

Number of Classes: Results

1 2 3 4 5 6 7 8 9 10−2350

−2300

−2250

−2200

−2150

−2100

−2050

1 2 3 4 5 6 7 8 9 10−9000

−8900

−8800

−8700

−8600

−8500

−8400

−8300

−8200

1 2 3 4 5 6 7 8 9 10−2.26

−2.24

−2.22

−2.2

−2.18

−2.16

−2.14

−2.12

−2.1

−2.08

−2.06x 10

5

1 2 3 4 5−9

−8.9

−8.8

−8.7

−8.6

−8.5

−8.4

−8.3x 10

5

Figure: Mean ICL and 90% confidence interval as a function of Q = 1..10.From left to right: n = 50, n = 100, n = 500, n = 1000.

Qn 2 3 4

50 82 17 1100 7 90 3500 0 100 0

1000 0 100 0

Table: Frequency (in %) at which Q is selected for various n.Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 19 / 20

Page 47: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Simulation Study Number of Classes

Number of Classes: Results

1 2 3 4 5 6 7 8 9 10−2350

−2300

−2250

−2200

−2150

−2100

−2050

1 2 3 4 5 6 7 8 9 10−9000

−8900

−8800

−8700

−8600

−8500

−8400

−8300

−8200

1 2 3 4 5 6 7 8 9 10−2.26

−2.24

−2.22

−2.2

−2.18

−2.16

−2.14

−2.12

−2.1

−2.08

−2.06x 10

5

1 2 3 4 5−9

−8.9

−8.8

−8.7

−8.6

−8.5

−8.4

−8.3x 10

5

Figure: Mean ICL and 90% confidence interval as a function of Q = 1..10.From left to right: n = 50, n = 100, n = 500, n = 1000.

Qn 2 3 4

50 82 17 1100 7 90 3500 0 100 0

1000 0 100 0

Table: Frequency (in %) at which Q is selected for various n.Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 19 / 20

Page 48: Uncovering Latent Structure in Valued Graphsgenome.jouy.inra.fr/~mmariadasso/downloads/Ermg.ECCS07.pdf · Motivations Outline 1 Motivations 2 An Explicit Random Graph Model Some Notations

Summary

Summary

Flexibility of ERMGProbabilistic model which captures features of real-networks,Models various network topologies,A promising alternative to existing methods.

Estimation and Model selectionVariational approaches to compute approximate MLE whendependencies are complex,A statistical criterion to choose the number of classes (ICL).

ExtensionsNetwork motifs

Mariadassou (AgroParisTech) Uncovering Structure in Valued Graphs ECCS07 20 / 20