scalable mcmc in degree corrected stochastic block...

51
Scalable MCMC in degree corrected stochastic block model Soumyasundar Pal Dept. of Electrical and Computer Engineering McGill University, Montr´ eal, Qu´ ebec, Canada February 11, 2019

Upload: others

Post on 15-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Scalable MCMC in degree corrected stochasticblock model

Soumyasundar Pal

Dept. of Electrical and Computer EngineeringMcGill University,

Montreal, Quebec, Canada

February 11, 2019

Page 2: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Introduction

Community detection from networks

Academic collaboration, protein interaction, social networks

Community: dense internal and sparse external connections

Earlier approaches: hierarchical clustering, modularity

optimization, spectral clustering, clique percolation

Challenges: handling sparsity, scalability

2

Page 3: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Introduction

Heuristic objective function, greedy optimization

Plethora of techniques1 2

Numerous quality metrics3

Principled approach: statistical modelling of community

structures

1S. Fortunato, “Community detection in graphs,” Phys. Rep., vol. 486, pp. 75–174, Feb. 2010.2S. Parthasarathy, Y. Ruan, and V. Satuluri, “Community discovery in social networks: Applications, methods

and emerging trends,” in Social Network Data Analytics, pp. 79–113. Springer US, Boston, MA, Mar. 2011.3T. Chakraborty, A. Dalmia, A. Mukherjee, and N. Ganguly,“Metrics for community analysis: a survey, ACM

Comput. Surv., vol. 50, no. 4, pp. 1–37, Aug. 2017.

3

Page 4: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

Connectivity depends on community membership4.

Stochastic equivalence of nodes within same community

N: no. nodes, K : no. communities

ci : membership of node i

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1}: (a, b)’th entry in adj.

matrix

βk` ∈ (0, 1): link probability between

two nodes in community k and `

yab|(ca = k , cb = `) ∼ Bernoulli(βk`)

4E. Abbe, “Community detection and stochastic block models, Found. and Trends Commun. and Inform.Theory, vol. 14, no.1-2, pp. 1162, Jun. 2018.

4

Page 5: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

Connectivity depends on community membership4.

Stochastic equivalence of nodes within same community

N: no. nodes, K : no. communities

ci : membership of node i

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1}: (a, b)’th entry in adj.

matrix

βk` ∈ (0, 1): link probability between

two nodes in community k and `

yab|(ca = k , cb = `) ∼ Bernoulli(βk`)

4E. Abbe, “Community detection and stochastic block models, Found. and Trends Commun. and Inform.Theory, vol. 14, no.1-2, pp. 1162, Jun. 2018.

4

Page 6: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

Connectivity depends on community membership4.

Stochastic equivalence of nodes within same community

N: no. nodes, K : no. communities

ci : membership of node i

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1}: (a, b)’th entry in adj.

matrix

βk` ∈ (0, 1): link probability between

two nodes in community k and `

yab|(ca = k , cb = `) ∼ Bernoulli(βk`)

4E. Abbe, “Community detection and stochastic block models, Found. and Trends Commun. and Inform.Theory, vol. 14, no.1-2, pp. 1162, Jun. 2018.

4

Page 7: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

Connectivity depends on community membership4.

Stochastic equivalence of nodes within same community

N: no. nodes, K : no. communities

ci : membership of node i

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1}: (a, b)’th entry in adj.

matrix

βk` ∈ (0, 1): link probability between

two nodes in community k and `

yab|(ca = k , cb = `) ∼ Bernoulli(βk`)

4E. Abbe, “Community detection and stochastic block models, Found. and Trends Commun. and Inform.Theory, vol. 14, no.1-2, pp. 1162, Jun. 2018.

4

Page 8: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)5

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1, 2, ...}: (a, b)’th entry in adj. matrix

ωk` ∈ R+: average number of links between two nodes incommunity k and `

yab|(ca = k , cb = `) ∼ Poisson(ωk`) ,

L(y|C, ω) =∑k,`

(mk` logωk` − nkn`ωk`) ,

where, mk` =∑a,b

yab1{ca=k,cb=`} and nk =∑a

1{ca=k}

5B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks, Phys. Rev.E, vol. 83,no. 1, pp. 016107, Jan. 2011.

5

Page 9: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)5

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1, 2, ...}: (a, b)’th entry in adj. matrix

ωk` ∈ R+: average number of links between two nodes incommunity k and `

yab|(ca = k , cb = `) ∼ Poisson(ωk`) ,

L(y|C, ω) =∑k,`

(mk` logωk` − nkn`ωk`) ,

where, mk` =∑a,b

yab1{ca=k,cb=`} and nk =∑a

1{ca=k}

5B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks, Phys. Rev.E, vol. 83,no. 1, pp. 016107, Jan. 2011.

5

Page 10: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)5

ci ∈ {1, 2, ...,K}, C = {ci}Ni=1

yab ∈ {0, 1, 2, ...}: (a, b)’th entry in adj. matrix

ωk` ∈ R+: average number of links between two nodes incommunity k and `

yab|(ca = k , cb = `) ∼ Poisson(ωk`) ,

L(y|C, ω) =∑k,`

(mk` logωk` − nkn`ωk`) ,

where, mk` =∑a,b

yab1{ca=k,cb=`} and nk =∑a

1{ca=k}

5B. Karrer and M. E. J. Newman, “Stochastic blockmodels and community structure in networks, Phys. Rev.E, vol. 83,no. 1, pp. 016107, Jan. 2011.

5

Page 11: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

ML estimate:

ωk` =mk`

nkn`

L(y|C) = maxωL(y|C, ω) =

∑k,`

(mk`

m

)log

(mk`/m)

(nkn`/N2),

where, m =∑k,`

mk,` .

Greedy algorithm

pick a random node, place it in a community to maximally increase

the objective.

6

Page 12: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

ML estimate:

ωk` =mk`

nkn`

L(y|C) = maxωL(y|C, ω) =

∑k,`

(mk`

m

)log

(mk`/m)

(nkn`/N2),

where, m =∑k,`

mk,` .

Greedy algorithm

pick a random node, place it in a community to maximally increase

the objective.

6

Page 13: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Stochastic Block Model (SBM)

ML estimate:

ωk` =mk`

nkn`

L(y|C) = maxωL(y|C, ω) =

∑k,`

(mk`

m

)log

(mk`/m)

(nkn`/N2),

where, m =∑k,`

mk,` .

Greedy algorithm

pick a random node, place it in a community to maximally increase

the objective.

6

Page 14: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Degree Corrected Stochastic Block Model (DC-SBM)

degree heterogeneity within community

θa ∈ (0, 1): degree correction parameters

yab|(ca = k , cb = `) ∼ Poisson(θaθbωk`) ,∑a

θa1{ca=k} = 1 .

L(y|C) = maxω,θL(y|C, ω, θ) =

∑k,`

(mk`

m

)log

(mk`/m)

(κkκ`/m2),

where, κk =∑a

da1{ca=k} .

7

Page 15: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Degree Corrected Stochastic Block Model (DC-SBM)

degree heterogeneity within community

θa ∈ (0, 1): degree correction parameters

yab|(ca = k , cb = `) ∼ Poisson(θaθbωk`) ,∑a

θa1{ca=k} = 1 .

L(y|C) = maxω,θL(y|C, ω, θ) =

∑k,`

(mk`

m

)log

(mk`/m)

(κkκ`/m2),

where, κk =∑a

da1{ca=k} .

7

Page 16: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Degree Corrected Stochastic Block Model (DC-SBM)

degree heterogeneity within community

θa ∈ (0, 1): degree correction parameters

yab|(ca = k , cb = `) ∼ Poisson(θaθbωk`) ,∑a

θa1{ca=k} = 1 .

L(y|C) = maxω,θL(y|C, ω, θ) =

∑k,`

(mk`

m

)log

(mk`/m)

(κkκ`/m2),

where, κk =∑a

da1{ca=k} .

7

Page 17: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Degree Corrected Stochastic Block Model (DC-SBM)

degree heterogeneity within community

θa ∈ (0, 1): degree correction parameters

yab|(ca = k , cb = `) ∼ Poisson(θaθbωk`) ,∑a

θa1{ca=k} = 1 .

L(y|C) = maxω,θL(y|C, ω, θ) =

∑k,`

(mk`

m

)log

(mk`/m)

(κkκ`/m2),

where, κk =∑a

da1{ca=k} .

7

Page 18: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Degree Corrected Stochastic Block Model (DC-SBM)

degree heterogeneity within community

θa ∈ (0, 1): degree correction parameters

yab|(ca = k , cb = `) ∼ Poisson(θaθbωk`) ,∑a

θa1{ca=k} = 1 .

L(y|C) = maxω,θL(y|C, ω, θ) =

∑k,`

(mk`

m

)log

(mk`/m)

(κkκ`/m2),

where, κk =∑a

da1{ca=k} .

7

Page 19: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Stochastic Blockmodel (MMSB)

Overlapping communities6

Community membership probability: πa ∈ (0, 1)K ,K∑

k=1

πak = 1

Prior distributions: βk` ∼ Beta(η), πa ∼ Dir(α)

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbsample yab|(Zab = k ,Zba = `) ∼ Bernoulli(βk`)

Posterior inference of p(β, π|y)

assortative MMSB (a-MMSB): βk` = δ for k 6= `

6E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,”in J. Mach. Learn.Res., vol. 9, pp. 19812014, Jun. 2008.

8

Page 20: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Stochastic Blockmodel (MMSB)

Overlapping communities6

Community membership probability: πa ∈ (0, 1)K ,K∑

k=1

πak = 1

Prior distributions: βk` ∼ Beta(η), πa ∼ Dir(α)

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbsample yab|(Zab = k ,Zba = `) ∼ Bernoulli(βk`)

Posterior inference of p(β, π|y)

assortative MMSB (a-MMSB): βk` = δ for k 6= `

6E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,”in J. Mach. Learn.Res., vol. 9, pp. 19812014, Jun. 2008.

8

Page 21: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Stochastic Blockmodel (MMSB)

Overlapping communities6

Community membership probability: πa ∈ (0, 1)K ,K∑

k=1

πak = 1

Prior distributions: βk` ∼ Beta(η), πa ∼ Dir(α)

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbsample yab|(Zab = k ,Zba = `) ∼ Bernoulli(βk`)

Posterior inference of p(β, π|y)

assortative MMSB (a-MMSB): βk` = δ for k 6= `

6E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,”in J. Mach. Learn.Res., vol. 9, pp. 19812014, Jun. 2008.

8

Page 22: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Stochastic Blockmodel (MMSB)

Overlapping communities6

Community membership probability: πa ∈ (0, 1)K ,K∑

k=1

πak = 1

Prior distributions: βk` ∼ Beta(η), πa ∼ Dir(α)

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbsample yab|(Zab = k ,Zba = `) ∼ Bernoulli(βk`)

Posterior inference of p(β, π|y)

assortative MMSB (a-MMSB): βk` = δ for k 6= `

6E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,”in J. Mach. Learn.Res., vol. 9, pp. 19812014, Jun. 2008.

8

Page 23: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Stochastic Blockmodel (MMSB)

Overlapping communities6

Community membership probability: πa ∈ (0, 1)K ,K∑

k=1

πak = 1

Prior distributions: βk` ∼ Beta(η), πa ∼ Dir(α)

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbsample yab|(Zab = k ,Zba = `) ∼ Bernoulli(βk`)

Posterior inference of p(β, π|y)

assortative MMSB (a-MMSB): βk` = δ for k 6= `

6E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,”in J. Mach. Learn.Res., vol. 9, pp. 19812014, Jun. 2008.

8

Page 24: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Inference in a-MMSB

Variational inference7

– Mean field approximation

– Stochastic gradient optimization

– Outperforms traditional techniques

Markov chain Monte Carlo8

– Stochastic gradient Riemannian Langevin dynamics (SGRLD)

– Faster convergence

– Better approximation of posterior

7P. K. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno, “Scalable inference of overlappingcommunities,” in Proc. Adv. Neural Inf. Proc. Systems, Dec. 2012.

8W. Li, S. Ahn, and M. Welling, “Scalable MCMC for mixed membership stochastic blockmodels,” in Proc.Artificial Intell.and Statist., May 2016.

9

Page 25: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Inference in a-MMSB

Variational inference7

– Mean field approximation

– Stochastic gradient optimization

– Outperforms traditional techniques

Markov chain Monte Carlo8

– Stochastic gradient Riemannian Langevin dynamics (SGRLD)

– Faster convergence

– Better approximation of posterior

7P. K. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno, “Scalable inference of overlappingcommunities,” in Proc. Adv. Neural Inf. Proc. Systems, Dec. 2012.

8W. Li, S. Ahn, and M. Welling, “Scalable MCMC for mixed membership stochastic blockmodels,” in Proc.Artificial Intell.and Statist., May 2016.

9

Page 26: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Inference in a-MMSB

Variational inference7

– Mean field approximation

– Stochastic gradient optimization

– Outperforms traditional techniques

Markov chain Monte Carlo8

– Stochastic gradient Riemannian Langevin dynamics (SGRLD)

– Faster convergence

– Better approximation of posterior

7P. K. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno, “Scalable inference of overlappingcommunities,” in Proc. Adv. Neural Inf. Proc. Systems, Dec. 2012.

8W. Li, S. Ahn, and M. Welling, “Scalable MCMC for mixed membership stochastic blockmodels,” in Proc.Artificial Intell.and Statist., May 2016.

9

Page 27: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Inference in a-MMSB

Variational inference7

– Mean field approximation

– Stochastic gradient optimization

– Outperforms traditional techniques

Markov chain Monte Carlo8

– Stochastic gradient Riemannian Langevin dynamics (SGRLD)

– Faster convergence

– Better approximation of posterior

7P. K. Gopalan, S. Gerrish, M. Freedman, D. M. Blei, and D. M. Mimno, “Scalable inference of overlappingcommunities,” in Proc. Adv. Neural Inf. Proc. Systems, Dec. 2012.

8W. Li, S. Ahn, and M. Welling, “Scalable MCMC for mixed membership stochastic blockmodels,” in Proc.Artificial Intell.and Statist., May 2016.

9

Page 28: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Degree Corrected Blockmodel(MMDCB)9

Generalization of a-MMSB

Node specific degree correction parameters: ra ∈ RCommunity specific parameters: qk > 0

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbif Zab = Zba = k :

sample yab ∼ Bernoulli(logit−1(qk + ra + rb))

else:

sample yab ∼ Bernoulli(logit−1(ra + rb))

Prior distributions: ra ∼ N (0, σ2), qk ∼ N (0, σ2)1{qk>0}Posterior inference of p(q, r, π|y)

9S. Pal and M. Coates, “Scalable MCMC in degree corrected stochastic block model,” in Proc. Intl. Conf.Acoust., Speech and Signal Proc, May 2019 (Accepted).

10

Page 29: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Degree Corrected Blockmodel(MMDCB)9

Generalization of a-MMSB

Node specific degree correction parameters: ra ∈ R

Community specific parameters: qk > 0

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbif Zab = Zba = k :

sample yab ∼ Bernoulli(logit−1(qk + ra + rb))

else:

sample yab ∼ Bernoulli(logit−1(ra + rb))

Prior distributions: ra ∼ N (0, σ2), qk ∼ N (0, σ2)1{qk>0}Posterior inference of p(q, r, π|y)

9S. Pal and M. Coates, “Scalable MCMC in degree corrected stochastic block model,” in Proc. Intl. Conf.Acoust., Speech and Signal Proc, May 2019 (Accepted).

10

Page 30: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Degree Corrected Blockmodel(MMDCB)9

Generalization of a-MMSB

Node specific degree correction parameters: ra ∈ RCommunity specific parameters: qk > 0

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbif Zab = Zba = k :

sample yab ∼ Bernoulli(logit−1(qk + ra + rb))

else:

sample yab ∼ Bernoulli(logit−1(ra + rb))

Prior distributions: ra ∼ N (0, σ2), qk ∼ N (0, σ2)1{qk>0}Posterior inference of p(q, r, π|y)

9S. Pal and M. Coates, “Scalable MCMC in degree corrected stochastic block model,” in Proc. Intl. Conf.Acoust., Speech and Signal Proc, May 2019 (Accepted).

10

Page 31: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Degree Corrected Blockmodel(MMDCB)9

Generalization of a-MMSB

Node specific degree correction parameters: ra ∈ RCommunity specific parameters: qk > 0

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbif Zab = Zba = k :

sample yab ∼ Bernoulli(logit−1(qk + ra + rb))

else:

sample yab ∼ Bernoulli(logit−1(ra + rb))

Prior distributions: ra ∼ N (0, σ2), qk ∼ N (0, σ2)1{qk>0}Posterior inference of p(q, r, π|y)

9S. Pal and M. Coates, “Scalable MCMC in degree corrected stochastic block model,” in Proc. Intl. Conf.Acoust., Speech and Signal Proc, May 2019 (Accepted).

10

Page 32: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Mixed Membership Degree Corrected Blockmodel(MMDCB)9

Generalization of a-MMSB

Node specific degree correction parameters: ra ∈ RCommunity specific parameters: qk > 0

Generative Model

for any two nodes a and b :

sample Zab ∼ πa and Zba ∼ πbif Zab = Zba = k :

sample yab ∼ Bernoulli(logit−1(qk + ra + rb))

else:

sample yab ∼ Bernoulli(logit−1(ra + rb))

Prior distributions: ra ∼ N (0, σ2), qk ∼ N (0, σ2)1{qk>0}Posterior inference of p(q, r, π|y)

9S. Pal and M. Coates, “Scalable MCMC in degree corrected stochastic block model,” in Proc. Intl. Conf.Acoust., Speech and Signal Proc, May 2019 (Accepted).

10

Page 33: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD)

11

Page 34: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD)

11

Page 35: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD)

11

Page 36: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD)

11

Page 37: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD)

11

Page 38: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Metropolis adjusted Langevin algorithm (MALA)

parameter θ, observed data X = {x1, x2, ..., xN}

prior distribution p(θ), generative model p(X|θ) =N∏i=1

p(xi |θ)

posterior distribution: p(θ|X) ∝ p(θ)N∏i=1

p(xi |θ)

q(θ∗|θ) = N (θ∗|θ +ε

2

(∇θ log p(θ) +

N∑i=1

∇θ log p(xi |θ)), εI )

acceptance probability: min

(1,

p(θ∗|x)q(θ|θ∗)p(θ|x)q(θ∗|θ)

)

Preconditioning: Riemannian Langevin Dynamics (RLD) 11

Page 39: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 40: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 41: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 42: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 43: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 44: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Langevin Monte Carlo

Stochastic gradient Langevin dynamics (SGLD)

complexity in LD: O(N)

stochastic gradient: O(n) complexity

∇θ log p(X|θ) ≈ N

n

∑xti∈Xt

∇θ log p(xti |θ)

annealed step-size schedule∞∑t=1

εt =∞ and∞∑t=1

ε2t <∞

no acceptance probability computation

asymptotic convergence10 to the posterior distribution

10M. Welling and Y. W. Teh, “Bayesian learning via stochastic gradient Langevin dynamics,”in Proc. Intl. Conf. Machine Learning, Bellevue, WA, USA, Jun. 2011. 12

Page 45: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Numerical Experiments and Results

NETSCIENCE RELATIVITY HEP-TH HEP-PH11

Nodes 1589 5242 9877 12008Edges 2742 14996 25998 118521

held out test set: 10% of the links, same number of non-links

evaluation metrics:

- average perplexity:

perpavg (Ytest|{π(i), q(i), r (i)}Ti=1)

= exp

(−

∑yab∈Ytest

log

{1

T

T∑i=1

p(yab|π(i), q(i), r (i))

}|Ytest|

).

- area under ROC (AUC) for link prediction task

11J. Leskovec, J. Kleinberg, and C. Faloutsos,“Graph evolution: densification and shrinking diameters,”in ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, Mar. 2007

13

Page 46: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Numerical Experiments and Results

NETSCIENCE RELATIVITY HEP-TH HEP-PH11

Nodes 1589 5242 9877 12008Edges 2742 14996 25998 118521

held out test set: 10% of the links, same number of non-links

evaluation metrics:

- average perplexity:

perpavg (Ytest|{π(i), q(i), r (i)}Ti=1)

= exp

(−

∑yab∈Ytest

log

{1

T

T∑i=1

p(yab|π(i), q(i), r (i))

}|Ytest|

).

- area under ROC (AUC) for link prediction task

11J. Leskovec, J. Kleinberg, and C. Faloutsos,“Graph evolution: densification and shrinking diameters,”in ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, Mar. 2007

13

Page 47: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Numerical Experiments and Results

NETSCIENCE RELATIVITY HEP-TH HEP-PH11

Nodes 1589 5242 9877 12008Edges 2742 14996 25998 118521

held out test set: 10% of the links, same number of non-links

evaluation metrics:

- average perplexity:

perpavg (Ytest|{π(i), q(i), r (i)}Ti=1)

= exp

(−

∑yab∈Ytest

log

{1

T

T∑i=1

p(yab|π(i), q(i), r (i))

}|Ytest|

).

- area under ROC (AUC) for link prediction task

11J. Leskovec, J. Kleinberg, and C. Faloutsos,“Graph evolution: densification and shrinking diameters,”in ACM Trans. Knowl. Discov. Data, vol. 1, no. 1, Mar. 2007

13

Page 48: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Convergence of Perplexity

0 1 2 3 4

time (sec) 104

4

5

6

7

8

9

Pe

rple

xity

a-MMSB

MMDCB

Figure: Convergence of perplexity for HEP-PH dataset, K = 50

14

Page 49: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Comparison of Perplexity at convergence

25 50 75 100

Number of communities (K)

10

11

12

Perp

lexity

(a)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

11.5

12

12.5

13

13.5

14

Perp

lexity

(b)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

18

20

22

24

26

28

Perp

lexity

(c)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

3.5

4

4.5

5

5.5

6

Perp

lexity

(d)

a-MMSB

MMDCB

Figure: (a) NETSCIENCE, (b) RELATIVITY, (c) HEP-TH and (d) HEP-PH

15

Page 50: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Comparison of AUC at convergence

25 50 75 100

Number of communities (K)

75

80

85

AU

C

(a)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

78

80

82

84

86

AU

C

(b)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

75

80

85

AU

C

(c)

a-MMSB

MMDCB

25 50 75 100

Number of communities (K)

88

90

92

94

96

AU

C

(d)

a-MMSB

MMDCB

Figure: (a) NETSCIENCE, (b) RELATIVITY, (c) HEP-TH and (d) HEP-PH

16

Page 51: Scalable MCMC in degree corrected stochastic block modelnetworks.ece.mcgill.ca/.../Soumya...presentation.pdf · Scalable MCMC in degree corrected stochastic block model Soumyasundar

Conclusion

MMDCB models the observed graph better than a-MMSB.

SG-MCMC algorithms scale well to large networks.

Future work:

- better generative models

- efficient mini-batch sampling, variance reduction

- more advanced SG-MCMC algorithms

17