estimating and sampling graphs with multidimensional random...

34
Estimating and Sampling Graphs with Multidimensional Random Walks Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu

Upload: others

Post on 02-Oct-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Estimating and Sampling Graphs with Multidimensional Random Walks

Group 2: Mingyan Zhao, Chengle Zhang, Biao Yin, Yuchen Liu

1

Page 2: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Motivation ´  Complex Network

Social Network

Reference: www.forbe.com imdevsoftware.wordpress.com

Biological Network

2

Page 3: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Existing Approaches

´ Random vertex sampling

´ Random edge sampling

´ Random walks 3

Page 4: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Frontier Sampling

´ a new m-dimensional random walk that uses m

dependent random walkers.

4

Page 5: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Contribution

´  Mitigates the large estimation errors caused by disconnected or

loosely connected components.

´  Shows that the tail of the degree distribution is better estimated

using random edge sampling than random vertex sampling.

´  Presents asymptotically unbiased estimators 5

Page 6: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Definitions Notation Meaning

Gd (V, Ed) A labeled directed graph representing the (original) network graph, where V is a set of vertices and Ed is a set of edges

(u, v) A connection from u to v (a.k.a. edges)

Lv and Le Finite set of vertex and edge labels,

Le(u,v)= ∅ Edge (u, v) is unlabeled

Lv(v)= ∅ Vertex v is unlabeled 6

Page 7: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Vertex V.S. Edge Sampling

7

Page 8: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

8

Section 4

1.  Mathematical theories and conductions on Random Walk Sampling 2.  Strong Law of Large Numbers 3.  Four estimators will be applied in Section 5 4.  Deficiency of RW 5.  Multiple Independent Random Walkers

Page 9: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

9

Strong Law of Large Numbers

Original statistical Thm:

Weak law:

B Number of RW steps

B*(B) Number of edges in E*

E* Total RW Sampled Edges

Page 10: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

10

First, label edges of interest:

So, the probability of the labelled edges

The estimator based on SLLN

Estimator 1: Edge Label Density

Page 11: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

11

Estimator 2: Assortative Mixing Coefficient (AMC)

Considering directed G, an asymptotically unbiased estimator of AMC:

Covariance

Which two are highly correlated?

Page 12: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

12

Estimator 3:Vertex label Density

Construct an asymptotically unbiased estimator:

Since:

So, this estimator converges to:

Page 13: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

13

Estimator 4: Global Clustering Coefficient

The unbiased estimator by SLLN

Since:

Page 14: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

14

Deficiency of RW from one point

1. “Trapped” inside a subgraph (MSE)

2. Start from non-stationary (non-steady) state (MSE, Bias)

Burn-in period: Discard the non-stationary samples

1. Just decrease error with non-stationary one

2. Discarding in a small sample is not ideal

Multiple Independent Random Walkers come!

Page 15: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

15

Single RW and Multiple RW

Single RW is depend on sample sizes from the estimators provided. Mutiple RW will split the sample sizes into each path. So, error in the total CNMSE

Log-log plot!

Still very high

Page 16: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

16

Why not M-independent RW ?

A

B

MIRW is hard to sample m independent vertex with p proportional to their degrees.

Page 17: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Section 5 Motivation

´ We want an m-dimensional random walk that, in steady state, samples edges uniformly at random but, unlike MultipleRW, can benefit from starting its walkers at uniformly sampled vertices.

17

Page 18: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Frontier Sampling

p = deg(𝑢)/∑𝐿↑▒deg(𝑣)  *1/deg(𝑢) =

18

Page 19: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Frontier Sampling: A m-dimensional Random Walk

´ The frontier sampling process is equivalent to the sampling process of a single random walker over 𝐺↑𝑚 . (Lemma 5.1) ´ P(selecting a vertex and its outgoing edge in FS) = P(randomly

sampling an edge from e(𝐿↓𝑛 ) in single random walker over 𝐺↑𝑚 ).

19

Page 20: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

FS Steady State v.s Uniform Distribution

´  𝐾↓𝑓𝑠 (𝑚) be a random variable that denotes the number of random walkers in 𝑉↓𝐴  in steady state.

´  Let 𝐾↓𝑢𝑛 (𝑚) be a random variable that denotes the number of sampled vertices, out of m uniformly sampled vertices from V, that belong to 𝑉↓𝐴 .

´  Proving this to be true indicates that the FS algorithm starting with m random walkers at m uniformly sampled vertices approaches the steady state distribution. This means FS benefits from starting its walkers at uniformly sampled vertices by reducing transient of RW.

20

Page 21: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

FS Steady State V.S. Uniform Distribution

´  By definition

´  Theorem 5.2

´  Lemma 5.3

´  Theorem 5.4

21

Page 22: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

MultipleRW Steady State V.S. Uniform Distribution

´  𝐾↓𝑚𝑤 (𝑚) be a random variable that denotes the steady state number of MultipleRW random walkers in 𝑉↓𝐴 .

Note: 𝑑↓𝐴  (average degree of vertices in 𝑉↓𝐴 ) Conclusion: If we initialize m random walkers with uniformly sampled vertices, FS starts closer to steady state than MultipleRW.

22

Page 23: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Distributed Frontier Sampling

´  Frontier Sampling can also be parallelized.

´  A MultipleRW sampling process where the cost of sampling a vertex v is an exponentially distributed random variable with parameter deg(v) is equivalent to a FS process. (Theorem 5)

Page 24: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Experiment and Result

´ Data: ´ “Flickr”, “Livejournal”, and “YouTube” ´ Barabási-Albert [5] graph

´ Goal: ´ Compare FS to MultipleRW, SingleRM ´ Compare FS on random vertex and edge sampling

´ Result: FS is constantly more accurate

24

Page 25: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Assortative Mixing Coefficient

25

Page 26: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

In-degree Distribution Estimates

26

Page 27: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Out-degree Distribution Estimates

27

Page 28: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

In-degree Distribution loosely connected components

´  Barabási-Albert Graph

28

Page 29: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

FS V.S. Stationary MultipleRW & SingleRW

´  MultipleRW and SingleRW start with steady state

29

Page 30: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

FS V.S. Random Independent Sampling

30

Page 31: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Density of Special Interest Group

31

Page 32: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Global Clustering Coefficient Estimates

´    Global Clustering Coefficient

a measure of the degree to which nodes in a graph tend to cluster together.

32

Page 33: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Conclusion

´ In almost all of the tests, FS is better.

Future Work • estimating characteristics of dynamic networks • design of new MCMC-based approximation algorithms

33

Page 34: Estimating and Sampling Graphs with Multidimensional Random …users.wpi.edu/.../slides/Group2_Presentation.pptx.pdf · 2016. 10. 7. · Estimating and Sampling Graphs with Multidimensional

Thank you!

34