
Graph Cluster Randomization: Network Exposure to Multiple Universes

Authors:

Johan Ugander, Cornell University

Brian Karrer, Facebook

Lars Backstrom, Facebook

Jon Kleinberg, Cornell University

Presented by:

Subhashis Hazarika,

Ohio State University

Motivation

• Goal: estimate the “average effect” of a treatment on a population when the treatment of individuals spills over to their neighbors through an underlying social network.

• A/B testing is the standard approach for estimating the “average effect” of a treatment on a population.

• But A/B testing ignores this social interference: it assumes each user’s outcome depends only on that user’s own treatment.

17-10-2013 2

A/B testing

• Assumption: SUTVA (stable unit treatment value assumption)

• Universe A and Universe B are treated as two separate parallel universes.

New page A

• Treatment group

• Individuals respond independently

Default page B

• Control group

• Independent response


Proposed Solution

Graph Cluster Randomization

– Formulate the average treatment effect and network exposure in terms of graph-theoretic conditions

– Apply graph cluster randomization to the formulated model

– Use an unbiased estimator, the Horvitz-Thompson estimator, and derive an upper bound on its variance that is linear in the degrees of the graph.


Average Treatment

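• Defined following the framework of Aronow and Samii, without assuming SUTVA.

• Let $z \in \{0,1\}^n$ be the treatment assignment vector ($z_i = 1$ if user i is treated).

• Let $Y_i(z)$ be the potential outcome of user i under the treatment assignment vector z.

• Then the average treatment effect, comparing the universe where everyone is treated ($z = \mathbf{1}$) with the one where no one is ($z = \mathbf{0}$), is given by:

$$\tau(\mathbf{1}, \mathbf{0}) = \frac{1}{n} \sum_{i=1}^{n} \big[ Y_i(\mathbf{1}) - Y_i(\mathbf{0}) \big]$$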
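As a toy illustration, suppose we could observe both extreme universes for every user. The average treatment effect is then just the mean per-user difference Y_i(1) - Y_i(0). The function name and outcome values below are hypothetical; in a real experiment only one assignment is ever observed, which is exactly why an estimator is needed.

```python
from typing import Sequence


def average_treatment_effect(y_treated: Sequence[float], y_control: Sequence[float]) -> float:
    """tau(1, 0) = (1/n) * sum_i [Y_i(1) - Y_i(0)].

    y_treated[i] is Y_i(1), user i's outcome when everyone is treated;
    y_control[i] is Y_i(0), user i's outcome when nobody is treated.
    """
    if len(y_treated) != len(y_control) or not y_treated:
        raise ValueError("need two non-empty outcome lists of equal length")
    n = len(y_treated)
    return sum(y1 - y0 for y1, y0 in zip(y_treated, y_control)) / n


# Hypothetical potential outcomes for n = 4 users.
y1 = [3.0, 5.0, 2.0, 6.0]
y0 = [1.0, 4.0, 2.0, 3.0]
print(average_treatment_effect(y1, y0))  # 1.5
```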


Network Exposure

• User i is “network exposed to the treatment” under an assignment vector z if i’s response under z is the same as i’s response under the assignment vector $\mathbf{1}$, in which every user is treated.

• This leads to several possible exposure conditions for the experiment:

o Full exposure

o Absolute k exposure

o Fractional q exposure
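The three neighborhood conditions can be checked directly from an assignment vector. This is a minimal sketch assuming the definitions as we understand them from the paper (full: i and all of its neighbors treated; absolute k: i and at least k neighbors treated; fractional q: i and at least a fraction q of neighbors treated), applied symmetrically for control. The function names and the adjacency-dict representation are hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical representation: undirected graph as adjacency lists, vertices 0..n-1.
Graph = Dict[int, List[int]]

EPS = 1e-9  # tolerance so that e.g. q = 0.3 with 10 neighbors requires exactly 3


def matching_neighbors(graph: Graph, z: Sequence[int], i: int, target: int) -> int:
    """Number of neighbors j of i with z[j] == target (1 = treated, 0 = control)."""
    return sum(1 for j in graph[i] if z[j] == target)


def full_exposure(graph: Graph, z: Sequence[int], i: int, target: int = 1) -> bool:
    # i and every one of its neighbors receive the target condition.
    return z[i] == target and matching_neighbors(graph, z, i, target) == len(graph[i])


def absolute_k_exposure(graph: Graph, z: Sequence[int], i: int, k: int, target: int = 1) -> bool:
    # i receives the target condition and at least k of its neighbors do too.
    # No special case for vertices with fewer than k neighbors: they are never exposed here.
    return z[i] == target and matching_neighbors(graph, z, i, target) >= k


def fractional_q_exposure(graph: Graph, z: Sequence[int], i: int, q: float, target: int = 1) -> bool:
    # i receives the target condition and at least a fraction q of its neighbors do too.
    return z[i] == target and matching_neighbors(graph, z, i, target) >= q * len(graph[i]) - EPS


# Star graph: center 0 with leaves 1, 2, 3; leaf 3 is in control.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
z = [1, 1, 1, 0]
print(full_exposure(star, z, 0))               # False: leaf 3 is untreated
print(absolute_k_exposure(star, z, 0, 2))      # True: 2 treated neighbors
print(fractional_q_exposure(star, z, 0, 0.5))  # True: 2 >= 0.5 * 3
print(full_exposure(star, z, 1))               # True: leaf 1 and the center are treated
```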


Graph Cluster Randomization

• At a high level, GCR partitions the graph into clusters and then randomizes between treatment and control at the cluster level: all vertices in a cluster receive the same assignment.

• To compute a vertex’s exposure probability, we only need to know how the clusters intersect the local graph structure near that vertex.
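A minimal sketch of cluster-level randomization, assuming a clustering is already given. It also shows why only the local picture matters: under full neighborhood exposure, vertex i is exposed to treatment exactly when every cluster touching {i} ∪ N(i) is treated, so with independently flipped clusters the probability is p^|S_i|, where S_i is that set of clusters. This closed form is our own derivation for illustration, cross-checked by brute force; the function names are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Sequence, Set

Graph = Dict[int, List[int]]  # adjacency lists, vertices 0..n-1


def cluster_randomize(clusters: Sequence[Sequence[int]], n: int, p: float,
                      rng: random.Random) -> List[int]:
    """Flip one coin per cluster; every vertex in the cluster gets that cluster's assignment."""
    z = [0] * n
    for cluster in clusters:
        treated = 1 if rng.random() < p else 0
        for v in cluster:
            z[v] = treated
    return z


def clusters_touching(graph: Graph, cluster_of: Sequence[int], i: int) -> Set[int]:
    """S_i: the clusters intersecting i's closed neighborhood {i} together with N(i)."""
    return {cluster_of[i]} | {cluster_of[j] for j in graph[i]}


def full_treatment_probability(graph: Graph, cluster_of: Sequence[int], i: int, p: float) -> float:
    # Exposed iff every cluster in S_i is treated; clusters are independent, hence p ** |S_i|.
    return p ** len(clusters_touching(graph, cluster_of, i))


def exact_full_treatment_probability(graph: Graph, cluster_of: Sequence[int], i: int, p: float) -> float:
    """Brute-force check: sum Pr(assignment) over all cluster-level assignments."""
    k = max(cluster_of) + 1
    total = 0.0
    for bits in itertools.product((0, 1), repeat=k):
        z = [bits[c] for c in cluster_of]
        weight = 1.0
        for b in bits:
            weight *= p if b else 1 - p
        if all(z[v] == 1 for v in [i] + graph[i]):
            total += weight
    return total


# Path 0-1-2-3-4-5 split into two clusters {0, 1, 2} and {3, 4, 5}.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
cluster_of = [0, 0, 0, 1, 1, 1]
print(full_treatment_probability(path, cluster_of, 1, 0.3))  # 0.3: only cluster 0 matters
print(full_treatment_probability(path, cluster_of, 2, 0.3))  # about 0.09: vertex 2 touches both clusters
```

For vertex 1 the cluster design gives probability 0.3, whereas independent vertex randomization would give 0.3^3 = 0.027.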


Exposure Models

• The exposure condition of an individual describes how they experience the intervention jointly with how the rest of the world experiences it.

• Let $\sigma_i^x$ be the set of all assignment vectors z for which i experiences outcome x; this set is the exposure condition x for i.

• The exposure model for user i is a set of exposure conditions that partitions the space of possible assignment vectors z.

• Here we are interested only in $\sigma_i^1$ (exposure to treatment) and $\sigma_i^0$ (exposure to control).
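To make the exposure conditions concrete, this sketch enumerates all 2^n assignment vectors of a tiny path graph and sorts them into conditions for one user. It assumes full neighborhood exposure as the exposure model and lumps every non-exposed assignment into a single hypothetical "other" condition so the three sets partition {0,1}^n. Brute-force enumeration is only feasible for very small n.

```python
import itertools
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # adjacency lists, vertices 0..n-1
Assignment = Tuple[int, ...]


def exposure_sets(graph: Graph, i: int) -> Dict[str, Set[Assignment]]:
    """Partition {0,1}^n into exposure conditions for user i under full neighborhood exposure.

    'treatment' plays the role of sigma_i^1, 'control' of sigma_i^0, and 'other'
    collects the remaining assignment vectors.
    """
    closed_nbhd = [i] + graph[i]
    sets: Dict[str, Set[Assignment]] = {"treatment": set(), "control": set(), "other": set()}
    for z in itertools.product((0, 1), repeat=len(graph)):
        if all(z[v] == 1 for v in closed_nbhd):
            sets["treatment"].add(z)
        elif all(z[v] == 0 for v in closed_nbhd):
            sets["control"].add(z)
        else:
            sets["other"].add(z)
    return sets


path = {0: [1], 1: [0, 2], 2: [1]}
print(sorted(exposure_sets(path, 0)["treatment"]))  # [(1, 1, 0), (1, 1, 1)]
print(sorted(exposure_sets(path, 1)["treatment"]))  # [(1, 1, 1)]
```

The middle vertex has the smaller treatment set because it has more neighbors whose assignments must line up.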


Exposure Conditions

• Neighborhood exposure (local exposure conditions):

o Full neighborhood exposure

o Absolute k-neighborhood exposure

o Fractional q-neighborhood exposure

• Core exposure (global dependency):

o Component exposure

o Absolute k-core exposure

o Fractional q-core exposure

Note: the assignment vectors satisfying a core exposure condition are entirely contained in those satisfying the associated neighborhood exposure condition.
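Core exposure conditions depend on the global structure of the treated subgraph. A minimal sketch, assuming absolute k-core exposure means i belongs to the k-core of the subgraph induced on the treated vertices and component exposure means i's entire connected component is treated; fractional q-core exposure is omitted. The brute-force check illustrates the containment note above on a small example: every k-core-exposed vertex is also absolute k-neighborhood exposed. Function names are hypothetical.

```python
import itertools
from collections import deque
from typing import Dict, List, Sequence, Set

Graph = Dict[int, List[int]]  # adjacency lists, vertices 0..n-1


def treated_k_core(graph: Graph, z: Sequence[int], k: int) -> Set[int]:
    """Vertices of the k-core of the subgraph induced on treated vertices (z[v] == 1)."""
    alive = {v for v in graph if z[v] == 1}
    degree = {v: sum(1 for u in graph[v] if u in alive) for v in alive}
    queue = deque(v for v in alive if degree[v] < k)
    while queue:                      # repeatedly peel vertices with too few treated neighbors
        v = queue.popleft()
        if v not in alive:
            continue                  # already peeled via an earlier queue entry
        alive.remove(v)
        for u in graph[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    queue.append(u)
    return alive


def component_exposed(graph: Graph, z: Sequence[int], i: int) -> bool:
    """True if every vertex in i's connected component is treated."""
    seen = {i}
    queue = deque([i])
    while queue:
        v = queue.popleft()
        if z[v] != 1:
            return False
        for u in graph[v]:
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return True


def core_within_neighborhood(graph: Graph, k: int) -> bool:
    """Check over all assignments that k-core exposure implies absolute k-neighborhood exposure."""
    for z in itertools.product((0, 1), repeat=len(graph)):
        for i in treated_k_core(graph, z, k):
            if not (z[i] == 1 and sum(z[j] for j in graph[i]) >= k):
                return False
    return True


# Triangle 0-1-2 with a pendant vertex 3 attached to 2.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(treated_k_core(g, (1, 1, 1, 1), 2))     # {0, 1, 2}
print(component_exposed(g, (1, 1, 0, 1), 3))  # False
print(core_within_neighborhood(g, 2))         # True
```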


Randomization and Estimation

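Draw the assignment vector at random: Z is a random vector taking values in $\{0,1\}^n$.

$\Pr(Z = z)$ is the distribution of Z.

$\pi_i^x = \Pr(Z \in \sigma_i^x)$ is the probability that i is network exposed to condition x ($x = 1$ for treatment, $x = 0$ for control).

Therefore the average treatment effect is estimated by the Horvitz-Thompson estimator,

$$\hat{\tau}(Z) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^1]}{\pi_i^1} - \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^0]}{\pi_i^0} \right)$$

Provided every $\pi_i^1$ and $\pi_i^0$ is positive, its expectation over Z equals the true average treatment effect, i.e. the estimator is unbiased.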
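A minimal sketch of the Horvitz-Thompson estimator, assuming full neighborhood exposure and independent vertex randomization with treatment probability p (so π_i^1 = p^(d_i+1) and π_i^0 = (1-p)^(d_i+1)). Summing τ̂(z) over all 2^n assignments weighted by their probability checks unbiasedness exactly on a tiny graph; the potential outcomes and function names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence

Graph = Dict[int, List[int]]  # adjacency lists, vertices 0..n-1


def exposed(graph: Graph, z: Sequence[int], i: int, x: int) -> bool:
    """Full neighborhood exposure of i to condition x (1 = treatment, 0 = control)."""
    return z[i] == x and all(z[j] == x for j in graph[i])


def ht_estimate(graph: Graph, z: Sequence[int], y1: Sequence[float],
                y0: Sequence[float], p: float) -> float:
    """Horvitz-Thompson estimate of the average treatment effect for one assignment z.

    Whenever i is network exposed to condition x, its observed outcome Y_i(z)
    equals the potential outcome y1[i] (x = 1) or y0[i] (x = 0).
    """
    n = len(graph)
    total = 0.0
    for i in range(n):
        d = len(graph[i])
        if exposed(graph, z, i, 1):
            total += y1[i] / p ** (d + 1)          # divide by pi_i^1
        elif exposed(graph, z, i, 0):
            total -= y0[i] / (1 - p) ** (d + 1)    # divide by pi_i^0
    return total / n


def expected_ht(graph: Graph, y1: Sequence[float], y0: Sequence[float], p: float) -> float:
    """Exact E[tau_hat(Z)]: sum over all 2^n assignments weighted by Pr(Z = z)."""
    expectation = 0.0
    for z in itertools.product((0, 1), repeat=len(graph)):
        weight = 1.0
        for bit in z:
            weight *= p if bit else 1 - p
        expectation += weight * ht_estimate(graph, z, y1, y0, p)
    return expectation


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y1, y0 = [4.0, 2.0, 5.0, 1.0], [1.0, 1.0, 3.0, 0.0]
print(expected_ht(path, y1, y0, 0.4))  # approximately 1.75, the true effect (3 + 1 + 2 + 1) / 4
```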


Exposure Probabilities

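Model: full neighborhood exposure with independent vertex randomization (each vertex is treated independently with probability p)

– Probability of exposure to treatment: $\Pr(Z \in \sigma_i^1) = p^{d_i + 1}$, where $d_i$ is the degree of i

– Probability of exposure to control: $\Pr(Z \in \sigma_i^0) = (1 - p)^{d_i + 1}$

– For a high-degree vertex these probabilities are exponentially small in $d_i$, which dramatically increases the variance of the HT estimator.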
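The exponential decay is easy to see numerically. This sketch assumes p = 0.5 and a hypothetical helper name; the reciprocal 1/π_i^1 is the weight the HT estimator places on an exposed vertex's outcome, so at degree 20 an exposed vertex's outcome is multiplied by 2^21 = 2,097,152.

```python
from typing import Tuple


def full_exposure_probabilities(degree: int, p: float) -> Tuple[float, float]:
    """(Pr exposed to treatment, Pr exposed to control) for a vertex of the given degree,
    under full neighborhood exposure with independent vertex randomization."""
    return p ** (degree + 1), (1 - p) ** (degree + 1)


for d in (0, 5, 10, 20):
    treat, control = full_exposure_probabilities(d, 0.5)
    # 1 / treat is the Horvitz-Thompson weight applied to an exposed vertex's outcome.
    print(f"degree {d:2d}: Pr(treatment exposure) = {treat:.3g}, HT weight = {1 / treat:,.0f}")
```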



For the absolute and fractional neighborhood exposure models, we have the following probabilities.
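Under the simplifying assumption of independent vertex randomization, these probabilities reduce to binomial tails: i is treated with probability p and the number of treated neighbors is Binomial(d_i, p), while fractional q exposure is the case k = ⌈q·d_i⌉. This is a sketch under that assumption with hypothetical function names; a brute-force enumeration cross-checks the closed form.

```python
import math
from itertools import product
from typing import Tuple

EPS = 1e-9  # guards ceil(q * d) against rounding, e.g. 0.3 * 10 == 3.0000000000000004


def at_least(d: int, k: int, prob: float) -> float:
    """Pr(Binomial(d, prob) >= k)."""
    return sum(math.comb(d, j) * prob ** j * (1 - prob) ** (d - j) for j in range(max(k, 0), d + 1))


def absolute_k_probabilities(d: int, k: int, p: float) -> Tuple[float, float]:
    """(Pr exposed to treatment, Pr exposed to control) under absolute k-neighborhood
    exposure with independent vertex randomization; d is the vertex degree."""
    # Treatment: i treated (prob p) and at least k of its d neighbors treated.
    # Control: i in control (prob 1 - p) and at least k of its d neighbors in control.
    return p * at_least(d, k, p), (1 - p) * at_least(d, k, 1 - p)


def fractional_q_probabilities(d: int, q: float, p: float) -> Tuple[float, float]:
    # "At least a fraction q of the neighbors" means at least ceil(q * d) of them.
    return absolute_k_probabilities(d, math.ceil(q * d - EPS), p)


def brute_force_probabilities(d: int, k: int, p: float) -> Tuple[float, float]:
    """Enumerate the assignments of i and its d neighbors to cross-check the formulas."""
    treat = control = 0.0
    for bits in product((0, 1), repeat=d + 1):
        weight = 1.0
        for b in bits:
            weight *= p if b else 1 - p
        me, treated_nbrs = bits[0], sum(bits[1:])
        if me == 1 and treated_nbrs >= k:
            treat += weight
        if me == 0 and d - treated_nbrs >= k:
            control += weight
    return treat, control


print(absolute_k_probabilities(10, 3, 0.5))
print(fractional_q_probabilities(10, 0.3, 0.5))  # same as k = 3
```

With k = d_i the treatment probability reduces to p^(d_i+1), the full neighborhood case.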



• This model has an upper bound given by .

• This also gives an upper bound on the core exposure probabilities, given by the following proposition.


Estimator Variance

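The variance of the effect estimator splits into a treatment part, a control part, and their covariance. With $\pi_i^x = \Pr(Z \in \sigma_i^x)$, write $\hat{Y}^x = \frac{1}{n}\sum_i Y_i(x)\,\mathbf{1}[Z \in \sigma_i^x]/\pi_i^x$; since $Y_i(Z) = Y_i(x)$ whenever $Z \in \sigma_i^x$, we have $\hat{\tau}(Z) = \hat{Y}^1 - \hat{Y}^0$ and

$$\mathrm{Var}[\hat{\tau}(Z)] = \mathrm{Var}[\hat{Y}^1] + \mathrm{Var}[\hat{Y}^0] - 2\,\mathrm{Cov}[\hat{Y}^1, \hat{Y}^0]$$

Final variance, with $\pi_{ij}^x = \Pr(Z \in \sigma_i^x \cap \sigma_j^x)$:

$$\mathrm{Var}[\hat{Y}^x] = \frac{1}{n^2}\left[\sum_{i} Y_i(x)^2\left(\frac{1}{\pi_i^x} - 1\right) + \sum_{i}\sum_{j \neq i} Y_i(x)\,Y_j(x)\left(\frac{\pi_{ij}^x}{\pi_i^x \pi_j^x} - 1\right)\right]$$

Estimator Variance

Final covariance, with $\pi_{ij}^{10} = \Pr(Z \in \sigma_i^1 \cap \sigma_j^0)$ and $\pi_{ii}^{10} = 0$ (no user is exposed to treatment and control at once):

$$\mathrm{Cov}[\hat{Y}^1, \hat{Y}^0] = \frac{1}{n^2}\left[\sum_{i}\sum_{j \neq i} Y_i(1)\,Y_j(0)\left(\frac{\pi_{ij}^{10}}{\pi_i^1 \pi_j^0} - 1\right) - \sum_{i} Y_i(1)\,Y_i(0)\right]$$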
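The variance of the HT estimator can be sanity-checked numerically: Var[τ̂] = (1/n²)[V_1 + V_0 - 2C], where V_x sums Y_i(x)Y_j(x)(π_ij^x/(π_i^x π_j^x) - 1) over all pairs (with π_ii^x = π_i^x) and C sums Y_i(1)Y_j(0)(π_ij^10/(π_i^1 π_j^0) - 1) (with π_ii^10 = 0). This minimal sketch assumes full neighborhood exposure and independent vertex randomization on a 4-vertex path: one function evaluates these sums with joint exposure probabilities obtained by enumeration, the other computes Var[τ̂] directly over all 16 assignments. Outcomes and function names are hypothetical.

```python
import itertools
from typing import Dict, List, Sequence

Graph = Dict[int, List[int]]  # adjacency lists, vertices 0..n-1


def exposed(graph: Graph, z: Sequence[int], i: int, x: int) -> bool:
    """Full neighborhood exposure of i to condition x (1 = treatment, 0 = control)."""
    return z[i] == x and all(z[j] == x for j in graph[i])


def assignments(n: int, p: float):
    """Yield every z in {0,1}^n with its probability under independent vertex randomization."""
    for z in itertools.product((0, 1), repeat=n):
        weight = 1.0
        for bit in z:
            weight *= p if bit else 1 - p
        yield z, weight


def variance_via_formula(graph: Graph, y1: Sequence[float], y0: Sequence[float], p: float) -> float:
    n = len(graph)
    zs = list(assignments(n, p))

    def joint(i: int, a: int, j: int, b: int) -> float:
        # Pr(Z in sigma_i^a and Z in sigma_j^b); equals pi_i^a when (i, a) == (j, b).
        return sum(w for z, w in zs if exposed(graph, z, i, a) and exposed(graph, z, j, b))

    pi = {(i, x): joint(i, x, i, x) for i in range(n) for x in (0, 1)}
    y = {1: y1, 0: y0}
    var_terms = 0.0  # n^2 * (Var[Y_hat^1] + Var[Y_hat^0])
    for x in (0, 1):
        for i in range(n):
            var_terms += y[x][i] ** 2 * (1 / pi[i, x] - 1)
            for j in range(n):
                if j != i:
                    var_terms += y[x][i] * y[x][j] * (joint(i, x, j, x) / (pi[i, x] * pi[j, x]) - 1)
    cov_terms = -sum(y1[i] * y0[i] for i in range(n))  # n^2 * Cov[Y_hat^1, Y_hat^0]
    for i in range(n):
        for j in range(n):
            if j != i:
                cov_terms += y1[i] * y0[j] * (joint(i, 1, j, 0) / (pi[i, 1] * pi[j, 0]) - 1)
    return (var_terms - 2 * cov_terms) / n ** 2


def variance_by_enumeration(graph: Graph, y1: Sequence[float], y0: Sequence[float], p: float) -> float:
    n = len(graph)
    pi1 = [p ** (len(graph[i]) + 1) for i in range(n)]
    pi0 = [(1 - p) ** (len(graph[i]) + 1) for i in range(n)]

    def tau_hat(z) -> float:
        s = 0.0
        for i in range(n):
            if exposed(graph, z, i, 1):
                s += y1[i] / pi1[i]
            if exposed(graph, z, i, 0):
                s -= y0[i] / pi0[i]
        return s / n

    zs = list(assignments(n, p))
    mean = sum(w * tau_hat(z) for z, w in zs)
    return sum(w * (tau_hat(z) - mean) ** 2 for z, w in zs)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
y1, y0 = [4.0, 2.0, 5.0, 1.0], [1.0, 1.0, 3.0, 0.0]
print(variance_via_formula(path, y1, y0, 0.5), variance_by_enumeration(path, y1, y0, 0.5))
```

The two printed numbers should agree up to floating-point error.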


Estimator Variance

• Thus the variance is O(1/n), but only when the maximum degree is bounded.

• In general, the variance can grow exponentially with the degree.

• Hence the authors introduce a structural condition on the graph (restricted growth) under which a suitable clustering keeps the variance bound linear, rather than exponential, in the degree.


Restricted-Growth Graph

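• Let $B_r(v)$ be the set of vertices within r hops of a vertex v.

• G has restricted growth with growth constant κ if $|B_{r+1}(v)| \le \kappa\,|B_r(v)|$ for every vertex v and every r > 0.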
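A brute-force sketch of the growth constant, assuming the condition |B_{r+1}(v)| ≤ κ|B_r(v)| for all vertices v and all r > 0. A long cycle has κ = 5/3, while a star with 10 leaves already has κ = 5.5, since one hop from a leaf reaches only the hub but two hops reach everything: high-degree hubs inflate κ. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]  # adjacency lists of an undirected graph


def ball_sizes(graph: Graph, v: int) -> List[int]:
    """sizes[r] = |B_r(v)|, the number of vertices within r hops of v, up to v's eccentricity."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    sizes, running = [], 0
    for c in counts:
        running += c
        sizes.append(running)
    return sizes


def growth_constant(graph: Graph) -> float:
    """Smallest kappa with |B_{r+1}(v)| <= kappa * |B_r(v)| for every vertex v and r > 0."""
    kappa = 1.0
    for v in graph:
        sizes = ball_sizes(graph, v)
        # Beyond the eccentricity the ball stops growing, so those ratios are 1.
        for r in range(1, len(sizes) - 1):
            kappa = max(kappa, sizes[r + 1] / sizes[r])
    return kappa


cycle = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
star = {0: list(range(1, 11)), **{leaf: [0] for leaf in range(1, 11)}}
print(growth_constant(cycle))  # 5/3: radius-2 ball has 5 vertices, radius-1 ball has 3
print(growth_constant(star))   # 5.5: from a leaf, B_1 has 2 vertices and B_2 has all 11
```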


Variance in Restricted-Growth Graph

• Consider a single cycle (k = 1) on n vertices, partitioned into contiguous clusters of c vertices

• For c = 2

• For c >= 2
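Why cluster size matters on the cycle can be seen by counting |S_i|, the number of clusters meeting each vertex's closed neighborhood, which sets the full neighborhood exposure probability p^|S_i| under cluster randomization (our derivation, for illustration). This sketch assumes contiguous clusters of c vertices with c dividing n: c = 1 is independent vertex randomization (every vertex has |S_i| = 3), c = 2 puts every vertex on a cluster boundary (|S_i| = 2), and larger c leaves c - 2 interior vertices per cluster with |S_i| = 1. The function name is hypothetical.

```python
from collections import Counter


def cycle_cluster_counts(n: int, c: int) -> Counter:
    """Tally |S_i| over the vertices of an n-cycle cut into contiguous clusters of c vertices.

    S_i is the set of clusters meeting i's closed neighborhood {i-1, i, i+1};
    under full neighborhood exposure, Pr(i exposed to treatment) = p ** |S_i|.
    """
    if n % c != 0 or n // c < 3:
        raise ValueError("this sketch assumes c divides n and there are at least 3 clusters")
    cluster_of = [v // c for v in range(n)]
    counts: Counter = Counter()
    for i in range(n):
        touching = {cluster_of[(i - 1) % n], cluster_of[i], cluster_of[(i + 1) % n]}
        counts[len(touching)] += 1
    return counts


for c in (1, 2, 3, 4):
    print(c, sorted(cycle_cluster_counts(12, c).items()))
```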


Variance in Restricted-Growth Graph


Clustering Restricted-Growth Graph

• Clusters are built from a 3-net for the shortest-path metric of G:

Initially, all vertices are unmarked.

While there are unmarked vertices: in step j, pick an arbitrary unmarked vertex v, designate it as $v_j$, and mark all vertices in $B_2(v_j)$.

Suppose k such vertices are selected, and let $S = \{v_1, v_2, \ldots, v_k\}$.

Assign every vertex w of G to the closest vertex $v_i \in S$, breaking ties consistently.

For every $v_j$, let $C_j$ be the set of all vertices assigned to $v_j$; the sets $C_j$ form the clustering.
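A minimal sketch of the 3-net clustering steps, assuming an unweighted undirected graph given as adjacency lists. The "arbitrary" unmarked vertex is taken to be the smallest unmarked id and ties are broken in favor of the earlier-chosen center, which is one consistent rule among many. By construction the centers are pairwise at distance at least 3 and every vertex is within distance 2 of some center. Function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]  # adjacency lists, unweighted and undirected


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def three_net_clustering(graph: Graph) -> Tuple[List[int], Dict[int, List[int]]]:
    """Return the net S = [v_1, ..., v_k] and the clusters C_j (center -> sorted members)."""
    marked = set()
    centers: List[int] = []
    for v in sorted(graph):                # "arbitrary" choice: smallest unmarked id
        if v in marked:
            continue
        centers.append(v)
        marked.update(u for u, d in bfs_distances(graph, v).items() if d <= 2)  # mark B_2(v_j)
    # Assign each vertex to its closest center; ties go to the earlier-chosen center.
    best: Dict[int, Tuple[int, int]] = {}
    for order, center in enumerate(centers):
        for w, d in bfs_distances(graph, center).items():
            if w not in best or d < best[w][0]:
                best[w] = (d, order)
    clusters: Dict[int, List[int]] = {c: [] for c in centers}
    for w in sorted(best):
        clusters[centers[best[w][1]]].append(w)
    return centers, clusters


cycle = {i: [(i - 1) % 12, (i + 1) % 12] for i in range(12)}
centers, clusters = three_net_clustering(cycle)
print(centers)   # [0, 3, 6, 9]
print(clusters)  # {0: [0, 1, 11], 3: [2, 3, 4], 6: [5, 6, 7], 9: [8, 9, 10]}
```

On the 12-cycle this yields four clusters of three consecutive vertices each.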


Variance Bounds


Thank You
