algebraic statistics of p1 network models: markov bases ...risi/aml08/nips-algstats-p1-steve.pdf1...

68
Algebraic Statistics of p 1 Network Models: Markov Bases and Their Uses Stephen E. Fienberg Department of Statistics and Machine Learning Department Carnegie Mellon University Joint work with: Alessandro Rinaldo, Carnegie Mellon University and Sonja Petrovi´ c, University of Illinois at Chicago Algebraic Methods in Machine Learning NIPS Workshop December 2008 December 11, 2008 Stephen E. Fienberg (CMU) Markov Bases of p 1 Models December 11, 2008 1 / 28

Upload: others

Post on 25-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Statistics of p1 Network Models:Markov Bases and Their Uses

Stephen E. Fienberg

Department of Statistics and Machine Learning DepartmentCarnegie Mellon University

Joint work with:Alessandro Rinaldo, Carnegie Mellon University

and Sonja Petrovic, University of Illinois at Chicago

Algebraic Methods in Machine LearningNIPS Workshop December 2008

December 11, 2008

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 1 / 28

Page 2: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 3: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 4: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 5: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 6: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 7: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 8: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Outline

Context

Networks are ubiquitous in science

Network graphs 6= graphical model graphs

Statistical models for networks:

New statistical problems (standard asymptotics do not hold—how dowe get approximate normality?)

p1 model, log-linear model, and algebraic statistics:

Using algebra-geometry structure to “understand” a family of networkmodels and frame a related set of statistical estimation and inferenceissues.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 2 / 28

Page 9: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 10: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 11: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 12: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 13: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 14: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

General framework for networks evolving over time

Network represented by a graph Gt at time t

Nodes and edges can be created and can die

Can have multiple relationships; attributes for nodes and/or edges

Edges can be directed or undirected

Data available to be observed at time t = t0

Underlying stochastic process that describes the network structureand evolution.

Come to my talk on dynamic network modeling this afternoon inGrpahical models Workshop!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 3 / 28

Page 15: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

Example: The Framingham “Obesity” Study

Original Framingham “sample” cohort begun in 1947.

Offspring cohort of N0 = 5, 124 individuals measured beginning in1971 for T = 7 epochs centered at 1971, 1981, 1985, 1989, 1992,1997, 1999.

Link information on family members and one close friend.

Total number of individuals on whom we have obesity measures isN=12,067.

Details in Christakis and Fowler, NEJM, July 2007.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 4 / 28

Page 16: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

Example: The Framingham “Obesity” Study

Original Framingham “sample” cohort begun in 1947.

Offspring cohort of N0 = 5, 124 individuals measured beginning in1971 for T = 7 epochs centered at 1971, 1981, 1985, 1989, 1992,1997, 1999.

Link information on family members and one close friend.

Total number of individuals on whom we have obesity measures isN=12,067.

Details in Christakis and Fowler, NEJM, July 2007.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 5 / 28

Page 17: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

Example: The Framingham “Obesity” Study

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 6 / 28

Page 18: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

Example -The Collective Dynamics of Smoking in a LargeSocial Network (J.Fowler)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 7 / 28

Page 19: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Introduction Background

Example: Monks in a Monastery

18 novices observed over two years.

Network data gather at 4 time points; and on multiple relationships..

See analyses in Airoldi, et al., (2008)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 8 / 28

Page 20: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Holland and Leinhardt’s p1 model

n nodes, random occurrence of directed edges.

Describe the probability of an edge occurring between nodes i and j :

log Prob(no edge) = log Pij(0, 0) = λij

log Prob(from i to j) = log Pij(1, 0) = λij + αi + βj + θ

log Prob(from j to i) = log Pij(0, 1) = λij + αj + βi + θ

log Prob(bi-directed edge) = log Pij(1, 1) = λij +αi +βj +αj +βi +2θ+ρij

3 common forms:

ρij = 0 (no reciprocal effect)ρij = ρ (constant reciprocation factor)ρij = ρ+ ρi + ρj (edge-dependent reciprocation)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 9 / 28

Page 21: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Holland and Leinhardt’s p1 model

n nodes, random occurrence of directed edges.

Describe the probability of an edge occurring between nodes i and j :

log Prob(no edge) = log Pij(0, 0) = λij

log Prob(from i to j) = log Pij(1, 0) = λij + αi + βj + θ

log Prob(from j to i) = log Pij(0, 1) = λij + αj + βi + θ

log Prob(bi-directed edge) = log Pij(1, 1) = λij +αi +βj +αj +βi +2θ+ρij

3 common forms:

ρij = 0 (no reciprocal effect)ρij = ρ (constant reciprocation factor)ρij = ρ+ ρi + ρj (edge-dependent reciprocation)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 9 / 28

Page 22: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Holland and Leinhardt’s p1 model

n nodes, random occurrence of directed edges.

Describe the probability of an edge occurring between nodes i and j :

log Prob(no edge) = log Pij(0, 0) = λij

log Prob(from i to j) = log Pij(1, 0) = λij + αi + βj + θ

log Prob(from j to i) = log Pij(0, 1) = λij + αj + βi + θ

log Prob(bi-directed edge) = log Pij(1, 1) = λij +αi +βj +αj +βi +2θ+ρij

3 common forms:

ρij = 0 (no reciprocal effect)ρij = ρ (constant reciprocation factor)ρij = ρ+ ρi + ρj (edge-dependent reciprocation)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 9 / 28

Page 23: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Holland and Leinhardt’s p1 model

n nodes, random occurrence of directed edges.

Describe the probability of an edge occurring between nodes i and j :

log Prob(no edge) = log Pij(0, 0) = λij

log Prob(from i to j) = log Pij(1, 0) = λij + αi + βj + θ

log Prob(from j to i) = log Pij(0, 1) = λij + αj + βi + θ

log Prob(bi-directed edge) = log Pij(1, 1) = λij +αi +βj +αj +βi +2θ+ρij

3 common forms:

ρij = 0 (no reciprocal effect)ρij = ρ (constant reciprocation factor)ρij = ρ+ ρi + ρj (edge-dependent reciprocation)

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 9 / 28

Page 24: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Estimation for p1

The likelihood function for the p1 model is clearly of exponentialfamily form.

For the constant reciprocation version, we have

log p1(x) ∝ x++θ +∑

i

xi+αi +∑

j

x+jβj +∑ij

xijxjiρ (1)

Holland-Leinhardt explored goodness of fit of model empirically.

They compared ρij = 0 vs. ρij = ρ.The problem is that standard asymptotics (normality and chi-sqaretests of fit) are not applicable as the number of parameters increaseswith the number of nodes.How to test ρij = ρ against a more complex model?

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 10 / 28

Page 25: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Aside—The Normal Distribution was 275 Years Old onNovember 12, 2008!

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 11 / 28

Page 26: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model History

Hotelling and the Normal Distribution

Display courtesy of Stephen Stigler.Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 12 / 28

Page 27: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 in Log-linear Form

Probabilities for four situations between two nodes:

Define

yijkl =

{1 if D(xij , xji ) = (k , l),

0 otherwise.

Yields an n × n × 2× 2 4-way array with “zeros” for n × n diagonals.

Fienberg and Wasserman demonstrated that in the 4-way table, thelog-linear model of no second-order interaction corresponds to p1 withconstant reciprocation, i.e., [12][13][14][23][24][34], and that thestandard iterative proportional fitting algorithm.

Edge-dependent reciprocation corresponds to log-linear model[12][134][234].

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 13 / 28

Page 28: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

Primer: Algebraic Representation and Computer AlgebraTools

Log-linear models are represented by polynomial maps and parameterspace is toric variety.

Multinomial likelihood is a monomial.

Fiber consists of all tables with margins t.

Markov basis generates fiber and consists of minimal generators oftoric ideal.

Computer algebra tools:4ti2

We use 4ti2 to generate basis elements (perhaps redundant) for Markovbases for specific values of n. Can use these to compute exactdistribution given the MSSs.

Polymake—Examines fiber for MLE existence.

We use Polymake to explore conditions for MLEs to exist, as inexample for ρij = 0. Affects computation and assessment of fit.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 14 / 28

Page 29: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

Algebraic Version of p1 in Log-linear Form

For each pair of nodes i and j we have a monomial in the modelparameters:

pij(a, b) 7→ λijαai α

bj β

bi β

aj θ

a+bρmin(a,b)ij for all i < j ∈ {1, . . . , n}

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 15 / 28

Page 30: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

Algebraic Version of p1 in Log-linear Form

For each pair of nodes i and j we have a monomial in the modelparameters:

pij(a, b) 7→ λijαai α

bj β

bi β

aj θ

a+bρmin(a,b)ij for all i < j ∈ {1, . . . , n}

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 15 / 28

Page 31: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

Algebraic Version of p1 in Log-linear Form

For each pair of nodes i and j we have a monomial in the modelparameters:

pij(a, b) 7→ λijαai α

bj β

bi β

aj θ

a+bρmin(a,b)ij for all i < j ∈ {1, . . . , n}

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 15 / 28

Page 32: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 33: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 34: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 35: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 36: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 37: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

The p1 model Algebraic Statistics

p1 as a toric variety

The monomial map parametrizes a toric variety:

the design matrix has 4(n2

)columns (variables)

and 3(n2

)+ 1, or 3

(n2

)+ 2, or 4

(n2

)+ 2 rows (parameters)

depending on reciprocation

For example, n = 3 and ρij edge-dependent, the variety is a degree-3hypersurface in P11.

Its defining ideal gives Markov basis,

it will connect all networks with the same sufficient statistics.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 16 / 28

Page 38: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Toric ideal of simplification of p1

By ignoring the normalizing constants λij we get a simplified model

Theorem (FPR)

If ρij = 0, the ideal of the simplified model equals IGn + Tn

where Tn is generated by pij(1, 0)pij(0, 1)− pij(1, 1)and IGn is the toric ideal of the edge subring of Gn := Kn,n\{i , i}.

Theorem (FPR)

If ρij = ρ+ ρi + ρj , the ideal of the simplified model equals IGn + Qn

where IGn is as above,and Qn is the toric ideal of the edge subring of Kn.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 17 / 28

Page 39: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Toric ideal of simplification of p1

By ignoring the normalizing constants λij we get a simplified model

Theorem (FPR)

If ρij = 0, the ideal of the simplified model equals IGn + Tn

where Tn is generated by pij(1, 0)pij(0, 1)− pij(1, 1)and IGn is the toric ideal of the edge subring of Gn := Kn,n\{i , i}.

Theorem (FPR)

If ρij = ρ+ ρi + ρj , the ideal of the simplified model equals IGn + Qn

where IGn is as above,and Qn is the toric ideal of the edge subring of Kn.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 17 / 28

Page 40: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Toric ideal of simplification of p1

By ignoring the normalizing constants λij we get a simplified model

Theorem (FPR)

If ρij = 0, the ideal of the simplified model equals IGn + Tn

where Tn is generated by pij(1, 0)pij(0, 1)− pij(1, 1)and IGn is the toric ideal of the edge subring of Gn := Kn,n\{i , i}.

Theorem (FPR)

If ρij = ρ+ ρi + ρj , the ideal of the simplified model equals IGn + Qn

where IGn is as above,and Qn is the toric ideal of the edge subring of Kn.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 17 / 28

Page 41: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Main Theorem - toric ideal of p1

Upshot: known Graver bases for edge subrings!

We now incorporate λij into the previous theorems:

Theorem (FPR)

The toric ideal of the p1 random graph model is the multi-homogenouspiece of the toric ideal of a simplified model.

By multi-homogeneous, we mean with respect to each pair i , j .

In progress: Markov moves for all three cases of ρij .

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 18 / 28

Page 42: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Main Theorem - toric ideal of p1

Upshot: known Graver bases for edge subrings!

We now incorporate λij into the previous theorems:

Theorem (FPR)

The toric ideal of the p1 random graph model is the multi-homogenouspiece of the toric ideal of a simplified model.

By multi-homogeneous, we mean with respect to each pair i , j .

In progress: Markov moves for all three cases of ρij .

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 18 / 28

Page 43: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Main Theorem - toric ideal of p1

Upshot: known Graver bases for edge subrings!

We now incorporate λij into the previous theorems:

Theorem (FPR)

The toric ideal of the p1 random graph model is the multi-homogenouspiece of the toric ideal of a simplified model.

By multi-homogeneous, we mean with respect to each pair i , j .

In progress: Markov moves for all three cases of ρij .

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 18 / 28

Page 44: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Theoretical results

Main Theorem - toric ideal of p1

Upshot: known Graver bases for edge subrings!

We now incorporate λij into the previous theorems:

Theorem (FPR)

The toric ideal of the p1 random graph model is the multi-homogenouspiece of the toric ideal of a simplified model.

By multi-homogeneous, we mean with respect to each pair i , j .

In progress: Markov moves for all three cases of ρij .

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 18 / 28

Page 45: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Example

Example with 4 nodes

Figure: A degree-5 binomial

Figure: the corresponding path in K4,4\{i , i}

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 19 / 28

Page 46: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Algebraic Viewpoint Example

Example with 4 nodes

Figure: A degree-5 binomial

Figure: the corresponding path in K4,4\{i , i}

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 19 / 28

Page 47: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

3-node network

For all 3 cases of ρij , there is only one Markov move:

Figure: A binomial of degree 3

representing the binomial:

p12(1, 0)p23(1, 0)p13(0, 1)− p12(0, 1)p23(0, 1)p13(1, 0).

The corresponding Markov move is to remove edges 1→ 2, 2→ 3and 3→ 1, and replace them by the same edges oriented in theopposite direction.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 20 / 28

Page 48: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

3-node network

For all 3 cases of ρij , there is only one Markov move:

Figure: A binomial of degree 3

representing the binomial:

p12(1, 0)p23(1, 0)p13(0, 1)− p12(0, 1)p23(0, 1)p13(1, 0).

The corresponding Markov move is to remove edges 1→ 2, 2→ 3and 3→ 1, and replace them by the same edges oriented in theopposite direction.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 20 / 28

Page 49: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

3-node network

For all 3 cases of ρij , there is only one Markov move:

Figure: A binomial of degree 3

representing the binomial:

p12(1, 0)p23(1, 0)p13(0, 1)− p12(0, 1)p23(0, 1)p13(1, 0).

The corresponding Markov move is to remove edges 1→ 2, 2→ 3and 3→ 1, and replace them by the same edges oriented in theopposite direction.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 20 / 28

Page 50: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

3-node network

For all 3 cases of ρij , there is only one Markov move:

Figure: A binomial of degree 3

representing the binomial:

p12(1, 0)p23(1, 0)p13(0, 1)− p12(0, 1)p23(0, 1)p13(1, 0).

The corresponding Markov move is to remove edges 1→ 2, 2→ 3and 3→ 1, and replace them by the same edges oriented in theopposite direction.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 20 / 28

Page 51: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij constant

Reminder: ρij = ρ, including ρij = 0.

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 21 / 28

Page 52: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij constant

Reminder: ρij = ρ, including ρij = 0.

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 21 / 28

Page 53: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij constant

Reminder: ρij = ρ, including ρij = 0.

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 21 / 28

Page 54: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij Constant

Figure: A more complicated binomial of degree 4

represents

pij(0, 0)pjk(1, 1)pkl(0, 1)pil(1, 0)− pij(1, 0)pjk(1, 0)pkl(1, 1)pil(0, 0)

The moves come in degrees 3, 4, 5, 6.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 22 / 28

Page 55: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij Constant

Figure: A more complicated binomial of degree 4

represents

pij(0, 0)pjk(1, 1)pkl(0, 1)pil(1, 0)− pij(1, 0)pjk(1, 0)pkl(1, 1)pil(0, 0)

The moves come in degrees 3, 4, 5, 6.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 22 / 28

Page 56: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij Constant

Figure: A more complicated binomial of degree 4

represents

pij(0, 0)pjk(1, 1)pkl(0, 1)pil(1, 0)− pij(1, 0)pjk(1, 0)pkl(1, 1)pil(0, 0)

The moves come in degrees 3, 4, 5, 6.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 22 / 28

Page 57: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Reminder: ρij = ρ+ ρi + ρj .

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 23 / 28

Page 58: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Reminder: ρij = ρ+ ρi + ρj .What follows is a list of all possible moves on a 4-node network, withrespect to symmetry of course.

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 23 / 28

Page 59: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 23 / 28

Page 60: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 3

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 23 / 28

Page 61: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node Network, ρij edge-dependent

Figure: A binomial of degree 4

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 24 / 28

Page 62: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node Network, ρij edge-dependent

Figure: A binomial of degree 4

Figure: A binomial of degree 4

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 24 / 28

Page 63: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 4

Figure: A binomial of degree 5

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 25 / 28

Page 64: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 4

Figure: A binomial of degree 5

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 25 / 28

Page 65: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 5

Figure: A binomial of degree 6

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 26 / 28

Page 66: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary of Markov Moves

4-node network, ρij edge-dependent

Figure: A binomial of degree 5

Figure: A binomial of degree 6

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 26 / 28

Page 67: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary

Summary

Graphs and network models.

p1 and its algebraic representation:Reviewed of p1 and log-linear models.Developed Markov bases.

Markov bases and proposed uses:Existence of MLE.Generate exact distribution for p1 and use for assessing goodness-of-fit.

Future work includes:Completing algebraic characterization of p1 models and putting themto use.Generalizations to exponential random graph models (ERGM) models[also known as p∗ models]:

Many complex statistical issues including existence of MLEs(non-existence = degeneracies involving zero estimates) andnear-degeneracies.Scaling tools up to be of use for analysis large networks with efficientcomputation.

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 27 / 28

Page 68: Algebraic Statistics of p1 Network Models: Markov Bases ...risi/AML08/NIPS-AlgStats-p1-Steve.pdf1 model, log-linear model, and algebraic statistics: Using algebra-geometry structure

Summary

... The End ...

Stephen E. Fienberg (CMU) Markov Bases of p1 Models December 11, 2008 28 / 28