
Algebraic Complexity in Statistics using Combinatorial and Tensor Methods

BY

ELIZABETH GROSS

THESIS

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Mathematics in the Graduate College of the University of Illinois at Chicago, 2013

Chicago, Illinois

Defense Committee:

Shmuel Friedland, Chair and Advisor
Sonja Petrović, Advisor, Penn State
Jan Verschelde
Olga Kashcheyeva
Lek-Heng Lim, University of Chicago


To Ryan and Sebastian.


ACKNOWLEDGMENTS

Thank you to my advisor, Sonja Petrović, whose dedicated mentoring has prepared me for life as a mathematician. Sonja has been an excellent advisor, always challenging me to do better and always believing I could. I will continually be grateful for her guidance and knowledge.

Thank you to my committee: Shmuel Friedland, Jan Verschelde, Olga Kashcheyeva, and Lek-Heng Lim. Jan and Olga have been supportive through their active participation in the Graduate Computational Algebraic Geometry Seminar. Shmuel has been a second advisor to me, and I have enjoyed our many conversations. I am grateful to Jan for his early mentoring and for introducing me to numerical algebraic geometry.

I've been very lucky to have multiple people who have gone above and beyond in helping me succeed; of these people, Mathias Drton, Bernd Sturmfels, and Seth Sullivant deserve special recognition.

I am grateful to my fellow classmates at UIC and my colleagues from SFSU. It has been inspiring to watch everyone's triumphs and wonderful to be part of such a vibrant and stimulating department. I am also grateful for the support of my friends and family, especially my father.

Finally, I would like to thank Ryan. Ryan, you are a loving and devoted husband and father. You keep me grounded when I am in danger of floating away and soaring when I am leaden. Together, we have a beautiful life. Thank you for all your help.


TABLE OF CONTENTS

CHAPTER

1 INTRODUCTION

2 BACKGROUND
  2.1 Models, Ideals, and Varieties
  2.2 Toric models, Markov bases and Markov complexity
  2.3 Phylogenetic Models
  2.4 Tensors, Rank, and Border Rank
  2.5 Maximum Likelihood Estimation

3 TORIC IDEALS OF HYPERGRAPHS
  3.1 Introduction
  3.2 Preliminaries and notation
  3.3 Splitting sets and reducible edge sets
  3.4 Indispensable Binomials
  3.5 General degree bounds
  3.6 Hidden Subset Models

4 PHYLOGENETIC MODELS AND TENSORS OF BOUNDED RANK
  4.1 Introduction
  4.2 A characterization of V4(3, 3, 4)
  4.3 Proving case A.I.3 using degree 6 polynomials
    4.3.1 The case L = R = e3 e3^T
    4.3.2 The case L = e3 e3^T, R = e3 e2^T
  4.4 The defining polynomials of V4(4, 4, 4)

5 MAXIMUM LIKELIHOOD DEGREE OF VARIANCE COMPONENT MODELS
  5.1 Introduction
  5.2 The likelihood equations
    5.2.1 Maximum likelihood
    5.2.2 Restricted maximum likelihood
  5.3 Proof of formula for ML degree
  5.4 Proof of formula for REML degree
  5.5 Linear mixed models with multimodal likelihood functions

6 CONCLUSION


CITED LITERATURE

VITA


LIST OF FIGURES

FIGURE
1  Reducible balanced edge set. The green edge es is the separator.
2  Reducible balanced edge set with an improper separator. The separator consists of green edges e1 and e2.
3  Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.
4  Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.
5  Case 1. Proof of Theorem 32.
6  Case 3. Proof of Theorem 32.


SUMMARY

Fundamental questions in statistical modeling ask about the best methods for model selection, goodness-of-fit testing, and estimation of parameters. For example, given a collection of aligned DNA sequences from a group of extant species, how can we decide which evolutionary tree best describes the species' ancestral history? Or, given a sparse high-dimensional contingency table, how can we perform goodness-of-fit testing when exact tests are infeasible? In questions such as these, combinatorics, commutative algebra, and algebraic geometry play a leading role.

We explore such questions for specific classes of models, e.g. toric models, phylogenetic models, and variance components models, and tackle the algebraic complexity problems that lie at their root. We begin our exploration by studying toric ideals of hypergraphs, algebraic objects that are used for goodness-of-fit testing for log-linear models. In this study, we use the combinatorics of hypergraphs to give degree bounds on the generators of the ideals, give sufficient conditions for a binomial in the ideal to be indispensable, show that the ideal of Tan((P^1)^n) is generated by quadratics and cubics in cumulant coordinates, and recover a well-known complexity theorem in algebraic statistics due to De Loera and Onn. Second, we explore phylogenetic models by viewing the models as sets of tensors of bounded rank. We show that the variety of 4 × 4 × 4 complex-valued tensors with border rank at most 4 is defined by polynomials of degree 5, 6, and 9. This variety corresponds to the 4-state general Markov model on the claw tree K1,3, and its defining polynomials can be used in model selection. This result also gives further evidence that the phylogenetic ideal of the model can be generated by polynomials of degree at most 9. Finally, we look at the algebraic complexity of maximum likelihood estimation for variance components models, where we give explicit formulas for the ML and REML degrees of the random effects model for the one-way layout and give examples of multimodal likelihood surfaces.


CHAPTER 1

INTRODUCTION

Algebraic statistics applies commutative algebra, algebraic geometry, and combinatorics to problems arising in statistics (for surveys see [22] and [25]). In this field of study, the idea of complexity comes up in several different respects. This thesis will explore three different complexity issues: Markov complexity for toric models, phylogenetic complexity for phylogenetic models, and maximum likelihood degree for variance components models. The first two are both defined as the maximum degree of the polynomials in a minimal generating set of a specified ideal. The last measure of algebraic complexity, the maximum likelihood degree, is the degree of a zero-dimensional variety.

A statistical model is a family of probability distributions whose joint probabilities are often specified parametrically. If the joint probabilities, or, more commonly, their logarithms, are parameterized by polynomials, then the closure of the model is an algebraic variety. The underlying idea of algebraic statistics is that information about the variety yields statistical information about the model.

For example, the generators of the vanishing ideal of a statistical model are useful in goodness-of-fit testing and model selection [19], [9]. These generators are called model invariants or, in the case of log-linear models, Markov bases. Part I of this dissertation is concerned with providing an efficient description of the model invariants for two different classes of discrete models: those encoded by hypergraphs and those specified as tensors of bounded border rank.


In Chapter 3, we focus on statistical models that are parameterized by square-free monomials. Examples of such models include log-linear models [24], [22], group-based phylogenetic models [58], and some hidden subset models [59]. In these cases, the parameterization can be encoded by a hypergraph H. The model invariants are the generators of the toric ideal of the hypergraph H, the kernel of the monomial map defined by the vertex-edge incidence matrix of H. In the context of algebraic statistics, a binomial generating set of a toric ideal is called a Markov basis. The Markov complexity, or Markov width, of a toric ideal is the maximum degree of the polynomials in a minimal generating set.

Section 3.4 focuses on indispensable binomials, i.e. binomials that are members of every minimal generating set of IH. The degree of an indispensable binomial gives a lower bound on the Markov complexity of IH. Proposition 19 gives a combinatorial sufficient condition for determining whether a binomial f ∈ IH is indispensable. Consequently, the Graver basis is the unique minimal generating set of IH for any 2-regular hypergraph (Proposition 20). In Corollary 26, we apply our combinatorial lower bounds to recover a well-known complexity result in algebraic statistics from [16], which states that Markov bases for the no 3-way interaction model on 3 × r × c contingency tables are arbitrarily complicated.

In Section 3.5, we show that a degree bound on the generators of IH is equivalent to a combinatorial criterion on H (see Theorem 27). This result generalizes work of Villarreal [65] and of Ohsugi and Hibi [47] regarding the toric ideals of graphs and offers a way of computing an upper bound for the Markov complexity of a given statistical model.


As an example, we consider hidden subset models. In [59], Sturmfels and Zwiernik show that the variety of the hidden subset model for n random variables with subsets {{1}, {2}, . . . , {n}} is isomorphic to the image of the first tangential variety Tan((P^1)^n) in cumulant coordinates. Using the hypergraph approach, we are able to show that, in cumulant coordinates, the defining ideal of the first tangential variety Tan((P^1)^n) is generated by quadratics and cubics. Thus, as n grows, the Markov complexity of the hidden subset model associated to Tan((P^1)^n) remains constant.

In Chapter 4, we study the 4-state general tree-based Markov model on the claw tree K1,3; this model has applications to phylogenetics [1]. In terms of multilinear algebra, the variety of this model is the set of all tensors in C^4 ⊗ C^4 ⊗ C^4 of border rank at most 4. Thus, in this chapter we shift from the combinatorics of hypergraphs to the study of three-way tensors.

For phylogenetic models, the vanishing ideal associated to the model is referred to as a phylogenetic ideal, and the maximum degree of the polynomials in a minimal generating set is called the phylogenetic complexity of the model. For the 4-state general Markov model on K1,3, the phylogenetic complexity is conjectured to be 9 (see Conjecture 34); this was first conjectured in [57] and agrees with the numerical computations in [6].

In [26], Friedland shows that the variety associated to the model is cut out set-theoretically by polynomials of degree 5, 9, and 16. The goal of Chapter 4 is to replace the degree 16 equations with a set of degree 6 equations that are known to be in the ideal [39]. In Theorem 33, we tighten the results of Friedland in [26] and show that the variety of tensors in C^4 ⊗ C^4 ⊗ C^4 of border rank at most 4 is cut out by polynomials of degree 5, 6, and 9. In addition to providing supporting evidence for Conjecture 34, this result, combined with results in [26] and [6], gives explicit polynomials that one can use to test whether a tensor in C^4 ⊗ C^4 ⊗ C^4 has border rank at most four, or equivalently, whether given data could have arisen from the 4-state general Markov model on the claw tree K1,3.

Chapter 5 is dedicated to a different kind of complexity problem: that of maximum likelihood estimation. In particular, we study the algebraic complexity, or ML degree, of maximum likelihood estimation for specific models. The ML degree gives us insight into the geometry of the likelihood surface. For example, it bounds the total possible number of modes of the likelihood surface. Since common maximum likelihood estimation techniques use local numerical methods to find maxima, asking whether a given likelihood function could have more than one local maximum is an important, but often overlooked, question in statistics.

The goal of maximum likelihood estimation is to find parameters that best explain a given data point. This amounts to finding the zero set of the likelihood equations. This zero set is also referred to as the likelihood locus and has been studied in an algebraic geometry setting in [10] and [33]. The number of complex solutions to the system of likelihood equations for generic data is called the maximum likelihood degree (ML degree); it is the degree of the likelihood locus. The ML degree quantifies the feasibility of using symbolic algebraic methods to find maximum likelihood estimates, and it gives an upper bound on the number of modes of the real likelihood surface.

Another statistical method for finding likely parameters given an array of observations is restricted maximum likelihood (REML) estimation, a variation of maximum likelihood estimation. The REML method involves considering a projection of the observed array; for the one-way layout with random effects, this method returns a likelihood function that depends on only two of the three parameters. The REML degree of a model is defined analogously to the ML degree: it is the number of complex solutions to the restricted maximum likelihood equations for generic data. To our knowledge, the REML degree has not previously been studied for any statistical model.

In Chapter 5, we give explicit formulas for the ML and REML degrees of variance components models, specifically one-way layouts with random effects. We conclude Chapter 5 with two examples of multimodal likelihood functions.


CHAPTER 2

BACKGROUND

In this dissertation, we use a common shorthand notation for monomials using vectors. Let x = (x1, . . . , xN) be a vector of indeterminates and let a ∈ Z^N_{≥0} be a non-negative integer vector. Then we denote x^a = x1^{a1} · x2^{a2} · · · xN^{aN}.
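This shorthand is easy to sanity-check numerically. The following minimal Python sketch (the helper name is mine, not the thesis's) evaluates x^a for a concrete choice of x and a:

```python
from functools import reduce

def monomial(x, a):
    """Evaluate x^a = x1^{a1} * x2^{a2} * ... * xN^{aN}."""
    assert len(x) == len(a)
    return reduce(lambda acc, t: acc * t[0] ** t[1], zip(x, a), 1)

# x = (2, 3, 5) and a = (1, 0, 2) give 2^1 * 3^0 * 5^2 = 50.
print(monomial((2, 3, 5), (1, 0, 2)))
```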

2.1 Models, Ideals, and Varieties

In Part I, we will be concerned with discrete statistical models. The notation and definitions below follow [22] and [60].

In the case of discrete models, we can think of the joint probability distribution of m random variables as an m-dimensional array P. Let X1, . . . , Xm be discrete random variables with Xl ∈ [rl]. Let R = ∏_{l=1}^{m} [rl] and N = ∏_{l=1}^{m} rl. The (i1, i2, . . . , im)th entry of P is the joint probability P(X1 = i1, . . . , Xm = im). For simplicity, we will often flatten the tensor P into a vector p:

Definition 1 The joint probability vector for X1, . . . , Xm is the N-dimensional vector p = (pi | i ∈ R) ∈ R^N, where pi is the joint probability

pi = pi1...im = P(X1 = i1, . . . , Xm = im),  i = (i1, . . . , im) ∈ R.


Since we are concerned with probability distributions, the coordinates of p satisfy pi ≥ 0 for all i ∈ R and the constraint ∑_{i∈R} pi = 1. The set of all p that satisfy these constraints forms an (N − 1)-dimensional simplex.

Definition 2 The probability simplex ∆N−1 is the set of all possible joint probability distributions for m random variables with respective state spaces [r1], . . . , [rm]:

∆N−1 := {p ∈ R^N | pi ≥ 0 for all i ∈ R and ∑_{i∈R} pi = 1}.
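As a small numerical sketch (the function name is my own, not notation from this thesis), membership of a flattened joint probability vector in ∆N−1 can be checked directly from the two defining constraints:

```python
def in_simplex(p, tol=1e-9):
    """Check p_i >= 0 for all i and sum_i p_i = 1, up to a numerical tolerance."""
    return all(pi >= -tol for pi in p) and abs(sum(p) - 1.0) <= tol

# Joint distribution of two independent fair coins, flattened to length N = 4.
print(in_simplex([0.25, 0.25, 0.25, 0.25]))  # True
print(in_simplex([0.5, 0.6]))                # False: coordinates sum to 1.1
```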

A statistical model M for the random variables X1, . . . , Xm is a subset of ∆N−1. The models we consider in this dissertation are parametric statistical models.

Definition 3 Let Θ ⊂ R^d be a parameter space and φ : Θ → ∆N−1 a map. The image

M = φ(Θ) ⊂ ∆N−1

is a parametric statistical model.

In algebraic statistics, we are concerned with parametric statistical models where φ is a rational map. In this case, it is natural to consider φ as a map from C^d to C^N. The advantage of this view is that the image φ(C^d) is well-approximated by a variety, namely its Zariski closure (see [60, Theorem 3.6] and the discussion that follows it). For completeness, we define here what we mean by variety, ideal of a subset of C^N, and Zariski closure.


In the following discussion we work over the ring R[p], treating the joint probabilities pi for i ∈ R as indeterminates.

Definition 4 Let F be a collection of polynomials in R[p]. The variety of F is the zero set of F:

V(F) := {x ∈ C^N | f(x) = 0 for all f ∈ F}.

Notice that the variety V(F) in the above definition may not be irreducible; in general, we will use variety to mean an algebraic set.

Let S ⊂ C^N. We define the ideal of S as

I(S) := {f ∈ R[p] | f(x) = 0 for all x ∈ S}.

Given a variety V ⊂ C^N, the ideal I(V) is the set of all polynomials that vanish on V. If V = V(J) for an ideal J, then J is called a set of defining equations for V. Notice that, in many cases, V = V(J) does not imply I(V) = J; this implication holds only when J is radical. For example, J = (x^2) satisfies V(J) = {0} but I(V(J)) = (x). Thus, when we say that J defines a variety V set-theoretically, we mean that V = V(J) but not necessarily that I(V) = J.

As stated above, we are often interested in the Zariski closure of the image of a rational map. The Zariski closure of S is V(I(S)); it is the smallest variety that contains S. For a parametric model with a polynomial parameterization φ, the ideal of the model will be denoted by

IM := I(Im φ),


and the variety of the model by

VM := V(IM).

Chapters 3 and 4 are motivated by parametric statistical models with polynomial parameterizations. Our main goal is to understand the complexity of the implicitization problem, i.e. finding the generators of IM.

2.2 Toric models, Markov bases and Markov complexity

Let C∗ = C \ {0}. If the map φ : C^d → C^N in Definition 3 is monomial, then the parametric statistical model M = φ((C∗)^d) ∩ ∆N−1 is called a toric model. Toric models are generally referred to as log-linear models in the statistical literature.

In statistics, observations of discrete data are recorded in contingency tables, m-dimensional arrays T where the (i1, . . . , im)th entry is the number of times the random vector (X1, . . . , Xm) was observed in state (i1, . . . , im). A Markov basis is a set of integer vectors that connects the lattice of all tables with the same sufficient statistics as T, for all T in the sample space (the sufficient statistics are determined by the model; see [22]). Given a toric model M, the Fundamental Theorem of Markov Bases, which originally appeared in [19], establishes a one-to-one correspondence between Markov bases and binomial generating sets of IM. The ideal IM is a toric ideal, which is well known to be a binomial ideal (see [56]). Since the focus of this dissertation is more algebraic than statistical, we will use the algebraic definition of Markov bases.


Definition 5 Let M be a toric model and let F ⊂ R[p] be a set of binomials. If F generates the ideal IM, then F is a Markov basis of M.

The Markov complexity of a toric model M is the maximum degree of all polynomials in a minimal generating set of IM. In Section 3.4, we use Graver bases to bound the Markov complexity for certain toric models.

The Graver basis of a toric ideal I is the set of all binomials p^u − p^v in I such that there is no other binomial p^w − p^z ∈ I with p^w dividing p^u and p^z dividing p^v. Binomials in the Graver basis are called primitive. The Graver complexity of a toric model M is the maximum degree of all polynomials in the Graver basis of IM. Since the Graver basis of an ideal I contains the universal Gröbner basis of I [56], the Graver basis is a generating set of I, and thus

Markov complexity of M ≤ Graver complexity of M.
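As a small illustration (a standard example, not taken verbatim from this thesis): for the independence model on 2 × 2 contingency tables, the toric ideal is generated by the single quadric p11 p22 − p12 p21, and the corresponding basic Markov move preserves the row and column sums, which are the sufficient statistics of the model. A minimal Python sketch:

```python
# The exponent vector of the binomial p11*p22 - p12*p21 gives the basic move
# below; adding it to a table leaves the row and column sums unchanged.
move = [[1, -1], [-1, 1]]

def apply_move(table, move):
    return [[table[i][j] + move[i][j] for j in range(2)] for i in range(2)]

def row_sums(t):
    return [sum(row) for row in t]

def col_sums(t):
    return [sum(t[i][j] for i in range(2)) for j in range(2)]

table = [[3, 5], [2, 4]]
new = apply_move(table, move)
print(new)  # [[4, 4], [1, 5]]
# Sufficient statistics are preserved:
print(row_sums(table) == row_sums(new), col_sums(table) == col_sums(new))  # True True
```

Repeatedly applying moves like this one (and its negative) walks through all non-negative tables with the same margins, which is exactly the connectivity that the Fundamental Theorem of Markov Bases guarantees.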

2.3 Phylogenetic Models

A problem that arises in computational biology and has been studied extensively in algebraic statistics is that of inferring phylogenetic trees from aligned DNA sequences of several living species. Algebraic methods for inferring phylogenetic trees were first proposed independently by Lake in [37] and by Cavender and Felsenstein in [13]. The methods have been explored more recently by Casanellas and Fernández-Sánchez in [14].

The models used in these studies are hidden Markov models, parametric statistical models with a polynomial parameterization. In a hidden Markov model, evolution is assumed to proceed along a directed tree with all edges pointing away from the root. In the tree, each node corresponds to a species: the leaves correspond to extant species, while the internal nodes represent extinct species.

Common hidden Markov models used in phylogenetics are the 2-state and 4-state models. In the 4-state model, each internal node is a hidden random variable and each leaf is an observed random variable; the state space for each random variable Xi, hidden or observed, is {1, 2, 3, 4}, whose elements correspond to the nucleic bases {A, C, G, T}. Each edge (i, j) is assigned a 4 × 4 transition matrix whose (k, l)th entry is the conditional probability P(Xj = l | Xi = k). In the general Markov model, the model we explore in Chapter 4, the only constraint on the transition matrices is that their rows sum to 1.

We will refer to the 4-state general Markov model on the tree T as MT. The ideal IMT is referred to as a phylogenetic ideal and its generators are called phylogenetic invariants.

Definition 6 The phylogenetic complexity of a phylogenetic model MT is the maximum degree over all polynomials in a minimal generating set of IMT.

When the tree T is a bifurcating tree, results in [21] and [2] state that all phylogenetic invariants of MT can be obtained from the phylogenetic invariants of MK1,3, where K1,3 is the 3-leaf claw tree. Thus, it suffices to understand IMK1,3.

Proposition 7 ([22, Proposition 4.1.11]) Let M0 be the 4-state general Markov model on the claw tree K1,3. Then VM0 is isomorphic to V4(4, 4, 4), the set of all complex-valued 4 × 4 × 4 tensors with border rank less than or equal to 4.


Thus, by Proposition 7, in order to understand VM0, we need to understand the set of all complex-valued 4 × 4 × 4 tensors with border rank less than or equal to 4.

2.4 Tensors, Rank, and Border Rank

This section uses terminology and definitions from [26], [38], and [41].

In Chapter 4 we focus on elements of C^m ⊗ C^n ⊗ C^l, equivalently, three-way complex-valued tensors of dimension m × n × l. We take a coordinate-based perspective, considering a tensor T ∈ C^m ⊗ C^n ⊗ C^l as an array T = [ti,j,k]_{i,j,k=1}^{m,n,l} ∈ C^{m×n×l} whose (i, j, k)th entry is ti,j,k. Coordinate representations of tensors are also referred to as hypermatrices in order to call attention to the fact that they are equipped with algebraic operations arising from the algebraic structure of C^m ⊗ C^n ⊗ C^l, rather than being mere data structures (see [41]).

Just as for matrices, we can define a notion of rank for tensors that is independent of the choice of bases for the vector spaces C^m, C^n, C^l. This definition of rank is also sometimes referred to as the outer product rank.

Definition 8 A three-way tensor T ∈ C^m ⊗ C^n ⊗ C^l is a rank one tensor if it can be written as the outer product of three vectors u ∈ C^m, v ∈ C^n, w ∈ C^l, i.e.

T = u ⊗ v ⊗ w.

The (i, j, k)th entry of T is ui vj wk.


Definition 9 The rank of a non-zero tensor T ∈ C^m ⊗ C^n ⊗ C^l, denoted rank T, is the minimal number r such that there exist ui ∈ C^m, vi ∈ C^n, wi ∈ C^l for 1 ≤ i ≤ r with

T = ∑_{i=1}^{r} ui ⊗ vi ⊗ wi.
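Definition 9 can be sanity-checked against ordinary matrix rank via flattenings: if T is a sum of r outer products, then every flattening (matricization) of T has matrix rank at most r. A NumPy sketch of this (my own construction, not code from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, l, r = 3, 4, 5, 2

# Build T = sum_{i=1}^r u_i (x) v_i (x) w_i, so rank T <= r by Definition 9.
T = np.zeros((m, n, l))
for _ in range(r):
    u = rng.standard_normal(m)
    v = rng.standard_normal(n)
    w = rng.standard_normal(l)
    T += np.einsum('i,j,k->ijk', u, v, w)

# The flattening regroups T as an m x (n*l) matrix; its rank is at most r,
# since each summand contributes the rank-one matrix u_i * vec(v_i w_i^T)^T.
flat = T.reshape(m, n * l)
print(np.linalg.matrix_rank(flat) <= r)  # True
```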

While the set of all matrices in C^m ⊗ C^n of rank less than r is closed with respect to the Zariski topology, the set of all tensors in C^m ⊗ C^n ⊗ C^l of rank less than r is not necessarily closed. Thus, we introduce the following notion of border rank, as described in [38].

Definition 10 A tensor T has border rank r if it is a limit of tensors of rank r but is not a limit of tensors of rank s for any s < r. In this case, we write brank T = r.
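The standard example (not one taken from this thesis) showing that border rank can be strictly smaller than rank is the "W tensor" T = u⊗u⊗v + u⊗v⊗u + v⊗u⊗u, which has rank 3 but is a limit of rank-2 tensors. A NumPy sketch of the limiting family:

```python
import numpy as np

def out(a, b, c):
    return np.einsum('i,j,k->ijk', a, b, c)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

# The "W tensor": rank 3, but border rank 2.
T = out(u, u, v) + out(u, v, u) + out(v, u, u)

errors = []
for eps in (1e-1, 1e-2, 1e-3):
    # Each T_eps is a difference of two rank-one tensors, so rank T_eps <= 2,
    # yet T_eps -> T as eps -> 0: T is a limit of rank-2 tensors.
    T_eps = (out(u + eps * v, u + eps * v, u + eps * v) - out(u, u, u)) / eps
    errors.append(np.linalg.norm(T_eps - T))

print(errors[0] > errors[1] > errors[2] and errors[2] < 1e-2)  # True
```

Expanding (u + εv)⊗3 shows that T_eps = T + O(ε), which is why the approximation error above shrinks linearly in ε.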

We use Vr(m, n, l) to denote the set of all tensors in C^m ⊗ C^n ⊗ C^l of border rank less than or equal to r. The set Vr(m, n, l) is a closed irreducible variety whose projectivization is the rth secant variety of P^{m−1} × P^{n−1} × P^{l−1}. Chapter 4 is concerned with determining defining equations of V4(4, 4, 4), the variety associated to the 4-state general Markov model on the 3-leaf claw tree.

In Chapter 4, many of the results are phrased in terms of the slices of a tensor T ∈ C^m ⊗ C^n ⊗ C^l. Slices are matrices obtained from a tensor T = [ti,j,k]_{i,j,k=1}^{m,n,l} by fixing one of the three indices: a 1-slice or horizontal slice is obtained by fixing the 1st index, a 2-slice or lateral slice is obtained by fixing the 2nd index, and a 3-slice or frontal slice is obtained by fixing the 3rd index. A tensor T ∈ C^m ⊗ C^n ⊗ C^l has m horizontal slices, n lateral slices, and l frontal slices. We denote slices by Tq,p, where p ∈ {1, 2, 3} indicates whether it is a 1-slice, 2-slice, or 3-slice, and q gives the value of the fixed index. For example, T1,3 = [ti,j,1]_{i,j=1}^{m,n}.


One way to understand the rank of a tensor is to understand the span of its frontal slices (or of its horizontal or lateral slices). Let the span of the frontal slices be denoted

T3(T) := span(T1,3, . . . , Tl,3) ⊂ C^{m×n}.

The following theorem from [26] states the connection between T3(T) and the rank of T.

Theorem 11 ([26, Theorem 2.1]) Let T ∈ C^m ⊗ C^n ⊗ C^l. Then rank T is the minimal dimension of a subspace U ⊂ C^{m×n} that contains T3(T) and is spanned by rank one matrices.

Theorem 11 is used in Chapter 4 when we show that V4(4, 4, 4) is cut out by polynomials of degree 5, 6, and 9.
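One consequence of Theorem 11 is the inequality rank T ≥ dim T3(T), since any subspace containing T3(T) has dimension at least dim T3(T). A NumPy sketch of this bound (my own construction): for a tensor built as a sum of r outer products, the span of the vectorized frontal slices has dimension at most r.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, l, r = 4, 4, 4, 2

# A tensor of rank <= r: a sum of r outer products.
T = np.zeros((m, n, l))
for _ in range(r):
    T += np.einsum('i,j,k->ijk', rng.standard_normal(m),
                   rng.standard_normal(n), rng.standard_normal(l))

# Frontal slices T_{q,3} = [t_{i,j,q}] for q = 1, ..., l; the dimension of
# their span T3(T) in C^{m x n} is the rank of the matrix of vectorized slices.
slices = [T[:, :, q] for q in range(l)]
dim_span = int(np.linalg.matrix_rank(np.stack([S.ravel() for S in slices])))
print(dim_span <= r)  # True: each slice lies in span{u_i v_i^T, i = 1, ..., r}
```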

2.5 Maximum Likelihood Estimation

In Chapter 5 we turn our attention towards algebraic complexity problems that arise in

maximum likelihood estimation. Maximum likelihood estimation is a statistical method for

estimating the most likely parameters of a probability density function given a set of observed

data (e.g. a contingency table) and a statistical model. Let

M = {f(· | θ) : θ ∈ Θ}

be a parametric statistical model with parameters θ = (θ1, . . . , θm). Since the models we explore

in Chapter 5 are continuous, we change our notation slightly from the preceding sections and

denote a joint probability density function in M as f(·|θ).


In maximum likelihood estimation, one assumes that observed data are independently and

identically distributed according to a probability density function f(·|θ0) ∈ M. Thus, given a

sample of n observations x1, . . . ,xn, the goal is to find the best estimate of θ0. This amounts

to maximizing the likelihood function.

The likelihood function is the probability of observing x1, . . . , xn given θ0 = θ; that is, it is a function in the parameters θ:

L(θ | x1, . . . , xn) = ∏_{i=1}^{n} f(xi | θ).

The estimator (or MLE) of θ0, denoted θ̂, is the value of the parameters θ that maximizes L(θ | x1, . . . , xn). Since in many cases the logarithm of the likelihood function is easier to analyze, in statistics we often consider the log-likelihood function:

ℓ(θ | x1, . . . , xn) = ∑_{i=1}^{n} ln f(xi | θ).

If θ̂ is a maximum of the log-likelihood function, then θ̂ is the MLE for L(θ | x1, . . . , xn). We will use the log-likelihood function in Chapter 5.

A maximum of ℓ(θ | x1, . . . , xn) occurs when all its first partial derivatives are zero. The likelihood equations, or log-likelihood equations, are the equations {∂L/∂θi = 0, i = 1, . . . , m} and {∂ℓ/∂θi = 0, i = 1, . . . , m}, respectively. In Chapter 5 we use ‘likelihood equations’ to mean the log-likelihood equations.
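As a concrete sketch of forming and solving the likelihood equations, the classical i.i.d. normal example (standard textbook material, not a model treated in the thesis) can be worked symbolically:

```python
import sympy as sp

# Log-likelihood of an i.i.d. N(mu, v) sample; mu, v and the data are
# hypothetical illustration values.
mu, v = sp.symbols('mu v')
data = [1, 2, 4]

# l(theta | x_1, ..., x_n) = sum_i ln f(x_i | theta)
ll = sum(-sp.log(2 * sp.pi * v) / 2 - (x - mu) ** 2 / (2 * v) for x in data)

# likelihood equations: all first partial derivatives set to zero
eqs = [sp.diff(ll, mu), sp.diff(ll, v)]
sols = sp.solve(eqs, [mu, v], dict=True)

# the critical point is the sample mean and the (biased) sample variance:
# mu = 7/3, v = 14/9
```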


The number of complex solutions to the likelihood equations is constant with probability

one, and a data set is generic if it is not part of the null set for which the number of complex

solutions is different. Thus, we define the maximum likelihood degree as:

Definition 12 The maximum likelihood degree (ML degree) is the number of complex solutions to the maximum likelihood equations for generic data.

If the likelihood equations or log-likelihood equations are rational, there is symbolic (e.g., Macaulay2) and numerical software (e.g., PHCpack) that can find all the complex solutions to

the likelihood equations. The remainder of the optimization problem then becomes evaluating

the likelihood function at the solutions to determine at which point the maximum is attained.

Thus the ML degree is a measure of the algebraic complexity of the problem of maximum

likelihood estimation. For more background on ML degrees, see [33; 10; 8; 22; 57; 34].
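The ML degree can be observed experimentally by solving the likelihood equations for many random data sets and counting complex solutions. The sketch below uses a hypothetical one-parameter discrete model p(t) = (t, t², 1 − t − t²), not a model from the thesis; clearing denominators in the score equation leaves a quadratic in t, so this model's ML degree is 2:

```python
import numpy as np

# For counts (u1, u2, u3), the log-likelihood is
#   l(t) = u1*ln(t) + u2*ln(t^2) + u3*ln(1 - t - t^2),
# and clearing denominators in dl/dt = 0 gives the polynomial
#   (u1 + 2*u2)*(1 - t - t^2) - u3*(t + 2*t^2) = 0.
rng = np.random.default_rng(1)
degrees = set()
for _ in range(20):
    u1, u2, u3 = rng.integers(1, 50, size=3)
    # coefficients of the score polynomial in t, highest degree first
    coeffs = [-(u1 + 2 * u2) - 2 * u3, -(u1 + 2 * u2) - u3, u1 + 2 * u2]
    degrees.add(len(np.roots(coeffs)))

# generic data always yield the same number of complex critical points
assert degrees == {2}
```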

The ML degree also gives insight into the geometry of maximum likelihood estimation. The

likelihood surface is the real part of the hypersurface defined by the likelihood function. If

the ML degree of a model is greater than one, then it is possible that the likelihood surface is

multimodal, which suggests local methods of obtaining the maximum of the likelihood surface

could fail. Section 5.5 gives an example of a multimodal likelihood surface.

In Section 5.4, we study restricted maximum likelihood estimation, a variation of maximum

likelihood estimation whose algebraic complexity has not been studied before for any statistical

model. We define REML degree in terms of the restricted maximum likelihood equations.


Definition 13 The restricted maximum likelihood degree (REML degree) is the number of complex solutions to the restricted maximum likelihood equations for generic data.

Theorem 39 gives a formula for the REML degree for variance components models.


CHAPTER 3

TORIC IDEALS OF HYPERGRAPHS

This chapter is based on work in [30] with Sonja Petrovic.

3.1 Introduction

Let H be a hypergraph on V = {1, . . . , n} with edge set E ⊂ P(V) \ {∅}. Each edge ei ∈ E of size d encodes a squarefree monomial x^{ei} := ∏_{j∈ei} xj of degree d in the polynomial ring k[x1, . . . , xn]. The edge subring of the hypergraph H, denoted by k[H], is the following monomial subring:

k[H] := k[x^{ei} : ei ∈ E(H)].

The toric ideal of k[H], denoted IH, is the kernel of the monomial map φH : k[t_{ei}] → k[H] defined by φH(t_{ei}) = x^{ei}. The ideal IH encodes the algebraic relations among the edges of the hypergraph. For the special case where H is a graph, generating sets of the toric ideal of k[H] have been studied combinatorially in [47], [48], [51], [62], [65], and [66].
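For a small example, the toric ideal of a graph (a hypergraph with edges of size 2) can be computed by Gröbner-basis elimination: introduce the generators t_{ei} − x^{ei} and eliminate the x-variables. A sympy sketch for the 4-cycle (an assumed toy example, not a computation from the thesis):

```python
import sympy as sp

# The 4-cycle with edges e1={1,2}, e2={2,3}, e3={3,4}, e4={4,1}.
x1, x2, x3, x4 = sp.symbols('x1 x2 x3 x4')
t1, t2, t3, t4 = sp.symbols('t1 t2 t3 t4')

# Groebner basis in lex order with the x's first, which eliminates them.
G = sp.groebner(
    [t1 - x1 * x2, t2 - x2 * x3, t3 - x3 * x4, t4 - x4 * x1],
    x1, x2, x3, x4, t1, t2, t3, t4,
    order='lex',
)

# The binomial t1*t3 - t2*t4 (blue edges e1, e3 versus red edges e2, e4,
# a balanced edge set of the 4-cycle) lies in the toric ideal.
assert G.contains(t1 * t3 - t2 * t4)
```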

The combinatorial signatures of generators of IH are balanced edge sets of H. Balanced edge sets on uniform hypergraphs were introduced in [50], and are referred to as monomial walks. This chapter is based on the fact that the ideal IH is generated by binomials fE arising from primitive balanced edge sets E of H (see Proposition 14, a generalization of [50, Theorem 2.8]). A balanced edge set of H is a multiset of bicolored edges E = E_blue ⊔ E_red satisfying


the following balancing condition: for each vertex v covered by E, the number of red edges containing v equals the number of blue edges containing v, that is,

deg_blue(v) = deg_red(v). (3.1.1)

A binomial fE arises from E if it can be written as

fE = ∏_{e∈E_blue} t_e − ∏_{e′∈E_red} t_{e′}.

Note that while H is a simple hypergraph (it contains no multiple edges), E allows repetition of edges. In addition, the balanced edge set E is primitive if there exists no other balanced edge set E′ = E′_blue ⊔ E′_red such that E′_blue ⊊ E_blue and E′_red ⊊ E_red; this is the usual definition of an element in the Graver basis of IH. If H is a uniform hypergraph, a balanced edge set is called a monomial walk to conform with the terminology in [65], [66], and [50].
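The balancing condition (3.1.1) is straightforward to check computationally; a minimal sketch (the helper function and the toy 4-cycle data are hypothetical, not from the thesis):

```python
from collections import Counter

# Check (3.1.1): each vertex covered by the bicolored multiset must have
# equal blue and red degree. Edges are frozensets; lists allow repetition.
def is_balanced(blue, red):
    deg_blue = Counter(v for e in blue for v in e)
    deg_red = Counter(v for e in red for v in e)
    return deg_blue == deg_red

# the 4-cycle 1-2-3-4, colored alternately, is a balanced edge set
blue = [frozenset({1, 2}), frozenset({3, 4})]
red = [frozenset({2, 3}), frozenset({4, 1})]
assert is_balanced(blue, red)
assert not is_balanced(blue, red[:1])   # dropping a red edge breaks balance
```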

The motivation for studying toric ideals IH in this work is their connection to Markov bases

for statistical models parameterized by monomials as described in Section 2.2. In what follows,

we give two general degree bounds for generators of IH (Section 3.5), study the combinatorics

of splitting sets and reducibility (defined in Section 3.3), and explore implications to algebraic

statistics and Markov complexity throughout. Section 3.4 focuses on indispensable binomials, i.e., binomials that are members of every minimal generating set of IH. Proposition 19 gives a

combinatorial sufficient condition for determining whether a binomial f ∈ IH is indispensable.

Consequently, the Graver basis is the unique minimal generating set of IH for any 2-regular


hypergraph (Proposition 20). In particular, this means that the Graver basis is equal to the universal Gröbner basis, although the defining matrix need not be unimodular. Theorem 27 is a

combinatorial criterion for the ideal of a uniform hypergraph to be generated in degree at most

d ≥ 2. The criterion is based on decomposable balanced edge sets, separators, and splitting

sets; see Definitions 15 and 16. Our result generalizes the well-known criterion for the toric

ideal of a graph to be generated in degree 2 from [47], [65], and [66]. Splitting sets translate

and extend the constructions used in [47], [65], and [66] to hypergraphs and arbitrary degrees.

Theorem 29 provides a more general result for non-uniform hypergraphs.

Since log-linear models, by definition, have a monomial parametrization, we can also associate to any log-linear model M with a squarefree parametrization a (non-uniform) hypergraph HM. By Proposition 14, Markov moves for the model M are described by balanced edge sets of HM: if E is a balanced edge set of HM, then a Markov move on a fiber of the model corresponds to replacing the set of red edges in E by the set of blue edges in E. Our degree bounds give a bound for the Markov complexity of the model M.

We apply our combinatorial criteria to recover a well-known complexity result in algebraic

statistics from [16] in Corollary 26. Finally, we study the Markov complexity of a set of models from [59] called hidden subset models; the Zariski closures of these models are tangential varieties. Namely, Theorem 32 says that the ideal associated to the image of Tan((P^1)^n) in higher cumulants is generated by quadratics and cubics.


3.2 Preliminaries and notation

We remind the reader that all hypergraphs in this chapter are simple, that is, they contain

no multiple edges. In contrast, balanced edge sets of hypergraphs are not, since the binomials

arising from the sets need not be squarefree. Therefore, for the purpose of this manuscript, we

will refer to a balanced edge set as a multiset of edges, with implied vertex set; and, as usual,

V (E) denotes the vertex set contained in the edges in E .

For the remainder of this short section, we collect the technical details and notation we need for the proofs that follow.

A multiset, M , is an ordered pair (A, f) such that A is a set and f is a function from A

to N>0 that records the multiplicity of each of the elements of A. For example, the multiset

M = ({1, 2}, f) with f(1) = 1 and f(2) = 3 represents M = {1, 2, 2, 2} where ordering doesn’t

matter. We will commonly use the latter notation.

Given a multiset M = (A, f), the support of M is supp(M) := A, and its size is |M| := ∑_{a∈A} f(a). For two multisets M1 = (A, f1) and M2 = (B, f2), we say M2 ⊆ M1 if B ⊆ A and for all b ∈ B, f2(b) ≤ f1(b). M2 is a proper submultiset of M1 if B ⊊ A or there exists a b ∈ B such that f2(b) < f1(b).

Unions, intersections, and relative complements of multisets are defined in the canonical

way:


M1 ∪ M2 := (A ∪ B, g), where g(a) = f1(a) if a ∈ A \ B; g(a) = f2(a) if a ∈ B \ A; and g(a) = max(f1(a), f2(a)) if a ∈ A ∩ B;

M1 ∩ M2 := (A ∩ B, g), where g(a) = min(f1(a), f2(a));

M1 − M2 := (C, g), where g(a) = f1(a) if a ∈ A \ B and g(a) = f1(a) − f2(a) otherwise,

and C = (A \ B) ∪ {a ∈ A ∩ B | f1(a) − f2(a) > 0}.

Note that the support of the union (intersection) of two multisets is the union (intersection)

of their supports. Finally, we define a sum of M1 and M2:


M1 ⊔ M2 := (A ∪ B, g), where g(a) = f1(a) if a ∈ A \ B; g(a) = f2(a) if a ∈ B \ A; and g(a) = f1(a) + f2(a) if a ∈ A ∩ B.

If M1 ⊔ M2 is a balanced edge set, then the notation M1 ⊔_m M2 will be used to record the bicoloring of M1 ⊔ M2: edges in M1 are blue, and edges in M2 are red.

Finally, the number of edges in a hypergraph H containing a vertex v will be denoted by deg(v; H). For a bicolored multiset M := M_blue ⊔_m M_red, the blue degree deg_blue(v; M) of a vertex v is defined to be deg(v; M_blue). The red degree deg_red(v; M) is defined similarly.
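The multiset operations above coincide with the operators of Python's collections.Counter, which can serve as a quick sanity check of the definitions: | is the max-union, & the min-intersection, − the positive part of the difference, and + the sum ⊔ (an illustrative sketch with made-up multisets):

```python
from collections import Counter

M1 = Counter({'a': 2, 'b': 1})          # the multiset {a, a, b}
M2 = Counter({'a': 1, 'c': 3})          # the multiset {a, c, c, c}

assert M1 | M2 == Counter({'a': 2, 'b': 1, 'c': 3})   # union: max
assert M1 & M2 == Counter({'a': 1})                   # intersection: min
assert M1 - M2 == Counter({'a': 1, 'b': 1})           # difference
assert M1 + M2 == Counter({'a': 3, 'b': 1, 'c': 3})   # sum M1 ⊔ M2

# size |M1| and support supp(M1)
assert sum(M1.values()) == 3 and set(M1) == {'a', 'b'}
```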

3.3 Splitting sets and reducible edge sets

The aim of this section is to lay the combinatorial groundwork for studying toric ideals of

hypergraphs. In particular, we explicitly state what it means, combinatorially, for a binomial arising from a monomial walk to be generated by binomials of smaller degree. We begin by

describing the binomial generators of IH . Unless otherwise stated, H need not be uniform.

Proposition 14 Every binomial in the toric ideal of a hypergraph corresponds to a balanced

edge set. In particular, the toric ideal IH is generated by primitive balanced edge sets.

Proof. Suppose E is a balanced multiset of edges over H. Define a binomial fE ∈ k[t_e : e ∈ E(H)] as follows:

fE = ∏_{e∈E_blue} t_e − ∏_{e′∈E_red} t_{e′}.


The balancing condition (3.1.1) ensures that fE is in the kernel of the map φH .

The second claim is immediate.

Motivated by the application of reducible simplicial complexes to understanding the Markov bases of hierarchical log-linear models [20], we now introduce notions of reducibility and separators for balanced edge sets. For simplicity, we will often abuse notation and use H to denote the edge set of H.

Definition 15 A balanced edge set E is said to be reducible with separator S, supp(S) ⊆ supp(E), and decomposition (Γ1, S, Γ2), if there exist balanced edge sets Γ1 ≠ E and Γ2 ≠ E with S ≠ ∅ such that S = Γ1_red ∩ Γ2_blue, E = Γ1 ⊔ Γ2, and the following coloring conditions hold: Γ1_red, Γ2_red ⊆ E_red and Γ1_blue, Γ2_blue ⊆ E_blue.

We say that S is proper with respect to (Γ1, S, Γ2) if S is a proper submultiset of both Γ1_red and Γ2_blue.

If S is not proper, then S is said to be blue with respect to (Γ1, S, Γ2) if Γ1_red = S, and red with respect to (Γ1, S, Γ2) if Γ2_blue = S.

Figure 1 shows an example of a reducible balanced edge set E . The separator is proper

and consists of the single green edge es; it appears twice in the balanced edge set E , once as

a blue edge and once as a red edge. Figure 2 shows a reducible balanced edge set where the

separator, consisting of the two green edges e1 and e2, is not proper. As before, the separator

edges appear twice in the balanced edge set.


Figure 1. Reducible balanced edge set. The green edge es is the separator.

Figure 2. Reducible balanced edge set with an improper separator. The separator consists of green edges e1 and e2.

If H is a hypergraph and E is a balanced edge set with supp (E) ⊆ H, given a multiset S

with supp (S) ⊆ H, we can construct a new balanced edge set in the following manner:

E + S := (E_blue ⊔ S) ⊔_m (E_red ⊔ S).

Definition 16 Let H be a hypergraph. Let E be a balanced edge set with size 2n such that

supp (E) ⊆ H. A non-empty multiset S with supp (S) ⊆ H is a splitting set of E with decom-

position (Γ1, S, Γ2) if E + S is reducible with separator S and decomposition (Γ1, S, Γ2).

S is said to be a blue (red, resp.) splitting set with respect to (Γ1, S, Γ2) if S is a blue (red, resp.) separator of E + S with respect to (Γ1, S, Γ2).

S is a proper splitting set of E if there exists a decomposition (Γ1, S, Γ2) of E +S such that

S is a proper separator with respect to (Γ1, S, Γ2).


Example 17 (Group-based Markov model) Let V1 = {x1, x2, x3, x4}, V2 = {y1, y2, y3, y4},

and V3 = {z1, z2, z3, z4}. Let V be the disjoint union of V1, V2, and V3. Let H be the 3-uniform

hypergraph with vertex set V and edge set:

e111 = {x1, y1, z1} e122 = {x1, y2, z2} e133 = {x1, y3, z3} e144 = {x1, y4, z4}

e221 = {x2, y2, z1} e212 = {x2, y1, z2} e243 = {x2, y4, z3} e234 = {x2, y3, z4}

e331 = {x3, y3, z1} e342 = {x3, y4, z2} e313 = {x3, y1, z3} e324 = {x3, y2, z4}

e441 = {x4, y4, z1} e432 = {x4, y3, z2} e423 = {x4, y2, z3} e414 = {x4, y1, z4}

The hypergraph H has applications in algebraic phylogenetics: it represents the parametriza-

tion of the Z2 × Z2 group-based Markov model on the claw tree K1,2 (see [58, Example 25]).

This model is a submodel of the 4-state general Markov model on K1,2 described in [addme].

Consider the monomial walk

W = {e324, e111, e243, e432} ⊔_m {e122, e313, e234, e441}.

Let S = {e133, e212}. Then S is a splitting set of W with decomposition (Γ1, S, Γ2), where

Γ1 = {e111, e243, e432} ⊔_m {e133, e212, e441}

Γ2 = {e133, e212, e324} ⊔_m {e122, e313, e234}.


The decomposition (Γ1, S, Γ2) encodes binomials in IH that generate fW:

fW = t_{e324}(t_{e111}t_{e243}t_{e432} − t_{e133}t_{e212}t_{e441}) + t_{e441}(t_{e133}t_{e212}t_{e324} − t_{e122}t_{e313}t_{e234}).
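The displayed identity can be verified symbolically (a sketch using sympy, with variable names standing in for the t_{e} of Example 17):

```python
import sympy as sp

t111, t122, t133, t212, t234, t243, t313, t324, t432, t441 = sp.symbols(
    't111 t122 t133 t212 t234 t243 t313 t324 t432 t441')

# fW = (product over blue edges of W) - (product over red edges of W)
fW = t324 * t111 * t243 * t432 - t122 * t313 * t234 * t441

# the right-hand side built from the binomials encoded by Γ1 and Γ2
rhs = (t324 * (t111 * t243 * t432 - t133 * t212 * t441)
       + t441 * (t133 * t212 * t324 - t122 * t313 * t234))

assert sp.expand(fW - rhs) == 0   # the middle terms cancel
```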

The previous example illustrates the algebraic interpretation of a splitting set. Notice there is a correspondence between monomials in k[t_{ei}] and multisets of edges of H. We will write E(t_{ei1}^{a1} t_{ei2}^{a2} · · · t_{eil}^{al}) for the multiset ({ei1, . . . , eil}, f), where

f : {ei1, . . . , eil} → N, eij ↦ aj.

Thus the support of E(t_{ei1}^{a1} t_{ei2}^{a2} · · · t_{eil}^{al}) corresponds to the support of the monomial t_{ei1}^{a1} t_{ei2}^{a2} · · · t_{eil}^{al}.

If fE = u − v ∈ IH is the binomial arising from the balanced edge set E, then a monomial s corresponds to a splitting set S if and only if there exist two binomials u1 − v1, u2 − v2 ∈ IH such that us = u1u2, vs = v1v2, and s = gcd(v1, u2). In this case, the decomposition of E + S is (Γ1, S, Γ2), where Γ1 = E(u1) ⊔_m E(v1) and Γ2 = E(u2) ⊔_m E(v2).
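This monomial correspondence can be checked on Example 17 with sympy (a sketch; variable names stand in for the t_e, and s is computed as the monomial with E(s) = S = Γ1_red ∩ Γ2_blue):

```python
import sympy as sp

t111, t122, t133, t212, t234, t243, t313, t324, t432, t441 = sp.symbols(
    't111 t122 t133 t212 t234 t243 t313 t324 t432 t441')

u = t324 * t111 * t243 * t432               # blue edges of W
v = t122 * t313 * t234 * t441               # red edges of W
u1, v1 = t111 * t243 * t432, t133 * t212 * t441   # Γ1 = E(u1) ⊔m E(v1)
u2, v2 = t133 * t212 * t324, t122 * t313 * t234   # Γ2 = E(u2) ⊔m E(v2)

s = sp.gcd(v1, u2)                          # the splitting-set monomial
assert s == t133 * t212                     # E(s) = S = {e133, e212}
assert sp.expand(u * s - u1 * u2) == 0      # us = u1*u2
assert sp.expand(v * s - v1 * v2) == 0      # vs = v1*v2
```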

For a balanced edge set E, the existence of a splitting set determines whether the binomial fE ∈ IH can be written as a linear combination of two binomials f_{Γ1}, f_{Γ2} ∈ IH. While, in general, the existence of a splitting set does not imply deg(f_{Γ1}), deg(f_{Γ2}) < deg(fE), if H is uniform and the splitting set is proper, then the following lemma holds.


Lemma 18 Let H be a uniform hypergraph and let W be a monomial walk with supp(W) ⊆ H and |W| = 2n. If S is a proper splitting set of W, then there exists a decomposition (Γ1, S, Γ2) of W + S such that |Γ1| < |W| and |Γ2| < |W|.

Proof. Let S be a proper splitting set of W. By definition, there exists a decomposition (Γ1, S, Γ2) of W + S such that S is a proper submultiset of Γ1_red and Γ2_blue.

Let |Γ1| = 2n1 and |Γ2| = 2n2. Since W + S = Γ1 ⊔ Γ2, it follows that |W + S| = |Γ1| + |Γ2|. Then 2n + 2|S| = 2n1 + 2n2, which implies 2n − 2n1 = 2n2 − 2|S|. But S being a proper submultiset of Γ2_blue gives that n2 > |S|, which, in turn, implies that n > n1. By a similar argument, n > n2. Thus |Γ1| < |W| and |Γ2| < |W|.

3.4 Indispensable Binomials

A binomial f in a toric ideal I is indispensable if f or −f belongs to every binomial generating set of I. Indispensable binomials of toric ideals were introduced by Takemura et al., and are studied in [63], [3], [11], [48], [51]. The degree of an indispensable binomial in IH is a lower bound on the Markov complexity of the model associated to H.

Proposition 19 Let H be a hypergraph. Let E be a balanced edge set with supp (E) ⊆ H. Let

fE be the binomial arising from E. If there does not exist a splitting set of E, then fE is an

indispensable binomial of IH .

Proof. Suppose E is not indispensable. Then there is a binomial generating set of IH ,

G = {f1, . . . , fn}, such that fE /∈ G and −fE /∈ G.


Since fE = fE^+ − fE^− ∈ IH, there is an fi = fi^+ − fi^− ∈ G such that fi^+ or fi^− divides fE^+. Without loss of generality, assume fi^+ | fE^+. Since fi is a binomial in IH, fi arises from a monomial walk Ei on H.

Let S = (Ei)_red. Let Γ1 = Ei and Γ2 = Γ2_blue ⊔_m Γ2_red, where

Γ2_blue = (E_blue − (Ei)_blue) ⊔ (Ei)_red

Γ2_red = E_red.

Since fi^+ | fE^+, the multiset (Ei)_blue ⊆ E_blue, and thus Γ1 ⊔ Γ2 = E + S. By construction, Γ1_red ∩ Γ2_blue = S. Therefore S is a splitting set of E.

If every Graver basis element of a binomial ideal IH is indispensable, then the Graver basis of IH is the unique minimal generating set of IH. Propositions 20 and 24 describe two classes of hypergraphs where this is the case. In particular, for these hypergraphs, the universal Gröbner basis of IH is a minimal generating set.

Proposition 20 If H is a 2-regular uniform hypergraph, then the Graver basis of IH is the

unique minimal generating set of IH .

For the proof of Proposition 20, we make use of Proposition 3.2 in [50] which concerns

balanced edge sets that are pairs of perfect matchings.

Definition 21 A matching on a hypergraph H = (V,E) is a subset M ⊆ E such that the

elements of M are pairwise disjoint. A matching is called perfect if V (M) = V .


Proof. [Proof of Proposition 20] Let G be the Graver basis of IH and let f ∈ G. Since every element of G is a binomial, f arises from a primitive monomial walk W with supp(W) ⊆ H.

Let Mb = supp(W_blue) and Mr = supp(W_red). By primitivity of W, the intersection Mr ∩ Mb = ∅. Since W satisfies condition (3.1.1) and H is 2-regular, if e1, e2 ∈ Mb and e1 ∩ e2 ≠ ∅, then e1 ∈ Mr or e2 ∈ Mr, which would contradict the primitivity of W. So Mb and Mr are two edge-disjoint perfect matchings on V(W). By Proposition 3.2 in [50], W contains no multiple edges, i.e. W = Mb ⊔_m Mr. Furthermore, since H is 2-regular, the edge set of the subhypergraph induced by V(W) is Mb ∪ Mr.

Suppose S is a splitting set of W with decomposition (Γ1, S, Γ2). By the correspondence between primitive monomial walks and primitive binomials, there exists a primitive monomial walk Γ such that Γ_blue ⊆ Γ1_blue and Γ_red ⊆ Γ1_red (if Γ1 is primitive, then Γ = Γ1). By Proposition 3.2 in [50], Γ must be a pair of perfect matchings on V(Γ). This means Γ is a proper balanced edge subset of W, a contradiction. Therefore, by Proposition 19, fW is indispensable. Since every element in the Graver basis of IH is indispensable, there is no generating set of IH strictly contained in the Graver basis, and the claim follows.

Definition 22 A k-uniform hypergraph H = (V,E) is k-partite if there exists a partition of V

into k disjoint subsets, V1, . . . , Vk, such that each edge in E contains exactly one vertex from

each Vi.

Lemma 23 Let H = (V, E) be a k-uniform k-partite hypergraph with E = Eb ⊔ Er and Eb ∩ Er = ∅. If there exists a Vi, 1 ≤ i ≤ k, such that deg(v; Er) = deg(v; Eb) = 1 for all v ∈ Vi, then a monomial walk W with support E is primitive only if W contains no multiple edges.


Proof. Follows from the proof of necessity of Proposition 3.2 in [50].

Proposition 24 Let H = (V, E) be a k-uniform k-partite hypergraph. If there exists a Vi such that deg(v; E) = 2 for all v ∈ Vi, then the Graver basis of IH is the unique minimal generating set of IH.

Proof. The proof is similar to the proof of Proposition 20. Note that while H may not be 2-regular, one of its parts, Vi, is ‘locally’ 2-regular, and thus restricts the structure of monomial walks on H. In particular, Lemma 23 ensures that Mr and Mb are edge-disjoint perfect matchings on V(W)|_{Vi}, and the rest of the proof follows immediately.

Example 25 (No 3-way interaction) The toric ideal of the hypergraph H in Figure 3 cor-

responds to the hierarchical log-linear model for no 3-way interaction on 2× 2× 2 contingency

tables. This statistical model is a common example in algebraic statistics [22, Example 1.2.7].

Since there is exactly one primitive monomial walk W on H that travels through 8 edges,

IH = (fW).

For 2 × 3 × 3 contingency tables with no 3-way interaction, the hypergraph corresponding

to this log-linear model has 18 edges.

Figure 3. Hypergraph associated to the hierarchical log-linear model for no 3-way interaction.

The hypergraph in this case is H = (V, E), where V = {x00, x01, x02, x10, x11, x12, y00, y01, y02, y10, y11, y12, z00, z01, z02, z10, z11, z12, z20, z21, z22} and the

edge set is:

e000 = {x00, y00, z00} e001 = {x00, y01, z01} e002 = {x00, y02, z02}

e010 = {x01, y00, z10} e011 = {x01, y01, z11} e012 = {x01, y02, z12}

e020 = {x02, y00, z20} e021 = {x02, y01, z21} e022 = {x02, y02, z22}

e100 = {x10, y10, z00} e101 = {x10, y11, z01} e102 = {x10, y12, z02}

e110 = {x11, y10, z10} e111 = {x11, y11, z11} e112 = {x11, y12, z12}

e120 = {x12, y10, z20} e121 = {x12, y11, z21} e122 = {x12, y12, z22}

Let W be the primitive monomial walk

W = {e000, e101, e011, e112, e022, e120} ⊔_m {e100, e001, e111, e012, e122, e020}.


Every remaining edge of H that does not appear in W is not contained in V(W); thus it can be easily verified that there does not exist a splitting set of W, so by Proposition 19, fW is indispensable. In fact, H satisfies the condition of Proposition 24, and thus every binomial in IH corresponding to a primitive monomial walk is indispensable.
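The balancing condition for this walk can be verified directly; the edge pattern e_ijk = {x_ij, y_ik, z_jk} below is read off from the edge list above:

```python
from collections import Counter

def edge(i, j, k):
    # e_ijk = {x_ij, y_ik, z_jk}, matching the listed 2x3x3 edge set
    return (f'x{i}{j}', f'y{i}{k}', f'z{j}{k}')

# the bicolored multiset W of the example
blue = [edge(0, 0, 0), edge(1, 0, 1), edge(0, 1, 1),
        edge(1, 1, 2), edge(0, 2, 2), edge(1, 2, 0)]
red = [edge(1, 0, 0), edge(0, 0, 1), edge(1, 1, 1),
       edge(0, 1, 2), edge(1, 2, 2), edge(0, 2, 0)]

deg_blue = Counter(v for e in blue for v in e)
deg_red = Counter(v for e in red for v in e)
assert deg_blue == deg_red     # every vertex has equal blue and red degree
```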

From the above discussion, we can see that if a uniform hypergraph H contains an induced

subhypergraph Hs that is 2-regular and there exists a bicoloring such that with this bicoloring

Hs is also a balanced edge set, then the maximum degree of any minimal generating set of IH

is at least |E(Hs)|/2.

A similar statement holds for k-uniform, k-partite hypergraphs with vertex partition V =

∪ki=1Vi. Namely, if H contains an induced subhypergraph Hs that is 2-regular on Vi (i.e., H

satisfies the conditions of Proposition 24) and there exists a bicoloring such that with this

bicoloring Hs is a balanced edge set (e.g., Hs is a pair of disjoint perfect matchings), then the

maximum degree of any minimal generating set of IH is at least |E(Hs)|/2.

Recall that degree bounds on minimal generators give a Markov complexity bound for the

corresponding log-linear model in algebraic statistics. This allows us to recover a well-known

result:

Corollary 26 (Consequence of Theorem 1.2 in [16]; see also Theorem 1.2.17 in [22])

The Markov complexity for the no 3-way interaction model on 3×r×c contingency tables grows

arbitrarily large as r and c increase.

Proof. For the no 3-way interaction model on 2×r×c contingency tables, we can construct

a primitive binomial fHs of degree 2 · min(r, c) in its defining toric ideal by taking a cycle of


length min(r, c) on the bipartite graph Kr,c. (We remind the reader that this is precisely how

fW is constructed in Example 25). By noting that the hypergraph associated to this binomial

Hs is an induced subhypergraph of the hypergraph associated to the 3 × r × c case and that

Hs is 2-regular in one of the partitions, the claim follows by Proposition 24.
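The construction in this proof can be sketched for general table sizes: for each m = min(r, c), a cycle yields a balanced edge set with 2m blue edges, so the corresponding binomial has degree 2m (a sketch; the edge-naming scheme e_ijk = {x_ij, y_ik, z_jk} is read off from the 2×3×3 example above):

```python
from collections import Counter

def edge(i, j, k):
    # e_ijk = {x_ij, y_ik, z_jk}
    return (f'x{i},{j}', f'y{i},{k}', f'z{j},{k}')

def cycle_walk(m):
    # cycle construction on 2 x m x m tables; for m = 3 this reproduces
    # the walk W of Example 25
    blue = [edge(0, j, j) for j in range(m)] + \
           [edge(1, j, (j + 1) % m) for j in range(m)]
    red = [edge(1, j, j) for j in range(m)] + \
          [edge(0, j, (j + 1) % m) for j in range(m)]
    return blue, red

for m in (3, 5, 8):
    blue, red = cycle_walk(m)
    deg_blue = Counter(v for e in blue for v in e)
    deg_red = Counter(v for e in red for v in e)
    assert deg_blue == deg_red       # balanced for every m
    assert len(blue) == 2 * m        # binomial degree 2·min(r, c)
```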

3.5 General degree bounds

For uniform hypergraphs, balanced edge sets are referred to as monomial walks. In the

previous sections, we saw that splitting sets of W translate to algebraic operations on the

binomials fW , providing a general construction for rewriting a high-degree binomial in terms of

binomials corresponding to shorter walks. This, along with Lemma 18, is the key to the general

degree bound result.

Theorem 27 Given a k-uniform hypergraph H, the toric ideal IH is generated in degree at most d if and only if for every primitive monomial walk W of length 2n > 2d, with supp(W) ⊆ H, one of the following two conditions holds:

i) there exists a proper splitting set S of W,

or

ii) there is a finite sequence of pairs, (S1, R1), . . . , (SN, RN), such that

• S1 and R1 are blue and red splitting sets of W of size less than n, with decompositions (Γ1_1, S1, Γ2_1) and (Υ1_1, R1, Υ2_1),

• Si+1 and Ri+1 are blue and red splitting sets of Wi = (Γ2_i)_blue ⊔_m (Υ1_i)_red of size less than n, with decompositions (Γ1_{i+1}, Si+1, Γ2_{i+1}) and (Υ1_{i+1}, Ri+1, Υ2_{i+1}), and,


• SN ∩ RN ≠ ∅ or there exists a proper splitting set of WN.

Proof. [Proof of necessity (⇒)] Let H be a k-uniform hypergraph whose toric ideal IH is generated in degree at most d. Let W be a primitive monomial walk of length 2n > 2d. Let pW = u − v be the binomial that arises from W. Since IH is generated in degree at most d, there exist primitive binomials of degree at most d, (u1 − v1), . . . , (us − vs) ∈ k[t_{ei}], and m1, . . . , ms ∈ k[t_{ei}], such that

pW = m1(u1 − v1) + m2(u2 − v2) + · · · + ms(us − vs).

By expanding and reordering so that m1u1 = u, msvs = v, and mivi = m_{i+1}u_{i+1} for all i = 1, . . . , s − 1, we may and will assume that m1, . . . , ms are monomials.

If gcd(mi, m_{i+1}) ≠ 1 for some i, we can add the terms mi(ui − vi) and m_{i+1}(u_{i+1} − v_{i+1}) to get a new term, mi′(ui′ − vi′), where mi′ = gcd(mi, m_{i+1}) and (ui′ − vi′) is a binomial of IH of degree less than n. Continuing recursively in this manner, we have

pW = m1′(u1′ − v1′) + m2′(u2′ − v2′) + · · · + mr′(ur′ − vr′)

where m1′u1′ = u, mr′vr′ = v, mi′vi′ = m′_{i+1}u′_{i+1} and gcd(mi′, m′_{i+1}) = 1 for all i = 1, . . . , r − 1, and deg(ui′ − vi′) < n for all i = 1, . . . , r. For convenience, we will drop the primes and write

pW = m1(u1 − v1) + m2(u2 − v2) + · · · + mr(ur − vr).


Case 1: r = 2. In this case, pW = m1(u1 − v1) + m2(u2 − v2). Let

Γ1 := E(u1) ⊔_m E(v1)

Γ2 := E(u2) ⊔_m E(v2)

S := E(v1) ∩ E(u2) = E(gcd(v1, u2)).

We want to show that (Γ1, S, Γ2) is a decomposition of W + S. Since S = Γ1_red ∩ Γ2_blue, Γ1_blue ⊆ W_blue, and Γ2_red ⊆ W_red, we only need to show W + S = Γ1 ⊔ Γ2, Γ1_red ⊆ (W + S)_red, and Γ2_blue ⊆ (W + S)_blue. First, notice the following equalities hold:

W + S = (W_blue ⊔ S) ⊔ (W_red ⊔ S) = E(u) ⊔ S ⊔ E(v) ⊔ S = E(m1u1) ⊔ S ⊔ E(m2v2) ⊔ S = E(m1) ⊔ E(u1) ⊔ S ⊔ E(m2) ⊔ E(v2) ⊔ S.

Let s ∈ k[t_{ei}] be the monomial such that E(s) = S, so s = gcd(v1, u2). The equality m1v1 = m2u2 implies m1(v1/s) = m2(u2/s). Now, v1/s and u2/s are clearly relatively prime, and by the assumptions on pW, m1 and m2 are relatively prime. This means the equality m1(v1/s) = m2(u2/s) implies m1 = u2/s and m2 = v1/s. Thus,

Γ1 ⊔ Γ2 = E(u1) ⊔ E(v1) ⊔ E(u2) ⊔ E(v2) = E(u1) ⊔ E(v1/s) ⊔ S ⊔ E(v2) ⊔ E(u2/s) ⊔ S = E(u1) ⊔ E(m2) ⊔ S ⊔ E(v2) ⊔ E(m1) ⊔ S.


Consequently, W + S = Γ1 ⊔ Γ2.

Notice that the equality m2 = v1/s also implies Γ1red = E(v1) = E(m2) ⊔ S. This means Γ1red ⊆ (E(m2v2) ⊔ S) = (Wred ⊔ S) = (W + S)red. By a similar observation, Γ2blue ⊆ (W + S)blue.

Case 2: r = 2N + 1. For 1 ≤ i ≤ N, let

Γ1i = E(ui) ⊔m E(vi)

Γ2i = E(mi+1ui+1) ⊔m E(m2N−i+2v2N−i+2)

Si = E(vi) ∩ E(mi+1ui+1) = E(gcd(vi, mi+1ui+1)) = E(vi).

For 1 ≤ i ≤ N, let

Υ1i = E(miui) ⊔m E(m2N−i+1v2N−i+1)

Υ2i = E(u2N−i+2) ⊔m E(v2N−i+2)

Ri = E(m2N−i+1v2N−i+1) ∩ E(u2N−i+2) = E(gcd(m2N−i+1v2N−i+1, u2N−i+2)) = E(u2N−i+2).

One can follow the proof of Case 1 to see that S1 and R1 are splitting sets of W, and that Si+1 and Ri+1 are splitting sets of Wi = E(mi+1ui+1) ⊔m E(m2N−i+1v2N−i+1) for i = 1, . . . , N − 1. Furthermore, by definition, they are blue and red splitting sets (respectively) of size less than 2n.


Since (WN−1)blue = (Γ2N−1)blue and (WN−1)red = (Υ1N−1)red, the binomial arising from the walk on WN−1 is

mNuN − mN+2vN+2 = mN(uN − vN) + mN+1(uN+1 − vN+1) + mN+2(uN+2 − vN+2).

Choose e ∈ H such that te | mN+1; then te | vN and te | uN+2. But since SN = E(vN) and RN = E(uN+2), we have e ∈ SN and e ∈ RN, so SN ∩ RN ≠ ∅.

Case 3: r = 2N + 2. For 1 ≤ i ≤ N, let

Γ1i = E(ui) ⊔m E(vi)

Γ2i = E(mi+1ui+1) ⊔m E(m2N−i+3v2N−i+3)

Si = E(vi) ∩ E(mi+1ui+1) = E(gcd(vi, mi+1ui+1)) = E(vi).

For 1 ≤ i ≤ N, let

Υ1i = E(miui) ⊔m E(m2N−i+2v2N−i+2)

Υ2i = E(u2N−i+3) ⊔m E(v2N−i+3)

Ri = E(m2N−i+2v2N−i+2) ∩ E(u2N−i+3) = E(gcd(m2N−i+2v2N−i+2, u2N−i+3)) = E(u2N−i+3).

We can follow the proof of Case 1 to see that S1 and R1 are splitting sets of W, and that Si+1 and Ri+1 are splitting sets of Wi = E(mi+1ui+1) ⊔m E(m2N−i+2v2N−i+2) for i = 1, . . . , N − 1.


Furthermore, by definition, they are blue and red (respectively) splitting sets of size less than n. Since (WN)blue = (Γ2N)blue and (WN)red = (Υ1N)red, the binomial arising from WN is

mN+1uN+1 − mN+2vN+2 = mN+1(uN+1 − vN+1) + mN+2(uN+2 − vN+2),

which is exactly Case 1, so there exists a proper splitting set of WN.

Proof. [Proof of sufficiency (⇐)] Assume every primitive monomial walk W of length

2n > 2d with supp (W) ⊂ H satisfies i) or ii). Let pW = u − v be a generator of IH which

arises from the monomial walk W on H.

To show that IH = [IH ]≤d, we proceed by induction on the degree of pW . If deg pW = 2,

then pW ∈ [IH ]≤d. So assume deg pW = n > d and every generator of IH of degree less than n

is in [IH ]≤d. Since the size of W is greater than 2d, either condition i) holds or condition ii)

holds.

Suppose i) holds. By Lemma 3.5, there exists a decomposition of W, (Γ1, S, Γ2), such that

|Γ1| < |W| and |Γ2| < |W|. Let pΓ1 = u1− v1 (pΓ2 = u2− v2, respectively) be the binomial that

arises from Γ1 (Γ2, respectively). Let m1 = u/u1 and m2 = v/v2.

What remains to be shown is that pW = m1pΓ1 + m2pΓ2 , that is, u − v = m1(u1 − v1) +

m2(u2 − v2). However, it is clear that u = m1u1 and v = m2v2, so it suffices to show that m1v1 = m2u2, or equivalently, E(m1v1) = E(m2u2).


Let s ∈ k[tei] be the monomial such that E(s) = S. Then

Γ1 ⊔ Γ2 = (E(u1) ⊔ E(v1/s) ⊔ S) ⊔ (E(u2/s) ⊔ S ⊔ E(v2))

and

W + S = (E(m1) ⊔ E(u1) ⊔ S) ⊔ (E(m2) ⊔ E(v2) ⊔ S).

Thus, since W + S = Γ1 ⊔ Γ2,

E(m1) ⊔ E(m2) = E(v1/s) ⊔ E(u2/s),

which in turn implies

m1m2 = (v1/s)(u2/s).

Since W is primitive and the coloring conditions on (Γ1, S, Γ2) imply E(v1/s) ⊆ Wred and E(m1) ⊆ Wblue, the monomials m1 and v1/s are relatively prime. A similar argument shows that m2 and u2/s are relatively prime. Thus, m1 = u2/s and m2 = v1/s, and consequently, E(m1v1) = E(m2u2) and pW = m1pΓ1 + m2pΓ2.

Since deg pΓ1, deg pΓ2 < n, the induction hypothesis applied to pΓ1 and pΓ2 shows that pW ∈ [IH]≤d.


Now suppose ii) holds. For i from 1 to N, let pΓ1i = ui − vi and pΥ2i = yi − zi be the binomials arising from Γ1i and Υ2i. Let wib − wir be the binomial arising from the walk Wi, and let pW = w0b − w0r. For 1 ≤ i ≤ N, let mi = w(i−1)b/ui and qi = w(i−1)r/zi. Then

pW = ∑_{i=1}^{N} mi(ui − vi) + (wNb − wNr) + ∑_{i=1}^{N} qN+1−i(yN+1−i − zN+1−i).

The preceding claim follows from three observations: (1) by construction, w0b = m1u1 and w0r = q1z1; (2) by the definition of WN, wNb = mNvN and wNr = qNyN; and (3) by the definitions of mi, qi, and the walk Wi, mivi = mi+1ui+1 and qi+1zi+1 = qiyi for 1 ≤ i ≤ N − 1.

As a consequence of the size conditions on the splitting sets of Wi, the linear combinations ∑_{i=1}^{N} mi(ui − vi) and ∑_{i=1}^{N} qN+1−i(yN+1−i − zN+1−i) lie in [IH]≤d. So if WN satisfies condition i), the binomial wNb − wNr ∈ [IH]≤d, and thus pW ∈ [IH]≤d.

To finish the proof, assume that SN and RN share an edge e. Then the claim above becomes

pW = ∑_{i=1}^{N} mi(ui − vi) + te(mNvN/te − qNyN/te) + ∑_{i=1}^{N} qN+1−i(yN+1−i − zN+1−i),

and we just need to show that te indeed divides mNvN and qNyN. But this is clear, since e ∈ SN implies te | vN and e ∈ RN implies te | yN.

Example 28 (Independence models) Let H be the complete k-partite hypergraph with d

vertices in each partition V1, . . . , Vk. These hypergraphs correspond to independence models in


statistics. Equivalently, the edge subring of the complete k-partite hypergraph with d vertices in

each partition parametrizes the Segre embedding of Pd × · · · × Pd with k copies.

The ideal IH is generated by quadrics. To see this, let W, supp (W) ⊆ H, be a primitive

monomial walk of length 2n, n > 2. Choose a multiset E′ ⊂ W consisting of n−1 blue and n−1

red edges. Since each edge must contain a vertex from each Vi, for each i, there is at most one

vertex in V (E′) ∩ Vi that is not covered by a red edge and a blue edge from E′. Consequently,

V (E′) contains a vertex from each Vi that belongs to at least one red edge and at least one blue edge of E′.

For a multiset of edges, M , with supp (M) ⊆ H, we define the max degree of a vertex:

maxdeg(v;M) := max(degred(v;M),degblue(v;M)).

The partitioning of the vertices ensures that V (E′) cannot contain more than k vertices whose maxdeg with respect to E′ is n − 1. Indeed, if there were more than k vertices with maxdeg equal to n − 1, then two of those vertices would belong to the same partition, Vj. This would imply that W contains at least 4(n − 1) edges, which is impossible when n > 2.

Next, choose n− 1 new blue edges and n− 1 red edges in the following manner:

Let db(v) := degblue(v; E′) and dr(v) := degred(v; E′). For i = 1, . . . , k, choose a vertex from V (E′blue) ∩ V (E′red) ∩ Vi that has the largest maxdeg with respect to E′; let bn−1 and rn−1 both be this set of vertices. For all v ∈ bn−1, reduce db(v) and dr(v) by 1. Now choose b1, . . . , bn−2 by the following algorithm:


for i from 1 to k do:
    Vi := V (E′) ∩ Vi, sorted by db(v) in decreasing order;
for j from n − 2 down to 1 do:
    bj := {vi : vi is the first element of Vi, i = 1, . . . , k};
    for all v ∈ bj do db(v) := db(v) − 1;
    for i from 1 to k do re-sort Vi by db(v) in decreasing order.

Let R1 = {b1, . . . , bn−1} and S1 = {r1, . . . , rn−1}. Then R1 and S1 are red and blue splitting

sets of W that share an edge. Thus, condition ii) of Theorem 27 is met, and consequently IH

is generated in degree 2.
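The greedy selection step in the algorithm above can be sketched in code. The following is a minimal illustration (the function name and data layout are our own, not the thesis's): from each partition it repeatedly picks the vertex of largest remaining blue degree, decrements that degree, and records the resulting edge.

```python
# Minimal sketch of the greedy step in Example 28 (helper name and data
# layout are ours, not from the thesis). db[i] maps each vertex of
# partition V_i to its remaining blue degree in E'.
def choose_blue_edges(db, n):
    edges = []
    for _ in range(n - 2):                # build edges b_{n-2}, ..., b_1
        edge = []
        for part in db:
            v = max(part, key=part.get)   # largest remaining degree in V_i
            part[v] -= 1                  # decrement, as in the algorithm
            edge.append(v)
        edges.append(edge)
    return edges
```

Sorting once and re-sorting after each decrement, as in the thesis's pseudocode, is equivalent to taking the maximum at each step, which is what the sketch does.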

When H is a non-uniform hypergraph, the toric ideal IH is not necessarily homogeneous.

For example, Figure 4 supports a binomial in IH where H consists of edges of size two and four;

note that the edges still satisfy the balancing condition (3.1.1). However, we can still modify the

conditions of Theorem 27 to find degree bounds for the toric ideals of non-uniform hypergraphs.

Proposition 29 gives a prescription for determining a degree bound on the generators of IH in

terms of local structures of H.
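As an illustration of the balancing condition in the non-uniform setting, the following sketch (our own helper, under the assumed reading that (3.1.1) requires every vertex to have equal blue and red degree) checks balance for a colored edge multiset such as the one supported by Figure 4:

```python
# Our own sketch of the balancing condition (3.1.1), read as: every vertex
# has equal blue and red degree in the colored edge multiset.
from collections import Counter

def is_balanced(blue, red):
    """blue, red: lists of edges; each edge is an iterable of vertices."""
    blue_deg = Counter(v for e in blue for v in e)
    red_deg = Counter(v for e in red for v in e)
    return blue_deg == red_deg
```

For instance, two blue 2-edges against one red 4-edge on the same four vertices are balanced, mirroring the non-uniform example of Figure 4.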

Proposition 29 Given a hypergraph H and a binomial fE ∈ IH arising from the balanced edge set E with n = |Eblue| ≥ |Ered|, fE is a linear combination of binomials in IH of degree less than n if one of the following two conditions holds:


Figure 4. Example of a non-uniform hypergraph whose associated toric ideal is non-homogeneous.

i) there exists a proper splitting set S of E with decomposition (Γ1, S, Γ2) where |Γiblue|, |Γired| < n for i = 1, 2,

or

ii) there is a pair of blue and red splitting sets of E, S and R, of size less than n with decompositions (Γ1, S, Γ2), (Υ1, R, Υ2) such that |Γ1blue|, |Υ2red| < n, |Γ2blue|, |Υ1red| ≤ n, and S ∩ R ≠ ∅.

Proof. This proof follows the proof of sufficiency for Theorem 27. Note that in that proof, the uniformity condition does not play an essential role; it is only invoked to bound the size of the red and blue parts of each monomial hypergraph appearing in the decompositions involved. Thus, the hypotheses of Proposition 29 act in place of the uniformity condition in Theorem 27.

3.6 Hidden Subset Models

For the remainder of this section, we will concern ourselves with the first tangential variety,

Tan((P1)n). In [59], Sturmfels and Zwiernik use cumulants to give a monomial parameterization


of Tan((P1)n). The variety Tan((P1)n) is associated to a class of hidden subset models [59,

Example 5.2] and context-specific independence models [46]. We now derive a bound for the

toric ideal of the image of Tan((P1)n) in higher cumulants and, equivalently, for the Markov

complexity of these models.

Example 30 Let H = (V,E) where V = {1, . . . , n} and E = {e : e ⊆ V and |e| ≥ 2}. Then

the set of polynomials vanishing on the image of Tan((P1)n) in higher cumulants is the toric

ideal IH (see [59, Theorem 4.1]).

The hypergraph in Example 30 is the complete hypergraph on n vertices after removing

all singleton edges. The degree bound on the generators of this hypergraph can be found by

looking at a smaller hypergraph.
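The hypergraph of Example 30 and the smaller one used below are easy to enumerate explicitly; a small illustrative sketch (ours, not thesis code):

```python
# Illustrative enumeration (not thesis code): H1 has every edge of size >= 2
# on n vertices, while the smaller hypergraph keeps only sizes 2 and 3.
from itertools import combinations

def edges(n, sizes):
    vertices = range(1, n + 1)
    return [frozenset(e) for s in sizes for e in combinations(vertices, s)]
```

For n = 5 this gives 2^5 − 5 − 1 = 26 edges for the complete hypergraph without singletons, and C(5, 2) + C(5, 3) = 20 edges of size 2 or 3.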

Lemma 31 Let H1 = (V,E1) where V = {1, . . . , n} and E1 = {e : e ⊆ V and |e| ≥ 2}, and

let H2 = (V,E2) where E2 = {e ⊆ V : 2 ≤ |e| ≤ 3}. If the ideal IH2 is generated in degree at

most d, then the ideal IH1 is generated in degree at most d.

Proof. Consider IH2 as an ideal in the bigger polynomial ring S := k[tei : ei ∈ H1], and denote this extension by ĨH2 := IH2S. Assume that IH2, and consequently ĨH2, is generated in degree at most d. Pick an arbitrary binomial

u − v = tei1 tei2 · · · tein − tej1 tej2 · · · tejm ∈ IH1.


Since every edge e ∈ H1 is the disjoint union of a collection of edges ek1, . . . , ekl ∈ H2, we may write te − ∏_{i=1}^{l} teki ∈ IH1. Noting that

te − ∏_{i=1}^{l} teki = (te − tek1 t∪_{i=2}^{l} eki) + ∑_{j=1}^{l−2} (∏_{i=1}^{j} teki)(t∪_{i=j+1}^{l} eki − tekj+1 t∪_{i=j+2}^{l} eki),

one easily sees that the binomial te − ∏_{i=1}^{l} teki is generated by quadratics. In turn, this essentially shows that such relations allow us to rewrite u − v in terms of edges of size 2 and 3 only, that is, edges in E2. The claim follows since u − v can then be expressed as a binomial in the extension of IH2 to S.
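The telescoping identity above can be sanity-checked numerically: the variables te, the teki, and the union-edge variables are independent, except that the last union variable is tekl itself, so it suffices to compare both sides at random integer points. A quick sketch (our own check, not from the thesis):

```python
# Numerical sanity check (ours, not the thesis's) of the telescoping
# identity: evaluate both sides at random integer points, treating t_e,
# the t_{e_{k_i}}, and the union-edge variables as independent values,
# except that the last union variable equals t_{e_{k_l}}.
import random

def identity_holds(l=3, trials=100):
    for _ in range(trials):
        te = random.randint(-9, 9)
        tk = [random.randint(-9, 9) for _ in range(l)]      # t_{e_{k_1}}, ..., t_{e_{k_l}}
        tu = [random.randint(-9, 9) for _ in range(l - 1)]  # tu[j-1] ~ variable of the union from e_{k_{j+1}} on
        tu[l - 2] = tk[l - 1]            # the union of the single edge e_{k_l} is e_{k_l}
        prod = 1
        for x in tk:
            prod *= x
        lhs = te - prod
        rhs = te - tk[0] * tu[0]
        prefix = 1
        for j in range(1, l - 1):        # j = 1, ..., l-2
            prefix *= tk[j - 1]
            rhs += prefix * (tu[j - 1] - tk[j] * tu[j])
        if lhs != rhs:
            return False
    return True
```

The sum telescopes: each summand cancels the leftover of the previous one, leaving exactly te minus the full product.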

Theorem 32 Let H = (V,E) where V = {1, . . . , n} and E = {e ⊆ V : 2 ≤ |e| ≤ 3}. The

toric ideal of H is generated by quadrics and cubics.

In particular, the toric ideal of the image of Tan((P1)n) in higher cumulants is generated in degrees 2 and 3.

In the following proof, we examine the local combinatorics of H to illustrate how the struc-

ture of a hypergraph reveals insights into the generating set of IH .

Proof. Let fE be a primitive binomial in IH with E a balanced edge set. Without loss

of generality, we will assume throughout the proof |Eblue| ≥ |Ered|. If E contains only 2-edges or

only 3-edges, then by [56, Theorem 14.1] fE is a linear combination of quadratics. So we will

assume E contains a 2-edge and a 3-edge.

Since |Eblue| ≥ |Ered|, Eblue must contain at least as many 2-edges as Ered, and in order to

satisfy (3.1.1), the difference between the number of 3-edges in Ered and the number of 3-edges

in Eblue must be a multiple of 2.


Notice that for every pair e1, e2 of 3-edges (where e1 and e2 need not be distinct), there are three 2-edges e3, e4, e5 in H such that

{e1, e2} ⊔m {e3, e4, e5}

is a balanced edge set. Let B2,3 ⊂ IH be the set of all binomials arising from balanced edge sets of this form. Then fE is a linear combination of binomials in B2,3 and a binomial fE′, where E′blue and E′red contain the same number of 2-edges and exactly one 3-edge each.

Since it suffices to consider primitive binomials, we will proceed inductively by showing that

every primitive degree n binomial in

Bh := {fE ∈ IH : |Eblue| = |Ered| and Eblue, Ered contain exactly one 3-edge each}

is a linear combination of binomials in Bh with degree less than n.

Let fE ∈ Bh be primitive with deg fE = n > 3. Let e1 be the 3-edge in Ered. Since fE is primitive, e1 must intersect a 2-edge e2 in Eblue. Let e2 = {v1, v2} where v1 ∈ e1.

The edge e2 intersects at most one other edge of Ered besides e1. We will examine the possible intersections of e2 and Ered in order to find splitting sets of E that satisfy one of the conditions listed in Proposition 29. For illustrations of Case 1 and Case 3, see Figures 5 and 6. In all three cases, we will construct S, Γ1 and Γ2 such that S is a splitting set of E with an associated decomposition (Γ1, S, Γ2) which satisfies the properties of condition i) in Proposition 29. In fact, fE will be a linear combination of fΓ1 and fΓ2, both of which have strictly lower degree


than fE. Furthermore, since the blue and red parts of Γ1 and Γ2 will contain the same number of 2- and 3-edges, it follows that fΓ1, fΓ2 ∈ Bh.

Case 1: The edge e1 = e2 ∪ {v3} = {v1, v2, v3} for some v3 ∈ V (E).

Since v3 /∈ e2 and |Eblue| = |Ered|, there must be a 2-edge e3 ∈ Ered such that v3 /∈ e3 in order

for (3.1.1) to hold. Let e3 = {v4, v5} and e4 = {v3, v4, v5}. The sets S, Γ1 and Γ2 in this case are:

S = {e4}

Γ1 = (Eblue − {e2}) ⊔m ((Ered − {e1, e3}) ⊔ {e4})

Γ2 = {e2, e4} ⊔m {e1, e3}.

Figure 5. Case 1. Proof of Theorem 32.

Figure 6. Case 3. Proof of Theorem 32.


Case 2: The edge e1 = {v1, v3, v4} for some v3, v4 ∈ V (E) and there is a 2-edge e3 ∈ Ered

such that e3 = {v2, v3}.

Since v3 /∈ e2, degblue(v3; E) = degred(v3; E) ≤ n−1 and, thus, there exists a 2-edge e4 ∈ Ered

such that v3 /∈ e4. Let e4 = {v5, v6}.

Now let e5 = {v3, v4, v5} and e6 = {v3, v6}. The sets S, Γ1 and Γ2 in this case are:

S = {e5, e6}

Γ1 = (Eblue − {e2}) ⊔m ((Ered − {e1, e3, e4}) ⊔ {e5, e6})

Γ2 = {e2, e5, e6} ⊔m {e1, e3, e4}.

Case 3: There is a 2-edge e3 ∈ Ered such that v2 ∈ e3 and e2 ∩ e3 = ∅. In this case, let e4 = (e1 − {v1}) ∪ (e3 − {v2}). The sets S, Γ1 and Γ2 in this case are:

S = {e4}

Γ1 = (Eblue − {e2}) ⊔m ((Ered − {e1, e3}) ⊔ {e4})

Γ2 = {e2, e4} ⊔m {e1, e3}.


CHAPTER 4

PHYLOGENETIC MODELS AND TENSORS OF BOUNDED RANK

This chapter is based on work in [27] with Shmuel Friedland.

4.1 Introduction

Let Vr(m,n, l) ⊆ Cm⊗Cn⊗Cl be the variety of tensors of border rank at most r. The variety

we will explore in this chapter is V4(4, 4, 4), the variety associated to the 4-state general Markov

model on the 3-leaf claw tree denoted M0 (see Proposition 7). Motivated by understanding

the phylogenetic invariants for the modelM0, in 2007, Elizabeth Allman posed the problem of

determining the ideal I4(4, 4, 4) generated by all polynomials vanishing on V4(4, 4, 4); this has

been coined the salmon problem [2]. The salmon conjecture from [50, Conjecture 3.24] states

that I4(4, 4, 4) is generated by polynomials of degree 5 and 9.

A first nontrivial step in characterizing V4(4, 4, 4) is to characterize V4(3, 3, 4), the variety of all complex-valued 3 × 3 × 4 tensors of border rank at most 4. In [39], Landsberg and Manivel

show that V4(3, 3, 4) satisfies a set of polynomial equations of degree 6 which are not in the ideal

generated by the equations of degree 5 from the original conjecture. (See also [40, Remark 5.7]

and [6]). Hence the revised version of the salmon conjecture states that I4(4, 4, 4) is generated

by polynomials of degree 5, 6 and 9 [57, §2]. This, in particular, implies the set-theoretic version

of the salmon conjecture, which we will prove in the remainder of this chapter:


Theorem 33 The variety of tensors in C4 ⊗ C4 ⊗ C4 of border rank at most 4 is cut out by

polynomials of degree 5, 6, and 9.

If the ideal generated by the degree 5, 6, and 9 polynomials described in Section 4.4 is indeed

radical, then Theorem 33 gives a concrete foundation to proving the following conjecture about

the phylogenetic complexity of M0. This conjecture is implied by the salmon conjecture [57,

§2], [6].

Conjecture 34 The phylogenetic complexity of M0, the 4-state general Markov model on the

3-leaf claw tree, is at most 9.

In [26], Friedland shows that V4(4, 4, 4) is cut out by polynomials of degree 5, 9 and 16 by

showing V4(3, 3, 4) is cut out by polynomials of degrees 9 and 16. In [39], Landsberg and Manivel

give an algorithm to construct the polynomials of degree 6, referred to here as the LM-polynomials, that vanish on V4(3, 3, 4) but are not in the ideal generated by the known polynomials of degree 5. In [6], Bates and Oeding explicitly construct a basis of these degree 6 polynomials, which consists of ten linearly independent polynomials. Using methods from numerical algebraic

geometry, Bates and Oeding give numerical confirmation that V4(3, 3, 4) is the zero set of a set

of polynomials of degree 6 and 9 [6], where the degree 6 polynomials are the LM-polynomials.

The aim of this chapter is to show that V4(3, 3, 4) is cut out by polynomials of degree 6

and 9. This is done by showing that in Case A.I.3 of [26, Proof of Theorem 4.5] the use of


polynomials of degree 16 can be eliminated by use of the LM-polynomials. More precisely, we show that any 3 × 3 × 4 tensor X = [xi,j,k] ∈ C3×3×4 whose four frontal slices are of the form

Xk = [ x1,1,k  x1,2,k  0
       x2,1,k  x2,2,k  0
       0       0       x3,3,k ],   k = 1, 2, 3, 4,    (4.1.1)

has border rank at most four if and only if the ten basis LM-polynomials vanish on X.

As we will see later, a tensor X ∈ C3×3×4 of the form (4.1.1) has border rank at most four if and only if either the four matrices

[ x1,1,k  x1,2,k
  x2,1,k  x2,2,k ],   k = 1, 2, 3, 4,

are linearly dependent, or x3,3,k = 0 for k = 1, 2, 3, 4. Note that the condition that the above four 2 × 2 matrices are linearly dependent is equivalent to the vanishing of the polynomial

f(X) = det [ x1,1,1  x1,2,1  x2,1,1  x2,2,1
             x1,1,2  x1,2,2  x2,1,2  x2,2,2
             x1,1,3  x1,2,3  x2,1,3  x2,2,3
             x1,1,4  x1,2,4  x2,1,4  x2,2,4 ].    (4.1.2)
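The equivalence between the vanishing of f(X) and the linear dependence of the four 2 × 2 blocks is easy to check computationally; here is an illustrative sketch (helper names are ours, not from the thesis):

```python
# Illustrative check (helper names ours, not thesis code): f(X) in (4.1.2)
# is the determinant of the 4x4 matrix whose k-th row flattens the 2x2
# block of the k-th slice, so f(X) = 0 exactly when the four 2x2 blocks
# are linearly dependent.
def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def f_of_X(slices):
    """slices: four 2x2 matrices [[x11, x12], [x21, x22]], one per slice."""
    return det([[Y[0][0], Y[0][1], Y[1][0], Y[1][1]] for Y in slices])
```

For instance, if the fourth block is the sum of the first two, f vanishes; perturbing a single entry of it makes f nonzero.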

Computer-aided symbolic calculations show that the restrictions of the ten basis LM-

polynomials to X of the form (4.1.1) are the polynomials

x3,3,kx3,3,lf(X ) for 1 ≤ k ≤ l ≤ 4. (4.1.3)


Hence X has border rank at most four if and only if the ten basis LM-polynomials vanish on

X . Combining this with the results in [26] we deduce the set-theoretic version of the salmon

conjecture, Theorem 33.

We briefly summarize the content of this chapter. In Section 4.2 we restate the characterization of V4(3, 3, 4) given in [26, Theorem 4.5]. In Section 4.3 we show that the use of polynomials of degree 16 in the proof of [26, Theorem 4.5] can be replaced by the use of the LM-polynomials. In Section 4.4, we summarize the characterization of V4(4, 4, 4) as the zero set of polynomials of degree 5, 6 and 9.

4.2 A characterization of V4(3, 3, 4)

In order to understand the defining polynomials for V4(4, 4, 4), it is helpful to understand the

defining polynomials of V4(3, 3, 4). This is because the set of defining equations for V4(4, 4, 4)

inherits the equations of V4(3, 3, 4). The inheritance process is explained explicitly in Condition

2 of [26, Theorem 5.1] and also more generally in [39, Proposition 4.4]. Thus, in this section,

we focus on V4(3, 3, 4) and show that we can replace the degree 16 equations in [26] with degree

6 equations.

Let X = [xi,j,k] ∈ V4(3, 3, 4) ⊆ C3×3×4 be a 3 × 3 × 4 tensor of border rank at most 4. The four frontal slices of X are denoted by the matrices Xk := Xk,3 = [xi,j,k]3i,j=1 ∈ C3×3, k = 1, 2, 3, 4. Notice that since we are only working with frontal slices, we drop the second index on Xk,3, which indicates the type of the slice.

The following lemma is a specialization of [26, Lemma 4.2].


Lemma 35 [26, Lemma 4.2] Let X = [xi,j,k] ∈ C3×3×4 have border rank at most 4. Then there exist L, R ∈ C3×3 such that

LXk, XkR ∈ S(3,C), k = 1, . . . , 4,    (4.2.1)

LR⊤ = R⊤L = (tr(LR⊤)/3) I3,    (4.2.2)

where S(3,C) is the set of all complex-valued 3 × 3 symmetric matrices.

The following is an immediate corollary to Lemma 35.

Corollary 36 Let X ∈ V4(3, 3, 4). Then there exist nontrivial matrices L, R ∈ C3×3 \ {0} satisfying the conditions

LXk − X⊤k L⊤ = 0, k = 1, . . . , 4, L ∈ C3×3,    (4.2.3)

XkR − R⊤X⊤k = 0, k = 1, . . . , 4, R ∈ C3×3.    (4.2.4)

Proof. The equations (4.2.3) and (4.2.4) are a restatement of the conditions imposed by (4.2.1). □

The equations (4.2.3) and (4.2.4) are referred to as the symmetrization conditions. It is known that for any square matrix A, the matrix A − A⊤ is skew-symmetric. Thus (4.2.3) and (4.2.4) each result in 12 linear homogeneous equations in the entries of L and R, respectively. Let CL(X), CR(X) ∈ C12×9


be the coefficient matrices of these respective systems. The entries of CL(X ), CR(X ) are linear

functions in the entries of X .

For a generic X ∈ V4(3, 3, 4), rank CL(X) = rank CR(X) = 8, hence we can express the entries of L and R in terms of corresponding 8 × 8 minors of CL(X) and CR(X), respectively (for details, see [26]). The following characterization of V4(3, 3, 4) from [26, Theorem 4.5] describes

defining polynomials of degree 9 and degree 16 in terms of the coefficient matrices CL(X ), CR(X )

and their minors.
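The coefficient matrix CL(X) can be assembled directly from the definition; the sketch below (hypothetical helper names, not thesis code) builds the 12 × 9 system for the symmetrization condition (4.2.3) and checks, over exact rational arithmetic, that a tensor written as a sum of four rank one tensors yields rank CL(X) ≤ 8, reflecting condition 1 of the characterization below:

```python
# Sketch (ours, not thesis code): build the 12 x 9 coefficient matrix of
# the linear system (L X_k - X_k^T L^T) = 0 in the nine entries of L, and
# compute its rank over exact rationals.
import random
from fractions import Fraction

def coefficient_matrix_L(slices):
    """Each slice and index pair i < j gives one equation; unknown L[a][t]
    sits at column 3*a + t."""
    rows = []
    for X in slices:                      # four 3x3 frontal slices
        for i, j in [(0, 1), (0, 2), (1, 2)]:
            row = [0] * 9
            for t in range(3):
                row[3 * i + t] += X[t][j]   # coefficient of L[i][t]
                row[3 * j + t] -= X[t][i]   # coefficient of L[j][t]
            rows.append(row)
    return rows

def rank(mat):
    """Gauss-Jordan elimination over Fractions; count pivots."""
    m = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def random_rank4_slices():
    """Frontal slices of a random sum of four rank one 3x3x4 tensors,
    so the resulting tensor has border rank at most 4."""
    a = [[random.randint(-5, 5) for _ in range(3)] for _ in range(4)]
    b = [[random.randint(-5, 5) for _ in range(3)] for _ in range(4)]
    c = [[random.randint(-5, 5) for _ in range(4)] for _ in range(4)]
    return [[[sum(a[r][i] * b[r][j] * c[r][k] for r in range(4))
              for j in range(3)] for i in range(3)] for k in range(4)]
```

Since such a tensor admits a nonzero L solving the symmetrization conditions, the 12 × 9 system has a nontrivial kernel and its rank is at most 8.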

Theorem 4.2.1 [26, Theorem 4.5] The tensor X = [xi,j,k] ∈ C3×3×4 has border rank at most 4 if and only if the following conditions hold.

1. Let Xk := [xi,j,k]3i,j=1 ∈ C3×3, k = 1, . . . , 4, be the four frontal slices of X. Then the ranks of CL(X) and CR(X) are less than 9. This results in degree 9 polynomial equations.

2. Let L and R be solutions of (4.2.3) and (4.2.4), respectively, given by 8 × 8 minors of CL(X) and CR(X); then (4.2.2) holds. This condition results in degree 16 polynomial equations.

The proof of Theorem 4.2.1 in [26] consists of discussing a number of cases. Condition 2 from Theorem 4.2.1 (the degree 16 polynomials) is used only in case A.I.3. In the next section we show how to prove the theorem in case A.I.3 using only the ten basis LM-polynomials of degree 6, thus replacing Condition 2 with the LM-polynomials.

4.3 Proving case A.I.3 using degree 6 polynomials

Suppose X ∈ C3×3×4 and there exist two nonzero matrices L,R ∈ C3×3 such that (4.2.3)–

(4.2.4) hold. The case A.I.3 assumes that L and R are rank one matrices and resolves the case


where LR⊤ = R⊤L = 0 without use of the degree 16 polynomials. Therefore, to eliminate the use of the degree 16 polynomials we need to show the following.

Claim 4.3.1 Let X ∈ C3×3×4. Let L, R ∈ C3×3 be rank one matrices satisfying the conditions (4.2.3)–(4.2.4), respectively. Suppose furthermore that either LR⊤ ≠ 0 or R⊤L ≠ 0. If the ten LM-polynomials vanish on X then X ∈ V4(3, 3, 4).

In the rest of this section we prove Claim 4.3.1. Assume that L and R are rank one matrices. Then there exist u, v, x, y ∈ C3 such that L = uv⊤ and R = xy⊤.

Lemma 37 Let A ∈ Cn×n and u, v, x, y ∈ Cn. Then the following two statements hold:

uv⊤A is symmetric if and only if v⊤A = bu⊤ for some b ∈ C,    (4.3.1)

Axy⊤ is symmetric if and only if Ax = cy for some c ∈ C.    (4.3.2)

Proof. Both (4.3.1) and (4.3.2) are direct consequences of the fact that a matrix S ∈ Cn×n is a rank one symmetric matrix if and only if there exists a nonzero z ∈ Cn such that S = zz⊤. □

By changing bases in two copies of C3, we can assume that u = v = e3 = (0, 0, 1)⊤. (Changes of bases do not affect the vanishing condition of either LR⊤ or R⊤L [26].) Let P, Q ∈ GL(3,C) be such that

P⊤e3, Q⊤e3 ∈ span(e3).    (4.3.3)

Then if A ∈ C3×3 is such that e⊤3 A = be⊤3 for some b ∈ C and Ax = cy for some c ∈ C, by Lemma 37, e3e⊤3 (PAQ) is symmetric. Observe next that (PAQ)(Q−1x)(Py)⊤ is also symmetric.


Thus we need to analyze what kind of vectors can be obtained from two nonzero vectors x, y by applying Q−1x, Py, where P, Q satisfy (4.3.3). Notice that P and Q have the zero pattern

[ ∗ ∗ ∗
  ∗ ∗ ∗
  0 0 ∗ ].    (4.3.4)

Let Q1 := Q−1. By considering the adjoint matrix of Q, we see that Q1 has the zero pattern in (4.3.4) and satisfies the same conditions as Q and P in (4.3.3).

Lemma 38 Let y ∈ C3 \ {0}. If e⊤3 y ≠ 0 then there exists P ∈ GL(3,C) of the form (4.3.4) such that Py = e3. If e⊤3 y = 0 then there exists P ∈ GL(3,C) of the form (4.3.4) such that Py = e2.

Proof. Assume first that e⊤3 y ≠ 0. Let f = (f1, 0, f3)⊤, g = (0, g2, g3)⊤ ∈ C3 \ {0} be such that f⊤y = g⊤y = 0. Then f1g2 ≠ 0. Hence there exists P ∈ GL(3,C) of the form (4.3.4), whose first and second rows are f⊤ and g⊤ respectively, such that Py = e3.

Suppose now that e⊤3 y = 0. Then there exists P = P1 ⊕ [1], P1 ∈ GL(2,C), such that Py = e2. Any such P is an element of GL(3,C). □

Corollary 4.3.2 Let A ∈ C3×3 and assume that LA and AR are symmetric matrices for some rank one matrices L, R ∈ C3×3. Then there exist P, Q ∈ GL(3,C) such that, by replacing


A, L, R by A1 := PAQ, L1 := Q⊤LP−1, R1 := Q−1RP⊤, we can assume that L1 = e3e⊤3 and R1 has one of the following four forms:

e3e⊤3, e3e⊤2, e2e⊤3, e2e⊤2.    (4.3.5)

As a result of Corollary 4.3.2, to prove Claim 4.3.1 we need to consider only the first three choices of R1 in (4.3.5), since the last choice implies LR⊤ = R⊤L = 0. Furthermore, note that by interchanging the first two indices in X ∈ C3×3×4 we need to consider only the first two choices of R1 in (4.3.5).

4.3.1 The case L = R = e3e⊤3

In the remainder of this section we say that a tensor T ∈ Cm×n×l is essentially a tensor T′ = [t′i,j,k] ∈ Cm′×n′×l′ if, after a change of bases in Cm, Cn, Cl, the tensor T is represented by a tensor [ti,j,k] ∈ Cm×n×l such that the following conditions hold: first, t′i,j,k = ti,j,k for i = 1, . . . , m′, j = 1, . . . , n′, k = 1, . . . , l′; second, ti,j,k = 0 if ti,j,k is not a coordinate of T′. Clearly, rank T = rank T′ and brank T = brank T′.

Let X1, X2, X3, X4 ∈ C3×3 be the four frontal sections of X = [xi,j,k] ∈ C3×3×4. Assume

that (4.2.3)–(4.2.4) hold. Then, since L = R = e3e⊤3, each Xk has the form of (4.1.1). This is

the case discussed in [26, (4.7)].

Using Mathematica, we took the ten basis LM-polynomials available in the ancillary material of [6, deg 6 salmon.txt] and set x1,3,k = 0, x2,3,k = 0, x3,1,k = 0, x3,2,k = 0 for k = 1, 2, 3, 4. The resulting polynomials had 24 terms. We then factored f(X) from these restricted polynomials.


This symbolic computation shows that the restrictions of the ten basis LM-polynomials to X satisfying (4.2.3)–(4.2.4) are the polynomials given in (4.1.3). Therefore, by the result of Landsberg and Manivel [39], if X ∈ V4(3, 3, 4) then all polynomials in (4.1.3) vanish on X.

Conversely, suppose that all polynomials in (4.1.3) vanish on X. Let

Yk = [ x1,1,k  x1,2,k
       x2,1,k  x2,2,k ],   k = 1, 2, 3, 4,    (4.3.6)

be the projections of the four frontal sections of X given by (4.1.1) to C2×2. Then f(X) = 0 if and only if Y1, Y2, Y3, Y4 are linearly dependent. Decompose the tensor X into a sum Y + Z, where the four frontal sections of Y are the block diagonal matrices diag(Yk, 0), k = 1, 2, 3, 4, and the four frontal sections of Z are diag(0, 0, x3,3,k), k = 1, 2, 3, 4.

Assume first that the polynomial f(X) given by (4.1.2) vanishes at X. Since Y1, Y2, Y3, Y4 are linearly dependent, it follows that the tensor Y can be viewed as a 2 × 2 × 3 tensor.

A particular case of [12, Theorem 3.1] tells us that

brank T ≤ min(n, 2m) for any T ∈ C2×m×n, where 2 ≤ m ≤ n.    (4.3.7)

Hence the border rank of Y is at most 3. In fact, we can say more: it is straightforward to show that any three-dimensional subspace of C2×2 is spanned by 3 rank one matrices, so Theorem 11 implies that rank Y ≤ 3. Now, clearly rank Z ≤ 1, therefore brank X ≤ 4 (more precisely, rank X ≤ 4).


Assume now that f(X) ≠ 0. Since the ten polynomials in (4.1.3) vanish on X, it follows that x_{3,3,k} = 0 for k = 1, 2, 3, 4. So Z = 0. In this case X is essentially a 2 × 2 × 4 tensor. Hence, by (4.3.7), its border rank is at most 4.
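The case analysis above hinges on a linear-algebra check: f(X) = 0 exactly when the projected slices Y_1, …, Y_4 are linearly dependent, and dependence of four 2 × 2 matrices is the vanishing of a 4 × 4 determinant of their vectorizations. A minimal sketch of that check (illustrative only; not the thesis's symbolic computation):

```python
def minor(m, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    # determinant by Laplace expansion along the first row
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def dependent(slices):
    # four 2x2 matrices are linearly dependent iff the 4x4 matrix of
    # their vectorizations is singular
    vecs = [[s[0][0], s[0][1], s[1][0], s[1][1]] for s in slices]
    return det(vecs) == 0

Y1, Y2, Y3 = [[1, 0], [0, 0]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]
Y4_dep = [[Y1[i][j] + 2 * Y2[i][j] - Y3[i][j] for j in range(2)] for i in range(2)]
Y4_indep = [[0, 0], [0, 1]]

print(dependent([Y1, Y2, Y3, Y4_dep]))    # True: a forced linear relation
print(dependent([Y1, Y2, Y3, Y4_indep]))  # False: vectorizations form the identity
```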

4.3.2 The case L = e_3e_3^T, R = e_3e_2^T

Let X_1, X_2, X_3, X_4 ∈ C^{3×3} be the four frontal sections of X = [x_{i,j,k}] ∈ C^{3×3×4}. Assume that (4.2.3)–(4.2.4) hold. This means that our tensor X has the following zero entries: x_{1,3,k} = x_{3,1,k} = x_{3,2,k} = x_{3,3,k} = 0 for k = 1, 2, 3, 4. So our tensor is essentially a 2 × 3 × 4 tensor and hence, by (4.3.7), its border rank is at most 4.

4.4 The defining polynomials of V4(4, 4, 4)

In this section we state for the reader's convenience the defining equations of V4(4, 4, 4). We briefly repeat the arguments in [26], replacing the degree 16 polynomial equations with the degree 6 polynomial equations. Let X = [x_{i_1,i_2,i_3}] ∈ C^{4×4×4}. For each l ∈ {1, 2, 3} we fix i_l while we let i_p, i_q = 1, 2, 3, 4, where {p, q} = {1, 2, 3} \ {l}. In this way we obtain four l-sections X_{1,l}, …, X_{4,l} ∈ C^{4×4}. (Note that X_{k,3} = [x_{i,j,k}]_{i,j=1}^{4}, k = 1, 2, 3, 4, are the four frontal sections of X.) Denote by X_l = span(X_{1,l}, …, X_{4,l}) ⊂ C^{4×4} the l-section subspace corresponding to X.

For each l ∈ {1, 2, 3} we define the following linear subspaces of polynomials of degrees 5, 6, 9

respectively in the entries of X . The defining polynomials could be any basis in each of these

linear subspaces.

We first describe the Strassen commutative conditions [55]. These conditions were rediscovered independently in [1]. Take U_1, U_2, U_3 ∈ X_l and view U_i = \sum_{j=1}^{4} u_{j,i} X_{j,l} for i = 1, 2, 3. So


the entries of each Xj,l are fixed scalars and uj,i, i = 1, 2, 3, j = 1, 2, 3, 4 are viewed as variables.

Let adj U_2 be the adjugate (classical adjoint) matrix of U_2. Then the Strassen commutative conditions are
\[ U_1(\operatorname{adj} U_2)\, U_3 - U_3(\operatorname{adj} U_2)\, U_1 = 0. \]

Since the values of uj,i, i = 1, 2, 3, j = 1, 2, 3, 4 are arbitrary, we regroup the above condition

for each entry as a polynomial in uj,i. The coefficient of each monomial in the uj,i variables

is a polynomial of degree 5 in the entries of X and must be equal to zero. The set of all such

polynomials of degree 5 span a linear subspace, and we can choose any basis in this subspace.
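These degree 5 conditions can be sanity-checked numerically: for a tensor that is an exact sum of four rank-one terms, any combinations U_1, U_2, U_3 of its slices satisfy the commutation relation identically. A small exact-arithmetic sketch (the particular integer tensor below is made up for illustration):

```python
def minor(m, i, j):
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))

def adj(m):
    # adjugate: transpose of the cofactor matrix
    n = len(m)
    return [[(-1) ** (i + j) * det(minor(m, j, i)) for j in range(n)] for i in range(n)]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# rank-4 tensor: slices X_k = sum_r c[r][k] * a_r b_r^T (integer data, exact)
A = [[1, 2, 0, 1], [0, 1, 1, 2], [2, 0, 1, 1], [1, 1, 2, 0]]
B = [[2, 1, 0, 1], [1, 0, 2, 1], [0, 2, 1, 1], [1, 1, 1, 2]]
c = [[1, 0, 2, 1], [0, 1, 1, 2], [2, 1, 0, 1], [1, 2, 1, 0]]
X = [[[sum(A[i][r] * c[r][k] * B[j][r] for r in range(4)) for j in range(4)]
      for i in range(4)] for k in range(4)]

u = [[1, 2, 0], [0, 1, 1], [1, 0, 2], [2, 1, 1]]   # coefficients of U_1, U_2, U_3
U = [[[sum(u[k][t] * X[k][i][j] for k in range(4)) for j in range(4)]
      for i in range(4)] for t in range(3)]

lhs = mul(mul(U[0], adj(U[1])), U[2])
rhs = mul(mul(U[2], adj(U[1])), U[0])
print(lhs == rhs)  # True: the commutation condition holds exactly for rank <= 4
```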

The degree 6 and degree 9 polynomial conditions are obtained in a slightly different way. Let P = [p_{ij}], Q = [q_{ij}] ∈ C^{4×4} be matrices with entries viewed as variables. View PX_{k,l}Q, k = 1, 2, 3, 4, as the four frontal slices of the 4 × 4 × 4 tensor X(P, Q, l) = [x_{i,j,k}(P, Q, l)]_{i,j,k=1}^{4}. Let Y = [x_{i,j,k}(P, Q, l)]_{i,j,k=1}^{3,3,4}. Now Y must satisfy the degree 6 polynomial conditions of

Landsberg-Manivel and the degree 9 symmetrization conditions. Since the entries of P,Q are

variables, this means that the coefficients of the monomials in the variables pij , qij , i, j = 1, 2, 3, 4

must vanish identically. This procedure gives rise to 10 polynomial conditions of degree 6 [39],

which are linearly independent, and 440 polynomial conditions of degree 9 [26], which may

be linearly dependent. Using appropriate software one may reduce the number of linearly

independent conditions of degree 9.

The zero set of the above polynomials of degrees 5, 6 and 9 defines V4(4, 4, 4). Thus, Theorem

33 holds.


CHAPTER 5

MAXIMUM LIKELIHOOD DEGREE OF VARIANCE COMPONENT MODELS

This chapter is based on work from [29] with Mathias Drton and Sonja Petrović.

5.1 Introduction

Linear models with fixed and random effects are widely used for dependent observations.

Such mixed models are typically fit using likelihood-based techniques, and the necessary optimization problems can be solved using the numerical methods implemented in various statistical

software packages, as discussed, for instance, in [23]. Such software typically takes into account

that the variance parameters are nonnegative. However, general-purpose optimization procedures do not give any guarantees that a global optimum is found (see Section 1.8 in [36]). It can

thus be appealing to compute maximum likelihood (ML) estimates algebraically. Linear mixed

models have rational likelihood equations, which can be solved using either symbolic methods

as in [22, Chap. 2] or with numerical solvers such as PHCPack [64]. While solving likelihood

equations algebraically may not be feasible in large models with several random factors, modern

computational algebra does allow one to fully understand the likelihood surface in practically

relevant settings.


The main contribution of this chapter is a study of the algebraic complexity of ML estimation

in the unbalanced one-way layout with random effects. This model concerns a collection of

grouped observations

\[ Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, \ldots, q, \; j = 1, \ldots, n_i. \tag{5.1.1} \]

The overall mean µ ∈ R is a fixed (‘non-random’) but unknown parameter. The random effects

αi and the error terms εij are mutually independent normal random variables. More precisely,

αi ∼ N (0, τ) and εij ∼ N (0, ω), where τ and ω denote the common variances of the random

effects and the error terms, respectively. The distribution of observation Yij is N (µ, τ +ω), and

two observations Yij and Yik from the ith group are dependent with covariance τ . A detailed

discussion and examples of applications of this specific model can be found in Chapter 3 of [54]

and in Chapter 11 of [53].

The covariance matrix of the joint multivariate normal distribution for all Yij defined by

(5.1.1) is the product of the scalar ω and a matrix that is a function of the variance ratio

θ = τ/ω. Therefore, when θ is known, the likelihood equations for µ and ω are of the type

encountered in generalized least squares calculations, with a unique solution that is a rational

function of the data and the known value of θ. We may thus eliminate µ and ω from the

likelihood equations, which then reduce to a single univariate equation. Before turning to a

first example, we remark that we always tacitly assume suitable sample size conditions to be

satisfied such that ML estimates exist. In particular, we assume there to be q ≥ 2 groups with


at least one group of size n_i ≥ 2. A definitive answer to the existence problem in linear mixed models is given in [17], where restricted maximum likelihood (REML) estimation is also treated; see [43] for an introduction to this technique.

Example 5.1.1 Textbook data from [15, §6.4] give the yield of dyestuff from 5 different

preparations from each of q = 6 different batches of an intermediate product; the data are also

available in the R package lme4. The layout is balanced, that is, all batch sizes are equal, here

ni = 5. In this case, the likelihood equations are well-known to be equivalent to a linear equation

system and have ML degree one. The REML degree is also one.

A different picture emerges in the unbalanced case, when the batch sizes are not all equal.

For illustration, we remove the first, second and sixth observation from the data. The first batch

then only comprises n1 = 3 preparations, and the second batch only n2 = 4. The remaining

batches are unchanged with ni = 5 for i ≥ 3. In this unbalanced case, the solutions of the

likelihood equations correspond to the solutions of the polynomial equation

\[ -245488320000\,\theta^7 - 277109078400\,\theta^6 - 58814614680\,\theta^5 + 54052612853\,\theta^4 + 37792395524\,\theta^3 + 10086075110\,\theta^2 + 1279832076\,\theta + 64175517 = 0. \tag{5.1.2} \]

Thus, the ML degree is 7.

Numerical optimization using the R package lme4 yields a local maximum of the likelihood

function that corresponds to θ ≈ 0.5585. We may check whether this local maximum is unique,


or at least a global optimum, by finding all roots of the above univariate polynomial. This is a

task that can be done reliably in computer algebra systems.
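Such a root-finding check is easy to script. A minimal sketch in plain Python (bisection; this is not the thesis's computation, just an illustration that the reported optimum is a real root of (5.1.2)):

```python
# coefficients of (5.1.2), constant term first
coef = [64175517, 1279832076, 10086075110, 37792395524,
        54052612853, -58814614680, -277109078400, -245488320000]

def p(t):
    return sum(c * t ** i for i, c in enumerate(coef))

# p changes sign on (0.5, 0.6), so there is a real root there; bisect it
lo, hi = 0.5, 0.6
assert p(lo) > 0 > p(hi)
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if p(mid) > 0 else (lo, mid)

print(round((lo + hi) / 2, 4))  # close to the lme4 optimum 0.5585
```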

It is natural to ask for the ML degree of the one-way layout as a function of the number of

groups q and the group sizes n1, . . . , nq. Our main result answers this question. Theorem 39

gives formulas for both the ML and the REML degree of one-way layouts and offers a direct

comparison of the algebraic complexity of the two approaches. Its proof is given in later sections.

As explained in the paragraph before Example 5.1.1, we may reparametrize the model using the

ratio θ = τ/ω and eliminate the two parameters µ and ω from the likelihood equations. This

gives a single rational equation in θ. By carefully clearing terms from the numerator and the

denominator appearing in the rational equation, our proof produces a polynomial in θ whose

roots correspond to the solutions of the rational equation. The degree of this polynomial is the

ML/REML degree; recall Example 5.1.1.

Our theorem is stated using a notion of multiplicities. Suppose v = (v1, . . . , vq) ∈ Zq is

a tuple of integers. If v has M distinct entries, then the multiplicities of v form the integer

multiset {m1, . . . ,mM}, where mj counts how often the jth distinct entry of v appears among

all entries of v.

Theorem 39 Consider a one-way layout with random effects for q groups that are of sizes n_1, …, n_q. Suppose M of the group sizes are distinct, with associated multiplicities m_1, …, m_M. Let M_2 = #{j : m_j ≥ 2}. Then the ML degree is 3M + M_2 − 3, and the REML degree is 2M + 2M_2 − 3. The ML degree exceeds the REML degree unless M_2 = M, in which case equality holds.


The condition M2 = M holds if each group size appears at least twice. In the balanced

case, we have M = M2 = 1 and the theorem recovers the well-known fact that both degrees are

one; compare [32; 54; 52]. Each degree is maximal when the group sizes n1, . . . , nq are pairwise

distinct. The degrees are then 3q − 3 for ML and 2q − 3 for REML.

Example 5.1.2 The model for the dyestuff data from Example 5.1.1 has q = 6 groups. The

unbalanced case we considered had group sizes (n1, . . . , n6) = (3, 4, 5, 5, 5, 5). The multiplicities

are {1, 1, 4}. Our formula confirms the ML degree to be 3 · 3 + 1− 3 = 7. As another example,

if (n1, . . . , n6) = (4, 4, 3, 2, 2, 2), then the ML degree is 8 and the REML degree is 7.
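The degree formulas of Theorem 39 are purely combinatorial in the group sizes, so they are easy to tabulate. A small helper (hypothetical code, not from the thesis):

```python
from collections import Counter

def ml_reml_degrees(group_sizes):
    # M = number of distinct group sizes, M2 = number of multiplicities >= 2
    mult = Counter(group_sizes)
    M = len(mult)
    M2 = sum(1 for m in mult.values() if m >= 2)
    return 3 * M + M2 - 3, 2 * M + 2 * M2 - 3

print(ml_reml_degrees((3, 4, 5, 5, 5, 5)))   # unbalanced dyestuff layout -> (7, 5)
print(ml_reml_degrees((4, 4, 3, 2, 2, 2)))   # -> (8, 7)
print(ml_reml_degrees((5, 5, 5, 5, 5, 5)))   # balanced case -> (1, 1)
```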

The remainder of the chapter is structured as follows. In Section 5.2, we review the derivation of the likelihood equations for ML and REML estimation. Section 5.3 contains the proof

of the ML degree formula from Theorem 39, and Section 5.4 treats the REML degree. Each

proof consists of a detailed study of a univariate rational equation in the variance ratio θ. We

end this chapter with Section 5.5, which gives two examples of unbalanced one-way random

effects models with bimodal likelihood functions.

5.2 The likelihood equations

Let n_1, …, n_M be the distinct group sizes with associated multiplicities m_1, …, m_M. Let Y_{ij} = (Y_{ij1}, …, Y_{ijn_i}) be the vector comprising the observations in the jth group of size n_i. Then the model for the one-way layout given by (5.1.1) can equivalently be described as stating that Y_{11}, …, Y_{1m_1}, Y_{21}, …, Y_{Mm_M} are independent multivariate normal random vectors with
\[ Y_{ij} \sim \mathcal{N}\big(\mu 1_{n_i}, \, \Sigma_{n_i}(\omega, \tau)\big), \]


where the covariance matrix is
\[ \Sigma_{n_i}(\omega, \tau) = \omega I_{n_i} + \tau\, 1_{n_i} 1_{n_i}^T. \]
Here, 1_n = (1, …, 1)^T ∈ R^n, and I_n is the n × n identity matrix.

5.2.1 Maximum likelihood

Ignoring additive constants and multiplying by two, the log-likelihood function of the one-

way model is

\[ \ell(\mu, \omega, \tau) = \sum_{i=1}^{M} \sum_{j=1}^{m_i} \Big[ \log\det\big(K_{n_i}(\omega,\tau)\big) - (Y_{ij} - \mu 1_{n_i})^T K_{n_i}(\omega,\tau) (Y_{ij} - \mu 1_{n_i}) \Big], \tag{5.2.1} \]

where

\[ K_{n_i}(\omega, \tau) = \frac{1}{\omega} I_{n_i} - \frac{\tau}{\omega(\omega + n_i\tau)}\, 1_{n_i} 1_{n_i}^T \tag{5.2.2} \]
is the inverse of \Sigma_{n_i}(\omega, \tau). The inverse has determinant
\[ \det\big(K_{n_i}(\omega, \tau)\big) = \frac{1}{\omega^{n_i-1}(\omega + n_i\tau)}. \tag{5.2.3} \]
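Formulas (5.2.2) and (5.2.3) are the Sherman–Morrison inverse and matrix-determinant identities for Σ_{n_i}; both are easy to confirm in exact rational arithmetic (an illustrative sketch, not part of the thesis):

```python
from fractions import Fraction as F

n, w, t = 4, F(2), F(1, 2)          # group size n_i, omega, tau

Sigma = [[w * (i == j) + t for j in range(n)] for i in range(n)]
K = [[F(i == j) / w - t / (w * (w + n * t)) for j in range(n)] for i in range(n)]

# K Sigma should be the identity matrix, per (5.2.2)
KS = [[sum(K[i][k] * Sigma[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
print(KS == [[F(i == j) for j in range(n)] for i in range(n)])  # True

def det(m):
    if len(m) == 1:
        return m[0][0]
    sub = lambda j: [row[:j] + row[j + 1:] for row in m[1:]]
    return sum((-1) ** j * m[0][j] * det(sub(j)) for j in range(len(m)))

# det(K) = 1 / (omega^(n-1) (omega + n tau)), per (5.2.3)
print(det(K) == 1 / (w ** (n - 1) * (w + n * t)))  # True
```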

Let N = m1n1 + · · ·+mMnM be the total number of observations. For each i = 1, . . . ,M ,

define the group averages

\[ \bar{Y}_{ij} = \frac{1}{n_i} \sum_{k=1}^{n_i} Y_{ijk}, \qquad j = 1, \ldots, m_i, \]
and the average across the groups of equal size
\[ \bar{Y}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} \bar{Y}_{ij}. \]
From the averages, compute the between-group sum of squares
\[ B_i = \sum_{j=1}^{m_i} (\bar{Y}_{ij} - \bar{Y}_i)^2. \]
Note that, for generic data, B_j = 0 if and only if m_j = 1. Therefore, it suffices to consider the sums of squares B_i with m_i ≥ 2. Finally, define the within-group sum of squares
\[ W = \sum_{i=1}^{M} \sum_{j=1}^{m_i} \sum_{k=1}^{n_i} (Y_{ijk} - \bar{Y}_{ij})^2, \]
which is positive for generic data.
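For a concrete feel for these statistics, here is a tiny worked computation (made-up numbers): two groups of size 2 (so m_1 = 2) and one group of size 3 (m_2 = 1).

```python
# groups of equal size are collected together: sizes n = (2, 3), multiplicities m = (2, 1)
groups = {2: [[1.0, 3.0], [2.0, 6.0]], 3: [[3.0, 3.0, 3.0]]}

B, W = {}, 0.0
for n_i, reps in groups.items():
    means = [sum(y) / n_i for y in reps]              # group averages Ybar_ij
    grand = sum(means) / len(reps)                    # Ybar_i, average over equal-size groups
    B[n_i] = sum((m - grand) ** 2 for m in means)     # between-group sum of squares B_i
    W += sum((y_k - m) ** 2 for y, m in zip(reps, means) for y_k in y)

print(B)  # {2: 2.0, 3: 0.0} -- B_i = 0 for the multiplicity-one group
print(W)  # 10.0
```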

Proposition 40 Upon the substitution κ = 1/ω and θ = τ/ω, the log-likelihood function for

the one-way layout can be written as

\[ \ell(\mu, \kappa, \theta) = N\log(\kappa) - \kappa W - \left[\sum_{i=1}^{M} m_i \log(1+n_i\theta)\right] - \kappa\left[\sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i\right] - \kappa\left[\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\, (\bar{Y}_i - \mu)^2\right]. \tag{5.2.4} \]


Proof. Applying (5.2.2), the quadratic form in (5.2.1) can be expanded into
\[
\begin{aligned}
(Y_{ij} - \mu 1_{n_i})^T K_{n_i}(\omega,\tau)(Y_{ij} - \mu 1_{n_i})
&= \frac{1}{\omega}\sum_{k=1}^{n_i}(Y_{ijk}-\mu)^2 - \frac{\tau}{\omega(\omega+n_i\tau)}\Big[(Y_{ij}-\mu 1_{n_i})^T 1_{n_i}\Big]^2 \\
&= \frac{1}{\omega}\sum_{k=1}^{n_i}(Y_{ijk}-\bar{Y}_{ij})^2 + \frac{n_i}{\omega}(\bar{Y}_{ij}-\mu)^2 - \frac{\tau}{\omega(\omega+n_i\tau)}\, n_i^2\, (\bar{Y}_{ij}-\mu)^2 \\
&= \kappa\,\frac{n_i}{1+n_i\theta}(\bar{Y}_{ij}-\mu)^2 + \kappa\sum_{k=1}^{n_i}(Y_{ijk}-\bar{Y}_{ij})^2.
\end{aligned}
\]
Using this expression and (5.2.3), the log-likelihood function is seen to be equal to
\[ \ell(\mu, \kappa, \theta) = N\log(\kappa) - \kappa W - \left[\sum_{i=1}^{M} m_i \log(1+n_i\theta)\right] - \kappa \sum_{i=1}^{M} \sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}(\bar{Y}_{ij}-\mu)^2. \tag{5.2.5} \]
The claimed form of \ell(\mu, \kappa, \theta) is now obtained by expanding the last sum as
\[ \sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}(\bar{Y}_{ij}-\mu)^2 \tag{5.2.6} \]
\[ = \frac{n_i}{1+n_i\theta} \sum_{j=1}^{m_i} \Big[(\bar{Y}_{ij}-\bar{Y}_i)^2 + (\bar{Y}_i-\mu)^2 + 2(\bar{Y}_{ij}-\bar{Y}_i)(\bar{Y}_i-\mu)\Big] \tag{5.2.7} \]
\[ = \frac{m_i n_i}{1+n_i\theta}(\bar{Y}_i-\mu)^2 + \frac{n_i}{1+n_i\theta}\, B_i, \tag{5.2.8} \]
where the cross terms in (5.2.7) vanish because \sum_{j=1}^{m_i}(\bar{Y}_{ij}-\bar{Y}_i) = 0. □
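Proposition 40 can be double-checked numerically: evaluating (5.2.1) directly (using (5.2.2)–(5.2.3) for the quadratic form and the log-determinant) and evaluating (5.2.4) from the sufficient statistics must give the same number. A sketch with made-up data:

```python
from math import log

groups = {3: [[1.0, 2.0, 4.0], [0.0, 3.0, 3.0]], 5: [[2.0, 2.0, 5.0, 1.0, 0.0]]}
mu, w, t = 1.0, 2.0, 0.5                       # mu, omega, tau
kappa, theta = 1.0 / w, t / w

# left side: (5.2.1), with log det K from (5.2.3) and the quadratic form via (5.2.2)
direct = 0.0
for n, reps in groups.items():
    for y in reps:
        quad = sum((yk - mu) ** 2 for yk in y) / w \
             - t / (w * (w + n * t)) * sum(yk - mu for yk in y) ** 2
        direct += -((n - 1) * log(w) + log(w + n * t)) - quad

# right side: (5.2.4) built from N, W, B_i and the group means
N = sum(n * len(reps) for n, reps in groups.items())
stat, Wss = 0.0, 0.0
for n, reps in groups.items():
    m = len(reps)
    means = [sum(y) / n for y in reps]
    grand = sum(means) / m
    B = sum((mn - grand) ** 2 for mn in means)
    Wss += sum((yk - mn) ** 2 for y, mn in zip(reps, means) for yk in y)
    stat += -m * log(1 + n * theta) - kappa * n / (1 + n * theta) * B \
          - kappa * m * n / (1 + n * theta) * (grand - mu) ** 2
repar = N * log(kappa) - kappa * Wss + stat

print(abs(direct - repar) < 1e-9)  # True
```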


The partial derivatives of the log-likelihood function from Proposition 40 are

\[ \frac{\partial \ell}{\partial \mu} = 2\kappa \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu), \tag{5.2.9} \]
\[ \frac{\partial \ell}{\partial \kappa} = \frac{N}{\kappa} - \left[ W + \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu)^2 + \sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i \right], \tag{5.2.10} \]
\[ \frac{\partial \ell}{\partial \theta} = -\left[\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\right] + \kappa\left[\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\,(\bar{Y}_i - \mu)^2 + \sum_{i=1}^{M} \frac{n_i^2}{(1+n_i\theta)^2}\, B_i\right]. \tag{5.2.11} \]

Since N ≠ 0, the equation system obtained by setting the three partials to zero has the same solution set as the equation system
\[ \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu) = 0, \tag{5.2.12} \]
\[ N - \kappa\left[ W + \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu)^2 + \sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i \right] = 0, \tag{5.2.13} \]
\[ \kappa\left[\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\,(\bar{Y}_i - \mu)^2 + \sum_{i=1}^{M} \frac{n_i^2}{(1+n_i\theta)^2}\, B_i\right] - \left[\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\right] = 0. \tag{5.2.14} \]

Now we can solve equation (5.2.12) for µ, substitute the result into equation (5.2.13) and

solve for κ. Both µ and κ are then expressed in terms of θ. Substituting the expressions into

(5.2.14), we obtain a univariate rational equation in θ. Our proof of the ML degree formula in

Theorem 39 proceeds by cancelling terms from the numerator and denominator of this rational

expression. This is the topic of Section 5.3.


5.2.2 Restricted maximum likelihood

The REML method uses a slightly different likelihood function that is obtained by considering a projection of the observed random array (Y_{ijk}) ∈ R^N. The mean of this array has

all entries equal to µ. In other words, it is modelled to lie in the space L ⊂ RN spanned by

the array with all entries equal to one. The likelihood function used in REML is obtained by

taking the observation to be the projection of (Yijk) onto the orthogonal complement of L. The

distribution of the projection no longer depends on µ and so the REML function only has (τ, ω)

or, equivalently, (κ, θ) as arguments.

Using the formulas given, for instance, in [43], and simplifying the resulting expressions

similar to what was done in the proof of Proposition 40, we obtain the following expression for

the restricted log-likelihood function.

Proposition 41 Upon the substitution κ = 1/ω and θ = τ/ω, the restricted log-likelihood

function for the one-way layout can be written as

\[
\begin{aligned}
\bar{\ell}(\kappa, \theta) = {} & (N-1)\log(\kappa) - \kappa W - \left[\sum_{i=1}^{M} m_i \log(1+n_i\theta)\right] \\
& - \log\left(\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\right) - \kappa\left[\sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i\right] - \kappa\left[\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu(\theta))^2\right],
\end{aligned} \tag{5.2.15}
\]
with
\[ \mu(\theta) = \frac{\sum_{i=1}^{M}\sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}\,\bar{Y}_{ij}}{\sum_{i=1}^{M}\sum_{j=1}^{m_i} \frac{n_i}{1+n_i\theta}} = \frac{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,\bar{Y}_i}{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}}. \tag{5.2.16} \]


Note that µ(θ) is the solution to the equation in (5.2.12). Computing µ(θ) is the standard

way to obtain an estimate of µ from a REML estimate of θ.
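That µ(θ) solves (5.2.12), and that the two expressions in (5.2.16) agree, is a one-line numerical check (illustrative numbers only):

```python
n = [3, 5]; m = [2, 1]
means = {3: [2.0, 1.0], 5: [2.75]}                # group averages Ybar_ij
Ybar = [sum(v) / len(v) for v in (means[3], means[5])]
theta = 0.7

# first expression in (5.2.16): sum over all groups ij
num1 = sum(n[i] / (1 + n[i] * theta) * yij for i in range(2) for yij in means[n[i]])
den1 = sum(n[i] / (1 + n[i] * theta) for i in range(2) for _ in means[n[i]])
# second expression: collapsed over equal-size groups
num2 = sum(m[i] * n[i] / (1 + n[i] * theta) * Ybar[i] for i in range(2))
den2 = sum(m[i] * n[i] / (1 + n[i] * theta) for i in range(2))

mu1, mu2 = num1 / den1, num2 / den2
resid = sum(m[i] * n[i] / (1 + n[i] * theta) * (Ybar[i] - mu1) for i in range(2))
print(abs(mu1 - mu2) < 1e-12 and abs(resid) < 1e-12)  # True
```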

The partial derivatives of the restricted log-likelihood function from Proposition 41 are

\[ \frac{\partial \bar{\ell}}{\partial \kappa} = \frac{N-1}{\kappa} - \left[ W + \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu(\theta))^2 + \sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i \right], \tag{5.2.17} \]
\[ \frac{\partial \bar{\ell}}{\partial \theta} = -\left[\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\right] + \frac{\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}}{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}} + \kappa\left[\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\,(\bar{Y}_i - \mu(\theta))^2 + \sum_{i=1}^{M} \frac{n_i^2}{(1+n_i\theta)^2}\, B_i\right]. \tag{5.2.18} \]

The equation ∂\bar{\ell}/∂κ = 0 is easily solved. Substituting the unique solution κ(θ) into the equation ∂\bar{\ell}/∂θ = 0 yields again a univariate rational equation in θ. The proof of the REML degree

formula in Theorem 39 requires studying cancellations from the numerator and denominator of

this equation, which is the topic of Section 5.4.

5.3 Proof of formula for ML degree

Our proof of the ML degree formula in Theorem 39 proceeds in two steps. First, in Lemma 42

we derive a univariate rational equation whose number of zeros is the ML degree of the model.

Second, we simplify it in Lemmas 43 and 44 by clearing common factors from the numerator

and the denominator.


Fix the following notation, used throughout. For a vector a = (a1, . . . , aM ) ∈ RM , define

the rational functions

\[ r_a(\theta) = \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\, a_i \qquad\text{and}\qquad s_a(\theta) = \sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\, a_i. \]
We write r_1, r_{B/m}, r_Y, r_{Y^2} for the functions r_a that have
\[ a = 1_M, \quad a = \left(\frac{B_1}{m_1}, \ldots, \frac{B_M}{m_M}\right), \quad a = (\bar{Y}_1, \ldots, \bar{Y}_M), \quad a = (\bar{Y}_1^2, \ldots, \bar{Y}_M^2), \]
respectively. It is clear from Section 5.2 that forming a common denominator for the rational equations to be studied involves the product
\[ d(\theta) = \prod_{i=1}^{M} (1+n_i\theta) = d_1(\theta)\, d_2(\theta), \]
where
\[ d_1(\theta) = \prod_{\{i :\, m_i = 1\}} (1+n_i\theta), \qquad d_2(\theta) = \prod_{\{i :\, m_i \geq 2\}} (1+n_i\theta). \]
For a vector a ∈ R^M, define the degree M − 1 polynomial
\[ f_a(\theta) = d(\theta)\, r_a(\theta) = \sum_{i=1}^{M} m_i n_i a_i \prod_{j \neq i} (1+n_j\theta) \]
and the degree 2(M − 1) polynomial
\[ g_a(\theta) = d(\theta)^2\, s_a(\theta) = \sum_{i=1}^{M} m_i n_i^2 a_i \prod_{j \neq i} (1+n_j\theta)^2. \]

Lemma 42 The ML degree of the one-way layout is the degree of the numerator created when cancelling all common factors from numerator and denominator of the following rational function in θ:
\[ \frac{1}{N d(\theta)^2 f_1(\theta)^2} \times \Big( N\big[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\big] - f_1(\theta)^2\big[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\big] \Big). \tag{5.3.1} \]

Proof. Adopting the notation above, the solution of the first of the likelihood equations in (5.2.12) can be written as
\[ \mu(\theta) = \frac{r_Y(\theta)}{r_1(\theta)}. \tag{5.3.2} \]
Next, rewrite the following term from the system of the three critical equations:
\[
\begin{aligned}
\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu(\theta))^2
&= r_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)} \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,\bar{Y}_i + \frac{r_Y(\theta)^2}{r_1(\theta)^2} \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta} \\
&= r_{Y^2}(\theta) - \frac{r_Y(\theta)^2}{r_1(\theta)}.
\end{aligned} \tag{5.3.3}
\]


Solving the second equation in (5.2.13) with µ = µ(θ) for κ thus gives

\[ \kappa(\theta) = \frac{N}{W + r_{Y^2}(\theta) + r_{B/m}(\theta) - \dfrac{r_Y(\theta)^2}{r_1(\theta)}} \tag{5.3.4} \]
\[ = \frac{N r_1(\theta)}{W r_1(\theta) + r_{Y^2}(\theta) r_1(\theta) + r_1(\theta) r_{B/m}(\theta) - r_Y(\theta)^2}. \tag{5.3.5} \]
Substituting µ(θ) and κ(θ) into the third and last equation in (5.2.14), we obtain the univariate rational equation
\[ s_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)}\, s_Y(\theta) + \frac{r_Y(\theta)^2}{r_1(\theta)^2}\, s_1(\theta) + s_{B/m}(\theta) - \frac{r_1(\theta)}{\kappa(\theta)} = 0, \tag{5.3.6} \]
where we have divided by the non-zero rational expression κ(θ). According to (5.3.4), this is
\[ s_{Y^2}(\theta) - 2\,\frac{r_Y(\theta)}{r_1(\theta)}\, s_Y(\theta) + \frac{r_Y(\theta)^2}{r_1(\theta)^2}\, s_1(\theta) + s_{B/m}(\theta) - \frac{W r_1(\theta) + r_{Y^2}(\theta) r_1(\theta) + r_1(\theta) r_{B/m}(\theta) - r_Y(\theta)^2}{N} = 0. \tag{5.3.7} \]
Reexpressing (5.3.7) in terms of the f and g polynomials gives
\[ \frac{g_{Y^2}(\theta)}{d(\theta)^2} - 2\,\frac{f_Y(\theta)}{f_1(\theta)}\,\frac{g_Y(\theta)}{d(\theta)^2} + \frac{f_Y(\theta)^2}{f_1(\theta)^2}\,\frac{g_1(\theta)}{d(\theta)^2} + \frac{g_{B/m}(\theta)}{d(\theta)^2} - \frac{W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) + f_1(\theta) f_{B/m}(\theta) - f_Y(\theta)^2}{N d(\theta)^2} = 0. \tag{5.3.8} \]

Forming a common denominator we obtain the rational function from (5.3.1). The number of

complex solutions to the likelihood equations and the number of complex roots of (5.3.1) agree.


Thus, the ML degree of the one-way layout is the number of complex solutions of (5.3.1), or,

equivalently, the degree of the numerator in (5.3.1) after canceling common factors from the

numerator and denominator. □

The numerator given in (5.3.1) in Lemma 42 has degree 3(M−1)+M = 4M−3; the highest

degree term involves the within-group sum of squares W . The denominator in (5.3.1) has degree

2M + 2(M − 1) = 4M − 2. The next two lemmas imply that, after cancelling common factors,

the numerator of the univariate rational function from Lemma 42 has the degree claimed in the

ML degree formula from Theorem 39.

Lemma 43 If mt = 1, then (1 + ntθ) divides the numerator of the rational equation (5.3.1).

Hence, the polynomial d1(θ) of degree M −M2 divides this numerator.

Lemma 44 If d1(θ) is cleared from both the numerator and the denominator of the rational

function given in (5.3.1), then the new numerator and denominator are relatively prime for

generic sufficient statistics Y1, . . . , YM , W , and Bj with mj ≥ 2.

Proof. [Proof of Lemma 43] Let m_t = 1. To show that (1 + n_tθ) divides the numerator, it is sufficient to show that (1 + n_tθ) divides the sum of
\[ N\big[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\big] \tag{5.3.9} \]
and
\[ -f_1(\theta)^2 \big[f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\big]. \tag{5.3.10} \]


The product f_1(θ)² g_{Y^2}(θ) in the first term of (5.3.9) may be rewritten as
\[
\begin{aligned}
& \left[\sum_{i=1}^{M} m_i n_i \prod_{j \neq i} (1+n_j\theta)\right] \left[\sum_{k=1}^{M} m_k n_k \prod_{l \neq k} (1+n_l\theta)\right] \left[\sum_{r=1}^{M} m_r n_r^2 \bar{Y}_r^2 \prod_{s \neq r} (1+n_s\theta)^2\right] \\
&= \sum_{i=1}^{M} \sum_{k=1}^{M} \sum_{r=1}^{M} m_i m_k m_r n_i n_k n_r^2\, \bar{Y}_r^2 \prod_{j \neq i} (1+n_j\theta) \prod_{l \neq k} (1+n_l\theta) \prod_{s \neq r} (1+n_s\theta)^2.
\end{aligned}
\]
Combining this expression with the analogous expansions of the other three terms shows that the polynomial in (5.3.9) is equal to N times
\[ \sum_{i=1}^{M} \sum_{k=1}^{M} \sum_{r=1}^{M} \Big[ \big(m_r \bar{Y}_r^2 - 2 m_r \bar{Y}_i \bar{Y}_r + m_r \bar{Y}_i \bar{Y}_k + B_r\big)\, m_i m_k n_i n_k n_r^2 \prod_{j \neq i} (1+n_j\theta) \prod_{l \neq k} (1+n_l\theta) \prod_{s \neq r} (1+n_s\theta)^2 \Big]. \tag{5.3.11} \]

The polynomial in (5.3.10) can be expanded similarly. We find
\[ f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta) \tag{5.3.12} \]
\[ = \sum_{i=1}^{M} \sum_{k=1}^{M} \big(m_k \bar{Y}_i^2 - m_k \bar{Y}_i \bar{Y}_k + B_k\big)\, m_i n_i n_k \prod_{j \neq i} (1+n_j\theta) \prod_{l \neq k} (1+n_l\theta). \]
Expanding f_1(θ)² as well, we obtain that the polynomial in (5.3.10) is equal to
\[ -\sum_{i=1}^{M} \sum_{k=1}^{M} \sum_{r=1}^{M} \sum_{u=1}^{M} \Big[ \big(m_k \bar{Y}_i^2 - m_k \bar{Y}_i \bar{Y}_k + B_k\big)\, m_i m_r m_u n_i n_k n_r n_u \prod_{j \neq i} (1+n_j\theta) \prod_{l \neq k} (1+n_l\theta) \prod_{s \neq r} (1+n_s\theta) \prod_{v \neq u} (1+n_v\theta) \Big]. \tag{5.3.13} \]


Now notice that (1+ntθ) divides every summand in (5.3.11) and (5.3.13) unless i = k = r = t

in the first summation, or i = k = r = u = t in the second summation. So it suffices to

only consider these ‘diagonal’ terms. However, under the equality of indices, the quadratic

expressions in the averages Yi cancel. Hence, the terms missing a factor of (1 + ntθ) in (5.3.9)

and (5.3.10) sum to

\[ B_t\, m_t^2 n_t^4 (N - m_t) \prod_{j \neq t} (1+n_j\theta)^4. \tag{5.3.14} \]

Throughout the arguments, we assume that we have at least two groups with at least one group

size ni ≥ 2. Moreover, for generic data, Bt = 0 if and only if mt = 1. Hence, for generic data,

the expression in (5.3.14) is zero if and only if mt = 1. We conclude that d1(θ) divides the

numerator of the rational function in (5.3.1). □

Note that the last part of the above proof shows not only that d_1(θ) divides the numerator of (5.3.1), but also that (1 + n_tθ) does not divide the numerator when B_t ≠ 0, which holds generically if m_t ≥ 2.

Proof. [Proof of Lemma 44] Clearing d1(θ) from the denominator in (5.3.1) yields

the polynomial Nd2(θ)d(θ)f1(θ)2. From the preceding comment, we know that d2(θ) and the

numerator are relatively prime for generic data Y1, . . . , YM , W > 0, and Bj > 0 with mj ≥ 2.

To establish our claim, we will first show that f_1(θ) does not share a common factor with the numerator by showing that f_1(θ) and f_Y(θ)² g_1(θ) are relatively prime; all terms other than f_Y(θ)² g_1(θ) in the numerator of (5.3.1) are multiples of f_1(θ). Then, we will show that after clearing d_1(θ) in (5.3.1), d_1(θ) and the new numerator are relatively prime.

Let θ_1, …, θ_{M−1} be the (possibly complex) roots of the degree M − 1 polynomial f_1(θ). For each 1 ≤ k ≤ M − 1, consider the linear form f_Y(θ_k) in the polynomial ring C[Ȳ_1, …, Ȳ_M]. Let V(f_Y(θ_k)) ⊂ C^M be the zero locus of f_Y(θ_k). Each set V(f_Y(θ_k)) is a hyperplane of dimension M − 1. Thus, the union ∪_{k=1}^{M−1} V(f_Y(θ_k)) is an (M − 1)-dimensional algebraic subset of C^M. A generic vector of group means (Ȳ_1, …, Ȳ_M) lies outside this lower-dimensional set, which means that f_1(θ) and f_Y(θ) are relatively prime for generic data.

To show that f_1(θ) and g_1(θ) are relatively prime, assume θ_0 = a + ib is a root of f_1(θ) and g_1(θ). Since g_1(θ) is a sum of squares that is positive on R, we must have θ_0 ∉ R and hence b ≠ 0. Without loss of generality, let n_1 be the least of the group sizes n_i. Rewriting f_1(θ_0) = 0, we get
\[ n_1 = -\frac{\sum_{i=2}^{M} m_i n_i \prod_{j \neq i} (1+n_j\theta_0)}{m_1 \prod_{j \neq 1} (1+n_j\theta_0)} = -\sum_{i=2}^{M} \frac{m_i n_i (1+n_1\theta_0)}{m_1 (1+n_i\theta_0)}. \tag{5.3.15} \]
The imaginary part of the right side of this equation must equal 0 since n_1 is an integer. Substituting a + ib for θ_0, the imaginary part of (5.3.15) is
\[ b \sum_{i=2}^{M} \left(\frac{m_i n_i}{m_1}\right) \frac{n_i - n_1}{(1+n_i a)^2 + (n_i b)^2}. \]
Since each term in the sum is positive, we obtain that b = 0. Consequently, θ_0 ∈ R, which is a contradiction. Therefore, f_1(θ) and g_1(θ) are relatively prime.


It remains to show that the numerator and denominator obtained by clearing the factor d_1(θ) in (5.3.1) are relatively prime for generic data. We claim that if m_t = 1 then (1 + n_tθ) divides
\[ \frac{f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)}{d_1(\theta)}, \tag{5.3.16} \]
while d_1(θ) and
\[ W f_1(\theta) d_2(\theta) + \frac{f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)}{d_1(\theta)} =: W f_1(\theta) d_2(\theta) + F(\theta) \tag{5.3.17} \]
are relatively prime for generic data.

The ratio in (5.3.16) equals (5.3.11) divided by d_1(θ). We may rewrite (5.3.11) as
\[ \sum_{i=1}^{M} \sum_{k=1}^{M} \sum_{r=1}^{M} \Big[ \big(m_r \bar{Y}_r^2 - m_r \bar{Y}_i \bar{Y}_r - m_r \bar{Y}_k \bar{Y}_r + m_r \bar{Y}_i \bar{Y}_k + B_r\big)\, m_i m_k n_i n_k n_r^2 \prod_{j \neq i} (1+n_j\theta) \prod_{l \neq k} (1+n_l\theta) \prod_{s \neq r} (1+n_s\theta)^2 \Big]. \tag{5.3.18} \]
It is clear that the square (1 + n_tθ)² divides all terms in the sum (5.3.18) except those for r = i = t or r = k = t. However, the quadratic form in the averages Ȳ_i vanishes if r = i or r = k. Since the terms in question have r = t, and B_r = B_t = 0 because m_t = 1, we conclude that (1 + n_tθ)² divides the entire sum (5.3.18), which proves that d_1(θ) divides the ratio in (5.3.16).

We are left to show that d1(θ) and Wf1(θ)d2(θ) + F (θ) are relatively prime for generic

data. Let θ1, . . . , θM−M2 be the roots of d1(θ); each root is equal to −1/ni for some index i.


Since the n_i are distinct, no root of d_1(θ) is a root of d_2(θ). Moreover, it is easy to see that no root of d_1(θ) is a root of f_1(θ). Now let I be the ideal generated by the M − M_2 polynomials W f_1(θ_k) d_2(θ_k) + F(θ_k) in the polynomial ring C[W, Ȳ_1, …, Ȳ_M, B′_1, …, B′_{M_2}], where the B′_i stand for the between-group sums of squares B_i with multiplicity m_i ≥ 2. Pick sufficient statistics W = Ȳ_1 = ⋯ = Ȳ_M ≠ 0 and B′_1 = ⋯ = B′_{M_2} = 0. Since no root of d_1(θ) is a root of d_2(θ) or f_1(θ), (5.3.12) implies that for these special data W f_1(θ_k) d_2(θ_k) + F(θ_k) ≠ 0 for each k. The zero locus V(I) is thus a proper algebraic subset of C^{M+M_2+1}. Such a set is of lower dimension and, thus, d_1(θ) and W f_1(θ) d_2(θ) + F(θ) are relatively prime for generic data. □
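Lemmas 43 and 44, and the resulting ML degree from Theorem 39, can be confirmed in exact rational arithmetic for a small example (the group sizes and data below are made up; M = 3 distinct sizes, M_2 = 1):

```python
from fractions import Fraction as F

def pmul(p, q):
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def padd(*ps):
    size = max(len(p) for p in ps)
    return [sum((p[i] if i < len(p) else F(0)) for p in ps) for i in range(size)]

def prod(ps):
    r = [F(1)]
    for p in ps:
        r = pmul(r, p)
    return r

def pdiv(p, d):
    # exact polynomial long division; returns (quotient, remainder)
    p = list(p)
    q = [F(0)] * (len(p) - len(d) + 1)
    for k in range(len(q) - 1, -1, -1):
        q[k] = p[k + len(d) - 1] / d[-1]
        for j, c in enumerate(d):
            p[k + j] -= q[k] * c
    return q, p[:len(d) - 1]

n, m = [2, 3, 4], [1, 1, 2]                      # sizes, multiplicities
Y, B, W = [F(1), F(2), F(3)], [F(0), F(0), F(5)], F(7)
M, M2, N = 3, 1, sum(mi * ni for mi, ni in zip(m, n))
lin = [[F(1), F(ni)] for ni in n]                # 1 + n_i * theta

def f(a):  # f_a = sum_i m_i n_i a_i prod_{j != i} (1 + n_j theta)
    return padd(*[pmul([F(m[i] * n[i]) * a[i]],
                       prod([lin[j] for j in range(M) if j != i])) for i in range(M)])

def g(a):  # g_a = sum_i m_i n_i^2 a_i prod_{j != i} (1 + n_j theta)^2
    return padd(*[pmul([F(m[i] * n[i] ** 2) * a[i]],
                       prod([pmul(lin[j], lin[j]) for j in range(M) if j != i]))
                  for i in range(M)])

one = [F(1)] * M
f1, fY, fY2, fBm = f(one), f(Y), f([y * y for y in Y]), f([b / mi for b, mi in zip(B, m)])
g1, gY, gY2, gBm = g(one), g(Y), g([y * y for y in Y]), g([b / mi for b, mi in zip(B, m)])
d = prod(lin)
d1 = prod([lin[i] for i in range(M) if m[i] == 1])

# numerator of (5.3.1)
num = padd(
    pmul([F(N)], padd(pmul(pmul(f1, f1), gY2), pmul([F(-2)], pmul(pmul(fY, f1), gY)),
                      pmul(pmul(fY, fY), g1), pmul(pmul(f1, f1), gBm))),
    pmul([F(-1)], pmul(pmul(f1, f1),
                       padd(pmul([W], pmul(f1, d)), pmul(fY2, f1),
                            pmul([F(-1)], pmul(fY, fY)), pmul(f1, fBm)))))

assert len(num) - 1 == 4 * M - 3 and num[-1] != 0     # numerator degree 4M - 3 = 9
quot, rem = pdiv(num, d1)
assert all(r == 0 for r in rem)                       # d_1 divides exactly (Lemma 43)
print(len(quot) - 1, 3 * M + M2 - 3)                  # 7 7  (ML degree, Theorem 39)
```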

5.4 Proof of formula for REML degree

For the proof of the REML degree formula in Theorem 39, we proceed in the same way

as for the ML degree. We begin by deriving the univariate rational function whose number of

roots is the REML degree.

Lemma 45 Consider the rational function whose numerator is
\[ \big(g_1(\theta) - f_1(\theta)^2\big)\big[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\big] \tag{5.4.1} \]
\[ +\; (N-1)\big[f_1(\theta)^2 g_{Y^2}(\theta) - 2 f_Y(\theta) f_1(\theta) g_Y(\theta) + f_Y(\theta)^2 g_1(\theta) + f_1(\theta)^2 g_{B/m}(\theta)\big] \]
and whose denominator is
\[ d(\theta) f_1(\theta)\big[W f_1(\theta) d(\theta) + f_{Y^2}(\theta) f_1(\theta) - f_Y(\theta)^2 + f_1(\theta) f_{B/m}(\theta)\big]. \tag{5.4.2} \]


The REML degree is the degree of the numerator of this rational function after clearing common

factors from the given numerator and denominator.

Proof. The equation ∂\bar{\ell}/∂κ = 0 has the unique solution
\[ \kappa(\theta) = \frac{N-1}{W + \sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}\,(\bar{Y}_i - \mu(\theta))^2 + \sum_{i=1}^{M} \frac{n_i}{1+n_i\theta}\, B_i}; \]
compare (5.2.17). Substituting κ(θ) into the partial derivative ∂\bar{\ell}/∂θ yields the univariate equation
\[ -\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta} + \frac{\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}}{\sum_{i=1}^{M} \frac{m_i n_i}{1+n_i\theta}} + \kappa(\theta)\left[\sum_{i=1}^{M} \frac{m_i n_i^2}{(1+n_i\theta)^2}\,(\bar{Y}_i - \mu(\theta))^2 + \sum_{i=1}^{M} \frac{n_i^2}{(1+n_i\theta)^2}\, B_i\right] = 0; \tag{5.4.3} \]
recall (5.2.18). We can now simplify and rewrite (5.4.3), forming a common denominator, to obtain the desired rational function. □

The degree of the numerator in Lemma 45 is 4M − 3, and the degree of the denominator is 4M − 2. The numerator shares common factors with the denominator. In fact, in the proof of Lemma 43, we have shown that d_1(θ) divides f_{Y²}(θ)f_1(θ) − f_Y(θ)² + f_1(θ)f_{B/m}(θ). Thus, d_1(θ)², whose degree is 2M − 2M_2, divides the denominator from Lemma 45. To prove Theorem 39, it remains to prove the following two facts.

Lemma 46 The polynomial d_1(θ)² divides the numerator (5.4.1).


Lemma 47 After clearing d_1(θ)² from (5.4.1) and (5.4.2), the new numerator and new denominator are relatively prime for generic data.

Proof. [Proof of Lemma 46] From the proof of Lemma 43, we know that d_1(θ) divides the polynomial f_{Y²}(θ)f_1(θ) − f_Y(θ)² + f_1(θ)f_{B/m}(θ). Moreover, as shown in the proof of Lemma 44, the square d_1(θ)² divides
\[
f_1(\theta)^2g_{Y^2}(\theta)-2f_1(\theta)f_Y(\theta)g_Y(\theta)+f_Y(\theta)^2g_1(\theta)+f_1(\theta)^2g_{B/m}(\theta).
\]
To complete the proof of the present lemma, it suffices to show that d_1(θ) divides g_1(θ) − f_1(θ)². However, with some distributing and grouping, we see
\[
\begin{aligned}
g_1(\theta)-f_1(\theta)^2
&=\sum_{i=1}^{M}m_in_i^2\prod_{j\neq i}(1+n_j\theta)^2
-\sum_{i=1}^{M}\sum_{k=1}^{M}m_im_kn_in_k\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\\
&=\sum_{i=1}^{M}(m_i-m_i^2)\,n_i^2\prod_{j\neq i}(1+n_j\theta)^2
-\sum_{i=1}^{M}\sum_{k>i}2m_im_kn_in_k\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta),
\end{aligned}
\]
which is divisible by (1 + n_tθ) if and only if m_t = 1. □
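The divisibility criterion at the end of this proof can be checked symbolically. The following sketch (illustrative only, with small made-up group sizes and multiplicities) verifies for M = 3 that (1 + n_tθ) divides g_1(θ) − f_1(θ)² exactly when m_t = 1:

```python
import sympy as sp

theta = sp.symbols('theta')

def g1_minus_f1sq(m, n):
    """g_1(theta) - f_1(theta)^2 for multiplicities m and group sizes n."""
    M = len(n)
    prod_except = lambda i: sp.prod([1 + n[j]*theta for j in range(M) if j != i])
    f1 = sum(m[i]*n[i]*prod_except(i) for i in range(M))
    g1 = sum(m[i]*n[i]**2*prod_except(i)**2 for i in range(M))
    return sp.expand(g1 - f1**2)

# distinct group sizes 2, 5, 10 with multiplicities 1, 2, 1
m, n = [1, 2, 1], [2, 5, 10]
p = g1_minus_f1sq(m, n)
# (1 + n_t*theta) divides the polynomial exactly when m_t = 1
for t in range(3):
    divisible = sp.rem(p, 1 + n[t]*theta, theta) == 0
    print(n[t], m[t], divisible)
```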


Proof. [Proof of Lemma 47] We first show that if m_t ≥ 2, then, for generic data, (1 + n_tθ) and the numerator from (5.4.1) are relatively prime. Consider
\[
(g_1(\theta)-f_1(\theta)^2)\bigl[f_{Y^2}(\theta)f_1(\theta)-f_Y(\theta)^2+f_1(\theta)f_{B/m}(\theta)\bigr]
+(N-1)\bigl[f_1(\theta)^2g_{Y^2}(\theta)-2f_Y(\theta)f_1(\theta)g_Y(\theta)+f_Y(\theta)^2g_1(\theta)+g_{B/m}(\theta)f_1(\theta)^2\bigr].
\tag{5.4.4}
\]
Using the results from the proof of Lemma 43 and writing out the involved summations, (5.4.4) is seen to be equal to
\[
\begin{aligned}
&\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}(m_kY_i^2-m_kY_iY_k+B_k)\,m_im_rn_in_kn_r^2
\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2\\
&\;-\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}\sum_{u=1}^{M}(m_kY_i^2-m_kY_iY_k+B_k)\,m_im_rm_un_in_kn_rn_u
\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)\prod_{v\neq u}(1+n_v\theta)\\
&\;+(N-1)\sum_{i=1}^{M}\sum_{k=1}^{M}\sum_{r=1}^{M}(m_rY_r^2-2m_rY_iY_r+m_rY_iY_k+B_r)\,m_im_kn_in_kn_r^2
\prod_{j\neq i}(1+n_j\theta)\prod_{l\neq k}(1+n_l\theta)\prod_{s\neq r}(1+n_s\theta)^2.
\end{aligned}
\tag{5.4.5}
\]
The factor (1 + n_tθ) divides every summand in the above summations unless t = i = k = r = u, so it suffices to consider only these terms. Letting t = i = k = r = u, the terms missing a factor of (1 + n_tθ) sum to a term we already encountered, namely, that in (5.3.14). The discussion following display (5.3.14) shows that if the data is generic and m_t ≥ 2, then (1 + n_tθ) does not divide the numerator given in (5.4.1).

Continuing to work through the factors of the denominator from (5.4.2), assume that θ_0 is a root of f_1(θ). Then everything vanishes in the numerator except for the two terms −g_1(θ_0)f_Y(θ_0)² and (N − 1)f_Y(θ_0)²g_1(θ_0), which add to (N − 2)f_Y(θ_0)²g_1(θ_0). From the proof of Lemma 44, we know f_Y(θ_0)²g_1(θ_0) ≠ 0 for generic data, so, since we are working under the assumption of at least two groups and at least one group size n_i ≥ 2, the numerator and f_1(θ) are relatively prime for generic data.

Finally, we need to show that
\[
H(\theta):=f_1(\theta)^2g_{Y^2}(\theta)-2f_1(\theta)f_Y(\theta)g_Y(\theta)+f_Y(\theta)^2g_1(\theta)+f_1(\theta)^2g_{B/m}(\theta)
\]
and
\[
G(\theta):=Wf_1(\theta)d_2(\theta)+F(\theta)
\]
are relatively prime for generic data W, Y_1, ..., Y_M, and B_i with m_i ≥ 2; the polynomial F(θ) was defined in (5.3.17). We will again denote the between-group sums of squares with multiplicities m_i ≥ 2 by B'_1, ..., B'_{M_2}. By a standard algebraic result, the polynomials G(θ) and H(θ) share a common root if and only if a certain polynomial in their coefficients vanishes; this polynomial is called the resultant, and we denote it by Res(G, H). Since both H(θ) and G(θ) have coefficients that are polynomials in the sufficient statistics W, Y_1, ..., Y_M, and B'_1, ..., B'_{M_2}, we may regard Res(G, H) as a polynomial in the ring C[W, Y_1, ..., Y_M, B'_1, ..., B'_{M_2}]. By Lemma 44, for any given generic choice of Y_1, ..., Y_M, B'_1, ..., B'_{M_2}, a root θ_0 of H is not a root of f_1(θ) or d_2(θ). Hence, θ_0 is a root of G if and only if
\[
W=-\frac{F(\theta_0)}{d_2(\theta_0)f_1(\theta_0)}.
\tag{5.4.6}
\]
Picking W not to satisfy (5.4.6) shows that Res(G, H) is not the zero polynomial in C[W, Y_1, ..., Y_M, B'_1, ..., B'_{M_2}]. Hence, the zero locus of Res(G, H) is a set of lower dimension, and we conclude that H and G are relatively prime for generic data. □
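The resultant argument can be illustrated in sympy on toy polynomials (hypothetical low-degree stand-ins for G and H, not the actual polynomials above): Res(G, H) is a polynomial in the data W that vanishes precisely for the special values of W at which G and H share a root.

```python
import sympy as sp

theta, W = sp.symbols('theta W')

# Toy stand-ins: G depends on the data W, H does not.
H = theta**2 - 3*theta + 2           # roots 1 and 2
G = W*theta - 4                      # root 4/W

res = sp.resultant(G, H, theta)      # a polynomial in W
print(sp.factor(res))                # vanishes exactly when G and H share a root

# Res(G, H) = 0 only for the special choices W = 4 (shared root 1)
# and W = 2 (shared root 2); for generic W the polynomials are coprime.
assert res.subs(W, 4) == 0 and res.subs(W, 2) == 0
assert res.subs(W, 5) != 0
```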

5.5 Linear mixed models with multimodal likelihood functions

To our knowledge, the literature does not supply many examples of linear mixed models

with multimodal likelihood functions. We conclude by giving two simulated examples that

demonstrate the mathematical possibility of more than one mode. Such examples were rare

in our simulations, which is in agreement with the findings of [61], who also treat the unbalanced

one-way layout. While uniqueness of local optima is not explicitly discussed in [61], the authors

remark in their conclusion that “varying the iteration starting point slightly affects the rate of

convergence, but not the [mean square errors] or biases of the [ML and REML] estimators.” The

examples we give involve three positive roots to the ML or REML equations for the variance

ratio θ. We do not know of examples with more positive roots.


Example 5.5.1 Consider the one-way layout with a single grand mean µ from (5.1.1). Take q = 5 groups of sizes

n_1 = 2, n_2 = 5, n_3 = 10, n_4 = 20, n_5 = 50.

Let the sufficient statistics be the five group averages

Y_1 = −73571/14273 ≈ −5.1546,  Y_2 = 13781/78326 ≈ 0.1759,
Y_3 = −13277/92152 ≈ −0.1441,  Y_4 = 31207/202567 ≈ 0.1541,
Y_5 = −15713/24121 ≈ −0.6514,

and the within-group sum of squares

W = 116487/421 ≈ 276.69.

The univariate ML equation in θ has three nonnegative solutions, namely,

θ_{ML,1} ≈ 0.00838738,  θ_{ML,2} ≈ 0.118458,  θ_{ML,3} ≈ 0.338944;

having specified six digits, we should add that the solutions were computed using the above exact rational numbers as input. The solution θ_{ML,1} yields the global maximum of the likelihood function, whereas θ_{ML,2} and θ_{ML,3} determine a saddle point and a local maximum, respectively.

In contrast, the restricted likelihood function has a unique local and global maximum at

θ_REML ≈ 0.771763.

The data was simulated from the model with mean µ_0 = 0 and variance components τ_0 = 3 and ω_0 = 2, which gives θ_0 = 3/2.
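For readers who want to reproduce such experiments, a minimal simulation sketch follows (assuming, as the factors 1 + n_iθ suggest, that τ is the between-group and ω the within-group variance, so that θ = τ/ω; the seed and resulting data are arbitrary, not those of the example):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, tau0, omega0 = 0.0, 3.0, 2.0     # grand mean, between- and within-group variances
sizes = [2, 5, 10, 20, 50]

# one-way layout: Y_ij = mu0 + a_i + e_ij, a_i ~ N(0, tau0), e_ij ~ N(0, omega0)
groups = [mu0 + rng.normal(0.0, np.sqrt(tau0)) + rng.normal(0.0, np.sqrt(omega0), size=n)
          for n in sizes]

Ybar = [g.mean() for g in groups]                       # group averages Y_i
W = sum(((g - g.mean()) ** 2).sum() for g in groups)    # within-group sum of squares
print(np.round(Ybar, 4), round(W, 2))
```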

Example 5.5.2 Continuing with the setup from Example 5.5.1, change the sufficient statistics to

Y_1 = 230081/40206 ≈ 5.7226,  Y_2 = 721282/5630371 ≈ 0.1281,
Y_3 = 29305/95646 ≈ 0.3064,  Y_4 = 15365/37988 ≈ 0.4045,
Y_5 = −569/40932 ≈ −0.0139,

and

W = 755002/1759 ≈ 429.22.

Now, all real solutions to the ML equations are negative. Thus, the global maximum of the likelihood is achieved at the boundary point θ_ML = 0. In contrast, the REML equations have three feasible solutions for θ, namely,

θ_{REML,1} ≈ 0.00492193,  θ_{REML,2} ≈ 0.159465,  θ_{REML,3} ≈ 0.2414611.


The solution θ_{REML,1} gives the global maximum of the restricted likelihood function. The solutions θ_{REML,2} and θ_{REML,3} determine a saddle point and a local maximum, respectively. The data was simulated as in Example 5.5.1.

In both Example 5.5.1 and Example 5.5.2, the first group has the smallest size but the group mean that is largest in absolute value. The other means are comparatively close to each

other. We experimented with permuting the means, while holding the group sizes fixed. In

Example 5.5.2, eight out of 120 permutations give bimodal restricted likelihood functions. Two

permutations yield three positive roots to the REML equations. The other six cases have two

positive roots, and one of the two local maxima occurs for θ = 0. In similar experiments for

Example 5.5.1, which features positive correlation between group sizes and means, bimodal

likelihood functions are obtained for 18 permutations. Again, these permutations keep the first

mean fixed. Only three permutations give three positive roots to the ML equations. The 18

permutations include the top six permutations in terms of large positive correlation but also

the permutation whose associated correlation ranks 43rd.

While dependence between group means and sizes plays a role in Examples 5.5.1 and 5.5.2,

the precise interplay between them appears to be subtle. For instance, when varying the mean

Y1 in Example 5.5.1 and keeping all other sufficient statistics fixed, we find that there are three

positive roots to the ML equations when −5.47 ≤ Y1 ≤ −5.08 but a unique root otherwise;

we experimented with a grid of values in [−10, 10]. In particular, the likelihood function is

unimodal for larger negative values of Y1. It would be interesting, but presumably difficult, to


get a better understanding of the semi-algebraic set of sufficient statistics that give (restricted)

likelihood functions with more than one local maximum.


CHAPTER 6

CONCLUSION

In this dissertation, we have seen three examples of how we can use tools from combinatorial commutative algebra, multilinear algebra, and algebraic geometry to better understand statistical models and methods. The theme of algebraic complexity arose in all three of these examples.

In Chapter 3, we explored toric ideals of hypergraphs and showed how we can use combinatorics to understand the Markov bases of statistical models parameterized by square-free monomials. In the literature, there are many papers that have explored toric ideals of graphs; however, the content in Chapter 3 is one of only a few known instances ([30] and [50]) where toric ideals are studied through the combinatorial framework of hypergraphs. While hypergraphs are admittedly general and harder to visualize than graphs, there is a fast-growing body of work in combinatorics regarding hypergraphs, and we see opportunities for more connections to be made between toric ideals and these combinatorial constructions.

Since edge subrings of hypergraphs and their defining ideals, i.e. toric ideals of hypergraphs,

are only beginning to be explored, there are many open questions that would be interesting to

pursue. We list some of the open questions here.

Open Question 48 Given a hypergraph H, what is the Krull dimension of the edge subring k[H]? This question could be approached by studying the vertex-edge incidence matrix of H, see [44, Proposition 7.5], or perhaps by appropriately generalizing the results for graphs in [66, Corollary 8.2.13].
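Since the dimension of a toric ring equals the rank of its defining integer matrix (a standard fact, see e.g. [56]), one concrete starting point is computing ranks of incidence matrices. A minimal sketch with a made-up hypergraph:

```python
import sympy as sp

# Vertex-edge incidence matrix of a small hypergraph H on vertices
# {1,...,5} with edges {1,2,3}, {3,4}, {4,5}, {1,5} (rows = vertices).
A = sp.Matrix([
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])
# For a toric ring defined by the matrix A, the Krull dimension
# equals rank(A), so here dim k[H] would be:
print(A.rank())
```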

Open Question 49 Given a hypergraph H, can we give combinatorial conditions that guarantee that k[H] is normal? Cohen-Macaulay? Gorenstein? These questions have been studied for graphs in [47], [42], and [45].

Open Question 50 Given a hypergraph H, are there combinatorial conditions that imply I_H is a robust toric ideal, i.e., minimally generated by a universal Gröbner basis (see [7])? For example, Proposition 20 implies that when H is a 2-regular hypergraph, I_H is a robust toric ideal.

Answers to these questions would give us better insight into the interplay between combinatorics and algebra in the study of toric ideals of hypergraphs and, in turn, a better understanding of toric models.

In Chapter 4, we described defining polynomials of the variety associated to the 4-state general Markov model on the tree K_{1,3}. This result marks the most recent progress toward proving the salmon conjecture, i.e., showing that the phylogenetic ideal for the model is generated by polynomials of degree 5, 6, and 9. A final step would be to show that the polynomials we describe in Section 4.4 define a radical ideal.

The results in Chapter 4 have two important impacts. First, even though we do not have an ideal description of the general Markov model, we do have explicit polynomials that can be used for phylogenetic model selection according to methods described in [14]. Second, these polynomials can also be used to test whether a 4 × 4 × 4 tensor has border rank at most four. Currently, there are only a few values of m, n, l, and r for which the defining polynomials of V_r(m, n, l) are known, but the issue of tensor rank is important in a variety of applications, including data mining, computer vision, and neuroscience.

In Chapter 5, we turned our attention toward maximum likelihood estimation and gave an explicit formula for the ML degree of random effects models with one-way layouts. Besides giving us insight into the feasibility of using algebraic methods to solve the likelihood equations, the ML degree also tells us the number of paths we need to track when using numerical solvers such as PHCpack [64] or Bertini [5], which use homotopy continuation methods.
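The idea behind such path tracking can be illustrated with a toy univariate homotopy (a rough sketch of the principle, not PHCpack's or Bertini's actual algorithm): deform an easy start system into the target polynomial and follow each start root, using a random complex constant γ to keep the paths nonsingular.

```python
import numpy as np

p  = np.poly1d([1, 0, -2, 5])            # target p(x) = x^3 - 2x + 5
dp = p.deriv()
gamma = 0.8 + 0.6j                       # random complex constant ("gamma trick")

def track(x, steps=400):
    """Follow one root of the start system x^3 - 1 along
    H(x,t) = gamma*(1-t)*(x^3 - 1) + t*p(x) from t = 0 to t = 1."""
    for k in range(steps):
        t = (k + 1) / steps
        for _ in range(5):               # Newton correction at each step
            H  = gamma * (1 - t) * (x**3 - 1) + t * p(x)
            dH = gamma * (1 - t) * 3 * x**2 + t * dp(x)
            x = x - H / dH
    return x

starts = [np.exp(2j * np.pi * k / 3) for k in range(3)]   # roots of x^3 - 1
roots = [track(x) for x in starts]
print([np.round(r, 6) for r in roots])   # three roots of the target polynomial
```

The number of paths equals the number of start roots, which is why knowing the ML degree in advance tells a solver exactly how many paths to track.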

As we saw in Section 5.5, it is possible for the likelihood equations to have more than one real, positive solution. Thus, we argue that algebraic methods, symbolic or numeric, should be used whenever feasible for maximum likelihood estimation, since local methods are not guaranteed to return the global maximum. Symbolic algebraic methods have complexity issues that may be hard to overcome in practice; however, homotopy continuation methods can be used to solve polynomial equations numerically, and they have recently been applied to maximum likelihood estimation problems in [31]. One future application of the research in Chapter 5 is to use the ML degrees to implement code that solves the likelihood equations for variance component models using PHCpack.


CITED LITERATURE

1. E. S. Allman and J. A. Rhodes, Phylogenetic Invariants for the General Markov Model of Sequence Mutation, Math. Biosci. 186 (2003), 113-144.

2. E. S. Allman and J. A. Rhodes, Phylogenetic ideals and varieties for the general Markov model, Advances in Appl. Math. 40 (2008), 127-148.

3. S. Aoki, A. Takemura, and R. Yoshida, Indispensable monomials of toric ideals and Markov bases, Journal of Symbolic Computation 43 (2008), 490-507.

4. Quentin D. Atkinson, Phonemic diversity supports a serial founder effect model of language expansion from Africa, Science 332 (2011), no. 6027, 346-349.

5. D. Bates, J. D. Hauenstein, A. J. Sommese, and C. W. Wampler, Bertini: Software for numerical algebraic geometry, available at http://www.nd.edu/~sommese/bertini.

6. D. J. Bates and L. Oeding, Toward a salmon conjecture, Experimental Mathematics 20 (2011), no. 3, 358-370.

7. A. Boocher and E. Robeva, Robust Toric Ideals, preprint, arXiv:1304.0603.

8. Max-Louis G. Buot, Serkan Hosten, and Donald St. P. Richards, Counting and locating the solutions of polynomial systems of maximum likelihood equations. II. The Behrens-Fisher problem, Statist. Sinica 17 (2007), no. 4, 1343-1354.

9. M. Casanellas and J. Fernandez-Sanchez, Performance of a new invariants method on homogeneous and non-homogeneous quartet trees, Molecular Biology and Evolution 24 (2007), no. 1, 288-293.

10. Fabrizio Catanese, Serkan Hosten, Amit Khetan, and Bernd Sturmfels, The maximum likelihood degree, Amer. J. Math. 128 (2006), no. 3, 671-697.

11. H. Charalambous, A. Katsabekis, and A. Thoma, Minimal systems of binomial generators and the indispensable complex of a toric ideal, Proceedings of the American Mathematical Society 135 (2007), 3443-3451.


12. M. V. Catalisano, A. V. Geramita, and A. Gimigliano, Ranks of tensors, secant varieties of Segre varieties and fat points, Linear Algebra Appl. 355 (2002), 263-285.

13. J. A. Cavender and J. Felsenstein, Invariants of phylogenies: a simple case with discrete states, Journal of Classification 4 (1987), 57-71.

14. M. Casanellas and J. Fernandez-Sanchez, Geometry of the Kimura 3-parameter model, Advances in Applied Mathematics 41 (2008), no. 3, 265-292.

15. O. L. Davies and P. L. Goldsmith (eds.), Statistical methods in research and production, 4th ed., Hafner, 1972.

16. J. De Loera and S. Onn, Markov bases of three-way tables are arbitrarily complicated, J. Symb. Comput. 41 (2006), no. 2, 173-181.

17. E. Demidenko and H. Massam, On the existence of the maximum likelihood estimate in variance components models, Sankhya Ser. A 61 (1999), no. 3, 431-443.

18. M. Develin and S. Sullivant, Markov bases of binary graph models, Annals of Combinatorics 7 (2003), 441-466.

19. P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from conditional distributions, Ann. Statist. 26 (1998), no. 1, 363-397.

20. A. Dobra and S. Sullivant, A divide-and-conquer algorithm for generating Markov bases of multi-way tables, Computational Statistics 19 (2004), 347-366.

21. J. Draisma and J. Kuttler, On the ideals of equivariant tree models, Mathematische Annalen 344 (2009), no. 3, 619-644.

22. M. Drton, B. Sturmfels, and S. Sullivant, Lectures on algebraic statistics, Birkhäuser Verlag AG, Basel, Switzerland, 2009.

23. J. J. Faraway, Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models, Texts in Statistical Science Series, Chapman & Hall/CRC, Boca Raton, FL, 2006.

24. S. Fienberg, The Analysis of Cross-classified Categorical Data, MIT Press, Cambridge, MA, 1980.


25. S. Fienberg, Expanding the Statistical Toolkit with Algebraic Statistics, Statistica Sinica 17 (2007), 1251-1272.

26. S. Friedland, On tensors of border rank l in C^{m×n×l}, Linear Algebra Appl., in press, arXiv:1003.1968.

27. S. Friedland and E. Gross, A proof of the set-theoretic version of the salmon conjecture, Journal of Algebra 356 (2012), no. 1, 374-379.

28. I. Gitler, E. Reyes, and R. Villarreal, Ring graphs and toric ideals, Electronic Notes in Discrete Mathematics 28 (2007), no. 1, 393-400.

29. E. Gross, M. Drton, and S. Petrovic, Maximum likelihood degree of variance component models, Electronic Journal of Statistics 6 (2012), 993-1016.

30. E. Gross and S. Petrovic, Combinatorial degree bound for toric ideals of hypergraphs, arXiv:1206.2512.

31. J. Hauenstein, J. Rodriguez, and B. Sturmfels, Maximum Likelihood for Matrices with Rank Constraints, preprint, arXiv:1210.0198.

32. R. R. Hocking, The analysis of linear models, Brooks/Cole Publishing Co., Monterey, CA, 1985.

33. S. Hosten, A. Khetan, and B. Sturmfels, Solving the likelihood equations, Found. Comput. Math. 5 (2005), no. 4, 389-407.

34. S. Hosten and S. Sullivant, The algebraic complexity of maximum likelihood estimation for bivariate missing data, Algebraic and geometric methods in statistics, Cambridge Univ. Press, Cambridge, 2010, pp. 123-133.

35. S. Hosten and S. Sullivant, A finiteness theorem for Markov bases of hierarchical models, J. Comb. Theory Ser. A 114 (2007), no. 2, 311-321.

36. J. Jiang, Linear and generalized linear mixed models and their applications, Springer Series in Statistics, Springer, New York, 2007.

37. J. A. Lake, A rate-independent technique for analysis of nucleic acid sequences: evolutionary parsimony, Molecular Biology and Evolution 4 (1987), 167-191.


38. J. M. Landsberg, Tensors: Geometry and Applications, Graduate Studies in Mathematics, Vol. 128, American Mathematical Society, 2012.

39. J. M. Landsberg and L. Manivel, On the ideals of secant varieties of Segre varieties, Found. Comput. Math. 4 (2004), 397-422.

40. J. M. Landsberg and L. Manivel, Generalizations of Strassen's equations for secant varieties of Segre varieties, Comm. Algebra 36 (2008), 405-422.

41. L. H. Lim, Tensors and hypermatrices, in: L. Hogben (ed.), Handbook of Linear Algebra, 2nd ed., CRC Press, Boca Raton, FL, 2013.

42. J. Martínez-Bernal and R. H. Villarreal, Toric ideals generated by circuits, Algebra Colloq. 19 (2012), 665.

43. P. McCullagh and J. A. Nelder, Generalized linear models, 2nd ed., Monographs on Statistics and Applied Probability, Chapman & Hall, London, 1989.

44. E. Miller and B. Sturmfels, Combinatorial commutative algebra, Graduate Texts in Mathematics 227, Springer-Verlag, New York, 2005.

45. A. O'Keefe, Cohen-Macaulay toric rings arising from finite graphs, Ph.D. thesis, 2012.

46. L. Oeding, Set-theoretic defining equations of the tangential variety of the Segre variety, J. Pure and Applied Algebra 215 (2011), 1516-1527.

47. H. Ohsugi and T. Hibi, Toric ideals generated by quadratic binomials, Journal of Algebra 218 (1999), 509-527.

48. H. Ohsugi and T. Hibi, Indispensable binomials of finite graphs, J. Algebra Appl. 4 (2005), no. 4, 421-434.

49. L. Pachter and B. Sturmfels, Algebraic Statistics for Computational Biology, Cambridge University Press, 2005.

50. S. Petrovic and D. Stasi, Toric algebra of hypergraphs, Journal of Algebraic Combinatorics, to appear.


51. E. Reyes, C. Tatakis, and A. Thoma, Minimal generators of toric ideals of graphs, Adv. in Appl. Math. 48 (2012), no. 1, 64-67.

52. H. Sahai and M. M. Ojeda, Analysis of variance for random models. Vol. I. Balanced data: Theory, methods, applications and data analysis, Birkhäuser Boston Inc., Boston, MA, 2004.

53. H. Sahai and M. M. Ojeda, Analysis of variance for random models. Vol. II. Unbalanced data: Theory, methods, applications, and data analysis, Birkhäuser Boston Inc., Boston, MA, 2005.

54. S. R. Searle, G. Casella, and C. E. McCulloch, Variance components, Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics, John Wiley & Sons Inc., New York, 1992.

55. V. Strassen, Rank and optimal computations of generic tensors, Linear Algebra Appl. 52/53 (1983), 645-685.

56. B. Sturmfels, Gröbner bases and convex polytopes, University Lecture Series 8, American Mathematical Society, 1996.

57. B. Sturmfels, Open problems in algebraic statistics, Emerging applications of algebraic geometry, IMA Vol. Math. Appl., vol. 149, Springer, New York, 2009, pp. 351-363.

58. B. Sturmfels and S. Sullivant, Toric ideals of phylogenetic invariants, Journal of Computational Biology 12 (2005), 204-228.

59. B. Sturmfels and P. Zwiernik, Binary cumulant varieties, Annals of Combinatorics, to appear.

60. S. Sullivant, Statistical models are algebraic varieties, lecture notes, available at http://www4.ncsu.edu/~smsulli2/Activities/assc.html.

61. W. H. Swallow and J. F. Monahan, Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components, Technometrics 26 (1984), no. 1, 47-57.

62. C. Tatakis and A. Thoma, On the universal Gröbner basis of toric ideals of graphs, Journal of Combinatorial Theory, Series A 118 (2011), 1540-1548.


63. A. Takemura and S. Aoki, Some characterizations of minimal Markov basis for sampling from discrete conditional distributions, Ann. Inst. Statist. Math. 56 (2004), no. 1, 1-17.

64. J. Verschelde, Algorithm 795: PHCpack: A general-purpose solver for polynomial systems by homotopy continuation, ACM Trans. Math. Softw. 25 (1999), no. 2, 251-276. Software available at http://www.math.uic.edu/~jan/download.html.

65. R. Villarreal, Rees algebras of edge ideals, Communications in Algebra 23 (1995), no. 9, 3513-3524.

66. R. Villarreal, Monomial Algebras, Monographs and Textbooks in Pure and Applied Mathematics 238, Marcel Dekker, Inc., New York, 2001.


VITA

ELIZABETH GROSS

EDUCATION:

PhD in Mathematics. University of Illinois at Chicago, August 2013 (expected).

Advisors: Sonja Petrovic and Shmuel Friedland.

MA in Mathematics. San Francisco State University, August 2010.

Advisor: Arek Goetz.

BS in Mathematics. California State University, Chico, May 2003.

PUBLICATIONS:

Maximum likelihood degree of variance component models, with Mathias Drton

and Sonja Petrovic, Electronic Journal of Statistics, 6, (2012), 993-1016.

A proof of the set-theoretic version of the salmon conjecture, with Shmuel

Friedland, Journal of Algebra 356, (2012), no.1, 374-379.

Combinatorial degree bound for toric ideals of hypergraphs, with Sonja Petrovic,

arXiv:1206.2512, Submitted.

PHCPack in Macaulay2, with Sonja Petrovic and Jan Verschelde, arXiv:1105.4881,

Submitted.


Modeling social networks using a random walk on a torus, M.A. Thesis, San

Francisco State University.

SOFTWARE:

PHCPack.m2, with Sonja Petrovic and Jan Verschelde. A Macaulay2 interface for PHCPack. Available with Macaulay2 v.1.4 and later.

EXPERIENCE:

Research Assistant, Penn State, 2011–2012.

Visiting Student in the Statistics Department.

Graduate Teaching Assistant, University of Illinois at Chicago, 2009–2011.

Discussion leader for Intermediate Algebra and Calculus I. Grader and Tutor for Applied Linear Algebra.

Graduate Teaching Assistant, San Francisco State University, 2008–2009.

Instructor for Beginning Algebra and Intermediate Algebra. Discussion leader for

Calculus I, II. Grader for Calculus III.

AWARDS:

Dean's Scholar Award, University of Illinois at Chicago, 2012.

Merit fellowship sponsored by the Graduate College.

Poster Award, SIAM Conference on Applied Algebraic Geometry, 2011.

First place in poster competition.


MSCS Graduate Student Teaching Award, University of Illinois at Chicago, 2010.

Departmental award for excellent teaching by a teaching assistant.

Sergio Martins Memorial Scholarship, San Francisco State University, 2008.

Merit scholarship sponsored by the Mathematics Department.

Robert W. Maxwell Memorial Scholarship, San Francisco State University, 2008.

Merit scholarship sponsored by the College of Science and Engineering.