an introduction to algebraic statisticsmd5/papers/algstat.pdf · 2010-01-13 · ‘algebraic...

An Introduction to Algebraic Statistics Mathias Drton Department of Statistics University of Chicago January, 2010


An Introduction to Algebraic Statistics

Mathias Drton

Department of Statistics, University of Chicago

January, 2010


‘Algebraic statistics’

Application and development of techniques in

Algebraic Geometry, Commutative Algebra, and Combinatorics

to address problems in Statistics.

Instrumental paper:

Diaconis, Persi; Sturmfels, Bernd. Algebraic algorithms for sampling from conditional distributions. Annals of Statistics 26 (1998), no. 1, 363–397.

Applied-minded algebraists get involved with Statistics

(AMS meetings, SIAM activity group, . . . ).


Some literature

Pistone, Riccomagno & Wynn: Algebraic Statistics (Exp. Design)

Pachter & Sturmfels: Algebraic Statistics for Computational Biology

Gibilisco et al. (Eds.): Algebraic and Geometric Methods in Statistics

Viana & Richards (Eds.): Algebraic Methods in Statistics and Probability (2nd volume in prep.)


These lectures

Material from Chapters 1, 2 and 5 in

Drton, Sullivant & Sturmfels: Lectures on Algebraic Statistics

Chapter 3: Conditional independence, graphical models

Chapter 4: Hidden variable models

Chapter 6: Worked exercises

Chapter 7: Open problems


Lectures

Lecture I: Markov Bases for Exact Inference in Contingency Tables

(Chapter 1 in lecture notes)

Lecture II: Likelihood Ratio Tests and Singularities

(Section 2.3 in lecture notes)

Lecture III: Bayesian Integrals

(Section 5.1 in lecture notes)


Part I

Markov Bases for Exact Inference in Contingency Tables

1 Fisher’s exact test for 2 × 2 contingency tables

2 Log-linear models for multi-way tables

3 Markov bases for exact conditional inference


Lecture outline

1 Fisher’s exact test for 2 × 2 contingency tables

2 Log-linear models for multi-way tables

3 Markov bases for exact conditional inference

Mathias Drton Lecture 1: Fisher’s exact test 2 / 110


Example: Cancer treatment

Surgery versus radiation treatment for cancer patients:

                     Cancer       Cancer not
                     controlled   controlled   Total
Surgery                  21            0         21
Radiation therapy        15            3         18
Total                    36            3         39

Disease outcome independent of treatment?

Chi-square test p-value = 0.1788

Fisher’s exact test p-value = 0.08929
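The two p-values quoted for the cancer table can be reproduced with a few lines of Python. This is only a sketch: the function names are hypothetical, and the slide does not say which variant of the chi-square test was used. Applying a continuity (Yates) correction is an assumption made here because it reproduces the quoted value 0.1788.

```python
from math import comb, erfc, sqrt

def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value: sum the hypergeometric probabilities
    of all tables with the observed margins that are no more likely than
    the observed table."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def prob(x):  # P(U11 = x | row and column sums)
        return comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

def chi2_pvalue_2x2(table, yates=True):
    """Chi-square test of independence for a 2 x 2 table (1 degree of freedom).
    The continuity correction is an assumption, not stated on the slide."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            diff = abs(obs - expected) - (0.5 if yates else 0.0)
            stat += max(diff, 0.0) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df.
    return erfc(sqrt(stat / 2))

cancer = ((21, 0), (15, 3))
print(round(fisher_exact_2x2(cancer), 5))
print(round(chi2_pvalue_2x2(cancer), 4))
```

The fiber of this table contains only four tables (the count in the upper-right cell ranges over 0, 1, 2, 3), which is why the exact test is cheap here.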


Independence model

Two discrete/categorical random variables

X ∈ [r ] := {1, 2, . . . , r} and Y ∈ [c] := {1, 2, . . . , c}

Joint and marginal probabilities:

pij = P(X = i ,Y = j), pi+ = P(X = i), p+j = P(Y = j)

X and Y independent (X⊥⊥Y ) iff

pij = pi+p+j for all i ∈ [r ], j ∈ [c]

or, equivalently, the matrix P = (pij) has rank 1.


Chi-square test of independence

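Counts from n i.i.d. copies of (X, Y):

$$U_{ij} = \sum_{k=1}^{n} 1_{\{X^{(k)} = i,\ Y^{(k)} = j\}}, \quad i \in [r],\ j \in [c].$$

Contingency table $U = (U_{ij})$ has multinomial distribution:

$$P(U = u) = \frac{n!}{u_{11}!\, u_{12}! \cdots u_{rc}!} \prod_{i=1}^{r} \prod_{j=1}^{c} p_{ij}^{u_{ij}}.$$

Chi-square statistic

$$X^2(U) = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(U_{ij} - \hat u_{ij})^2}{\hat u_{ij}} \xrightarrow{d} \chi^2_{(r-1)(c-1)} \quad \text{under } H_0 \text{ as } n \to \infty$$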


Fisher’s exact test for 2 × 2 table

Hypergeometric distribution:

If X⊥⊥Y, then

$$P(U_{11} = u_{11} \mid U_{1+} = u_{1+}, U_{+1} = u_{+1}) = \frac{\binom{u_{1+}}{u_{11}} \binom{n - u_{1+}}{u_{+1} - u_{11}}}{\binom{n}{u_{+1}}}$$

for $u_{11} \in \{\max(0, u_{1+} + u_{+1} - n), \dots, \min(u_{1+}, u_{+1})\}$.

Exact test:

1 Choose a test statistic T(u) (e.g., $X^2(u)$, $P(U_{11} = u_{11} \mid U_{1+} = u_{1+}, U_{+1} = u_{+1})$, ...)

2 P-value:

$$P(T(U) \ge T(u) \mid U_{1+}, U_{+1}) = \sum_{v :\, T(v) \ge T(u)} \frac{\binom{U_{1+}}{v_{11}} \binom{n - U_{1+}}{U_{+1} - v_{11}}}{\binom{n}{U_{+1}}}$$


Lecture outline

1 Fisher’s exact test for 2 × 2 contingency tables

2 Log-linear models for multi-way tables

3 Markov bases for exact conditional inference


Three-way table (Agresti, 2002)

White subjects were asked about:

(1) “Black children on school bus”, (2) “Black candidate for presidency”,

(3) “Black friend for dinner at home”

President   Busing   Home: Yes    No   ???
Yes         Yes             41    65     0
            No              71   157     1
            ???              1    17     0
No          Yes              2     5     0
            No               3    44     0
            ???              1     0     0
???         Yes              0     3     1
            No               0    10     0
            ???              0     0     1

??? = ‘don’t know’


Log-linear models

Discrete r.v. $X_1, \dots, X_m$; $X_\ell \in [r_\ell]$

State space: $\mathcal{R} = \prod_{\ell=1}^{m} [r_\ell]$

Joint probability table: $p = (p_i \mid i \in \mathcal{R})$

Probability simplex: $\Delta_{\mathcal{R}-1}$

Definition

Fix a matrix $A \in \mathbb{Z}^{d \times \mathcal{R}}$ whose columns all sum to the same value. The log-linear model associated with A is the set of positive probability tables

$$\mathcal{M}_A = \{ p = (p_i) \in \mathrm{int}(\Delta_{\mathcal{R}-1}) : \log p = (\log p_i) \in \mathrm{rowspan}(A) \},$$

where rowspan(A) is the linear space spanned by the rows of A.


Example: Independence model

X , Y : two discrete r.v. with joint probabilities pij > 0

X⊥⊥Y is equivalent to

log pij = log pi+ + log p+j = αi + βj , i ∈ [r ], j ∈ [c].

Suppose r = 2 and c = 3. Then $\log p \in \mathbb{R}^{2 \times 3}$ is in the row span of the (r + c) × rc = 5 × 6 matrix

A =
         11  12  13  21  22  23
   α1     1   1   1   0   0   0
   α2     0   0   0   1   1   1
   β1     1   0   0   1   0   0
   β2     0   1   0   0   1   0
   β3     0   0   1   0   0   1


Contingency tables

Based on an n-sample, define the m-way contingency table U:

$$U_i = \sum_{k=1}^{n} 1_{\{X_1^{(k)} = i_1, \dots, X_m^{(k)} = i_m\}}, \quad i = (i_1, \dots, i_m) \in \mathcal{R}$$

Let $\mathcal{T}(n)$ be the space of non-negative integer tables summing to n.

Definition

We call the vector Au the minimal sufficient statistics for the model $\mathcal{M}_A$, and the set of tables

$$\mathcal{F}(u) = \{ v \in \mathbb{N}^{\mathcal{R}} : Av = Au \}$$

is the fiber of a contingency table $u \in \mathcal{T}(n)$ with respect to the model $\mathcal{M}_A$.


Example: Independence model

Let u be an r × c table.

For the matrix A encoding the independence model X⊥⊥Y :

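$$Au = \begin{pmatrix} u_{\cdot +} \\ u_{+ \cdot} \end{pmatrix},$$

where $u_{\cdot +}$ and $u_{+ \cdot}$ are the row and column sums of table u.

If r = 2 and c = 3:

$$Au = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} u_{11} \\ u_{12} \\ u_{13} \\ u_{21} \\ u_{22} \\ u_{23} \end{pmatrix}
=
\begin{pmatrix} u_{1+} \\ u_{2+} \\ u_{+1} \\ u_{+2} \\ u_{+3} \end{pmatrix}.$$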
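The construction of A for the independence model, and the fact that Au collects the row and column sums, can be checked with a short sketch. The helper names and the example table below are hypothetical; cells are flattened in the order 11, 12, ..., rc used on the slides.

```python
def independence_matrix(r, c):
    """(r + c) x (r * c) matrix of the independence model.
    Rows: alpha_1..alpha_r, then beta_1..beta_c.
    Columns: cells (i, j) in row-major order."""
    A = []
    for i in range(r):
        A.append([1 if cell // c == i else 0 for cell in range(r * c)])
    for j in range(c):
        A.append([1 if cell % c == j else 0 for cell in range(r * c)])
    return A

def sufficient_statistics(A, u):
    """Matrix-vector product A u for a flattened table u."""
    return [sum(a * x for a, x in zip(row, u)) for row in A]

A = independence_matrix(2, 3)
u = [4, 1, 2,
     0, 3, 5]  # hypothetical 2 x 3 table, flattened row by row
print(sufficient_statistics(A, u))  # row sums followed by column sums
```

Every column of this matrix sums to 2, as the definition of a log-linear model requires.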


Hierarchical models

Conditional independence:

X1 and X2 conditionally independent given X3 if

P(X1 = i ,X2 = j |X3 = k) = P(X1 = i |X3 = k)P(X2 = j |X3 = k).

Equivalent to matrices Pk = (pijk) having rank at most 1 for all k.

Log-linear formulation:

$$\log p_{ijk} = \alpha^{(13)}_{ik} + \alpha^{(23)}_{jk}$$

No three-way interaction:

$$\log p_{ijk} = \alpha^{(12)}_{ij} + \alpha^{(13)}_{ik} + \alpha^{(23)}_{jk}$$


Conditional inference

Lemma

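If $p = e^{A^T \alpha} \in \mathcal{M}_A$ and $u \in \mathcal{T}(n)$, then

$$P(U = u) = \frac{n!}{\prod_{i \in \mathcal{R}} u_i!}\, e^{\alpha^T (Au)}.$$

Corollary

Conditional distribution is multivariate hypergeometric:

$$P(U = u \mid AU = Au) = \frac{1 / \prod_{i \in \mathcal{R}} u_i!}{\sum_{v \in \mathcal{F}(u)} 1 / \prod_{i \in \mathcal{R}} v_i!},$$

and does not depend on p.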
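For a very small table the fiber can be enumerated by brute force, which makes the corollary concrete. The following sketch assumes the 2 × 2 independence model and a hypothetical table with n = 10; the function names are illustrative, and the enumeration is far too slow for realistic tables.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod

# Independence model for a 2 x 2 table, cells ordered 11, 12, 21, 22.
A = [
    [1, 1, 0, 0],  # row sum u_{1+}
    [0, 0, 1, 1],  # row sum u_{2+}
    [1, 0, 1, 0],  # column sum u_{+1}
    [0, 1, 0, 1],  # column sum u_{+2}
]

def times(A, u):
    return [sum(a * x for a, x in zip(row, u)) for row in A]

def fiber(A, u):
    """Brute-force enumeration of all non-negative integer v with A v = A u."""
    target, n = times(A, u), sum(u)
    return [v for v in product(range(n + 1), repeat=len(u)) if times(A, v) == target]

def conditional_distribution(A, u):
    """P(U = v | AU = Au) for every v in the fiber, as exact fractions."""
    weights = {v: Fraction(1, prod(factorial(x) for x in v)) for v in fiber(A, u)}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

u = (3, 1, 2, 4)  # hypothetical table with n = 10
dist = conditional_distribution(A, u)
for v, prob in sorted(dist.items()):
    print(v, prob)
```

No probability table p enters the computation, which is the point of the corollary: the weights depend only on the factorials of the cell counts.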


Exact test

Consider the hypothesis testing problem

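$$H_0 : p \in \mathcal{M}_A \quad \text{versus} \quad H_1 : p \notin \mathcal{M}_A.$$

Maximum likelihood estimates $\hat p_i$

Expected counts $\hat u_i = n \hat p_i$ (same for all tables in a fiber $\mathcal{F}(u)$)

Chi-square statistic

$$X^2(U) = \sum_{i \in \mathcal{R}} \frac{(U_i - \hat u_i)^2}{\hat u_i}$$

Exact p-value

$$P(X^2(U) \ge X^2(u) \mid AU = Au)$$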


Markov chain Monte Carlo

Exact p-value is equal to

$$\frac{\sum_{v \in \mathcal{F}(u)} 1_{\{X^2(v) \ge X^2(u)\}} / \prod_{i \in \mathcal{R}} v_i!}{\sum_{v \in \mathcal{F}(u)} 1 / \prod_{i \in \mathcal{R}} v_i!}.$$

Larger counts or tables: prohibitive to sum over entire fiber

Approximate p-value by Markov chain Monte Carlo algorithms for sampling tables from the conditional distribution

With prob 1, MCMC yields sequence of tables $v_t \in \mathcal{F}(u)$ such that the proportion of tables with $X^2(v_t) \ge X^2(u)$ converges to p-value.

Problem

For an irreducible Metropolis-Hastings sampler, find

Finite set of moves that connect any two tables in any fiber.


Lecture outline

1 Fisher’s exact test for 2 × 2 contingency tables

2 Log-linear models for multi-way tables

3 Markov bases for exact conditional inference


Markov basis – Definition

Log-linear model MA associated with matrix A

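Integer kernel $\ker_{\mathbb{Z}}(A)$

Definition

A finite subset $\mathcal{B} \subset \ker_{\mathbb{Z}}(A)$ is a Markov basis for $\mathcal{M}_A$ if for all $u \in \mathcal{T}(n)$ and all pairs $v, v' \in \mathcal{F}(u)$ there exists a sequence $u_1, \dots, u_L \in \mathcal{B}$ such that

$$v' = v + \sum_{k=1}^{L} u_k \quad \text{and} \quad v + \sum_{k=1}^{l} u_k \ge 0 \ \text{ for all } l = 1, \dots, L.$$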

The elements of the Markov basis are called moves.


Metropolis-Hastings algorithm

Input: Contingency table u; Markov basis $\mathcal{B}$ for the model $\mathcal{M}_A$.

Output: Sequence $(X^2(v_t))_{t=1}^{\infty}$ for tables $v_t$ in fiber $\mathcal{F}(u)$.

Step 1: Initialize $v_1 = u$.

Step 2: For t = 1, 2, ... repeat the following steps:

(i) Select uniformly at random a move $u_t \in \mathcal{B}$.

(ii) If $\min(v_t + u_t) < 0$, then set $v_{t+1} = v_t$, else set

$$v_{t+1} = \begin{cases} v_t + u_t & \text{with probability } q, \\ v_t & \text{with probability } 1 - q, \end{cases}$$

where

$$q = \min\left\{ 1, \frac{P(U = v_t + u_t \mid AU = Au)}{P(U = v_t \mid AU = Au)} \right\}.$$

(iii) Compute $X^2(v_t)$.
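The algorithm can be sketched in Python for the special case of the independence model on an r × c table. Everything below is an illustrative assumption rather than code from the lectures: the function names, the flattened row-major table layout, the number of steps, the seed, and the use of the moves ±(e_ij + e_kl − e_il − e_kj) that the slides introduce for this model. The acceptance ratio uses the fact that the conditional probabilities are proportional to 1/∏ v_i!.

```python
import random
from math import factorial

def basic_moves(r, c):
    """Moves +-(e_ij + e_kl - e_il - e_kj), i < k, j < l, as flat row-major tuples."""
    moves = []
    for i in range(r):
        for k in range(i + 1, r):
            for j in range(c):
                for l in range(j + 1, c):
                    m = [0] * (r * c)
                    m[i * c + j] += 1
                    m[k * c + l] += 1
                    m[i * c + l] -= 1
                    m[k * c + j] -= 1
                    moves.append(tuple(m))
                    moves.append(tuple(-x for x in m))
    return moves

def chi_square(v, r, c):
    """Chi-square statistic with expected counts (row sum * column sum) / n.
    Assumes all margins are positive."""
    n = sum(v)
    rows = [sum(v[i * c:(i + 1) * c]) for i in range(r)]
    cols = [sum(v[j::c]) for j in range(c)]
    return sum((v[i * c + j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(r) for j in range(c))

def mcmc_p_value(u, r, c, steps=20000, seed=0):
    """Estimate P(X^2(U) >= X^2(u) | AU = Au) by a Metropolis-Hastings walk on the fiber."""
    rng = random.Random(seed)
    moves = basic_moves(r, c)
    x2_obs = chi_square(u, r, c)
    v, hits = tuple(u), 0
    for _ in range(steps):
        b = rng.choice(moves)
        w = tuple(x + y for x, y in zip(v, b))
        if min(w) >= 0:
            # Ratio of conditional probabilities: prod v_i! / prod w_i!
            q = 1.0
            for x, y in zip(v, w):
                q *= factorial(x) / factorial(y)
            if rng.random() < min(1.0, q):
                v = w
        hits += chi_square(v, r, c) >= x2_obs - 1e-9
    return hits / steps

# Cancer treatment table from the first example, flattened row by row.
print(mcmc_p_value((21, 0, 15, 3), 2, 2))
```

For this table the fiber has only four elements, so the estimate can be compared with the exact value 0.08929; the sampler becomes useful when the fiber is too large to enumerate.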


Markov basis for independence model

Let $e_{ij}$ be the r × c table with a 1 in entry (i, j) and zeros elsewhere.

Proposition

The (unique minimal) Markov basis for the independence model $\mathcal{M}_{X \perp\!\!\!\perp Y}$ consists of the following $2 \cdot \binom{r}{2} \binom{c}{2}$ moves, each having one-norm 4:

$$\mathcal{B} = \{ \pm(e_{ij} + e_{kl} - e_{il} - e_{kj}) : 1 \le i < k \le r,\ 1 \le j < l \le c \}.$$


Independence model – Proof

Idea: Show that we can use elements of $\mathcal{B}$ to bring any two distinct tables in the same fiber closer to one another.

Claim: Given $v \ne u$, $v \in \mathcal{F}(u)$, show that there is $b \in \mathcal{B}$ such that (i) $u + b \ge 0$ and (ii) $\|u - v\|_1 > \|u + b - v\|_1$.

Proof: Recall Au yields row and column sums:

(a) Since $u \ne v$ and Au = Av, there is at least one positive entry in u − v. WLOG, $u_{11} - v_{11} > 0$.

(b) Since Au = Av, there is a negative entry in the first row of u − v. WLOG, $u_{12} - v_{12} < 0$.

(c) Similarly, $u_{22} - v_{22} > 0$.

(d) Let $b = e_{12} + e_{21} - e_{11} - e_{22}$. Then $\|u - v\|_1 > \|u + b - v\|_1$ and $u + b \ge 0$ as desired.
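The proof is constructive, and its steps (a) to (d) translate directly into a procedure that walks from one table to another in the same fiber. The sketch below is a hypothetical helper (the name `connect` and the list-of-lists table layout are assumptions); it presumes that u and v have equal row and column sums.

```python
def connect(u, v):
    """Walk from table u to table v (same row and column sums) using moves
    of the form e_il + e_kj - e_ij - e_kl, never leaving the non-negative
    tables. Returns the list of visited tables."""
    u = [row[:] for row in u]
    path = [[row[:] for row in u]]
    r, c = len(u), len(u[0])
    while u != v:
        # (a) a positive entry of u - v
        i, j = next((i, j) for i in range(r) for j in range(c) if u[i][j] > v[i][j])
        # (b) a negative entry of u - v in the same row
        l = next(l for l in range(c) if u[i][l] < v[i][l])
        # (c) a positive entry of u - v in that column
        k = next(k for k in range(r) if u[k][l] > v[k][l])
        # (d) apply the move; the decremented cells are strictly positive
        u[i][j] -= 1
        u[k][l] -= 1
        u[i][l] += 1
        u[k][j] += 1
        path.append([row[:] for row in u])
    return path

start = [[3, 0, 0], [0, 3, 0], [0, 0, 3]]   # hypothetical tables with equal margins
target = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
path = connect(start, target)
print(len(path) - 1, "moves")
```

Each step reduces the one-norm distance to the target by at least 2, so the loop terminates.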


Symbolic computation – 4ti2

Markov basis of ‘no 3-way interaction model’ for 2 × 2 × 2 table?

Matrix representing model has format 12 × 8 (store in file no3way):

12 8
1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1
1 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0
0 0 0 0 1 0 1 0
0 0 0 0 0 1 0 1
1 0 0 0 1 0 0 0
0 1 0 0 0 1 0 0
0 0 1 0 0 0 1 0
0 0 0 1 0 0 0 1


Symbolic computation – 4ti2

Compute Markov basis (up to sign) using command markov no3way

Output in file no3way.mar:

1 8
1 -1 -1 1 -1 1 1 -1

Two moves

±(e111 + e122 + e212 + e221 − e112 − e121 − e211 − e222)

correspond to the quartic equation

p111p122p212p221 = p112p121p211p222

Recall:

$$p_{ijk} \propto \theta^{(12)}_{ij}\, \theta^{(13)}_{ik}\, \theta^{(23)}_{jk}$$
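As a sanity check on the 4ti2 output, one can rebuild the 12 × 8 matrix from the three two-way margins and confirm that the reported move lies in its integer kernel. This is a sketch with hypothetical helper names; cells are ordered 111, 112, 121, 122, 211, 212, 221, 222 as in the input file.

```python
from itertools import product

cells = list(product((1, 2), repeat=3))  # (i, j, k) in the order 111, 112, ..., 222

def margin_rows(keep):
    """One 0/1 row per value of the pair of indices in `keep` (0-based positions)."""
    rows = []
    for a, b in product((1, 2), repeat=2):
        rows.append([1 if (cell[keep[0]], cell[keep[1]]) == (a, b) else 0
                     for cell in cells])
    return rows

# Rows 1-4: (i, j) margins; rows 5-8: (i, k) margins; rows 9-12: (j, k) margins.
A = margin_rows((0, 1)) + margin_rows((0, 2)) + margin_rows((1, 2))

move = [1, -1, -1, 1, -1, 1, 1, -1]  # the single row of no3way.mar
print([sum(a * m for a, m in zip(row, move)) for row in A])
```

All twelve entries of A · move are zero, so adding the move to a table leaves every two-way margin unchanged.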


Polynomial algebra

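Polynomial ring $\mathbb{R}[p] = \mathbb{R}[p_1, p_2, \dots, p_k]$

For non-negative integer table $u = (u_1, \dots, u_k) \in \mathbb{N}^k$ define monomial

$$p^u = p_1^{u_1} p_2^{u_2} \cdots p_k^{u_k}$$

For integer table $u = u^+ - u^- \in \mathbb{Z}^k$ with positive and negative parts $u^+, u^- \in \mathbb{N}^k$ define binomial

$$p^{u^+} - p^{u^-}$$

Example:

$$p = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}, \quad u = \begin{pmatrix} 2 & -2 \\ -1 & 1 \end{pmatrix} \implies p_{11}^2 p_{22} - p_{12}^2 p_{21}$$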
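The passage from an integer table to its binomial is purely mechanical: split the table into positive and negative parts and read each part as an exponent vector. A small sketch, with a hypothetical function name and a plain-text output format chosen only for illustration:

```python
def binomial(u, names):
    """Return the binomial p^{u+} - p^{u-} of an integer vector u as a string."""
    def monomial(exponents):
        factors = [name if e == 1 else f"{name}^{e}"
                   for name, e in zip(names, exponents) if e > 0]
        return "*".join(factors) or "1"

    plus = [max(x, 0) for x in u]    # positive part u+
    minus = [max(-x, 0) for x in u]  # negative part u-
    return f"{monomial(plus)} - {monomial(minus)}"

# The example table from the slide, flattened row by row.
print(binomial([2, -2, -1, 1], ["p11", "p12", "p21", "p22"]))
```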


Polynomial algebra

A subset I ⊂ R[p] is an ideal if

f , g ∈ I =⇒ f + g ∈ I

f ∈ I , h ∈ R[p] =⇒ hf ∈ I

Hilbert’s basis theorem: Every ideal I has a finite generating set $f_1, \dots, f_m \in \mathbb{R}[p]$, that is,

$$I = \langle f_1, \dots, f_m \rangle = \left\{ \sum_{i=1}^{m} h_i f_i : h_1, \dots, h_m \in \mathbb{R}[p] \right\}$$


Fundamental theorem

Given a matrix $A \in \mathbb{N}^{d \times k}$ for a log-linear model, define the (toric) ideal

$$I_A := \langle\, p^{u^+} - p^{u^-} : u \in \ker_{\mathbb{Z}}(A) \,\rangle \subset \mathbb{R}[p].$$

Theorem (Fundamental theorem of Markov bases)

A subset $\mathcal{B}$ of $\ker_{\mathbb{Z}}(A)$ is a Markov basis if and only if the corresponding set of binomials $\{ p^{b^+} - p^{b^-} : b \in \mathcal{B} \}$ generates the ideal $I_A$. In particular, a (finite) Markov basis always exists.


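Example: Independence model for 2 × 2 table

We have shown that a Markov basis (up to sign) is given by

$$b = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$$

Hence, $I_A = I^* := \langle p_{11} p_{22} - p_{12} p_{21} \rangle$

Example for $I_A \subseteq I^*$: Consider the tables

$$u = \begin{pmatrix} 4 & 1 \\ 2 & 5 \end{pmatrix}, \quad v = \begin{pmatrix} 3 & 2 \\ 3 & 4 \end{pmatrix}.$$

Since u − b = v, we have $u - b^+ = v - b^-$ and thus

$$p_{11}^4 p_{12}^1 p_{21}^2 p_{22}^5 - p_{11}^3 p_{12}^2 p_{21}^3 p_{22}^4 = p_{11}^3 p_{12}^1 p_{21}^2 p_{22}^4 \,(p_{11} p_{22} - p_{12} p_{21}) \in I^*$$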


Computing Markov bases

Theorem

The ideal $I_A$ is a homogeneous ideal and its homogeneous elements are exactly the homogeneous polynomials f in $\mathbb{R}[p]$ that vanish on the log-linear model $\mathcal{M}_A$:

f(p) = 0 for all $p \in \mathcal{M}_A$.

For a matrix $A = (a_{ij}) \in \mathbb{N}^{d \times k}$, compute a Markov basis by eliminating the variables $\theta_1, \dots, \theta_d$ from the equation system

$$p_j - \theta_1^{a_{1j}} \theta_2^{a_{2j}} \cdots \theta_d^{a_{dj}} = 0, \quad j = 1, \dots, k.$$

Software for Gröbner basis calculations: Macaulay 2, Singular, 4ti2


Example: No 3-way interaction in 2 × 2 × 2 table

Equation system:

p111 = α11β11γ11, p112 = α11β12γ12,

p121 = α12β11γ21, p122 = α12β12γ22,

p211 = α21β21γ11, p212 = α21β22γ12,

p221 = α22β21γ21, p222 = α22β22γ22.

Variable elimination: Every relation among pijk is a polynomial multiple of

p111p122p212p221 − p112p121p211p222

Markov basis:

±(e111 + e122 + e212 + e221 − e112 − e121 − e211 − e222)


Singular session

LIB "elim.lib";
// polynomial ring in the cell probabilities p... and the parameters a.., b.., c..
ring R = 0,(p111,p112,p121,p122,p211,p212,p221,p222,
            a11,a12,a21,a22,b11,b12,b21,b22,c11,c12,c21,c22),dp;
ideal M =
  p111 - a11*b11*c11,
  p112 - a11*b12*c12,
  p121 - a12*b11*c21,
  p122 - a12*b12*c22,
  p211 - a21*b21*c11,
  p212 - a21*b22*c12,
  p221 - a22*b21*c21,
  p222 - a22*b22*c22;
// eliminate all parameters; what remains are the relations among the p's
eliminate(M, a11*a12*a21*a22*b11*b12*b21*b22*c11*c12*c21*c22);


Background reading

Cox, D.; Little, J.; O’Shea, D. (2007). Ideals, varieties, and algorithms. Springer, New York.


Database: http://mbdb.mis.mpg.de


Slim and long tables

Theorem

Let $X_1$ be a r.v. with 3 states, and $X_2$ and $X_3$ r.v. with $r_2$ and $r_3$ states, resp. Let $v \in \mathbb{Z}^k$ be any integer vector. There are $r_2, r_3 \in \mathbb{N}$ and a coordinate projection $\pi : \mathbb{Z}^{3 \times r_2 \times r_3} \to \mathbb{Z}^k$ such that every minimal Markov basis for the no 3-way interaction model contains a table u with π(u) = v.

Theorem

Fix a set of interactions Γ for a hierarchical log-linear model, and fix $r_2, \dots, r_m$. There exists a number $b(\Gamma, r_2, \dots, r_m) < \infty$ such that the one-norms of the elements of any minimal Markov basis for Γ on $s \times r_2 \times \cdots \times r_m$ tables are less than or equal to $b(\Gamma, r_2, \dots, r_m)$. This bound is independent of s, which can grow large.


Exercise

Exercises 6.1 and 6.2 in the lecture notes

Perform an exact test for your favorite table

e.g. test ‘no 3-way interaction’ in the example from Agresti (2002) shown earlier:

President   Busing   Home: Yes    No   ???
Yes         Yes             41    65     0
            No              71   157     1
            ???              1    17     0
No          Yes              2     5     0
            No               3    44     0
            ???              1     0     0
???         Yes              0     3     1
            No               0    10     0
            ???              0     0     1

??? = ‘don’t know’


Part II

Likelihood Ratio Tests and Singularities

4 Algebraic statistical models

5 Large-sample asymptotics and Chernoff’s theorem

6 Examples


Lecture outline

4 Algebraic statistical models

5 Large-sample asymptotics and Chernoff’s theorem

6 Examples


Example: Bayesian network

Sachs et al. (2005): Analysis of flow cytometry data

Expression values for 11 proteins discretized −→ ternary variables

Large sample size (observational part: n = 1200)

Bayesian network (conditional independence model):

Typical task: test absence of edges

Likelihood ratio test of absence of ‘PKC → PKA’ can be based on the $\chi^2_4$ distribution

See Chapter 3 in the lecture notes


Chi-square asymptotics

Theorem

Suppose

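(i) $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family ($\Theta \subset \mathbb{R}^k$ open),

(ii) $\Theta_0 \subset \Theta_1$ are smooth submanifolds of Θ,

(iii) True parameter point $\theta_0 \in \Theta_0$.

Then the likelihood ratio statistic for testing

$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \in \Theta_1 \setminus \Theta_0$$

tends to $\chi^2_{\dim(\Theta_1) - \dim(\Theta_0)}$ as n → ∞.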

Theorem covers Bayesian network example because

interior of probability simplex is regular exponential family, and

Bayesian networks define smooth submanifolds.


Regular exponential families

Definition

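Let $\mathcal{P}_\Theta = \{P_\theta : \theta \in \Theta\}$ be a family of probability distributions on $\mathcal{X} \subseteq \mathbb{R}^m$ that have densities with respect to a measure ν. We call $\mathcal{P}_\Theta$ an exponential family if there is a statistic $T : \mathcal{X} \to \mathbb{R}^k$ and functions $h : \Theta \to \mathbb{R}^k$ and $Z : \Theta \to \mathbb{R}$ such that each distribution $P_\theta$ has ν-density

$$p_\theta(x) = \frac{1}{Z(\theta)} \exp\{\langle h(\theta), T(x) \rangle\}, \quad x \in \mathcal{X}.$$

If

$$H = \left\{ \eta \in \mathbb{R}^k : \int_{\mathcal{X}} \exp\{\langle \eta, T(x) \rangle\}\, d\nu(x) < \infty \right\}$$

is an open subset of $\mathbb{R}^k$ and h a diffeomorphism between Θ and H, then we say that $\mathcal{P}_\Theta$ is a regular exponential family of order k.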


Curved exponential families

Definition

Suppose {Pθ : θ ∈ Θ} is a regular exponential family. If Θ0 is a smooth submanifold of Θ, then {Pθ : θ ∈ Θ0} is a curved exponential family.

Well-developed large-sample theory for CEFs

Estimation and confidence intervals:

Maximum likelihood estimators are asymptotically normal.

Hypothesis testing:

Likelihood ratio statistics have asymptotic chi-square distributions.

Wald statistics have asymptotic chi-square distributions.

Model selection:

Bayesian information criterion (BIC) is consistent and connected to the asymptotics of marginal likelihood integrals.


Example: Instrumental variables

Estimate coefficient γ43 in the system

X3 = γ35X5 + ε3,

X4 = γ43X3 + γ45X5 + ε4,

X5 = ε5

with εi ∼ N (0, ωi ) independent

[Figure: path diagram with nodes X3, X4, X5]

Variable X5 hidden: consider distributions

(X1, . . . ,X4) ∼ N(0, Σ(γ, ω))

(γ, ω) → Σ(γ, ω) polynomial parametrization


Example: Instrumental variables

Estimate coefficient γ43 in the system

X1 = ε1,

X2 = ε2,

X3 = γ31X1 + γ32X2 + γ35X5 + ε3,

X4 = γ43X3 + γ45X5 + ε4,

X5 = ε5

with εi ∼ N (0, ωi ) independent

[Figure: path diagram with nodes X1, X2, X3, X4, X5]

Variable X5 hidden

Marginal distribution

(X1, . . . ,X4) ∼ N(0, Σ(γ, ω))


Example: Instrumental variables

Covariance matrix parametrization is a polynomial map:

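$$\Sigma(\gamma, \omega) = \begin{pmatrix}
\omega_1 & 0 & \gamma_{31}\omega_1 & \gamma_{43}\gamma_{31}\omega_1 \\
 & \omega_2 & \gamma_{32}\omega_2 & \gamma_{43}\gamma_{32}\omega_2 \\
 & & \mathrm{Var}[X_3] & \gamma_{43}\mathrm{Var}[X_3] + \gamma_{35}\gamma_{45}\omega_5 \\
 & & & \omega_4 + \gamma_{43}^2 \mathrm{Var}[X_3] + \gamma_{45}^2 \omega_5 + 2\gamma_{45}\gamma_{43}\gamma_{35}\omega_5
\end{pmatrix}$$

with

$$\mathrm{Var}[X_3] = \omega_3 + \gamma_{31}^2 \omega_1 + \gamma_{32}^2 \omega_2 + \gamma_{35}^2 \omega_5$$

Coordinate $\sigma_{ij}$ is a combinatorial expression summing terms associated with ‘treks’

$$i \leftarrow \ell_1 \leftarrow \ell_2 \leftarrow \cdots \leftarrow t \rightarrow \cdots \rightarrow r_2 \rightarrow r_1 \rightarrow j$$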
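The entries of Σ(γ, ω) can be checked numerically. Writing the system as X = ΓX + ε gives the covariance (I − Γ)⁻¹ Ω (I − Γ)⁻ᵀ for all five variables, and the marginal covariance of (X1, ..., X4) is its upper-left block. The sketch below uses plain Python lists; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def implied_covariance(gamma, omega):
    """Covariance of X = Gamma X + eps with independent eps_i ~ N(0, omega_i).

    gamma maps (i, j) to the coefficient of X_j in the equation for X_i
    (1-based, as on the slide). For an acyclic system Gamma is nilpotent,
    so (I - Gamma)^{-1} = I + Gamma + Gamma^2 + ... is a finite sum.
    """
    n = len(omega)
    G = [[gamma.get((i + 1, j + 1), 0.0) for j in range(n)] for i in range(n)]
    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    total, power = identity, identity
    for _ in range(n - 1):
        power = matmul(power, G)
        total = [[t + p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
    scaled = [[total[i][j] * omega[j] for j in range(n)] for i in range(n)]
    transposed = [list(col) for col in zip(*total)]
    return matmul(scaled, transposed)

# Hypothetical parameter values, chosen only for illustration.
gamma = {(3, 1): 0.5, (3, 2): -1.0, (3, 5): 2.0, (4, 3): 1.5, (4, 5): 0.25}
omega = [1.0, 2.0, 3.0, 4.0, 5.0]

full = implied_covariance(gamma, omega)
sigma = [row[:4] for row in full[:4]]  # marginal covariance of (X1, ..., X4)

var_x3 = (omega[2] + gamma[3, 1] ** 2 * omega[0]
          + gamma[3, 2] ** 2 * omega[1] + gamma[3, 5] ** 2 * omega[4])
print(sigma[2][2], var_x3)                                   # Var[X3], two ways
print(sigma[0][3], gamma[4, 3] * gamma[3, 1] * omega[0])     # sigma_14, two ways
```

Each matrix entry computed this way agrees with the corresponding polynomial in the displayed parametrization.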


Example: Instrumental variables

In this hidden variable model test

H0 : γ31 = γ32 = 0

Null distribution of LR statistic (n = 1000)

[Figure: path diagram with nodes X1, ..., X5; plot of the CDF F(x) of the LR statistic, x-axis from 0 to 12, y-axis from 0.0 to 1.0]


Algebraic exponential families

Asymptotic behavior of the LRT in instrumental variables example?

Hidden variable models ≠ curved exponential family

What is a suitable general framework to study hidden variable models?

Definition

Suppose {Pθ : θ ∈ Θ} is a regular exponential family. If Θ0 is a semi-algebraic subset of Θ, then the submodel {Pθ : θ ∈ Θ0} is an algebraic exponential family.


Semi-algebraic sets

Definition

Let $\mathbb{R}[t_1, \dots, t_k]$ be the ring of polynomials in the indeterminates $t_1, \dots, t_k$ with real coefficients. A semi-algebraic set is a finite union of the form

$$\Theta_0 = \bigcup_{i=1}^{m} \{ \theta \in \mathbb{R}^k \mid f(\theta) = 0 \text{ for } f \in F_i \text{ and } h(\theta) > 0 \text{ for } h \in H_i \},$$

where $F_i, H_i \subset \mathbb{R}[t_1, \dots, t_k]$ are collections of polynomials and all $H_i$ are finite.

Theorem (Tarski-Seidenberg)

If $g : \mathbb{R}^d \to \mathbb{R}^k$ is a polynomial map and Γ is a semi-algebraic set, then $\Theta_0 = g(\Gamma)$ is semi-algebraic.


Lecture outline

4 Algebraic statistical models

5 Large-sample asymptotics and Chernoff’s theorem

6 Examples


Likelihood ratio test

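Independent observations $X^{(1)}, \dots, X^{(n)}$ with unknown distribution

Statistical model $\{P_\theta : \theta \in \Theta\}$, $\Theta \subseteq \mathbb{R}^k$

Suppose $P_\theta$ have density functions $p_\theta(x)$. Define likelihood function

$$L_n : \Theta \to \mathbb{R}, \quad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}).$$

Test $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \in \Theta_1 \setminus \Theta_0$ for some $\Theta_0 \subset \Theta_1 \subset \Theta$.

Definition

The likelihood ratio test rejects $H_0$ if the likelihood ratio statistic

$$\lambda_n = 2 \log \frac{\sup_{\theta \in \Theta_1} L_n(\theta)}{\sup_{\theta \in \Theta_0} L_n(\theta)}$$

is “too large” ⟹ p-value $P_{H_0}(\lambda_n \ge \lambda_{\mathrm{obs}})$.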


Canonical example: Normal means

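Normal mean model $\{\mathcal{N}(\theta, I_k) : \theta \in \mathbb{R}^k\}$

Log-likelihood function

$$\ell_n(\theta) = -\frac{nk}{2} \log(2\pi) - \frac{n}{2} \|\bar X_n - \theta\|_2^2 - \frac{1}{2} \sum_{i=1}^{n} \|X^{(i)} - \bar X_n\|_2^2.$$

Sample mean

$$\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X^{(i)}$$

Likelihood ratio statistic for testing $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$:

$$\lambda_n = n \cdot \inf_{\theta \in \Theta_0} \|\bar X_n - \theta\|_2^2 = \inf_{\theta \in \Theta_0} \|\sqrt{n}(\bar X_n - \theta_0) - \sqrt{n}(\theta - \theta_0)\|_2^2$$

where $\theta_0$ is the true parameter.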


Canonical example: Normal means

Asymptotics of LR statistic determined by squared Euclidean distance between $\mathcal{N}(0, I_k)$-point and “limit of $\sqrt{n}(\Theta_0 - \theta_0)$”

Example: Cuspidal cubic

Bivariate normal mean model

$\Theta_0$ cuspidal cubic $\{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$

Tangent cone at $\theta_0 = 0$ is half-ray $\{(\theta_1, \theta_2) : \theta_1 \ge 0,\ \theta_2 = 0\}$

Limiting distribution of LRT is a mixture of chi-squares:

$$\lambda_n \xrightarrow{D} \frac{1}{2}\chi^2_1 + \frac{1}{2}\chi^2_2.$$
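The mixture can be seen in a quick simulation of the limiting experiment: draw Z from N(0, I_2) and take its squared distance to the half-ray. If Z1 ≥ 0 the nearest point is (Z1, 0) and the squared distance is Z2² (a χ²₁ variable); otherwise the nearest point is the origin and the squared distance is Z1² + Z2² (a χ²₂ variable). Each case has probability 1/2. The sketch below uses only the standard library; the sample size and seed are arbitrary choices.

```python
import random

def limiting_draw(rng):
    """Squared distance from Z ~ N(0, I_2) to the half-ray {t1 >= 0, t2 = 0}."""
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return z2 ** 2 if z1 >= 0 else z1 ** 2 + z2 ** 2

rng = random.Random(0)
draws = [limiting_draw(rng) for _ in range(100_000)]

# The mixture (1/2) chi2_1 + (1/2) chi2_2 has mean (1/2) * 1 + (1/2) * 2 = 1.5.
print(sum(draws) / len(draws))
```

The sample mean should be close to 1.5, between the means of χ²₁ and χ²₂, which is what distinguishes this limit from the chi-square limit at a smooth point.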


Chernoff’s theorem: Preparation

Definition (Tangent cone)

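$$TC_{\theta_0}(\Theta_0) = \left\{ \lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} : \beta_n > 0,\ \theta_n \in \Theta_0,\ \theta_n \to \theta_0 \right\}$$

Definition (Fisher-information matrix)

Positive semi-definite matrix $I(\theta)$ with entries

$$I(\theta)_{ij} = E_\theta\left[ \left( \frac{\partial}{\partial \theta_i} \log p_\theta(X) \right) \left( \frac{\partial}{\partial \theta_j} \log p_\theta(X) \right) \right], \quad i, j \in [k].$$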


Chernoff’s theorem (for exponential families)

Theorem

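Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\theta_0 \in \Theta_0 \subseteq \Theta \subseteq \mathbb{R}^k$ be the true parameter point. If $\Theta_0$ is Chernoff-regular at $\theta_0$ and $n \to \infty$, then the LR statistic $\lambda_n$ for $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \notin \Theta_0$ converges to

$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2$$

where $Z \sim N(0, I_k)$ and $I(\theta_0)^{1/2}$ is any matrix square root of the Fisher-information $I(\theta_0)$.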


What is Chernoff-regularity?

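Condition on how tangent cone $TC_{\theta_0}(\Theta_0)$ approximates the set $\Theta_0$ locally at $\theta_0 \in \Theta_0$.

Allows one to pass from $\sup_{\theta \in \Theta_0} \dots$ to $\sup_{\tau \in TC_{\theta_0}(\Theta_0)} \dots$.

For $\theta_0 = 0$:

$$\mathrm{distance}(\theta, TC_0(\Theta_0)) = o(\|\theta\|), \quad \theta \in \Theta_0,$$

$$\mathrm{distance}(\tau, \Theta_0) = o(\|\tau\|), \quad \tau \in TC_0(\Theta_0)$$

Definition

A set $\Theta_0 \subseteq \mathbb{R}^k$ is Chernoff-regular at $\theta_0$ if

for all $\tau \in TC_{\theta_0}(\Theta_0)$ and $\beta_n \searrow 0$ there exists a sequence $\theta_n \to \theta_0$ in $\Theta_0$ such that

$$\lim_{n \to \infty} \frac{\theta_n - \theta_0}{\beta_n} = \tau.$$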


Chernoff-regularity of semi-algebraic sets

Lemma

Semi-algebraic sets are everywhere Chernoff-regular.

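Follows from the ‘curve selection lemma’, which implies that for all $\tau \in TC_{\theta_0}(\Theta)$ there exists a (real analytic) map $\alpha : [0, \varepsilon) \to \Theta$ with $\alpha(0) = \theta_0$ s.t.

$$\tau = \lim_{t \to 0+} \frac{\alpha(t) - \alpha(0)}{t}.$$

Corollary (Testing in a submodel)

Suppose $\{P_\theta : \theta \in \Theta\}$ is a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Let $\Theta_0, \Theta_1$ be semi-algebraic subsets of $\Theta$. If the true parameter $\theta_0$ is in $\Theta_0$ and $n \to \infty$, then the LR statistic for $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \in \Theta_1 \setminus \Theta_0$ converges to

$$\min_{\tau \in TC_{\theta_0}(\Theta_0)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2 - \min_{\tau \in TC_{\theta_0}(\Theta_1)} \|Z - I(\theta_0)^{1/2}\tau\|_2^2, \quad Z \sim N(0, I_k).$$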


Lecture outline

4 Algebraic statistical models

5 Large-sample asymptotics and Chernoff’s theorem

6 Examples


Linear spaces

Lemma

If $\Theta_0$ is a $d$-dimensional linear subspace of $\mathbb{R}^k$ and $X \sim N(0, \Sigma)$ with positive definite covariance matrix $\Sigma$, then

$$\inf_{\theta \in \Theta_0} (X - \theta)^T \Sigma^{-1} (X - \theta) \sim \chi^2_{k-d}.$$

Corollary

Likelihood ratio statistic is asymptotically chi-square when testing linear or smooth hypotheses.


Order-restricted inference

Example:

$X_1$: Difference in blood pressure before and after taking 1 pill
$X_2$: Difference in blood pressure before and after taking 2 pills

Suppose $X_1 \sim N(\mu_1, \sigma_0^2)$ and $X_2 \sim N(\mu_2, \sigma_0^2)$ and test:

$H_0 : \mu_2 \ge \mu_1 \ge 0$ versus $H_1 : (\mu_2 < \mu_1 \text{ or } \mu_1 < 0)$

or possibly,

$H_0 : \mu_2 = \mu_1 = 0$ versus $H_1 : \mu_2 \ge \mu_1 \ge 0$


Mixture of chi-square distributions

$$\tfrac{1}{8} \cdot \chi^2_0 + \tfrac{1}{2} \cdot \chi^2_1 + \tfrac{3}{8} \cdot \chi^2_2$$


Convex cones – ‘Boundary problems’

Lemma

Distance between standard normal random vector and convex cone is distributed like a mixture of chi-square distributions.

Theorem (Miles, 1959; Drton & Klivans, 2009)

(a) $H_0 : \theta \in \{ x \in \mathbb{R}^k : x_1 \le x_2 \le \dots \le x_k \}$
Mixture weights $\propto$ coeff’s of $t(t-1)\cdots(t-k+1)$

(b) $H_0 : \theta \in \{ x \in \mathbb{R}^k : 0 \le x_1 \le x_2 \le \dots \le x_k \}$
Mixture weights $\propto$ coeff’s of $(t-1)(t-3)\cdots(t-2k+1)$.
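
As a small illustration of the theorem, the sketch below expands the two polynomials for $k = 2$ and normalizes the coefficients. It is a hedged Python example, not code from the lectures: the helper names are hypothetical, and reading “$\propto$ coeff’s” as “absolute values of the coefficients, normalized to sum to one” is an assumption. The sketch does not assert which coefficient belongs to which chi-square component.

```python
from fractions import Fraction

def poly_from_roots(roots):
    """Integer coefficients of prod_r (t - r), highest degree first."""
    coeffs = [1]
    for r in roots:
        shifted = coeffs + [0]                    # coeffs(t) * t
        scaled = [0] + [-r * c for c in coeffs]   # coeffs(t) * (-r)
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

def mixture_weights(roots):
    """Absolute coefficients normalized to sum to one (assumed reading of 'proportional')."""
    mags = [abs(c) for c in poly_from_roots(roots)]
    total = sum(mags)
    return [Fraction(m, total) for m in mags]

weights_a = mixture_weights([0, 1])   # t(t - 1), case (a) with k = 2
weights_b = mixture_weights([1, 3])   # (t - 1)(t - 3), case (b) with k = 2
```

For case (b) with $k = 2$ the normalized values are $1/8$, $1/2$ and $3/8$, the same three numbers that appear in the mixture for the order-restricted example $\mu_2 \ge \mu_1 \ge 0$.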


Singularities

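Geometry of a semi-algebraic set $\Theta_0 \subseteq \mathbb{R}^k$ expresses itself algebraically in the vanishing ideal

$$\mathcal{I}(\Theta_0) = \{ f \in \mathbb{R}[t_1, \dots, t_k] : f(\theta) = 0 \text{ for all } \theta \in \Theta_0 \}.$$

Finite generating set

$$\langle f_1, \dots, f_s \rangle = \mathcal{I}(\Theta_0), \quad \{f_1, \dots, f_s\} \subset \mathbb{R}[t_1, \dots, t_k]$$

Definition

A point $\theta_0$ in $\Theta_0$ is a singularity if the rank of the Jacobian matrix

$$J_f(\theta_0) = \left( \frac{\partial f_i(t)}{\partial t_j} \right)_{t = \theta_0} \in \mathbb{R}^{s \times k}$$

is smaller than $k - \dim \Theta_0$.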


Algebraic tangent cone

Let $\theta_0$ be a root of the polynomial $f \in \mathbb{R}[t_1, \dots, t_k]$ and write

$$f(t) = \sum_{h=l}^{L} f_h(t - \theta_0),$$

where $f_h$ is homogeneous, $\mathrm{degree}(f_h) = h$, and $f_l \ne 0$.

Since $f(\theta_0) = 0$, the minimal degree is $l \ge 1$, and we define $f_{\theta_0,\min} = f_l$.

Tangent cone ideal:

$$\{ f_{\theta_0,\min} : f \in \mathcal{I}(\Theta_0) \} \subset \mathbb{R}[t_1, \dots, t_k].$$

Lemma

Suppose $\theta_0$ is a point in the semi-algebraic set $\Theta_0$ and $f \in \mathbb{R}[t_1, \dots, t_k]$ a polynomial such that $f(\theta_0) = 0$ and $f(\theta) \ge 0$ for all $\theta \in \Theta_0$. Then every tangent vector $\tau \in TC_{\theta_0}(\Theta_0)$ satisfies $f_{\theta_0,\min}(\tau) \ge 0$.


Example: Cuspidal cubic

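$\Theta_0 = \{(\theta_1, \theta_2) : \theta_1^3 = \theta_2^2\}$

Tangent cone ideal for $\theta_0 = 0$ is generated by $t_2^2$

Associated algebraic tangent cone

$$\{\theta : \theta_2^2 = 0\} = \{\theta : \theta_2 = 0\}$$

Tangent cone at $\theta_0 = 0$ is half-ray

$$\{\theta : \theta_1 \ge 0,\ \theta_2 = 0\}$$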
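
The lowest-degree form used here can be extracted mechanically. The following is a minimal Python sketch under stated assumptions: polynomials are represented as a dictionary from exponent tuples to coefficients (a hypothetical representation chosen for illustration), and the expansion point is $\theta_0 = 0$, so no shift of variables is needed. It only handles a single polynomial, not a whole ideal.

```python
def lowest_degree_form(poly):
    """Return the homogeneous part of minimal total degree.

    poly maps exponent tuples (a_1, ..., a_k) to nonzero coefficients
    and is assumed to be expanded around theta_0 = 0.
    """
    min_deg = min(sum(exps) for exps in poly)
    return {exps: c for exps, c in poly.items() if sum(exps) == min_deg}

# t1^3 - t2^2, the defining polynomial of the cuspidal cubic
cusp = {(3, 0): 1, (0, 2): -1}
f_min = lowest_degree_form(cusp)
```

The result is the single term $-t_2^2$, which agrees up to sign with the generator $t_2^2$ of the tangent cone ideal stated above.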


Instrumental variables – Singularities

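Covariance matrix

$$\begin{pmatrix} \omega_1 & 0 & \gamma_{31}\omega_1 & \gamma_{43}\gamma_{31}\omega_1 \\ & \omega_2 & \gamma_{32}\omega_2 & \gamma_{43}\gamma_{32}\omega_2 \\ & & \omega_3 + \dots & \gamma_{35}\gamma_{45}\omega_5 + \dots \\ & & & \omega_4 + \dots \end{pmatrix}$$

[Figure: graph on the nodes $X_1, X_2, X_3, X_4, X_5$]

Vanishing ideal $I = \langle \sigma_{12},\ \sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23} \rangle$

Singular locus:

$$\{\Sigma = (\sigma_{ij}) : \sigma_{12} = \sigma_{13} = \sigma_{14} = \sigma_{23} = \sigma_{24} = 0\}$$

coincides with $H_0 : \gamma_{31} = \gamma_{32} = 0$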


Instrumental variables – Tangent cone

Singularities are ‘zero’

Vanishing ideal is homogeneous and thus equal to tangent cone ideal

Algebraic tangent cone at a singularity:

$$\begin{pmatrix} \mathrm{diag}_{2 \times 2} & \mathrm{rank} \le 1 \\ & \mathrm{arbitrary}_{2 \times 2} \end{pmatrix}$$

Geometric tangent cone $TC$ is a closed cone that contains all derivative directions. It is equal to the algebraic cone.


Instrumental variables – Asymptotics

Proposition

Consider testing $H_0 : \gamma_{31} = \gamma_{32} = 0$ in the instrumental variables example. Under the null and as $n \to \infty$,

$$\lambda_n \xrightarrow{d} \max\{\mathrm{eigenvalues}(W_{2 \times 2}(2, I))\}$$

where $W_{2 \times 2}(2, I)$ is a standard Wishart matrix with 2 degrees of freedom.

‘Proof’ (Details in worked exercises 6.4 and 6.5 in lecture notes)

Tangent cone invariant under transformation with matrix square root of Fisher-information

Distance between $2 \times 2$-matrix $A$ and $\{\mathrm{rank} \le 1\}$ given by smaller singular value of $A$


Factor analysis

Factor analysis (conditional independence given hidden variable)

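$$X_1 = \gamma_1 H + \varepsilon_1, \quad X_2 = \gamma_2 H + \varepsilon_2, \quad X_3 = \gamma_3 H + \varepsilon_3, \quad X_4 = \gamma_4 H + \varepsilon_4$$

[Figure: graph with observed nodes $X_1, X_2, X_3, X_4$ and hidden node $H$]

Multivariate normal distributions $N_4(\mu, \Sigma)$ with $\mu \in \mathbb{R}^4$ and $\Sigma$ in

$$\Theta_0 = \{ \Delta + \gamma\gamma^t \mid \Delta \in \mathbb{R}^{4 \times 4}_{pd} \text{ diagonal},\ \gamma \in \mathbb{R}^4 \}$$

Software (e.g. factanal in R) for testing

$$H_0 : \Sigma \in \Theta_0 \quad \text{vs.} \quad H_1 : \Sigma \notin \Theta_0,$$

uses LRT and $\chi^2_2$-approximation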


Factor analysis

Histograms of 20,000 simulated p-values for sample size n = 1000:

[Figure: four histograms of simulated p-values, one panel each for $\Gamma = (1, 1, 1, 1)^t$, $\Gamma = (1, 1, 1, 0)^t$, $\Gamma = (1, 1, 0, 0)^t$ and $\Gamma = (1, 0, 0, 0)^t$; horizontal axis: p-value]

Factor loadings 0 or 1, cond. variances 1/3 =⇒ correlations 0 or 3/4.

Three types of limiting distributions?


Factor analysis – Singular session

LIB "sing.lib";
LIB "linalg.lib";

ring R = 0,(s11,s12,s13,s14, s22,s23,s24, s33,s34, s44,
            d1,d2,d3,d4, g1,g2,g3,g4),dp;

// Compute the vanishing ideal by elimination
ideal F = s11-(d1+g1^2), s12-g1*g2, s13-g1*g3, s14-g1*g4,
          s22-(d2+g2^2), s23-g2*g3, s24-g2*g4,
          s33-(d3+g3^2), s34-g3*g4,
          s44-(d4+g4^2);

ideal I = eliminate(F, d1*d2*d3*d4*g1*g2*g3*g4);
I;


Factor analysis – Singular session

ring RR = 0,(s11,s12,s13,s14, s22,s23,s24, s33,s34, s44),dp;
ideal I = fetch(R,I);
dim(groebner(I));

// Compute the singularities
ideal S = slocus(I);
S;
primdecGTZ(S);

// Tangent cone at diagonal matrix
tangentcone(I);
// at matrix with s12=1
tangentcone( subst(I,s12,s12+1) );
// at regular point with s12=s13=1
tangentcone( subst(I,s12,s12+1,s13,s13+1) );


Factor analysis: Singularities and tangent cones

Theorem (D, 2009)

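(i) A covariance matrix $\Sigma$ is a singularity of the one-factor model if and only if $\Sigma$ has at most one non-zero off-diagonal entry $\sigma_{ij}$, $i < j$.

(ii) If $\Sigma$ is diagonal then the tangent cone is the topological closure of $\{ \Delta + \gamma\gamma^t \mid \Delta \in \mathbb{R}^{m \times m} \text{ diagonal},\ \gamma \in \mathbb{R}^m \}$.

(iii) If $\Sigma$ has exactly one non-zero off-diagonal entry that is positive, say $\sigma_{12} > 0$, then the tangent cone is the set of symmetric matrices

$$\theta = \begin{pmatrix} \theta_{11} & \theta_{12} & \theta_{13} & \dots & \theta_{1m} \\ \theta_{12} & \theta_{22} & c\,\theta_{13} & \dots & c\,\theta_{1m} \\ & & \theta_{33} & & \\ & & & \ddots & \\ & & & & \theta_{mm} \end{pmatrix}, \quad c \in \left[ \frac{\sigma_{12}}{\sigma_{11}}, \frac{\sigma_{22}}{\sigma_{12}} \right].$$

Case $\sigma_{12} < 0$ is similar with $c < 0$.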


Exercise: RC association model (Haberman, 1981)

Two discrete r.v. $X_1$ and $X_2$ with $r_1$ and $r_2$ states, respectively.

Logarithmic parametrization

$$\log p_{ij} = \alpha_i + \beta_j + \gamma_i \delta_j, \quad i \in [r_1],\ j \in [r_2]$$

What are the singularities? (in log-prob coordinates)

What do the tangent cones at the singularities look like?

What is the asymptotic distribution for the likelihood ratio statistic when testing the independence model $X_1 \perp\!\!\!\perp X_2$ against the RC association model?


Part III

Bayesian Integrals

7 Information criteria for model selection

8 Marginal likelihood integrals

9 Resolution of singularities and Newton polyhedra

10 Reduced rank regression


Model selection: Setup

Observations $X^{(1)}, \dots, X^{(n)} \sim P$ i.i.d.

Unknown $P$ assumed to be in (identifiable) ambient statistical model

$$\{P_\theta : \theta \in \Theta\}, \quad \Theta \subseteq \mathbb{R}^k.$$

True parameter $\theta_0$ is such that $P_{\theta_0} = P$.

Call the submodel given by $\Theta_0 \subset \Theta$ true if $\theta_0 \in \Theta_0$.

Model selection problem

Find the “simplest” true model from a set of competing submodels associated with

$$\Theta_1, \Theta_2, \dots, \Theta_M \subseteq \Theta.$$


Score-based search

Strategy

Assign a score to each model and maximize the score.

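Assume densities $p_\theta(x)$, and define the likelihood function

$$L_n : \Theta \to \mathbb{R}, \quad \theta \mapsto \prod_{i=1}^{n} p_\theta(X^{(i)}).$$

For submodel $\Theta_i$, let

$$\hat\ell_n(i) = \sup\{ \log L_n(\theta) \mid \theta \in \Theta_i \}, \quad i = 1, \dots, M.$$

If $\Theta_1 \subseteq \Theta_2$, then $\hat\ell_n(1) \le \hat\ell_n(2)$.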


Information criteria

Definition

The information criterion associated with a penalty function $\pi_n : [M] \to \mathbb{R}$ assigns the score

$$\tau_n(i) = \hat\ell_n(i) - \pi_n(i)$$

to the $i$-th model, $i = 1, \dots, M$.

Example

AIC: $\pi_n(i) = \dim(\Theta_i)$ (Akaike)

BIC: $\pi_n(i) = \frac{\dim(\Theta_i)}{2} \log(n)$ (Bayesian, Schwarz)

Information criteria strike a balance between model fit and model dimensionality.


Basic consistency result

Theorem (compare Haughton, 1988)

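Consider a regular exponential family $(P_\theta \mid \theta \in \Theta)$. In particular, $\Theta \subseteq \mathbb{R}^k$ is open. Let $\Theta_1, \Theta_2 \subseteq \Theta$ be any two sets.

1 Suppose $\theta_0 \in \Theta_2 \setminus \Theta_1$. If $\frac{1}{n} |\pi_n(2) - \pi_n(1)| \to 0$ as $n \to \infty$, then

$$\lim_{n \to \infty} P_{\theta_0}(\tau_n(1) < \tau_n(2)) = 1.$$

2 Suppose $\theta_0 \in \Theta_1 \cap \Theta_2$. If $\pi_n(1) - \pi_n(2) \to \infty$ as $n \to \infty$, then

$$\lim_{n \to \infty} P_{\theta_0}(\tau_n(1) < \tau_n(2)) = 1.$$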


Consistency

Corollary

Suppose a collection of models is given by closed sets $\Theta_1, \Theta_2, \dots, \Theta_M$. If the collection is closed under intersections, and $\Theta_i \subset \Theta_j$ implies $\dim(\Theta_i) < \dim(\Theta_j)$, then:

1 AIC identifies a true model with probability tending to one as $n \to \infty$.

2 BIC identifies the smallest true model with probability tending to one as $n \to \infty$.

Example

1 Linear regression (random design)

2 Undirected graphical models

3 Determining rank in reduced-rank regression (‘singularities’)

4 Determining number of factors in factor analysis (‘singularities’)

5 Directed graphical models (‘faithfulness’), hidden var’s (‘singularities’)


Lecture outline

7 Information criteria for model selection

8 Marginal likelihood integrals

9 Resolution of singularities and Newton polyhedra

10 Reduced rank regression


Bayesian model determination

Prior probability of model $i$:

$$P(\Theta_i), \quad i = 1, \dots, M$$

Prior distribution of parameter in model $i$:

$$Q_i(\theta), \quad \theta \in \Theta_i$$

Likelihood function:

$$L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) = \prod_{i=1}^{n} p_\theta(X^{(i)})$$

Posterior probability of model $i$:

$$P(\Theta_i \mid X^{(1)}, \dots, X^{(n)}) \propto P(\Theta_i) \underbrace{\int_{\Theta_i} L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) \, dQ_i(\theta)}_{\text{marginal/integrated likelihood}}$$


Marginal likelihood

In typical applications, the models are parametrized:

$$\theta = g_i(\gamma), \quad \gamma \in \mathbb{R}^d$$

Priors $Q_i$ specified via distributions on $\gamma$ that have densities $p_i(\gamma)$

Marginal likelihood for one model (suppressing index $i$):

$$\mu_n = \int_{\mathbb{R}^d} L_n\big( g(\gamma) \mid X^{(1)}, \dots, X^{(n)} \big)\, p(\gamma) \, d\gamma = \int_{\mathbb{R}^d} e^{\ell_n( g(\gamma) \mid X^{(1)}, \dots, X^{(n)} )}\, p(\gamma) \, d\gamma$$

Frequentist view

Suppose $X^{(1)}, \dots, X^{(n)}, \dots \sim P_{\theta_0}$ are i.i.d. with $\theta_0 = g(\gamma_0)$.

What is the asymptotic behavior of the sequence $(\mu_n)$?


Asymptotics for marginal likelihood integrals

Theorem (Laplace approximation; Haughton, 1988)

Let $\{P_\theta : \theta \in \Theta\}$ be a regular exponential family with $\Theta \subseteq \mathbb{R}^k$. Consider an open set $\Gamma \subseteq \mathbb{R}^d$ and a smooth injective map $g : \Gamma \to \mathbb{R}^k$ with continuous inverse. Let $\theta_0 = g(\gamma_0)$ be the true parameter, and assume that the prior density $p(\gamma)$ is smooth and positive in a neighborhood of $\gamma_0$. Then

$$\log \mu_n = \hat\ell_n - \frac{d}{2} \log(n) + O_p(1),$$

where

$$\hat\ell_n = \sup_{\gamma \in \Gamma} \ell_n\big( g(\gamma) \mid X^{(1)}, \dots, X^{(n)} \big).$$

Recall: $R_n = O_p(1)$ if $\forall \varepsilon > 0 \ \exists M_\varepsilon \ \forall n : P(|R_n| > M_\varepsilon) < \varepsilon$

Haughton actually gives an expansion of $\log \mu_n$ up to $O_p(n^{-1/2})$


Example: Normal means model

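Observations:

$$X^{(1)}, \dots, X^{(n)} \sim N(\theta, I_{k \times k}), \quad \theta \in \Theta = \mathbb{R}^k$$

Likelihood function:

$$L_n(\theta \mid X^{(1)}, \dots, X^{(n)}) = \left( \frac{1}{\sqrt{(2\pi)^k}} \right)^{n} \exp\left\{ -n \cdot \tfrac{1}{2} \|\bar X_n - \theta\|^2 \right\}$$

Model parametrization $g : \mathbb{R}^d \to \mathbb{R}^k$

Marginal likelihood

$$\mu_n = C_n \int_{\mathbb{R}^d} \exp\left\{ -n \cdot \tfrac{1}{2} \|\bar X_n - g(\gamma)\|^2 \right\} p(\gamma) \, d\gamma$$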


Cuspidal cubic

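Model $\Theta_0 = \{\theta \in \mathbb{R}^2 : \theta_2^2 = \theta_1^3\}$

Parametrized by $g(\gamma) = (\gamma^2, \gamma^3)$

If $\gamma_0 \ne 0$, i.e., $g(\gamma_0) \ne 0$, then Haughton’s Theorem applies.

If $\theta_0 = g(\gamma_0) \ne 0$, then

$$\log \int_{-\infty}^{\infty} \exp\left\{ -n \cdot \tfrac{1}{2} \|\bar X_n - g(\gamma)\|^2 \right\} p(\gamma) \, d\gamma = -\tfrac{1}{2} \log(n) + O_p(1).$$

(Exponent $\approx$ quadratic in $\gamma$, Gaussian density with variance $c/n$)

What if $\theta_0 = 0 \iff \gamma_0 = 0$?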


Cuspidal cubic

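Integral with normalizing constant omitted:

$$\int_{-\infty}^{\infty} \exp\left\{ -\tfrac{1}{2} \left[ (\sqrt{n}\gamma^2 - \sqrt{n}\bar X_{n,1})^2 + (\sqrt{n}\gamma^3 - \sqrt{n}\bar X_{n,2})^2 \right] \right\} p(\gamma) \, d\gamma$$

Change of variables $\bar\gamma = n^{1/4}\gamma$:

$$n^{-1/4} \int_{-\infty}^{\infty} \exp\left\{ -\tfrac{1}{2} \left[ (\bar\gamma^2 - \sqrt{n}\bar X_{n,1})^2 + \left( \frac{\bar\gamma^3}{n^{1/4}} - \sqrt{n}\bar X_{n,2} \right)^2 \right] \right\} p\left( \frac{\bar\gamma}{n^{1/4}} \right) d\bar\gamma.$$

Let $\theta_0 = 0$ and $Z_1, Z_2 \overset{\text{ind}}{\sim} N(0, 1)$. Limit when multiplying by $n^{1/4}$:

$$\int_{-\infty}^{\infty} \exp\left\{ -\tfrac{1}{2} \left[ (\bar\gamma^2 - Z_1)^2 + Z_2^2 \right] \right\} p(0) \, d\bar\gamma.$$

Hence, $\log \mu_n = \hat\ell_n - \tfrac{1}{4} \log(n) + O_p(1)$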


Observation

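Sequence of random integrals:

$$\log \mu_n = \log \int_{\mathbb{R}^d} C_n \exp\left\{ -n \cdot \tfrac{1}{2} \|\bar X_n - g(\gamma)\|^2 \right\} p(\gamma) \, d\gamma = \begin{cases} \hat\ell_n - \tfrac{1}{2} \log(n) + O_p(1) & \text{if } \gamma_0 \ne 0, \\ \hat\ell_n - \tfrac{1}{4} \log(n) + O_p(1) & \text{if } \gamma_0 = 0 \end{cases}$$

Deterministic integrals (replace $\bar X_n$ by its expectation $\theta_0 = g(\gamma_0)$):

$$\log \int_{-\infty}^{\infty} C_n \exp\left\{ -n \cdot \tfrac{1}{2} \|g(\gamma_0) - g(\gamma)\|^2 \right\} p(\gamma) \, d\gamma = \begin{cases} n \log(C) - \tfrac{1}{2} \log(n) + O(1) & \text{if } \gamma_0 \ne 0, \\ n \log(C) - \tfrac{1}{4} \log(n) + O(1) & \text{if } \gamma_0 = 0 \end{cases}$$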

Same asymptotics!
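
The $-\tfrac14 \log(n)$ rate of the deterministic integral at $\gamma_0 = 0$ can be observed numerically. The Python sketch below is an illustration under stated assumptions, not a computation from the lectures: it takes $g(\gamma) = (\gamma^2, \gamma^3)$, drops the constant $C_n$, and uses a flat prior on $[-1, 1]$ (an assumed choice), so the integrand is $\exp\{-\tfrac{n}{2}(\gamma^4 + \gamma^6)\}$. The function name, grid size and the two values of $n$ are hypothetical.

```python
import math

def laplace_integral(n, num_points=200_001):
    """Trapezoidal approximation of the integral over [-1, 1] of
    exp(-(n/2) * (g^4 + g^6)), i.e. ||g(gamma)||^2 for g = (gamma^2, gamma^3)."""
    h = 2.0 / (num_points - 1)
    total = 0.0
    for i in range(num_points):
        g = -1.0 + i * h
        weight = 0.5 if i in (0, num_points - 1) else 1.0
        total += weight * math.exp(-0.5 * n * (g ** 4 + g ** 6))
    return total * h

n_small, n_large = 10_000, 1_000_000
# Slope of log(integral) against log(n); the stated asymptotics suggest about -1/4.
slope = (math.log(laplace_integral(n_large)) - math.log(laplace_integral(n_small))) / (
    math.log(n_large) - math.log(n_small)
)
```

A slope near $-0.25$ rather than $-0.5$ reflects that the exponent is quartic, not quadratic, in $\gamma$ near the cusp.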


Laplace integrals

Theorem

Let $\{P_\theta : \theta \in \Theta\}$ be a regular exponential family. Consider a polynomial map $g : \mathbb{R}^d \to \Theta$, and let $\theta_0 = g(\gamma_0)$ be the true parameter. Assume that the prior density $p(\gamma)$ is smooth and positive on a compact and semi-analytic supporting set. Then

$$\log \mu_n = \hat\ell_n - q \log(n) + (s - 1) \log\log(n) + O_p(1),$$

where the rational number $q \in (0, d/2]$ and the integer $s \in [d]$ satisfy

$$\log \int e^{-n\|g(\gamma) - \theta_0\|^2} p(\gamma) \, d\gamma = -q \log(n) + (s - 1) \log\log(n) + O(1).$$

Remark

The remainder can be shown to converge in distribution.


Watanabe’s book

The theorem is proven in the book by Watanabe.

Watanabe also discusses algebraic techniques for computing the learning coefficient = growth index $q$ and the multiplicity $s$

Singular integrals:

Arnol’d, V.I.; Gusein-Zade, S.M.; Varchenko, A.N. Singularities of differentiable maps. Vol. I & II, 1985/88.

Work by Michael Greenblatt at UIC


Example: Sample vs true mean in normal means model

Random integral

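$$\log \mu_n = \log \int_{\mathbb{R}^d} \exp\left\{ -n \cdot \tfrac{1}{2} \|\bar X_n - g(\gamma)\|^2 \right\} p(\gamma) \, d\gamma$$

Simple bound for any $a > 0$:

$$2|\langle \bar X_n - \theta_0,\ g(\gamma) - \theta_0 \rangle| \le a\|\bar X_n - \theta_0\|^2 + \frac{1}{a}\|g(\gamma) - \theta_0\|^2$$

Bound in exponent:

$$\|\bar X_n - g(\gamma)\|^2 \overset{a=1}{\le} 2\|g(\gamma) - \theta_0\|^2 + 2\|\bar X_n - \theta_0\|^2$$

$$\|\bar X_n - g(\gamma)\|^2 \overset{a=2}{\ge} \tfrac{1}{2}\|g(\gamma) - \theta_0\|^2 - \|\bar X_n - \theta_0\|^2$$

If the deterministic integral based on $e^{-n\|g(\gamma) - \theta_0\|^2}$ has an asymptotic expansion then the random integrals have the same growth behavior.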


Lecture outline

7 Information criteria for model selection

8 Marginal likelihood integrals

9 Resolution of singularities and Newton polyhedra

10 Reduced rank regression


Zeta function

Polynomial map $f : \mathbb{R}^d \to [0, \infty)$

Smooth prior $p(\gamma)$, positive on compact semi-analytic support

Laplace integral

$$\int e^{-nf(\gamma)} p(\gamma) \, d\gamma$$

Zeta function:

$$\zeta(\lambda) = \int f(\gamma)^\lambda p(\gamma) \, d\gamma, \quad \lambda \in \mathbb{C},\ \mathrm{Re}(\lambda) > 0$$

Theorem

The zeta function $\zeta(\lambda)$ can be continued (uniquely) to a meromorphic function on all of $\mathbb{C}$. All poles are negative rational numbers. The negated growth index $-q$ is the largest pole of $\zeta(\lambda)$ and the multiplicity $s$ is the multiplicity of this pole.


Local view

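For large $n$, main contribution to

$$\int e^{-nf(\gamma)} p(\gamma) \, d\gamma$$

comes from a neighborhood of

$$V_f = \{\gamma : f(\gamma) = 0\} \cap \mathrm{supp}(p).$$

Since the prior support is assumed compact, study the asymptotics of

$$\int_{U(\gamma_0)} e^{-nf(\gamma)} p(\gamma) \, d\gamma, \quad U(\gamma_0) \text{ small neighborhood of } \gamma_0,$$

for all $\gamma_0 \in V_f$

Note: For marginal likelihood $f(\gamma) = 0 \iff g(\gamma) = \theta_0$

(‘identifiability’ issues)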


Resolution of singularities

Theorem (Hironaka, 1964; Atiyah, 1970)

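In the considered setup, for every $\gamma_0 \in V_f$, there exists

a neighborhood $U(\gamma_0)$ of $\gamma_0 \in \mathbb{R}^d$ and

changes of coordinates

such that the zeta function becomes a finite sum of the form

$$\int_{U(\gamma_0)} f(\gamma)^\lambda p(\gamma) \, d\gamma = \sum_\alpha \int_{[0,b]^d} \left( u_1^{2k_1(\alpha)} \cdots u_d^{2k_d(\alpha)} \right)^\lambda \varphi_\alpha(u)\, u_1^{h_1(\alpha)} \cdots u_d^{h_d(\alpha)} \, du,$$

where the $\varphi_\alpha$ are smooth and bounded away from zero on $[0, b]^d$.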


Largest pole and multiplicity

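Once in ‘normal crossing form’, meromorphic continuation and determination of poles are clear.

Example:

$$\int (u^{2k})^\lambda u^h \, du = \frac{u^{2k\lambda + h + 1}}{2k\lambda + h + 1}, \quad \text{Pole } \lambda = -\frac{h + 1}{2k}$$

Growth index:

$$q = \min_\alpha \min_{1 \le j \le d} \frac{h_j(\alpha) + 1}{2k_j(\alpha)}$$

Multiplicity:

$$s = \max_\alpha \# \left\{ j : \frac{h_j(\alpha) + 1}{2k_j(\alpha)} = q \right\}$$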
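
The two formulas translate directly into a short computation once the exponents of each chart are known. The following Python sketch is a hedged illustration with a hypothetical function name and input format: each chart is a pair of tuples holding the exponents $2k_j(\alpha)$ and $h_j(\alpha)$. Coordinates with $2k_j(\alpha) = 0$ are skipped, on the assumption that they contribute no pole. The sample charts are the normal crossing charts obtained for $x^4 + y^6$ in the blow-up example of these slides.

```python
from fractions import Fraction

def growth_index_and_multiplicity(charts):
    """Compute (q, s) from charts given as (two_k, h) pairs of exponent tuples."""
    ratios_per_chart = [
        # (h_j + 1) / (2 k_j) for every coordinate that actually appears in f
        [Fraction(hj + 1, tkj) for tkj, hj in zip(two_k, h) if tkj > 0]
        for two_k, h in charts
    ]
    q = min(r for ratios in ratios_per_chart for r in ratios)
    s = max(sum(1 for r in ratios if r == q) for ratios in ratios_per_chart)
    return q, s

charts = [
    ((4, 0), (1, 0)),     # x1^4 * (unit),        Jacobian x1
    ((0, 6), (0, 2)),     # y2^6 * (unit),        Jacobian y2^2
    ((12, 4), (4, 1)),    # x1^12 y1^4 * (unit),  Jacobian x1^4 y1
    ((6, 12), (2, 4)),    # x2^6 y2^12 * (unit),  Jacobian x2^2 y2^4
]
q, s = growth_index_and_multiplicity(charts)
```

For these charts the smallest ratio is $5/12$ and it is attained by a single coordinate in any one chart, giving $q = 5/12$ and $s = 1$.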


Example: Blow-up transformations

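Product interval

$$\int_{-1}^{1} \int_{-1}^{1} e^{-n \cdot (x^4 + y^6)} \, dy \, dx \sim n^{-1/4} n^{-1/6} \cdot C = n^{-5/12} \cdot C$$

Resolve by repeatedly applying blow-up transformation, i.e., the pair

$$x = x_1,\ y = x_1 y_1; \qquad x = x_2 y_2,\ y = y_2.$$

[Figure: illustration of the blow-up transformation $x = x'y'$, $y = y'$]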


Example: Blow-up transformations

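First blow-up transformation gives

$$\begin{aligned} x^4 + y^6 &= x_1^4 (1 + x_1^2 y_1^6) && \text{Jacob. } x_1 \\ &= y_2^4 (x_2^4 + y_2^2) && \text{Jacob. } y_2 \end{aligned}$$

In 1st coordinates normal crossing, $4\lambda + 2 = 0$, pole $-\tfrac{1}{2}$

In 2nd coordinates not normal crossing, repeat

$$\begin{aligned} y^4 (x^4 + y^2) &= x_1^6 y_1^4 (x_1^2 + y_1^2) && \text{Jacob. } x_1^2 y_1 \\ &= y_2^6 (1 + x_2^4 y_2^2) && \text{Jacob. } y_2^2 \end{aligned}$$

In 2nd coordinates normal crossing, $6\lambda + 3 = 0$, pole $-\tfrac{1}{2}$

In 1st coordinates not normal crossing, repeat

$$\begin{aligned} x^6 y^4 (x^2 + y^2) &= x_1^{12} y_1^4 (1 + y_1^2) && \text{Jacob. } x_1^4 y_1 \\ &= x_2^6 y_2^{12} (1 + x_2^2) && \text{Jacob. } x_2^2 y_2^4 \end{aligned}$$

Normal crossing in both coordinates: $q = \tfrac{5}{12}$, $s = 1$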


Resolution – Singular session

LIB "resolve.lib";
ring R = 0,(x,y),dp;

ideal J = x4+y6;
list L=resolve(J);
presentTree(L);

list L=resolve(J,0,"A");
presentTree(L);
LIB "reszeta.lib";
list coll=collectDiv(L);
LIB "resgraph.lib";
ResTree(L,coll[1]);


Distance of Newton polyhedron

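$$\int_{-1}^{1}\int_{-1}^{1} e^{-n(x^4 + y^6)}\,dy\,dx \;\sim\; n^{-1/4}\, n^{-1/6}\cdot C = n^{-5/12}\cdot C$$

[Figure: Newton polyhedron with labeled points (6,0), (0,4) and (12/5,12/5).]

Distance:

$$\rho = 4\cdot\tfrac{3}{5} = 6\cdot\tfrac{2}{5} = \tfrac{12}{5} \;\Longrightarrow\; q = \frac{1}{\rho}$$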
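The rate $n^{-5/12}$ can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule is accurate enough here; the helper names are hypothetical. It uses the fact that the integrand factorizes into a function of x times a function of y, and estimates the slope of the log of the integral against log n.

```python
import math


def midpoint_integral(g, lower, upper, steps=20000):
    """Composite midpoint rule for the integral of g over [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(g(lower + (i + 0.5) * width) for i in range(steps))


def laplace_integral(n):
    """Integral of exp(-n (x^4 + y^6)) over [-1, 1]^2, as a product of two 1-D integrals."""
    over_x = midpoint_integral(lambda x: math.exp(-n * x**4), -1.0, 1.0)
    over_y = midpoint_integral(lambda y: math.exp(-n * y**6), -1.0, 1.0)
    return over_x * over_y


def log_log_slope(n_small, n_large):
    """Slope of log(integral) against log(n) between two values of n."""
    rise = math.log(laplace_integral(n_large)) - math.log(laplace_integral(n_small))
    return rise / (math.log(n_large) - math.log(n_small))


slope = log_log_slope(1e3, 1e5)
print(slope)  # expected to be close to -5/12
```

A slope near −5/12 ≈ −0.4167 is what the growth index q = 1/ρ = 5/12 predicts.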


Newton polyhedron

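Polynomial

$$f(x) = \sum_{a \in \mathbb{N}^d} c_a x^a, \qquad x^a = x_1^{a_1} \cdots x_d^{a_d}$$

Newton polyhedron $P_f$ is the convex hull of the set

$$\bigcup_{a \,:\, c_a \neq 0} \left(\{a\} + [0,\infty)^d\right)$$

Distance:

$$\rho = \min\{r : r \cdot 1_d \in P_f\}$$

For $A \subset \mathbb{R}^d$, define

$$f_A(x) = \sum_{a \in A \cap \mathbb{N}^d} c_a x^a$$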
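For two variables the distance ρ can be computed exactly from the exponent vectors. The sketch below is an illustration under stated assumptions, not the lecture's method: the function name is hypothetical, and it relies on the observation that (r, r) lies in $P_f$ exactly when some convex combination of exponent vectors is at most (r, r) in each coordinate, with at most two exponent vectors needed when d = 2.

```python
from fractions import Fraction
from itertools import combinations


def newton_distance_2d(exponents):
    """Distance rho = min{r : (r, r) in P_f} for a bivariate polynomial.

    `exponents` lists the vectors (a1, a2) that carry a nonzero coefficient.
    """
    points = [(Fraction(a1), Fraction(a2)) for a1, a2 in exponents]
    # A single exponent vector a: the orthant a + [0, inf)^2 meets the diagonal at max(a).
    best = min(max(p) for p in points)
    # Two exponent vectors: find where the segment between them crosses the diagonal.
    for (a1, a2), (b1, b2) in combinations(points, 2):
        denominator = (a1 - a2) + (b2 - b1)
        if denominator == 0:
            continue  # segment parallel to the diagonal; its endpoints are already covered
        t = (b2 - b1) / denominator  # weight on a that makes both coordinates equal
        if 0 <= t <= 1:
            best = min(best, t * a1 + (1 - t) * b1)
    return best


rho = newton_distance_2d([(4, 0), (0, 6)])  # exponents of x^4 + y^6
print(rho, 1 / rho)  # 12/5 5/12
```

For x^4 + y^6 the crossing weight is t = 3/5, which reproduces ρ = 4 · 3/5 = 12/5 and q = 1/ρ = 5/12.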


Non-degenerate exponents and remoteness

Theorem

If the polynomial $f$ has a minimum at zero and is non-degenerate, that is, for any compact face $A$ of the Newton polyhedron the equation system

$$\frac{\partial f_A(x)}{\partial x_1} = \dots = \frac{\partial f_A(x)}{\partial x_d} = 0$$

has no solution in $(\mathbb{R} \setminus \{0\})^d$, then for small $\varepsilon$ the growth index for the integral

$$\int_{[-\varepsilon,\varepsilon]^d} e^{-n f(\gamma)}\, p(\gamma)\, d\gamma$$

is $q = 1/\rho$ and the multiplicity $s$ is the codimension of the lowest-dimensional face containing the point at which the ray spanned by $1_d$ first intersects the Newton polyhedron.
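As a sanity check, the theorem can be applied to $f(x,y) = x^4 + y^6$ from the blow-up example. This is a worked sketch, writing exponent vectors as (power of x, power of y), so that the compact faces of the Newton polyhedron are the two vertices $(4,0)$, $(0,6)$ and the edge joining them.

```latex
\begin{aligned}
A &= \{(4,0)\}:           & f_A &= x^4,       & (\partial_x f_A,\ \partial_y f_A) &= (4x^3,\ 0),    \\
A &= \{(0,6)\}:           & f_A &= y^6,       & (\partial_x f_A,\ \partial_y f_A) &= (0,\ 6y^5),    \\
A &= \text{edge}:         & f_A &= x^4 + y^6, & (\partial_x f_A,\ \partial_y f_A) &= (4x^3,\ 6y^5).
\end{aligned}
```

In each case at least one partial derivative is nonzero whenever $x \neq 0$ and $y \neq 0$, so none of the systems has a solution in $(\mathbb{R} \setminus \{0\})^2$ and $f$ is non-degenerate. The ray spanned by $1_2$ first meets the polyhedron at $(12/5, 12/5)$, which lies in the interior of the edge, a face of codimension 1. The theorem therefore gives $q = 5/12$ and $s = 1$, in agreement with the blow-up computation.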


Lecture outline

7 Information criteria for model selection

8 Marginal likelihood integrals

9 Resolution of singularities and Newton polyhedra

10 Reduced rank regression


Reduced rank regression

Multivariate regression model

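$$Y = \theta X + \varepsilon, \qquad \theta \in \mathbb{R}^{a \times b}, \qquad \operatorname{rank}(\theta) \le h$$

[Figure: diagram with nodes X1, X2, H, Y1, Y2.]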

Multivariate normal model (random design X )

Parametrize

$$\theta = g(\alpha, \beta) = \alpha\beta^T, \qquad \alpha \in \mathbb{R}^{a \times h}, \quad \beta \in \mathbb{R}^{b \times h}$$

Model selection problem: Determine h

WLOG: Assume coordinates of X and ε mutually independent with known variances.
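The parametrization θ = αβ^T can be illustrated with a few lines of Python. This is only a sketch: the function names are hypothetical, matrices are plain nested lists, and the rank is computed by exact Gaussian elimination over the rationals rather than by any method from the lecture. It shows that a product of an a × h and an h × b factor has rank at most h.

```python
from fractions import Fraction


def parametrize(alpha, beta):
    """theta = alpha * beta^T for alpha (a x h) and beta (b x h), given as nested lists."""
    h = len(alpha[0])
    return [
        [sum(row_a[k] * row_b[k] for k in range(h)) for row_b in beta]
        for row_a in alpha
    ]


def matrix_rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[found])]
        found += 1
    return found


alpha = [[1, 0], [2, 1], [0, 3]]            # a = 3, h = 2
beta = [[1, 1], [0, 2], [1, 0], [2, 2]]     # b = 4, h = 2
theta = parametrize(alpha, beta)            # a 3 x 4 matrix
print(matrix_rank(theta))                   # 2
```

Choosing h amounts to choosing the inner dimension of this factorization, which is the model selection problem stated on the slide.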


Asymptotics – regular case

Consider model given by rank h.

Suppose true matrix θ0 has rank r ≤ h.

Interested in the asymptotics of the integral

$$\iint \exp\{-n\,\|\alpha\beta^T - \theta_0\|^2\}\, d\alpha\, d\beta$$

Regular case:

The Jacobian of the map $g(\alpha, \beta) = \alpha\beta^T$ achieves its maximal rank $h(a + b - h)$ at a point $(\alpha_0, \beta_0)$ if and only if $\alpha_0\beta_0^T$ has full rank $h$.

If $\theta_0$ has rank $r = h$, then the set $g^{-1}(\theta_0) \subseteq \mathbb{R}^{ah+bh}$ is a smooth manifold of dimension $h^2$.

Reparametrize and apply Laplace approximation (Haughton’s result) to obtain

q = h(a + b − h)/2, s = 1.
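The dimension count behind this value of q can be spelled out as follows. This is a short sketch, assuming $\theta_0 = \alpha_0\beta_0^T$ has rank $h$; the matrix $G$ is notation introduced here for illustration.

```latex
\begin{aligned}
g(\alpha G,\ \beta G^{-T}) &= \alpha\, G\, G^{-1} \beta^T = \alpha\beta^T
  \quad \text{for every invertible } G \in \mathbb{R}^{h \times h}, \\
\dim g^{-1}(\theta_0) &= h^2, \\
(ah + bh) - h^2 &= h(a + b - h) = 2q .
\end{aligned}
```

The fiber over $\theta_0$ absorbs $h^2$ of the $ah + bh$ parameters, so $h(a + b - h)$ effective parameters remain, and the regular-case rule of half the number of effective parameters gives the stated growth index.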


Asymptotics – singular case

Interested in the asymptotics of the integral

$$\iint \exp\{-n\,\|\alpha\beta^T - \theta_0\|^2\}\, d\alpha\, d\beta$$

Singular case: rank of θ0 is equal to r < h

Aoyagi & Watanabe (2005): Found growth index q and multiplicity s as a function of (a, b, h, r)

Simplest case with singularities is model rank h = 1


Asymptotics – singular case for rank 1

Model rank h = 1

Only one singular point: θ0 = 0

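Fiber

$$g^{-1}(\theta_0) = \{(\alpha_0, \beta_0) : \alpha_0 = 0 \text{ or } \beta_0 = 0\}$$

singular at the origin $(\alpha_0, \beta_0) = 0$ and smooth elsewhere.

Local integrals are

$$\int_{U(\alpha_0)} \int_{U(\beta_0)} \exp\{-n(\alpha_1^2 + \dots + \alpha_a^2)(\beta_1^2 + \dots + \beta_b^2)\}\, d\alpha\, d\beta, \qquad (\alpha_0, \beta_0) \in g^{-1}(0).$$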


Case 1

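Suppose $\alpha_0 = (\alpha_{01}, \dots, \alpha_{0k}, 0, \dots, 0) \neq 0$. Then $\beta_0 = 0$.

Shift $(\alpha_0, \beta_0)$ to the origin by the transformation $\alpha_i \mapsto \alpha_i - \alpha_{0i}$.

Local integral becomes

$$\int_{U(0)} \exp\{-n\,[(\alpha_1 + \alpha_{01})^2 + \dots + (\alpha_k + \alpha_{0k})^2 + \alpha_{k+1}^2 + \dots + \alpha_a^2]\,(\beta_1^2 + \dots + \beta_b^2)\}\, d(\alpha, \beta)$$

Function of α in exponent is bounded away from zero in a neighborhood U(0).

Asymptotics determined by that of

$$\int_{U(0)} \exp\{-n(\beta_1^2 + \dots + \beta_b^2)\}\, d\beta$$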

which is a regular integral with growth index b/2 and multiplicity 1.


Case 2

Suppose α0 = β0 = 0.

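Resolve $(\alpha_1^2 + \dots + \alpha_a^2)(\beta_1^2 + \dots + \beta_b^2)$ by applying a blow-up to the first term and a blow-up to the second term.

We obtain

$$\int_{U(0,0)} \alpha_1^{2\lambda} \beta_1^{2\lambda}\, \alpha_1^{a-1} \beta_1^{b-1}\, (1 + \alpha_2^2 + \dots)^{\lambda}\, (1 + \beta_2^2 + \dots)^{\lambda}\, d\alpha\, d\beta.$$

Consider

$$\int \alpha_1^{2\lambda + a - 1}\, \beta_1^{2\lambda + b - 1}\, d\alpha_1\, d\beta_1 = \frac{\alpha_1^{2\lambda + a}\, \beta_1^{2\lambda + b}}{(2\lambda + a)(2\lambda + b)}.$$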

Poles λ = −a/2 and λ = −b/2.
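When a = b the two poles coincide and form a double pole. Under the usual reading of growth index and multiplicity, which is assumed here because this part of the slides does not restate it, the integral behaves like $C\, n^{-q} (\log n)^{s-1}$. For a = b = 1 this predicts that $\sqrt{n}\,Z(n)$ grows linearly in log n, where $Z(n) = \int_{-1}^{1}\int_{-1}^{1} e^{-n\alpha^2\beta^2}\, d\alpha\, d\beta$. The sketch below checks this numerically; the helper names are hypothetical, and it uses the closed form of the inner Gaussian integral, which reduces Z(n) to $2\sqrt{\pi/n}\int_0^{\sqrt{n}} \operatorname{erf}(u)/u\, du$.

```python
import math


def erf_over_u_integral(upper, step=0.01):
    """Midpoint rule for G(T) = integral over [0, T] of erf(u) / u du.

    The integrand tends to 2 / sqrt(pi) as u -> 0, so midpoints avoid dividing by zero.
    """
    count = round(upper / step)
    return step * sum(
        math.erf((i + 0.5) * step) / ((i + 0.5) * step) for i in range(count)
    )


def scaled_integral(n):
    """sqrt(n) * Z(n), using Z(n) = 2 * sqrt(pi / n) * G(sqrt(n))."""
    return 2.0 * math.sqrt(math.pi) * erf_over_u_integral(math.sqrt(n))


n_small, n_large = 1e4, 1e6
slope = (scaled_integral(n_large) - scaled_integral(n_small)) / (
    math.log(n_large) - math.log(n_small)
)
print(slope, math.sqrt(math.pi))  # the two values should nearly agree
```

A constant positive slope in log n is the numerical signature of the extra logarithmic factor that accompanies multiplicity 2.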


Asymptotics for rank 1

Proposition

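The marginal likelihood for the reduced rank regression model for rank $h = 1$ has growth index and multiplicity

$$
(q, s) =
\begin{cases}
\left(\dfrac{a + b - 1}{2},\ 1\right) & \text{if } \theta_0 \neq 0, \\[2mm]
\left(\dfrac{\min\{a, b\}}{2},\ 1\right) & \text{if } \theta_0 = 0 \text{ and } a \neq b, \\[2mm]
\left(\dfrac{a}{2} = \dfrac{b}{2},\ 2\right) & \text{if } \theta_0 = 0 \text{ and } a = b.
\end{cases}
$$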

This can also be shown by looking at the Newton diagrams.
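The three cases of the proposition translate directly into a small lookup function. This is a minimal sketch with a hypothetical function name; it only encodes the statement above and does not derive it.

```python
from fractions import Fraction


def rank_one_growth(a, b, theta0_is_zero):
    """(q, s) from the proposition for the rank h = 1 model with an a x b matrix theta."""
    if not theta0_is_zero:
        return Fraction(a + b - 1, 2), 1   # regular case: h (a + b - h) / 2 with h = 1
    if a != b:
        return Fraction(min(a, b), 2), 1   # the largest pole, -min(a, b) / 2, is simple
    return Fraction(a, 2), 2               # the poles -a/2 and -b/2 coincide


print(rank_one_growth(3, 2, False))  # (Fraction(2, 1), 1)
print(rank_one_growth(3, 2, True))   # (Fraction(1, 1), 1)
print(rank_one_growth(2, 2, True))   # (Fraction(1, 1), 2)
```

Comparing the first two calls shows how the growth index drops when the true parameter sits at the singular point θ0 = 0.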


Exercise: Factor analysis

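Let $H$ and $\varepsilon_1, \dots, \varepsilon_d$ be mutually independent $N(0, 1)$ random variables.

Define

$$X = \alpha H + \varepsilon, \qquad \alpha \in \mathbb{R}^d$$

Then $X \sim N(0, \theta)$ with covariance matrix $\theta = I + \alpha\alpha^T$, $\alpha \in \mathbb{R}^d$.

[Figure: diagram with nodes X1, X2, X3, X4 and H.]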

What is the growth behaviour of marginal likelihood of this model?


Conclusion

Algebraic statistical models: useful framework for discussing non-smooth statistical models.

Computational algebra: Markov bases, vanishing ideals, singular loci, tangent cones, resolution of singularities, . . .

Many open questions about classical statistical models . . .
