DPPs in stats and ML
with real bits of joint work w/ Adrien Hardy, Michalis Titsias, Guillaume Gautier, and Michal Valko.

Remi Bardenet

CNRS & CRIStAL, Univ. Lille, France

Remi Bardenet (CNRS & Univ. Lille) Determinantal point processes in stats and ML: computational issues

Summary

Determinantal point processes

A zoo of DPPs

DPPs in stats and ML

Advances on inference and sampling

Point processes

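• A point process $X$ on $S$ is a random countable set of points in $S$.

• In most cases, it is defined by its joint intensities $\rho_k$:

$$\mathbb{E}\left[\prod_{i=1}^{k} X(D_i)\right] = \int_{\prod_{i} D_i} \rho_k(x_1, \dots, x_k)\, d\mu(x_1) \cdots d\mu(x_k)$$

for disjoint $D_i$'s, see [6].

• A point process is determinantal with kernel $K$ if, for all $k \geq 1$,

$$\rho_k(x_1, \dots, x_k) = \det\big[K(x_i, x_j)\big]_{i,j=1}^{k}.$$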

Determinantal point processes

• Existence is tricky, see e.g. [11].

• A DPP is repulsive.

• Repulsiveness is geometric.
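To make the last two bullets concrete, here is the two-point case worked out. It is a sketch that assumes the joint intensities are determinants of the kernel and that $K$ is Hermitian; the feature map $\phi$ mentioned afterwards is notation introduced for illustration only.

```latex
\rho_2(x, y)
  = \det\begin{pmatrix} K(x,x) & K(x,y) \\ K(y,x) & K(y,y) \end{pmatrix}
  = K(x,x)\,K(y,y) - |K(x,y)|^2
  \;\le\; \rho_1(x)\,\rho_1(y).
```

Pairs of points are therefore never more likely than under independence, which is the repulsion. If in addition $K(x,y) = \langle \phi(x), \phi(y) \rangle$ for some feature map $\phi$, the determinant is the squared volume of the parallelotope spanned by $\phi(x)$ and $\phi(y)$, so nearly collinear features are penalized: this is the geometric reading.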

Uniform spanning trees

• Let $A$ be the vertex-edge incidence matrix of a connected graph $G$, and drop the last row.

• Sample a uniform spanning tree $T$ of $G$, then

$$\text{edges in } T \sim \mathrm{DPP}(K), \quad \text{with } K = A^T (A A^T)^{-1} A,$$

see [5].
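The construction above can be checked numerically on a small graph. The sketch below is only an illustration: the helper name `ust_kernel` and the toy graph (a 4-cycle with one chord) are assumptions made for the example. It builds $K$ from the incidence matrix and reads off, for each set of three edges, the probability that it is the sampled tree.

```python
import itertools
import numpy as np

def ust_kernel(n_vertices, edges):
    """Return K = A^T (A A^T)^{-1} A for the graph with the given edge list."""
    A = np.zeros((n_vertices, len(edges)))
    for j, (u, v) in enumerate(edges):
        A[u, j], A[v, j] = 1.0, -1.0   # arbitrary orientation of edge j
    A = A[:-1]                         # drop the last row so that A A^T is invertible
    return A.T @ np.linalg.solve(A @ A.T, A)

# Hypothetical toy graph: the 4-cycle 0-1-2-3 plus the chord (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
K = ust_kernel(4, edges)

# K is a projection of rank 3, so samples have exactly 3 edges and
# det(K_S) is the probability that the tree is exactly the edge set S.
tree_probs = {
    S: np.linalg.det(K[np.ix_(S, S)])
    for S in itertools.combinations(range(len(edges)), 3)
}
```

On this graph, eight of the ten triples are spanning trees and the two triangles are not, so the determinants should be 1/8 on trees and 0 on the triangles.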

Random matrices

Eigenvalues of some random matrices are DPPs:

• when $G$ is filled in with iid complex Gaussians,

Figure: The Ginibre ensemble with N = 1000.

• when $H = \frac{1}{2}(G + G^*)$,

Figure: The GUE with N = 50.
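Both ensembles are easy to simulate. The sketch below assumes unit-variance complex Gaussian entries for $G$ (a normalization chosen for the example, not stated on the slides) and uses a smaller matrix than the figures; the function name `ginibre` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def ginibre(n, rng):
    """n x n matrix with iid complex Gaussian entries of unit variance."""
    real = rng.standard_normal((n, n))
    imag = rng.standard_normal((n, n))
    return (real + 1j * imag) / np.sqrt(2)

N = 200
G = ginibre(N, rng)
ginibre_eigs = np.linalg.eigvals(G)   # complex eigenvalues of G

H = (G + G.conj().T) / 2              # Hermitian part of G
gue_eigs = np.linalg.eigvalsh(H)      # real eigenvalues, in ascending order
```

With this normalization, the Ginibre eigenvalues should roughly fill a disc of radius $\sqrt{N}$ and the GUE eigenvalues should spread over about $[-\sqrt{2N}, \sqrt{2N}]$, which appears consistent with the axis ranges of the two figures.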

A zoo of DPPs: N free fermions at equilibrium

• In statistical quantum physics, a system of one particle is described at equilibrium by

$$H \psi_E(q) = E \psi_E(q).$$

• We want a $\Psi : S^N \to \mathbb{C}$ such that

$$|\Psi(q_{\sigma(1)}, \dots, q_{\sigma(N)})|^2 = |\Psi(q_1, \dots, q_N)|^2, \quad \forall \sigma \in \mathfrak{S}_N.$$
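One standard way to satisfy this symmetry for fermions is a Slater determinant. The sketch below assumes orthonormal one-particle eigenfunctions $\psi_1, \dots, \psi_N$; this notation, and the kernel $K_N$, are introduced here for illustration and are not taken from the slide.

```latex
\Psi(q_1, \dots, q_N) = \frac{1}{\sqrt{N!}} \det\big[\psi_i(q_j)\big]_{i,j=1}^{N},
\qquad
|\Psi(q_1, \dots, q_N)|^2 = \frac{1}{N!} \det\big[K_N(q_i, q_j)\big]_{i,j=1}^{N},
\qquad
K_N(x, y) = \sum_{i=1}^{N} \psi_i(x)\, \overline{\psi_i(y)}.
```

Permuting the $q_j$ only changes the sign of the determinant, so $|\Psi|^2$ is symmetric, and the positions of the $N$ particles then form a DPP with the projection kernel $K_N$.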

Orthogonal polynomial ensembles

• Let µ be a positive Borel measure; [2] build a DPP(µ, $K_N$) on $\mathbb{R}^d$ such that

$$\sqrt{N^{1+1/d}} \left( \sum_{i=1}^{N} \frac{f(x_i)}{K_N(x_i, x_i)} - \int f(x)\, \mu(dx) \right) \xrightarrow[N \to \infty]{\text{law}} \mathcal{N}\big(0, \Omega_{f,\omega}^2\big)$$

for $f$ essentially $C^1$, where $\Omega_{f,\omega}$ measures the decay of the Fourier coefficients of $f$.

• This is useful for Monte Carlo, provided we know how to sample from that DPP!

Computational issue #1: Sampling from a DPP

• The vanilla algorithm starts from a diagonalized kernel

$$K(x, y) = \sum_{i=1}^{\infty} \lambda_i \varphi_i(x) \varphi_i(y).$$

• This is $O(N^3)$, knowing the diagonalization and neglecting rejection sampling!
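For a finite ground set the vanilla algorithm fits in a few lines. The following is a minimal sketch in the spirit of the spectral sampler described in [9], assuming a real symmetric kernel matrix `K` with eigenvalues in [0, 1]; the function name `sample_dpp` and the toy projection kernel are assumptions of the example. In the finite case the conditionals can be sampled directly, so no rejection step is needed.

```python
import numpy as np

def sample_dpp(K, rng):
    """Draw one sample from the finite DPP with symmetric marginal kernel K."""
    lam, V = np.linalg.eigh(K)
    # Step 1: keep eigenvector i independently with probability lambda_i.
    V = V[:, rng.random(lam.shape[0]) < lam]
    sample = []
    # Step 2: pick one item per kept eigenvector.
    while V.shape[1] > 0:
        probs = np.sum(V ** 2, axis=1)
        probs = probs / probs.sum()
        i = rng.choice(len(probs), p=probs)
        sample.append(int(i))
        # Restrict the span of V to vectors vanishing at item i:
        # use the column with the largest entry in row i to cancel that row.
        j = np.argmax(np.abs(V[i, :]))
        pivot = V[:, j].copy()
        V = np.delete(V, j, axis=1)
        V = V - np.outer(pivot, V[i, :] / pivot[i])
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)   # re-orthonormalize the remaining columns
    return sorted(sample)

# Hypothetical example: a rank-3 projection kernel on 8 items.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 3)))
K = Q @ Q.T
sample = sample_dpp(K, rng)
```

Because `K` is a projection of rank 3 here, every sample should contain exactly three distinct items. The re-orthonormalization at each step is what drives the cubic cost once the eigendecomposition is known.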

An example from spatial statistics [10, Section 5.4]

• Compare a hardcore-Strauss model

$$p(x_{1:n} \mid \beta, \gamma, r_1, r_2) \propto \beta^n \prod_{i} 1_{x_i \in W} \prod_{i<j} 1_{\|x_i - x_j\| > r_1}\, \gamma^{1_{\|x_i - x_j\| < r_2}}, \tag{1}$$

• fitted with ad hoc pseudolikelihood methods,
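The unnormalized density (1) is straightforward to evaluate. The sketch below is an illustration under assumed conventions: points are rows of an array with at least one row, the window is passed as a membership function, and the names `hardcore_strauss_log_density` and `in_unit_square` are hypothetical.

```python
import numpy as np

def hardcore_strauss_log_density(x, beta, gamma, r1, r2, in_window):
    """Unnormalized log density of the hardcore-Strauss model at the rows of x."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    if not all(in_window(p) for p in x):
        return -np.inf                       # some point lies outside W
    i, j = np.triu_indices(n, k=1)           # all pairs i < j
    dists = np.linalg.norm(x[i] - x[j], axis=1)
    if np.any(dists <= r1):
        return -np.inf                       # hardcore constraint violated
    n_close = np.count_nonzero(dists < r2)   # pairs penalized by gamma
    return n * np.log(beta) + n_close * np.log(gamma)

def in_unit_square(p):
    return bool(np.all((p >= 0.0) & (p <= 1.0)))

points = [[0.0, 0.0], [0.3, 0.0], [1.0, 1.0]]
value = hardcore_strauss_log_density(points, beta=2.0, gamma=0.5,
                                     r1=0.1, r2=0.5, in_window=in_unit_square)
```

Only one pair is closer than `r2` in this example, so the value is 3 log 2 + log 0.5 = 2 log 2. The normalizing constant is not computed, which is the reason pseudolikelihood methods are used for fitting.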

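• to a Matérn DPP

$$\rho_k(x \mid \tau, \nu, \alpha) = \det\big((K(x_i, x_j))\big) \prod_{i} 1_{x_i \in W}$$

with $K(x, y) = \tau K_{\nu,\alpha}(\|x - y\|)$, $K_{\nu,\alpha}(0) = 1$.

• Since the expected number of points in $W$ is

$$\int K(x, x) 1_W(x)\, dx = \tau |W|,$$

we have an unbiased estimator $\hat{\tau} = n / |W|$.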

Computational issue #2: Fitting a DPP

The density of a DPP

Let µ be supported on a compact $S$,

$$K(x, y) = \sum_{k>0} \lambda_k \Phi_k(x) \Phi_k(y),$$

and assume $\lambda_k \in [0, 1)$ for all $k$. Then DPP(K, µ) has a density $f$ w.r.t. the unit rate Poisson process on $S$, and

$$f(x_1, \dots, x_n) \propto \frac{\det\big((L(x_i, x_j))\big)}{\det(I + L)},$$

where $L = (I - K)^{-1} K$ has kernel

$$L(x, y) = \sum_{k>0} \frac{\lambda_k}{1 - \lambda_k} \Phi_k(x) \Phi_k(y).$$
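The finite analogue of this statement can be verified by brute force. The sketch below assumes a ground set of four items and arbitrary eigenvalues in [0, 1), both chosen only for illustration; it builds $L$ from $K$ and checks that the subset probabilities $\det(L_A) / \det(I + L)$ form a distribution. The helper name `subset_probability` is hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Hypothetical kernel: random orthonormal eigenvectors, eigenvalues in [0, 1).
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.array([0.1, 0.4, 0.6, 0.9])
K = (Q * lam) @ Q.T

# L = (I - K)^{-1} K shares the eigenvectors of K, with eigenvalues lam / (1 - lam).
L = np.linalg.solve(np.eye(n) - K, K)
normalizer = np.linalg.det(np.eye(n) + L)

def subset_probability(A):
    """P(X = A) = det(L_A) / det(I + L), with det of the empty matrix equal to 1."""
    A = list(A)
    if not A:
        return 1.0 / normalizer
    return np.linalg.det(L[np.ix_(A, A)]) / normalizer

total = sum(
    subset_probability(A)
    for size in range(n + 1)
    for A in itertools.combinations(range(n), size)
)
```

The sum `total` should equal 1, and `L @ inv(I + L)` should recover `K`. In the continuous setting $\det(I + L)$ becomes a Fredholm determinant, which is what makes fitting hard.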

Analysis of [10]

Text summarization [9]

• Build a kernel between sentences

$$L_{ij} = \sqrt{q_i}\, S_{ij} \sqrt{q_j},$$

where $S_{ij} \propto \sum_{w} \mathrm{tf}_i(w)\, \mathrm{tf}_j(w)\, \mathrm{idf}(w)^2$, and $q_i = \exp(\theta^T u_i)$,

• and sample from $\det(L_I)\, 1_{|I| = k}$.

• Fitting θ is relatively easy.

• Note this is not a DPP if $L$ is not a projection.
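The kernel and the size-constrained distribution can be sketched on a toy corpus. Everything specific below is an assumption made for illustration: the four sentences, the features `u` (sentence length and a constant), the value of `theta`, and the helper name `kdpp_probabilities`. The distribution is obtained by brute-force enumeration, which is only feasible for tiny examples.

```python
import itertools
import numpy as np

# Hypothetical toy corpus; each sentence is already tokenized.
sentences = [
    "dpps favour diverse subsets".split(),
    "dpps favour diverse sets of sentences".split(),
    "sampling a dpp needs an eigendecomposition".split(),
    "the weather is nice today".split(),
]
vocab = sorted({w for s in sentences for w in s})
n = len(sentences)

# tf[i, w]: count of word w in sentence i; idf[w] = log(n / document frequency).
tf = np.array([[s.count(w) for w in vocab] for s in sentences], dtype=float)
idf = np.log(n / (tf > 0).sum(axis=0))

# S_ij = sum_w tf_i(w) tf_j(w) idf(w)^2 is a Gram matrix, hence positive semidefinite.
X = tf * idf
S = X @ X.T

# Quality q_i = exp(theta^T u_i); features and theta are made up for the example.
u = np.array([[len(s), 1.0] for s in sentences])
theta = np.array([0.1, 0.0])
q = np.exp(u @ theta)
L = np.sqrt(q)[:, None] * S * np.sqrt(q)[None, :]

def kdpp_probabilities(L, k):
    """Probabilities proportional to det(L_I) over all subsets I of size k."""
    subsets = list(itertools.combinations(range(L.shape[0]), k))
    weights = np.array([np.linalg.det(L[np.ix_(I, I)]) for I in subsets])
    return dict(zip(subsets, weights / weights.sum()))

probs = kdpp_probabilities(L, k=2)
```

Sentences 0 and 1 share most of their words, so the pair (0, 1) receives less mass than the pair (0, 3), whose sentences have no word in common.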

• Again, this requires a lot of sampling, but we have time.

Inference

• Lots of activity in ML and stats, see [4, 13] and refs within, but no clear winning strategy.

• If you forget about $K$ but parametrize $L = L_\theta$ instead, we show [3] how to bypass the spectral decomposition, see also [1].

Remember computational issue #2: Fitting a DPP

The density of a DPP

Let µ be supported on a compact $S$,

$$K(x, y) = \sum_{k>0} \lambda_k \Phi_k(x) \Phi_k(y),$$

and assume $\lambda_k \in [0, 1)$ for all $k$. Then DPP(K, µ) has a density $f$ w.r.t. the unit rate Poisson process on $S$, and

$$f(x_1, \dots, x_n) \propto \frac{\det\big((L(x_i, x_j))\big)}{\det(I + L)},$$

where $L = (I - K)^{-1} K$ has kernel

$$L(x, y) = \sum_{k>0} \frac{\lambda_k}{1 - \lambda_k} \Phi_k(x) \Phi_k(y).$$

Bounding Fredholm determinants

Proposition

Let $Z = \{z_1, \dots, z_m\} \subset \mathbb{R}^d$, then

$$\frac{\det L_Z}{\det(L_Z + \Psi)}\, e^{-\int L(x,x)\, d\mu(x) + \mathrm{tr}(L_Z^{-1} \Psi)} \le \frac{1}{\det(I + L)} \le \frac{\det L_Z}{\det(L_Z + \Psi)},$$

where $L_Z = ((L(z_i, z_j)))$ and $\Psi_{ij} = \int L(z_i, x) L(x, z_j)\, d\mu(x)$.

• Now we can optimize over $Z$ and plug this into MCMC routines!

• Empirically, we only suffer from the dimension $d$.
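The two bounds can be checked numerically in a finite analogue, where µ is the counting measure on n points, integrals become sums and the Fredholm determinant is an ordinary determinant. The setup below (Gaussian kernel, random points in the plane, Z taken among the points) is an assumption made for this sketch, not the setting of [3].

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 30, 5

# Hypothetical finite setting: n points in the plane, mu = counting measure,
# and a Gaussian kernel standing in for L.
X = rng.standard_normal((n, 2))
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
L = np.exp(-0.5 * sq_dists)

# Inducing inputs Z: here simply m of the n points.
Z = rng.choice(n, size=m, replace=False)
L_Z = L[np.ix_(Z, Z)]
Psi = L[Z, :] @ L[:, Z]          # Psi_ij = sum_x L(z_i, x) L(x, z_j)

def logdet(M):
    """Log-determinant of a positive definite matrix."""
    return np.linalg.slogdet(M)[1]

# Everything is kept on the log scale to avoid underflow.
log_target = -logdet(np.eye(n) + L)                    # log of 1 / det(I + L)
log_upper = logdet(L_Z) - logdet(L_Z + Psi)            # right-hand bound
log_lower = log_upper - np.trace(L) + np.trace(np.linalg.solve(L_Z, Psi))
```

Here `log_lower <= log_target <= log_upper` should hold for any choice of `Z`, and both bounds tighten as the Nyström-type approximation built from `Z` gets closer to `L`, which is why optimizing over the inducing inputs helps.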

What do the optimized inducing inputs look like?

Figure: The panels in the top row show the initial inducing input locations for various values of m, while the corresponding panels in the bottom row show the optimized locations.

Sampling finite projection DPPs

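• Random projections can help [9].

• Some DPPs are just easy to sample: e.g. USTs of graphs with no bottleneck.

• Assume we know $A$ such that $K = A^T (A A^T)^{-1} A$.

• Key idea we use from [7] in [8]: with $\mathrm{Zon}(A) \triangleq A[0, 1]^n$ and $\mathcal{B}$ the set of bases of $A$ (invertible square submatrices made of columns of $A$),

$$\mathrm{Vol}(\mathrm{Zon}(A)) = \sum_{B \in \mathcal{B}} \mathrm{Vol}(\mathrm{Zon}(B)) = \sum_{B \in \mathcal{B}} |\det B|.$$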
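The volume identity can be checked by hand in the plane, where the zonotope is a polygon. The sketch below is restricted to 2 x n matrices and uses a small hypothetical `A`; the helpers `convex_hull` (monotone chain) and `polygon_area` (shoelace formula) are written out for the example and are not part of [7] or [8].

```python
import itertools
import numpy as np

def convex_hull(points):
    """Vertices of the convex hull of 2D points, in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq):
        chain = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower, upper = half_hull(pts), half_hull(pts[::-1])
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Shoelace formula for a simple polygon given by its ordered vertices."""
    xs = np.array([v[0] for v in vertices])
    ys = np.array([v[1] for v in vertices])
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))

# Hypothetical 2 x 3 matrix; its columns generate the zonotope A[0, 1]^3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
r, n = A.shape

# The zonotope is the convex hull of the images of the corners of the cube.
corners = [A @ np.array(u) for u in itertools.product([0.0, 1.0], repeat=n)]
zonotope_area = polygon_area(convex_hull(corners))

# Sum of |det B| over all r x r column submatrices (singular ones contribute 0).
sum_abs_dets = sum(
    abs(np.linalg.det(A[:, list(B)]))
    for B in itertools.combinations(range(n), r)
)
```

Both quantities equal 3 here: the zonotope is a hexagon tiled by three parallelograms of unit area, one per basis.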

Sampling the zonotope

Figure: Relative error of the mass of a triplet for a BA graph with random uniform weights, against the number of iterations (×10^3), for Basis Exchange and Zonotope.

Figure: Relative error of the mass of a triplet for a BA graph with random uniform weights, against CPU time (s), for Basis Exchange and Zonotope.

Conclusion

• DPPs are the kernel machine of PPs,

• Applications in stats [10] and ML [9],

• Applications in signal processing [12] and Bayesian nonparametrics are coming!

• Fast inference and sampling are available.

• Powerful statistical models and algorithms combine ideas from algebra, combinatorial geometry, and functional analysis.

References

[1] R. H. Affandi, E. B. Fox, R. P. Adams, and B. Taskar. Learning the parameters of determinantal point processes. In Proceedings of the International Conference on Machine Learning (ICML), 2014.

[2] R. Bardenet and A. Hardy. Monte Carlo with determinantal point processes. arXiv preprint arXiv:1605.00361, 2016.

[3] R. Bardenet and M. K. Titsias. Inference for determinantal point processes without spectral knowledge. In Advances in Neural Information Processing Systems (NIPS), pages 3375–3383, 2015.

[4] V.-E. Brunel, A. Moitra, P. Rigollet, and J. Urschel. Maximum likelihood estimation of determinantal point processes. arXiv preprint arXiv:1701.06501, 2017.

[5] R. Burton and R. Pemantle. Local characteristics, entropy and limit theorems for spanning trees and domino tilings via transfer-impedances. Annals of Probability, 21(3):1329–1371, 1993.

[6] D. J. Daley and D. Vere-Jones. An introduction to the theory of point processes. Springer, 2nd edition, 2003.

[7] M. Dyer and A. Frieze. Random walks, totally unimodular matrices, and a randomised dual simplex algorithm. Mathematical Programming, 64(1-3):1–16, 1994.

[8] G. Gautier, R. Bardenet, and M. Valko. Zonotope hit-and-run for efficient sampling of projection DPPs. In International Conference on Machine Learning (ICML), 2017.

[9] A. Kulesza and B. Taskar. Determinantal point processes for machine learning. Foundations and Trends in Machine Learning, 2012.

[10] F. Lavancier, J. Møller, and E. Rubak. Determinantal point process models and statistical inference: Extended version. arXiv preprint arXiv:1205.4818, 2014.

[11] O. Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7:83–122, 1975.

[12] N. Tremblay, P.-O. Amblard, and S. Barthelme. Graph sampling with determinantal processes. arXiv preprint arXiv:1703.01594, 2017.

[13] J. Urschel, V.-E. Brunel, A. Moitra, and P. Rigollet. Learning determinantal point processes with moments and cycles. arXiv preprint arXiv:1703.00539, 2017.
