
Page 1: Nonparametric Density Estimation under Total Positivity

Nonparametric Density Estimation under Total Positivity

Elina Robeva, University of British Columbia

May 24, 2021

1 / 36

Page 2: Nonparametric Density Estimation under Total Positivity

Totally Positive Distributions

• A distribution with density p on X ⊆ Rd is multivariate totally positive of order 2 (or MTP2) if

p(x) p(y) ≤ p(x ∧ y) p(x ∨ y) for all x, y ∈ X,

where x ∧ y and x ∨ y are the componentwise minimum and maximum.

[Diagram: points x and y together with their componentwise minimum x ∧ y and maximum x ∨ y.]

• MTP2 is the same as log-supermodular:

log p(x) + log p(y) ≤ log p(x ∧ y) + log p(x ∨ y) for all x, y ∈ X.

2 / 36
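The defining inequality is straightforward to test numerically on a finite grid. Below is a minimal Python sketch (not from the talk; the function names are illustrative) that checks the MTP2 inequality pairwise for a density evaluated on a min-max closed grid. The example uses an unnormalized bivariate Gaussian whose precision matrix has nonpositive off-diagonal entries, since the normalizing constant cancels on both sides of the inequality.

```python
import itertools
import numpy as np

def is_mtp2_on_grid(points, density, tol=1e-12):
    """Check p(x)p(y) <= p(x ∧ y)p(x ∨ y) for all pairs of grid points.
    `points` should be min-max closed (a product grid works)."""
    pts = [tuple(p) for p in np.asarray(points, float)]
    vals = {p: density(np.array(p)) for p in pts}
    for x, y in itertools.combinations(pts, 2):
        lo = tuple(np.minimum(x, y))    # componentwise minimum x ∧ y
        hi = tuple(np.maximum(x, y))    # componentwise maximum x ∨ y
        if vals[x] * vals[y] > vals[lo] * vals[hi] + tol:
            return False
    return True

# Example: a centred bivariate Gaussian whose precision matrix has
# nonpositive off-diagonal entries; the normalizing constant cancels,
# so the unnormalized density suffices.
K = np.array([[2.0, -0.5], [-0.5, 2.0]])
p = lambda x: np.exp(-0.5 * x @ K @ x)
grid = np.array(list(itertools.product(np.linspace(-2, 2, 9), repeat=2)))
print(is_mtp2_on_grid(grid, p))   # expected: True
```

The pairwise check is quadratic in the number of grid points, which is fine for small illustrative grids.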


Page 4: Nonparametric Density Estimation under Total Positivity

Properties of MTP2 distributions

Theorem (Fortuin, Kasteleyn, Ginibre, 1971)
MTP2 implies positive association.

Definition
A random vector X taking values in Rd is positively associated if for any non-decreasing functions φ, ψ : Rd → R,

cov(φ(X), ψ(X)) ≥ 0.

Theorem (Karlin and Rinott 1980; Lebowitz 1972)
If X = (X1, . . . , Xd) is MTP2, then

(i) any marginal distribution is MTP2,

(ii) any conditional distribution is MTP2,

(iii) X has the marginal independence structure

Xi ⊥⊥ Xj ⇐⇒ cov(Xi, Xj) = 0.

3 / 36
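Positive association is easy to see in simulation. A minimal sketch (illustrative, not from the talk): draw from an MTP2 Gaussian and estimate the covariance of two coordinatewise non-decreasing functions, which should come out nonnegative; the two functions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[1.0, -0.4], [-0.4, 1.0]])        # M-matrix precision => MTP2
samples = rng.multivariate_normal(np.zeros(2), np.linalg.inv(K), size=200_000)

phi = np.tanh(samples).sum(axis=1)              # non-decreasing in each coordinate
psi = np.maximum(samples[:, 0], samples[:, 1])  # also non-decreasing
print(np.cov(phi, psi)[0, 1])                   # expected: positive
```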

Page 5: Nonparametric Density Estimation under Total Positivity

Examples of Totally Positive Distributions

[Figures of example models, with citations: Felsenstein, 1973; Agrawal, Roy, and Uhler, 2019; Fallat, Lauritzen, Sadeghi, Uhler, Wermuth, and Zwiernik, 2016.]

4 / 36

Page 6: Nonparametric Density Estimation under Total Positivity

Examples of MTP2 distributions

• The joint distribution of observed variables influenced by one hidden variable (in the Gaussian and binary case)

[Diagram: a hidden variable Z with arrows to observed variables X1, . . . , X5.]

• Many models imply MTP2:
  • Ferromagnetic Ising models
  • Order statistics of i.i.d. variables
  • Brownian motion tree models
  • Gaussian/binary (latent) tree models (e.g. single factor analysis models)

5 / 36

Page 7: Nonparametric Density Estimation under Total Positivity

Examples of MTP2 distributions

A Gaussian random vector X ∼ N(µ, Σ) is MTP2 if and only if Σ−1 is an M-matrix, i.e. its off-diagonal entries are nonpositive:

(Σ−1)ij ≤ 0 for all i ≠ j.

Very rare: out of 100,000 uniformly sampled 5 × 5 correlation matrices, none were MTP2 [Fallat et al., 2016].

Data: grades of 88 students in 5 math subjects [Mardia, Kent and Bibby, 1979; Fallat et al., 2016]

6 / 36
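Since the Gaussian case reduces to linear algebra, both the M-matrix test and the rarity observation are easy to probe numerically. A minimal sketch; note that the Wishart-based correlation sampler below is a convenient stand-in, not the exact uniform sampler used by Fallat et al.

```python
import numpy as np

def gaussian_is_mtp2(Sigma, tol=1e-10):
    """A Gaussian N(mu, Sigma) is MTP2 iff Sigma^{-1} has nonpositive
    off-diagonal entries, i.e. Sigma^{-1} is an M-matrix."""
    K = np.linalg.inv(Sigma)
    off = K - np.diag(np.diag(K))
    return bool(np.all(off <= tol))

def random_correlation(d, rng):
    """A simple random correlation matrix (not the exact 'uniform' sampler
    of Fallat et al.): normalize a Wishart draw to unit diagonal."""
    A = rng.standard_normal((2 * d, d))
    S = A.T @ A
    s = np.sqrt(np.diag(S))
    return S / np.outer(s, s)

rng = np.random.default_rng(0)
hits = sum(gaussian_is_mtp2(random_correlation(5, rng)) for _ in range(100_000))
print(hits)   # typically 0, matching the rarity reported on the slide
```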


Page 10: Nonparametric Density Estimation under Total Positivity

Examples of MTP2 distributions

Data: 2016 monthly correlations of global stock markets

[InvestmentFrontier.com; Agrawal, Roy, and Uhler, 2019]

8 / 36

Page 11: Nonparametric Density Estimation under Total Positivity

Density estimation

Given i.i.d. samples X = {x1, . . . , xn} ⊂ Rd from an unknown distribution on Rd with density p, can we estimate p?

Non-parametric approach: assume that p lies in a non-parametric family, e.g. impose shape constraints on p (convex, log-concave, monotone, etc.)

• infinite-dimensional problem

• need constraints that are:
  • strong enough so that there is no spiky behavior
  • weak enough so that the function class is large

9 / 36


Page 13: Nonparametric Density Estimation under Total Positivity

Shape-constrained density estimation

• monotonically decreasing densities: [Grenander 1956, Rao 1969]

• convex densities: [Anevski 1994, Groeneboom, Jongbloed, and Wellner 2001]

• log-concave densities: [Cule, Samworth, and Stewart 2008, Dumbgen et al. 2009, . . .]

• generalized additive models with shape constraints: [Chen and Samworth 2016]

• totally positive (and log-concave) densities

10 / 36

Page 14: Nonparametric Density Estimation under Total Positivity

Maximum Likelihood Estimation under MTP2

Given i.i.d. samples X = {x1, . . . , xn} ⊂ Rd with weights w = (w1, . . . , wn) (where w1, . . . , wn ≥ 0 and ∑ wi = 1) from a distribution p on Rd, can we estimate p?

We would like to

maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is an MTP2 density.

Problem: the likelihood is unbounded; the MTP2 constraint alone is too weak for the MLE to exist.

11 / 36


Page 17: Nonparametric Density Estimation under Total Positivity

Maximum Likelihood Estimation under MTP2

To ensure that the likelihood function is bounded, we impose the condition that p is log-concave.

maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is an MTP2 density,
            and p is log-concave.

A function f : Rd → R is log-concave if its logarithm is concave.

• Log-concavity is a natural assumption because it ensures the density is continuous, and it includes many known parametric families.

• Log-concave families:
  • Gaussian;
  • Uniform(a, b);
  • Gamma(k, θ) for k ≥ 1;
  • Beta(a, b) for a, b ≥ 1.

• Maximum likelihood estimation under log-concavity is a well-studied problem (Cule et al. 2008, Dumbgen et al. 2009, Schuhmacher et al. 2010, . . .).

12 / 36
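As a quick sanity check of the listed families, log-concavity can be verified numerically by confirming that discrete second differences of the log-density are nonpositive on a grid. A minimal sketch (illustrative only, not a proof):

```python
import numpy as np
from scipy import stats

def log_concave_on_grid(logpdf, grid):
    """Discrete check: second differences of a concave function on a
    uniform grid are nonpositive."""
    vals = logpdf(grid)
    second_diff = vals[2:] - 2 * vals[1:-1] + vals[:-2]
    return bool(np.all(second_diff <= 1e-9))

x = np.linspace(0.1, 10, 400)
print(log_concave_on_grid(stats.gamma(2.0).logpdf, x))   # k >= 1: True
print(log_concave_on_grid(stats.gamma(0.5).logpdf, x))   # k < 1: False
```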


Page 20: Nonparametric Density Estimation under Total Positivity

Maximum Likelihood Estimation under Log-Concavity

maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

Theorem (Cule, Samworth and Stewart 2008)

• With probability 1, a log-concave maximum likelihood estimator p̂ exists and is unique as long as there are at least d + 1 samples.

• Moreover, log(p̂) is a 'tent function' supported on the convex hull of the data P(X) = conv(x1, . . . , xn).

[Figure from Cule, Samworth and Stewart: the 'tent-like' structure of the graph of the logarithm of the maximum likelihood estimator for bivariate data.]

13 / 36


Page 23: Nonparametric Density Estimation under Total Positivity

Optimizing over Tent Functions

Given points X = {x1, . . . , xn} and heights y = (y1, . . . , yn) ∈ Rn, the tent function hX,y : Rd → R is the smallest concave function such that hX,y(xi) ≥ yi for all i. Thus, p = exp(hX,y) for some y.

INFINITE DIMENSIONAL:
maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

FINITE DIMENSIONAL:
maximize_{y∈Rn}  ∑_{i=1}^n wi log exp(hX,y(xi))
s.t.             exp(hX,y) is a density.

14 / 36
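The tent function itself is computable: hX,y(t) is the optimal value of a small linear program over convex-combination weights, namely the highest point above t on the upper convex hull of the tent poles (xi, yi). A minimal Python sketch (illustrative, using scipy's linprog as the LP solver):

```python
import numpy as np
from scipy.optimize import linprog

def tent(X, y, t):
    """Evaluate h_{X,y}(t); returns -inf if t lies outside conv(X)."""
    X = np.asarray(X, float)   # (n, d) tent-pole locations
    y = np.asarray(y, float)   # (n,) tent-pole heights
    n, _ = X.shape
    # max sum_i lam_i y_i  s.t.  lam >= 0, sum lam_i = 1, sum lam_i x_i = t
    A_eq = np.vstack([np.ones((1, n)), X.T])
    b_eq = np.concatenate([[1.0], np.atleast_1d(np.asarray(t, float))])
    res = linprog(-y, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * n)
    return -res.fun if res.success else -np.inf

# Example: four poles at the square's corners, one pulled down; the tent at
# the centre interpolates across the three highest poles.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0.0, 0.0, 0.0, -1.0]
print(tent(X, y, (0.5, 0.5)))   # ~0.0
```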


Page 26: Nonparametric Density Estimation under Total Positivity

Optimizing over Tent Functions

Given points X = {x1, . . . , xn} and heights y = (y1, . . . , yn) ∈ Rn, the tent function hX,y : Rd → R is the smallest concave function such that hX,y(xi) ≥ yi for all i. Thus, p = exp(hX,y) for some y.

INFINITE DIMENSIONAL:
maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

FINITE DIMENSIONAL:
maximize_{y∈Rn}  ∑_{i=1}^n wi hX,y(xi)
s.t.             exp(hX,y) is a density.

15 / 36

Page 27: Nonparametric Density Estimation under Total Positivity

Optimizing over Tent Functions

Given points X = {x1, . . . , xn} and heights y = (y1, . . . , yn) ∈ Rn, the tent function hX,y : Rd → R is the smallest concave function such that hX,y(xi) ≥ yi for all i. Thus, p = exp(hX,y) for some y.

INFINITE DIMENSIONAL:
maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

FINITE DIMENSIONAL:
maximize_{y∈Rn}  ∑_{i=1}^n wi yi
s.t.             exp(hX,y) is a density.

(Replacing each yi by hX,y(xi) leaves the tent function unchanged, so we may assume hX,y(xi) = yi.)

16 / 36

Page 28: Nonparametric Density Estimation under Total Positivity

Optimizing over Tent Functions

Given points X = {x1, . . . , xn} and heights y = (y1, . . . , yn) ∈ Rn, the tent function hX,y : Rd → R is the smallest concave function such that hX,y(xi) ≥ yi for all i. Thus, p = exp(hX,y) for some y.

INFINITE DIMENSIONAL:
maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

FINITE DIMENSIONAL:
maximize_{y∈Rn}  ∑_{i=1}^n wi yi
s.t.             ∫ exp(hX,y(t)) dt = 1.

17 / 36

Page 29: Nonparametric Density Estimation under Total Positivity

Optimizing over Tent Functions

Given points X = {x1, . . . , xn} and heights y = (y1, . . . , yn) ∈ Rn, the tent function hX,y : Rd → R is the smallest concave function such that hX,y(xi) ≥ yi for all i. Thus, p = exp(hX,y) for some y.

INFINITE DIMENSIONAL:
maximize_p  ∑_{i=1}^n wi log p(xi)
s.t.        p is a density,
            and p is log-concave.

FINITE DIMENSIONAL:
max_{y∈Rn}  ∑_{i=1}^n wi yi − ∫ exp(hX,y(t)) dt.

(The normalization constraint can be dropped: at the optimum of this penalized objective, ∫ exp(hX,y) = 1 automatically.)

18 / 36
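In one dimension the penalized objective has a closed form, since the exponential of an affine function integrates exactly over each interval between consecutive data points. A minimal sketch under the simplifying assumption that the heights y already form a concave sequence, so that hX,y is plain linear interpolation of the poles:

```python
import numpy as np

def objective_1d(x, y, w):
    """sigma(y) = sum_i w_i y_i - ∫ exp(h(t)) dt for piecewise-linear h
    interpolating (x_i, y_i); assumes y is a concave sequence in x."""
    x, y, w = (np.asarray(a, float) for a in (x, y, w))
    order = np.argsort(x)
    x, y, w = x[order], y[order], w[order]
    integral = 0.0
    for a, b, ya, yb in zip(x[:-1], x[1:], y[:-1], y[1:]):
        s = (yb - ya) / (b - a)            # slope of h on [a, b]
        # ∫_a^b exp(ya + s (t - a)) dt = (exp(yb) - exp(ya)) / s  (s != 0)
        integral += ((np.exp(yb) - np.exp(ya)) / s if abs(s) > 1e-12
                     else (b - a) * np.exp(ya))
    return float(np.dot(w, y) - integral)

x = [0.0, 1.0, 2.0, 3.0]
w = [0.25, 0.25, 0.25, 0.25]
print(objective_1d(x, [-1.1, -0.9, -0.9, -1.1], w))   # one objective evaluation
```

Gradient-based ascent on y over such evaluations is the shape of the computation; the sketch only evaluates the objective once.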

Page 30: Nonparametric Density Estimation under Total Positivity

Maximum Likelihood Estimation under Log-concavity and MTP2

Questions:
1. Does the MLE under log-concavity and MTP2 exist with probability 1 and, if so, is it unique?
2. What is the shape of the MLE under log-concavity and MTP2?
   2.1 What is the support of the MLE?
   2.2 Is the MLE always exp(tent function)?
3. Which tent functions are allowed?
4. Can we compute the MLE?
5. Sample complexity?

19 / 36

Page 31: Nonparametric Density Estimation under Total Positivity

Existence and Uniqueness of the MLE

Theorem (R., Sturmfels, Tran, Uhler, 2018)

• The maximum likelihood estimator under log-concavity and MTP2 exists and is unique with probability 1 as long as there are at least 3 samples.

• Furthermore, the MLE is a consistent estimator.

The proof uses convergence properties of log-concave distributions, and does not shed light on the shape of the MLE.

20 / 36

Page 32: Nonparametric Density Estimation under Total Positivity

The Support of the MLE

Support of the MLE = "min-max convex hull" of X.

• MMconv(X) = smallest min-max closed and convex set containing X.

• MM(X) = smallest min-max closed set S containing X, i.e. x, y ∈ S ⇒ x ∧ y, x ∨ y ∈ S.

Is it always true that MMconv(X) = conv(MM(X))? No, not always!

Lemma
Let X = {x1, . . . , xn}. If X ⊆ R2 or X ⊆ ∏_{i=1}^d {ai, bi}, then

MMconv(X) = conv(MM(X)).

⇒ we can compute MMconv(X) for X ⊆ R2 and X ⊆ ∏_{i=1}^d {ai, bi}.

21 / 36
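MM(X) can be computed by a simple fixed-point iteration: repeatedly add componentwise minima and maxima of pairs until the set stops growing (it terminates because every coordinate of every new point is a coordinate of some sample point). A minimal sketch (illustrative names):

```python
import itertools
import numpy as np

def min_max_closure(X):
    """Iterate to the smallest min-max closed set containing X."""
    S = {tuple(map(float, x)) for x in X}
    while True:
        new = {t for a, b in itertools.combinations(S, 2)
                 for t in (tuple(np.minimum(a, b)), tuple(np.maximum(a, b)))}
        if new <= S:                      # fixed point reached
            return sorted(S)
        S |= new

# The example from the next slide: this X is already min-max closed, MM(X) = X.
X = [(0, 0, 0), (6, 0, 0), (6, 4, 0), (8, 4, 2)]
print(min_max_closure(X))
```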


Page 37: Nonparametric Density Estimation under Total Positivity

The Min-Max Convex Hull

Now consider X = {(0, 0, 0), (6, 0, 0), (6, 4, 0), (8, 4, 2)} ⊆ R3.

[Diagrams: conv(X) before and after adding the boundary point (6, 4, 1.5).]

It turns out that MM(X) = X. But conv(MM(X)) is not min-max closed! This is because (6, 3, 3/2) ∈ conv(MM(X)), and yet

(6, 4, 3/2) = max{(6, 4, 0), (6, 3, 3/2)} ∉ conv(MM(X)).

Therefore, conv(MM(X)) ⊊ MMconv(X).

22 / 36


Page 39: Nonparametric Density Estimation under Total Positivity

Computing MMconv(X)

Theorem (The 2-D Projections Theorem)
For any finite subset X ⊆ Rd,

MMconv(X) = ⋂_{1≤i<j≤d} π_ij^{−1}( conv(MM(π_ij(X))) ),

where π_ij : Rd → R2, x ↦ (xi, xj).

Corollary (Queyranne and Tardella, 2006)
A convex polytope C in Rd is min-max closed if and only if it is defined by a finite collection of bimonotone linear inequalities.

A linear inequality is bimonotone if it has the form

ai xi + aj xj + b ≤ 0, where ai aj ≤ 0.

23 / 36
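The 2-D Projections Theorem yields a practical membership test for MMconv(X): a point belongs to MMconv(X) iff each of its coordinate-pair projections lies in the planar convex hull of the projected min-max closure. A minimal sketch (illustrative; it uses scipy's Delaunay triangulation for the planar membership test, and therefore assumes no projection of X is collinear):

```python
import itertools
import numpy as np
from scipy.spatial import Delaunay

def min_max_closure(pts):
    S = {tuple(map(float, p)) for p in pts}
    while True:
        new = {t for a, b in itertools.combinations(S, 2)
                 for t in (tuple(np.minimum(a, b)), tuple(np.maximum(a, b)))}
        if new <= S:
            return np.array(sorted(S))
        S |= new

def in_mmconv(X, z):
    """z ∈ MMconv(X) iff every 2-D projection of z lies in the convex hull
    of the min-max closure of the corresponding projection of X."""
    X, z = np.asarray(X, float), np.asarray(z, float)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        hull = Delaunay(min_max_closure(X[:, [i, j]]))
        if hull.find_simplex(z[[i, j]]) < 0:    # projection falls outside
            return False
    return True

X = [(0, 0, 0), (6, 0, 0), (6, 4, 0), (8, 4, 2)]
print(in_mmconv(X, (6, 4, 1.5)))   # True: the extra point from the example
print(in_mmconv(X, (0, 4, 2)))     # False
```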


Page 41: Nonparametric Density Estimation under Total Positivity

Back to Log-concave and MTP2 Maximum Likelihood Estimation

1. Does the MLE under log-concavity and MTP2 exist with probability 1 and, if so, is it unique? Yes.
2. What is the shape of the MLE under log-concavity and MTP2?
   2.1 What is the support of the MLE? MMconv(X); we can compute it.
   2.2 Is the MLE always exp(tent function)?
3. Which tent functions are allowed?
4. Can we compute the MLE?
5. Sample complexity?

24 / 36

Page 42: Nonparametric Density Estimation under Total Positivity

Supermodular Tent Functions

Recall that p = exp(h) is MTP2 if and only if h is supermodular, i.e.

h(x) + h(y) ≤ h(x ∧ y) + h(x ∨ y) for all x, y ∈ Rd.

Theorem (R., Sturmfels, Tran, Uhler, 2018)
Let X ⊂ Rd be a finite set of points. A tent function h is supermodular if and only if all of the walls of the subdivision it induces are bimonotone.

[Diagram: two subdivisions of the unit square on vertices (0, 0), (0, 1), (1, 0), (1, 1).]

25 / 36

Page 43: Nonparametric Density Estimation under Total Positivity

Supermodular Tent Functions

Recall that p = exp(h) is MTP2 if and only if h is supermodular, i.e.

h(x) + h(y) ≤ h(x ∧ y) + h(x ∨ y) for all x, y ∈ Rd.

Theorem (R., Sturmfels, Tran, Uhler, 2018)
Let X ⊂ Rd be a finite set of points. A tent function h is supermodular if and only if all of the walls of the subdivision it induces are bimonotone.

[Diagram: the two subdivisions of the unit square, with the values h(x), h(y), h(x ∧ y), h(x ∨ y) marked.]

Remark
To find the best hX,y, we need to optimize over heights y that induce bimonotone subdivisions.

• In general this set of heights is not convex.

• Example: X = {0, 1} × {0, 1} × {0, 1, 2}.

26 / 36
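On a finite min-max closed grid, supermodularity of a candidate height assignment is a pairwise check that mirrors the MTP2 check for densities. A minimal sketch (illustrative names); on the 2 × 2 grid it reduces to the single inequality h(0,0) + h(1,1) ≥ h(0,1) + h(1,0):

```python
import itertools
import numpy as np

def is_supermodular(points, heights, tol=1e-12):
    """Check h(x) + h(y) <= h(x ∧ y) + h(x ∨ y) pairwise on a grid.
    Assumes the points form a min-max closed set (e.g. a product grid)."""
    h = {tuple(map(float, p)): float(v) for p, v in zip(points, heights)}
    for a, b in itertools.combinations(h, 2):
        lo, hi = tuple(np.minimum(a, b)), tuple(np.maximum(a, b))
        if h[a] + h[b] > h[lo] + h[hi] + tol:
            return False
    return True

grid = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(is_supermodular(grid, [0.0, -1.0, -1.0, 0.0]))   # True
print(is_supermodular(grid, [0.0, 1.0, 1.0, 0.0]))     # False
```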

Page 44: Nonparametric Density Estimation under Total Positivity

Maximum Likelihood Estimation under Log-concavity and MTP2

1. Does the MLE under log-concavity and MTP2 exist with probability 1 and, if so, is it unique? Yes.
2. What is the shape of the MLE under log-concavity and MTP2?
   2.1 What is the support of the MLE? MMconv(X); we can compute it.
   2.2 Is the MLE always exp(tent function)?
3. Which tent functions are allowed? Bimonotone tent functions.
4. Can we compute the MLE?
5. Sample complexity?

27 / 36

Page 45: Nonparametric Density Estimation under Total Positivity

When is the MLE the exponential of a tent function?

Theorem (R., Sturmfels, Tran, Uhler 2018)
If X ⊆ R2 or X ⊆ ∏_{i=1}^d {ai, bi}, then the MLE is the exponential of a tent function

p* = exp(hX̄,y),

where X̄ = MM(X).

28 / 36

Page 46: Nonparametric Density Estimation under Total Positivity

Finding the MLE

Theorem (R., Sturmfels, Tran, Uhler 2018)
If X ⊆ R2 or X ⊆ ∏_i {ai, bi}, then the set of heights y for which exp(hX,y) is MTP2 is a convex polytope S.

Therefore, we can use, e.g., the conditional gradient method to find y*:

maximize_y  ∑_{i=1}^n wi yi − ∫ exp(hX,y)
s.t.        y ∈ S.

29 / 36


Page 48: Nonparametric Density Estimation under Total Positivity

What is the shape of the MLE in the general case?

Let X = {(0, 0, 0), (6, 0, 0), (6, 4, 0), (8, 4, 2), (6, 4, 3/2)} and w = (1/28)(15, 1, 1, 1, 10). The log-concave MLE is not MTP2.

[Diagram: the subdivision induced by the log-concave MLE on conv(X), with vertices (0, 0, 0), (6, 0, 0), (6, 4, 0), (6, 4, 1.5), (8, 4, 2).]

The MTP2 and log-concave MLE is exp(tent function) on X ∪ {(6, 3, 3/2), (7.5, 4, 3/2)}, with subdivision as above.

30 / 36

Page 49: Nonparametric Density Estimation under Total Positivity

What is the shape of the MLE in the general case?

Let X = {(0, 0, 0), (6, 0, 0), (6, 4, 0), (8, 4, 2), (6, 4, 3/2)} and w = (1/28)(15, 1, 1, 1, 10). The log-concave MLE φ is not supermodular.

[Diagram: the bimonotone subdivision with the two added vertices (6, 3, 1.5) and (7.5, 4, 1.5).]

The MTP2 and log-concave MLE is exp(tent function) on X ∪ {(6, 3, 3/2), (7.5, 4, 3/2)}, with subdivision as above.

31 / 36

Page 50: Nonparametric Density Estimation under Total Positivity

Sample complexity for log-concave density estimation

• Fd = set of log-concave densities in Rd
• h²(f, g) = ∫ (√f − √g)², the squared Hellinger distance
• p̂n = log-concave MLE from n samples

Theorem (Kim and Samworth, 2014)

(a) Upper bound for d = 2, 3:

sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] = O(n^{−2/(d+1)} log n).

(b) Lower bound for d ≥ 2:

inf_{p̂n} sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] ≥ c_d n^{−2/(d+1)}.

The MLE is minimax optimal up to log factors for d = 2, 3.

Theorem (Carpenter et al., 2018)
When d ≥ 4:

(a) Upper bound:

sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] = O(n^{−2/(d+3)}).

(b) Lower bound:

inf_{p̂n} sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] ≥ c_d n^{−2/(d+1)}.

32 / 36


Page 52: Nonparametric Density Estimation under Total Positivity

Sample complexity for log-concave and MTP2

• Fd = set of log-concave and MTP2 densities in Rd
• h²(f, g) = ∫ (√f − √g)², the squared Hellinger distance
• p̂n = MTP2 and log-concave MLE from n samples

Theorem (R., Uhler, Wang, 2018)
When d = 2:

(a) Upper bound:

sup_{p0∈F2} E_{p0}[h²(p0, p̂n)] = O(n^{−2/3} log n).

(b) Lower bound:

inf_{p̂n} sup_{p0∈F2} E_{p0}[h²(p0, p̂n)] ≥ c₂ n^{−2/3}.

The MLE is minimax optimal up to log factors.

Problem
When d ≥ 3:

(a) Upper bound?

sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] = O(n^{−2/3} log n)?

(b) Lower bound:

inf_{p̂n} sup_{p0∈Fd} E_{p0}[h²(p0, p̂n)] ≥ c · poly(d) · n^{−2/3}.

No curse of dimensionality?

33 / 36


Page 54: Nonparametric Density Estimation under Total Positivity

Conclusion

MLE
• Is the MLE always exp(tent function)?
• Rates for d ≥ 3?
• Faster algorithms for finding the MLE?

Kernel Density Estimation
• The min-max closure of the sample, MM(X), convolved with a standard Gaussian is MTP2. [with Ali Zartash, 2019]
• Consistency and rates?
• Test statistics for MTP2 distributions?
• Multivariate total positivity of higher orders (i.e. MTPk)?

34 / 36
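The kernel density estimator mentioned above is easy to prototype: place standard Gaussian kernels at the points of MM(X). A minimal sketch; the uniform weighting over MM(X) is an illustrative simplification, since how the added points are weighted is part of the Zartash-Robeva construction and is not reproduced here.

```python
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def min_max_closure(pts):
    S = {tuple(map(float, p)) for p in pts}
    while True:
        new = {t for a, b in itertools.combinations(S, 2)
                 for t in (tuple(np.minimum(a, b)), tuple(np.maximum(a, b)))}
        if new <= S:
            return np.array(sorted(S))
        S |= new

def mtp2_kde(X):
    """Gaussian kernels centred at MM(X); uniform weights (illustrative)."""
    centers = min_max_closure(X)
    kernel = multivariate_normal(mean=np.zeros(X.shape[1]))  # standard Gaussian
    return lambda x: float(np.mean(kernel.pdf(np.asarray(x) - centers)))

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 2))
p_hat = mtp2_kde(X)
print(p_hat((0.0, 0.0)))
```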


Page 57: Nonparametric Density Estimation under Total Positivity

Thank you!

35 / 36

Page 58: Nonparametric Density Estimation under Total Positivity

References

M. Cule, R. Samworth, and M. Stewart. Maximum likelihood estimation of a multi-dimensional log-concave density. Journal of the Royal Statistical Society, Series B 72:5 (2010), 545-607.

J.-C. Hütter, C. Mao, P. Rigollet, and E. Robeva. Optimal rates for estimation of two-dimensional totally positive distributions.

E. Robeva, B. Sturmfels, and C. Uhler. Geometry of log-concave density estimation. arXiv:1704.01910.

E. Robeva, B. Sturmfels, N. Tran, and C. Uhler. Maximum likelihood estimation for totally positive log-concave densities. arXiv:1806.10120.

E. Robeva and M. Sun. Bimonotone subdivisions of point configurations in the plane. In preparation.

E. Robeva, C. Uhler, and Y. Wang. Rates of estimation of log-concave and totally positive densities. In preparation.

A. Zartash and E. Robeva. Kernel density estimation for totally positive random vectors. arXiv:1910.02345.

36 / 36