
The horseshoe estimator for sparse signals

CARLOS M. CARVALHO

NICHOLAS G. POLSON

JAMES G. SCOTT

Biometrika (2010)

Presented by Eric Wang

10/14/2010

Overview

• This paper proposes the horseshoe estimator, which is highly analytically tractable and more robust and adaptive to different sparsity patterns than existing approaches.

• Two theorems are proved characterizing the proposed estimator’s tail robustness and demonstrating a super-efficient rate of convergence to the correct estimate of the sampling density in sparse situations.

• The proposed estimator’s performance is demonstrated using both real and simulated data. The authors show its answers correspond quite closely to those obtained by Bayesian model averaging.

• Consider a p-dimensional vector of observations y whose underlying mean vector θ = (θ_1, ..., θ_p) is sparse. The authors propose the following model for estimation and prediction:

(y_i | θ_i) ~ N(θ_i, σ²), (θ_i | λ_i, τ) ~ N(0, λ_i² τ²), λ_i ~ C⁺(0, 1),

where C⁺(0, a) denotes a standard half-Cauchy distribution on the positive reals with location 0 and scale parameter a.
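• As a concrete illustration of this hierarchy, here is a minimal sketch (not from the paper) that simulates from the horseshoe model with NumPy; the values p = 1000 and τ = σ = 1 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p, tau, sigma = 1000, 1.0, 1.0          # illustrative dimensions and scales

# Local scales: lambda_i ~ C+(0, 1), the absolute value of a standard Cauchy draw.
lam = np.abs(rng.standard_cauchy(p))

# Means: theta_i | lambda_i, tau ~ N(0, lambda_i^2 tau^2).
theta = rng.normal(0.0, lam * tau)

# Observations: y_i | theta_i ~ N(theta_i, sigma^2).
y = rng.normal(theta, sigma)
```

Because the half-Cauchy places substantial mass both near zero and far out in the tails, most simulated θ_i are tiny while a few are very large, which is exactly the sparse behaviour the prior is designed to capture.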

• The name horseshoe prior arises from the observation that, for fixed values τ = σ = 1,

E(θ_i | y) = {1 − E(κ_i | y)} y_i,

where κ_i = 1/(1 + λ_i²) and E(κ_i | y) is the amount of shrinkage toward zero, a posteriori. κ_i has a horseshoe-shaped Be(1/2, 1/2) prior.

The horseshoe estimator

• The meaning of κ_i is as follows: κ_i ≈ 0 yields virtually no shrinkage and describes signals, while κ_i ≈ 1 yields near-total shrinkage and (hopefully) describes noise.

• At right is the prior on the shrinkage coefficient κ_i, the horseshoe-shaped Be(1/2, 1/2) density.
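• A quick way to see this shape is to draw λ_i ~ C⁺(0, 1), transform to κ_i = 1/(1 + λ_i²), and compare with the Be(1/2, 1/2) density; the sketch below (not from the paper, assuming τ = σ = 1) uses NumPy and SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# lambda ~ C+(0, 1); shrinkage weight kappa = 1 / (1 + lambda^2)   (tau = sigma = 1).
lam = np.abs(rng.standard_cauchy(100_000))
kappa = 1.0 / (1.0 + lam**2)

# The implied prior on kappa should match Be(1/2, 1/2): U-shaped, unbounded at 0 and 1.
ks = stats.kstest(kappa, stats.beta(0.5, 0.5).cdf)
print(ks.statistic)                      # near 0: the samples follow the Be(1/2, 1/2) law

# Mass piles up at both ends: kappa near 0 (leave signals alone), kappa near 1 (shrink noise away).
print(np.mean(kappa < 0.05), np.mean(kappa > 0.95))
```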

The horseshoe density function

• The horseshoe prior density lacks an analytic form, but very tight bounds are available:

Theorem 1. The univariate horseshoe density p_HS(θ) satisfies the following:
(a) lim_{θ→0} p_HS(θ) = ∞;
(b) for θ ≠ 0,

(K/2) log(1 + 4/θ²) < p_HS(θ) < K log(1 + 2/θ²),

where K = 1/√(2π³).
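• The bounds in Theorem 1 are easy to check numerically; the sketch below (not from the paper) evaluates the horseshoe density with τ = 1 by one-dimensional quadrature over λ and compares it with the two bounds.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

K = 1.0 / np.sqrt(2.0 * np.pi**3)

def horseshoe_pdf(theta, tau=1.0):
    """p_HS(theta) = integral of N(theta; 0, lambda^2 tau^2) * C+(lambda; 0, 1) over lambda."""
    integrand = lambda lam: stats.norm.pdf(theta, 0.0, lam * tau) * 2.0 / (np.pi * (1.0 + lam**2))
    value, _ = quad(integrand, 0.0, np.inf)
    return value

for theta in [0.1, 0.5, 1.0, 2.0, 5.0]:
    lower = 0.5 * K * np.log(1.0 + 4.0 / theta**2)
    upper = K * np.log(1.0 + 2.0 / theta**2)
    # Prints whether the quadrature value lies between the lower and upper bounds of Theorem 1(b).
    print(theta, lower < horseshoe_pdf(theta) < upper)
```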

• Alternatively, it is possible to integrate over the global scale τ, yielding a marginal prior for θ, though the resulting dependence among the θ_i causes more issues. Therefore the authors do not take this approach.

Horseshoe estimator for sparse signals

Review of similar methods

• Scott & Berger (2006) studied the discrete mixture

θ_i ~ w · g(θ_i) + (1 − w) · δ_0,

where w is the prior inclusion probability, g is a density for the signals, and δ_0 is a point mass at zero.

• Tipping (2001) studied the Student-t prior, which is defined by an inverse-gamma mixing density on the variance, λ_i² ~ IG(a, b).

• The double-exponential prior (Bayesian lasso) arises from an exponential mixing density on λ_i².

Review of similar methods

• The normal-Jeffreys prior is an improper prior induced by placing Jeffreys’ prior on each variance term,

p(λ_i²) ∝ 1/λ_i²,

leading to p(θ_i) ∝ 1/|θ_i|. This choice is commonly used in the absence of a global scale parameter.

• The Strawderman-Berger prior does not have an analytic form, but arises from assuming a normal scale mixture whose shrinkage weight satisfies κ_i ~ Be(1/2, 1).

• The normal-exponential-gamma family of priors generalizes the lasso specification by using a gamma distribution to mix over the exponential rate parameter, leading to a prior with heavier, polynomial tails.
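• All of these priors are normal scale mixtures, differing only in the mixing density placed on the local variance. The sketch below (not from the paper) simulates from three of them and compares tail heaviness; the particular inverse-gamma and exponential parameters are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000

# Each prior is theta | lambda^2 ~ N(0, lambda^2) with a different mixing density on lambda^2.
lam2_t = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=n)   # inverse-gamma mixing -> Student-t
lam2_de = rng.exponential(scale=2.0, size=n)             # exponential mixing   -> double-exponential
lam2_hs = rng.standard_cauchy(n)**2                      # half-Cauchy scale    -> horseshoe
# (squaring a Cauchy draw gives lambda^2 for lambda ~ C+(0, 1))

for name, lam2 in [("Student-t", lam2_t), ("double-exponential", lam2_de), ("horseshoe", lam2_hs)]:
    theta = rng.normal(0.0, np.sqrt(lam2))
    # Tail heaviness: fraction of draws with |theta| > 10; the horseshoe's is by far the largest here.
    print(name, np.mean(np.abs(theta) > 10.0))
```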

Review of similar methods

Shrinkage of noise / Tail robustness of prior

Robustness to large signals

• Theorem 2. Let p(y | θ) be the likelihood, and suppose that the prior p(θ) is a zero-mean scale mixture of normals, (θ | λ²) ~ N(0, λ²), with λ² having a proper prior p(λ²). Assume further that the likelihood and p(θ) are such that the marginal density m(y) = ∫ p(y | θ) p(θ) dθ is finite for all y. Define the following three pseudo-densities, which may be improper:

Then

• If p(y | θ) is a Gaussian likelihood, then the result of Theorem 2 reduces to

E(θ | y) = y + (d/dy) log m(y).

• A key consequence of Theorem 2 is that if the prior on θ is chosen so that the derivative of its log density is bounded, then the derivative of the log predictive density, (d/dy) log m(y), is bounded and tends to 0 for large |y|. This happens for heavy-tailed priors, including the proposed horseshoe prior, and it yields E(θ | y) ≈ y for large |y|.
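• To make this concrete, the sketch below (not from the paper) computes the horseshoe marginal m(y) by quadrature with τ = σ = 1, differentiates log m(y) numerically, and shows the score shrinking toward 0 so that y + (d/dy) log m(y) tracks y for large observations.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def marginal(y):
    """m(y) = integral of N(y; 0, 1 + lambda^2) * C+(lambda; 0, 1) over lambda   (tau = sigma = 1)."""
    integrand = lambda lam: stats.norm.pdf(y, 0.0, np.sqrt(1.0 + lam**2)) * 2.0 / (np.pi * (1.0 + lam**2))
    value, _ = quad(integrand, 0.0, np.inf)
    return value

def score(y, h=1e-4):
    # Central-difference approximation to d/dy log m(y).
    return (np.log(marginal(y + h)) - np.log(marginal(y - h))) / (2.0 * h)

for y in [0.0, 1.0, 2.0, 5.0, 10.0, 20.0]:
    s = score(y)
    # E(theta | y) = y + d/dy log m(y): the score stays bounded and decays, so the
    # posterior mean approaches y itself for large |y| (tail robustness).
    print(y, round(s, 3), round(y + s, 3))
```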

Robustness to large signals

The horseshoe score function

• Theorem 3. Suppose (y | θ) ~ N(θ, 1). Let m_τ(y) denote the predictive density under the horseshoe prior for known scale parameter τ, i.e. m_τ(y) = ∫ p(y | θ) p_τ(θ) dθ, where (θ | λ, τ) ~ N(0, λ²τ²) and λ ~ C⁺(0, 1). Then, for some constant C that depends upon τ, |(d/dy) log m_τ(y)| ≤ C for all y, and (d/dy) log m_τ(y) → 0 as |y| → ∞.

• Corollary: lim_{|y|→∞} {y − E(θ | y)} = 0, so very large observations are left essentially unshrunk.

• Although the horseshoe prior has no analytic form, it does lead to a closed-form expression for the posterior mean in terms of Φ_1, a degenerate hypergeometric function of two variables.
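• Rather than working with the Φ_1 representation, the posterior mean can also be evaluated numerically from the shrinkage-weight identity E(θ | y) = {1 − E(κ | y)} y; the sketch below (not from the paper, with τ = σ = 1) does this with one-dimensional quadrature.

```python
import numpy as np
from scipy.integrate import quad

def posterior_mean(y):
    """E(theta | y) = (1 - E(kappa | y)) * y under the horseshoe prior with tau = sigma = 1."""
    # With tau = sigma = 1: (y | kappa) ~ N(0, 1/kappa) and kappa ~ Be(1/2, 1/2), so
    # p(kappa | y) has kernel exp(-kappa y^2 / 2) (1 - kappa)^(-1/2) on (0, 1).
    # Substituting kappa = 1 - u^2 removes the endpoint singularity before quadrature.
    f = lambda u: 2.0 * np.exp(-0.5 * (1.0 - u**2) * y**2)
    num, _ = quad(lambda u: (1.0 - u**2) * f(u), 0.0, 1.0)   # integral of kappa * kernel
    den, _ = quad(f, 0.0, 1.0)                               # normalising constant
    return (1.0 - num / den) * y

for y in [0.5, 1.0, 2.0, 5.0, 10.0]:
    print(y, round(posterior_mean(y), 3))   # small y are shrunk hard, large y are left nearly alone
```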

Estimating τ

• When the dimensionality p is large, the conditional posterior distribution of τ can be approximated in closed form.

• This approximation yields a simple distribution for a transformation of τ.

• If most observations are shrunk toward 0, then τ will be small with high probability.
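• As a simple illustration of how the data inform τ, the sketch below (not from the paper) evaluates the marginal posterior of τ on a grid, combining a half-Cauchy prior on τ with the quadrature marginal m_τ(y_i) on simulated sparse data; the sparsity pattern and grid are illustrative assumptions, and in practice the approximation above or MCMC would be used.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

rng = np.random.default_rng(2)

# Toy sparse data: most means are zero, a handful are large (illustrative, not the paper's design).
theta = np.zeros(60)
theta[:4] = rng.normal(0.0, 5.0, size=4)
y = rng.normal(theta, 1.0)

def log_marginal(tau):
    """log p(y | tau), with m_tau(y_i) = integral of N(y_i; 0, 1 + tau^2 lambda^2) C+(lambda; 0, 1)."""
    total = 0.0
    for yi in y:
        integrand = lambda lam: (stats.norm.pdf(yi, 0.0, np.sqrt(1.0 + (tau * lam) ** 2))
                                 * 2.0 / (np.pi * (1.0 + lam**2)))
        m, _ = quad(integrand, 0.0, np.inf)
        total += np.log(m)
    return total

taus = np.linspace(0.01, 2.0, 30)
log_post = np.array([log_marginal(t) + stats.halfcauchy.logpdf(t) for t in taus])

# With mostly-null data, the posterior mass for tau concentrates on small values.
print(taus[np.argmax(log_post)])
```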

Comparison to double exponential

Super-efficient convergence

• Theorem 4. Suppose the true sampling model is (y | θ_0) ~ N(θ_0, 1). Then:

(1) For θ under the horseshoe prior, the optimal rate of convergence of the Kullback-Leibler risk when the true value is θ_0 = 0 is super-efficient (the rate involves a constant b). When θ_0 ≠ 0, the optimal rate is the usual one.

(2) Suppose q(θ) is any other prior density that is continuous, bounded above, and strictly positive in a neighborhood of the true value θ_0. For θ under q(θ), the optimal rate of convergence, regardless of θ_0, is the usual rate, with no super-efficiency at zero.

Example - simulated data

• Data were generated from a sparse normal-means model.

Example - Vanguard mutual-fund data

• Here, the authors show how the horseshoe can provide a regularized estimate of a large covariance matrix whose inverse may be sparse.

• Vanguard mutual funds dataset containing n = 86 weekly returns for p = 59 funds.

• Suppose the observation matrix is Y = (y_1, ..., y_n)', with each p-dimensional vector y_t drawn from a zero-mean Gaussian with covariance matrix Σ.

• We will model the Cholesky decomposition of Σ.

Example - Vanguard mutual-fund data

• The goal is to estimate the ensemble of regression models in the implied triangular system, in which y_(j), the j-th column of Y, is regressed on the columns that precede it.

• The regression coefficients are assumed to have a horseshoe prior, and posterior means were computed using MCMC; the structure of this triangular system is sketched below.
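• The Vanguard returns themselves are not reproduced here, but the structure of the triangular system is easy to sketch. Below is a minimal illustration (not the paper's analysis) on synthetic data with a smaller p: each column of Y is regressed on the columns before it, and the covariance estimate is rebuilt from the coefficients and residual variances; ordinary least squares stands in for the horseshoe-regularized fit the paper uses.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic stand-in for the returns matrix: n observations of a p-dimensional vector.
n, p = 86, 10
true_cov = 0.5 * np.eye(p) + 0.3                      # simple positive-definite covariance
Y = rng.multivariate_normal(np.zeros(p), true_cov, size=n)

B = np.zeros((p, p))     # strictly lower-triangular regression coefficients
psi = np.zeros(p)        # residual variances of the triangular system

for j in range(p):
    if j == 0:
        resid = Y[:, 0]
    else:
        X = Y[:, :j]                                   # columns preceding column j
        beta, *_ = np.linalg.lstsq(X, Y[:, j], rcond=None)   # the paper uses a horseshoe prior + MCMC here
        B[j, :j] = beta
        resid = Y[:, j] - X @ beta
    psi[j] = resid.var()

# Reassemble the covariance: (I - B) y has independent components with variances psi,
# so Sigma = (I - B)^{-1} diag(psi) (I - B)^{-T}.
T_inv = np.linalg.inv(np.eye(p) - B)
Sigma_hat = T_inv @ np.diag(psi) @ T_inv.T
print(np.round(Sigma_hat[:3, :3], 2))
```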

Conclusions

• This paper introduces the horseshoe prior as a good default prior for sparse problems.

• Empirically, the model performs similarly to Bayesian model averaging, the current standard.

• The model exhibits strong global shrinkage and robust local adaptation to signals.