
Page 1: galton.uchicago.edu/~drton/Stuff/drton_ebic.pdf · 2012. 6. 25.

Bayesian Model Choice and Information Criteria in Sparse Generalized Linear Models

Mathias Drton

Department of Statistics, University of Chicago

(Paper with this title: Rina Foygel & M.D., arXiv:1112.5635)


Outline

1 BIC and extensions

2 Asymptotics for marginal likelihood of GLMs

3 Consistency for GLMs

4 Ising models

2 / 36


Outline

1 BIC and extensions

2 Asymptotics for marginal likelihood of GLMs

3 Consistency for GLMs

4 Ising models

BIC and extensions 2 / 36


Bayesian information criterion (BIC)

Sample Y1, . . . ,Yn

Parametric model M with maximized log-likelihood ℓ̂(M)

Bayesian information criterion (Schwarz, 1978)

BIC(M) := ℓ̂(M) − (dim(M)/2) · log n

‘Generic’ model selection approach:

Maximize BIC(M) over set of considered models

BIC and extensions 3 / 36
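As a concrete illustration of the criterion just defined, a minimal NumPy sketch for Gaussian linear regression (an illustrative setting; the slides' definition is generic). The error variance is profiled out, and dim(M) here counts only the regression coefficients:

```python
import numpy as np

def max_loglik(y, X):
    """Maximized Gaussian log-likelihood of y ~ X @ beta + noise,
    with the error variance profiled out via its MLE rss / n."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

def bic(y, X):
    """BIC(M) = loglik_hat(M) - (dim(M) / 2) * log(n).
    The 'generic' approach maximizes this over candidate models."""
    n, d = X.shape
    return max_loglik(y, X) - 0.5 * d * np.log(n)
```

Comparing `bic(y, X[:, J])` across candidate supports J and keeping the argmax is exactly the generic selection rule stated above.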


Motivation: 1) Bayesian model choice

Posterior model probability in fully Bayesian treatment:

P(M | Y1, . . . , Yn) ∝ P(M) · P(Y1, . . . , Yn | M),

with prior P(M). Marginal likelihood:

Ln(M) := P(Y1, . . . , Yn | M) = ∫ P(Y1, . . . , Yn | θM, M) dP(θM | M),

integrating the likelihood function against the prior P(θM | M).

BIC and extensions 4 / 36


Motivation: 2) Asymptotics

Y1, . . . ,Yn i.i.d. sample from P0 ∈M

Theorem (Schwarz, 1978; Haughton, 1988; and others)

Assume P(θM | M) is a 'nice' prior on R^d. Then in 'nice' models,

log Ln(M) = ℓ̂n(M) − (d/2) log n + Op(1),

and a better (Laplace) approximation is possible:

log Ln(M) = ℓ̂n(M) − (d/2) log(n / 2π) + log P(θ̂M | M)
            − (1/2) log det[ (1/n) Hessian(θ̂M) ] + Op(n^{−1/2})

BIC and extensions 5 / 36


Consistency

Theorem

Fix a finite set of 'nice' models. Then, BIC selects a true model of smallest dimension with probability tending to one as n → ∞.

Proof.

Finite set of models ⟹ pairwise comparisons suffice.

If P0 ∈ M1 ⊊ M2 and d1 < d2, then

ℓ̂n(M2) − ℓ̂n(M1) = Op(1); and (d2 − d1) log n → ∞.

If P0 ∈ M1 \ clos(M2), then with probability tending to one,

(1/n) [ ℓ̂n(M1) − ℓ̂n(M2) ] > ε > 0; and log(n)/n → 0.

BIC and extensions 6 / 36


Linear regression (covariates i.i.d. N(0, 1), φ1 = 1, σ = 2)

BIC and extensions 7 / 36


BIC in higher-dimensional linear regression

Exhaustive search up to 6 covariates

[Figure: probability of correct selection vs. p (10 to 80); n = p, σ = 1, k = 2, φ1 = φ2 = 1]

BIC and extensions 8 / 36


Higher-dimensional linear regression . . . too large models

σ = 1, k = 2, φ1 = φ2 = 1

Broman & Speed (2002)

BIC and extensions 9 / 36


Informative prior on models in higher dim. regression

σ = 1, k = 2, φ1 = φ2 = 1

BIC and extensions 10 / 36


Informative prior on models in higher dim. regression

Exhaustive search up to 6 covariates

[Figure: probability of correct selection vs. p (10 to 80) for BIC and EBIC; n = p, σ = 1, k = 2, φ1 = φ2 = 1]

BIC and extensions 11 / 36


Extended Bayesian information criterion

Linear regression

Models given by subsets of covariates J ⊂ [p] := {1, . . . , p}

Prior on models

P(J) = 1/(p + 1) · (p choose |J|)^{−1}

under which k = #covariates and (J | k) are uniformly distributed.

Extended BIC defined as

EBIC(J) = BIC(J) − |J| log p;

we have |J| ≪ p in mind.

Bogdan et al. (2004), Chen & Chen (2008), Scott and Berger (2010), . . .

BIC and extensions 12 / 36
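The exhaustive-search use of EBIC can be sketched as follows (Gaussian linear regression with the |J| log p penalty from the definition above; all names are illustrative):

```python
from itertools import combinations
import numpy as np

def max_loglik(y, X):
    """Maximized Gaussian log-likelihood with variance profiled out."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)

def ebic_search(y, X, qmax):
    """Maximize EBIC(J) = loglik(J) - |J|/2 log n - |J| log p
    over all supports J with 1 <= |J| <= qmax."""
    n, p = X.shape
    best_J, best = None, -np.inf
    for k in range(1, qmax + 1):
        for J in combinations(range(p), k):
            score = (max_loglik(y, X[:, list(J)])
                     - 0.5 * k * np.log(n) - k * np.log(p))
            if score > best:
                best_J, best = J, score
    return best_J
```

The extra − |J| log p term is what keeps the noise covariates out when p is of the same order as n.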


Theory = consistency for EBIC

Chen & Chen '08: High-dimensional sparse linear regression (fixed design, # active covariates bounded).

Chen & Chen '11: Generalized linear models (fixed design, canonical link).

Chen et al. '11: Generalizations for fixed-design regression.

Gao et al. '10: Gaussian graphical models.
Foygel & D. '10: (adjust penalty for number of graphs).

BIC and extensions 13 / 36


Questions

Bayesian connection under high-dimensional asymptotics:

- Laplace approximation to marginal likelihood accurate uniformly over a growing number of models?

- EBIC captures growth of marginal likelihood?

Consistency for random designs?

Consistency for pseudo-likelihood approaches to graphical model selection?

Consistency of fully Bayesian model choice as corollaries?

Shang & Clayton (2011)

BIC and extensions 14 / 36


Outline

1 BIC and extensions

2 Asymptotics for marginal likelihood of GLMs

3 Consistency for GLMs

4 Ising models

Asymptotics for marginal likelihood of GLMs 15 / 36


Generalized linear model: Setup

Independent (response) observations Y1, . . . ,Yn

Distribution of Yi ∼ pθi from univariate exponential family:

pθ(y) ∝ exp{ y · θ − b(θ) }, θ ∈ Θ = R.

Linearity:

θ = (θ1, . . . , θn)^T = Xφ, φ ∈ R^p,

for design matrix X = (Xij) ∈ R^{n×p} (rows ↔ experiments, columns ↔ covariates).

Random design with X1•, . . . ,Xn• i.i.d.

Variable selection:

Find support J∗ ⊂ [p] of true parameter φ∗.

Asymptotics for marginal likelihood of GLMs 16 / 36
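As a concrete instance of the exponential family above: Bernoulli responses have cumulant function b(θ) = log(1 + e^θ), so the natural parameter is the log-odds. A quick sanity check (illustrative, not from the slides):

```python
import numpy as np

# Bernoulli as the univariate exponential family
# p_theta(y) = exp{ y * theta - b(theta) } on y in {0, 1},
# with cumulant function b(theta) = log(1 + e^theta).
def b(theta):
    return np.log1p(np.exp(theta))

def p_theta(y, theta):
    return np.exp(y * theta - b(theta))

# p_theta(1, t) is the logistic function of t, and the two
# probabilities sum to one for every natural parameter t.
```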


Assumptions

(A) Bounded covariates (or a moment condition)

(B1) Subexponential growth of dimension: log(pn) = o(n).

(B2) Dimension of smallest true model bounded by a fixed q ∈ N.

(B3) Small sets of covariates have second-moment matrices with minimal eigenvalue bounded away from zero:

λmin( E[ X1J X1J^T ] ) > a > 0 for all |J| ≤ 2q.

(B4) Norm of signal ‖φ∗‖2 bounded.

Asymptotics for marginal likelihood of GLMs 17 / 36


Theorem (Laplace approximation)

Assume (A), (B1)-(B4) and 'nice priors' (fJ : J ⊂ [p], |J| ≤ q). Then there is a constant C such that the marginal likelihood sequence Ln(J) satisfies

log Ln(J) = ℓn(φ̂J) − (|J|/2) log(n) + log fJ(φ̂J) + (|J|/2) log(2π)
            − (1/2) log det( (1/n) HessianJ(φ̂J) ) ± C √(log(np)/n)    for all |J| ≤ q,

with probability tending to 1 as n→∞.

Asymptotics for marginal likelihood of GLMs 18 / 36


EBIC approximation

EBIC (with parameter γ ≥ 0):

EBICγ(J) = ℓn(φ̂J) − (|J|/2) log(n) − γ |J| · log(p).

Corollary

Assume (A), (B1)-(B4) and 'nice priors' (fJ : J ⊂ [p], |J| ≤ q). Adopt the unnormalized model prior

Pγ(J) = (p choose |J|)^{−γ} · 1{ |J| ≤ q }.

Then there is a constant C′ such that with probability tending to 1 as n → ∞, we have

| log[ Pγ(J, Y) ] − EBICγ(J) | ≤ C′ for all |J| ≤ q.

Asymptotics for marginal likelihood of GLMs 19 / 36


Laplace approximation to marginal likelihood

∫_{R^J} exp( ℓn(φ̂J + γ) ) · fJ(φ̂J + γ) dγ

Taylor series:

ℓn(φ̂J + γ) = ℓn(φ̂J) − (1/2) γ^T HessianJ(φ̂J + tγ · γ) γ

Approximation by Gaussian integral:

fJ(φ̂J) · ∫_{R^J} exp( ℓn(φ̂J) ) · exp( −(1/2) γ^T HessianJ(φ̂J) γ ) dγ

= fJ(φ̂J) · exp( ℓn(φ̂J) ) · √( (2π/n)^{|J|} · det( (1/n) HessianJ(φ̂J) )^{−1} )

Asymptotics for marginal likelihood of GLMs 20 / 36
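The derivation above can be checked numerically in one dimension, where the marginal likelihood is available by brute-force quadrature. A toy sketch with an assumed Gaussian likelihood and prior (not the slides' GLM setting):

```python
import numpy as np

# Toy 1-D check of the Laplace approximation (assumed setting:
# y_i ~ N(phi, 1) with prior phi ~ N(0, tau^2)).
rng = np.random.default_rng(1)
n = 200
y = rng.normal(loc=0.5, scale=1.0, size=n)
tau = 2.0

def loglik(phi):
    """Gaussian log-likelihood of the sample at mean phi."""
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - phi) ** 2)

def logprior(phi):
    """N(0, tau^2) log-density."""
    return -0.5 * np.log(2 * np.pi * tau ** 2) - 0.5 * phi ** 2 / tau ** 2

# 'Exact' log marginal likelihood by brute-force quadrature on a fine grid.
grid = np.linspace(-5.0, 5.0, 20001)
dx = grid[1] - grid[0]
vals = np.array([loglik(g) + logprior(g) for g in grid])
m = vals.max()
log_marg = m + np.log(np.sum(np.exp(vals - m)) * dx)

# Laplace approximation around the MLE phi_hat = mean(y): the Hessian of
# the negative log-likelihood is n, giving the sqrt(2*pi/n) volume factor,
# i.e. the |J| = 1 case of the Gaussian-integral formula above.
phi_hat = y.mean()
log_laplace = (loglik(phi_hat) + logprior(phi_hat)
               + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(n))
```

With n = 200 the two values agree to well under one log-unit, consistent with the Op(1)-level accuracy claimed for the approximation.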


Laplace approximation to marginal likelihood

∫_{R^J} exp( ℓn(φ̂J) − (1/2) γ^T HessianJ(φ̂J + tγ · γ) γ ) dγ,
with exponent ≈ ℓn(φ̂J) − (1/2) γ^T HessianJ(φ̂J) γ

[Figure: the integrand peaks at height exp(ℓn(φ̂J)) at γ = 0, i.e. φ = φ̂J; the approximation is controlled on the regions ‖γ‖2 ≤ √(log(p)/n) and ‖γ‖2 ≤ 1]

Asymptotics for marginal likelihood of GLMs 21 / 36



Assumptions on priors

Family of priors (fJ : J ⊂ [p], |J| ≤ q) is 'nice' if for constants 0 < F1, F2, F3 < ∞ we have, uniformly for all |J| ≤ q:

(i) an upper bound: sup_{φJ} fJ(φJ) ≤ F1 < ∞,

(ii) a lower bound over a compact set: inf_{‖φJ‖2 ≤ R+1} fJ(φJ) ≥ F2 > 0, where R is a function of the constants in (A) & (B1)-(B4),

(iii) a Lipschitz property on the same compact set: sup_{‖φJ‖2 ≤ R+1} ‖∇fJ(φJ)‖2 ≤ F3 < ∞.

Asymptotics for marginal likelihood of GLMs 22 / 36


Outline

1 BIC and extensions

2 Asymptotics for marginal likelihood of GLMs

3 Consistency for GLMs

4 Ising models

Consistency for GLMs 23 / 36


(B5) Small true coefficients don't decay too fast:

√( log(npn) / n ) = o( min{ |φ∗j| : j ∈ J∗ } ).

Theorem (EBIC consistency in GLM)

Assume (A), (B1)-(B5). Let

κ = lim_{n→∞} log(pn) / log(n) ∈ [0, ∞],

and take γ > 1 − 1/(2κ). Then with prob. tending to 1 as n → ∞, we have

EBICγ(J∗) − max_{J ≠ J∗, |J| ≤ q} EBICγ(J) ≥ log(p) · Chigh + log(n) · Clow

for constants Chigh, Clow > 0.

Consistency for GLMs 24 / 36


EBIC approximates Bayesian model choice

Corollary (Consistency of Bayesian model choice)

Assume (A), (B1)-(B5) and 'nice priors'. Then with probability tending to 1 as n → ∞, we have

Pγ(J∗ | Y) > max_{J ≠ J∗, |J| ≤ q} Pγ(J | Y).

Consistency for GLMs 25 / 36


Experiment for sparse logistic regression (with lasso)

Spambase data from UCI Machine Learning Data Repository

n0 = 4601 emails, p0 = 57 covariates

Downsample to n < n0 experiments.

Create p − p0 noise covariates by random permutation.

Total number of covariates p satisfies p/n = p0/25 ≈ 2.28.

Select a model from the lasso path using EBIC, cross-validation, and stability selection (Meinshausen & Bühlmann, 2010).

Consistency for GLMs 26 / 36
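The 'noise covariates by random permutation' step can be sketched as follows; the function name and the choice to permute randomly chosen real columns are illustrative assumptions:

```python
import numpy as np

def add_permutation_noise(X, p_total, rng):
    """Append p_total - p0 noise covariates, each an independent random
    permutation of a randomly chosen real column: marginal distributions
    are preserved while any association with the response is destroyed."""
    n, p0 = X.shape
    noise = [rng.permutation(X[:, rng.integers(p0)])
             for _ in range(p_total - p0)]
    return np.column_stack([X] + noise)
```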


Positive selection and false discovery rate

[Figure, two panels: positive selection rate (PSR) and false discovery rate (FDR) vs. number of samples (100 to 600), for BIC0, BIC0.25, BIC0.5, BIC1 (EBIC with γ = 0, 0.25, 0.5, 1), cross-validation, and stability selection]

Consistency for GLMs 27 / 36


Comparison to full data

[Figure: curves for BIC0, BIC0.25, BIC0.5, BIC1, cross-validation, and stability selection]

Figure: Smoothed probability of selecting a true feature (subsample size 600), as a function of the p-value of that feature in the full regression (sample size 4601).

Consistency for GLMs 28 / 36


Outline

1 BIC and extensions

2 Asymptotics for marginal likelihood of GLMs

3 Consistency for GLMs

4 Ising models

Ising models 29 / 36


Ising model

Observe i.i.d. X(1), . . . , X(n) ∈ {0, 1}^p

Likelihood function:

(1 / Z(Θ)) · exp{ Σj Θj0 xj + Σ_{j<k} Θjk xj xk },

with normalizing constant Z(Θ) and (sparse) potential matrix Θ.

Full conditional for Xj is proportional to

exp{ xj · ( Θj0 + Σ_{k≠j} Θjk xk ) }

Model selection problem:

Find support E∗ (the 'graph') of true potential matrix Θ∗

Ising models 30 / 36
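The full conditional displayed above is a logistic function of the neighboring variables. A small sketch under an assumed storage convention (main effects Θj0 kept in a vector h, interactions in a symmetric Θ with zero diagonal):

```python
import numpy as np

def ising_conditional(Theta, h, x, j):
    """P(X_j = 1 | X_{-j} = x_{-j}): logistic in
    Theta_{j0} + sum_{k != j} Theta_{jk} x_k.
    Theta is symmetric with zero diagonal, so Theta[j] @ x
    already skips the j-term; h[j] plays the role of Theta_{j0}."""
    eta = h[j] + Theta[j] @ x
    return 1.0 / (1.0 + np.exp(-eta))

def gibbs_sweep(Theta, h, x, rng):
    """One systematic-scan Gibbs sweep; these logistic conditionals are
    also what the pseudo-likelihood (neighborhood-selection) approach
    on the next slide fits by logistic regression."""
    for j in range(len(x)):
        x[j] = float(rng.random() < ising_conditional(Theta, h, x, j))
    return x
```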


Neighborhood selection for sparse Ising models

For each Xj, select its neighborhood via the lasso:

Θ̂(λ)_{j•} = argmax [ ℓ_{Xj | X−j}(Θ_{j•}) − λ · Σ_{k≠j} |Θjk| ]

(Meinshausen & Bühlmann, 2006; Ravikumar et al., 2010)

How to choose λ, i.e., neighborhoods from each path?

Cross-validation tends to select too-large neighborhoods.

Apply EBIC:

- Let Ej,λ be the edges incident to j in the support of Θ̂(λ)_{j•}.

- Maximize

ℓ_{Xj | X−j}( Θ̂(λ)_{j•} ) − (|Ej,λ| / 2) log(n) − |Ej,λ| · γ log(p)

with respect to λ.

Ising models 31 / 36
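Choosing λ by EBIC amounts to scoring the support at each point of the path. A pure-NumPy sketch where the candidate supports are passed in directly (in practice they would come from a lasso path solver such as glmnet); the Newton/IRLS fitter and the γ value in the test are illustrative stand-ins:

```python
import numpy as np

def logistic_loglik_hat(X, y, iters=30):
    """Maximized Bernoulli log-likelihood, fit by Newton/IRLS from zero
    (a small ridge term keeps the Hessian invertible)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        W = p * (1.0 - p)
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    p = np.clip(1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30))),
                1e-12, 1 - 1e-12)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def select_by_ebic(X, y, candidate_supports, gamma):
    """Return the support J maximizing
    loglik(J) - |J|/2 log(n) - gamma |J| log(p)."""
    n, p = X.shape
    def score(J):
        return (logistic_loglik_hat(X[:, list(J)], y)
                - 0.5 * len(J) * np.log(n) - gamma * len(J) * np.log(p))
    return max(candidate_supports, key=score)
```

For the Ising neighborhoods, X would hold the other variables X−j and y the variable Xj, with one candidate support per λ on the path.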


Consistency of EBIC for Ising model selection

Theorem

Consider subexponential growth of p = pn with

κ = lim_{n→∞} log(pn) / log(n) ∈ [0, ∞].

Assume

- all neighborhood sizes bounded by a constant,

- √( log(np) / n ) ≪ |Θ∗jk| ≤ a constant, for all edges (j, k).

Take γ > 2 − 1/(2κ). Then with probability tending to 1 as n → ∞:

EBICγ selects the right neighborhood for every Xj .

Follows from consistency of EBIC for GLMs with random covariates.

Ising models 32 / 36


Precipitation data (U.S. Historical Climatology Network)

89 weather stations

measure precipitation (1 or 0) on 278 (nonconsecutive) dates

discard locations of the weather stations; can we recover the geographical layout?

[Figure: map of the weather stations; longitude −96 to −86, latitude 36 to 42]

Ising models 33 / 36


[Figure, four panels over the station map: graphs selected by BIC, extended BIC (γ = 0.25), cross-validation, and stability selection]

Ising models 34 / 36


Edge selection vs distance

[Figure: smoothed probability of selecting an edge vs. distance between weather stations (0 to 600 miles), for BIC, extended BIC, cross-validation, and stability selection]

Ising models 35 / 36


Conclusion

Laplace approximation can be accurate uniformly over a large number of sparse GLMs

Chen & Chen's extended Bayesian information criterion (EBIC):

- connected to Bayesian model choice;

- its consistency proves consistency of 'generic' Bayesian procedures;

- computationally inexpensive alternative to stability selection and other resampling methods;

- seems useful for tuning regularization methods.

For details including references, see:

Bayesian model choice and information criteria in sparse generalized linear models (with Rina Foygel). arXiv:1112.5635

Ising models 36 / 36