regularized poisson and logistic methods for spatial point … · 2017-06-08 · 19th workshop on...

19
Regularized Poisson and logistic methods for spatial point processes intensity estimation with a diverging number of covariates Achmad Choiruddin 1 *Work with: Jean-fran¸ cois Coeurjolly 1,2 and Fr´ ed´ erique Letu´ e 1 1 Laboratory Jean Kuntzmann, Universit´ e Grenoble Alpes, France 2 Department of Mathematics, Universit´ e du Qu´ ebec ` a Montr´ eal, Canada 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017 1/16 Achmad Choiruddin Feature selection for spatial point processes

Upload: others

Post on 12-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Regularized Poisson and logistic methodsfor spatial point processes intensity estimation

with a diverging number of covariates

Achmad Choiruddin1

*Work with: Jean-francois Coeurjolly1,2 and Frederique Letue1

1 Laboratory Jean Kuntzmann, Universite Grenoble Alpes, France2 Department of Mathematics, Universite du Quebec a Montreal, Canada

19th Workshop on Stochastic Geometry, Stereology and ImageAnalysis (SGSIA), Luminy, France

May 2017

1/16 Achmad Choiruddin Feature selection for spatial point processes

Page 2: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Contents

Introduction

MotivationSpatial point processes intensity estimation

Proposed methods

MethodologyAsymptotic resultsApplication to forestry dataset

Conclusion and possible extensions

2/16 Achmad Choiruddin Feature selection for spatial point processes

Page 3: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

IntroductionMotivation: Barro Colorado Island (BCI) study

3604 locations of Beilschmiediapendula Lauraceae (BPL) tree

Intensity: function of covariates

ρ(u;β) = exp(β>z(u)), β ∈ Rp,z(u) = {z1(u), · · · , zp(u)}> arespatial covariates at u

p = moderate!

In reality...

3/16 Achmad Choiruddin Feature selection for spatial point processes

Page 4: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

IntroductionMotivation: Barro Colorado Island (BCI) study

What if ...

Parsimonius model

Improve the prediction

Computationally easy and fast to implement

4/16 Achmad Choiruddin Feature selection for spatial point processes

Page 5: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Spatial point processesNotations and existing methods

X is a spatial point process, D is the observed domain, |D| isthe volume of observed domain

The Poisson log-likelihood

`(β) =∑

u∈X∩Dβ>z(u)−

∫Dexp(β>z(u))du.

Estimate β by solving `(1)(β) = 0

Estimating equation-based methods for more general pointprocesses

5/16 Achmad Choiruddin Feature selection for spatial point processes

Page 6: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Spatial point processesNotations and existing methods

X is a spatial point process, D is the observed domain, |D| isthe volume of observed domain

The weighted Poisson log-likelihood (Guan and Shen, 2010)

`(w,β) =∑

u∈X∩Dw(u)β>z(u)−

∫Dw(u) exp(β>z(u))du.

Estimate β by solving `(1)(w,β) = 0

Estimating equation-based methods for more general pointprocesses

5/16 Achmad Choiruddin Feature selection for spatial point processes

Page 7: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Spatial point processesNotations and existing methods

X is a spatial point process, D is the observed domain, |D| isthe volume of observed domain

The weighted logistic regression log-likelihood (Baddeley etal., 2014; Choiruddin et al., 2017)

`(w;β) =∑

u∈X∩Dw(u) log

(ρ(β)

g(u) + ρ(β)

)−∫Dw(u)g(u) log

(ρ(β) + g(u)

g(u)

)du.

Estimate β by solving `(1)(w,β) = 0

Estimating equation-based methods for more general pointprocesses

5/16 Achmad Choiruddin Feature selection for spatial point processes

Page 8: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Proposed methodsMethodology

Regularized likelihood :

Q(w,β) = `(w,β)− |D|p∑j=1

pλj (|βj |)

where :

`(w,β) is either the Poisson or logistic log-likelihoods

pλ(θ) is a penalty function parameterized by λ ≥ 0

Computation: spatstat + glmnet (and ncvreg)

6/16 Achmad Choiruddin Feature selection for spatial point processes

Page 9: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Proposed methodsRegularization methods

Method∑pj=1 pλj (|βj |)

Ridge∑pj=1

12λβ2

j

Lasso∑pj=1 λ|βj |

Enet∑pj=1 λ{α|βj |+

12(1− α)β2

j }

SCAD∑pj=1 pλ(|βj |), with pλ(θ) =

λθ if (θ ≤ λ)γλθ− 1

2(θ2+λ2)

γ−1if (λ < θ < γλ)

λ2(γ2−1)2(γ−1)

if (θ ≥ γλ)

MC+∑pj=1

{(λ|βj | −

β2j

)I(|βj | < γλ) + 1

2γλ2I(|βj | ≥ γλ)

}Adaptive Lasso

∑pj=1 λj |βj |

Adaptive enet∑pj=1 λj{α|βj |+

12(1− α)β2

j }

7/16 Achmad Choiruddin Feature selection for spatial point processes

Page 10: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Asymptotic theoryConditions

The pn-dimensional vector of true coefficient values:

β0 = {β01, . . . , β0pn}> = {β01, . . . , β0s, β0(s+1), . . . , β0pn}

>

= {β>01,β>02}> = {β>01,0

>}>

X observed over D = Dn, n = 1, 2, . . . which expands to Rd as n→∞.

We allow pn →∞ as n→∞ and we have a fixed s.

We assume that p3n/|Dn| → 0 as n→∞.

We define

an = maxj=1,...,s

{p′λn,j(|β0j |)},

bn = infj=s+1,...,pn

inf|θ|≤εnθ 6=0

p′λn,j(|θ|), for εn = K

√pn/|Dn|, and

cn = maxj=1,...,s

{p′′λn,j(|β0j |)}.

8/16 Achmad Choiruddin Feature selection for spatial point processes

Page 11: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Asymptotic theoryMain results

Theorem 1

Under some regularity conditions, if an = O(|Dn|−1/2) and cn → 0, thereexists a local maximizer β of Q(w,β) such that‖β − β0‖ = OP

(√pn(|Dn|−1/2 + an)

).

Theorem 2

Under some regularity conditions, If an|Dn|1/2 → 0, bn√|Dn|/p2n →∞ and√

pncn → 0 as n→∞, the root-(|Dn|/pn) consistent local maximizers

β = (β>1 , β

>2 )> in Theorem 1 satisfy:

(i) Sparsity: P(β2 = 0)→ 1 as n→∞,

(ii) Asymptotic Normality: |Dn|1/2Σn(w,β0)−1/2(β1 − β01)

d−→ N (0, Is),where Σn(w,β0) = |Dn|{An,11(w,β0) + Πn}−1{Bn,11(w,β0) +Cn,11(w,β0)}{An,11(w,β0) + Πn}−1

9/16 Achmad Choiruddin Feature selection for spatial point processes

Page 12: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Asymptotic theoryDiscussion

Note! Need to obey |Dn|1/2an → 0 and√|Dn|/p2nbn →∞ as

n→∞ to satisfy the Theorem 2

Method an bn Satisfy?

Ridge λn maxj=1,...s

{|β0j |} 0 No

Lasso λn λn No

Enet λn [(1− α) maxj=1,...s

{|β0j |}+ α] λnα No

ALasso maxj=1,...s

{λn,j} infj=s+1,...p

{λn,j} Yes

Aenet maxj=1,...s

{λn,j((1− α)|β0j |+ α

)} α inf

j=s+1,...p{λn,j} Yes

SCAD 0* λn** Yes

MC+ 0* λn − K√pn

γ√|Dn|

** Yes

* if λn → 0 for n sufficient large** if

√|Dn|/p2nλn →∞ for n sufficient large

10/16 Achmad Choiruddin Feature selection for spatial point processes

Page 13: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

ApplicationBarro Colorado Island (BCI) study

Estimate BPL intensity: log ρ(u;β) = β1z1(u) + · · ·+ β93z93(u)

93 covariates: 2 topology, 13 soil nutrients, 78 interactions

Regularized (un)weighted PL with LASSO, AL and SCAD

MethodUnweighted Weighted

#Selected #No #Selected #No

LASSO 77 16 45 48

AL 50 43 10 83

SCAD 58 35 3 90

11/16 Achmad Choiruddin Feature selection for spatial point processes

Page 14: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Application10 common selected covariates

CovariatesUnweighted Weighted

LASSO ALASSO SCAD LASSO ALASSO SCAD

Elev 0.32 0.40 0.33 0.40 0.32 0

Slope 0.39 0.40 0.36 0.42 0.44 0

Cu 0.56 0.31 0.61 0.39 0.33 0

Mn 0.14 0.14 0.09 0.15 0.22 0

P -0.48 -0.43 -0.54 -0.33 -0.57 -1.07

Zn -0.75 -0.66 -0.83 -0.58 -0.40 0

Al:P -0.30 -0.29 -0.31 -0.28 -0.16 0

Mg:P 0.62 0.29 0.45 0.48 0.42 0

Zn:N 0.21 0.35 0.30 0 0 0.62

N.Min:pH 0.44 0.44 0.49 0.25 0.27 0

12/16 Achmad Choiruddin Feature selection for spatial point processes

Page 15: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Application10 common selected covariates

CovariatesUnweighted Weighted

LASSO ALASSO SCAD LASSO ALASSO SCAD

Elev 0.32 0.40 0.33 0.40 0.32 0

Slope 0.39 0.40 0.36 0.42 0.44 0

Cu 0.56 0.31 0.61 0.39 0.33 0

Mn 0.14 0.14 0.09 0.15 0.22 0

P -0.48 -0.43 -0.54 -0.33 -0.57 -1.07

Zn -0.75 -0.66 -0.83 -0.58 -0.40 0

Al:P -0.30 -0.29 -0.31 -0.28 -0.16 0

Mg:P 0.62 0.29 0.45 0.48 0.42 0

Zn:N 0.21 0.35 0.30 0 0 0.62

N.Min:pH 0.44 0.44 0.49 0.25 0.27 0

12/16 Achmad Choiruddin Feature selection for spatial point processes

Page 16: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

ApplicationEstimates of BPL intensity (log scale)

3604 locations of BPL

LASSO WLASSO

AL WAL

SCAD WSCAD

13/16 Achmad Choiruddin Feature selection for spatial point processes

Page 17: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Conclusion and possible extensionsConclusion

Regularized versions of EE derived from Poisson and logisticregression likelihoods

Large classes of spatial point processes and many penaltyfunctions

Nice theoretical properties and computationally preferable

Satisfy our Theorems: SCAD, MC+, ALasso, Aenet

Regularized WPL may be more appropriate for a clusteredprocess

Adaptive Lasso may perform best

14/16 Achmad Choiruddin Feature selection for spatial point processes

Page 18: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Conclusion and possible extensionsPossible extensions

Topology

Soil Nutrients

Other trees

Deal with very large datasets → some troubles

The Dantzig selector vs. penalization methods

Multivariate, spatio-temporal point processes

15/16 Achmad Choiruddin Feature selection for spatial point processes

Page 19: Regularized Poisson and logistic methods for spatial point … · 2017-06-08 · 19th Workshop on Stochastic Geometry, Stereology and Image Analysis (SGSIA), Luminy, France May 2017

Some References

Baddeley, A; Rubak, E; and Turner, R. (2015). Spatial point patterns:methodology and application with R, CRC press.

Baddeley, A; Coeurjolly, J.-F; Rubak, E; and Waagepetersen , R. (2014).Logistic regression for spatial Gibbs point processes, Biometrika,101(2):377–392.

Choiruddin, A; Coeurjolly, J-F; and Letue, F. (2017). Convex andnon-convex regularization methods for spatial point processes intensityestimation, ArXiv preprint arXiv:1703.02462.

Friedman, J; Hastie, T; Hofling H; Tibshirani, R; et al. (2007). Pathwisecoordinate optimization. The Annals of Applied Statistics, 1(2):302–332.

Guan, Y and Shen Y. (2010). A weighted estimating equation approachfor inhomogeneous spatial point process, Biometrika, 97(4), 867-880.

Thurman, A. L; Fu, R; Guan, Y; and Zhu, J. (2015). Regularizedestimating equations for model selection of clustered spatial point process,Statistica Sinica,25(1), 173-188.

16/16 Achmad Choiruddin Feature selection for spatial point processes