Exponentially weighted aggregation
Laplace prior for linear regression

Arnak Dalalyan, Edwin Grappin & Quentin Paris
edwin.grappin@ensae.fr
JPS - Les Houches - 2016

Introduction: prediction in high dimension

Goals & settings

We observe n labels (Y_i)_{i∈{1,...,n}}, and there is a linear relation between the labels and the p features (X_{i,j})_{j∈{1,...,p}}:

    Y = X\beta^\star + \xi,

where Y ∈ R^n, X ∈ R^{n×p}, β⋆ ∈ R^p, and ξ ∈ R^n is a random vector whose components ξ_i are N(0, σ²).

Our interests are:

- Low prediction loss: ‖X(β⋆ − β̂)‖²₂ (fitting β⋆ itself is less important),
- Good quality when p is large (p ≫ n),
- Efficient use of the sparsity of β⋆ (β⋆ is s-sparse if at most s of its elements are non-null).
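
As a concrete instance of this setting, here is a minimal Python sketch (the sizes, seed, and Gaussian design are ours, purely illustrative) that draws an s-sparse β⋆ and the observations Y = Xβ⋆ + ξ:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, s, sigma = 50, 200, 5, 1.0     # illustrative sizes with p >> n

    X = rng.standard_normal((n, p))      # design matrix X in R^{n x p}
    beta_star = np.zeros(p)              # s-sparse target: at most s non-null entries
    beta_star[rng.choice(p, s, replace=False)] = rng.standard_normal(s)
    xi = sigma * rng.standard_normal(n)  # noise with xi_i ~ N(0, sigma^2)
    Y = X @ beta_star + xi               # observed labels

For an estimate beta_hat, the prediction loss above is np.sum((X @ (beta_star - beta_hat)) ** 2).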


Least squares method

The ordinary least squares (OLS) estimator is defined by:

    \hat\beta^{\mathrm{OLS}} = \arg\min_{\beta \in \mathbb{R}^p} \|Y - X\beta\|_2^2.

OLS minimizes the sum of the squares of the residuals.

Overfitting. If p is very large, OLS has poor prediction performance:

- the solution is not unique when p > n,
- it does not detect the meaningful features among all the features,
- it focuses on fitting the data rather than predicting labels.
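
The non-uniqueness is easy to exhibit numerically: when p > n, adding any null-space direction of X to a least-squares solution leaves the fit unchanged. A minimal sketch, reusing X and Y from the snippet above:

    import numpy as np

    # Minimum-norm least-squares solution (one among infinitely many when p > n).
    beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)

    # For p > n, the trailing right singular vectors span the null space of X.
    _, _, Vt = np.linalg.svd(X)
    beta_alt = beta_ols + 10.0 * Vt[-1]  # a very different coefficient vector

    print(np.allclose(X @ beta_ols, X @ beta_alt))  # True: identical fitted values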

Penalization and Lasso

Penalized regression

In our case, a good estimator has the following properties:

- it comes with guarantees on prediction results,
- it uses the sparsity assumption to manage p > n,
- it is computationally fast (of paramount importance when p is large).

Penalized regression is a method that combines the usual fitting term with a penalty term:

    \hat\beta^{\mathrm{pen}} = \arg\min_{\beta \in \mathbb{R}^p} \Big( \|Y - X\beta\|_2^2 + \lambda\, P(\beta) \Big),

where P is the penalty function and λ ≥ 0 controls the trade-off between the two terms.
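
In code, the objective is just a two-term function with a pluggable penalty; a minimal sketch (the helper and penalty names are ours):

    import numpy as np

    def penalized_objective(beta, Y, X, lam, penalty):
        """Fitting term plus lam times the penalty P(beta)."""
        return np.sum((Y - X @ beta) ** 2) + lam * penalty(beta)

    l0 = lambda b: np.count_nonzero(b)  # sparsity level, non-convex (next slide)
    l1 = lambda b: np.sum(np.abs(b))    # Lasso penalty, convex and sparsity-inducing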


Subset selection with an ℓ0 penalization

An intuitive candidate would be a penalization based on the ℓ0 pseudo-norm (the sparsity level):

    \|\beta\|_0 = \sum_{i=1}^{p} \mathbf{1}_{\{\beta_i \neq 0\}}.

    \hat\beta^{\ell_0} = \arg\min_{\beta \in \mathbb{R}^p} \Big( \|Y - X\beta\|_2^2 + \lambda\, \|\beta\|_0 \Big).

The penalty forces many elements of β̂ to be null: it selects the most important features. However, due to the ℓ0 pseudo-norm, the objective function is non-convex; hence, computational time grows exponentially with p.
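
The exponential cost is concrete: exact ℓ0 minimization amounts to scanning every possible support set, of which there are 2^p. A brute-force sketch (ours, feasible only for tiny p):

    import numpy as np
    from itertools import combinations

    def best_subset(Y, X, lam):
        """Exact l0-penalized least squares by enumerating all supports."""
        n, p = X.shape
        best_val, best_beta = np.sum(Y ** 2), np.zeros(p)   # empty support
        for k in range(1, p + 1):
            for S in combinations(range(p), k):             # C(p, k) supports of size k
                cols = list(S)
                b_S, *_ = np.linalg.lstsq(X[:, cols], Y, rcond=None)
                val = np.sum((Y - X[:, cols] @ b_S) ** 2) + lam * k
                if val < best_val:
                    best_val = val
                    best_beta = np.zeros(p)
                    best_beta[cols] = b_S
        return best_beta  # 2^p subproblems in total: hopeless for large p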


Choice of the penalization term

Let q > 0; we consider the estimators

    \hat\beta^{q} = \arg\min_{\beta \in \mathbb{R}^p} \Big( \|Y - X\beta\|_2^2 + \lambda\, \|\beta\|_q^q \Big).

- If q < 1, the solution is sparse but the problem is non-convex.
- If q > 1, the problem is convex but the solution is not sparse.
- If q = 1, the solution is sparse and the problem is convex.
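
The dichotomy is visible on the scalar proximal problems min_b ½(b − z)² + λ|b|^q, which have closed forms at q = 1 and q = 2 (this one-dimensional illustration is ours, not from the slides):

    import numpy as np

    def prox_l1(z, lam):
        # argmin_b 0.5*(b - z)**2 + lam*|b| : soft-thresholding,
        # exact zeros whenever |z| <= lam, hence sparsity.
        return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

    def prox_l2sq(z, lam):
        # argmin_b 0.5*(b - z)**2 + lam*b**2 : pure shrinkage,
        # never exactly zero unless z is zero, hence no sparsity.
        return z / (1.0 + 2.0 * lam)

    z = np.array([-2.0, -0.3, 0.1, 1.5])
    print(prox_l1(z, 0.5))    # [-1.5  -0.    0.    1.  ]  small entries snapped to 0
    print(prox_l2sq(z, 0.5))  # [-1.   -0.15  0.05  0.75]  everything merely rescaled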


Lasso, the ℓ1 norm

The Lasso estimator is defined by:

    \hat\beta^{L} = \arg\min_{\beta \in \mathbb{R}^p} \Big( \frac{\|Y - X\beta\|_2^2}{2n} + \lambda\, \|\beta\|_1 \Big).

Theorem (Dalalyan et al. (2014), On the Prediction Performance of the Lasso).
Let λ = 2σ√(2 log(p/δ)/n). Then, with probability at least 1 − δ,

    \frac{\|X(\beta^\star - \hat\beta^{L})\|_2^2}{n}
      \;\le\; \inf_{\substack{\beta \in \mathbb{R}^p \\ s\text{-sparse}}}
              \left( \frac{\|X(\beta^\star - \beta)\|_2^2}{n}
                     + \frac{10\, s\, \sigma^2 \log(p/\delta)}{n\, \kappa} \right),

where κ is a constant depending on the design X.
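
A sketch with the theorem's tuning, reusing n, p, sigma, X, Y, beta_star from the first snippet. Note that scikit-learn's Lasso minimizes ‖Y − Xβ‖²₂/(2n) + α‖β‖₁, the same normalization as above, so α can be set directly to the theoretical λ:

    import numpy as np
    from sklearn.linear_model import Lasso

    delta = 0.05
    lam = 2 * sigma * np.sqrt(2 * np.log(p / delta) / n)  # theorem's lambda

    beta_L = Lasso(alpha=lam, fit_intercept=False).fit(X, Y).coef_

    loss = np.sum((X @ (beta_star - beta_L)) ** 2) / n    # prediction loss / n
    print(f"lambda={lam:.3f}  support={np.count_nonzero(beta_L)}  loss={loss:.3f}")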

Exponentially weighted average

EWA: definition

The Lasso estimator is a maximum a posteriori (MAP) estimator with a Laplace prior:

    \hat\beta^{L} = \arg\min_{\beta \in \mathbb{R}^p} \Big( \frac{\|Y - X\beta\|_2^2}{2n} + \lambda\, \|\beta\|_1 \Big)
                  = \arg\max_{\beta \in \mathbb{R}^p} \Big[\, \underbrace{\exp\Big( -\frac{\|Y - X\beta\|_2^2}{2\sigma^2} \Big)}_{\propto\, \mathcal{N}(X\beta,\, \sigma^2 I_n)}
                    \; \underbrace{\exp\Big( -\frac{\lambda n}{\sigma^2}\, \|\beta\|_1 \Big)}_{\propto\, \pi_0(\beta):\ \text{Laplace prior}} \,\Big]

Let

    V(\beta) = \frac{1}{2\sigma^2}\, \|Y - X\beta\|_2^2 + \frac{\lambda n}{\sigma^2}\, \|\beta\|_1,
    \qquad
    \hat\pi_T(\beta) \;\propto\; \exp\Big\{ -\frac{V(\beta)}{T} \Big\}.

We define the exponentially weighted average (EWA) estimator with Laplace prior by

    \hat\beta^{\mathrm{EWA}} = \int_{\mathbb{R}^p} \beta\, \hat\pi_T(\beta)\, d\beta.
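
The slides leave computation open (see the questions at the end). One generic, if slow, way to approximate this posterior mean is Markov chain Monte Carlo; a minimal random-walk Metropolis sketch (ours, with arbitrary step size and chain length, assuming T > 0):

    import numpy as np

    def ewa_metropolis(Y, X, lam, sigma, T, n_iter=50_000, step=0.01, seed=0):
        """Approximate beta_EWA = E_{pi_T}[beta] by random-walk Metropolis."""
        rng = np.random.default_rng(seed)
        n, p = X.shape

        def V(b):  # V(beta) = ||Y - Xb||^2 / (2 sigma^2) + (lam n / sigma^2) ||b||_1
            return (np.sum((Y - X @ b) ** 2) / (2 * sigma**2)
                    + lam * n / sigma**2 * np.sum(np.abs(b)))

        beta, v = np.zeros(p), V(np.zeros(p))
        total = np.zeros(p)
        for _ in range(n_iter):
            prop = beta + step * rng.standard_normal(p)
            v_prop = V(prop)
            if np.log(rng.uniform()) < (v - v_prop) / T:  # accept w.p. exp(-(v_prop - v)/T)
                beta, v = prop, v_prop
            total += beta
        return total / n_iter  # ergodic average approximates the EWA estimator

With the quantities from the earlier snippets, ewa_metropolis(Y, X, lam, sigma, T=1e-4) gives one such approximation; the choice T = 1e-4 respects the T < 1/p regime discussed below.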


Results

Theorem.
Let λ = 2σ√(2 log(p/δ)/n); then with probability at least 1 − δ,

    \frac{\|X(\beta^\star - \hat\beta^{\mathrm{EWA}})\|_2^2}{n}
      \;\le\; \inf_{\substack{\beta \in \mathbb{R}^p \\ s\text{-sparse}}}
              \left( \frac{\|X(\beta^\star - \beta)\|_2^2}{n}
                     + \frac{10\, s\, \sigma^2 \log(p/\delta)}{n\, \kappa} \right) + 2H(T),

where

    H(T) = pT - \int G(\beta)\, \hat\pi_T(\beta)\, d\beta + G(\hat\beta^{\mathrm{EWA}}),

and G(β) = (1/n)‖Xβ‖²₂ + λ‖β‖₁. G is convex, hence H(T) ≤ pT.
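
The last claim is Jensen's inequality spelled out: β̂EWA is the mean of π̂_T and G is convex, so

    \int_{\mathbb{R}^p} G(\beta)\, \hat\pi_T(\beta)\, d\beta
      \;\ge\; G\Big( \int_{\mathbb{R}^p} \beta\, \hat\pi_T(\beta)\, d\beta \Big)
      \;=\; G(\hat\beta^{\mathrm{EWA}}),

and therefore the last two terms of H(T) sum to something non-positive, leaving H(T) ≤ pT.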


The choice of T

- If T = 0, β̂L = β̂EWA.
- We are interested in T < 1/p; recall that H(T) ≤ pT.
- The larger T is, the larger the variance of the posterior.
- We believe that this variance brings robustness to the choice of λ.


Conclusion & questions

Results:

- EWA with a Laplace prior is a family of estimators that includes the Lasso.
- A sharp oracle inequality holds for this family of estimators.

Questions:

- What is a good value of T?
- Can we prove a result on the robustness to the choice of λ?
- Can we compute this estimator efficiently?

Thank you!
