
AUTOMATIC ESTIMATION OF REGULARIZATION PARAMETERS: INITIAL STEPS

Rosemary Renaut
http://math.asu.edu/~rosie

QUANTITATIVE SUSCEPTIBILITY MAPPING (QSM), JULY 27, 2013

Acknowledgements (Cornell MRI Laboratory): Yi Wang, Tian Liu, Pascal Spincemaille, Shaui Wang


Outline

Motivating Example 3D Data

Context

Regularization Parameter Estimation

Using the Noise Properties

Conclusions and Future

Theoretical Discussion


Motivating Example for QSM

Neuroimage 59, 2012 (2560-2568): Liu et al.

Morphology enabled dipole inversion for quantitative susceptibility mapping using structural consistency between the magnitude image and the susceptibility map.

Tissue local magnetic field (b) obtained as convolution of the dipole kernel (A) with the susceptibility (x):

    b ≈ Ax

Least squares or image based formulation: solve for x

    ‖W_b (Ax − b)‖² + (1/σ²) ‖W_G (∇x)‖²

W_b is the weighting matrix for the noise on the data b; W_G is the weighting matrix for the gradient ∇x, dependent on the noise level; σ is the unknown regularization parameter.
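Not on the slide, but for concreteness: the dipole convolution b ≈ Ax is typically applied as a pointwise product in k-space. The sketch below assumes the commonly used kernel D(k) = 1/3 − k_z²/|k|² with B0 along z; the function names and the convention at k = 0 are illustrative only.

```python
import numpy as np

def dipole_kernel(shape):
    """Assumed k-space dipole kernel D(k) = 1/3 - kz^2/|k|^2 (B0 along z)."""
    kx, ky, kz = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    with np.errstate(divide="ignore", invalid="ignore"):
        D = 1.0 / 3.0 - kz**2 / k2
    D[k2 == 0] = 0.0          # convention for the undefined k = 0 component
    return D

def apply_dipole(x, D):
    """Apply A: convolution with the dipole kernel as a pointwise product in k-space."""
    return np.real(np.fft.ifftn(D * np.fft.fftn(x)))

# usage: simulate a toy local field b ~ Ax from a block susceptibility map
chi = np.zeros((32, 32, 32)); chi[12:20, 12:20, 12:20] = 1.0
b = apply_dipole(chi, dipole_kernel(chi.shape))
```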


Goals

1. Develop an approach to automatically estimate the parameter σ = 1/λ
2. Use validated parameter estimation techniques
3. Employ statistical information from the data
4. Efficient implementation
5. Extend to L1 regularization


Context: L2 regularization

Solve the ill-conditioned system Ax ≈ b.

Standard Tikhonov, where L approximates a derivative operator:

    x(λ) = argmin_x { (1/2)‖Ax − b‖²_2 + (λ²/2)‖Lx‖²_2 }

x(λ) solves the normal equations, provided null(L) ∩ null(A) = {0}:

    (A^T A + λ² L^T L) x(λ) = A^T b

Multiple approaches exist for estimating the parameter λ = 1/σ.
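A minimal sketch of solving the normal equations for a fixed λ (dense matrices and illustrative names; not the large-scale implementation discussed later):

```python
import numpy as np

def tikhonov(A, b, lam, L=None):
    """Solve (A^T A + lam^2 L^T L) x = A^T b for a fixed regularization parameter lam.
    Dense illustration; L defaults to the identity (standard-form Tikhonov)."""
    n = A.shape[1]
    if L is None:
        L = np.eye(n)
    lhs = A.T @ A + lam**2 * (L.T @ L)
    return np.linalg.solve(lhs, A.T @ b)

# usage on a small ill-conditioned test problem
rng = np.random.default_rng(0)
A = np.vander(np.linspace(0, 1, 40), 12, increasing=True)   # ill-conditioned Vandermonde
x_true = rng.standard_normal(12)
b = A @ x_true + 1e-3 * rng.standard_normal(40)
x_lam = tikhonov(A, b, lam=1e-2)
```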


Some Methods: assume variance τ² in the weighted data W_b b

Morozov discrepancy principle (smooths); a χ² test on the residual (Residual Discrepancy, de Rochefort):

    ‖W_b(Ax(σ) − b)‖² ≈ τ²

L-curve (well known): find the corner of the (x, y) plot

    ( log ‖W_b(Ax(σ) − b)‖², log ‖Lx‖² )

Generalized Cross Validation (GCV): minimize

    ‖W_b(Ax(σ) − b)‖² / [ Tr( I_m − (A^T W_b A + (1/σ²) L^T L)^{-1} A^T W_b A ) ]²

Unbiased Predictive Risk Estimation (UPRE): minimize

    ‖W_b(Ax(σ) − b)‖² − 2τ² ( m − Tr( (A^T W_b A + (1/σ²) L^T L)^{-1} A^T W_b A ) )

χ² method: based on the noise distribution in the data.

And others, e.g. the Residual Periodogram ...
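A small sketch of how the GCV and UPRE functionals above can be evaluated cheaply via the SVD in the standard-form case (here assuming W_b = I and L = I); the function name and the λ grid are illustrative only.

```python
import numpy as np

def gcv_upre_curves(A, b, lambdas, tau=1.0):
    """Evaluate GCV and UPRE over a grid of lambda for min ||Ax-b||^2 + lambda^2 ||x||^2,
    assuming whitened noise of variance tau^2 (standard form, not the weighted slide form)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b                          # projected data
    b_perp2 = b @ b - beta @ beta           # part of b outside range(A)
    m = len(b)
    gcv, upre = [], []
    for lam in lambdas:
        f = s**2 / (s**2 + lam**2)          # Tikhonov filter factors
        r2 = np.sum(((1 - f) * beta)**2) + b_perp2   # residual norm squared
        trH = np.sum(f)                     # trace of the influence matrix
        gcv.append(r2 / (m - trH)**2)
        upre.append(r2 - 2 * tau**2 * (m - trH))     # slide's form; constants do not move the minimizer
    return np.array(gcv), np.array(upre)

# usage: pick the minimizer over a logarithmic grid
# lambdas = np.logspace(-6, 2, 200); g, u = gcv_upre_curves(A, b, lambdas)
# lam_gcv = lambdas[np.argmin(g)]; lam_upre = lambdas[np.argmin(u)]
```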


Some characteristics of the methods

    Method        Idea   Many λ   Algorithm      Statistical   Unique
    Discrepancy   Easy   No       Root finding   Yes           Yes
    L-curve       Easy   Yes      Spline         No            No
    GCV           Hard   Yes      Minimum        Yes           No
    UPRE          Hard   Yes      Minimum        Yes           No
    χ²            Ok     No       Root finding   Yes           Yes

1. In particular, χ² and UPRE rely on the provision of statistics of the noise distribution.

2. UPRE and GCV require a matrix trace estimation, which is expensive.


Weighting for the noise: assume noise η in b

Suppose η ∼ (0, C_b), i.e. C_b is the covariance of the noise in b. C_b is SPD, hence C_b = (C_b^{1/2})² and is invertible.

Multiplying by W_b^{1/2} = (C_b^{1/2})^{-1} whitens the noise in b:

    W_b^{1/2}(Ax̂ − b) = η̄,   where η̄ ∼ (0, W_b^{1/2} C_b (W_b^{1/2})^T) = (0, I_m)

i.e. we have the weighted form (with ‖y‖²_W := y^T W y)

    x(σ) = argmin_x { ‖Ax − b‖²_{W_b} + (1/σ²)‖x‖² }

More generally: W_x = (1/σ²) I and the augmented residual r(σ):

    x(W_x) = argmin_x ‖ [ W_b^{1/2} A ; W_x^{1/2} ] x − [ W_b^{1/2} b ; 0_n ] ‖²  :=  argmin_x ‖r(σ)‖²
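A small sketch of the whitening and the augmented least squares system above, assuming a known diagonal noise covariance C_b; the stacked system is solved with a dense least-squares call purely for illustration.

```python
import numpy as np

def weighted_tikhonov(A, b, Cb_diag, sigma):
    """Whiten with W_b^{1/2} = C_b^{-1/2} (diagonal case assumed) and solve the
    augmented system  [W_b^{1/2} A; (1/sigma) I] x ~ [W_b^{1/2} b; 0]."""
    m, n = A.shape
    Wb_half = np.diag(1.0 / np.sqrt(Cb_diag))        # whitening transform
    A_aug = np.vstack([Wb_half @ A, (1.0 / sigma) * np.eye(n)])
    b_aug = np.concatenate([Wb_half @ b, np.zeros(n)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x
```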


Statistical Properties of the Augmented Regularized Residual

For a given solution

    x(W_x) = W_x^{-1} A^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b

the augmented residual is

    J(W_x) = b^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b = ‖r(W_x)‖²

Lemma (Distribution of the Cost Functional)
If W_b and W_x have been chosen appropriately, the functional J is a random variable which follows a χ² distribution with m degrees of freedom:

    J(W_x) ∼ χ²(m),   E(J(x(W_x))) = m,   Var(J) = 2m
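A quick Monte Carlo check of the lemma (illustrative only, assuming a zero-mean prior x ∼ N(0, σ²I) and noise N(0, τ²I), so that the weights W_x = (1/σ²)I and W_b = (1/τ²)I are "chosen appropriately"):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, sigma, tau = 50, 30, 2.0, 0.5
A = rng.standard_normal((m, n))

def sample_J():
    # draw data consistent with the assumed statistics
    x = sigma * rng.standard_normal(n)
    b = A @ x + tau * rng.standard_normal(m)
    # J = b^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b with the weights above
    C = sigma**2 * (A @ A.T) + tau**2 * np.eye(m)
    return b @ np.linalg.solve(C, b)

J = np.array([sample_J() for _ in range(2000)])
print(J.mean(), J.var())   # expected to be roughly m and 2m
```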


χ² method to find the parameter (Mead and Renaut)

Find W_x = (1/σ²) I such that

    m − √(2m) z_{α/2} < b^T (A W_x^{-1} A^T + W_b^{-1})^{-1} b < m + √(2m) z_{α/2}

Using the SVD W_b^{1/2} A = U Σ V^T, let s = U^T W_b^{1/2} b, and solve

    F(σ) = s^T diag( 1/(1 + σ² σ_i²) ) s − m = 0.

For spectral decompositions A = G* Λ G: s = G b̃ (the transform of the whitened data), with Λ = diag(σ_i).

Large scale: implement using CG or other projected methods with the mapped regularization L, iterating

    σ^(k+1) = σ^(k) [ 1 + α^(k) (1/2) ( σ^(k) / ‖L x(σ^(k))‖ )² ( J(σ^(k)) − m̃ ) ]

m̃ is the number of degrees of freedom in the residual; α^(k) is a line search parameter.
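A minimal sketch of the scalar root finding at the heart of the χ² method, in the small-scale SVD form above (standard form, L = I, with an optional diagonal W_b); the function name and the bisection bracket are illustrative choices, not the authors' implementation.

```python
import numpy as np

def chi2_parameter(A, b, Wb_diag=None):
    """Root-find F(sigma) = s^T diag(1/(1 + sigma^2 s_i^2)) s - m = 0 (standard form, L = I).
    Wb_diag is an assumed diagonal data weighting (inverse noise variances)."""
    m = len(b)
    Wb_half = np.sqrt(Wb_diag) if Wb_diag is not None else np.ones(m)
    U, svals, Vt = np.linalg.svd(Wb_half[:, None] * A, full_matrices=True)
    s = U.T @ (Wb_half * b)
    sv2 = np.zeros(m); sv2[:len(svals)] = svals**2       # pad singular values to length m
    F = lambda sigma: np.sum(s**2 / (1.0 + sigma**2 * sv2)) - m

    # F is monotonically decreasing in sigma, so a simple log-scale bisection suffices
    lo, hi = 1e-8, 1e8
    if F(lo) < 0 or F(hi) > 0:
        return None                                      # no root: data inconsistent with the noise model
    for _ in range(100):
        mid = np.sqrt(lo * hi)
        lo, hi = (mid, hi) if F(mid) > 0 else (lo, mid)
    return np.sqrt(lo * hi)
```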


Some Results: Simulated data with 10% colored noise, no masks

Figure: Estimates obtained automatically by the χ² method (above), and below, the optimal estimates obtained by sweeping through 50 choices.


Example for Dipole Inversion: The SNR estimates

Figure: Estimates obtained automatically by the χ² method, indicated as compared to the optimum. SNR = 10 log10( ‖x_true‖² / ‖x_true − x‖² ). All image based methods, using CG.


Computational Costs

Costs in seconds for the χ² method and for the optimal search:

    χ²   Opt    χ²   Opt    χ²   Opt    χ²   Opt
    57   526    131  578    83   619    167  683

The ratio of the cost increase of searching optimally:

    9.29   4.41   7.44   4.09

Clear dependence on the model of regularization and weighting. The χ² method finds the optimal parameter at reduced cost.

Remarks
The noise distribution must be known.
Parameters must be tuned relating to W_G, W_b and the truncation for the dipole (see the talk of Karin Shmueli; still relevant for the forward operation with regularization).


Phantom Data

Figure: Estimates obtained automatically by the χ² method (left), and right, the optimal estimates obtained by sweeping through 50 choices.


Example comparing SNR estimates by the k-space method: Simulation

Figure: Notice the good estimates and minimal cost using the χ² method applied to k-space data.


Results in the Fourier domain contaminated by aliasing/artifacts

Figure: Important to correctly identify the noise levels and the truncation for the dipole convolution.


Observations / Conclusions

1. χ² successfully applies for 3D inversion with noise information.
2. χ² has potential to steer toward optimal parameters.
3. There are a number of theoretical results justifying the approach.
4. Still needs to be refined for use in the spectral domain (to include gradients).
5. Efficient implementations require consideration of better Krylov methods.
6. Suggests use of χ² in other formulations, e.g. L1.


Extending for L1 using the Augmented Lagrangian: Simple Example (here with UPRE)


Theoretical Results: Relating UPRE and χ²

1. UPRE is designed to minimize the bias in the solution.
2. UPRE requires a trace operator (can be optimized).

Lemma (Connecting UPRE and χ²)
The σ solving the χ² functional provides a local minimum of the UPRE functional.

Proof: GSVD expansion for the operators.

Lemma (Convergence with increasing resolution by χ²)
Suppose the kernel is square integrable. Then σ_χ²(m), as a function of the number of equations, converges with increasing m.

Remark
Both results assist in justifying the use of the augmented discrepancy principle. Also, for certain kernels we may search extensively at low resolution.


Extending for L1 regularization

Finding the optimal parameter for the Tikhonov problem is a first step in Split Bregman (Goldstein and Osher, 2009). Introduce d ≈ Lx and let R(x) = (1/(2σ²))‖d − Lx‖²_2 + µ‖d‖_1:

    (x, d)(σ, µ) = argmin_{x,d} { (1/2)‖Ax − b‖²_2 + (1/(2σ²))‖d − Lx‖²_2 + µ‖d‖_1 }

Alternating minimization separates the steps for d from x.

Various versions of the iteration can be defined. Fundamentally:

    S1 : x^(k+1) = argmin_x { (1/2)‖Ax − b‖²_2 + (1/(2σ²))‖Lx − (d^(k+1) − g^(k))‖²_2 }
    S2 : d^(k+1) = argmin_d { (1/(2σ²))‖d − (Lx^(k+1) + g^(k))‖²_2 + µ‖d‖_1 }
    S3 : g^(k+1) = g^(k) + Lx^(k+1) − d^(k+1).

Notice the dimension increase of the problem.
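A compact sketch of the S1-S3 loop, assuming dense matrices and the ordering h^(k) = d^(k) − g^(k) used on the later "Focus: Tikhonov Step" slide; the S2 update is the standard componentwise soft thresholding (shrinkage) for the L1 term. Function names are illustrative only.

```python
import numpy as np

def soft_threshold(v, t):
    """Componentwise shrinkage: argmin_d (1/2)||d - v||^2 + t ||d||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def split_bregman(A, b, L, sigma, mu, iters=50):
    """Sketch of the S1-S3 iteration: Tikhonov x-update, shrinkage d-update, Bregman g-update."""
    n = A.shape[1]
    d = np.zeros(L.shape[0]); g = np.zeros(L.shape[0])
    lhs = A.T @ A + (1.0 / sigma**2) * (L.T @ L)        # fixed for fixed sigma
    for _ in range(iters):
        h = d - g
        x = np.linalg.solve(lhs, A.T @ b + (1.0 / sigma**2) * (L.T @ h))   # S1
        d = soft_threshold(L @ x + g, mu * sigma**2)                        # S2
        g = g + L @ x - d                                                   # S3
    return x
```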


Focus: Tikhonov Step of the Algorithm

    S1 : x^(k+1) = argmin_x { (1/2)‖Ax − b‖²_2 + (1/(2σ²))‖Lx − (d^(k) − g^(k))‖²_2 }

Update for x: introduce

    h^(k) = d^(k) − g^(k).

Then

    x^(k+1) = argmin_x { (1/2)‖Ax − b‖²_2 + (1/(2σ²))‖Lx − h^(k)‖²_2 }.

A standard least squares update using a Tikhonov regularizer.
It depends on the changing right hand side.
It also depends on the parameter σ.
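One practical consequence (an implementation note, not from the slides): for fixed σ the normal-equations matrix is the same at every iteration, so it can be factored once and only the right hand side re-solved as h^(k) changes. A sketch assuming dense matrices and SciPy's Cholesky helpers; the function name is illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_tikhonov_step(A, L, sigma):
    """Factor (A^T A + (1/sigma^2) L^T L) once (SPD when null(A) ∩ null(L) = {0});
    each S1 step then only changes the right hand side through h^(k)."""
    lhs = A.T @ A + (1.0 / sigma**2) * (L.T @ L)
    c = cho_factor(lhs)                                  # reused Cholesky factorization
    def step(b, h):
        return cho_solve(c, A.T @ b + (1.0 / sigma**2) * (L.T @ h))
    return step

# usage inside the S1-S3 loop: step = make_tikhonov_step(A, L, sigma); x = step(b, d - g)
```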


Theoretical results: using Unbiased Predictive Risk for the Split Bregman Tikhonov step

Lemma
Suppose the noise in h^(k) is stochastic, with inverse Gaussian covariance weighting applied to both the data fit Ax ≈ b and the derivative fit Lx ≈ h (for b and h); then the optimal choice for σ at all steps is σ = 1. Otherwise h^(k+1) is deterministic and σ changes with iteration.

Remark
Can we expect h^(k) to be stochastic?

Remark
Because h changes, the optimal choice for σ changes with each iteration, converging as h converges.
