a student-t based sparsity enforcing hierarchical prior...
TRANSCRIPT
A Student-t based sparsity enforcing hierarchical prior for linear inverse problems and its efficient Bayesian computation for 2D and 3D Computed Tomography
Ali Mohammad-Djafari, Li Wang, Nicolas Gac and Folker Bleichrodt
Laboratoire des Signaux et Systemes (L2S)
UMR 8506 CNRS-CentraleSupelec-Univ. Paris-Sud, SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr  Email: [email protected]
http://djafari.free.fr  http://publicationslist.org/djafari
iTwist 2016, Aug. 24-26, 2016, Aalborg, Denmark
A. Mohammad-Djafari, Bayesian Discrete Tomography from a few number of projections, March 21-23, 2016, Politecnico di Milano, Italy.
Contents
1. Computed Tomography in 2D and 3D
2. Main classical methods
3. Basic Bayesian approach
4. Sparsity enforcing models through Student-t and IGSM
5. Computational tools: JMAP, EM, VBA
6. Implementation issues
I Main GPU implementation steps: Forward and Back Projections
I Multi-Resolution implementation
7. Some results
8. Conclusions
Computed Tomography: Seeing inside of a body
I f (x , y) a section of a real 3D body f (x , y , z)
I gφ(r) a line of observed radiography gφ(r , z)
I Forward model: Line integrals or Radon Transform
gφ(r) = ∫_{L_{r,φ}} f(x, y) dl + εφ(r)
      = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + εφ(r)
I Inverse problem: Image reconstruction
Given the forward model H (Radon Transform) and a set of data gφi(r), i = 1, · · · , M, find f(x, y)
2D and 3D Computed Tomography
3D: gφ(r1, r2) = ∫_{L_{r1,r2,φ}} f(x, y, z) dl      2D: gφ(r) = ∫_{L_{r,φ}} f(x, y) dl
Forward problem: f(x, y) or f(x, y, z) −→ gφ(r) or gφ(r1, r2)
Inverse problem: gφ(r) or gφ(r1, r2) −→ f(x, y) or f(x, y, z)
Algebraic methods: Discretization
[Figure: the image f(x, y) discretized on an N-pixel grid (pixels f1, ..., fN); a ray at angle φ from source S to detector D produces the measurement g(r, φ); Hij is the contribution of pixel fj to measurement gi]
f(x, y) = ∑j fj bj(x, y)
bj(x, y) = { 1 if (x, y) ∈ pixel j; 0 else }
g(r, φ) = ∫_L f(x, y) dl      gi = ∑_{j=1}^N Hij fj + εi
gk = Hk f + εk, k = 1, · · · , K −→ g = Hf + ε
gk projection at angle φk, g all the projections.
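The discretized model above can be illustrated numerically on a toy grid: for two projection angles (0 and 90 degrees) each row of H simply sums the pixels one ray crosses. This is a minimal sketch with illustrative names, not the projector used in the talk.

```python
import numpy as np

# Toy 4x4 image, flattened to the vector f (N = 16 pixels).
n = 4
f_img = np.zeros((n, n))
f_img[1:3, 1:3] = 1.0        # a small square "object"
f_vec = f_img.ravel()

# H for two projection angles: each row of H selects the pixels one ray crosses.
H_rows = np.kron(np.eye(n), np.ones((1, n)))   # horizontal rays: row sums
H_cols = np.kron(np.ones((1, n)), np.eye(n))   # vertical rays: column sums
H = np.vstack([H_rows, H_cols])                # stacking g_k = H_k f gives g = H f

rng = np.random.default_rng(0)
eps = 0.01 * rng.standard_normal(2 * n)        # measurement noise
g = H @ f_vec + eps                            # g = H f + eps
```

In real CT the rays are oblique, so Hij holds intersection lengths rather than 0/1, but the structure of the linear model is the same.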
Algebraic methods
g = [g1; ...; gK],  H = [H1; ...; HK]  −→  gk = Hk f + εk, k = 1, ..., K  −→  g = Hf + ε
I H is huge dimensional: 2D: 10^6 × 10^6, 3D: 10^9 × 10^9.
I Hf corresponds to forward projection
I Htg corresponds to Back projection (BP)
I H may not be invertible and even not square
I H is, in general, ill-conditioned
I In limited angle tomography H is under-determined, so the problem has an infinite number of solutions
I Minimum Norm Solution
f̂ = Ht(HHt)−1 g = ∑k Htk(Hk Htk)−1 gk
can be interpreted as the Filtered Back Projection solution.
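For an under-determined system the minimum norm formula above can be checked directly with numpy; a small sketch with a random H standing in for the projector:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 16                      # fewer measurements than pixels: under-determined
H = rng.standard_normal((M, N))
f_true = rng.standard_normal(N)
g = H @ f_true                    # noiseless data

# Minimum norm solution f = H^t (H H^t)^{-1} g
f_mn = H.T @ np.linalg.solve(H @ H.T, g)
```

Among all f with Hf = g, f_mn has the smallest L2 norm, so its norm never exceeds that of the true image.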
Prior information or constraints
I Positivity: fj > 0 or fj ∈ IR+
I Boundedness: 1 > fj > 0 or fj ∈ [0, 1]
I Smoothness: fj depends on the neighborhoods.
I Sparsity: many fj are zeros.
I Sparsity in a transform domain: f = Dz and many zj are zeros.
I Discrete valued (DV): fj ∈ {0, 1, ..., K}
I Binary valued (BV): fj ∈ {0, 1}
I Compactness: f(r) is non-zero in one or a few non-overlapping compact regions
I Combination of the above mentioned constraints
I Main mathematical questions:
I Which combination results in a unique solution?
I How to apply them?
Algebraic methods: Regularization
I Minimum Norm Solution:
minimize ‖f‖²₂ s.t. Hf = g −→ f̂ = Ht(HHt)−1 g
I Least square Solution:
f̂ = arg minf { J(f) = ‖g − Hf‖² } −→ f̂ = (HtH)−1 Ht g
I Quadratic Regularization:
J(f) = ‖g − Hf‖² + λ‖f‖²₂ −→ f̂ = (HtH + λI)−1 Ht g
I L1 Regularization: J(f) = ‖g − Hf‖² + λ‖f‖₁
I Lpq Regularization: J(f) = ‖g − Hf‖p_p + λ‖Df‖q_q
I More general Regularization:
J(f) = ∑i φ(gi − [Hf]i) + λ ∑j ψ([Df]j)
with convex potential functions φ and ψ, or
J(f) = ∆1(g, Hf) + λ∆2(f, f0)
with ∆1 and ∆2 any distances (L1, L2, ...) or divergences (KL)
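The quadratic regularization above has a closed form, which a few lines of numpy verify on a toy problem (random H and illustrative λ; the optimality check is that the gradient of J vanishes at f̂):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 20, 10
H = rng.standard_normal((M, N))
f_true = rng.standard_normal(N)
g = H @ f_true + 0.05 * rng.standard_normal(M)
lam = 0.1

# Closed-form minimizer of J(f) = ||g - H f||^2 + lam ||f||^2
f_hat = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ g)

# Gradient of J at the minimizer should be (numerically) zero.
grad = -2 * H.T @ (g - H @ f_hat) + 2 * lam * f_hat
```

The same pattern extends to ‖Df‖² by replacing λI with λDtD in the normal equations.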
Deterministic approaches
I Iterative methods: SIRT, ART, Quadratic or L1 regularization, Block Coordinate Descent, Multiplicative ART, ...
I Criteria: J(f) = ‖g − Hf‖² + λ‖Df‖²
I Gradient based algorithms: ∇J(f) = −2Ht(g − Hf) + 2λDtDf
I Simplest algorithm (gradient descent, factor 2 absorbed in the step size):
f(k+1) = f(k) + α(k)[Ht(g − Hf(k)) − λDtDf(k)]
I More criteria: J(f) = ∑i φ(gi − [Hf]i) + λ ∑j ψ([Df]j)
with φ(t) and ψ(t) ∈ {t², |t|, |t|p, ...} or non-convex ones.
I Imposing constraints in each iteration (example: DART)
I Mathematical studies of uniqueness and convergence of these algorithms are necessary
I Many specialized algorithms (ISTA, FISTA, ADMM, AMP, GAMP, ...) are developed for L1 regularization.
Bayesian estimation approach
M : g = Hf + ε
I Observation model M + hypothesis on the noise ε −→ p(g|f; M) = pε(g − Hf)
I A priori information: p(f|M)
I Bayes: p(f|g; M) = p(g|f; M) p(f|M) / p(g|M)
I Gaussian priors:
I Prior knowledge on the noise: ε ∼ N(0, vε² I)
I Prior knowledge on f: f ∼ N(0, vf² (DtD)−1)
I A posteriori: p(f|g) ∝ exp[ −(1/(2vε²))‖g − Hf‖² − (1/(2vf²))‖Df‖² ]
I MAP: f̂ = arg maxf {p(f|g)} = arg minf {J(f)}
with J(f) = ‖g − Hf‖² + λ‖Df‖², λ = vε²/vf²
I Advantage: characterization of the solution
p(f|g) = N(f̂, Σ̂) with
f̂ = (HtH + λDtD)−1 Ht g,  Σ̂ = vε² (HtH + λDtD)−1
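At small scale the Gaussian posterior above is directly computable; a minimal numpy sketch (random H, a first-difference D, illustrative variances) producing the posterior mean and the pixel-wise uncertainties on the diagonal of Σ̂:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 30, 8
H = rng.standard_normal((M, N))
D = np.eye(N) - np.diag(np.ones(N - 1), -1)   # first-difference smoothness operator
v_eps, v_f = 0.01, 1.0                         # noise and prior variances (illustrative)
lam = v_eps / v_f

f_true = 0.3 * np.cumsum(rng.standard_normal(N))
g = H @ f_true + np.sqrt(v_eps) * rng.standard_normal(M)

# Gaussian posterior p(f|g) = N(f_hat, Sigma):
A = H.T @ H + lam * (D.T @ D)
f_hat = np.linalg.solve(A, H.T @ g)            # MAP / posterior mean
Sigma = v_eps * np.linalg.inv(A)               # posterior covariance
pixel_std = np.sqrt(np.diag(Sigma))            # per-pixel uncertainty
```

This is the "characterization of the solution" advantage: the same computation that gives f̂ also quantifies its uncertainty.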
Sparsity enforcing models
I 3 classes of models:
1. Generalized Gaussian (Double Exponential as particular case)
2. Mixture models (two-Gaussians mixture, BG, ...)
3. Heavy tailed (Student-t and Cauchy)
I Student-t model
St(f|ν) ∝ exp[ −((ν + 1)/2) log(1 + f²/ν) ]
I Infinite Gaussian Scaled Mixture (IGSM) equivalence:
St(f|ν) ∝ ∫₀^∞ N(f|0, 1/z) G(z|α, β) dz, with α = β = ν/2
p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, 1/zj) ∝ exp[ −(1/2) ∑j zj fj² ]
p(z|α, β) = ∏j G(zj|α, β) ∝ ∏j zj^(α−1) exp[−β zj] ∝ exp[ ∑j (α − 1) ln zj − β zj ]
p(f, z|α, β) ∝ exp[ −(1/2) ∑j zj fj² + ∑j (α − 1) ln zj − β zj ]
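The IGSM equivalence can be checked by simulation: drawing z from a Gamma(ν/2, rate ν/2) and then f | z ∼ N(0, 1/z) should reproduce Student-t statistics. A sketch with ν = 5, where the t distribution has variance ν/(ν − 2) = 5/3:

```python
import numpy as np

# IGSM: z ~ Gamma(nu/2, rate nu/2), f|z ~ N(0, 1/z)  =>  f ~ Student-t(nu).
rng = np.random.default_rng(4)
nu = 5.0
n = 200_000
z = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # numpy uses scale = 1/rate
f = rng.standard_normal(n) / np.sqrt(z)             # scale mixture of Gaussians

emp_var = f.var()        # should approach nu/(nu-2) = 5/3
```

The heavy tails are exactly what enforces sparsity: small zj means a large local variance 1/zj, letting a few coefficients be large while most stay near zero.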
Sparse model in a Transform domain
f: piecewise continuous    gradient of f: sparse    z = Haar Transform of f: sparse
I Analysis: J(f) = ‖g − Hf‖²₂ + λ‖Df‖₁
I Synthesis: f = Dz −→ J(z) = ‖g − HDz‖² + λ‖z‖₁
I Explicit modelling: f = Dz + ξ, z sparse, ξ Gaussian with unknown variance
Sparse model in a Transform domain
[Graphical model: (αz0, βz0) → vz → z; (αξ0, βξ0) → vξ → ξ, with f = Dz + ξ; (αε0, βε0) → vε → ε, with g = Hf + ε]

g = Hf + ε, f = Dz + ξ, z sparse
p(g|f, vε) = N(g|Hf, Vε), Vε = diag[vε]
p(f|z, vξ) = N(f|Dz, Vξ), Vξ = diag[vξ]
p(z|vz) = N(z|0, Vz), Vz = diag[vz]
p(vε) = ∏i IG(vεi|αε0, βε0)
p(vz) = ∏j IG(vzj|αz0, βz0)
p(vξ) = ∏j IG(vξj|αξ0, βξ0)
p(f, z, vε, vz, vξ|g) ∝ p(g|f, vε) p(f|z, vξ) p(z|vz) p(vε) p(vz) p(vξ)
– JMAP:
(f̂, ẑ, v̂ε, v̂z, v̂ξ) = arg max(f, z, vε, vz, vξ) { p(f, z, vε, vz, vξ|g) }
Alternate optimization.
– VBA: Approximate p(f, z, vε, vz, vξ|g) by q1(f) q2(z) q3(vε) q4(vz) q5(vξ)
Alternate optimization.
JMAP Algorithm
[f̂, ẑ, v̂z, v̂ε, v̂ξ] = arg max(f, z, vz, vε, vξ) { p(f, z, vz, vε, vξ|g) }

iter: f(k+1) = f(k) − γf(k) ∇J(f(k))
iter: z(k+1) = z(k) − γz(k) ∇J(z(k))

v̂zj = (βz0 + ½ zj²) / (αz0 + 3/2)
v̂εi = (βε0 + ½ (gi − [Hf]i)²) / (αε0 + 3/2)
v̂ξj = (βξ0 + ½ (fj − [Dz]j)²) / (αξ0 + 3/2)

where
J(f) = ½ (g − Hf)t Vε−1 (g − Hf) + ½ (f − Dz)t Vξ−1 (f − Dz)
J(z) = ½ (f − Dz)t Vξ−1 (f − Dz) + ½ zt Vz−1 z

γf(k) = ‖∇J(f(k))‖² / ( ‖Yε H ∇J(f(k))‖² + ‖Yξ ∇J(f(k))‖² ), where Yε = Vε^(−1/2) and Yξ = Vξ^(−1/2)
γz(k) = ‖∇J(z(k))‖² / ( ‖Yξ D ∇J(z(k))‖² + ‖Yz ∇J(z(k))‖² ), where Yz = Vz^(−1/2)

and ∇J(·) is the gradient of J(·).
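The closed-form variance updates above are elementwise and cheap; a sketch of one such update with illustrative hyperparameters (the IG values and the identity dictionary D are placeholders, not the ones used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 16
alpha0, beta0 = 2.0, 1e-3        # illustrative IG hyperparameters
z = rng.standard_normal(N)        # current estimate of z
D = np.eye(N)                     # identity "dictionary" for the sketch
f = D @ z + 0.1 * rng.standard_normal(N)   # current estimate of f

# Elementwise JMAP variance updates, directly from the formulas above:
v_z  = (beta0 + 0.5 * z**2) / (alpha0 + 1.5)
v_xi = (beta0 + 0.5 * (f - D @ z)**2) / (alpha0 + 1.5)
```

Note how a small zj drives v_zj toward βz0/(αz0 + 3/2), shrinking that coefficient on the next iteration: this is the sparsity-enforcing mechanism in action.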
Implementation issues
I In almost all the algorithms, the step of computation of f needs an optimization algorithm. The criterion to optimize is often of the form
J(f) = ‖g − Hf‖²₂ + λ‖Df‖²₂ or J(z) = ‖g − HDz‖² + λ‖z‖₁
I Very often, we use gradient based algorithms which need to compute ∇J(f) = −2Ht(g − Hf) + 2λDtDf; so, in the simplest case, in each step we have
f(k+1) = f(k) + α(k)[Ht(g − Hf(k)) − λDtDf(k)]
1. Compute ĝ = Hf (Forward projection)
2. Compute δg = g − ĝ (Error or residual)
3. Compute δf1 = Ht δg (Backprojection of error)
4. Compute δf2 = −λDtDf and update f(k+1) = f(k) + α(k)[δf1 + δf2]
I Steps 1 and 3 have a high computational cost and have been implemented on GPU. In this work, we used the ASTRA Toolbox.
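The four steps can be sketched on CPU with an explicit matrix standing in for the projector; on GPU, steps 1 and 3 would call the ASTRA forward/back projectors instead of the matrix products. Sizes, step size and λ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 40, 20
H = rng.standard_normal((M, N))   # stand-in for the ASTRA projector
D = np.eye(N)
f_true = rng.standard_normal(N)
g = H @ f_true                    # noiseless data for the sketch
lam, alpha = 0.01, 5e-3           # illustrative regularization and step size

f = np.zeros(N)
for _ in range(1000):
    g_hat = H @ f                 # 1. forward projection
    dg = g - g_hat                # 2. error / residual
    df1 = H.T @ dg                # 3. backprojection of the error
    df2 = -lam * (D.T @ D @ f)    # 4. regularization term
    f = f + alpha * (df1 + df2)   #    gradient update
```

Swapping the two matrix products for GPU projector calls leaves the rest of the loop unchanged, which is why only these two kernels need a GPU implementation.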
Multi-Resolution Implementation
Scale 1: black g(1) = H(1) f(1) (N × N)
Scale 2: green g(2) = H(2) f(2) (N/2 × N/2)
Scale 3: red g(3) = H(3) f(3) (N/4 × N/4)
Results
Phantom: 128 × 128 × 128; Projections: 128, 64 and 32; SNR = 40 dB
128 projections    64 projections    32 projections
Results
Phantom: 128 × 128 × 128; Projections: 128; SNR = 20 dB
QR    TV    HHBM
Results with High SNR = 30 dB
δf = ‖f̂ − f‖/‖f‖    δg = ‖Hf̂ − g‖/‖g‖
Results: High SNR = 30 dB and Low SNR = 20 dB
δf = ‖f̂ − f‖/‖f‖, High SNR = 30 dB    δf = ‖f̂ − f‖/‖f‖, Low SNR = 20 dB
Variational Bayesian Approximation (VBA)
Depending on cases, we have to handle p(f, θ|g), p(f, z, θ|g), p(f, w, θ|g) or p(f, w, z, θ|g).
Let us consider the simplest case:
I Approximate p(f, θ|g) by q(f, θ|g) = q1(f|g) q2(θ|g) and then continue computations.
I Criterion: KL(q(f, θ|g) : p(f, θ|g))
I KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q1 q2 ln(q1 q2/p) = ∫ q1 ln q1 + ∫ q2 ln q2 − ∫∫ q ln p = −H(q1) − H(q2) − ⟨ln p⟩q
I Iterative algorithm q1 −→ q2 −→ q1 −→ q2, · · ·
q1(f) ∝ exp[ ⟨ln p(g, f, θ; M)⟩q2(θ) ]
q2(θ) ∝ exp[ ⟨ln p(g, f, θ; M)⟩q1(f) ]

p(f, θ|g) −→ Variational Bayesian Approximation −→ q1(f) −→ f̂
                                                −→ q2(θ) −→ θ̂
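The q1/q2 alternation above is concrete in the simplest conjugate case: gi = f + εi with εi ∼ N(0, θ), prior f ∼ N(0, s0) and θ ∼ IG(a0, b0). Then q1(f) is Gaussian and q2(θ) is Inverse-Gamma, each updated with the expectations of the other. A sketch with illustrative prior values:

```python
import numpy as np

rng = np.random.default_rng(7)
n, f_true, theta_true = 500, 2.0, 0.25
g = f_true + np.sqrt(theta_true) * rng.standard_normal(n)

s0, a0, b0 = 100.0, 1e-3, 1e-3       # weak priors (illustrative values)
a = a0 + n / 2                        # IG shape: fixed after one update
b, m, v = 1.0, 0.0, 1.0               # initial q2 rate and q1 moments
for _ in range(50):
    inv_theta = a / b                                 # <1/theta> under q2
    v = 1.0 / (n * inv_theta + 1.0 / s0)              # q1(f) = N(m, v)
    m = v * inv_theta * g.sum()
    b = b0 + 0.5 * (np.sum((g - m) ** 2) + n * v)     # q2 rate uses <(g_i - f)^2>

theta_hat = b / (a - 1)               # posterior mean of theta under q2
```

Note that the q2 update uses ⟨(gi − f)²⟩ = (gi − m)² + v, i.e. it accounts for the uncertainty of f rather than plugging in a point estimate; this is exactly what distinguishes VBA from JMAP.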
JMAP, Marginalization, Sampling and exploration, VBA
I JMAP:
p(f, θ|g) −→ Alternate Optimization −→ f̂, θ̂:
f̂ = arg maxf {p(f|θ̂, g)}, θ̂ = arg maxθ {p(θ|f̂, g)}
I Marginalization:
p(f, θ|g) (Joint Posterior) −→ marginalize over f −→ p(θ|g) −→ θ̂ −→ p(f|θ̂, g) −→ f̂
I Sampling and Exploration:
I Gibbs sampling: f ∼ p(f|θ, g) → θ ∼ p(θ|f, g)
I Other sampling methods: IS, MH, Slice sampling, ...
I Variational Bayesian Approximation:
p(f, θ|g) −→ Variational Bayesian Approximation −→ q1(f) −→ f̂
                                                −→ q2(θ) −→ θ̂
VBA: Choice of family of laws q1 and q2
I Case 1 −→ Joint MAP:
q1(f|f̂) = δ(f − f̂), q2(θ|θ̂) = δ(θ − θ̂) −→
f̂ = arg maxf {p(f, θ̂|g; M)}, θ̂ = arg maxθ {p(f̂, θ|g; M)}   (1)
I Case 2 −→ EM:
q1(f) ∝ p(f|θ̂, g), q2(θ|θ̂) = δ(θ − θ̂) −→
Q(θ, θ̂) = ⟨ln p(f, θ|g; M)⟩q1(f|θ̂), θ̂ = arg maxθ {Q(θ, θ̂)}   (2)
I Appropriate choice for inverse problems:
q1(f) ∝ p(f|θ̂, g; M), q2(θ) ∝ p(θ|f̂, g; M) −→
accounts for the uncertainties of θ̂ for f̂ and vice versa.   (3)
Exponential families, Conjugate priors
JMAP, EM and VBA
JMAP Alternate optimization Algorithm:
θ(0) −→ θ̂ −→ f̂ = arg maxf {p(f, θ̂|g)} −→ f̂
f̂ −→ θ̂ = arg maxθ {p(f̂, θ|g)} −→ θ̂ (loop)

EM:
θ(0) −→ θ̂ −→ q1(f) = p(f|θ̂, g) −→ q1(f) −→ f̂
q1(f) −→ Q(θ, θ̂) = ⟨ln p(f, θ|g)⟩q1(f), θ̂ = arg maxθ {Q(θ, θ̂)} −→ θ̂ (loop)

VBA:
θ(0) −→ q2(θ) −→ q1(f) ∝ exp[⟨ln p(f, θ|g)⟩q2(θ)] −→ q1(f) −→ f̂
q1(f) −→ q2(θ) ∝ exp[⟨ln p(f, θ|g)⟩q1(f)] −→ q2(θ) (loop)
VBA
Approximate p(f, z, vz, vξ, vε|g) by q1(f) q2(z) q3(vz) q4(vξ) q5(vε).
Alternate optimization:
q1(f) = N(f|µ̂f, Σ̂f)
q2(z) = N(z|µ̂z, Σ̂z)
q3(vz) = ∏j IG(vzj|α̂zj, β̂zj)
q4(vξ) = ∏j IG(vξj|α̂ξj, β̂ξj)
q5(vε) = ∏i IG(vεi|α̂εi, β̂εi)

Σ̂f = (Ht Vε−1 H + Vξ−1)−1
µ̂f = Σ̂f (Ht Vε−1 g + Vξ−1 D µ̂z)
Σ̂z = (Dt Vξ−1 D + Vz−1)−1
µ̂z = Σ̂z Dt Vξ−1 µ̂f
JMAP Algorithm
[f̂, ẑ, v̂z, v̂ε, v̂ξ] = arg max(f, z, vz, vε, vξ) { p(f, z, vz, vε, vξ|g) }

iter: f(k+1) = f(k) − γf(k) ∇J(f(k))
iter: z(k+1) = z(k) − γz(k) ∇J(z(k))

v̂zj = (βz0 + ½ zj²)/(αz0 + 3/2), v̂εi = (βε0 + ½ (gi − [Hf]i)²)/(αε0 + 3/2), v̂ξj = (βξ0 + ½ (fj − [Dz]j)²)/(αξ0 + 3/2)

In VBA, the expressions of v̂zj, v̂εi and v̂ξj are more complex and need the computation of the diagonal elements of the posterior covariances; specific techniques are needed to compute them efficiently:

v̂zj = (βz0 + ½ ⟨zj²⟩)/(αz0 + 3/2), v̂εi = (βε0 + ½ ⟨(gi − [Hf]i)²⟩)/(αε0 + 3/2), v̂ξj = (βξ0 + ½ ⟨(fj − [Dz]j)²⟩)/(αξ0 + 3/2)
Conclusions
I Computed Tomography is an ill-posed inverse problem
I Algebraic methods and regularization methods push the limitations further.
I The Bayesian approach has great potential for real applications where we need to estimate the hyperparameters (semi-supervised) and to quantify the uncertainties.
I Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
I Main Bayesian computation tools: JMAP, VBA, AMP and MCMC
I Applications in other imaging systems: Microwaves, PET, Ultrasound, Optical Diffusion Tomography (ODT), ...
I Current projects: efficient implementation in 3D cases to reconstruct 1024 × 1024 × 1024 volumes using GPU.