a student-t based sparsity enforcing hierarchical prior...
TRANSCRIPT
A Student-t based sparsity enforcing hierarchical prior for linear inverse problems and its efficient Bayesian computation for 2D and 3D Computed Tomography
Ali Mohammad-Djafari, Li Wang, Nicolas Gac and Folker Bleichrodt
Laboratoire des Signaux et Systemes (L2S)
UMR 8506 CNRS-CentraleSupelec-Univ. Paris-Sud, SUPELEC, 91192 Gif-sur-Yvette, France
http://lss.centralesupelec.fr  Email: [email protected]
http://djafari.free.fr  http://publicationslist.org/djafari
iTwist 2016, Aug. 24-26, 2016, Aalborg, Denmark
A. Mohammad-Djafari, Bayesian Discrete Tomography from a few number of projections, March 21-23, 2016, Politecnico di Milano, Italy.
Contents
1. Computed Tomography in 2D and 3D
2. Main classical methods
3. Basic Bayesian approach
4. Sparsity enforcing models through Student-t and IGSM
5. Computational tools: JMAP, EM, VBA
6. Implementation issues
I Main GPU implementation steps: Forward and Back Projections
I Multi-Resolution implementation
7. Some results
8. Conclusions
Computed Tomography: Seeing inside of a body
I f (x , y) a section of a real 3D body f (x , y , z)
I gφ(r) a line of observed radiography gφ(r , z)
I Forward model: Line integrals or Radon Transform
gφ(r) = ∫_{L_{r,φ}} f(x, y) dl + εφ(r)
      = ∫∫ f(x, y) δ(r − x cos φ − y sin φ) dx dy + εφ(r)
I Inverse problem: Image reconstruction
Given the forward model H (Radon Transform) and a set of data gφi(r), i = 1, · · · , M, find f(x, y)
2D and 3D Computed Tomography
3D: gφ(r1, r2) = ∫_{L_{r1,r2,φ}} f(x, y, z) dl      2D: gφ(r) = ∫_{L_{r,φ}} f(x, y) dl
Forward problem: f(x, y) or f(x, y, z) −→ gφ(r) or gφ(r1, r2)
Inverse problem: gφ(r) or gφ(r1, r2) −→ f(x, y) or f(x, y, z)
Algebraic methods: Discretization
[Figure: the image f(x, y) discretized on an N-pixel grid (pixels f1, ..., fN); a ray at angle φ from source S to detector D produces the measurement g(r, φ); Hij is the contribution of pixel fj to measurement gi]
f(x, y) = ∑j fj bj(x, y)
bj(x, y) = { 1 if (x, y) ∈ pixel j; 0 else }
g(r, φ) = ∫_L f(x, y) dl      gi = ∑_{j=1}^N Hij fj + εi
gk = Hk f + εk, k = 1, · · · , K −→ g = Hf + ε
gk projection at angle φk, g all the projections.
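The discretized model above can be illustrated numerically on a toy grid: for two projection angles (0 and 90 degrees) each row of H simply sums the pixels one ray crosses. This is a minimal sketch with illustrative names, not the projector used in the talk.

```python
import numpy as np

# Toy 4x4 image, flattened to the vector f (N = 16 pixels).
n = 4
f_img = np.zeros((n, n))
f_img[1:3, 1:3] = 1.0        # a small square "object"
f_vec = f_img.ravel()

# H for two projection angles: each row of H selects the pixels one ray crosses.
H_rows = np.kron(np.eye(n), np.ones((1, n)))   # horizontal rays: row sums
H_cols = np.kron(np.ones((1, n)), np.eye(n))   # vertical rays: column sums
H = np.vstack([H_rows, H_cols])                # stacking g_k = H_k f gives g = H f

rng = np.random.default_rng(0)
eps = 0.01 * rng.standard_normal(2 * n)        # measurement noise
g = H @ f_vec + eps                            # g = H f + eps
```

In real CT the rays are oblique, so Hij holds intersection lengths rather than 0/1, but the structure of the linear model is the same.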
Algebraic methods
g = [g1; ...; gK],  H = [H1; ...; HK]  −→  gk = Hk f + εk, k = 1, ..., K  −→  g = Hf + ε
I H is huge dimensional: 2D: 10^6 × 10^6, 3D: 10^9 × 10^9.
I Hf corresponds to forward projection
I Htg corresponds to Back projection (BP)
I H may not be invertible and even not square
I H is, in general, ill-conditioned
I In limited angle tomography H is under-determined, so the problem has an infinite number of solutions
I Minimum Norm Solution
f̂ = Ht(HHt)−1 g = ∑k Htk(Hk Htk)−1 gk
can be interpreted as the Filtered Back Projection solution.
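For an under-determined system the minimum norm formula above can be checked directly with numpy; a small sketch with a random H standing in for the projector:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 8, 16                      # fewer measurements than pixels: under-determined
H = rng.standard_normal((M, N))
f_true = rng.standard_normal(N)
g = H @ f_true                    # noiseless data

# Minimum norm solution f = H^t (H H^t)^{-1} g
f_mn = H.T @ np.linalg.solve(H @ H.T, g)
```

Among all f with Hf = g, f_mn has the smallest L2 norm, so its norm never exceeds that of the true image.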
Prior information or constraints
I Positivity: fj > 0 or fj ∈ IR+
I Boundedness: 1 > fj > 0 or fj ∈ [0, 1]
I Smoothness: fj depends on the neighborhoods.
I Sparsity: many fj are zeros.
I Sparsity in a transform domain: f = Dz and many zj are zeros.
I Discrete valued (DV): fj ∈ {0, 1, ..., K}
I Binary valued (BV): fj ∈ {0, 1}
I Compactness: f(r) is non-zero in one or a few non-overlapping compact regions
I Combination of the above mentioned constraints
I Main mathematical questions:
I Which combination results in a unique solution?
I How to apply them?
Algebraic methods: Regularization
I Minimum Norm Solution:
minimize ‖f‖²₂ s.t. Hf = g −→ f̂ = Ht(HHt)−1 g
I Least square Solution:
f̂ = arg minf { J(f) = ‖g − Hf‖² } −→ f̂ = (HtH)−1 Ht g
I Quadratic Regularization:
J(f) = ‖g − Hf‖² + λ‖f‖²₂ −→ f̂ = (HtH + λI)−1 Ht g
I L1 Regularization: J(f) = ‖g − Hf‖² + λ‖f‖₁
I Lpq Regularization: J(f) = ‖g − Hf‖p_p + λ‖Df‖q_q
I More general Regularization:
J(f) = ∑i φ(gi − [Hf]i) + λ ∑j ψ([Df]j)
with convex potential functions φ and ψ, or
J(f) = ∆1(g, Hf) + λ∆2(f, f0)
with ∆1 and ∆2 any distances (L1, L2, ...) or divergences (KL)
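The quadratic regularization above has a closed form, which a few lines of numpy verify on a toy problem (random H and illustrative λ; the optimality check is that the gradient of J vanishes at f̂):

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 20, 10
H = rng.standard_normal((M, N))
f_true = rng.standard_normal(N)
g = H @ f_true + 0.05 * rng.standard_normal(M)
lam = 0.1

# Closed-form minimizer of J(f) = ||g - H f||^2 + lam ||f||^2
f_hat = np.linalg.solve(H.T @ H + lam * np.eye(N), H.T @ g)

# Gradient of J at the minimizer should be (numerically) zero.
grad = -2 * H.T @ (g - H @ f_hat) + 2 * lam * f_hat
```

The same pattern extends to ‖Df‖² by replacing λI with λDtD in the normal equations.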
Deterministic approaches
I Iterative methods: SIRT, ART, Quadratic or L1 regularization, Block Coordinate Descent, Multiplicative ART, ...
I Criteria: J(f) = ‖g − Hf‖² + λ‖Df‖²
I Gradient based algorithms: ∇J(f) = −2Ht(g − Hf) + 2λDtDf
I Simplest algorithm (gradient descent, factor 2 absorbed in the step size):
f(k+1) = f(k) + α(k)[Ht(g − Hf(k)) − λDtDf(k)]
I More criteria: J(f) = ∑i φ(gi − [Hf]i) + λ ∑j ψ([Df]j)
with φ(t) and ψ(t) ∈ {t², |t|, |t|p, ...} or non-convex ones.
I Imposing constraints in each iteration (example: DART)
I Mathematical studies of uniqueness and convergence of these algorithms are necessary
I Many specialized algorithms (ISTA, FISTA, ADMM, AMP, GAMP, ...) are developed for L1 regularization.
Bayesian estimation approach
M : g = Hf + ε
I Observation model M + hypothesis on the noise ε −→ p(g|f; M) = pε(g − Hf)
I A priori information: p(f|M)
I Bayes: p(f|g; M) = p(g|f; M) p(f|M) / p(g|M)
I Gaussian priors:
I Prior knowledge on the noise: ε ∼ N(0, vε² I)
I Prior knowledge on f: f ∼ N(0, vf² (DtD)−1)
I A posteriori: p(f|g) ∝ exp[ −(1/(2vε²))‖g − Hf‖² − (1/(2vf²))‖Df‖² ]
I MAP: f̂ = arg maxf {p(f|g)} = arg minf {J(f)}
with J(f) = ‖g − Hf‖² + λ‖Df‖², λ = vε²/vf²
I Advantage: characterization of the solution
p(f|g) = N(f̂, Σ̂) with
f̂ = (HtH + λDtD)−1 Ht g,  Σ̂ = vε² (HtH + λDtD)−1
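At small scale the Gaussian posterior above is directly computable; a minimal numpy sketch (random H, a first-difference D, illustrative variances) producing the posterior mean and the pixel-wise uncertainties on the diagonal of Σ̂:

```python
import numpy as np

rng = np.random.default_rng(3)
M, N = 30, 8
H = rng.standard_normal((M, N))
D = np.eye(N) - np.diag(np.ones(N - 1), -1)   # first-difference smoothness operator
v_eps, v_f = 0.01, 1.0                         # noise and prior variances (illustrative)
lam = v_eps / v_f

f_true = 0.3 * np.cumsum(rng.standard_normal(N))
g = H @ f_true + np.sqrt(v_eps) * rng.standard_normal(M)

# Gaussian posterior p(f|g) = N(f_hat, Sigma):
A = H.T @ H + lam * (D.T @ D)
f_hat = np.linalg.solve(A, H.T @ g)            # MAP / posterior mean
Sigma = v_eps * np.linalg.inv(A)               # posterior covariance
pixel_std = np.sqrt(np.diag(Sigma))            # per-pixel uncertainty
```

This is the "characterization of the solution" advantage: the same computation that gives f̂ also quantifies its uncertainty.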
Sparsity enforcing models
I 3 classes of models:
1. Generalized Gaussian (Double Exponential as particular case)
2. Mixture models (two-Gaussians mixture, BG, ...)
3. Heavy tailed (Student-t and Cauchy)
I Student-t model
St(f|ν) ∝ exp[ −((ν + 1)/2) log(1 + f²/ν) ]
I Infinite Gaussian Scaled Mixture (IGSM) equivalence:
St(f|ν) ∝ ∫₀^∞ N(f|0, 1/z) G(z|α, β) dz, with α = β = ν/2
p(f|z) = ∏j p(fj|zj) = ∏j N(fj|0, 1/zj) ∝ exp[ −(1/2) ∑j zj fj² ]
p(z|α, β) = ∏j G(zj|α, β) ∝ ∏j zj^(α−1) exp[−β zj] ∝ exp[ ∑j (α − 1) ln zj − β zj ]
p(f, z|α, β) ∝ exp[ −(1/2) ∑j zj fj² + ∑j (α − 1) ln zj − β zj ]
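The IGSM equivalence can be checked by simulation: drawing z from a Gamma(ν/2, rate ν/2) and then f | z ∼ N(0, 1/z) should reproduce Student-t statistics. A sketch with ν = 5, where the t distribution has variance ν/(ν − 2) = 5/3:

```python
import numpy as np

# IGSM: z ~ Gamma(nu/2, rate nu/2), f|z ~ N(0, 1/z)  =>  f ~ Student-t(nu).
rng = np.random.default_rng(4)
nu = 5.0
n = 200_000
z = rng.gamma(shape=nu / 2, scale=2 / nu, size=n)   # numpy uses scale = 1/rate
f = rng.standard_normal(n) / np.sqrt(z)             # scale mixture of Gaussians

emp_var = f.var()        # should approach nu/(nu-2) = 5/3
```

The heavy tails are exactly what enforces sparsity: small zj means a large local variance 1/zj, letting a few coefficients be large while most stay near zero.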
Sparse model in a Transform domain
f: piecewise continuous    gradient of f: sparse    z = Haar Transform of f: sparse
I Analysis: J(f) = ‖g − Hf‖²₂ + λ‖Df‖₁
I Synthesis: f = Dz −→ J(z) = ‖g − HDz‖² + λ‖z‖₁
I Explicit modelling: f = Dz + ξ, z sparse, ξ Gaussian with unknown variance
Sparse model in a Transform domain
[Graphical model: (αz0, βz0) → vz → z; (αξ0, βξ0) → vξ → ξ, with f = Dz + ξ; (αε0, βε0) → vε → ε, with g = Hf + ε]

g = Hf + ε, f = Dz + ξ, z sparse
p(g|f, vε) = N(g|Hf, Vε), Vε = diag[vε]
p(f|z, vξ) = N(f|Dz, Vξ), Vξ = diag[vξ]
p(z|vz) = N(z|0, Vz), Vz = diag[vz]
p(vε) = ∏i IG(vεi|αε0, βε0)
p(vz) = ∏j IG(vzj|αz0, βz0)
p(vξ) = ∏j IG(vξj|αξ0, βξ0)
p(f, z, vε, vz, vξ|g) ∝ p(g|f, vε) p(f|z, vξ) p(z|vz) p(vε) p(vz) p(vξ)
– JMAP:
(f̂, ẑ, v̂ε, v̂z, v̂ξ) = arg max(f, z, vε, vz, vξ) { p(f, z, vε, vz, vξ|g) }
Alternate optimization.
– VBA: Approximate p(f, z, vε, vz, vξ|g) by q1(f) q2(z) q3(vε) q4(vz) q5(vξ)
Alternate optimization.
JMAP Algorithm
[f̂, ẑ, v̂z, v̂ε, v̂ξ] = arg max(f, z, vz, vε, vξ) { p(f, z, vz, vε, vξ|g) }

iter: f(k+1) = f(k) − γf(k) ∇J(f(k))
iter: z(k+1) = z(k) − γz(k) ∇J(z(k))

v̂zj = (βz0 + ½ zj²) / (αz0 + 3/2)
v̂εi = (βε0 + ½ (gi − [Hf]i)²) / (αε0 + 3/2)
v̂ξj = (βξ0 + ½ (fj − [Dz]j)²) / (αξ0 + 3/2)

where
J(f) = ½ (g − Hf)t Vε−1 (g − Hf) + ½ (f − Dz)t Vξ−1 (f − Dz)
J(z) = ½ (f − Dz)t Vξ−1 (f − Dz) + ½ zt Vz−1 z

γf(k) = ‖∇J(f(k))‖² / ( ‖Yε H ∇J(f(k))‖² + ‖Yξ ∇J(f(k))‖² ), where Yε = Vε^(−1/2) and Yξ = Vξ^(−1/2)
γz(k) = ‖∇J(z(k))‖² / ( ‖Yξ D ∇J(z(k))‖² + ‖Yz ∇J(z(k))‖² ), where Yz = Vz^(−1/2)

and ∇J(·) is the gradient of J(·).
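The closed-form variance updates above are elementwise and cheap; a sketch of one such update with illustrative hyperparameters (the IG values and the identity dictionary D are placeholders, not the ones used in the experiments):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 16
alpha0, beta0 = 2.0, 1e-3        # illustrative IG hyperparameters
z = rng.standard_normal(N)        # current estimate of z
D = np.eye(N)                     # identity "dictionary" for the sketch
f = D @ z + 0.1 * rng.standard_normal(N)   # current estimate of f

# Elementwise JMAP variance updates, directly from the formulas above:
v_z  = (beta0 + 0.5 * z**2) / (alpha0 + 1.5)
v_xi = (beta0 + 0.5 * (f - D @ z)**2) / (alpha0 + 1.5)
```

Note how a small zj drives v_zj toward βz0/(αz0 + 3/2), shrinking that coefficient on the next iteration: this is the sparsity-enforcing mechanism in action.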
Implementation issues
I In almost all the algorithms, the step of computation of f needs an optimization algorithm. The criterion to optimize is often of the form
J(f) = ‖g − Hf‖²₂ + λ‖Df‖²₂ or J(z) = ‖g − HDz‖² + λ‖z‖₁
I Very often, we use gradient based algorithms which need to compute ∇J(f) = −2Ht(g − Hf) + 2λDtDf; so, in the simplest case, in each step we have
f(k+1) = f(k) + α(k)[Ht(g − Hf(k)) − λDtDf(k)]
1. Compute ĝ = Hf (Forward projection)
2. Compute δg = g − ĝ (Error or residual)
3. Compute δf1 = Ht δg (Backprojection of error)
4. Compute δf2 = −λDtDf and update f(k+1) = f(k) + α(k)[δf1 + δf2]
I Steps 1 and 3 have a high computational cost and have been implemented on GPU. In this work, we used the ASTRA Toolbox.
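The four steps can be sketched on CPU with an explicit matrix standing in for the projector; on GPU, steps 1 and 3 would call the ASTRA forward/back projectors instead of the matrix products. Sizes, step size and λ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N = 40, 20
H = rng.standard_normal((M, N))   # stand-in for the ASTRA projector
D = np.eye(N)
f_true = rng.standard_normal(N)
g = H @ f_true                    # noiseless data for the sketch
lam, alpha = 0.01, 5e-3           # illustrative regularization and step size

f = np.zeros(N)
for _ in range(1000):
    g_hat = H @ f                 # 1. forward projection
    dg = g - g_hat                # 2. error / residual
    df1 = H.T @ dg                # 3. backprojection of the error
    df2 = -lam * (D.T @ D @ f)    # 4. regularization term
    f = f + alpha * (df1 + df2)   #    gradient update
```

Swapping the two matrix products for GPU projector calls leaves the rest of the loop unchanged, which is why only these two kernels need a GPU implementation.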
Multi-Resolution Implementation
Scale 1: black g(1) = H(1) f(1) (N × N)
Scale 2: green g(2) = H(2) f(2) (N/2 × N/2)
Scale 3: red g(3) = H(3) f(3) (N/4 × N/4)
Results
Phantom: 128 × 128 × 128; Projections: 128, 64 and 32; SNR = 40 dB
128 projections    64 projections    32 projections
Results
Phantom: 128 × 128 × 128; Projections: 128; SNR = 20 dB
QR    TV    HHBM
Results with High SNR = 30 dB
δf = ‖f̂ − f‖/‖f‖    δg = ‖Hf̂ − g‖/‖g‖
Results: High SNR = 30 dB and Low SNR = 20 dB
δf = ‖f̂ − f‖/‖f‖, High SNR = 30 dB    δf = ‖f̂ − f‖/‖f‖, Low SNR = 20 dB
Variational Bayesian Approximation (VBA)
Depending on cases, we have to handle p(f, θ|g), p(f, z, θ|g), p(f, w, θ|g) or p(f, w, z, θ|g).
Let us consider the simplest case:
I Approximate p(f, θ|g) by q(f, θ|g) = q1(f|g) q2(θ|g) and then continue computations.
I Criterion: KL(q(f, θ|g) : p(f, θ|g))
I KL(q : p) = ∫∫ q ln(q/p) = ∫∫ q1 q2 ln(q1 q2/p) = ∫ q1 ln q1 + ∫ q2 ln q2 − ∫∫ q ln p = −H(q1) − H(q2) − ⟨ln p⟩q
I Iterative algorithm q1 −→ q2 −→ q1 −→ q2, · · ·
q1(f) ∝ exp[ ⟨ln p(g, f, θ; M)⟩q2(θ) ]
q2(θ) ∝ exp[ ⟨ln p(g, f, θ; M)⟩q1(f) ]

p(f, θ|g) −→ Variational Bayesian Approximation −→ q1(f) −→ f̂
                                                −→ q2(θ) −→ θ̂
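The q1/q2 alternation above is concrete in the simplest conjugate case: gi = f + εi with εi ∼ N(0, θ), prior f ∼ N(0, s0) and θ ∼ IG(a0, b0). Then q1(f) is Gaussian and q2(θ) is Inverse-Gamma, each updated with the expectations of the other. A sketch with illustrative prior values:

```python
import numpy as np

rng = np.random.default_rng(7)
n, f_true, theta_true = 500, 2.0, 0.25
g = f_true + np.sqrt(theta_true) * rng.standard_normal(n)

s0, a0, b0 = 100.0, 1e-3, 1e-3       # weak priors (illustrative values)
a = a0 + n / 2                        # IG shape: fixed after one update
b, m, v = 1.0, 0.0, 1.0               # initial q2 rate and q1 moments
for _ in range(50):
    inv_theta = a / b                                 # <1/theta> under q2
    v = 1.0 / (n * inv_theta + 1.0 / s0)              # q1(f) = N(m, v)
    m = v * inv_theta * g.sum()
    b = b0 + 0.5 * (np.sum((g - m) ** 2) + n * v)     # q2 rate uses <(g_i - f)^2>

theta_hat = b / (a - 1)               # posterior mean of theta under q2
```

Note that the q2 update uses ⟨(gi − f)²⟩ = (gi − m)² + v, i.e. it accounts for the uncertainty of f rather than plugging in a point estimate; this is exactly what distinguishes VBA from JMAP.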
JMAP, Marginalization, Sampling and exploration, VBA
I JMAP:
p(f, θ|g) −→ Alternate Optimization −→ f̂, θ̂:
f̂ = arg maxf {p(f|θ̂, g)}, θ̂ = arg maxθ {p(θ|f̂, g)}
I Marginalization:
p(f, θ|g) (Joint Posterior) −→ marginalize over f −→ p(θ|g) −→ θ̂ −→ p(f|θ̂, g) −→ f̂
I Sampling and Exploration:
I Gibbs sampling: f ∼ p(f|θ, g) → θ ∼ p(θ|f, g)
I Other sampling methods: IS, MH, Slice sampling, ...
I Variational Bayesian Approximation:
p(f, θ|g) −→ Variational Bayesian Approximation −→ q1(f) −→ f̂
                                                −→ q2(θ) −→ θ̂
VBA: Choice of family of laws q1 and q2
I Case 1 −→ Joint MAP:
q1(f|f̂) = δ(f − f̂), q2(θ|θ̂) = δ(θ − θ̂) −→
f̂ = arg maxf {p(f, θ̂|g; M)}, θ̂ = arg maxθ {p(f̂, θ|g; M)}   (1)
I Case 2 −→ EM:
q1(f) ∝ p(f|θ̂, g), q2(θ|θ̂) = δ(θ − θ̂) −→
Q(θ, θ̂) = ⟨ln p(f, θ|g; M)⟩q1(f|θ̂), θ̂ = arg maxθ {Q(θ, θ̂)}   (2)
I Appropriate choice for inverse problems:
q1(f) ∝ p(f|θ̂, g; M), q2(θ) ∝ p(θ|f̂, g; M) −→
accounts for the uncertainties of θ̂ for f̂ and vice versa.   (3)
Exponential families, Conjugate priors
JMAP, EM and VBA
JMAP Alternate optimization Algorithm:
θ(0) −→ θ̂ −→ f̂ = arg maxf {p(f, θ̂|g)} −→ f̂
f̂ −→ θ̂ = arg maxθ {p(f̂, θ|g)} −→ θ̂ (loop)

EM:
θ(0) −→ θ̂ −→ q1(f) = p(f|θ̂, g) −→ q1(f) −→ f̂
q1(f) −→ Q(θ, θ̂) = ⟨ln p(f, θ|g)⟩q1(f), θ̂ = arg maxθ {Q(θ, θ̂)} −→ θ̂ (loop)

VBA:
θ(0) −→ q2(θ) −→ q1(f) ∝ exp[⟨ln p(f, θ|g)⟩q2(θ)] −→ q1(f) −→ f̂
q1(f) −→ q2(θ) ∝ exp[⟨ln p(f, θ|g)⟩q1(f)] −→ q2(θ) (loop)
VBA
Approximate p(f, z, vz, vξ, vε|g) by q1(f) q2(z) q3(vz) q4(vξ) q5(vε).
Alternate optimization:
q1(f) = N(f|µ̂f, Σ̂f)
q2(z) = N(z|µ̂z, Σ̂z)
q3(vz) = ∏j IG(vzj|α̂zj, β̂zj)
q4(vξ) = ∏j IG(vξj|α̂ξj, β̂ξj)
q5(vε) = ∏i IG(vεi|α̂εi, β̂εi)

Σ̂f = (Ht Vε−1 H + Vξ−1)−1
µ̂f = Σ̂f (Ht Vε−1 g + Vξ−1 D µ̂z)
Σ̂z = (Dt Vξ−1 D + Vz−1)−1
µ̂z = Σ̂z Dt Vξ−1 µ̂f
JMAP Algorithm
[f̂, ẑ, v̂z, v̂ε, v̂ξ] = arg max(f, z, vz, vε, vξ) { p(f, z, vz, vε, vξ|g) }

iter: f(k+1) = f(k) − γf(k) ∇J(f(k))
iter: z(k+1) = z(k) − γz(k) ∇J(z(k))

v̂zj = (βz0 + ½ zj²)/(αz0 + 3/2), v̂εi = (βε0 + ½ (gi − [Hf]i)²)/(αε0 + 3/2), v̂ξj = (βξ0 + ½ (fj − [Dz]j)²)/(αξ0 + 3/2)

In VBA, the expressions of v̂zj, v̂εi and v̂ξj are more complex and need the computation of the diagonal elements of the posterior covariances; specific techniques are needed to compute them efficiently:

v̂zj = (βz0 + ½ ⟨zj²⟩)/(αz0 + 3/2), v̂εi = (βε0 + ½ ⟨(gi − [Hf]i)²⟩)/(αε0 + 3/2), v̂ξj = (βξ0 + ½ ⟨(fj − [Dz]j)²⟩)/(αξ0 + 3/2)
Conclusions
I Computed Tomography is an ill-posed inverse problem
I Algebraic methods and regularization methods push the limitations further.
I The Bayesian approach has great potential for real applications where we need to estimate the hyperparameters (semi-supervised) and to quantify the uncertainties.
I Hierarchical prior models with hidden variables are very powerful tools for the Bayesian approach to inverse problems.
I Main Bayesian computation tools: JMAP, VBA, AMP and MCMC
I Applications in other imaging systems: Microwaves, PET, Ultrasound, Optical Diffusion Tomography (ODT), ...
I Current projects: efficient implementation in 3D cases to reconstruct 1024 × 1024 × 1024 volumes using GPU.