

Manuscript submitted to AIMS' Journals, doi:XXX/xx.xx.xx.xx, Volume X, Number 0X, XX XXXX, pp. X–XX

A DUAL EM ALGORITHM FOR TV REGULARIZED GAUSSIAN

MIXTURE MODEL IN IMAGE SEGMENTATION

Shi Yan, Jun Liu∗, Haiyang Huang

Laboratory of Mathematics and Complex Systems (Ministry of Education of China), School of Mathematical Sciences, Beijing Normal University, Beijing 100875, P.R. China.

Xue-cheng Tai

Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong.

and Department of Mathematics, University of Bergen, P. O. Box 7800, N-5020, Bergen, Norway.

Abstract. A dual expectation-maximization (EM) algorithm for the total variation (TV) regularized Gaussian mixture model (GMM) is proposed in this paper. The algorithm is built upon the EM algorithm with TV regularization (EM-TV) model, which combines statistical and variational methods for image segmentation. Inspired by the projection algorithm proposed by Chambolle, we give a dual algorithm for the EM-TV model. The related dual problem is smooth and can be easily solved by a projection gradient method, which is stable and fast. Given the parameters of the GMM, the proposed algorithm can be seen as a convergent forward-backward splitting method. The method can be easily extended to many other applications. Numerical results show that our algorithm provides high-quality segmentation results with fast computation speed. Compared with well-known statistics-based methods such as the hidden Markov random field with EM method (HMRF-EM), the proposed algorithm has a better performance. The proposed method can also be applied to MRI segmentation, e.g. in the SPM8 software, and improves the segmentation results.

1. Introduction. Image segmentation is a basic part of image processing with a long history [22]. Images may come from every corner of the world, and some natural images contain noise and are not good enough to segment directly. After several decades of development there are many methods for image segmentation; one of the most important is the statistical method. In such a method, a region with its noise can be approximately represented by a Gaussian mixture model (GMM) [26, 17, 23, 39, 41]. A common method for solving the GMM is the well-known expectation-maximization (EM) algorithm, first introduced in [16]. The GMM-EM method has been studied in many references, such as [9, 30, 38, 41]. However, when using the GMM-EM method in image segmentation, one drawback is its sensitivity to noise due to the lack of regularization. To address this flaw, the authors of [41] added a hidden Markov random field (HMRF) to the GMM and solved it by an EM algorithm (HMRF-EM), which can achieve stable segmentation

2010 Mathematics Subject Classification. Primary: 68U10.
Key words and phrases. Dual algorithm, Expectation-maximization, Gaussian mixture model, Total variation, Projection gradient, Image segmentation.
* Corresponding author: Jun Liu.




results under noise. In fact, the local spatial positions of neighboring pixels are taken into consideration in HMRF, which makes the segmentation more robust to noise. However, the classification provided by HMRF may produce zigzags near the boundaries due to improper regularization [25], especially when the noise level is high. One may see this phenomenon in the numerical experiments section.

As for regularization, total variation (TV) is one of the most successful methods in inverse problems. TV was introduced by Rudin-Osher-Fatemi in [33], in what is known as the ROF model. In fact, Mumford and Shah introduced a multi-region segmentation model in [31], whose regularization term can also be treated as TV, as in [20, 35]. In [12], Chan and Vese proposed the CV model, which combined TV regularization and the level set method [32] for two-region image segmentation. Some extensions can be found in [1, 11]. It is well known that there are many fast algorithms for TV based minimization problems, such as the dual method [10], the split Bregman method [21], the augmented Lagrangian method [37], the graph-cut (max-flow) method [7, 8], the continuous max-flow method [40] and so on. Besides, there is a natural theoretical connection between TV and the length of curves [19]. From denoising experimental results we can see that, as a regularization term, TV can keep the jump between classes and suppress noise, which means that TV is good at preserving edges and removing noise.

In [26], Liu et al. proposed an image segmentation model called the EM-TV model by integrating EM and TV. They proposed a unified variational functional which brings the EM algorithm and TV regularization together, and this model holds the advantages of both the statistical and PDE methods. In their method, the minimization problem was solved by a splitting scheme where an L1 penalty term was used to keep the structure of EM. Unlike the traditional L2 penalty problem, the convergence of the algorithm is unknown due to the lack of strict convexity of L1. Besides, they introduced some extra auxiliary variables and many penalty parameters in the algorithm. For real implementations, it is very important and difficult to find good values of these parameters for such a splitting scheme. Bad parameters may cause the algorithm to fail to converge or to produce undesirable segmentation results.

In this paper, we introduce a new method to solve the EM-TV model. The idea comes from the Chambolle projection method [10]. With the help of the Fenchel duality method, we obtain a smooth dual problem for the EM-TV model. This dual problem can be easily solved by a projection gradient method, which is fast and stable. Moreover, we can prove the convergence of this algorithm. Compared with the two auxiliary variables introduced in [26], we only introduce one dual variable during the iteration, and the solution can be given by an explicit formulation. The proposed algorithm does not have many control parameters, which makes it more stable.

Our method is related to the global minimization method described in [5]. Let us point out the main differences between these two methods. Firstly, the theoretical foundations are totally different. Our method is built upon the statistical method while theirs takes a deterministic viewpoint. In this paper, we try to solve the EM-TV model, while [5] studied the Potts model with an approximation method. Thus, our model can segment data with the same mean but different variances due to the property of the GMM. In this case, [5] cannot work, since the mean is the only factor for segmenting images. Secondly, in [5], the authors did not provide the



convergence of their algorithm. In this paper, we will mathematically show that our algorithm converges.

The rest of the paper is organized as follows. In Section 2, we introduce the basic ideas of GMM, EM and the EM-TV model. In Section 3, we show the dual algorithm for solving the EM-TV model. Section 4 describes the discrete form of our algorithm. In Section 5, some numerical results and comparisons are presented. Finally, we summarize our algorithm.

2. GMM, EM and EM-TV model.

2.1. GMM and EM. The GMM and the EM algorithm were first introduced in statistics and can be used in image segmentation. GMM is a probabilistic model representing a multi-class distribution. The EM algorithm is a two-step iterative algorithm: the first step, usually called the E-step, uses the current estimate of the parameters to calculate the maximum a posteriori (MAP) function, and the second step, usually called the M-step, uses the function obtained in the E-step to update the parameters.

Here we take a gray image as an example. Let $I(x)$ be a gray image, where $x \in \Omega \subset \mathbb{R}^2$. The GMM distribution for image pixels can be represented as follows:
$$p_{GMM}(I(x)) = \sum_{k=1}^{K} \alpha^k p(I(x); \mu^k, \Sigma^k),$$
where $K$ is the number of mixture components, $\alpha^k$ is the weight of the $k$-th component and satisfies $\sum_{k=1}^{K} \alpha^k = 1$, and $p$ is the Gaussian density function,
$$p(I(x); \mu^k, \Sigma^k) = \frac{1}{\sqrt{2\pi\Sigma^k}} \exp\Big(-\frac{(I(x)-\mu^k)^2}{2\Sigma^k}\Big),$$
with $\mu^k, \Sigma^k$ the mean and variance of the $k$-th component, respectively. Usually, $K$ is an integer given by the user as a prior. The related negative log-likelihood function of the GMM can be written as
$$L(\Theta) = -\int_{\Omega} \log \sum_{k=1}^{K} \alpha^k p(I(x); \mu^k, \Sigma^k)\, dx,$$

where $\Theta = \{\alpha^1, \cdots, \alpha^K, \mu^1, \cdots, \mu^K, \Sigma^1, \cdots, \Sigma^K\}$ is the parameter set.

Note that directly optimizing a $\log\sum_{k=1}^{K}$-type functional is not efficient, but this can be addressed by a commutativity property of the log-sum operation [27, 34]:

Lemma 2.1 (Commutativity of log-sum operations [27, 34]). Given functions $f^k(x) > 0$, one has
$$-\log \sum_{k=1}^{K} f^k(x) = \min_{\phi \in A} \Big\{ -\sum_{k=1}^{K} \phi^k(x) \log f^k(x) + \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x) \Big\},$$
where $\phi(x) = (\phi^1(x), \phi^2(x), \cdots, \phi^K(x))$ is a vector-valued function and $\phi \in A$, with
$$A = \Big\{ \phi \;\Big|\; \sum_{k=1}^{K} \phi^k(x) = 1,\ \phi^k(x) \in (0,1),\ \forall x \in \Omega \Big\}. \tag{1}$$

Let
$$H(\phi, \Theta) = -\int_{\Omega} \sum_{k=1}^{K} \log[\alpha^k p(I(x); \mu^k, \Sigma^k)]\, \phi^k(x)\, dx + \int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx, \quad \phi \in A, \tag{2}$$



then $H(\phi, \Theta)$ and $L(\Theta)$ have the same minimizer $\Theta^*$ with respect to the variable $\Theta$; see [27] for details.

It is easy to get $\phi^*$ and $\Theta^*$ from $H(\phi, \Theta)$ by the alternating scheme
$$\phi^{m+1} = \arg\min_{\phi \in A} H(\phi, \Theta^m), \qquad \Theta^{m+1} = \arg\min_{\Theta} H(\phi^{m+1}, \Theta).$$

The above iterations strictly correspond to the E-step and M-step of the EM algorithm, respectively.

By applying the first-order optimality condition and the Lagrange multiplier technique, we can easily get the solutions of the E-step and M-step as follows:

E-step: for $x \in \Omega$, $k = 1, 2, \cdots, K$,
$$(\phi^k)^{m+1}(x) = \frac{(\alpha^k)^m p(I(x); (\mu^k)^m, (\Sigma^k)^m)}{\sum_{l=1}^{K} (\alpha^l)^m p(I(x); (\mu^l)^m, (\Sigma^l)^m)}. \tag{3}$$

M-step: for $k = 1, 2, \cdots, K$,
$$(\alpha^k)^{m+1} = \frac{\sum_{x \in \Omega} (\phi^k)^{m+1}(x)}{|\Omega|}, \quad (\mu^k)^{m+1} = \frac{\sum_{x \in \Omega} (\phi^k)^{m+1}(x)\, I(x)}{\sum_{x \in \Omega} (\phi^k)^{m+1}(x)}, \quad (\Sigma^k)^{m+1} = \frac{\sum_{x \in \Omega} (\phi^k)^{m+1}(x)\, (I(x) - (\mu^k)^{m+1})^2}{\sum_{x \in \Omega} (\phi^k)^{m+1}(x)}. \tag{4}$$
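Both steps are closed-form, so one EM sweep amounts to a few array operations. Below is a minimal NumPy sketch; the function name `em_step` and the flattened-pixel layout are our own illustrative choices, not part of the original method description:

```python
import numpy as np

def em_step(I, alpha, mu, sigma2):
    """One EM iteration for a K-component GMM on a gray image: E-step (3), then M-step (4)."""
    I = I.ravel()                                   # treat Omega as a flat list of pixels
    K = len(alpha)
    # weighted Gaussian densities alpha^k * p(I(x); mu^k, Sigma^k), shape (|Omega|, K)
    p = np.stack([alpha[k] / np.sqrt(2 * np.pi * sigma2[k])
                  * np.exp(-(I - mu[k]) ** 2 / (2 * sigma2[k])) for k in range(K)], axis=1)
    phi = p / p.sum(axis=1, keepdims=True)          # E-step: posterior weights phi^k(x)
    w = phi.sum(axis=0)
    alpha_new = w / I.size                          # weight update
    mu_new = (phi * I[:, None]).sum(axis=0) / w     # mean update
    sigma2_new = (phi * (I[:, None] - mu_new) ** 2).sum(axis=0) / w  # variance update
    return phi, alpha_new, mu_new, sigma2_new
```

Iterating `em_step` to a fixed point reproduces the classical GMM-EM segmentation; the TV term enters only in Section 3.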

2.2. TV regularization. TV regularization was first introduced in the ROF model [33], which aims at noise removal.

In this paper, we use the rotation-invariant isotropic TV,
$$TV(\phi) = \int_{\Omega} \sqrt{\phi_{x_1}(x)^2 + \phi_{x_2}(x)^2}\, dx,$$
due to its better performance than the anisotropic TV [28], since it can provide more precise boundaries.
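On a discrete grid with the forward differences of Section 4, the isotropic TV of a single classification function can be evaluated directly. A small sketch (the helper name `tv_isotropic` is ours):

```python
import numpy as np

def tv_isotropic(phi):
    """Discrete isotropic TV: sum over pixels of sqrt(dx^2 + dy^2),
    using forward differences with zero (Neumann) boundary differences."""
    dx = np.zeros_like(phi)
    dy = np.zeros_like(phi)
    dx[:-1, :] = phi[1:, :] - phi[:-1, :]
    dy[:, :-1] = phi[:, 1:] - phi[:, :-1]
    return np.sqrt(dx ** 2 + dy ** 2).sum()
```

For a binary indicator of a half-plane, the value equals the length of the discrete boundary, illustrating the TV/curve-length connection mentioned in the introduction.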

2.3. EM-TV model. In [26], Liu et al. proposed the EM-TV model, which combines the GMM-EM method with TV regularization, and obtained good results.

The GMM-EM algorithm clusters pixels only by their intensities, without taking any spatial or geometric information into the segmentation. In [26], the authors added the TV regularization to the energy (2) to get the EM-TV model:

$$\begin{aligned} \min_{\phi \in A,\, \Theta} E(\phi, \Theta) &= \min_{\phi \in A,\, \Theta} H(\phi, \Theta) + \gamma \sum_{k=1}^{K} TV(\phi^k) \\ &= \min_{\phi \in A,\, \Theta} -\int_{\Omega} \sum_{k=1}^{K} \log[\alpha^k p(I(x); \mu^k, \Sigma^k)]\, \phi^k(x)\, dx + \int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx + \gamma \sum_{k=1}^{K} TV(\phi^k). \end{aligned} \tag{5}$$

Here γ is a regularization parameter.



3. The proposed method. In this section, we discuss how to solve (5) efficiently. An alternating scheme is applied to solve for $\phi$ and $\Theta$:
$$\phi^{m+1} = \arg\min_{\phi \in A} E(\phi, \Theta^m), \qquad \Theta^{m+1} = \arg\min_{\Theta} E(\phi^{m+1}, \Theta). \tag{6}$$

For the minimization with respect to $\phi$, we propose a dual method due to the presence of the TV norm. This dual algorithm is built upon [10]. Compared with the original dual algorithm [10], the EM-TV model contains segmentation constraints and a nonlinear entropy regularization, which make the problem more difficult. For brevity, we denote our method by Dual EM-TV.

3.1. Minimization of $\phi$ (Dual EM-TV). Fixing $\Theta^m$, the sub-problem for $\phi$ is
$$\min_{\phi \in A} E(\phi, \Theta^m) = \int_{\Omega} \sum_{k=1}^{K} f^k(x) \phi^k(x)\, dx + \int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx + \gamma \sum_{k=1}^{K} TV(\phi^k), \tag{7}$$
where $(f^k)^m(x) = -\log[(\alpha^k)^m p(I(x); (\mu^k)^m, (\Sigma^k)^m)]$. For brevity, in this section we omit the superscript $m$ of $(f^k)^m(x)$ and write it as $f^k(x)$.

Since $\phi$ is constrained, we use the Lagrange multiplier method and get the following unconstrained energy minimization problem:

$$\min_{\phi} E_L(\phi, \Theta^m) = \int_{\Omega} \sum_{k=1}^{K} f^k(x) \phi^k(x)\, dx + \int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx + \gamma \sum_{k=1}^{K} TV(\phi^k) + \int_{\Omega} \lambda(x) \Big( \sum_{k=1}^{K} \phi^k(x) - 1 \Big)\, dx. \tag{8}$$

It is easy to see that $E_L(\phi, \Theta^m)$ is convex with respect to $\phi$ when $\phi^k(x) > 0$, $\forall k \in \{1, 2, \cdots, K\}$, $x \in \Omega$. Moreover, this energy is separable, so we can deal with each $k$ separately as follows:

$$E_L(\phi, \Theta^m) = \sum_{k=1}^{K} \Big[ \int_{\Omega} f^k(x) \phi^k(x)\, dx + \int_{\Omega} \phi^k(x) \log \phi^k(x)\, dx + \gamma\, TV(\phi^k) + \int_{\Omega} \lambda(x) \phi^k(x)\, dx \Big] - \int_{\Omega} \lambda(x)\, dx.$$

We can see that the intersection of the domains of the individual terms is nonempty, which means that the subgradient of $E_L(\phi, \Theta^m)$ equals the sum of the subgradients of the terms in $E_L(\phi, \Theta^m)$. Now the necessary and sufficient condition for problem (8) is that its minimizer $\phi^* = ((\phi^*)^1, (\phi^*)^2, \cdots, (\phi^*)^K)$ satisfies
$$0 \in f^k + \log(\phi^*)^k + 1 + \lambda + \gamma\, \partial TV((\phi^*)^k). \tag{9}$$

Here $\partial TV(u)$ is the subdifferential of $TV$ at $u$, defined as
$$\partial TV(u) = \{ \omega \mid TV(v) \geq TV(u) + \langle \omega, v - u \rangle,\ \forall v \in A \},$$
where $\langle \cdot, \cdot \rangle$ denotes the corresponding $L^2$ inner product.

Then we have
$$\frac{-(f^k + \log(\phi^*)^k + 1 + \lambda)}{\gamma} \in \partial TV((\phi^*)^k),$$
and we get
$$(\phi^*)^k \in \partial TV^*\Big( \frac{-(f^k + \log(\phi^*)^k + 1 + \lambda)}{\gamma} \Big), \tag{10}$$



where $TV^*$ is the Legendre-Fenchel transform of $TV$, defined by
$$TV^*(v) = \sup_{u \in A} \langle u, v \rangle - TV(u).$$
It is well known that $TV^*$ is the "characteristic function" of a set $B$:
$$TV^*(v) = \begin{cases} 0, & \text{if } v \in B, \\ +\infty, & \text{otherwise}, \end{cases} \qquad B = \{ v = \mathrm{div}\,\eta \mid \eta = (\eta_1, \eta_2) \in C_c^1(\Omega, \mathbb{R}^2),\ \|\eta\|_\infty \leq 1 \}, \tag{11}$$
where $\|\eta\|_\infty = \max_{x \in \Omega} \sqrt{\eta_1(x)^2 + \eta_2(x)^2}$.

More details about the dual representation of the TV norm can be found in Chambolle's projection method [10], as well as in [18, 24].

Now, let
$$(g^*)^k(x) = \frac{-(f^k(x) + \log(\phi^*)^k(x) + 1 + \lambda(x))}{\gamma}, \quad x \in \Omega, \tag{12}$$
and we get the equivalent form
$$(\phi^*)^k(x) = \exp(-(f^k(x) + 1 + \lambda(x)) - \gamma (g^*)^k(x)), \quad x \in \Omega. \tag{13}$$

Substituting (12) and (13) into (10), we have
$$0 \in -\exp(-(f^k + 1 + \lambda) - \gamma (g^*)^k) + \partial TV^*((g^*)^k),$$

which indicates that $(g^*)^k$ is the minimizer of the energy
$$F(g^k) = \int_{\Omega} \frac{1}{\gamma} \exp(-(f^k(x) + 1 + \lambda(x)) - \gamma g^k(x))\, dx + TV^*(g^k). \tag{14}$$

By (11), if $g^k \notin B$ then $F(g^k) = +\infty$, so (14) can be written as the constrained minimization problem
$$\min_{g^k} F(g^k) = \min_{g^k \in B} \frac{1}{\gamma} \int_{\Omega} \exp(-(f^k(x) + 1 + \lambda(x)) - \gamma g^k(x))\, dx. \tag{15}$$

According to the definition of $B$, computing $(g^*)^k$ can be transferred to solving the following problem:
$$(\eta^*)^k = \arg\min_{\eta^k \in B} \frac{1}{\gamma} \int_{\Omega} \exp(-(f^k(x) + 1 + \lambda(x)) - \gamma\, \mathrm{div}\,\eta^k(x))\, dx, \tag{16}$$
in which $g^k$ and $\eta^k$ are related by $g^k = \mathrm{div}\,\eta^k$ and $B = \{ \eta^k \mid \|\eta^k\|_\infty \leq 1,\ \eta^k \in C_c^1(\Omega, \mathbb{R}^2) \}$. We can use the projection gradient method to solve (16), then one can get the

iteration scheme
$$(\eta^k)^{n+1} = \mathrm{Proj}_B\big( (\eta^k)^n - \Delta t\, \nabla\big( \exp(-(f^k(x) + 1 + \lambda(x)) - \gamma\, \mathrm{div}(\eta^k)^n(x)) \big) \big), \tag{17}$$
where the operator $\mathrm{Proj}_B(\xi)$ projects $\xi$ onto the convex set $B$, and $\Delta t$ is a proper time step.

As for the Lagrange multiplier, from (13) we can give the closed-form solution for $\lambda(x)$:
$$\lambda(x) = \ln\Big( \sum_{k=1}^{K} \exp(-f^k(x) - \gamma (g^*)^k(x)) \Big) - 1,$$



if $(g^*)^k(x)$ is known. Thus, we set the Lagrange multiplier as
$$\lambda^m(x) = \ln\Big( \sum_{k=1}^{K} \exp(-(f^k)^m(x) - \gamma\, \mathrm{div}(\eta^k)^m(x)) \Big) - 1, \tag{18}$$
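Combining (13) with (18), the $\lambda$ terms cancel and $\phi$ reduces pixel-wise to a softmax of $-(f^k + \gamma\, \mathrm{div}\,\eta^k)$. A NumPy sketch of this primal recovery (the max-subtraction stabilization and the name `primal_phi` are our own additions):

```python
import numpy as np

def primal_phi(f, gamma, div_eta):
    """Recover phi from (13) with the multiplier (18).
    f, div_eta: arrays of shape (K, M, N); returns phi of the same shape."""
    z = -(f + gamma * div_eta)
    z = z - z.max(axis=0, keepdims=True)     # stabilize exp() without changing the ratio
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)  # phi^k(x) > 0 and sums to 1 over k
```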

and fix $\lambda = \lambda^m(x)$ when solving for $\eta$. Putting $\lambda$ back into (17), the iteration for $(\eta^k)^n$ becomes
$$(\eta^k)^{n+1} = \mathrm{Proj}_B\Big( (\eta^k)^n - \Delta t\, \nabla\Big( \frac{(\alpha^k)^m p(I(x); (\mu^k)^m, (\Sigma^k)^m) \exp(-\gamma\, \mathrm{div}(\eta^k)^n(x))}{\sum_{i=1}^{K} (\alpha^i)^m p(I(x); (\mu^i)^m, (\Sigma^i)^m) \exp(-\gamma\, \mathrm{div}(\eta^i)^m(x))} \Big) \Big). \tag{19}$$

3.2. Minimization of $\Theta$. At the $m$-th iteration, we fix $(\phi^k)^{m+1}$. Then we have to minimize the following energy with respect to $\Theta$:
$$E(\phi^{m+1}, \Theta) = -\int_{\Omega} \sum_{k=1}^{K} \log[\alpha^k p(I(x); \mu^k, \Sigma^k)]\, (\phi^k)^{m+1}(x)\, dx. \tag{20}$$

This optimization problem is the same as in the classical EM model, so $\Theta$ can be updated by (4).

4. Implementation. Here we introduce the discrete scheme of our algorithm. First, let $X \subset \mathbb{R}^2$ be an $M \times N$ rectangular grid, and let $F_1 = \{ f : X \to \mathbb{R} \}$, $F_2 = \{ \eta : X \to \mathbb{R}^2 \}$ be two function spaces defined on $X$. For all $f \in F_1$ and $\eta = (\eta^1, \eta^2) \in F_2$, $f_{i,j}$ and $\eta_{i,j} = (\eta^1_{i,j}, \eta^2_{i,j})$ stand for $f(x)$ and $\eta = (\eta^1, \eta^2)$ at location $(i,j)$, respectively, with $i = 1, 2, \cdots, M$ and $j = 1, 2, \cdots, N$. The discrete gradient operator $\nabla_d : F_1 \to F_2$ with Neumann boundary condition can be defined as
$$(\nabla_d f)^1_{i,j} = \begin{cases} f_{i+1,j} - f_{i,j}, & \text{if } i < M, \\ 0, & \text{if } i = M, \end{cases} \qquad (\nabla_d f)^2_{i,j} = \begin{cases} f_{i,j+1} - f_{i,j}, & \text{if } j < N, \\ 0, & \text{if } j = N. \end{cases}$$

It is not difficult to calculate that the discrete adjoint operator $\mathrm{div}_d : F_2 \to F_1$ has the formulation
$$(\mathrm{div}_d\, \eta)_{i,j} = \begin{cases} \eta^1_{i,j} - \eta^1_{i-1,j}, & \text{if } 1 < i < M, \\ \eta^1_{1,j}, & \text{if } i = 1, \\ -\eta^1_{M-1,j}, & \text{if } i = M, \end{cases} \;+\; \begin{cases} \eta^2_{i,j} - \eta^2_{i,j-1}, & \text{if } 1 < j < N, \\ \eta^2_{i,1}, & \text{if } j = 1, \\ -\eta^2_{i,N-1}, & \text{if } j = N. \end{cases}$$
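The two stencils above take only a few lines of NumPy (0-indexed; `grad_d` and `div_d` are our own names). The defining adjoint relation $\langle \nabla_d f, \eta \rangle = -\langle f, \mathrm{div}_d\, \eta \rangle$ can then be verified numerically:

```python
import numpy as np

def grad_d(f):
    """Discrete gradient: forward differences, Neumann boundary (zero at last row/column)."""
    g = np.zeros(f.shape + (2,))
    g[:-1, :, 0] = f[1:, :] - f[:-1, :]
    g[:, :-1, 1] = f[:, 1:] - f[:, :-1]
    return g

def div_d(eta):
    """Discrete divergence, the negative adjoint of grad_d."""
    d = np.zeros(eta.shape[:2])
    d[0, :] += eta[0, :, 0]
    d[1:-1, :] += eta[1:-1, :, 0] - eta[:-2, :, 0]
    d[-1, :] += -eta[-2, :, 0]
    d[:, 0] += eta[:, 0, 1]
    d[:, 1:-1] += eta[:, 1:-1, 1] - eta[:, :-2, 1]
    d[:, -1] += -eta[:, -2, 1]
    return d
```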

The projection operator $\mathrm{Proj}_{B_d} : F_2 \to B_d$ has the discrete formulation
$$(\mathrm{Proj}_{B_d}(\eta))_{i,j} = \begin{cases} (\eta^1_{i,j}, \eta^2_{i,j}), & \text{if } (\eta^1_{i,j})^2 + (\eta^2_{i,j})^2 \leq 1, \\[4pt] \Big( \dfrac{\eta^1_{i,j}}{\sqrt{(\eta^1_{i,j})^2 + (\eta^2_{i,j})^2}},\ \dfrac{\eta^2_{i,j}}{\sqrt{(\eta^1_{i,j})^2 + (\eta^2_{i,j})^2}} \Big), & \text{if } (\eta^1_{i,j})^2 + (\eta^2_{i,j})^2 > 1. \end{cases}$$
Here $B_d = \{ \eta \in F_2 \mid \|\eta\|_\infty \leq 1 \}$ is the discrete counterpart of $B$.

The corresponding discrete scheme of the main iteration becomes
$$(\eta^k)^{n+1} = \mathrm{Proj}_{B_d}\Big( (\eta^k)^n - \Delta t\, \nabla_d\Big( \frac{(\alpha^k)^m p(I(x); (\mu^k)^m, (\Sigma^k)^m) \exp(-\gamma\, \mathrm{div}_d(\eta^k)^n(x))}{\sum_{i=1}^{K} (\alpha^i)^m p(I(x); (\mu^i)^m, (\Sigma^i)^m) \exp(-\gamma\, \mathrm{div}_d(\eta^i)^m(x))} \Big) \Big). \tag{21}$$
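One inner iteration of (21) chains $\mathrm{div}_d$, the pixel-wise ratio, $\nabla_d$ and the projection. A sketch under our own naming: `w[k]` holds the fixed weights $(\alpha^k)^m p(I; (\mu^k)^m, (\Sigma^k)^m)$, and the denominator is frozen at the outer iterate $\eta^m$, as in (18):

```python
import numpy as np

def grad_d(f):                      # forward differences, Neumann boundary
    g = np.zeros(f.shape + (2,))
    g[:-1, :, 0] = f[1:, :] - f[:-1, :]
    g[:, :-1, 1] = f[:, 1:] - f[:, :-1]
    return g

def div_d(eta):                     # negative adjoint of grad_d
    d = np.zeros(eta.shape[:2])
    d[0, :] += eta[0, :, 0]; d[1:-1, :] += eta[1:-1, :, 0] - eta[:-2, :, 0]; d[-1, :] += -eta[-2, :, 0]
    d[:, 0] += eta[:, 0, 1]; d[:, 1:-1] += eta[:, 1:-1, 1] - eta[:, :-2, 1]; d[:, -1] += -eta[:, -2, 1]
    return d

def proj_B(eta):
    """Pointwise projection onto ||eta_{i,j}|| <= 1 (the operator Proj_{B_d})."""
    norm = np.sqrt((eta ** 2).sum(axis=-1, keepdims=True))
    return eta / np.maximum(norm, 1.0)

def dual_step(eta, w, denom, gamma, dt):
    """One iteration of (21); denom = sum_i w[i]*exp(-gamma*div_d(eta_m[i])) stays fixed
    during the inner loop. eta has shape (K, M, N, 2), w and denom are pixel arrays."""
    out = np.empty_like(eta)
    for k in range(eta.shape[0]):
        q = w[k] * np.exp(-gamma * div_d(eta[k])) / denom   # the ratio inside (21)
        out[k] = proj_B(eta[k] - dt * grad_d(q))
    return out
```

Running `dual_step` to a fixed point and then recovering $\phi$ via (13) and (18) realizes the dual and primal steps of Algorithm 1.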



Remark 1. In practice, we may slightly modify the second term $\int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx$ to $\delta \int_{\Omega} \sum_{k=1}^{K} \phi^k(x) \log \phi^k(x)\, dx$. This is actually equivalent to the continuous smoothing method in [2, 13]. Then our iteration for $\eta$ becomes
$$(\eta^k)^{n+1} = \mathrm{Proj}_{B_d}\Big( (\eta^k)^n - \Delta t\, \nabla_d\Big( \frac{((\alpha^k)^m p(I(x); (\mu^k)^m, (\Sigma^k)^m))^{1/\delta} \exp(-\gamma\, \mathrm{div}_d(\eta^k)^n(x)/\delta)}{\sum_{i=1}^{K} ((\alpha^i)^m p(I(x); (\mu^i)^m, (\Sigma^i)^m))^{1/\delta} \exp(-\gamma\, \mathrm{div}_d(\eta^i)^m(x)/\delta)} \Big) \Big). \tag{22}$$

4.1. Summary of Algorithm. We can now summarize our algorithm as Algorithm 1:

Algorithm 1 Dual EM-TV Algorithm
Set $\phi^0 = (1, 0, \cdots, 0)$ and initialize $\Theta^0$ by several EM iterations (3)(4). Let $m = 0$.
1. Dual step: get $\eta^{m+1}$ by (21) until convergence, i.e. $\eta^{m+1} = \lim_{n \to \infty} \eta^{n+1}$.
2. Primal step: get the smooth segmentation $\phi^{m+1}$ by (13).
3. Convergence check: if $\|\phi^{m+1} - \phi^m\| / \|\phi^m\| > \varepsilon$, go to the next step; otherwise, stop.
4. Update the parameters $\Theta^{m+1}$ using (4).
5. Let $m = m + 1$ and return to step 1.

4.2. Convergence analysis. For the $\eta$ sub-problem, we have the following convergence result.

Proposition 1. Assume $f$ is bounded by $M$, i.e. $|f_{ij}| \leq M$. Then for all $0 < \Delta t < \frac{K}{4\gamma} \exp(-(2M + 8\gamma))$, the sequence generated by (21) converges to some $\eta^* \in G$, provided the fixed-point set $G$ of the iteration scheme (21) is nonempty. Here $K$ is the number of classes and $\gamma$ is the regularization parameter.

We put the proof of Proposition 1 in the appendix.

5. Experiment results. In this section, we demonstrate the segmentation results. The computing platform is a laptop equipped with an Intel Core i7-8550U CPU @ 1.80 GHz, running Matlab R2010b. We take the HMRF-EM method [36, 41] for comparison. In the HMRF method, the Gibbs prior is applied as a regularization, where the Gibbs distribution is given by
$$P(x) = Z^{-1} \exp(-U(x)), \tag{23}$$
where $Z$ is a normalization constant. The prior energy function is set to $U(x) = \gamma \sum_{y \in N_x} (1 - \mathrm{Class}(x, y))$, where $N_x$ denotes the neighborhood of the point $x$, and
$$\mathrm{Class}(x, y) = \begin{cases} 0, & \mathrm{label}(x) \neq \mathrm{label}(y), \\ 1, & \mathrm{label}(x) = \mathrm{label}(y). \end{cases} \tag{24}$$
In our numerical experiments, the size of $N_x$ in the regularization term of HMRF-EM is chosen as $|N_x| = 4, 8, 12, 20$ neighborhoods, respectively; for short, we call these HMRF-EM 4n, HMRF-EM 8n, HMRF-EM 12n and HMRF-EM 20n. The code for running HMRF-EM was obtained from https://se.mathworks.com/matlabcentral/fileexchange/37530-hmrf-em-image in [36].



5.1. Parameter selection. A possible way to speed up the algorithm is to use the Armijo rule [42] to choose the step size $\Delta t$ in (21) by line search. One can set the theoretical upper bound of $\Delta t$ from Proposition 1 as the Armijo rule's lower bound ($\alpha_{min}$ in [42]). One can also set the initial $\Delta t$ to some proper constant if fast convergence is preferred. In this paper, we do not further study the optimal step size and simply use constants as the step size.

There are two parameters, $\delta$ and $\gamma$, in our model. $\delta$ controls the smoothness of the classification function. In the experiments, $\delta$ is not very sensitive, so it can be chosen in a wide range. Generally speaking, the heavier the noise, the larger $\delta$ should be. In our experiments, we choose $\delta$ from 1 to 1000. $\gamma$ is the regularization parameter; it affects the smoothness of the boundary.

The initialization of parameters for the Dual EM-TV method is as follows: $\alpha, \mu, \Sigma$ are obtained from the K-means method, and $\eta^k = 0$, $k = 1, 2, \cdots, K$. $\gamma$ and $\delta$ are set to different values for each image: for the heart image, $\delta = 20$, $\gamma = 18$; for the 4-color image, $\delta = 20$, $\gamma = 12$. The time step for the Dual EM-TV method is $\Delta t = 0.5$ for the heart image and $\Delta t = 1$ for the 4-color image.

The initialization of parameters for the HMRF-EM method is as follows: $\alpha, \mu, \Sigma$ are obtained from the K-means method as well. For the 4-neighborhood, $\gamma = 320$; for the 8-neighborhood, $\gamma = 160$; for the 20-neighborhood, $\gamma = 40$.

5.2. Comparison of Dual EM-TV, Original EM-TV [26] and the HMRF-EM model. Here, we compare the segmentation results of these three methods on two synthetic images. The two gray images are both normalized to [0, 1]. We add additive white Gaussian noise with standard deviations 150/255 and 50/255 to the heart image (size 615 × 615) and the 4-color image (size 600 × 600), respectively.

To compare the segmentation results, we use the segmentation accuracy index (SAI),
$$SAI = \frac{N_c}{N_t},$$
to measure the classification quality. Here $N_c$ is the number of correctly labeled points, while $N_t$ is the total number of points.
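On labeled arrays the index is one line; a sketch with our own helper name `sai`:

```python
import numpy as np

def sai(labels, ground_truth):
    """Segmentation accuracy index N_c / N_t: the fraction of correctly labeled points."""
    return float((np.asarray(labels) == np.asarray(ground_truth)).mean())
```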

The CPU time and accuracy results are listed in Table 1 and Table 2. Comparisons of the segmentation results are displayed in Figures 1, 2, 3 and 4. In Figures 1 and 3, the first row contains the original image, the noisy image and the energy evolution of the Dual EM-TV method, while the second and third rows show the ground truth and the segmentation results of Dual EM-TV, Original EM-TV [26], HMRF-EM 4n, HMRF-EM 8n and HMRF-EM 20n, respectively. In Figures 2 and 4, enlarged images of the corresponding red square areas in Figures 1 and 3 are shown.

One can see that our algorithm is faster than the HMRF-EM model and provides better segmentation results. Compared with the original EM-TV method, the proposed method produces slightly better results and costs much less CPU time.



Table 1. Comparison of 2-class segmentation for Dual EM-TV, Original EM-TV [26] and the HMRF-EM method (heart image, size 615 × 615).

Methods             | CPU time (seconds) | Accuracy (SAI)
Dual EM-TV          | 1.233              | 99.70%
Original EM-TV [26] | 9.482              | 98.02%
HMRF-EM 4n          | 4.052              | 91.84%
HMRF-EM 8n          | 3.052              | 97.38%
HMRF-EM 20n         | 1.528              | 99.24%

Table 2. Comparison of 4-class segmentation for Dual EM-TV, Original EM-TV [26] and the HMRF-EM methods (4-color image, size 600 × 600).

Methods             | CPU time (seconds) | Accuracy (SAI)
Dual EM-TV          | 1.699              | 99.90%
Original EM-TV [26] | 18.502             | 99.02%
HMRF-EM 4n          | 3.480              | 97.39%
HMRF-EM 8n          | 3.202              | 99.81%
HMRF-EM 20n         | 2.388              | 99.83%

5.3. Comparison of Dual EM-TV and the CV method. In this subsection, we show an example in which our model segments an image with two regions that have the same means but different variances.

We first synthesize two regions with the same mean but different variances, shown in Figure 5(a). The size of the image is 100 × 100; the mean of the 70 × 70 inside region is 128 and its variance is 1000, while the mean and variance of the outside region are set to 128 and 10, respectively. Due to the normalization, the middle of the histogram is slightly apart from 128. The histograms in Figure 5(b-d) clearly show the difference in variance. Please see Figure 5 for details.

Then we apply the GMM-EM method to Figure 5(a) and get the segmentation result in Figure 6(a). We then use the parameters obtained from this GMM-EM method and apply our Dual EM-TV method, with the segmentation result shown in Figure 6(b). Figure 6(c) is the segmentation result of the CV model. One can see that the GMM-EM based methods successfully segment the target image while CV cannot.

5.4. Natural image segmentation. Here, we show more comparisons of the Dual EM-TV and HMRF-EM methods in Figure 7. We add Gaussian noise with mean 0 and standard deviation 50/255. We can see that our algorithm achieves a very good segmentation. In Figure 7, the first and second columns contain the original and noisy images. The segmentation results provided by Dual EM-TV, HMRF-EM 4n, HMRF-EM 8n and HMRF-EM 12n are displayed in the right four columns, respectively.

We list the parameters used in this experiment: row 1 (the cameraman), Δt = 1, δ = 1000, γ = 400; row 2 (synthetic image objects), Δt = 1, δ = 1000, γ = 400; row 3 (synthetic image 2), Δt = 1, δ = 1000, γ = 480; row 4 (starfish), Δt = 0.5, δ = 40, γ = 39.

Figure 1. Comparison of Dual EM-TV, Original EM-TV [26] and HMRF-EM methods for 2-class segmentation (heart image): (a) clean image, (b) noisy image, (c) energy change of Dual EM-TV, (d) ground truth, (e) Dual EM-TV result, (f) Original EM-TV result, (g) HMRF-EM 4n result, (h) HMRF-EM 8n result, (i) HMRF-EM 20n result.

5.5. Color Images. It is easy to extend the GMM based model to color images. We take some real-life images from the databases [3, 29] to test our algorithm. In Figures 8 and 9, we apply our model to two-region and multi-region images, respectively. In these figures, the original images are shown in the left column and the segmentation results are displayed in the right column. These results show that our algorithm can produce good segmentation results.

Figure 2. Enlarged red square regions in Figure 1: (a) ground truth, (b) Dual EM-TV result, (c) Original EM-TV result, (d) HMRF-EM 4n result, (e) HMRF-EM 8n result, (f) HMRF-EM 20n result.

The parameters we used are as follows. In Figure 8: row 1 (bear image), Δt = 0.5, γ = δ = 10; row 2 (eagle image), Δt = 0.5, γ = 7, δ = 10; row 3 (horse image), Δt = 0.1, γ = 50, δ = 10; row 4 (desert image), Δt = 0.5, γ = δ = 10. In Figure 9: row 1 (moon image), Δt = 0.5, γ = 2, δ = 10; row 2 (stone image), Δt = 0.005, γ = 80, δ = 1; row 3 (temple image), Δt = 0.005, γ = 80, δ = 1.

Remark 2. The color images are all segmented in both RGB and HSI formats, and we choose the better result here.

5.6. Under different levels of noise. Here, we discuss how the noise level affects our model and the segmentation results. Figure 10 shows two-class segmentation results; the noise is additive white Gaussian noise with standard deviations of 120/255, 240/255, 300/255 and 390/255, respectively. In Figure 11, we add additive white Gaussian noise with standard deviations of 50/255, 60/255, 70/255 and 80/255, respectively, and test the outcomes. As one can see, we obtain better results than the HMRF-EM model under different levels of noise.

Parameters are as follows. For the heart image with noise levels 120/255, 240/255 and 300/255, Δt = 0.1, γ = 60, δ = 20; for noise level 390/255, Δt = 0.1, γ = 80, δ = 20. For the 4-color image with noise level 50/255, Δt = 0.1, γ = 12, δ = 20; for noise level 60/255, Δt = 0.01, γ = 12, δ = 2; for noise level 70/255, Δt = 0.01, γ = 18, δ = 2; for noise level 80/255, Δt = 0.01, γ = 18, δ = 2.

5.7. Applied to 3-D MR Images. In Figure 12, we use the proposed methodinstead of the segmentation method in [4] on 3-D MR images, based on SPM8(http://www.fil.ion.ucl.ac.uk/spm/software/spm8/), which is a MATLAB tool-box for MRI analysis. To compare, we based on the newsegment module in [4] totest our method, and apply to the MR images from BrainWeb(http://brainweb.bic.mni.mcgill.ca/brainweb/). The image dimension is 181×


[Figure 3 panels: (a) clean image; (b) image with noise; (c) energy decay of Dual EM-TV (energy ×10^6 versus total iteration); (d) ground truth; (e) Dual EM-TV result; (f) original EM-TV result; (g) HMRF-EM 4n result; (h) HMRF-EM 8n result; (i) HMRF-EM 20n result.]


Figure 3. Comparison of Dual EM-TV, original EM-TV [26] and HMRF-EM methods for 4-class segmentation.

217×181, with voxels of 1×1×1 mm. We test the data with intensity non-uniformity 40% RF (a parameter specifying the intensity non-uniformity level; a larger value corresponds to a higher level) and noise level 9%.

The parameters are as follows: ∆t = 0.0001, γ = δ = 1. We use the Dice metric (DM) to compare the segmentation accuracy:

DM = 2N_ag / (N_a + N_g),

where N_ag denotes the number of voxels correctly assigned to the k-th class by both the ground truth and the segmentation algorithm, and N_a and N_g are the numbers of voxels assigned to the k-th class by the algorithm and the ground truth, respectively.
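The Dice metric is straightforward to compute from label maps. The sketch below (the function name `dice_metric` is ours, not from the paper) illustrates the formula above:

```python
import numpy as np

def dice_metric(seg, gt, k):
    """Dice metric for class k: DM = 2*N_ag / (N_a + N_g), where N_ag is
    the number of voxels labeled k by both the algorithm output `seg`
    and the ground truth `gt`."""
    a = (seg == k)
    g = (gt == k)
    n_ag = np.logical_and(a, g).sum()
    return 2.0 * n_ag / (a.sum() + g.sum())

# Toy example on a 1-D label array with two classes.
seg = np.array([0, 0, 1, 1, 1, 0])
gt  = np.array([0, 1, 1, 1, 0, 0])
print(dice_metric(seg, gt, 1))  # 2*2/(3+3) ≈ 0.667
```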


[Figure 4 panels (zoomed-in views): (a) ground truth; (b) Dual EM-TV result; (c) original EM-TV result; (d) HMRF-EM 4n result; (e) HMRF-EM 8n result; (f) HMRF-EM 20n result.]

The results show that our algorithm performs well on these segmentation tasks. Parameters are as follows. Bear image: ∆t = 0.5, γ = δ = 10; eagle image: ∆t = 0.5, γ = 7, δ = 10; horse image: ∆t = 0.1, γ = 50, δ = 10; desert image: ∆t = 0.5, γ = δ = 10.

Figure 4. Enlarged red square regions in Figure 3

We compare our results with the newsegment module in SPM8; the DM values are listed in Table 3, where WM denotes white matter and GM denotes gray matter. As the table shows, our method is slightly better.

Table 3. Comparison of DM in SPM8 on brain images

noise level    5%      5%      7%      7%      9%      9%
brain part     WM      GM      WM      GM      WM      GM
Dual EM-TV     0.9380  0.9122  0.9218  0.8970  0.9035  0.8818
New Segment    0.9317  0.9099  0.9035  0.8822  0.8759  0.8536

6. Discussion and Conclusion. In this paper, we proposed a dual algorithm for solving the EM-TV model. By combining the dual, projection gradient and Lagrange multiplier methods, we obtain an elegant dual EM algorithm which is fast and stable. Compared with the original EM-TV model, the optimization problem is smooth and easier to solve. Compared with the HMRF-EM model, our algorithm is faster and has better segmentation performance, since the isotropic TV cannot be handled by HMRF-type algorithms. The experimental results show that our algorithm produces good segmentation results under high levels of noise, on both color and gray-scale images. The proposed method can also be applied within SPM8 with good results. Our method can


[Figure 5 panels: (a) image; (b) image histogram; (c) inside histogram; (d) outside histogram.]

Figure 5. An image with two regions that have the same mean but different variances, and its histograms.

[Figure 6 panels: (a) GMM-EM; (b) Dual EM-TV; (c) CV model. Figure 7 panels: (a) original image; (b) noisy image; (c) Dual EM-TV; (d) HMRF-EM 4n; (e) HMRF-EM 8n; (f) HMRF-EM 12n.]

Figure 6. Comparison of segmentation results of GMM-EM, Dual EM-TV and the CV model on an image whose regions have the same mean but different variances.


Figure 7. Comparison between the Dual EM-TV and HMRF-EM methods.

be extended to non-parametric segmentation mixture models, such as kernel methods, which are parameter free. We leave this as future research.

Acknowledgments. Jun Liu and Haiyang Huang were partially supported by the National Key Research and Development Program of China (2017YFA0604903). Jun Liu was also supported by the National Natural Science Foundation of China (No. 11871035). Shi Yan was partially supported by the China Scholarship Council. The research of Tai was supported by HKBU startup grant RG(R)-RC/17-18/02-MATH and FRG2/17-18/033.

7. Appendix A. To keep this paper self-contained, we give the proof of Proposition 1.

Let T : F1 → F2 be an operator. Introduce the notations

T1(η) = ProjBd(η),
T2(η) = (I − ∆t T20)(η),
T20(η) = ∇d(exp(−(fk + 1 + λm) − γ divd η)).

Definition 7.1 (Lipschitz continuous). T is L-Lipschitz continuous, L ≥ 0, if

||Tx − Ty|| ≤ L||x − y||, ∀ x, y ∈ F1.

Definition 7.2 (Non-expansive operator). T is a non-expansive operator if T is 1-Lipschitz continuous, i.e.,

||T(x) − T(y)|| ≤ ||x − y||, ∀ x, y ∈ F1.


[Figure 8 panels: (a) original image; (b) Dual EM-TV result.]

Figure 8. Segmentation results by Dual EM-TV on two-region color images.

Definition 7.3 (α-average non-expansive operator). T is an α-average non-expansive operator if there exist a non-expansive operator R and α ∈ (0, 1) such that

T = (1 − α)I + αR,

where I is the identity operator, i.e., I(x) = x for all x. An equivalent definition: T is α-average non-expansive if and only if

||T(x) − T(y)||² ≤ ||x − y||² − ((1 − α)/α) ||(I − T)(x) − (I − T)(y)||², ∀ x, y ∈ F1.


[Figure 9 panels: (a) original image; (b) Dual EM-TV result.]

Figure 9. Segmentation results by Dual EM-TV on multi-region color images.

Definition 7.4 (Firmly non-expansive operator). T is firmly non-expansive if and only if T is a 1/2-average non-expansive operator. An equivalent definition of firmly non-expansive is that the following inequality holds:

||T(x) − T(y)||² ≤ ⟨T(x) − T(y), x − y⟩.

In our paper, the spaces F1, F2 are often taken as R or Rn. Next, we give some properties of the operators T1, T2 and T20 appearing in our algorithm.

Proposition 2 ([15]). T1 is firmly non-expansive, i.e., T1 is a 1/2-average non-expansive operator.

Proof. T1 is a projection operator, which is a special proximal operator, and thus it is firmly non-expansive (for more details see [15]).
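As an illustration of Proposition 2, the firmly non-expansive inequality of Definition 7.4 can be checked numerically for a pointwise projection of a dual field onto the unit ball, the kind of projection used in Chambolle-type dual TV algorithms. This is a sketch only: the exact ball Bd in the paper may differ from the pointwise Euclidean ball assumed here.

```python
import numpy as np

def proj_ball(eta):
    """Pointwise projection of a dual field eta (shape (2, m, n)) onto
    the unit ball ||eta_{ij}|| <= 1 at every pixel (i, j)."""
    norm = np.maximum(1.0, np.sqrt((eta ** 2).sum(axis=0)))
    return eta / norm

# Numerical check of firm non-expansiveness:
# ||T(x) - T(y)||^2 <= <T(x) - T(y), x - y>.
rng = np.random.default_rng(1)
x = 3.0 * rng.normal(size=(2, 8, 8))
y = 3.0 * rng.normal(size=(2, 8, 8))
tx, ty = proj_ball(x), proj_ball(y)
lhs = ((tx - ty) ** 2).sum()
rhs = ((tx - ty) * (x - y)).sum()
assert lhs <= rhs + 1e-12  # holds for any projection onto a convex set
```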

Proposition 3. If f is bounded by M, then T20 is β-Lipschitz continuous with β = (8γ/K) exp(2M + 8γ).


[Figure 10: rows A–D show increasing noise levels; columns: (a) noisy image; (b) Dual EM-TV; (c) HMRF-EM 4n; (d) HMRF-EM 8n; (e) HMRF-EM 12n; (f) HMRF-EM 20n.]

Figure 10. Segmentation comparison under different noise levels (mean 0). Noise standard deviation: A: 120/255, B: 240/255, C: 300/255, D: 390/255.

To prove Proposition 3, we need the following lemmas. Throughout, K is the number of classes and γ is the regularization parameter.

Lemma 7.5. Under the assumption of Proposition 3, exp(−(fk + 1 + λm)) is bounded in

[exp(−2M − 4γ)/K, exp(2M + 4γ)/K].

Proof.

exp(−(fk(x) + λm(x) + 1)) = exp(−fk(x)) / Σ_{i=1}^{K} exp(−fi(x) − γ divd(ηi)m(x)).

Since (ηk)m ∈ [−1, 1], we have divd(ηk)m ∈ [−4, 4], due to the definition of divd. Then we know that

exp(−(fk(x) + λm(x) + 1)) ∈ [exp(−2M − 4γ)/K, exp(2M + 4γ)/K].

Here we denote K0 = exp(2M + 4γ)/K.

Lemma 7.6. ∇d is a K1-Lipschitz linear operator, where K1 = √8.

Proof. It is easy to check that ∇d is linear. Moreover,

||∇d g||² = Σ_{i,j} [(g_{i+1,j} − g_{i,j})² + (g_{i,j+1} − g_{i,j})²]
         ≤ Σ_{i,j} [2(g_{i+1,j})² + 2(g_{i,j})² + 2(g_{i,j+1})² + 2(g_{i,j})²]
         = 8||g||²,


[Figure 11: rows A–D show increasing noise levels; columns: (a) noisy image; (b) Dual EM-TV; (c) HMRF-EM 4n; (d) HMRF-EM 8n; (e) HMRF-EM 12n; (f) HMRF-EM 20n.]

Figure 11. Segmentation comparison of the same multi-region image under different noise levels (mean 0). Noise standard deviation: A: 50/255, B: 60/255, C: 70/255, D: 80/255.

which means ||∇d g|| ≤ K1||g|| with K1 = √8.
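Lemma 7.6 is easy to verify numerically. The sketch below uses a forward-difference ∇d with Neumann boundary conditions (zero difference at the last row/column); this is one common convention, and the paper's exact discretization may differ:

```python
import numpy as np

def grad_d(g):
    """Forward-difference discrete gradient of a 2-D array with Neumann
    boundary handling; returns an array of shape (2, m, n)."""
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:-1, :] = g[1:, :] - g[:-1, :]   # vertical differences
    gy[:, :-1] = g[:, 1:] - g[:, :-1]   # horizontal differences
    return np.stack([gx, gy])

# Empirical check of Lemma 7.6: ||grad_d(g)||^2 <= 8 ||g||^2.
rng = np.random.default_rng(0)
for _ in range(100):
    g = rng.normal(size=(16, 16))
    assert (grad_d(g) ** 2).sum() <= 8.0 * (g ** 2).sum()
```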

Lemma 7.7. exp(z) satisfies a K2-Lipschitz condition for z ∈ [−4γ, 4γ], where K2 = exp(4γ).

Proof. Since exp(z) acts pointwise at each (i, j), we can consider each point as in the one-dimensional case. By the mean value theorem,

|exp(z) − exp(y)| = exp(ξ)|z − y|,

where ξ lies between z and y. Since exp(z) is monotone,

|exp(z) − exp(y)| ≤ exp(max{z, y})|z − y|.

In our algorithm, z = −γ divd ηn, so z ∈ [−4γ, 4γ]; let K2 = exp(4γ). Then for all g1, g2 ∈ F1,

||exp(g1) − exp(g2)|| ≤ K2||g1 − g2||.

Lemma 7.8. divd is a K3-Lipschitz linear operator, where K3 = √8.

Proof. As for ∇d, we can get ||divd η|| ≤ K3||η||, where K3 = √8.

Now we give the proof of Proposition 3.



Inverse Problems and Imaging Volume 00, No. 0 (2018),

Figure 12. Results of SPM8 on one slice of the MR images. First row: left, MRI with 0% RF and 0% noise; right, MRI with 40% RF and 9% noise. Second and third rows: the first and third columns are the newsegment method for the left and right images, respectively; the second and fourth columns are the proposed method for the left and right images, respectively.

Proof. By Lemmas 7.5–7.8, we have

||T20(η) − T20(θ)||
= ||∇d(exp(−(fk + 1 + λm) − γ divd η)) − ∇d(exp(−(fk + 1 + λm) − γ divd θ))||
= ||∇d(exp(−(fk + 1 + λm))(exp(−γ divd η) − exp(−γ divd θ)))||
≤ ||∇d|| ||exp(−(fk + 1 + λm))(exp(−γ divd η) − exp(−γ divd θ))||
≤ K1 K0 ||exp(−γ divd η) − exp(−γ divd θ)||
≤ K1 K0 K2 ||(−γ divd η) − (−γ divd θ)||
= γ K1 K0 K2 ||divd η − divd θ||
≤ γ K1 K0 K2 K3 ||η − θ||.

Let

β = γ K1 K0 K2 K3 = (8γ/K) exp(2M + 8γ);

this completes the proof.

Proposition 4. T2 is an α2-average non-expansive operator if

0 < ∆t < (K/(4γ)) exp(−(2M + 8γ)).

Proof. According to Proposition 3, T20 is β-Lipschitz. Moreover, T20 is of the form ∇H with H a convex function with respect to η, so (1/β)T20 is firmly non-expansive by [6]; hence there exists a non-expansive operator R such that (1/β)T20 = (1/2)(I + R). Back to T2, we have T2 = I − ∆t T20 = (1 − ∆tβ/2)I + (∆tβ/2)(−R) (note that if R is non-expansive, −R is non-expansive too). Now, if (1 − ∆tβ/2) ∈ (0, 1), then T2 is α2-average non-expansive, with α2 = ∆tβ/2. The condition (1 − ∆tβ/2) ∈ (0, 1) means 0 < ∆tβ/2 < 1, i.e., 0 < ∆t < (K/(4γ)) exp(−(2M + 8γ)).
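For concreteness, the step-size bound of Proposition 4 can be evaluated directly. The parameter values in this sketch are illustrative only, not from the experiments; as is common for such worst-case constants, the theoretical bound is far more conservative than the ∆t values used in practice:

```python
import math

def max_step(K, gamma, M):
    """Upper bound on the time step from Proposition 4:
    0 < dt < (K / (4*gamma)) * exp(-(2*M + 8*gamma))."""
    return K / (4.0 * gamma) * math.exp(-(2.0 * M + 8.0 * gamma))

# Illustrative values: K = 4 classes, gamma = 0.5, intensities bounded by M = 1.
print(max_step(4, 0.5, 1.0))  # ≈ 0.00496
```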

Proposition 5 ([14]). T1 ∘ T2 is α-average non-expansive, with α = 2/(1 + 1/max{1/2, α2}).

Proof. This proposition is the same as Lemma 2.2(iii) in [14], but the proof is slightly different.

Let α1 = 1/2. According to Propositions 2 and 4, T1 and T2 are α1- and α2-average non-expansive respectively, and we have

||T1(η) − T1(δ)||² ≤ ||η − δ||² − ((1 − α1)/α1)||(I − T1)(η) − (I − T1)(δ)||²,
||T2(η) − T2(δ)||² ≤ ||η − δ||² − ((1 − α2)/α2)||(I − T2)(η) − (I − T2)(δ)||².

Let α0 = max{α1, α2}; then

−(1 − α0)/α0 ≥ −(1 − α2)/α2,   −(1 − α0)/α0 ≥ −(1 − α1)/α1.

Now we go back to T1 ∘ T2:

||T1 ∘ T2(η) − T1 ∘ T2(δ)||²
= ||T1(T2(η)) − T1(T2(δ))||²
≤ ||T2(η) − T2(δ)||² − ((1 − α1)/α1)||(I − T1)(T2(η)) − (I − T1)(T2(δ))||²
≤ ||η − δ||² − ((1 − α2)/α2)||(I − T2)(η) − (I − T2)(δ)||²
  − ((1 − α1)/α1)||(I − T1)(T2(η)) − (I − T1)(T2(δ))||²
≤ ||η − δ||² − ((1 − α0)/α0)(||(I − T2)(η) − (I − T2)(δ)||² + ||(I − T1)(T2(η)) − (I − T1)(T2(δ))||²)
≤ ||η − δ||² − ((1 − α0)/α0)(1/2)||(I − T2)(η) − (I − T2)(δ) + (I − T1)(T2(η)) − (I − T1)(T2(δ))||²
= ||η − δ||² − ((1 − α0)/α0)(1/2)||(I − T1 ∘ T2)(η) − (I − T1 ∘ T2)(δ)||².

Here we use the elementary inequality a² + b² ≥ (1/2)(a + b)².

Setting (1 − α)/α = ((1 − α0)/α0)(1/2), we get α = 2/(1 + 1/max{α1, α2}), which means that T1 ∘ T2 is α-average non-expansive.

Proposition 6. If G = {η | η = T1(T2(η))} is not empty, f is bounded by M, and 0 < ∆t < (K/(4γ)) exp(−(2M + 8γ)), then (21) converges to some η∗ ∈ G.

Proof. The idea of this proof is from [14].


From Proposition 5, T = T1 ∘ T2 is α-average non-expansive. We now prove that ηn+1 = T(ηn) converges. For any fixed η ∈ G (so that T(η) = η),

||ηn − η||² = ||T(ηn−1) − T(η)||²
≤ ||ηn−1 − η||² − ((1 − α)/α)||(I − T)(ηn−1) − (I − T)(η)||²
= ||ηn−1 − η||² − ((1 − α)/α)||(I − T)(ηn−1)||²
≤ ||ηn−1 − η||² ≤ ||η0 − η||²,    (25)

which means that ||ηn − η|| is bounded and monotone, so the limit lim_{n→+∞} ||ηn − η|| exists. Moreover, ηn is bounded, since

||ηn|| ≤ ||ηn − η|| + ||η||.

So there exists a subsequence of ηn, denoted ηnj, that converges to some η∗.

Going back to (25), we rewrite it as

((1 − α)/α)||(I − T)(ηn−1)||² ≤ ||ηn−1 − η||² − ||ηn − η||².

Letting n = 1, 2, · · · , N and summing, we have

Σ_{n=1}^{N} ((1 − α)/α)||(I − T)(ηn−1)||² ≤ Σ_{n=1}^{N} (||ηn−1 − η||² − ||ηn − η||²) = ||η0 − η||² − ||ηN − η||².

Now let N → +∞:

Σ_{n=1}^{+∞} ((1 − α)/α)||(I − T)(ηn−1)||² ≤ ||η0 − η||² − lim_{N→+∞} ||ηN − η||² < +∞,

which means

lim_{n→+∞} ((1 − α)/α)||(I − T)(ηn−1)||² = 0.

Because α ∈ (0, 1), this means

lim_{n→+∞} ||ηn−1 − ηn|| = 0.

Then

||T(η∗) − η∗|| = lim_{j→+∞} ||T(ηnj) − ηnj|| = lim_{j→+∞} ||ηnj+1 − ηnj|| = 0,

which means η∗ ∈ G, so we can replace η with η∗ in all of the results above. For any n, take j = max{i | ni ≤ n}; then nj ≤ n < nj+1, and

||ηn_{j+1} − η∗|| ≤ ||ηn − η∗|| ≤ ||ηnj − η∗||.

Letting n → +∞ (which also implies j → +∞), we have

lim_{n→+∞} ||ηn − η∗|| = 0,

i.e., lim_{n→∞} ηn = η∗.
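The convergence mechanism in Proposition 6, iterating an averaged non-expansive operator, can be illustrated on a toy example. A plane rotation R is non-expansive (an isometry) with fixed point 0, yet the plain iteration x_{n+1} = Rx_n merely orbits; the averaged map T = (1 − a)I + aR, however, converges to the fixed point. This toy T is ours for illustration, not the paper's T1 ∘ T2:

```python
import numpy as np

# Rotation by 90 degrees: non-expansive, fixed point 0, but the plain
# iteration x_{n+1} = R @ x_n does not converge (it cycles forever).
R = np.array([[0.0, -1.0], [1.0, 0.0]])

def T(x, a=0.5):
    """Averaged operator T = (1-a)I + aR with a in (0, 1); the
    Krasnosel'skii-Mann iteration of such operators converges."""
    return (1 - a) * x + a * (R @ x)

x = np.array([1.0, 1.0])
for _ in range(200):
    x = T(x)
print(np.linalg.norm(x))  # the iterates shrink toward the fixed point 0
```

Here the eigenvalues of the averaged map have modulus √0.5 < 1, so the iterates contract geometrically, whereas the pure rotation's eigenvalues have modulus 1.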


REFERENCES

[1] W. Allard, Total variation regularization for image denoising, I. Geometric theory, SIAM Journal on Mathematical Analysis, 39 (2007), 1150–1190.
[2] U. Amato and W. Hughes, Maximum entropy regularization of Fredholm integral equations of the first kind, Inverse Problems, 7 (1991), 793.
[3] P. Arbelaez, M. Maire, C. Fowlkes and J. Malik, Contour detection and hierarchical image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (2011), 898–916, URL http://dx.doi.org/10.1109/TPAMI.2010.161.
[4] J. Ashburner and K. J. Friston, Unified segmentation, NeuroImage, 26 (2005), 839–851, URL http://www.sciencedirect.com/science/article/pii/S1053811905001102.
[5] E. Bae, J. Yuan and X. Tai, Global minimization for continuous multiphase partitioning problems using a dual approach, International Journal of Computer Vision, 92 (2011), 112–129.
[6] J. Baillon and G. Haddad, Quelques propriétés des opérateurs angle-bornés et n-cycliquement monotones, Israel Journal of Mathematics, 26 (1977), 137–150.
[7] Y. Boykov and V. Kolmogorov, An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (2004), 1124–1137.
[8] Y. Boykov, O. Veksler and R. Zabih, Fast approximate energy minimization via graph cuts, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (2001), 1222–1239.
[9] C. Carson, S. Belongie, H. Greenspan and J. Malik, Blobworld: Image segmentation using expectation-maximization and its application to image querying, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24 (2002), 1026–1038.
[10] A. Chambolle, An algorithm for total variation minimization and applications, Journal of Mathematical Imaging and Vision, 20 (2004), 89–97.
[11] T. Chan, S. Esedoglu and A. Yip, Recent developments in total variation image restoration, Mathematical Models of Computer Vision, 17.
[12] T. Chan and L. Vese, Active contours without edges, IEEE Transactions on Image Processing, 10 (2001), 266–277.
[13] Y. Chiang, P. Borbat and J. Freed, Maximum entropy: A complement to Tikhonov regularization for determination of pair distance distributions by pulsed ESR, Journal of Magnetic Resonance, 177 (2005), 184–196.
[14] P. Combettes, Solving monotone inclusions via compositions of nonexpansive averaged operators, Optimization, 53 (2004), 475–504.
[15] P. Combettes and V. Wajs, Signal recovery by proximal forward-backward splitting, Multiscale Modeling and Simulation, 4 (2005), 1168–1200.
[16] A. Dempster, N. Laird and D. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39 (1977), 1–38.
[17] J. Duarte-Carvajalino, G. Sapiro, G. Yu and L. Carin, Online adaptive statistical compressed sensing of Gaussian mixture models, arXiv preprint arXiv:1112.5895.
[18] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, SIAM, 1999.
[19] L. Evans and R. Gariepy, Measure Theory and Fine Properties of Functions, CRC Press, 2015.
[20] S. Gao and T. Bui, Image segmentation and selective smoothing by using Mumford-Shah model, IEEE Transactions on Image Processing, 14 (2005), 1537–1549.
[21] T. Goldstein and S. Osher, The split Bregman method for L1-regularized problems, SIAM Journal on Imaging Sciences, 2 (2009), 323–343.
[22] R. Gonzalez, R. Woods and S. Eddins, Digital Image Processing Using MATLAB, Prentice Hall, 2004.
[23] L. Gupta and T. Sortrakul, A Gaussian-mixture-based image segmentation algorithm, Pattern Recognition, 31 (1998), 315–325.
[24] J. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, Springer, 1993.
[25] H. Ishikawa, Exact optimization for Markov random fields with convex priors, IEEE Transactions on Pattern Analysis and Machine Intelligence, 25 (2003), 1333–1336.
[26] J. Liu, Y. Ku and S. Leung, Expectation-maximization algorithm with total variation regularization for vector-valued image segmentation, Journal of Visual Communication and Image Representation, 23 (2012), 1234–1244.


[27] J. Liu, X. Tai, H. Huang and Z. Huan, A weighted dictionary learning model for denoising images corrupted by mixed noise, IEEE Transactions on Image Processing, 22 (2013), 1108–1120.
[28] J. Liu, X. Tai, S. Leung and H. Huang, A new continuous max-flow algorithm for multiphase image segmentation using super-level set functions, Journal of Visual Communication and Image Representation, 25 (2014), 1472–1488.
[29] D. Martin, C. Fowlkes, D. Tal and J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 2, IEEE, 2001, 416–423.
[30] G. McLachlan and T. Krishnan, The EM Algorithm and Extensions, vol. 382, John Wiley & Sons, 2007.
[31] D. Mumford and J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Communications on Pure and Applied Mathematics, 42 (1989), 577–685.
[32] S. Osher and J. Sethian, Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations, Journal of Computational Physics, 79 (1988), 12–49.
[33] L. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259–268.
[34] M. Teboulle, A unified continuous optimization framework for center-based clustering methods, Journal of Machine Learning Research, 8 (2007), 65–102.
[35] L. Vese and T. Chan, A multiphase level set framework for image segmentation using the Mumford and Shah model, International Journal of Computer Vision, 50 (2002), 271–293.
[36] Q. Wang, HMRF-EM-image: Implementation of the hidden Markov random field model and its expectation-maximization algorithm, arXiv preprint arXiv:1207.3510.
[37] C. Wu and X. Tai, Augmented Lagrangian method, dual methods, and split Bregman iteration for ROF, vectorial TV, and high order models, SIAM Journal on Imaging Sciences, 3 (2010), 300–339.
[38] C. Wu, On the convergence properties of the EM algorithm, The Annals of Statistics, 11 (1983), 95–103.
[39] G. Yu and G. Sapiro, Statistical compressed sensing of Gaussian mixture models, IEEE Transactions on Signal Processing, 59 (2011), 5842–5858.
[40] J. Yuan, E. Bae, X. Tai and Y. Boykov, A continuous max-flow approach to Potts model, in Computer Vision (ECCV 2010), European Conference on Computer Vision, 379–392.
[41] Y. Zhang, M. Brady and S. Smith, Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm, IEEE Transactions on Medical Imaging, 20 (2001), 45–57.
[42] M. Zhu, S. J. Wright and T. Chan, Duality-based algorithms for total-variation-regularized image restoration, Computational Optimization and Applications, 47 (2010), 377–400.

Received xxxx 20xx; revised xxxx 20xx.

E-mail address: [email protected]

E-mail address: [email protected]

E-mail address: [email protected]

E-mail address: [email protected]