A novel Gaussian mixture model for superpixel segmentation

Zhihua Ban^a, Jianguo Liu^a,∗, Li Cao^a

^a National Key Laboratory of Science and Technology on Multi-spectral Information Processing, School of Automation, Huazhong University of Science and Technology, Wuhan, Hubei Province 430074, China

Abstract

Superpixel segmentation partitions an image into perceptually coherent atomic regions. As a preprocessing step for computer vision applications, it can greatly reduce the number of input entries that subsequent algorithms must process. With each superpixel associated with a Gaussian distribution, we assume that a pixel is generated by first randomly choosing one of the superpixels and then drawing the pixel from the corresponding Gaussian density. Unlike most applications of the Gaussian mixture model in clustering, data points in our model are assumed to be non-identically distributed. Given an image, a log-likelihood function is constructed and maximized. Based on a solution derived from the expectation-maximization method, a well-designed algorithm is proposed. Our method has linear complexity with respect to the number of pixels, and it can be implemented using parallel techniques. To the best of our knowledge, our algorithm outperforms the state of the art in accuracy and achieves competitive computational efficiency.

Keywords: Expectation-maximization, Gaussian mixture model, parallel algorithm, superpixel segmentation

∗Corresponding author. Tel.: +862787558912; fax: +862787543130. Email addresses: [email protected] (Zhihua Ban), [email protected] (Jianguo Liu)

Preprint submitted to Pattern Recognition, December 30, 2016

arXiv:1612.08792v1 [cs.CV] 28 Dec 2016

1. Introduction

Partitioning an image into superpixels can be used as a preprocessing step for

complex computer vision tasks, such as segmentation [1], visual tracking [2],

stereo matching [3], edge detection [4], etc. Sophisticated algorithms benefit

from working with superpixels, instead of just pixels, because superpixels

reduce input entries and enable feature computation on more meaningful

regions.

Like many terminologies in computer vision, there is no rigorous math-

ematical definition for superpixel. The commonly accepted description of a

superpixel is “a group of connected, perceptually homogeneous pixels which

does not overlap any other superpixel.” For superpixel segmentation, the

following properties are generally desirable.

Prop. 1. Accuracy. Superpixels should adhere well to object boundaries. Superpixels that cross object boundaries arbitrarily may lead to bad or even catastrophic results in subsequent algorithms [5, 6, 7, 8].

Prop. 2. Regularity. The shape of superpixels should be regular. Superpixels with regular shape make it easier to construct a graph for subsequent algorithms. Moreover, such superpixels are visually pleasing, which helps algorithm designers in their analysis [9, 10, 11].

Prop. 3. Similar size. Superpixels should have a similar size. This property enables subsequent algorithms to deal with each superpixel without bias [12, 13, 14]. As pixels have the same “size” and the term “superpixel” originates from “pixel”, this property is also intuitively reasonable. It is a key property that distinguishes superpixels from other over-segmented regions.

Prop. 4. Efficiency. A superpixel algorithm should have low complexity. Extracting superpixels efficiently is critical for real-time applications [12, 6].

Under the constraint of Prop. 3, the requirements on accuracy and regularity are to a certain extent opposed. Intuitively, if a superpixel of limited size needs to adhere well to object boundaries, it has to adjust its shape to that object, which may be irregular. To the best of our knowledge, state-of-the-art superpixel algorithms fail to find a compromise between regularity and accuracy. Four typical algorithms are shown in Fig. 1(b)-1(e): the superpixels generated by NC [15, 16] (Fig. 1(b)) and LRW [10] (Fig. 1(c)) are more regular in shape than those extracted by SEEDS [6] (Fig. 1(d)) and ERS [7] (Fig. 1(e)). Nonetheless, the superpixels generated by SEEDS [6] and ERS [7] adhere to object boundaries better than those of NC [15] and LRW [10]. In this work, a Gaussian mixture model (GMM) and an algorithm derived from the expectation-maximization (EM) method [17] are built. It is shown that the proposed method can strike a balance between regularity and accuracy. An example is displayed in Fig. 1(a): the compromise is that superpixels in regions with complex textures take an irregular shape to adhere to object boundaries, while superpixels in homogeneous regions are regular.

Computational efficiency is a matter of both algorithmic complexity and implementation. Our algorithm has linear complexity with respect to the number of pixels. As an algorithm has to read all pixels, linear time is theoretically the best achievable time complexity for the superpixel problem. Algorithms can be categorized into two major groups: parallel algorithms, which can be implemented with parallel techniques and scale with the number of parallel processing units, and serial algorithms, whose implementations are usually executed sequentially and can use only part of the system resources on a parallel computer. Modern computer architectures are parallel, and applications can benefit from parallel algorithms because parallel implementations generally run faster than serial implementations of the same algorithm. The proposed algorithm is inherently parallel, and our serial implementation can easily achieve speedups by adding a few simple OpenMP directives.

Our method is constructed by modelling each pixel with a Gaussian mix-

ture model; associating each superpixel to one of the Gaussian densities;

and further solving the proposed model with the expectation-maximization

algorithm. Differing from the commonly used assumption that data points

are independent and identically distributed (i.i.d.) in clustering applications,

pixels are assumed to be independent but non-identically distributed in our

model. The proposed approach was tested on the Berkeley Segmentation

Data Set and Benchmarks 500 (BSDS500) [18]. To the best of our knowl-

edge, the proposed method outperforms state-of-the-art methods in accuracy

and presents a competitive performance in computational efficiency.

The rest of this paper is organized as follows. Section 2 presents an overview of related works on superpixel segmentation. Section 3 introduces the model, solution, algorithm, parallel potential, parameters, and complexity of the proposed method. Experiments are discussed in section 4. Finally, the paper is concluded in section 5.

Figure 1: Superpixel segmentations by five algorithms: (a) Our method, (b) NC [15], (c) LRW [10], (d) SEEDS [6], and (e) ERS [7]. Each segmentation has approximately 200 superpixels. The second row zooms in on the regions of interest defined by the black boxes in the first row. In the third row, superpixel boundaries are drawn on purely black images to highlight the shapes of the superpixels.

2. Related works

The concept of superpixel was first introduced by Xiaofeng Ren and Jitendra Malik in 2003 [19]. Since then, the superpixel problem has been well studied. Existing superpixel algorithms extract superpixels either by optimizing superpixel boundaries, such as finding paths and evolving curves, or by grouping pixels, e.g. the well-known SLIC [12].

2.1. Optimize boundaries

Algorithms that extract superpixels not by labelling pixels directly but by marking superpixel boundaries, or by updating only the labels of pixels on superpixel boundaries, belong to this category.

Rohkohl et al. present a superpixel method that iteratively assigns su-

perpixel boundaries to their most similar neighbouring superpixel [20]. A

superpixel is represented with a group of pixels that are randomly selected

from that superpixel. The similarity between a pixel and a superpixel is defined as the average similarity between the pixel and all the selected representatives.

Aiming to extract lattice-like superpixels, or “superpixel lattices”, [11]

partitions an image into superpixels by gradually adding horizontal and ver-

tical paths in strips of a pre-computed boundary map. The paths are formed

by two different methods: s-t min-cut and dynamic programming. The for-

mer finds paths by graph cuts and the latter constructs paths directly. The

paths have been designed to avoid parallel paths crossing and guarantee per-

pendicular paths cross only once. The idea of modelling superpixel bound-

aries as paths (or seam carving [21]) and the use of dynamic programming

were borrowed by later variations or improvements [22, 23, 24, 25, 26, 27].

In TurboPixels [14], Levinshtein et al. model the boundary of each super-

pixel as a closed curve. So, the connectivity is naturally guaranteed. Based

on level-set evolution, the curves gradually sweep over the unlabelled pix-

els to form superpixels under the constraints of two velocities. Although

this method can produce superpixels with homogeneous size and shape, its accuracy is relatively poor.

In VCells [5], a superpixel is represented as a mean vector of colour of

pixels in that superpixel. With the designed distance [5], VCells iteratively

updates superpixel boundaries to their nearest neighbouring superpixel. The iteration stops when no more pixels need to be updated.

SEEDS [28, 6] exchanges superpixel boundaries using a hierarchical struc-

ture. At the first iteration, the biggest blocks on superpixel boundary are

updated for a better energy. The size of pixel blocks becomes smaller and

smaller as the number of iterations increases. The iteration stops after boundaries have been exchanged at the pixel level.

Building on SLIC [12], [29, 30] present more complex energy functions. To minimize their corresponding energy, [29, 30] update boundary pixels instead of assigning a label to all pixels in each iteration. Based on [29], [30] adds connectivity and superpixel size to its energy. For the pixel updating, [30] uses a hierarchical structure like SEEDS [28], while [29] exchanges labels only at the pixel level. Zhu et al. propose a speedup of SLIC [12] that moves only unstable boundary pixels, i.e. those whose label changed in the previous iteration [22].

Besides, based on pre-computed line segments or edge maps of the input

image, [31, 9] align superpixel boundaries to the lines or the edges to form

superpixels with very regular shape.

2.2. Grouping pixels

Superpixel algorithms that assign labels to all pixels in each iteration belong to this category.

With an affinity matrix constructed from boundary cues [32], the algorithm developed in [16, 19], which is usually abbreviated as NC [12], uses normalized cut [15] to extract superpixels. This method produces very regular superpixels, but its time complexity is approximately $O(N^{3/2})$ [14], where $N$ is the number of pixels, which is expensive for a preprocessing step.

In Quick shift (QS) [33], the pixel density is estimated on a Parzen window

with a Gaussian kernel. A pixel is assigned to the same group as its parent, which is the nearest pixel with a greater density within a specified

distance. QS does not guarantee connectivity, or in other words, pixels with

the same label may not be connected.

Veksler et al. propose an approach that distributes a number of overlap-

ping square patches on the input image and extracts superpixels by finding

a label for each pixel from patches that cover the present pixel [34]. The

expansion algorithm in [35] is gradually adapted to modify pixel label within

local regions with a fixed size in each iteration. This method can generate

superpixels with regular shape and its run-time is proportional to the num-

ber of overlapping patches [34]. A similar solution in [36] is to formulate the

superpixel problem as a two-label problem and build an algorithm through

grouping pixels into vertical and horizontal bands. By doing this, pixels in

the same vertical and horizontal group form a superpixel.

Starting from an empty graph edge set, ERS [7] sequentially adds edges

to the set until the desired number of superpixels is reached. At each step, ERS [7] takes the edge that results in the greatest increase of an objective function. The number of generated superpixels is exactly equal to the desired number. This method adheres to object boundaries well, and its accuracy was not surpassed until our method was proposed.

SLIC [12] is the most well-known superpixel algorithm due to its efficiency

and simplicity. In SLIC [12], a pixel corresponds to a five dimensional vector

including colour and spatial location, and k-means is employed to cluster

those vectors locally, i.e. each pixel only compares with superpixels that fall

into a specified spatial distance and is assigned to the nearest superpixel.

Many variations follow the idea of SLIC in order to either decrease its run-

time [37, 38, 39] or improve its accuracy [40, 29]. LSC [8] also uses a k-means

method to refine superpixels. Instead of directly using the 5D vector used in

SLIC [12], LSC [8] maps them to a feature space and a weighted k-means is

adopted to extract superpixels. It is the most recent algorithm that achieves accuracy comparable to ERS [7].

Based on marker-based watershed transform, [13, 37] incorporate spatial

constraints to an image gradient in order to produce superpixels with regular

shape and similar size. Generally, these methods run relatively fast but adhere poorly to ground-truth boundaries.

LRW [10] groups pixels using an improved random walk algorithm. By

using texture features to optimize an initial superpixel map, this method can

produce regular superpixels in regions with complex texture. However, this

method suffers from a very slow speed.

Although FH [41], mean shift [42], and watersheds [43] have been referred to as “superpixel” algorithms in the literature, they are not covered in this paper because the size of the regions they produce varies enormously. This is mainly because these algorithms do not offer direct control over the size of the segmented regions. Structure-sensitive or content-sensitive superpixels [44, 45] are also not considered to be superpixels here, as they do not aim to extract regions with similar size (see Prop. 3 in section 1).

A large number of superpixel algorithms have been proposed. However, few works present novel models, and most of the existing energy functions are variations of the objective function of k-means. In our work, we propose a novel model to tackle the superpixel problem. With a carefully designed algorithm, the underlying segmentation from the model is well revealed.

3. The method

The proposed method can be described by two steps: the first one is

to introduce the proposed new model, in which pixel and superpixel are

associated with each other; after that, an algorithm is constructed to solve

this model in the second step. The complexity of the proposed algorithm is

presented at the end of this section.

3.1. Model

In the proposed model, $i$ denotes the pixel index of an input image $I$ with width $W$ and height $H$ in pixels. The total number of pixels in $I$ is $N = W \cdot H$. For each pixel $i$ in the image pixel set $V = \{0, 1, \dots, N-1\}$, $(x_i, y_i)$ represents its position on the image plane, where $x_i \in \{0, 1, \dots, W-1\}$ and $y_i \in \{0, 1, \dots, H-1\}$, and $c_i$ represents its intensity or colour. For a colour image, $c_i$ is a vector; otherwise, $c_i$ is a scalar. To represent pixel $i$, a random variable $Z_i = (X_i, Y_i, C_i)^T$ along with its observed value $z_i = (x_i, y_i, c_i)^T$ is used. Note that the random variables $Z_i$, for all $i \in V$, are independent but non-identically distributed, as discussed below.

The width $v_x$ and height $v_y$ of each superpixel should be specified by the user. If the user prefers to specify the desired number of superpixels $K$ instead, $v_x$ and $v_y$ are obtained using equation (1).

$$v_x = v_y = \left\lfloor \sqrt{\frac{W \cdot H}{K}} \right\rfloor. \tag{1}$$

It is encouraged to use the same value for $v_x$ and $v_y$, or at least values that do not differ much, as we wish the generated superpixels to be approximately square.

Once $v_x$ and $v_y$ are obtained, the numbers of superpixels $n_x$ and $n_y$ along the width and the height of $I$, respectively, are defined using equation (2).

$$n_x = \left\lfloor \frac{W}{v_x} \right\rfloor, \quad n_y = \left\lfloor \frac{H}{v_y} \right\rfloor. \tag{2}$$

For simplicity of discussion, we assume that $W \bmod v_x = 0$ and $H \bmod v_y = 0$. Therefore, the initial number of superpixels $K$ becomes $n_x \cdot n_y$.

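For each individual pixel $i$, there are two initial superpixel numbers, $L_{i,x}$ and $L_{i,y}$, which are defined in equation (3).

$$L_{i,x} = \left\lfloor \frac{x_i}{v_x} \right\rfloor, \quad L_{i,y} = \left\lfloor \frac{y_i}{v_y} \right\rfloor. \tag{3}$$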

Based on equations (2) and (3), it can be inferred that $0 \le L_{i,x} \le n_x - 1$ and $0 \le L_{i,y} \le n_y - 1$. $L_i$ denotes the random latent variable for pixel $i$. The possible values of $L_i$ form a set $\mathcal{K}_i$ expressed in equation (4).

$$\mathcal{K}_i = \{k \mid k = (L_{i,x} + t_x) + (L_{i,y} + t_y) \cdot n_x,\ k \ge 0\}, \tag{4}$$

where, with a slight abuse of notation, the offsets $t_x$ and $t_y$ inside equation (4) range over $\{-t_x, 1-t_x, \dots, t_x\}$ and $\{-t_y, 1-t_y, \dots, t_y\}$, whose bounds $t_x$ and $t_y$ are positive integers, such as 1. Obviously, $\mathcal{K}_i$ is a subset of $\mathcal{K} = \{0, 1, \dots, K-1\}$.
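The grid construction in equations (1)-(4) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation; the function names `grid_layout` and `candidate_set` are hypothetical, and the explicit range check on the neighbouring cells is an assumption that goes slightly beyond the $k \ge 0$ condition stated in equation (4).

```python
import math

def grid_layout(W, H, K):
    """Equations (1)-(2): superpixel side length v and grid size (n_x, n_y)."""
    v = math.isqrt(W * H // K)  # floor(sqrt(W*H/K)), computed exactly on integers
    return v, W // v, H // v

def candidate_set(x, y, v_x, v_y, n_x, n_y, t_x=1, t_y=1):
    """Equations (3)-(4): candidate superpixel indices for the pixel at (x, y)."""
    L_x, L_y = x // v_x, y // v_y  # equation (3)
    candidates = []
    for dy in range(-t_y, t_y + 1):
        for dx in range(-t_x, t_x + 1):
            cx, cy = L_x + dx, L_y + dy
            # Assumption: grid cells outside the image are skipped, which is
            # stricter than the k >= 0 condition written in equation (4).
            if 0 <= cx < n_x and 0 <= cy < n_y:
                candidates.append(cx + cy * n_x)
    return candidates
```

For a 481×321 image and $K = 200$, `grid_layout` gives $v_x = v_y = 27$ and a 17×11 grid, i.e. 187 initial superpixels; an interior pixel has nine candidates, while a corner pixel has four.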

We assume that a pixel is generated by first randomly choosing one of the Gaussian densities with the same probability $1/K$, and then sampling from the selected Gaussian distribution. With the definitions and notations above, pixel $i$ is described by a mixture of Gaussians as defined in equation (5).

$$p(Z_i = z_i) = \sum_{k \in \mathcal{K}_i} \Pr(L_i = k)\, p(z_i \mid L_i = k; \mu_k, \Sigma_k), \quad i \in V, \tag{5}$$

where $\Pr(L_i = k)$ is manually set to $1/K$ for $k \in \{0, 1, \dots, K-1\}$. Although this setting means that $\int p(Z_i = z_i)\, dz_i$ may not equal 1, its effect is removed in our algorithm because the prior distribution of $L_i$ takes the same value for every $k$. For a given $k$, $p(z_i \mid L_i = k; \mu_k, \Sigma_k)$ is a Gaussian density function parametrized by a mean vector $\mu_k$ and a covariance matrix $\Sigma_k$, as shown in equation (6).

$$p(z \mid L_i = k; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{D/2} \sqrt{\det(\Sigma_k)}} \exp\left\{ -\frac{1}{2} (z - \mu_k)^T \Sigma_k^{-1} (z - \mu_k) \right\}, \tag{6}$$

where $k \in \mathcal{K}$ and $D$ is the number of components in $z$.

Given an image, our model is defined as maximizing equation (7), which extends the log-likelihood function used in many statistical estimation problems.

$$\mathcal{L}(\theta) = \sum_{i \in V} \ln p(Z_i = z_i) = \sum_{i \in V} \ln \sum_{k \in \mathcal{K}_i} w_k\, p(z_i \mid L_i = k; \theta_k). \tag{7}$$

In the above equation, $w_k$ stands for the constant prior $\Pr(L_i = k)$, and $\theta = (\theta_0, \dots, \theta_{K-1})$ denotes the parameters of the Gaussian densities, where $\theta_k = (\mu_k, \Sigma_k)$.

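The label of pixel $i$ is determined by the posterior distribution of $L_i$ as shown below.

$$L_i = \arg\max_{k \in \mathcal{K}_i} \Pr(L_i = k \mid Z_i = z_i; \theta_k). \tag{8}$$

The posterior probability of $L_i$ can be expressed as

$$\begin{aligned}
\Pr(L_i = k \mid Z_i = z_i) &= \frac{p(L_i = k, Z_i = z_i)}{p(Z_i = z_i)} \\
&= \frac{\Pr(L_i = k)\, p(Z_i = z_i \mid L_i = k)}{\sum_{k \in \mathcal{K}_i} \Pr(L_i = k)\, p(Z_i = z_i \mid L_i = k)} \\
&= \frac{p(z_i \mid L_i = k; \theta_k)}{\sum_{k \in \mathcal{K}_i} p(z_i \mid L_i = k; \theta_k)}.
\end{aligned} \tag{9}$$

Therefore, once we find a solution that maximizes (7), $L_i$ can be easily obtained.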

3.2. Solution

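As $\Pr(L_i = k)$ is constant, we use $w_k$ to represent it in the following text. According to Jensen's inequality, $\mathcal{L}(\theta)$ is greater than or equal to $Q(R, \theta)$ as shown below.

$$\mathcal{L}(\theta) = \sum_{i \in V} \ln \sum_{k \in \mathcal{K}_i} R_{i,k}\, \frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} \tag{10}$$

$$\ge \sum_{i \in V} \sum_{k \in \mathcal{K}_i} R_{i,k} \ln \frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} = Q(R, \theta), \tag{11}$$

where $R_{i,k} \ge 0$, $\sum_{k \in \mathcal{K}_i} R_{i,k} = 1$, and $R = \{R_{i,k} \mid i \in V, k \in \mathcal{K}_i\}$. We use the expectation-maximization (EM) algorithm to iteratively maximize $Q(R, \theta)$ and thereby approach the maximum of $\mathcal{L}(\theta)$ in two steps: the expectation step (E-step) and the maximization step (M-step).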
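The bound in equations (10) and (11) can be checked numerically for a single pixel. The sketch below is only an illustration under assumed inputs: the list `a` holds hypothetical values of $w_k\, p(z_i \mid L_i = k; \theta_k)$, and the function names are not taken from the paper.

```python
import math

def log_likelihood_term(a):
    """One pixel's term of equation (10): ln sum_k a_k."""
    return math.log(sum(a))

def lower_bound_term(a, R):
    """One pixel's term of equation (11): sum_k R_k * ln(a_k / R_k)."""
    return sum(r * math.log(a_k / r) for a_k, r in zip(a, R) if r > 0)

a = [0.02, 0.05, 0.01]                   # hypothetical w_k * p(z_i | L_i = k)
uniform = [1 / 3, 1 / 3, 1 / 3]          # an arbitrary valid choice of R
posterior = [a_k / sum(a) for a_k in a]  # R proportional to a_k

gap_uniform = log_likelihood_term(a) - lower_bound_term(a, uniform)
gap_posterior = log_likelihood_term(a) - lower_bound_term(a, posterior)
```

With the uniform weights the gap is strictly positive, whereas choosing $R$ proportional to `a` closes the gap up to rounding error, which is the property the E-step relies on.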

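E-step: once a guess of $\theta$ is given, $Q(R, \theta)$ is expected to be tightly attached to $\mathcal{L}(\theta)$. To this end, $R$ is required to ensure $\mathcal{L}(\theta) = Q(R, \theta)$. Equation (12) is a sufficient condition for equality to hold in Jensen's inequality.

$$\frac{w_k\, p(z_i \mid L_i = k; \theta_k)}{R_{i,k}} = \alpha, \quad k \in \mathcal{K}_i, \tag{12}$$

where $\alpha$ is a constant. Since $\sum_{k \in \mathcal{K}_i} R_{i,k} = 1$ and $w_k$ takes the same value for every $k$, both $\alpha$ and $w_k$ can be eliminated, and $R_{i,k}$ can be updated by equation (13).

$$R_{i,k} = \frac{p(z_i \mid L_i = k; \theta_k)}{\sum_{k \in \mathcal{K}_i} p(z_i \mid L_i = k; \theta_k)}. \tag{13}$$

Notice that equation (13) is exactly the same as equation (9). Therefore, equation (8) can be rewritten as

$$L_i = \arg\max_{k \in \mathcal{K}_i} R_{i,k}. \tag{14}$$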
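Equations (13) and (14) amount to normalising the candidate densities of a pixel and picking the largest. The sketch below assumes the densities are supplied as logarithms in a dictionary keyed by superpixel index; working in the log domain and subtracting the maximum before exponentiating is an implementation choice made here for numerical stability, not something the paper prescribes, and the function names are hypothetical.

```python
import math

def responsibilities(log_p):
    """Equation (13) for one pixel; log_p maps k to ln p(z_i | L_i = k; theta_k)."""
    m = max(log_p.values())
    # Subtracting the maximum keeps exp() away from underflow; the shift
    # cancels in the ratio, so the result equals equation (13).
    unnorm = {k: math.exp(v - m) for k, v in log_p.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}

def label(R_i):
    """Equation (14): the candidate with the largest responsibility."""
    return max(R_i, key=R_i.get)
```

The shift matters because a pixel far from every candidate mean can have densities that underflow to zero in floating point, which would make a direct evaluation of equation (13) divide by zero.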

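M-step: in this step, $\theta$ is derived by maximizing $Q(R, \theta)$ with a given $R$. To do this, we first compute the derivatives of $Q(R, \theta)$ with respect to $\mu_k$ and $\Sigma_k$ and set them to zero, as seen in equations (15)-(17). The parameters are then obtained by solving equation (17).

$$\frac{\partial Q(R, \theta)}{\partial \mu_k} = \sum_{i \in I_k} R_{i,k} \left\{ \Sigma_k^{-1} (z_i - \mu_k) \right\}, \tag{15}$$

$$\frac{\partial Q(R, \theta)}{\partial \Sigma_k} = \sum_{i \in I_k} R_{i,k} \left\{ \frac{1}{2} \Sigma_k^{-1} (z_i - \mu_k)(z_i - \mu_k)^T \Sigma_k^{-1} - \frac{1}{2} \Sigma_k^{-1} \right\}, \tag{16}$$

$$\frac{\partial Q(R, \theta)}{\partial \mu_k} = 0, \quad \frac{\partial Q(R, \theta)}{\partial \Sigma_k} = 0, \tag{17}$$

$$\mu_k = \frac{\sum_{i \in I_k} R_{i,k}\, z_i}{\sum_{i \in I_k} R_{i,k}}, \tag{18}$$

$$\Sigma_k = \frac{\sum_{i \in I_k} R_{i,k} (z_i - \mu_k)(z_i - \mu_k)^T}{\sum_{i \in I_k} R_{i,k}}, \tag{19}$$

where $I_k = \{i \mid k \in \mathcal{K}_i, i \in V\}$ is a subset of $V$. Each update monotonically improves $\mathcal{L}(\theta)$: $\mathcal{L}(\theta^{(t+1)}) = Q(R^{(t+1)}, \theta^{(t+1)}) \ge Q(R^{(t)}, \theta^{(t+1)}) \ge Q(R^{(t)}, \theta^{(t)}) = \mathcal{L}(\theta^{(t)})$.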
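Equations (18) and (19) are a responsibility-weighted mean and covariance. A minimal pure-Python sketch for one superpixel $k$ follows; the function name `m_step` and the list-based data layout are assumptions made for illustration, and a practical implementation would accumulate the sums in place over the image.

```python
def m_step(points, weights):
    """Equations (18)-(19) for one superpixel k.

    points:  the vectors z_i for i in I_k, each a tuple of length D
    weights: the matching responsibilities R_{i,k}
    """
    total = sum(weights)
    D = len(points[0])
    # Equation (18): weighted mean.
    mu = [sum(w * z[d] for z, w in zip(points, weights)) / total
          for d in range(D)]
    # Equation (19): weighted covariance around the new mean.
    cov = [[sum(w * (z[a] - mu[a]) * (z[b] - mu[b])
                for z, w in zip(points, weights)) / total
            for b in range(D)]
           for a in range(D)]
    return mu, cov
```

For the four corners of a square with side 2 and equal weights, the function returns the centre of the square as the mean and the identity matrix as the covariance.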

3.3. The algorithm

In this section, we discuss the choice of covariance matrices and the tricks that make the algorithm run well in practice.

Although the solution in section 3.2 supports full covariance matrices, i.e., covariance matrices with all elements computed as in equation (19), only block diagonal matrices are used in this paper (see equation (20)). This is done for three reasons. First, computing block diagonal matrices is more efficient than computing full matrices. Second, there is generally no strong relation between the spatial coordinates $(x_i, y_i)$ and the intensity or colour $c_i$, so it is reasonable to consider them separately. Third, full matrices do not bring better performance and even give bad results for colour images. For a different colour space, it is encouraged to split components that are not strongly related into different covariance matrices. For example, if CIELAB is adopted, it is better to put the colour-opponent dimensions $a$ and $b$ into a 2 by 2 covariance matrix. In this case, (20) becomes (21). However, we keep using (20) to discuss our algorithm for simplicity.

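$$\Sigma_k = \begin{pmatrix} \Sigma_{k,s} & 0 \\ 0 & \Sigma_{k,c} \end{pmatrix}, \tag{20}$$

$$\Sigma_k = \begin{pmatrix} \Sigma_{k,s} & 0 & 0 \\ 0 & \sigma_{k,l}^2 & 0 \\ 0 & 0 & \Sigma_{k,(a,b)} \end{pmatrix}, \tag{21}$$

where $\Sigma_{k,s}$ and $\Sigma_{k,c}$ respectively represent the spatial covariance matrix and the colour covariance matrix. The covariance matrices are updated according to equations (22) and (23), which are derived by replacing $\Sigma_k$ in equation (16) with the block diagonal matrices and then solving (17).

$$\Sigma_{k,s} = \frac{\sum_{i \in I_k} R_{i,k} (z_{i,s} - \mu_{k,s})(z_{i,s} - \mu_{k,s})^T}{\sum_{i \in I_k} R_{i,k}}, \tag{22}$$

$$\Sigma_{k,c} = \frac{\sum_{i \in I_k} R_{i,k} (z_{i,c} - \mu_{k,c})(z_{i,c} - \mu_{k,c})^T}{\sum_{i \in I_k} R_{i,k}}, \tag{23}$$

where $z_{i,s}$ and $\mu_{k,s}$ are the spatial components of $z_i$ and $\mu_k$, and $z_{i,c}$ and $\mu_{k,c}$ are the intensity component (for a grayscale image) or the colour components (for a colour image) of $z_i$ and $\mu_k$.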
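One practical consequence of the block diagonal form (20) is that the logarithm of the density (6) splits into a spatial term plus a colour term. The sketch below illustrates this for the grayscale case, where $\Sigma_{k,c}$ is a scalar variance; the function names and argument layout are hypothetical and the code is not taken from the authors' implementation.

```python
import math

def log_gauss_2d(d, S):
    """ln N(d; 0, S) for a 2-vector d and a symmetric positive definite 2x2 S."""
    (a, b), (_, c) = S
    det = a * c - b * b
    # d^T S^{-1} d using the closed-form inverse of a 2x2 matrix.
    quad = (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

def log_gauss_1d(d, var):
    """ln N(d; 0, var) for a scalar d."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * d * d / var

def log_density(z, mu, sigma_s, sigma_c):
    """ln of equation (6) with the block diagonal covariance (20), grayscale case."""
    x, y, c = z
    m_x, m_y, m_c = mu
    return log_gauss_2d((x - m_x, y - m_y), sigma_s) + log_gauss_1d(c - m_c, sigma_c)
```

With identity covariances and $z = \mu_k$, the result equals $-\tfrac{3}{2}\ln(2\pi)$, matching equation (6) with $D = 3$.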

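Since $\Sigma_{k,s}$ and $\Sigma_{k,c}$ are only positive semi-definite in practice, they may sometimes not be invertible. To avoid this problem, we first compute the eigendecomposition of the two covariance matrices as shown in equations (24) and (25), then modify the eigenvalues on the major diagonals of $\Lambda_{k,s}$ and $\Lambda_{k,c}$ using equations (26) and (27), and finally reconstruct $\Sigma_{k,s}$ and $\Sigma_{k,c}$ with equations (28) and (29).

$$\Sigma_{k,s} = Q_{k,s}\, \Lambda_{k,s}\, Q_{k,s}^{-1}, \tag{24}$$

$$\Sigma_{k,c} = Q_{k,c}\, \Lambda_{k,c}\, Q_{k,c}^{-1}, \tag{25}$$

where $\Lambda_{k,s}$ and $\Lambda_{k,c}$ are diagonal matrices with eigenvalues on their respective major diagonals. For a colour image, $\lambda_{k,s}(j_s)$ and $\lambda_{k,c}(j_c)$ denote the respective eigenvalues, for $j_s \in \{0, 1\}$ and $j_c \in \{0, 1, 2\}$. $Q_{k,s}$ and $Q_{k,c}$ are orthogonal matrices. If the input image is grayscale, $Q_{k,c} = 1$, $\Sigma_{k,c}$ and $\Lambda_{k,c}$ are scalars, and $j_c$ reduces to 0.

$$\lambda_{k,s}(j_s) = \begin{cases} \lambda_{k,s}(j_s) & \text{if } \lambda_{k,s}(j_s) \ge \varepsilon_s, \\ \varepsilon_s & \text{otherwise,} \end{cases} \tag{26}$$

$$\lambda_{k,c}(j_c) = \begin{cases} \lambda_{k,c}(j_c) & \text{if } \lambda_{k,c}(j_c) \ge \varepsilon_c, \\ \varepsilon_c & \text{otherwise,} \end{cases} \tag{27}$$

where $\varepsilon_s$ and $\varepsilon_c$ are two constants.

$$\Sigma_{k,s} = Q_{k,s}\, \Lambda_{k,s}\, Q_{k,s}^{-1}, \tag{28}$$

$$\Sigma_{k,c} = Q_{k,c}\, \Lambda_{k,c}\, Q_{k,c}^{-1}, \tag{29}$$

where $\Lambda_{k,s}$ and $\Lambda_{k,c}$ are now the diagonal matrices with the modified $\lambda_{k,s}(j_s)$ and $\lambda_{k,c}(j_c)$ on their respective major diagonals.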
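For the 2×2 spatial covariance, the decompose, clamp, and rebuild procedure of equations (24), (26), and (28) can be written in closed form. The sketch below is an illustration only: the function name `clamp_cov_2x2` is hypothetical, and the closed-form eigendecomposition of a symmetric 2×2 matrix stands in for whatever eigen-solver an actual implementation would use.

```python
import math

def clamp_cov_2x2(S, eps):
    """Raise the eigenvalues of a symmetric 2x2 matrix S to at least eps."""
    (a, b), (_, c) = S
    # Eigenvalues of [[a, b], [b, c]] are m + r and m - r, as in equation (24).
    m, r = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    theta = 0.5 * math.atan2(2 * b, a - c)
    u_x, u_y = math.cos(theta), math.sin(theta)  # eigenvector of m + r
    # Equation (26): keep an eigenvalue if it is >= eps, otherwise use eps.
    l1, l2 = max(m + r, eps), max(m - r, eps)
    # Equation (28): S = l1 * u u^T + l2 * v v^T with v = (-u_y, u_x).
    off = (l1 - l2) * u_x * u_y
    return ((l1 * u_x * u_x + l2 * u_y * u_y, off),
            (off, l1 * u_y * u_y + l2 * u_x * u_x))
```

A singular input such as the all-ones matrix, whose eigenvalues are 2 and 0, becomes invertible after clamping with `eps = 2`, while a matrix whose eigenvalues already exceed `eps` is returned unchanged up to rounding.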

After initializing θ, equations (13), (18), (28), and (29) are iterated until

convergence. Once the iteration stops, the superpixel label Li can be obtained

using equation (14).

As the connectivity of superpixels cannot be guaranteed, a postprocessing

step is required. This is done by sorting the isolated superpixels in ascending

order according to their sizes, and sequentially merging small isolated super-

pixels, which are less than one fourth of the desired superpixel size, to their

nearest neighbouring superpixels, with only intensity or colour being taken

into account. Once an isolated superpixel (source) is merged to another su-

perpixel (destination), the size of the source superpixel is cleared to zero,

and the size of the destination superpixel will be updated by adding the size

of the source superpixel. This size updating trick will prevent the size of the

produced superpixels from significantly varying.
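The merging rule described above can be sketched as follows, assuming the connected regions have already been extracted and summarised. The inputs `sizes`, `colours` (mean intensity per region), and `neighbours` (adjacency sets) are hypothetical data structures, and the sketch simplifies the procedure by not updating mean colours or adjacency after a merge.

```python
def merge_small_regions(sizes, colours, neighbours, desired_size):
    """Merge regions smaller than desired_size / 4 into their closest-colour neighbour."""
    sizes = dict(sizes)  # work on a copy so the caller's sizes are untouched
    merged_into = {}
    # Visit regions in ascending order of their initial size.
    for r in sorted(sizes, key=sizes.get):
        if sizes[r] == 0 or sizes[r] >= desired_size / 4:
            continue  # already merged away, or large enough to keep
        candidates = [n for n in neighbours[r] if sizes[n] > 0]
        if not candidates:
            continue
        # Only intensity (or colour) decides the destination.
        dest = min(candidates, key=lambda n: abs(colours[n] - colours[r]))
        sizes[dest] += sizes[r]  # the destination grows by the source size
        sizes[r] = 0             # the source is cleared
        merged_into[r] = dest
    return merged_into, sizes
```

Because sizes are updated as merging proceeds, a small region that has absorbed others may exceed the threshold by the time it is visited and is then left alone. A destination may itself be merged later, so final labels are obtained by following the `merged_into` chain.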

As a preprocessing step, a superpixel algorithm should run as fast as possible. Since in SLIC [12] and LSC [8] iterating a fixed number of times without checking convergence is sufficient for most images, we adopt this trick in our algorithm and set the number of iterations $T$ as a parameter. The proposed algorithm is summarized in Algorithm 1.

Algorithm 1 The proposed superpixel algorithm

Input: $v_x$ and $v_y$, or $K$

Output: $L_i$, $i \in V$

1: Initialize $\mu_k$, $k \in \mathcal{K}$, using $K$ seed pixels over the input image uniformly at fixed horizontal and vertical intervals $v_x$ and $v_y$.

2: Initialize $\Sigma_{k,s}$ and $\Sigma_{k,c}$.

3: Compute $L_{i,x}$ and $L_{i,y}$ using equation (3).

4: Calculate $R$ using equation (13), set $t = 0$.

5: while $t < T$ do

6:   Compute $\mu_k$ using equation (18).

7:   Compute $\Sigma_{k,s}^{t+1}$ and $\Sigma_{k,c}^{t+1}$ using equations (28) and (29).

8:   Update $R_{i,k}$ using equation (13).

9:   $t = t + 1$.

10: end while

11: $L_i$ is determined by equation (14).

12: Merge small superpixels to their nearest neighbour.

3.4. Parallel potential

As the clock frequency of a single processor is difficult to improve, modern processors are designed with parallel architectures. If an algorithm can be implemented with parallel techniques and scales with the number of parallel processing units, its computational efficiency can be significantly improved. Fortunately, the most expensive part of our algorithm, namely the computation of $R$ and $\theta$, can be executed in parallel: each $R_{i,k}$ is computed independently, and so are $\mu_k$ and $\Sigma_k$. In our experiments, we show that our implementation easily achieves speedups on multi-core CPUs.
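The independence of the per-pixel updates can be illustrated with Python's standard `concurrent.futures` module. This is a structural sketch only: the paper's implementation relies on OpenMP, and CPython threads will not speed up this CPU-bound loop because of the global interpreter lock. The function names and the list-of-lists input format are assumptions made for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def responsibilities(log_p):
    """Equation (13) for one pixel, from a list of log densities."""
    m = max(log_p)
    w = [math.exp(v - m) for v in log_p]
    s = sum(w)
    return [x / s for x in w]

def e_step_serial(all_log_p):
    """Process the pixels one after another."""
    return [responsibilities(lp) for lp in all_log_p]

def e_step_parallel(all_log_p, workers=4):
    """Hand pixels to a pool of workers; no pixel depends on another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the output lines up
        # with the serial version regardless of scheduling.
        return list(pool.map(responsibilities, all_log_p))
```

Both versions return identical results because no pixel reads another pixel's responsibilities, which is the same property that lets a loop over pixels be annotated with a parallel directive in a compiled implementation.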

3.5. Parameters

In addition to the parameters left to users (i.e. $v_x$ and $v_y$, or $K$), $t_x$, $t_y$, $T$, $\varepsilon_s$, $\varepsilon_c$, and the initialization of $\theta$ should be assigned before starting the proposed algorithm.

$t_x$ and $t_y$ control the size of the overlapping region of neighbouring superpixels. We set them to 1 for all the results in this paper. With a larger $t_x$ or $t_y$, the run-time increases considerably while the accuracy does not improve noticeably. In general, a larger $T$ gives better performance but, again, sacrifices efficiency. We have found that $T = 10$ is enough for most images. In most state-of-the-art algorithms, the size of the overlapping region is not exposed as a parameter. We leave $t_x$ and $t_y$ to users so that they can customize the algorithm.

Unlike $t_x$, $t_y$, and $T$, different values of $\varepsilon_s$ and $\varepsilon_c$ and different initializations of $\theta$ do not change the run-time but do affect accuracy. Although $\varepsilon_s$ and $\varepsilon_c$ are originally used to prevent the covariance matrices from being singular, they also weigh the relative importance of spatial proximity and colour similarity. For instance, a larger $\varepsilon_c$ produces more regular superpixels, and the opposite is true for a smaller $\varepsilon_c$. As $\varepsilon_c$ and $\varepsilon_s$ have opposite effects, we set $\varepsilon_s = 2$ and leave $\varepsilon_c$ for detailed discussion in section 4. As we want superpixels to be local and regularly positioned on the image plane, $\mu_k$ are initialized regularly as already presented in Algorithm 1. For $\Sigma_{k,s}$, we set the main diagonal to $v_x^2$ and $v_y^2$ and the other elements to zero, so that neighbouring superpixels overlap well at the beginning. The initialization of $\Sigma_{k,c}$ is less straightforward; the basic idea is to set its main diagonal using a small colour distance within which the colour difference between two pixels is perceptually uniform. The effect of different initializations of $\Sigma_{k,c}$ will be discussed in section 4.
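The initialization described above can be sketched for a grayscale image stored as a list of rows. Placing each seed at the centre of its grid cell is an assumption (the text only states that seeds are spaced at intervals $v_x$ and $v_y$), the symbol `lam` follows the $\lambda$ used for the colour initialization in section 4, and the function name and dictionary layout are hypothetical.

```python
def initialise(image, v_x, v_y, lam):
    """Initial theta_k = (mu_k, Sigma_{k,s}, Sigma_{k,c}) for every grid cell."""
    H, W = len(image), len(image[0])
    n_x, n_y = W // v_x, H // v_y
    params = []  # list index is k = k_x + k_y * n_x, as in equation (4)
    for k_y in range(n_y):
        for k_x in range(n_x):
            # Assumption: each seed sits at the centre of its grid cell.
            s_x, s_y = k_x * v_x + v_x // 2, k_y * v_y + v_y // 2
            params.append({
                "mu": (float(s_x), float(s_y), float(image[s_y][s_x])),
                # Spatial covariance: diagonal v_x^2, v_y^2, zeros elsewhere.
                "sigma_s": ((float(v_x ** 2), 0.0), (0.0, float(v_y ** 2))),
                # Colour variance (grayscale): lambda^2.
                "sigma_c": float(lam ** 2),
            })
    return params
```

A spatial standard deviation equal to the cell size means each initial Gaussian reaches well into the neighbouring cells, which is what lets neighbouring superpixels compete for pixels from the first iteration.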

3.6. Complexity

The updating of $R$ has a complexity of $O(|\mathcal{K}_i| \cdot (T+1) \cdot N)$, for $i \in V$. According to equation (4), $(T+1) \cdot N \le |\mathcal{K}_i| \cdot (T+1) \cdot N \le (2t_x+1) \cdot (2t_y+1) \cdot (T+1) \cdot N$, in which $T$, $t_x$, and $t_y$ are constants in our algorithm. Therefore, the complexity of updating $R$ is $O(N)$.

Based on equations (18), (22), and (23), the complexity of updating $\theta$ is $O(T \cdot K \cdot |I_k|)$, for $k \in \{0, 1, \dots, K-1\}$. According to equations (3) and (4), we have $v_x \cdot v_y \le |I_k| \le (2t_x+1) \cdot (2t_y+1) \cdot v_x \cdot v_y$. As a result, $T \cdot K \cdot v_x \cdot v_y \le T \cdot K \cdot |I_k| \le T \cdot (2t_x+1) \cdot (2t_y+1) \cdot K \cdot v_x \cdot v_y$, which means $T \cdot N \le T \cdot K \cdot |I_k| \le T \cdot (2t_x+1) \cdot (2t_y+1) \cdot N$. Therefore, the updating of the Gaussian parameters has a complexity of $O(N)$.
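The bound on the cost of one update of $R$ can be checked by counting candidate densities on a small grid. The sketch assumes, as the text does, that $W \bmod v_x = 0$ and $H \bmod v_y = 0$, and it clips candidate cells to the grid; the function name `total_candidates` is hypothetical.

```python
def total_candidates(W, H, v_x, v_y, t_x=1, t_y=1):
    """Sum of |K_i| over all pixels: density evaluations per update of R."""
    n_x, n_y = W // v_x, H // v_y
    total = 0
    for y in range(H):
        for x in range(W):
            L_x, L_y = x // v_x, y // v_y
            # Number of neighbouring grid columns and rows inside the grid.
            cols = min(L_x + t_x, n_x - 1) - max(L_x - t_x, 0) + 1
            rows = min(L_y + t_y, n_y - 1) - max(L_y - t_y, 0) + 1
            total += cols * rows
    return total

N = 12 * 8
count = total_candidates(12, 8, 4, 4)
```

For this 12×8 image with 4×4 cells, the count is 448, which lies between $N = 96$ and $(2t_x+1)(2t_y+1)N = 864$, in line with the inequality above.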

In the worst case, the sorting procedure in the postprocessing step requires $O(m^2)$ operations, where $m$ is the number of isolated superpixels. The merging step needs $O(m \cdot n)$ operations, where $m$ is the number of small isolated superpixels and $n$ represents the average number of their adjacent neighbours. In practice, $m^2 + m \cdot n \ll T \cdot N$, so the operations required for the postprocessing step can be ignored. Therefore, the proposed superpixel algorithm is of linear complexity $O(N)$.

[Plots: (a) BR, (b) UE, and (c) ASA versus the number of superpixels (100 to 600), with one curve each for λ = 2, 4, 6, 8, 10.]

Figure 2: Different initializations of $\Sigma_{k,c}$. Experiments are performed on BSDS500 and results are averaged over 500 images. The results for BR, UE, and ASA are plotted in (a), (b), and (c), respectively. To show more detail, part of each plot is zoomed in. (Best viewed in colour.)

4. Experiment

In this section, algorithms are evaluated in terms of accuracy, computational efficiency, and visual effects. Like many state-of-the-art superpixel algorithms, ours uses the CIELAB colour space in the experiments because it is perceptually uniform for small colour distances.

Accuracy: three commonly used metrics are adopted: boundary recall (BR), under-segmentation error (UE), and achievable segmentation accuracy (ASA). To assess the performance of the selected algorithms, experiments are conducted on the Berkeley Segmentation Data Set and Benchmarks 500 (BSDS500), which is an extension of BSDS300. These two data sets have been widely used to evaluate superpixel algorithms. BSDS500 contains 500 images, each of size 481×321 or 321×481, with at least four ground-truth human annotations.

• BR measures the percentage of ground-truth boundaries correctly re-

covered by the superpixel boundary pixels. A true boundary pixel is

considered to be correctly recovered if it falls within two pixels from at

least one superpixel boundary. A high BR indicates that very few true

boundaries are missed.

• A superpixel should not cross a ground-truth boundary, or, in other words, it should not cover more than one object. To quantify this notion, UE calculates the percentage of superpixels that have pixels “leaking” from their covered object, as shown in equation (30).

$$\mathrm{UE} = -1 + \frac{1}{N} \sum_{|s_k \cap s_g| > \varepsilon |s_k|} |s_k|, \tag{30}$$

where $s_k$ and $s_g$ are the pixel sets of superpixel $k$ and ground-truth segment $g$. $\varepsilon = 0.05$ is generally accepted.

• If we assign every superpixel the label of the ground-truth segment into which most of its pixels fall, how much segmentation accuracy can we achieve, or how many pixels are correctly segmented? ASA is designed to answer this question. Its formula is given in equation (31), in which G is the set of ground-truth segments:

$$\mathrm{ASA} = \frac{1}{N} \sum_{k \in K} \max \{\, |s_k \cap s_g| \;\big|\; g \in G \,\}. \qquad (31)$$

Figure 3: Visual results with (a) λ = 2, (b) λ = 4, (c) λ = 6, (d) λ = 8, and (e) λ = 10. The test image is from BSDS500 and approximately 400 superpixels are extracted in each image.
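As a rough illustration of the three accuracy metrics described above, the following NumPy sketch computes BR, UE, and ASA for a pair of integer label maps. It is not the evaluation code used for the experiments. The function names are hypothetical, a single ground-truth annotation is assumed, a boundary pixel is assumed to be one whose right or lower neighbour carries a different label, and "within two pixels" is assumed to mean a square (Chebyshev) window.

```python
import numpy as np

def boundary_map(labels):
    """Mark pixels whose right or lower neighbour has a different label (assumed definition)."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return b

def dilate(mask, r):
    """Grow a boolean mask by r pixels in every direction (square window)."""
    h, w = mask.shape
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def overlap_counts(sp, gt):
    """counts[k, g] = |s_k ∩ s_g| for non-negative integer label maps sp and gt."""
    counts = np.zeros((sp.max() + 1, gt.max() + 1), dtype=np.int64)
    np.add.at(counts, (sp.ravel(), gt.ravel()), 1)
    return counts

def boundary_recall(sp, gt, tol=2):
    """Fraction of ground-truth boundary pixels within `tol` pixels of a superpixel boundary."""
    gt_b = boundary_map(gt)
    near_sp = dilate(boundary_map(sp), tol)
    return (gt_b & near_sp).sum() / max(gt_b.sum(), 1)

def undersegmentation_error(sp, gt, eps=0.05):
    """Equation (30): sum |s_k| over pairs with |s_k ∩ s_g| > eps * |s_k|, divided by N, minus 1."""
    counts = overlap_counts(sp, gt)
    sizes = counts.sum(axis=1)                 # |s_k|
    leaks = counts > eps * sizes[:, None]      # pairs (k, g) kept by the condition
    return (leaks * sizes[:, None]).sum() / sp.size - 1.0

def achievable_segmentation_accuracy(sp, gt):
    """Equation (31): each superpixel takes the ground-truth label it overlaps most."""
    return overlap_counts(sp, gt).max(axis=1).sum() / sp.size
```

With a segmentation identical to the ground truth, these functions return BR = 1, UE = 0, and ASA = 1. A single superpixel covering two equally sized objects gives UE = 1 and ASA = 0.5.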

Computational efficiency : execution time is used to quantify this prop-

erty.

4.1. Effect of parameters

As mentioned in section 3.4, the effects of εc and of the initialization

of Σk,c are discussed in this section.

Each Σk,c is initialized to a diagonal matrix with the same value λ² on its main diagonal. As shown in Fig. 2, the results show no obvious trend with respect to λ. In Fig. 2, the maximum difference between two lines is around 0.001 to 0.006, which is very small. Although a small λ appears to give a better BR result, this does not hold for UE and ASA. For instance, in the enlarged region of Fig. 2(b), the result of λ = 10 is slightly better than that of λ = 6. Visual results with different λ are shown in Fig. 3; it is hard for a human to tell the five results apart.
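The initialization described above can be sketched in a few lines of NumPy. This is only an illustration. The function name and the default dimension of 3 (one entry per CIELAB channel) are assumptions, not details taken from the implementation.

```python
import numpy as np

def init_colour_covariances(num_superpixels, lam, dim=3):
    """Return an array of shape (K, dim, dim) with lam**2 on every main diagonal.

    dim=3 is an assumption (one entry per CIELAB channel).
    """
    return np.tile(np.eye(dim) * lam ** 2, (num_superpixels, 1, 1))

# Example: covariances for 400 superpixels with lambda = 8.
covariances = init_colour_covariances(400, 8.0)
```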

εc can be used to control the regularity of the generated superpixels in each iteration. As shown in Fig. 4, small differences in εc do not produce obvious variation in UE and ASA, but they do affect the results of BR. In general, a larger εc leads to more regular superpixels. Conversely, the shape of superpixels generated with a smaller εc is relatively irregular (see Fig. 5).

Figure 4: Results with different εc (curves for εc = 2, 4, 6, 8, and 10). Experiments are performed on BSDS500 and results are averaged over 500 images. The results of BR, UE, and ASA are plotted in (a), (b), and (c), respectively. To show more detail, part of each plot is zoomed in. (Best viewed in colour.)

Figure 5: Visual results with (a) εc = 2, (b) εc = 4, (c) εc = 6, (d) εc = 8, and (e) εc = 10. The test image is from BSDS500 and approximately 400 superpixels are extracted in each image. The second row is enlarged from the rectangle marked in the first row.

Because superpixels with irregular shape will produce more boundary pixels,

the result of BR with small εc is better than that with greater εc.

We will use λ = 8 and εc = 8 in the following experiments. Although

this setting does not give the best performance in accuracy, the shape of

superpixels using this setting is regular and visually pleasant (see Fig. 5(d)).

Moreover, it is enough to outperform state-of-the-art algorithms as shown in

Fig. 6.

4.2. Parallel scalability

To evaluate how run-time scales with the number of processors, we test our implementation on a machine equipped with an Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz and 8 GB RAM. The source code is not optimized for any specific architecture. Only two OpenMP directives are added for the updating of Σk, µk, and R, as they can be computed independently (see section 3.4). As listed in Table 1, for a given image, the run-time decreases as more cores are used.

Table 1: Run-time (ms) of our implementation on images of various resolutions. The program is executed using 1, 2, 4, and 6 cores.

Resolution   1 core     2 cores    4 cores    6 cores
240×320      393.646    303.821    227.078    200.708
320×480      776.586    589.785    400.073    321.548
480×640      1569.74    1011.62    743.629    624.561
640×960      3186.71    2244.12    1353.72    1069.79
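To make the scaling in Table 1 easier to read, the short script below converts the run-times into speedup (single-core time divided by multi-core time) and parallel efficiency (speedup divided by the number of cores). The numbers are copied from Table 1. The variable and function names are only illustrative.

```python
# Run-times in milliseconds copied from Table 1 (columns: 1, 2, 4, and 6 cores).
CORES = [1, 2, 4, 6]
RUNTIME_MS = {
    "240x320": [393.646, 303.821, 227.078, 200.708],
    "320x480": [776.586, 589.785, 400.073, 321.548],
    "480x640": [1569.74, 1011.62, 743.629, 624.561],
    "640x960": [3186.71, 2244.12, 1353.72, 1069.79],
}

def speedup_and_efficiency(times, cores):
    """Return (speedup, efficiency) pairs relative to the first entry of `times`."""
    base = times[0]
    return [(base / t, base / (t * p)) for t, p in zip(times, cores)]

for resolution, times in RUNTIME_MS.items():
    cells = [
        f"{p} cores: {s:.2f}x ({e:.0%})"
        for p, (s, e) in zip(CORES, speedup_and_efficiency(times, CORES))
    ]
    print(resolution, "|", ", ".join(cells))
```

On these figures the six-core speedup grows from roughly 2.0× on the smallest image to roughly 3.0× on the largest, so larger images benefit more from the additional cores, although the speedup stays well below the ideal 6×.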

4.3. Comparison with state-of-the-art algorithms

We compare the proposed algorithm to eight state-of-the-art superpixel segmentation algorithms: LSC¹ [8], SLIC² [12], SEEDS³ [6], ERS⁴ [7], TurboPixels⁵ [14], LRW⁶ [10], VCells⁷ [5], and Waterpixels⁸ [13]. The results of the eight algorithms are all generated from implementations provided by the authors on their respective websites, with their default parameters except for the desired number of superpixels, which is generally decided by users.

¹ http://jschenthu.weebly.com/projects.html
² http://ivrl.epfl.ch/research/superpixels
³ http://www.mvdblive.org/seeds/
⁴ https://github.com/mingyuliutw/ers

As shown in Fig. 6, our method outperforms the selected state-of-the-art algorithms, especially for UE and ASA. It is not easy to distinguish between our result and that of LSC in Fig. 6(a). However, if we use εc = 2, our result clearly outperforms LSC, as displayed in Fig. 7.

To compare the run-time of the selected algorithms, we test them on a desktop machine equipped with an Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz and 8 GB RAM. The results are plotted in Fig. 8. According to Fig. 8(b), as the size of the input image increases, the run-time of our algorithm grows linearly, which experimentally supports the claim that our algorithm has linear complexity.
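One simple way to check such a claim numerically is to fit a straight line to log(run-time) against log(number of pixels); a slope near 1 indicates linear growth. The sketch below does this with the single-core column of Table 1, because the raw numbers behind Fig. 8 are not listed in the text. Table 1 was measured on a different machine, so this is an illustration of the check rather than a reproduction of Fig. 8(b). The function name is hypothetical.

```python
import numpy as np

def scaling_exponent(num_pixels, runtimes):
    """Least-squares slope of log(runtime) against log(num_pixels)."""
    slope, _intercept = np.polyfit(np.log(num_pixels), np.log(runtimes), 1)
    return float(slope)

# Single-core run-times (ms) copied from Table 1, with the matching pixel counts.
num_pixels = np.array([240 * 320, 320 * 480, 480 * 640, 640 * 960], dtype=float)
runtime_ms = np.array([393.646, 776.586, 1569.74, 3186.71])

exponent = scaling_exponent(num_pixels, runtime_ms)
print(f"fitted exponent: {exponent:.3f}")
```

For these four measurements the fitted exponent is close to 1, which is what linear growth looks like on a log-log scale. Four points from one machine are only a weak check, so the result should be read together with the complexity analysis.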

A visual comparison is displayed in Fig. 9. According to the zooms, only our algorithm correctly reveals the segmentations. Our superpixel boundaries adhere to object boundaries very well. LSC gives a really competitive result; however, parts of the objects are still under-segmented. The superpixels extracted by SEEDS and ERS are very irregular and their sizes vary tremendously. The remaining five algorithms generate regular superpixels, but they adhere to object boundaries poorly.

⁵ http://www.cs.toronto.edu/~babalex/research.html
⁶ https://github.com/shenjianbing/lrw14
⁷ http://www-personal.umich.edu/~jwangumi/software.html
⁸ http://cmm.ensmp.fr/~machairas/waterpixels.html

Figure 6: Comparison with state-of-the-art algorithms (legend: Ours, LSC, SEEDS, ERS, VCells, LRW, SLIC, Water, TP). Experiments are performed on BSDS500 and results are averaged over 500 images. The results of BR, UE, and ASA are plotted in (a), (b), and (c), respectively.

5. Conclusion

This paper presents an efficient superpixel segmentation algorithm based on a novel Gaussian mixture model. With each superpixel associated with a Gaussian density, each pixel is assumed to be independently distributed according to a mixture of the Gaussian densities. To extract superpixels of similar size, the Gaussian densities are assumed to occur with equal probability. We formulate a log-likelihood function to describe the probability of an image. Based on Jensen’s inequality and the expectation-maximization algorithm, an iterative solution is constructed to approach a maximum of the log-likelihood by improving its lower bound. The label of each pixel is set to the one with maximum posterior probability. The carefully designed algorithm also provides a way to control the shape of superpixels.

According to our experiments, different initializations of our method produce results whose differences are small enough to be ignored. The proposed algorithm is of linear complexity, which is supported by both theoretical analysis and experimental results. Moreover, it can be implemented using parallel techniques, and its run-time decreases as the number of processors increases. The comparison with state-of-the-art algorithms shows that our algorithm outperforms the selected methods in accuracy and presents a competitive performance in computational efficiency.

Figure 7: Comparison of BR between LSC and our method. Without changing the default values of the other parameters, we use εc = 2 in this figure.

Figure 8: Comparison of run-time. Seven algorithms are compared in (a). To show more detail, the run-time of the fastest four algorithms (Ours, LSC, SEEDS, SLIC) is plotted in (b). LRW is not included in the two figures due to its slow speed. (Horizontal axis: image size, from 229×229 to 999×999; vertical axis: execution time in ms.)

Figure 9: Visual comparison. The test image is selected from BSDS500. Each algorithm extracts approximately 200 superpixels. For each segmentation, four parts are enlarged to display more details. (Panels are labelled Ours, LSC, SEEDS, ERS, VCells, LRW, SLIC, Water, and TP.)

As a contribution to the open-source community, we will make our test code publicly available at https://github.com/ahban.

References

[1] Z. Li, X.-M. Wu, S.-F. Chang, Segmentation using superpixels: A bi-

partite graph partitioning approach, in: CVPR, 2012, pp. 789–796.

[2] F. Yang, H. Lu, M.-H. Yang, Robust superpixel tracking, IP 23 (4)

(2014) 1639–1651.

[3] F. Cheng, H. Zhang, M. Sun, D. Yuan, Cross-trees, edge and superpixel

priors-based cost aggregation for stereo matching, PR 48 (7) (2015) 2269–2278.

[4] X. Sun, K. Shang, D. Ming, J. Tian, J. Ma, A biologically-inspired

framework for contour detection using superpixel-based candidates and

hierarchical visual cues, Sensors 15 (10) (2015) 26654–26674.

[5] J. Wang, X. Wang, VCells: Simple and efficient superpixels using edge-

weighted centroidal voronoi tessellations, PAMI 34 (6) (2012) 1241–1247.

[6] M. Van den Bergh, X. Boix, G. Roig, L. Van Gool, SEEDS: Superpixels

extracted via energy-driven sampling, IJCV 111 (3) (2015) 298–314.

[7] M.-Y. Liu, O. Tuzel, S. Ramalingam, R. Chellappa, Entropy rate super-

pixel segmentation, in: CVPR, 2011, pp. 2097–2104.

[8] Z. Li, J. Chen, Superpixel segmentation using linear spectral clustering,

in: CVPR, 2015, pp. 1356–1363.

[9] L. Duan, F. Lafarge, Image partitioning into convex polygons, in:

CVPR, 2015, pp. 3119–3127.

[10] J. Shen, Y. Du, W. Wang, X. Li, Lazy random walks for superpixel

segmentation, IP 23 (4) (2014) 1451–1462.

[11] A. P. Moore, S. Prince, J. Warrell, U. Mohammed, G. Jones, Superpixel

lattices, in: CVPR, 2008, pp. 1–8.

[12] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, S. Susstrunk, SLIC

superpixels compared to state-of-the-art superpixel methods, PAMI

34 (11) (2012) 2274–2282.

[13] V. Machairas, M. Faessel, D. Cardenas-Pena, T. Chabardes, T. Walter,

E. Decenciere, Waterpixels, IP 24 (11) (2015) 3707–3716.

[14] A. Levinshtein, A. Stere, K. N. Kutulakos, D. J. Fleet, S. J. Dickinson,

K. Siddiqi, TurboPixels: Fast superpixels using geometric flows, PAMI

31 (12) (2009) 2290–2297.

[15] J. Shi, J. Malik, Normalized cuts and image segmentation, PAMI 22 (8)

(2000) 888–905.

[16] G. Mori, Guiding model search using segmentation, in: ICCV, 2005, pp.

1417–1423.

[17] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from

incomplete data via the EM algorithm, Journal of the Royal Statistical

Society. Series B (Methodological) (1977) 1–38.

[18] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and

hierarchical image segmentation, PAMI 33 (5) (2011) 898–916.

[19] X. Ren, J. Malik, Learning a classification model for segmentation, in:

ICCV, 2003, pp. 10–17.

[20] C. Rohkohl, K. Engel, Efficient image segmentation using pairwise pixel

similarities, in: Joint Pattern Recognition Symposium, 2007, pp. 254–

263.

[21] S. Avidan, A. Shamir, Seam carving for content-aware image resizing,

TOG 26 (3).

[22] S. Zhu, D. Cao, S. Jiang, Y. Wu, P. Hu, Fast superpixel segmentation

by iterative edge refinement, EL 51 (3) (2015) 230–232.

[23] A. P. Moore, S. J. Prince, J. Warrell, “lattice cut”-constructing super-

pixels using layer constraints, in: CVPR, 2010, pp. 2117–2124.

[24] H. Fu, X. Cao, D. Tang, Y. Han, D. Xu, Regularity preserved superpixels

and supervoxels, MM 16 (4) (2014) 1165–1175.

[25] D. Tang, H. Fu, X. Cao, Topology preserved regular superpixel, in:

ICME, 2012, pp. 765–768.

[26] P. Siva, A. Wong, Grid seams: A fast superpixel algorithm for real-time

applications, in: CRV, 2014, pp. 127–134.

[27] P. Siva, C. Scharfenberger, I. B. Daya, A. Mishra, A. Wong, Return of

grid seams: A superpixel algorithm using discontinuous multi-functional

energy seam carving, in: ICIP, 2015, pp. 1334–1338.

[28] M. Van den Bergh, X. Boix, G. Roig, B. de Capitani, L. Van Gool,

SEEDS: Superpixels extracted via energy-driven sampling, in: ECCV,

2012, pp. 13–26.

[29] K. Yamaguchi, D. McAllester, R. Urtasun, Efficient joint segmentation,

occlusion labeling, stereo and flow estimation, in: ECCV, 2014, pp.

756–771.

[30] J. Yao, M. Boben, S. Fidler, R. Urtasun, Real-time coarse-to-fine topo-

logically preserving segmentation, in: CVPR, 2015, pp. 2947–2955.

[31] L. Li, J. Yao, J. Tu, X. Lu, K. Li, Y. Liu, Edge-based split-and-merge

superpixel segmentation, in: ICIA, 2015, pp. 970–975.

[32] D. R. Martin, C. C. Fowlkes, J. Malik, Learning to detect natural image

boundaries using local brightness, color, and texture cues, PAMI 26 (5)

(2004) 530–549.

[33] A. Vedaldi, S. Soatto, Quick shift and kernel methods for mode seeking,

in: ECCV, 2008, pp. 705–718.

[34] O. Veksler, Y. Boykov, P. Mehrani, Superpixels and supervoxels in an

energy optimization framework, in: ECCV, 2010, pp. 211–224.

[35] Y. Boykov, O. Veksler, R. Zabih, Fast approximate energy minimization

via graph cuts, PAMI 23 (11) (2001) 1222–1239.

[36] Y. Zhang, R. Hartley, J. Mashford, S. Burn, Superpixels via pseudo-

boolean optimization, in: ICCV, 2011, pp. 1387–1394.

[37] P. Neubert, P. Protzel, Compact watershed and preemptive SLIC: On

improving trade-offs of superpixel segmentation algorithms, in: ICPR,

2014, pp. 996–1001.

[38] Y. Kesavan, A. Ramanan, One-pass clustering superpixels, in: ICIAfS,

2014, pp. 1–5.

[39] C. Y. Ren, V. A. Prisacariu, I. D. Reid, gSLICr: SLIC superpixels at

over 250Hz, arXiv e-prints, arXiv:1509.04232.

[40] S. Jia, S. Geng, Y. Gu, J. Yang, P. Shi, Y. Qiao, NSLIC: SLIC super-

pixels based on nonstationarity measure, in: ICIP, 2015, pp. 4738–4742.

[41] P. Felzenszwalb, D. Huttenlocher, Efficient graph-based image segmen-

tation, IJCV 59 (2) (2004) 167–181.

[42] D. Comaniciu, P. Meer, Mean shift: A robust approach toward feature

space analysis, PAMI 24 (5) (2002) 603–619.

[43] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm

based on immersion simulations, PAMI 13 (6) (1991) 583–598.

[44] Y.-J. Liu, C.-C. Yu, M.-J. Yu, Y. He, Manifold SLIC: A fast method to

compute content-sensitive superpixels, in: CVPR, 2016, pp. 651–659.

[45] P. Wang, G. Zeng, R. Gan, J. Wang, H. Zha, Structure-sensitive super-

pixels via geodesic distance, IJCV 103 (1) (2013) 1–21.
