
Integration of Multiple Contextual Information for Image Segmentation using a Bayesian Network

Lei Zhang and Qiang Ji
Rensselaer Polytechnic Institute

110 8th St., Troy, NY 12180
[email protected], [email protected]

Abstract

We propose a Bayesian Network (BN) model to integrate multiple contextual cues and image measurements for image segmentation. The BN model systematically encodes the contextual relationships between regions, edges, and vertices, as well as their image measurements, with uncertainties. It allows principled probabilistic inference to be performed, so that image segmentation can be achieved through most probable explanation (MPE) inference in the BN model. We have achieved encouraging results on the horse images from the Weizmann dataset. We also demonstrate ways to extend the BN model to incorporate other contextual information, such as the global object shape and human intervention, for improving image segmentation. Human intervention is encoded as new evidence in the BN model; its impact is propagated through belief propagation to update the states of the whole model, and a new segmentation is produced from the updated model.

1. Introduction

Image segmentation is a fundamental low-level problem in computer vision. Despite the significant advances brought by various proposed approaches, image segmentation remains unsolved, in part because of the varying complexity of different images. Segmentation approaches that depend only on image data may fail in the presence of low-contrast edges, noise, occlusion, clutter, etc. In such situations, the image data simply cannot provide enough information to discriminate the different parts of the image. Incorporating various contextual information can help reduce the ambiguity in separating different parts.

Many image segmentation approaches try to incorporate certain kinds of contextual knowledge besides the image data to improve segmentation. Both active contour models [10][3] and the Mumford-Shah functional [12] model a general smoothness constraint. Statistical shape information has been added to active contours [9]. Chan et al. [4] incorporate shape prior knowledge into the Mumford-Shah functional. A recent development of the watershed algorithm [13] exploits region information to grow the region contour and incorporates shape prior knowledge into watershed segmentation. Ren et al. [16] integrate multiple cues for image segmentation using a Conditional Random Field. Although these approaches incorporate certain contextual information, it is still hard to exploit other knowledge, such as connectivity and compatibility, in the same way that shape knowledge is incorporated. In addition, they cannot easily incorporate another important source of information: human intervention.

To conveniently incorporate various contextual information, we need a framework that can systematically integrate different contextual cues. A desirable image segmentation framework should flexibly integrate various types of information and constraints, and solve image segmentation in a probabilistic way. Graphical models [8, 14] from the machine learning community are powerful statistical tools capable of satisfying all these requirements: they provide an effective way to model various types of contextual information, and they have been successfully applied to several computer vision problems [18][21][5][11].

The Bayesian Network is a powerful probabilistic graphical model that has been applied in computer vision. In the early 1990s, Sarkar et al. [18] applied BNs to group low-level edge segments for high-level image understanding. Westling et al. [20] applied BNs to the interpretation of complex scenes. Alvarado et al. [1] use a BN to combine high-level cues in unsupervised image segmentation. Rodrigues et al. [17] use a BN for parameter estimation in medical image segmentation. Feng et al. [6] combine a BN with a Neural Network for scene segmentation: a trained Neural Network provides local predictions for the class labels, which are then fused with the BN prior model for segmentation. Mortensen et al. [11] use a two-layer BN for image segmentation. Given a user-input seed path, they use the


978-1-4244-2340-8/08/$25.00 ©2008 IEEE

Authorized licensed use limited to: IEEE Transactions on SMC Associate Editors. Downloaded on October 12, 2008 at 21:10 from IEEE Xplore. Restrictions apply.


minimum-path spanning tree graph search to find the most likely object boundaries.

Although these researchers have successfully applied BNs to their specific problems, many of them use only a simple BN. For complex problems, such a model may not be expressive enough, because there are many different kinds of relationships among the random variables. How to reasonably model these relationships in a BN is crucial to solving the problem.

In this paper, we propose a multi-layer Bayesian Network as a single framework that consistently incorporates various contextual information together with the image data for image segmentation. The framework fully accounts for the uncertainties in the image measurements and the contextual information. It allows consistent updating of the states of the whole model through principled probabilistic inference, and it supports inference when only partial evidence (e.g., the image measurements) is available. Newly available information can be incrementally integrated into the model as new evidence, and its impact is propagated throughout the network by belief propagation [14]. A new segmentation is obtained by finding the most probable explanation (MPE) of the updated model.

2. Overview of the Approach

We propose to construct a multi-layer Bayesian Network (BN) for image segmentation by integrating the image measurements with multiple contextual relationships. Let {E_j} denote the set of edge variables. Image segmentation can then be modeled as inferring the optimal states (on the object boundary or not) of these edge variables from various measurements, subject to constraints. These constraints encode the multiple contextual cues available for image segmentation.

We use a BN to explicitly model the relationships among regions, edges, vertices, their image measurements, and various constraints, so that the joint probability of the random variables can be factorized into conditional probabilities that are easier to compute. The BN provides a probabilistic mechanism that systematically incorporates image observations, constraints, and human intervention to affect the states of the whole model through belief propagation.

Given the updated belief for each node, probabilistic inference is performed to find the most probable explanation (MPE) of the edge nodes. The MPE identifies the most probable states of the edge nodes that best explain the observed image measurements and satisfy the imposed constraints. In the MPE result, the edge segments with true states form the final segmentation.

3. Construction of the Bayesian Network

A Bayesian Network (BN) [8, 14] is a directed acyclic graph (DAG) that consists of a set of variables and a set of directed links between variables. It conveniently models the joint probability distribution of these variables. Our BN model has a hierarchical structure that encodes different contextual relationships. Each layer is described in the following sections.

3.1. Regions, Edges, Vertices and their Relations

We construct the BN based on an over-segmented edge map. Figure 1(a) shows a synthetic edge map for illustration. The map consists of superpixel regions (i.e., the regional blobs), edge segments, and vertices. Figure 1(b) is the corresponding multi-layer BN that models the contextual relationships among the superpixel regions {Y_i}_{i=1}^{L}, the edge segments {E_j}_{j=1}^{N}, and the vertices {V_k}_{k=1}^{S}. Specifically, the region layer contains all superpixel regions, the edge layer contains all edge segments, and the vertex layer contains all vertices. A vertex is the place where three or more edges intersect.

The parents of an edge node are the two regions adjacent to this edge. The regions provide contextual information for judging whether the edge is on the object boundary: if the regions have different labels, it is more likely that there is a true object boundary between them, i.e., E_j = 1. The edge nodes and the vertex nodes are causally linked. The parents of a vertex node are the edge segments that intersect at this vertex; the links between them encode a competitive contextual relationship among the related edges (see Section 3.2 for details).
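As a rough sketch of how this layered structure could be assembled from an over-segmented edge map, the following illustration (our own, not the paper's code; the dictionary layout and function name are assumptions) maps each edge node to its two adjacent regions and each vertex node to its intersecting edges:

```python
# Illustrative sketch of the multi-layer BN structure: region -> edge -> vertex.
# Each edge node's parents are its two adjacent regions; each vertex node's
# parents are the edge segments that intersect at it.

def build_structure(edge_to_regions, vertex_to_edges):
    """Map every edge/vertex node name to its list of parent node names."""
    parents = {}
    for edge, (r_s, r_t) in edge_to_regions.items():
        parents[edge] = [r_s, r_t]          # region layer -> edge layer
    for vertex, edges in vertex_to_edges.items():
        # A vertex is where three or more edge segments intersect.
        assert len(edges) >= 3
        parents[vertex] = list(edges)       # edge layer -> vertex layer
    return parents
```

A small usage example: `build_structure({"E1": ("Y1", "Y2")}, {"V1": ["E1", "E2", "E3"]})` yields parent lists for one edge and one vertex.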

Each edge node is a binary node whose true state means that the edge segment belongs to the object boundary. The vertex nodes also assume binary values (true or false) and are used to encode the competitive relationships among the edges intersecting at each vertex. The region nodes assume binary labels (foreground or background) as well.

The region nodes, the edge nodes, and the vertex nodes all have image measurements. The measurement of a region can be any feature vector extracted from the statistics of the superpixel region; in this work, we use the average CIELAB color as the region features. The conditional probability P(M_{Y_i} | Y_i) is modeled as a Mixture of Gaussians (MOG), learned from the training data.

The conditional probability P(E_j | pa(E_j)) models the contextual relationship between the region labeling and the edge state, where pa(E_j) denotes the parent nodes of E_j. In general, the edge E_j is more likely to be a true object boundary when its parents are assigned different labels. However, this relationship itself has uncertainty. We encode the relationship and its uncertainty by defining the conditional probability P(E_j | pa(E_j)) as

P(E_j = 1 | pa(E_j)) = 0.8, if the parent nodes have different labels;
                       0.2, otherwise.        (1)
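Read as code, this CPT is a two-entry lookup on the parent labels; a minimal sketch (the function name is ours, the 0.8/0.2 values are the paper's):

```python
def p_edge_on_boundary(label_s: int, label_t: int) -> float:
    """P(E_j = 1 | Y_s, Y_t): a boundary is likelier between regions
    whose labels disagree (0.8) than between same-label regions (0.2)."""
    return 0.8 if label_s != label_t else 0.2
```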



Figure 1. A synthetic edge map and the BN that models the superpixel regions, the edge segments, the vertices, and their measurements: (a) the synthetic edge map; (b) the corresponding basic BN model. The shaded circles represent the measurement nodes.

The average intensity gradient magnitude is utilized as the measurement for each edge segment. The measurement of the edge node E_j is denoted as M_{E_j}. The measurement nodes are continuous nodes. The conditional probability P(M_{E_j} | E_j) is parameterized using Gaussian distributions, also learned from the training data.

Similarly, each vertex node is associated with a measurement. The M_{V_k} node in Figure 1(b) is the measurement of a vertex V_k. We use the Harris corner detector [7] to calculate the measurement. The vertex measurement M_{V_k} is currently discretized according to the corner response calculated from the Harris corner detector: if the corner response is above a threshold (fixed at 1000) and is a local maximum, a corner is detected and the measurement M_{V_k} becomes true; if no corner is detected, the measurement M_{V_k} becomes false.

The conditional probability that quantifies the statistical relationship between V_k and M_{V_k} can be modeled as

P(M_{V_k} = 1 | V_k = 1) = 0.99
P(M_{V_k} = 1 | V_k = 0) = 0.1        (2)

This definition basically means that the measurement uncertainty is low. These numbers may vary for different applications, depending on the quality of the corner detector used.
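The discretization rule and the Eq. (2) entries can be sketched together as follows (function names are ours; the threshold 1000 and the 0.99/0.1 entries are from the paper):

```python
def vertex_measurement(harris_response: float, is_local_max: bool,
                       threshold: float = 1000.0) -> bool:
    """Discretize the Harris corner response into the binary M_Vk."""
    return is_local_max and harris_response > threshold

def p_m_given_v(m: bool, v: bool) -> float:
    """P(M_Vk | V_k) with the low-uncertainty entries of Eq. (2)."""
    p_m1 = 0.99 if v else 0.1    # P(M_Vk = 1 | V_k)
    return p_m1 if m else 1.0 - p_m1
```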

3.2. Smoothness and Connectivity Constraints

We encode two additional contextual relationships among intersecting edges in the BN model. One concerns the smoothness of the connection between edges; the other concerns the competition among the edges intersecting at a vertex to be the object boundary.

Figure 2. The BN model with angular nodes to impose the local smoothness constraint

The boundary of a natural object is normally smooth. We incorporate the smoothness constraint by penalizing sharp corners between connecting edges, where a sharp corner is defined as an angle between two edges that is less than a threshold. To impose this constraint, a new angular node θ_{ij} is introduced to model the relationship between two edges E_i and E_j. It is a binary node whose true state means that the local smoothness constraint is violated by these two edges. The relationships among the edge nodes, the angular nodes, and their measurements are illustrated in Figure 2. The conditional probability table (CPT) between an angular node and its measurement is defined similarly to Eq. (2).

The measurement M_{θ_{ij}} is currently discretized according to the threshold π/6: if the angle is smaller than π/6, the measurement M_{θ_{ij}} becomes 1 (true). To enforce the smoothness constraint, a CPT is defined to specify the relationship between θ_{ij} and E_i, E_j:

P(θ_{ij} = 1 | E_i, E_j) = 0.2, if both E_i and E_j are true;
                           0.5, otherwise.        (3)

This CPT definition effectively reduces the probability that a pair of edge segments are both on the object boundary when they form a sharp corner (i.e., θ_{ij} = 1).
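The angle test and the Eq. (3) entries can be sketched as follows (edge directions as 2-D vectors are our own illustration; the π/6 threshold and 0.2/0.5 values are from the paper):

```python
import math

def angle_between(v1, v2) -> float:
    """Angle in radians between two edge direction vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos)))

def sharpness_measurement(v1, v2, threshold: float = math.pi / 6) -> bool:
    """M_theta = 1 when the two edges form a sharp corner (angle < pi/6)."""
    return angle_between(v1, v2) < threshold

def p_theta_true(e_i: bool, e_j: bool) -> float:
    """P(theta_ij = 1 | E_i, E_j) from Eq. (3): penalize both edges being
    boundary across a sharp corner."""
    return 0.2 if (e_i and e_j) else 0.5
```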

On the other hand, the boundary of an object generally should be simply connected, i.e., a boundary edge segment should connect with at most one other boundary edge segment at each of its endpoints. When multiple edges intersect at one vertex, these edges compete with each other to become the object boundary. This type of contextual relationship is encoded by defining a CPT between the edge nodes and the related vertex node as follows:

P(V_k = 1 | pa(V_k)) = 1,   if exactly two parent nodes are 1;
                       0.3, if none of the parent nodes is 1;
                       0,   otherwise.        (4)

where pa(V_k) denotes the parent E nodes of the V_k node. We set the entry 0.3 because it is also possible that none of the parent edge segments is a true boundary; for example, vertices may be detected in the edge map of the background. However, the conditional probability for such a case shall be smaller than for the case in which exactly two parent E nodes are true. With the above CPT definition, if V_k is 1 (true), it is most likely that exactly two parent edge segments are object boundaries. The connectivity constraint is therefore imposed.
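Eq. (4) is a function of the count of true parent edges; a minimal sketch (function name ours, table entries from the paper):

```python
def p_vertex_true(parent_edge_states) -> float:
    """P(V_k = 1 | pa(V_k)) from Eq. (4): simple connectivity favors
    exactly two boundary edges meeting at a vertex."""
    n_true = sum(1 for e in parent_edge_states if e)
    if n_true == 2:
        return 1.0
    if n_true == 0:
        return 0.3
    return 0.0
```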


Figure 3. The complete BN model for the example in Figure 1(a)

3.3. The Complete Bayesian Network

Given the BN components that model different parts of the segmentation problem, we combine them to form the complete BN model for image segmentation. The complete BN for the example in Figure 1(a) is shown in Figure 3. The Y nodes represent the regions, the E nodes represent the edges, the θ nodes impose the smoothness constraint on edges, and the V nodes impose the connectivity constraint on edges. The M_Y, M_E, M_θ, and M_V nodes are, respectively, the measurements of regions, edges, angles, and vertices. All nodes are binary except the measurement nodes M_Y and M_E.

Given the BN model and its parameters, image segmentation aims to infer the states of the E nodes based on the various image measurements and contextual relationships. With the BN and the available evidence, image segmentation can be achieved by searching for the most probable explanation (MPE) of the BN, i.e., the inference result that best matches the evidence. Specifically, we can calculate the joint probability of all nodes from the BN as follows:

P(Y, M_Y, E, M_E, θ, M_θ, V, M_V) =
    ∏_{i=1}^{L} [ P(Y_i) P(M_{Y_i} | Y_i) ]
  · ∏_{j=1}^{N} [ P(E_j | pa(E_j)) P(M_{E_j} | E_j) ]
  · ∏_{i=1}^{N} ∏_{j ∈ Ω_i} [ P(θ_{ij} | E_i, E_j) P(M_{θ_{ij}} | θ_{ij}) ]
  · ∏_{k=1}^{S} [ P(V_k | pa(V_k)) P(M_{V_k} | V_k) ]        (5)

where P(Y_i) is the prior probability of Y_i. Since we have no further prior information about the region labeling, P(Y_i = 1) and P(Y_i = 0) are both set to 0.5, which means there is no bias for the region label of Y_i. Ω_i denotes the edges that intersect with the edge E_i. L is the total number of regions, N is the total number of edge nodes, and S is the number of vertex nodes. The factorization of the joint probability is based on the conditional independence relationships among the nodes, which are implied by the BN structure. The factorized probability components are already available as discussed above.

Given the measurements of regions, edges, vertices, and angles, the most probable states of all hidden variables can be inferred by MPE inference using the Junction Tree algorithm [8], i.e.,

E*, Y*, θ*, V* = arg max_{E, Y, θ, V} P(Y, M_Y, E, M_E, θ, M_θ, V, M_V)        (6)

where the joint probability is calculated by Eq. (5). In the MPE result, the edges with true states form the object boundary.
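To make the MPE objective concrete, the following toy sketch enumerates all configurations of a minimal network: two regions Y1, Y2 and one edge E between them, with one edge measurement observed as "strong". The paper performs exact MPE with the Junction Tree algorithm; brute-force enumeration below only illustrates the same argmax, and the edge-measurement likelihoods (0.7/0.3) are our illustrative assumptions, not values from the paper.

```python
from itertools import product

def p_edge(e, y1, y2):
    """P(E | Y1, Y2) using the CPT of Eq. (1)."""
    p1 = 0.8 if y1 != y2 else 0.2
    return p1 if e else 1.0 - p1

def p_strong_measurement(e):
    """Assumed likelihood of observing a strong gradient given the edge state."""
    return 0.7 if e else 0.3

def mpe():
    """Enumerate all (Y1, Y2, E) configurations; return the most probable."""
    best, best_p = None, -1.0
    for y1, y2, e in product([0, 1], repeat=3):
        # P(Y1) P(Y2) P(E | Y1, Y2) P(ME | E), with uniform region priors.
        p = 0.5 * 0.5 * p_edge(e, y1, y2) * p_strong_measurement(e)
        if p > best_p:
            best, best_p = (y1, y2, e), p
    return best, best_p
```

With a strong gradient observed, the MPE assigns the two regions different labels and turns the edge on (joint probability 0.25 · 0.8 · 0.7 = 0.14), which is exactly the "best explanation of the evidence" behavior described above.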

4. Experiments

We have tested our model for automatic segmentation on 60 images from the Weizmann horse dataset [2]. The Weizmann dataset includes side views of many horses with different appearances and poses, which makes segmenting these horse images challenging. To learn the Mixture-of-Gaussians distributions of the region measurements M_Y, we collected 60 horse images from the web. We use the Edgeflow-based anisotropic diffusion software [19] to generate the over-segmented edge map of each image. The superpixel regions, the edge segments, and the vertices are automatically extracted from this edge map to construct the BN. For training, we collect the average CIELAB color of each region as the region measurements M_Y and learn the likelihood of these measurements by Mixture-of-Gaussians fitting. We also collect the average intensity gradient magnitude of the edge segments from the training data and learn the likelihood of the edge measurements M_E.

The other conditional probabilities are preset and fixed in all our experiments (see Section 3). We set these parameters empirically for several reasons. First, we can directly define these CPTs according to their conceptual meaning. Second, previous work [15] shows that the performance of a BN is not very sensitive to accurate parameter settings. Third, we varied some CPTs within a range of ±10% to ±20%, and the segmentation results did not change much, which accords with the observations in [15]. Fourth, this parametrization allows us to apply the model to segmenting other images without much re-parametrization.

During testing, given all the image measurements and the parameterized BN model, image segmentation is performed using the process described in Section 3.3. Figure 4 shows some example segmentation results on the horse images. By visual inspection, we achieved encouraging results. The errors mainly come from appearance changes and from cluttering background objects (e.g., shadows) that have appearances similar to the horses. To quantitatively evaluate the segmentation performance, we calculated the percentage of correctly labeled pixels: about 93.7% of pixels are correctly labeled in our test set.
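The reported metric is a straightforward per-pixel accuracy over the binary label masks; a minimal sketch (function name ours):

```python
def pixel_accuracy(pred, truth) -> float:
    """Fraction of pixels whose predicted foreground/background label
    matches the ground-truth label, over flattened masks."""
    assert len(pred) == len(truth) and len(pred) > 0
    return sum(1 for p, t in zip(pred, truth) if p == t) / len(pred)
```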


Figure 4. Examples of the image segmentation results, arranged in 2 groups of 3 rows. The first row includes the test images, the second row the corresponding over-segmented edge maps, and the third row the segmentation masks produced by the proposed approach.

Figure 5. The BN model with the global shape constraint

5. Extension by Integrating the Global Shape or Human Intervention

All the contextual relationships discussed so far are local relationships; they do not necessarily ensure a globally consistent segmentation. In addition, the global shape of the object is sometimes available. Such contextual information can also be incorporated into the BN segmentation model.

Assuming the global shape of the object is known, a global shape constraint is encoded into the BN model. The links between the (G, ℜ) node and the E nodes in Figure 5 represent such a constraint. G is a binary variable; it is true when the global shape is available. ℜ represents the transformation parameters of the global reference shape Ψ, which include a translation T, a rotation angle α, and a scaling factor s. ℜ is discretized into a set of possible configurations and is estimated simultaneously during the image segmentation process. For simplicity, a brute-force search over the parameters ℜ is used.

In order to incorporate the impact of the global shape on the conditional probability of each edge segment, a distance transform is performed. The Euclidean distance transform calculates the minimum distance of each pixel to the global reference shape Ψ. We calculate the mean distance d_j of the edge segment E_j to the global reference shape. If E_j is close to the global reference shape (i.e., small d_j), then it has a high probability of being on the object boundary. The conditional probability of the E_j node is defined as follows:

P(E_j = 1 | G = 1, ℜ, pa_Y(E_j)) = f(Ψ(ℜ), E_j, pa_Y(E_j))
P(E_j = 1 | G = 0, ℜ, pa_Y(E_j)) = 0.8 δ(Y_s ≠ Y_t) + 0.2 δ(Y_s = Y_t)        (7)

where pa_Y(E_j) = (Y_s, Y_t) denotes the parent region nodes of E_j and δ(·) is the indicator (Kronecker delta) function. The function f(Ψ(ℜ), E_j, pa_Y(E_j)) determines the conditional probability based on the distance of the edge segment E_j to the transformed global reference shape Ψ and the states of (E_j, pa_Y(E_j)). It is simply defined as an exponentially decreasing function of the distance d_j: the larger the distance d_j, the smaller the probability that the edge E_j is on the object boundary. Note that the term P(E_j = 1 | G = 0, ℜ, pa_Y(E_j)) reduces to the CPT definition in Eq. (1) when G is 0, i.e., when the global shape information is not available.

After the global shape information is encoded in the BN, the joint probability of all variables can be calculated as follows:

P(G, ℜ, Y, M_Y, E, M_E, θ, M_θ, V, M_V) =
    P(G, ℜ)
  · ∏_{i=1}^{L} [ P(Y_i) P(M_{Y_i} | Y_i) ]
  · ∏_{j=1}^{N} [ P(E_j | pa(E_j)) P(M_{E_j} | E_j) ]
  · ∏_{i=1}^{N} ∏_{j ∈ Ω_i} [ P(θ_{ij} | E_i, E_j) P(M_{θ_{ij}} | θ_{ij}) ]
  · ∏_{k=1}^{S} [ P(V_k | pa(V_k)) P(M_{V_k} | V_k) ]        (8)

where P(G, ℜ) is the prior probability of the global shape node. If the global shape information is available, P(G = 1, ℜ) is set to 1. Figure 6 illustrates the usefulness of the global shape information.

Figure 6. Image segmentation without using the global shape (left) and using the global shape (right). The middle picture shows the simple global shape of a fish.

Besides the global shape, another commonly available source of contextual information is human intervention. A human can easily provide additional information to correct the automatic segmentation, and such information can also be incorporated into our BN model. For interactive image segmentation, the human interactions are incorporated


as new evidence. Let R denote the set of variables that are set by human input, and let X denote all the hidden variables (E, Y, θ, V) excluding R. X can be inferred as follows:

X* = arg max_X P(X | R, M_Y, M_E, M_θ, M_V)
   = arg max_X P(X, R, M_Y, M_E, M_θ, M_V)        (9)

where P(X, R, M_Y, M_E, M_θ, M_V) can be calculated by Eq. (5), because (X, R) is a configuration of (E, Y, θ, V).

Figure 7 shows an example where the automatic segmentation has difficulty correctly separating the shadow from the horse. A human can click on the place designated by the arrow to indicate that the edge there should not be part of the object boundary. The new evidence is incorporated into the BN by instantiating the corresponding edge node to the false state. The impact of the new evidence on the other nodes is automatically exerted through belief propagation, which updates the states of the whole BN to produce the improved segmentation shown in Figure 7(c).
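The clamping mechanism of Eq. (9) can be sketched on the same toy two-region model: a user's click fixes the edge variable E (the evidence set R), and the argmax is taken over the remaining hidden variables X = (Y1, Y2). The measurement likelihoods (0.7/0.3) are our illustrative assumptions, not values from the paper.

```python
from itertools import product

def joint(y1, y2, e):
    """P(Y1) P(Y2) P(E | Y1, Y2) P(ME | E) for the toy network."""
    p_e1 = 0.8 if y1 != y2 else 0.2        # CPT of Eq. (1)
    p_e = p_e1 if e else 1.0 - p_e1
    p_me = 0.7 if e else 0.3               # assumed P(ME = strong | E)
    return 0.5 * 0.5 * p_e * p_me

def mpe_given_edge(e_clamped):
    """arg max over (Y1, Y2) with E fixed by the user's input."""
    return max(product([0, 1], repeat=2),
               key=lambda y: joint(y[0], y[1], e_clamped))
```

Clamping E = 0 (the user says "not a boundary") flips the MPE so that the two regions take the same label, i.e., the spurious boundary dissolves, mirroring the shadow correction in Figure 7; clamping E = 1 keeps the regions distinct.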


Figure 7. Incorporating human intervention for image segmentation: (a) the original image; (b) the automatic segmentation without human intervention, with the arrow indicating where the human will intervene; (c) the improved segmentation after the user designates that the edge pointed to by the arrow is not an object boundary.

6. Summary

In this paper, we propose a Bayesian Network framework to integrate image data with multiple contextual cues for image segmentation. The BN encodes the contextual relationships between regions, edges, and vertices. Image segmentation is achieved by MPE inference in the BN model. We have tested the proposed approach on a set of horse images from the Weizmann database and achieved encouraging results. We have also demonstrated extensions of the BN model that encode other contextual information, such as the global shape of the object and human intervention, to improve segmentation. More experiments will be performed to validate the proposed approach in the future.

References

[1] P. Alvarado, A. Berner, and S. Akyol. Combination of high-level cues in unsupervised single image segmentation using Bayesian belief networks. In Proc. of the Int. Conf. on Imaging Science, Systems and Technology, 2002.
[2] E. Borenstein and S. Ullman. Learning to segment. In ECCV, 2004.
[3] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. IJCV, 22(1):61–79, February 1997.
[4] T. Chan and W. Zhu. Level set based shape prior segmentation. Technical Report 03-66, Computational Applied Mathematics, UCLA, Los Angeles, 2003.
[5] X. Feng, C. Williams, and S. Felderhof. Combining belief networks and neural networks for scene segmentation. PAMI, 24(4):467–483, 2002.
[6] X. Feng, C. Williams, and S. Felderhof. Combining belief networks and neural networks for scene segmentation. PAMI, 24(4):467–483, 2002.
[7] C. Harris and M. Stephens. A combined corner and edge detector. In 4th Alvey Vision Conference, pages 147–152, 1988.
[8] F. V. Jensen. Bayesian Networks and Decision Graphs. Springer-Verlag, 2001.
[9] M. Leventon, W. Grimson, and O. Faugeras. Statistical shape influence in geodesic active contours. In CVPR, pages 316–323, 2000.
[10] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. IJCV, 1:321–331, 1988.
[11] E. N. Mortensen and J. Jia. Real-time semi-automatic segmentation using a Bayesian network. In CVPR, pages 1007–1014, 2006.
[12] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics, 42:577–685, 1989.
[13] H. Nguyen and Q. Ji. Improved watershed segmentation using water diffusion and local shape priors. In CVPR, pages 985–992, 2006.
[14] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[15] M. Pradhan, M. Henrion, G. M. Provan, B. D. Favero, and K. Huang. The sensitivity of belief networks to imprecise probabilities: An experimental investigation. Artificial Intelligence, 85(1–2):363–397, 1996.
[16] X. Ren, C. C. Fowlkes, and J. Malik. Cue integration in figure/ground labeling. In Advances in Neural Information Processing Systems 18, 2005.
[17] P. Rodrigues and G. Giraldi. Parameter estimation with a Bayesian network in medical image segmentation. In Computer Graphics and Imaging, 2004.
[18] S. Sarkar and K. Boyer. Integration, inference, and management of spatial information using Bayesian networks: Perceptual organization. PAMI, pages 256–274, 1993.
[19] B. Sumengen and B. S. Manjunath. Edgeflow-driven variational image segmentation: Theory and performance evaluation. Technical report. http://barissumengen.com/seg/.
[20] M. F. Westling and L. S. Davis. Interpretation of complex scenes using Bayesian networks. ACCV, 2:201–208, 1998.
[21] S. C. Zhu and A. Yuille. Region competition: Unifying snake/balloon, region growing and Bayes/MDL/energy for multi-band image segmentation. PAMI, 18(9):884–900, 1996.
