
Spectral Matting

Anat Levin 1,2   Alex Rav-Acha 1   Dani Lischinski 1

1 School of CS & Eng, The Hebrew University   2 CSAIL, MIT

Abstract

We present spectral matting: a new approach to natural image matting that automatically computes a set of fundamental fuzzy matting components from the smallest eigenvectors of a suitably defined Laplacian matrix. Thus, our approach extends spectral segmentation techniques, whose goal is to extract hard segments, to the extraction of soft matting components. These components may then be used as building blocks to easily construct semantically meaningful foreground mattes, either in an unsupervised fashion, or based on a small amount of user input.

1. Introduction

Digital matting is the process of extracting a foreground object from an image along with an opacity estimate for each pixel covered by the object. This operation enables compositing the extracted object over a novel background, and thus constitutes an invaluable tool in image editing, video production, and special effects in motion pictures.

In particular, the challenging case of natural image matting, which poses no restrictions on the background, has received much research attention. Recognizing that the problem is inherently under-constrained, all of the existing methods require the user to provide additional constraints in the form of a trimap [2, 13, 4] or a set of brush strokes [15, 7, 5]. Thus, the question of whether (or to what degree) it is possible to automate the matting process is of considerable theoretical and practical interest.

In this paper we attempt to provide some new insights into this question. Our work is strongly influenced by spectral segmentation methods [11, 16, 9, 17]. These methods perform unsupervised image segmentation by examining the smallest eigenvectors of the image's graph Laplacian matrix. This work, for the first time, extends this idea from producing hard segments to soft matting components.

Spectral segmentation methods, such as [11], resort to computation of real-valued eigenvectors as an approximation necessary to transform an NP-complete optimization problem into a tractable one. In contrast, we are not seeking a disjoint image partitioning, but rather attempt to recover the fractional foreground coverage at each pixel. Specifically, we obtain our real-valued matting components via a linear transformation of the smallest eigenvectors of the matting Laplacian matrix, introduced by Levin et al. [7]. Once obtained, these matting components serve as building blocks for construction of complete foreground mattes.

Figure 1. Spectral segmentation and spectral matting: (a) input image; (b) hard segmentation; (c) alpha matte; (d) matting components computed by our method.

This concept is illustrated in Figure 1. Given the input image in Figure 1a, one can produce an unsupervised disjoint hard partitioning of the image using, e.g., [17] (Figure 1b). In contrast, we compute a set of overlapping, fractional, matting components, visualized in Figure 1d. Combining three of these components (framed in red) yields the foreground matte of the girl, shown in Figure 1c.

In summary, our main contribution is the introduction of the concept of fundamental matting components and the resulting first unsupervised matting algorithm. Of course, just like unsupervised segmentation, unsupervised matting is an ill-posed problem. Thus, we also describe two extensions that use these fundamental matting components to construct a particular matte: (i) present the user with several matting alternatives to choose from; or (ii) let the user specify her intent by just a few mouse clicks.

2. Matting components

Matting algorithms typically assume that each pixel $I_i$ in an input image is a linear combination of a foreground color $F_i$ and a background color $B_i$:

$$I_i = \alpha_i F_i + (1 - \alpha_i) B_i. \qquad (1)$$

This is known as the compositing equation. In this work we generalize the compositing equation by assuming that each pixel is a convex combination of $K$ image layers $F^1, \ldots, F^K$:

$$I_i = \sum_{k=1}^{K} \alpha_i^k F_i^k. \qquad (2)$$

The $K$ vectors $\alpha^k$ are the matting components of the image, which specify the fractional contribution of each layer to the final color observed at each pixel. The matting components are non-negative and sum to 1 at every pixel. The intuitive motivation for having these components is that, similarly to the individual low-level fragments in an over-segmented image, they may be used to construct higher level, semantically meaningful foreground mattes, as demonstrated in Figure 1.
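As a concrete illustration (ours, not from the paper), eq. (2) can be checked numerically on random toy data; the array shapes and names below are our own:

```python
import numpy as np

# Toy check of eq. (2): an image as a convex combination of K layers.
rng = np.random.default_rng(0)
K, H, W = 3, 4, 4
layers = rng.random((K, H, W, 3))             # F^k: per-layer RGB colors
alphas = rng.random((K, H, W))
alphas /= alphas.sum(axis=0, keepdims=True)   # non-negative, sum to 1 per pixel
I = (alphas[..., None] * layers).sum(axis=0)  # I_i = sum_k alpha_i^k F_i^k
```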

A desirable, although not required, property of the matting components is sparsity: each component should be either completely opaque or completely transparent over as many image pixels as possible. This means that areas of transition between the different layers are limited to a small number of pixels, and each pixel is influenced by a small number of layers.

In this paper, we explore the relationship between the matting components and the eigenvectors of the matting Laplacian matrix [7]. Specifically, we show that under certain assumptions the matting components are spanned by the smallest eigenvectors of the matting Laplacian. We then propose a method for computing the matting components by finding an appropriate linear transformation and applying it to these eigenvectors.

3. Spectral Analysis

We start by briefly reviewing the basic theory of spectral segmentation methods [11, 16, 9, 17]. These methods typically associate with the image an $N \times N$ affinity matrix $A$, such as $A(i,j) = e^{-d_{ij}/\sigma^2}$, where $d_{ij}$ is some measure of the distance between the pixels (such as color difference and geometric distance). One can then define the Laplacian matrix $L = D - A$, where $D$ is the diagonal matrix $D(i,i) = \sum_j A(i,j)$. $L$ is a symmetric positive semidefinite matrix, whose eigenvectors capture much of the image structure.¹

¹In fact, most spectral segmentation papers consider normalized affinity matrices such as $D^{-1}L$ or $D^{-1/2}LD^{-1/2}$. However, in this work we focus on $L$ itself, as it is not clear how to justify the normalization in the case of the matting Laplacian. The problem is that the off-diagonal elements of the matting Laplacian can be both negative and positive, and thus the matting cost cannot be expressed as a sum of positive pairwise terms.
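For toy-sized problems this construction is direct; the following dense numpy sketch (our own illustration, with features standing for any per-pixel descriptor such as concatenated color and position) makes the definitions concrete:

```python
import numpy as np

def graph_laplacian(features, sigma):
    """L = D - A for A(i,j) = exp(-d_ij / sigma^2), where d_ij is the
    squared distance between per-pixel feature vectors. Dense and O(N^2):
    suitable only for toy problems; real systems use sparse affinities."""
    d = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=-1)
    A = np.exp(-d / sigma ** 2)
    D = np.diag(A.sum(axis=1))
    return D - A
```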

Consider the ideal case where the affinity matrix $A$ captures exactly the fact that an image is composed from several distinct clusters, or connected components. That is, a subset $C$ of the image pixels is a connected component of the image if $A(i,j) = 0$ for every $i, j$ such that $i \in C, j \notin C$, and there is no subset of $C$ which satisfies this property. Let $m^C$ denote the indicator vector of the component $C$,

$$m_i^C = \begin{cases} 1 & i \in C \\ 0 & i \notin C \end{cases}$$

then $m^C$ is an eigenvector of $L$ with eigenvalue 0.

Now suppose that the image consists of $K$ connected components $C_1, \ldots, C_K$ such that $\{1, \ldots, N\} = \bigcup_{k=1}^K C_k$, where the $C_k$ are disjoint subsets of pixels. In this case the indicator vectors $m^{C_1}, \ldots, m^{C_K}$ are all independent, orthogonal eigenvectors of $L$ with eigenvalue 0. However, computing the eigenvectors of $L$ yields these indicator vectors only up to rotation. This is the case since for any $K \times K$ rotation matrix $R$ the vectors $[m^{C_1}, \ldots, m^{C_K}]R$ are also a basis for the nullspace of $L$.

In real images, the affinity matrix $A$ is rarely able to perfectly separate between the different pixel clusters. Therefore, the Laplacian $L$ usually does not have multiple eigenvectors with zero eigenvalue. However, it has been observed that the smallest eigenvectors of $L$ tend to be nearly constant within coherent image components. Extracting the different components from the smallest eigenvectors is known as spectral rounding and has attracted much attention [9, 17, 14, 18, 6]. The simplest approach [9] is to cluster the image pixels using the k-means algorithm, and use perturbation analysis to bound the error of this algorithm as a function of the connectivity within and between clusters. Other more recent methods [17, 18], which inspired the approach taken in this work, explicitly search for a rotation matrix that brings the eigenvectors as close as possible to binary indicator vectors.
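A minimal sketch of this classical pipeline, assuming scipy and scikit-learn are available (this is the simple k-means variant of [9], not the rotation search of [17, 18]):

```python
from scipy.sparse.linalg import eigsh
from sklearn.cluster import KMeans

def spectral_segments(L, n_segments):
    """Hard spectral segmentation by rounding: embed each pixel by the
    rows of the smallest eigenvectors of L, then cluster with k-means."""
    # 'SM' requests the smallest-magnitude eigenvalues; simple but slow
    # for large L (shift-invert is the usual speedup).
    vals, vecs = eigsh(L, k=n_segments, which='SM')
    return KMeans(n_clusters=n_segments, n_init=10).fit_predict(vecs)
```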

3.1. Spectral Analysis with the Matting Laplacian

Our goal in this work is to derive an analogy between hard segmentation and matting, and to show that fuzzy matting components may be extracted from the smallest eigenvectors of the matting Laplacian, similarly to the extraction of hard clusters described earlier.

The matting Laplacian was introduced by Levin et al. [7] in order to evaluate the quality of a matte without explicitly estimating the foreground and background colors in eq. (1). They show that if the colors of the background and the foreground within a local image window $w$ form two different lines in RGB space, then the $\alpha$ values within $w$ may be expressed as a linear combination of the color channels:

$$\alpha_i = a^R I_i^R + a^G I_i^G + a^B I_i^B + b \quad \forall i \in w \qquad (3)$$

Thus, the matte extraction problem becomes one of finding the alpha matte that minimizes the deviation from the linear model (3) over all image windows $w_q$:

$$J(\alpha, a, b) = \sum_{q \in I} \sum_{i \in w_q} \left( \alpha_i - a_q^R I_i^R - a_q^G I_i^G - a_q^B I_i^B - b_q \right)^2 + \varepsilon \|a_q\|^2 \qquad (4)$$

where $\varepsilon \|a_q\|^2$ is a regularization term on $a$. The linear model coefficients $a, b$ may be eliminated from equation (4), yielding a quadratic cost in $\alpha$ alone,

$$J(\alpha) = \alpha^T L \alpha. \qquad (5)$$

Here $L$ is the matting Laplacian, a sparse symmetric positive semidefinite $N \times N$ matrix whose entries are a function of the input image in local windows, depending neither on the unknown foreground and background colors, nor on the linear model coefficients. $L(i,j)$ is defined as:

$$L(i,j) = \sum_{q \mid (i,j) \in w_q} \left( \delta_{ij} - \frac{1}{|w_q|} \left( 1 + (I_i - \mu_q)^T \left( \Sigma_q + \frac{\varepsilon}{|w_q|} I_3 \right)^{-1} (I_j - \mu_q) \right) \right) \qquad (6)$$

Here $\delta_{ij}$ is the Kronecker delta, $\mu_q$ is the $3 \times 1$ mean color vector in the window $w_q$ around pixel $q$, $\Sigma_q$ is a $3 \times 3$ covariance matrix in the same window, $|w_q|$ is the number of pixels in the window, and $I_3$ is the $3 \times 3$ identity matrix.
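The entries of eq. (6) translate directly into code. The following unoptimized numpy/scipy sketch accumulates the per-window blocks for 3×3 windows; it is our own direct transcription of the formula (with an assumed default for $\varepsilon$), not the authors' released implementation:

```python
import numpy as np
from scipy.sparse import coo_matrix

def matting_laplacian(img, eps=1e-7, r=1):
    """Direct transcription of eq. (6) for (2r+1)x(2r+1) windows.
    img: (H, W, 3) float image; eps: regularization strength (an assumed
    default). Returns a sparse (N, N) matrix; boundary windows are skipped
    for brevity, and real implementations vectorize this loop."""
    H, W, _ = img.shape
    N = H * W
    win = (2 * r + 1) ** 2
    flat = img.reshape(N, 3)
    rows, cols, vals = [], [], []
    for y in range(r, H - r):
        for x in range(r, W - r):
            idx = np.array([(y + dy) * W + (x + dx)
                            for dy in range(-r, r + 1)
                            for dx in range(-r, r + 1)])
            C = flat[idx]                       # (win, 3) window colors
            mu = C.mean(axis=0)                 # mean color mu_q
            Sigma = np.cov(C.T, bias=True)      # covariance Sigma_q
            Dinv = np.linalg.inv(Sigma + (eps / win) * np.eye(3))
            Cc = C - mu
            # block: delta_ij - (1 + (I_i - mu)^T Dinv (I_j - mu)) / |w_q|
            G = np.eye(win) - (1.0 + Cc @ Dinv @ Cc.T) / win
            rows.append(np.repeat(idx, win))
            cols.append(np.tile(idx, win))
            vals.append(G.ravel())
    rows, cols, vals = map(np.concatenate, (rows, cols, vals))
    # duplicate (i, j) entries are summed, implementing the sum over q
    return coo_matrix((vals, (rows, cols)), shape=(N, N)).tocsr()
```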

The cost (5) has a trivial minimum, the constant $\alpha$ vector, and thus in the user assisted framework described in [7], $J(\alpha)$ is minimized subject to user constraints. Levin et al. observe that the smallest eigenvectors of the matting Laplacian (6) capture information about the fuzzy cluster assignments of pixels in the image, even before any user-specified constraints are taken into account. However, they make no use of the eigenvectors beyond presenting them to the user as guides for scribble placement. In this work, we show that the smallest eigenvectors span the individual matting components of the image.

To gain some understanding, we begin by studying the ideal case. To justify the usage of spectral analysis to estimate matting components, our goal is to show that under reasonable conditions, the actual matting components belong to the nullspace of the matting Laplacian. We say that a matting component $\alpha^k$ is active in a local image window $w$ if there exists a pixel $i \in w$ for which $\alpha_i^k > 0$. The following claim states the conditions on the local color distribution in each layer under which $L\alpha^k = 0$. The severity of the conditions is related to the number of active layers in a local window. The least restricted case is when only one layer is active, in which case the local color distribution can be arbitrarily complex. The most restricted case is when a window contains three active layers (as in the case of a T-junction); for such windows the color of each layer must be locally uniform.

Claim 1. Let $\alpha^1, \ldots, \alpha^K$ be the actual decomposition of the image $I$ into $K$ matting components. The vectors $\alpha^1, \ldots, \alpha^K$ lie in the nullspace of the matting Laplacian $L$ (given by eq. 6 with $\varepsilon = 0$) if every local image window $w$ satisfies one of the following conditions:

1. A single component $\alpha^k$ is active within $w$.

2. Two components $\alpha^{k_1}, \alpha^{k_2}$ are active within $w$, and the colors of the corresponding layers $F^{k_1}, F^{k_2}$ within $w$ lie on two different lines in RGB space.

3. Three components $\alpha^{k_1}, \alpha^{k_2}, \alpha^{k_3}$ are active within $w$, each layer $F^{k_1}, F^{k_2}, F^{k_3}$ has a constant color within $w$, and the three colors are linearly independent.

Proof: The matting cost (5) measures the deviation between a matte and a linear function of the color channels, over all local windows (eq. 4). Thus, in order to show that a matting component $\alpha^k$ satisfies $L\alpha^k = 0$, it suffices to show that for every local window $w$ there exist $a^R, a^G, a^B, b$ such that $\alpha_i^k = a^R I_i^R + a^G I_i^G + a^B I_i^B + b$ for all $i \in w$. Below we show this for each of the three window types.

Case 1: Since the matting components sum to one at every image pixel, the single active component $\alpha^k$ must equal 1 within $w$. Thus, it is easily expressed as a linear function of the image by setting $a^R = a^G = a^B = 0$ and $b = 1$.

Case 2: This case is equivalent to theorem 2 in [7].

Case 3: Since $F^{k_1}, F^{k_2}, F^{k_3}$ are constant within $w$ and their colors are linearly independent, there exist $a^R, a^G, a^B$ and $b$ such that $\langle a, F^{k_1} \rangle + b = 1$, $\langle a, F^{k_2} \rangle + b = 0$, and $\langle a, F^{k_3} \rangle + b = 0$. As $I = \alpha^{k_1} F^{k_1} + \alpha^{k_2} F^{k_2} + \alpha^{k_3} F^{k_3}$, we get that $\langle a, I \rangle + b = \alpha^{k_1}$, so that $\alpha^{k_1}$ is a linear function of the image. A similar argument holds for $\alpha^{k_2}$ and $\alpha^{k_3}$.

As in the case of standard Laplacians, when the smallest eigenvectors of the matting Laplacian are computed, the result may be any linear combination of the different matting components, and recovering the individual components is equivalent to linearly transforming the eigenvectors. It should be noted that unlike hard segments, the matting components are not binary vectors and thus are not necessarily orthogonal. Hence, while the eigenvectors are orthogonal, the transformation from eigenvectors to matting components might be a general linear transformation and not a simple rotation.

To summarize, the main conclusion of the above discussion is that whenever the matting components of an image satisfy the conditions of claim 1, they may be expressed as a linear combination of the zero eigenvectors of $L$.

In most real images, the assumptions of claim 1 don't hold exactly, and thus the matting Laplacian might not have multiple eigenvectors whose eigenvalue is 0. Yet if the layers are sufficiently distinct, they are generally captured by the smallest eigenvectors of $L$. For example, Figure 2 shows the smallest eigenvectors for a real image, all exhibiting the fuzzy layer boundaries. We have empirically observed that the matting components of real images are usually spanned quite well by the smallest eigenvectors of the matting Laplacian. Indeed, the components shown in Figure 1d were obtained as linear combinations of the smallest eigenvectors.

Figure 2. The smallest eigenvectors of the matting Laplacian for the image in Figure 1a. Linear combinations of these eigenvectors produced the matting components shown in Figure 1d.

3.2. From Eigenvectors to Matting Components

As explained above, recovering the matting components of the image is equivalent to finding a linear transformation of the eigenvectors. Recall that the matting components should sum to 1 at each image pixel, and they should be near 0 or 1 for most image pixels, since the majority of image pixels are usually opaque. Thus, we are looking for a linear transformation of the eigenvectors that would yield a set of nearly binary vectors. More formally, let $E = [e^1, \ldots, e^K]$ be the $N \times K$ matrix of eigenvectors. Our goal is then to find a set of $K$ linear combination vectors $y^k$ that minimize

$$\sum_{i,k} |\alpha_i^k|^\gamma + |1 - \alpha_i^k|^\gamma, \quad \text{where } \alpha^k = E y^k, \qquad (7)$$
$$\text{subject to } \sum_k \alpha_i^k = 1.$$

If $0 < \gamma < 1$ is used (in our implementation $\gamma = 0.9$), then $|\alpha_i^k|^\gamma + |1 - \alpha_i^k|^\gamma$ is a robust score measuring the sparsity of a matting component. Without the requirement $\alpha^k = E y^k$ the sparsity term would be minimized by binary vectors, but as the vectors $\alpha^k$ are restricted to linear combinations of the eigenvectors, they must maintain the fuzzy layer boundaries. Although we do not explicitly constrain the $\alpha$ values to be between 0 and 1, in practice the resulting values tend to lie in this range due to the sparsity penalty.
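For reference, the robust score itself is trivial to evaluate; a two-line helper (names ours):

```python
import numpy as np

def sparsity_cost(alpha, gamma=0.9):
    """Robust sparsity score of eq. (7) for candidate components alpha
    (any shape); smaller means sparser, i.e., closer to binary."""
    return (np.abs(alpha) ** gamma + np.abs(1.0 - alpha) ** gamma).sum()
```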

The above cost is of course non-convex, and we optimize it iteratively using Newton's method [3], which amounts to a sequence of second order approximations. Given a guess for the values of $\alpha_i^k$, we define $u_i^k$ as the second derivative of $|\alpha_i^k|^\gamma$, so that $u_i^k \propto |\alpha_i^k|^{\gamma-2}$, and similarly $v_i^k \propto |1 - \alpha_i^k|^{\gamma-2}$. The second order approximation to eq. (7) reads as minimizing

$$\sum_{i,k} u_i^k |\alpha_i^k|^2 + v_i^k |1 - \alpha_i^k|^2, \quad \text{where } \alpha^k = E y^k, \qquad (8)$$
$$\text{subject to } \sum_k \alpha_i^k = 1.$$

As this problem is now a quadratic optimization problem under linear constraints, the optimal solution can be computed in closed form by inverting a $K^2 \times K^2$ matrix. For that we define $U^k = \mathrm{diag}(u^k)$ and $V^k = \mathrm{diag}(v^k)$. We also define $W^k = E^T (U^k + V^k) E$ and $\mathbf{1}$ as a vector of ones. Some calculation shows that the $K^2$ elements of $y^1, \ldots, y^K$ are the solution of:

$$\begin{bmatrix} Id & Id & Id & \cdots & Id \\ 0 & W^2 + W^1 & W^1 & \cdots & W^1 \\ 0 & W^1 & W^3 + W^1 & \cdots & W^1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & W^1 & W^1 & \cdots & W^K + W^1 \end{bmatrix} y = \begin{bmatrix} E^T \mathbf{1} \\ E^T v^2 + E^T u^1 \\ E^T v^3 + E^T u^1 \\ \vdots \\ E^T v^K + E^T u^1 \end{bmatrix}, \qquad (9)$$

where $Id$ is the $K \times K$ identity matrix. Given an initial guess, our algorithm iteratively solves a sequence of second order optimization problems of the form of eq. (8), and uses the solution of each iteration to reweight the following one. We note that the weights $u_i^k$ (and $v_i^k$) come out higher when the current guess for $\alpha_i^k$ is close to 0 (or 1), and lower for guesses further away. Thus, the effect of the reweighting step is to pull toward 0 (or 1) those $\alpha$ entries for which the current guess is already close, and to loosen the relation for pixels in which the guess is further apart. For example, if the current guess for $\alpha_i^k$ is close to 0, then $u_i^k \gg v_i^k$, and thus the term $u_i^k |\alpha_i^k|^2 + v_i^k |1 - \alpha_i^k|^2$ pulls the alpha component at this pixel much more strongly toward 0 than toward 1.
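One such reweighted step can be written compactly by assembling the block system of eq. (9). The sketch below is our own reading of the formula: it assumes $E$ has orthonormal columns with the constant vector in their span, uses the same number of eigenvectors as components, and clips the weights for numerical stability.

```python
import numpy as np

GAMMA = 0.9  # sparsity exponent gamma from eq. (7)

def newton_step(E, alpha, eps=1e-8):
    """One second-order step for eq. (8)/(9).
    E: (N, K) smallest eigenvectors (orthonormal columns);
    alpha: (N, K) current guess. Returns the updated components E @ Y."""
    N, K = alpha.shape
    # IRLS weights u, v ~ second derivatives of |a|^g and |1-a|^g
    u = np.clip(np.abs(alpha), eps, None) ** (GAMMA - 2.0)
    v = np.clip(np.abs(1.0 - alpha), eps, None) ** (GAMMA - 2.0)
    W = [E.T @ ((u[:, k] + v[:, k])[:, None] * E) for k in range(K)]
    M = np.zeros((K * K, K * K))
    rhs = np.zeros(K * K)
    # first block row: sum_k y^k = E^T 1 (the sum-to-one constraint)
    for k in range(K):
        M[:K, k * K:(k + 1) * K] = np.eye(K)
    rhs[:K] = E.T @ np.ones(N)
    # block row k: (W^k + W^1) on the diagonal, W^1 off the diagonal
    for k in range(1, K):
        for l in range(1, K):
            M[k * K:(k + 1) * K, l * K:(l + 1) * K] = \
                W[0] + (W[k] if l == k else 0.0)
        rhs[k * K:(k + 1) * K] = E.T @ v[:, k] + E.T @ u[:, 0]
    Y = np.linalg.solve(M, rhs).reshape(K, K).T  # column k is y^k
    return E @ Y
```

Iterating newton_step from a good initialization (described next) until the components stop changing reproduces the reweighting behavior discussed above.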

Since the cost (7) is not convex, the result of the Newton process strongly depends on the quality of the initialization. One useful way to initialize the process is to apply a k-means algorithm to the smallest eigenvectors of the matting Laplacian, and project the indicator vectors of the resulting clusters onto the span of the eigenvectors $E$:

$$\alpha^k = E E^T m^{C_k}. \qquad (10)$$

The effect of this projection step is to find a linear combination of eigenvectors that minimizes the squared distance to $m^{C_k}$. We note that the resulting matting components sum to one, since

$$\sum_k \alpha^k = E E^T \left( \sum_k m^{C_k} \right) = E E^T \mathbf{1} = \mathbf{1}. \qquad (11)$$

The last equality follows from the fact that the constant vector belongs to the nullspace of the matting Laplacian. As a result, projecting the hard clusters provides a legal solution for eq. (7). We also note that despite the fact that the $m^{C_k}$ are binary vectors, the projection typically features fuzzy matte boundaries. This is due to the fact that, as illustrated in Figure 2, the smallest eigenvectors all maintain the fuzzy layer structure, and thus simply do not suffice to span the hard segmentation boundaries.
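In code, this initialization is a one-liner on top of any k-means routine; a sketch assuming scikit-learn:

```python
import numpy as np
from sklearn.cluster import KMeans

def init_components(E, K):
    """k-means on the eigenvector embedding, then eq. (10):
    project each cluster's binary indicator onto span(E)."""
    labels = KMeans(n_clusters=K, n_init=10).fit_predict(E)
    M = np.eye(K)[labels]   # (N, K) indicator vectors m^{C_k}
    return E @ (E.T @ M)    # alpha^k = E E^T m^{C_k}
```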

In practice, we typically use a larger number of eigenvectors than the number of matting components to be recovered. Using more eigenvectors makes it possible to obtain sparser components. The reason is that more basis elements span a richer set of vectors (in the extreme case, if all $N$ eigenvectors are used, any binary vector can be generated).

4. Grouping Components

So far we have shown how matting components may be extracted from the matting Laplacian. However, the matting components are usually not a goal in their own right, as one is ultimately interested in recovering a complete matte for some foreground object. Fortunately, all that is needed to obtain a complete matte is to specify which of the components belong to the foreground. Let $b$ denote a $K$-dimensional binary vector indicating the foreground components (i.e., $b^k = 1$ iff $\alpha^k$ is a foreground component). The complete foreground matte is then obtained by simply adding the foreground components together:

$$\alpha = \sum_k b^k \alpha^k. \qquad (12)$$

For example, the matte in Figure 1c was obtained by adding the components highlighted in red in Figure 1d.

For the applications discussed below, one would like to compare multiple grouping hypotheses. If the smallest eigenvalues aren't exactly zero (which is the case for most real images), we can also measure the quality of the resulting $\alpha$-matte as $\alpha^T L \alpha$, where $L$ is the matting Laplacian (eq. 6). When a large number of hypotheses is to be tested, multiplying each hypothesis by $L$ might be too expensive. However, if each hypothesis is just a sum of matting components, we can pre-compute the correlations between the matting components via $L$ and store them in a $K \times K$ matrix $\Phi$, where

$$\Phi(k,l) = {\alpha^k}^T L \alpha^l. \qquad (13)$$

The matte cost can then be computed as

$$J(\alpha) = b^T \Phi b, \qquad (14)$$

where $b$ is a $K$-dimensional binary vector indicating the selected components. Thus, if $\Phi$ has been pre-computed, $J(\alpha)$ can be evaluated in $O(K^2)$ operations instead of $O(N)$ operations.

4.1. Unsupervised Matting

Given an image and a set of matting components, we would like to split the components into foreground and background groups and pull out a foreground object. If the grouping criterion takes into account only low level cues, then we just search for the grouping with the best matting cost, as defined by eq. (14). However, the matting cost is usually biased toward mattes which assign non-constant values only to a small subset of the image pixels (in the extreme case, the best matte is a constant one). The spectral segmentation literature suggests several criteria which overcome this bias. One approach is to search for quotient cuts (e.g., normalized cuts [11]), which score a cut as the ratio between the cost of the cut and the size of the resulting clusters. A second approach is to look for balanced cuts [6], where the size of each cluster is constrained to be above a certain percentage of the image size. In this work, we follow the latter approach and rule out trivial solutions by considering only groupings which assign at least 30% of the pixels to the foreground and at least 30% of the pixels to the background. When the number $K$ of matting components is small, we can enumerate all $2^K$ hypotheses and select the one with the best score using eq. (14).
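The exhaustive search is straightforward once $\Phi$ is in hand; a sketch (function name and balance test are ours) that scores each hypothesis in $O(K^2)$ via eq. (14):

```python
import itertools
import numpy as np

def best_unsupervised_matte(alphas, L, min_frac=0.30):
    """Enumerate all 2^K foreground/background splits of the components.
    alphas: (N, K) matting components; L: (N, N) matting Laplacian."""
    N, K = alphas.shape
    Phi = alphas.T @ (L @ alphas)      # eq. (13): K x K correlations
    best_b, best_cost = None, np.inf
    for bits in itertools.product((0, 1), repeat=K):
        b = np.asarray(bits, dtype=float)
        frac = (alphas @ b).sum() / N  # fraction of pixels in the foreground
        if not (min_frac <= frac <= 1.0 - min_frac):
            continue                   # rule out trivial/unbalanced groupings
        cost = b @ Phi @ b             # eq. (14)
        if cost < best_cost:
            best_b, best_cost = b, cost
    return alphas @ best_b             # the winning matte, eq. (12)
```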

Figure 3 shows some results produced by the unsupervised matting approach described above. In each of these examples the hypothesis with the highest score indeed corresponds to the "correct" foreground matte, but some of the other hypotheses are quite sensible as well, considering that our approach does not attempt to perform any high-level image understanding. Of course, it isn't hard to find examples where unsupervised matting fails. For example, whenever the foreground or background objects consist of several visually distinct components, the assignment with the minimal matting cost might not correspond to our visual perception. In fact, it is well known within the image segmentation community that while unsupervised bottom-up cues can efficiently group coherent regions in an image, the general image segmentation problem is inherently ambiguous, and requires additional information. In practice, as in the case of hard image segmentation, the foreground/background assignment may be guided by several additional cues, such as top-down models [1], color statistics [10], or motion and focus cues. In the remainder of this paper, however, we focus on user-guided matting instead.

4.2. User-Guided Matting

Matting components can also be quite useful in an interactive setting, where the user guides the matting process toward the extraction of the desired foreground matte. In such a setting the foreground/background assignment of some of the components is determined by the user, thereby reducing the number of legal hypotheses to be tested. Given very minimal foreground and background constraints, it is usually possible to rule out trivial solutions, so there is no need to explicitly keep the size of each group above a certain threshold (as in the unsupervised case). In this case we can speed up the search for an optimal foreground/background assignment of components using a graph min-cut formulation.


Figure 3. Unsupervised matting results for two images: input image followed by hypotheses 1-5, ordered according to their score.

4.2.1 Optimizing assignments in the components graph

To efficiently search for the optimal foreground/background assignment we approximate the matting cost (eq. 14) as a sum of pairwise terms. This enables us to approximate the search for the optimal foreground/background assignment as a min-cut problem in a graph whose nodes are the matting components, and whose edge weights represent matting penalty. For that we need to rewrite eq. (14) as a sum of non-negative local and pairwise terms $J(\alpha) = E(b)$, where $b$ is the binary vector representing the components in $\alpha$. We define the energy $E(b)$ as

$$E(b) = \sum_k E^k(b^k) + \sum_{k,l} E^{k,l} (b^k - b^l)^2, \qquad (15)$$

where $E^k(0) = \infty$ if the $k$-th component is constrained to belong to the foreground, $E^k(1) = \infty$ if the component is constrained as background, and 0 otherwise. To define the pairwise term we note that $\phi^{k,k} = -\sum_{l \neq k} \phi^{k,l}$ (this follows from the fact that $L(\sum_k \alpha^k) = L\mathbf{1} = 0$, and thus $\Phi\mathbf{1} = 0$), and as a result

$$b^T \Phi b = \sum_{k,l} -\phi^{k,l} (b^k - b^l)^2. \qquad (16)$$

Therefore, if we define $E^{k,l} = -\phi^{k,l}$ we obtain that

$$\sum_{k,l} E^{k,l} (b^k - b^l)^2 = J(\alpha). \qquad (17)$$

However, in order to be able to search for a min-cut in polynomial time, the pairwise energy should be positive, and we therefore make an approximation and define $E^{k,l} = \max(0, -\phi^{k,l})$. To justify this approximation we can prove that it is exact in all local image windows for which no more than two components are active. When good components are extracted from an image, the majority of image windows will be associated with no more than two components (that is, no more than two components will be active). Indeed, we have empirically observed that for most matting component pairs $\phi^{k,l} < 0$.
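With the pairwise weights $E^{k,l} = \max(0, -\phi^{k,l})$, the labeling reduces to a standard s-t min-cut; a sketch using networkx (the graph layout and seed handling are our own):

```python
import networkx as nx
import numpy as np

def label_components_mincut(Phi, fg_seeds, bg_seeds):
    """Foreground/background assignment of the K components via min-cut.
    Phi: (K, K) component correlations (eq. 13); fg_seeds / bg_seeds:
    indices of components constrained by the user."""
    K = Phi.shape[0]
    G = nx.DiGraph()
    for k in range(K):
        for l in range(k + 1, K):
            w = max(0.0, -Phi[k, l])          # pairwise penalty E^{k,l}
            if w > 0.0:
                G.add_edge(k, l, capacity=w)  # both directions: the cut
                G.add_edge(l, k, capacity=w)  # penalty is symmetric
    for k in fg_seeds:
        G.add_edge('FG', k, capacity=np.inf)  # hard unary constraints,
    for k in bg_seeds:                        # i.e. E^k(.) = infinity
        G.add_edge(k, 'BG', capacity=np.inf)
    _, (src_side, _) = nx.minimum_cut(G, 'FG', 'BG')
    return np.array([1.0 if k in src_side else 0.0 for k in range(K)])
```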

Claim 2. The correlation $\phi^{k,l}$ between components $\alpha^k, \alpha^l$ will be negative if every local image window $w$ satisfies one of the following conditions:

1. $\alpha_i^k = 0 \quad \forall i \in w$

2. $\alpha_i^l = 0 \quad \forall i \in w$

3. $\alpha_i^k = 1 - \alpha_i^l \quad \forall i \in w$

Proof: Following eq. (6), we rewrite the correlation between components via the matting Laplacian as a sum of correlations over local windows:

$$\phi^{k,l} = {\alpha^k}^T L \alpha^l = \sum_{q \in I} {\alpha^k}^T L_q \alpha^l, \qquad (18)$$

where $L_q$ is an $N \times N$ matrix: for $(i,j) \in w_q$, $L_q(i,j)$ is defined using eq. (6), and $L_q(i,j) = 0$ otherwise. We will show that under the above conditions, ${\alpha^k}^T L_q \alpha^l \leq 0$ for every image window $w_q$. This follows immediately if $\alpha_i^k = 0$ or $\alpha_i^l = 0$ for all $i \in w$. For the third case, we note that

$$ {\alpha^k}^T L_q \alpha^l \overset{1}{=} {\alpha^k}^T L_q (\mathbf{1} - \alpha^k) \overset{2}{=} -{\alpha^k}^T L_q \alpha^k \overset{3}{\leq} 0, \qquad (19)$$

where 1 follows from the third claim condition, 2 follows from the fact that the matting cost of the constant vector is 0, and 3 follows from the fact that each of the $L_q$ matrices is positive semidefinite.

Using the above graph formulation, finding the optimal foreground/background assignment does not involve an exponential search; it is found efficiently, in time polynomial in the number of components, as a graph min-cut. As a result, if the matting components are pre-computed, the optimal matte may be computed very rapidly, enabling interactive responses to user input. The computational challenges of our algorithm are equivalent to those of conventional spectral segmentation techniques. Specifically, it takes our unoptimized Matlab implementation a couple of minutes to compute the matting components for the images in Figure 4. However, this pre-processing step can be done offline, and once the matting components are available, it only takes an additional few seconds to construct a matte given the user's constraints.

Figure 5. Input, constraints, and matte: the middle region is not constrained, and the method of Levin et al. assigns it an average non-opaque value.

4.2.2 Matte extraction using user scribbles

Figure 4 presents a few examples where a foreground matte was extracted from an image based on a small number of foreground (white) and background (black) markings provided by the user. The second column shows the resulting matte extracted by the approach described above (the scribbles are used to reduce the space of splitting hypotheses: a component is constrained to belong to the foreground whenever its area contains a white scribble). The remaining columns show the mattes generated from the same input by a number of previous methods [7, 15, 4, 13]. None of these previous approaches is able to recover a reasonable matte from such minimal user input. In particular, although our approach uses the same matting Laplacian as [7], our results are very different from those obtained by directly minimizing the quadratic matting cost (5) subject to user-specified constraints. The main drawback of such direct optimization is that whenever an image contains distinct connected components without any constraints inside them, a quadratic cost such as (5) tends to assign them some average non-opaque values, as demonstrated by the simple example in Figure 5. The core of this problem is that the quadratic cost of [7] places strong assumptions on the foreground and background distributions, but imposes no restrictions on $\alpha$. Thus, it searches for continuous solutions without taking into account that, for a mostly opaque foreground object, the matte should be strictly 0 or 1 over most of the image.
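Translating scribbles into component constraints for the min-cut is equally simple; one possible rule (the 0.5 support threshold is our assumption, not from the paper):

```python
import numpy as np

def seeds_from_scribbles(alphas, fg_mask, bg_mask, thresh=0.5):
    """A component is constrained to the foreground (background) if a
    foreground (background) scribble lands where the component is active.
    alphas: (N, K); fg_mask, bg_mask: (N,) boolean scribble masks."""
    N, K = alphas.shape
    fg = {k for k in range(K)
          if fg_mask.any() and alphas[fg_mask, k].max() > thresh}
    bg = {k for k in range(K)
          if bg_mask.any() and alphas[bg_mask, k].max() > thresh}
    return fg, bg
```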

4.2.3 Matte extraction by component labeling

Once the matting components of an image have been computed, placing hard constraints by a set of scribbles or a trimap is not the only way for the user to specify her intent. The matting components suggest a new, more direct user interaction mode which wasn't possible until now: the user is presented with the precomputed matting components and may simply label some of them as background or foreground. The labeled components then become constrained accordingly in the min-cut problem. The advantage of such an interface is illustrated in Figure 6, where the large fuzzy hair areas do not lend themselves to placement of hard constraints. Thus, the best trimap we could practically expect leaves such areas unconstrained (Figure 6c). The least squares matte of [7] populates these areas with average gray values (Figure 6d). In contrast, by searching for the cheapest assignment of matting components consistent with the trimap, we obtain the matte in Figure 6e. In this case no over-smoothing is observed, but some of the fuzzy hair was not selected to belong to the foreground. However, if the user is allowed to directly select three additional components (highlighted in red in Figure 6g) as foreground, we obtain the matte in Figure 6f.

Figure 6. Benefits of direct component labeling: (b) input; (c) trimap; (d) Levin et al. [7] from trimap; (e) components from trimap; (f) component labeling; (g) matting components.

5. Quantitative evaluation

To quantitatively evaluate our approach and compare it with previous methods, we captured ground truth data. Three different dolls were photographed in front of a computer monitor displaying seven different background images (Figure 7a). A ground truth matte was then extracted for each doll using a least squares framework [12]. Each image was downsampled to 560×820 pixels, and the tests described below were performed on (overlapping) 200×200 windows cropped from these images. For our approach, 60 matting components were extracted using the 70 smallest eigenvectors of each cropped window. The running time of our unoptimized Matlab implementation (on a 3.2GHz CPU) was a few minutes for each 200×200 window.

Figure 4. A comparison of mattes produced by different matting methods from minimal user input: input, our result, Levin et al. [7], Wang-Cohen [15], Random Walk [4], and Poisson [13].

Figure 7. Quantitative evaluation: (a) three of the test images; (b) a comparison with other matting methods (average L2 error on the Monster, Lion, and Monkey sets for Components (picking), Levin et al. (trimap), Components (trimap), Wang and Cohen (trimap), Random walk (trimap), and Poisson (trimap)); (c) spectral matting vs. hard segmentation (Components (picking) vs. Erosion with mask sizes 6, 15, 30, and 60).

To design a comparison between matte extraction using matting components and previous matting algorithms, we need to address the two incompatible interfaces, as it is not clear how to measure the amount of user effort involved in each case. While previous approaches were designed to work with hard constraints (scribbles or a trimap), our new approach enables a new interaction mode by component selection. Therefore, in our experiments we attempted to determine how well each approach can do given the best possible user input. Thus, we first used the ground truth matte to generate an "ideal" trimap. The unknown region in this trimap was constructed by taking all pixels whose ground truth matte values are between 0.05 and 0.95, and dilating the resulting region by 4 pixels. The resulting trimap was used as input for four previous matting algorithms: Levin et al. [7], Wang and Cohen [15], random walk matting [4], and Poisson matting [13]. We also ran our method twice in each experiment: (i) using the same trimap to provide a partial labeling of the matting components, followed by a min-cut computation, as described in section 4.2; and (ii) using the ground truth matte to select the subset of matting components that minimizes the distance of the resulting matte from the ground truth, thus simulating the ideal user input via the direct component picking interface. The SSD errors between the mattes produced by the different methods and the ground truth matte (averaged over the different backgrounds and the different windows) are plotted in Figure 7b. It is apparent that given a sufficiently precise trimap, our method offers no real advantage (when given the same trimap as input) over the least-squares matting of Levin et al., which produced the most numerically accurate mattes. However, when simulating the best labeling of components, our approach produced the most accurate mattes, on average.

While our experiment compares the quality of mattes produced from an ideal input, a more interesting comparison might be to measure the amount of user time required for extracting a satisfactory matte with each approach. Ideally, we would also like to measure whether (or to what degree) a component picking interface is more intuitive than a scribble based interface. Such a comparison involves a non-trivial user study, and is left for future work.

Given the strong analogy between spectral matting and hard spectral segmentation, we would like to gain some intuition about the possible advantage of using matting components versus standard hard segmentation components (also known as super-pixels). The answer, of course, depends on the application. If the final output is a hard segmentation, matting components probably do not offer an advantage over standard hard components. On the other hand, when the goal is a fuzzy matte, it is better to explicitly construct matting components, as we do, rather than first compute a hard segmentation and then feather the boundaries (as in [10, 8], for example). To show this, we compare the two approaches. We first extract matting components from each of the test images and select the subset of matting components that minimizes the distance from the ground truth matte. The second approach is to select a subset of hard components (we used the available implementation of Yu and Shi [17]) that best approximates the ground truth. We then apply morphological operations (we have experimented with several constant radius erosion windows) on the resulting hard mask, create a trimap, and run the matting algorithm of [7]. However, since the optimal radius of the erosion window strongly depends on the local image structure and varies over the image, it is impossible to obtain an ideal trimap with a constant radius window. This problem is illustrated visually in the supplementary materials. Figure 7c shows the SSD errors (averaged over the different backgrounds and the different windows) of the two approaches, which indicate that optimally picking the matting components indeed results in more accurate mattes than those obtained by feathering a hard segmentation.

6. Discussion

In this work we have derived an analogy between hard spectral image segmentation and image matting, and have shown how fundamental matting components may be automatically obtained from the smallest eigenvectors of the matting Laplacian. This is a new and interesting theoretical result, establishing a link between two previously independent research areas. From the practical standpoint, matting components can help automate the matte extraction process and reduce user effort. Matting components also suggest a new mode of user control over the extracted matte: while in previous methods the result is controlled by placement of hard constraints in image areas where the matte is either completely opaque or completely transparent, our new approach may provide the user with a simple intuitive preview of optional outputs, and thus enables the user to directly control the outcome in the fractional parts of the matte as well.

Figure 8. Limitations. Top: input image. Bottom: (a) ground truth matte; (b, c) mattes from 70 and 400 eigenvectors, respectively.

Limitations: Our method is most effective in automating the matte extraction process for images that consist of a modest number of visually distinct components. However, for highly cluttered images, component extraction proves to be a more challenging task. For example, consider Figure 8. The input image consists of a large number of small components. Projecting the ground truth matte (Figure 8a) onto the subspace spanned by the 70 smallest eigenvectors results in a poor approximation (Figure 8b). Recall that since the matting components are obtained via a linear combination of the eigenvectors, they can do no better than the eigenvectors themselves; thus Figure 8b is the best matte that we could hope to construct from up to 70 matting components. It is quite clear that this number of components is insufficient to produce an accurate matte for this image. A better matte may be obtained from the 400 smallest eigenvectors (Figure 8c), but even this matte leaves room for improvement. We have not been able to test more than 400 eigenvectors due to computational limitations. We have empirically observed that this problem is significantly reduced if matting components are computed in local image windows independently. We are currently investigating methods for stitching together components obtained in different windows.

One major challenge in spectral matting is determining the appropriate number of matting components for a given image. This is a fundamental difficulty shared by all spectral segmentation methods. While the question of automatically selecting the number of components has been investigated (e.g., [18]), this parameter is still often manually adjusted. For the applications described in this paper we found that a useful strategy is to over-segment the image and group the components later using additional cues. A second free parameter in the algorithm is the number of smallest eigenvectors from which the components are formed (this number should be larger than or equal to the number of components). In practice, we have observed that the performance is not very sensitive to this number, and all results in this paper were obtained using the 70 smallest eigenvectors.

Future directions: An important potential advantage of pre-segmenting the image into matting components is the option to compute meaningful color or texture histograms, or other statistics, within each component. The histogram similarity can provide another important cue to guide component grouping. This ability might significantly improve matting algorithms which make use of color models, such as [15]. For example, the current strategy in [15] is to build an initial color model using only the small number of pixels under the scribbles. This poor initialization is known to make the algorithm sensitive to small shifts in the scribble location.

Given the growing interest in the matting problem and the large amount of recent matting research, it seems that an important future challenge is the design of an appropriate comparison between different user interaction modes and different matting algorithms. The ground truth data collected in this work is a step toward this goal, yet a proper user study is required in order to evaluate the amount of user time required for producing good results with each method.

Our code and ground truth data are available at: www.vision.huji.ac.il/SpectralMatting

References

[1] E. Borenstein and S. Ullman. Class-specific, top-down segmentation. In ECCV, 2002.

[2] Y. Chuang, B. Curless, D. Salesin, and R. Szeliski. A Bayesian approach to digital matting. In CVPR, 2001.

[3] J. E. Dennis and R. B. Schnabel. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, 1983.

[4] L. Grady, T. Schiwietz, S. Aharon, and R. Westermann. Random walks for interactive alpha-matting. In VIIP, 2005.

[5] Y. Guan, W. Chen, X. Liang, Z. Ding, and Q. Peng. Easy matting. In Eurographics, 2006.

[6] K. Lang. Fixing two weaknesses of the spectral method. In NIPS, 2005.

[7] A. Levin, D. Lischinski, and Y. Weiss. A closed form solution to natural image matting. In CVPR, 2006.

[8] Y. Li, J. Sun, C.-K. Tang, and H.-Y. Shum. Lazy snapping. ACM Trans. Graph., 23(3):303-308, 2004.

[9] A. Ng, M. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In NIPS, 2001.

[10] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut": interactive foreground extraction using iterated graph cuts. ACM Trans. Graph., 23(3):309-314, 2004.

[11] J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI, 22(8):888-905, 2000.

[12] A. Smith and J. Blinn. Blue screen matting. In Proceedings of ACM SIGGRAPH, pages 259-268, 1996.

[13] J. Sun, J. Jia, C.-K. Tang, and H.-Y. Shum. Poisson matting. ACM Trans. Graph., 23(3):315-321, 2004.

[14] D. Tolliver and G. Miller. Graph partitioning by spectral rounding: Applications in image segmentation and clustering. In CVPR, 2006.

[15] J. Wang and M. Cohen. An iterative optimization approach for unified image segmentation and matting. In ICCV, 2005.

[16] Y. Weiss. Segmentation using eigenvectors: A unifying view. In ICCV, pages 975-982, 1999.

[17] S. X. Yu and J. Shi. Multiclass spectral clustering. In ICCV, pages 313-319, 2003.

[18] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In NIPS, 2005.