Deblurring Face Images with Exemplars

Post on 15-Jul-2018





<ul><li><p>Deblurring Face Images with Exemplars</p><p>Jinshan Pan<sup>1,*</sup>, Zhe Hu<sup>2,*</sup>, Zhixun Su<sup>1</sup>, and Ming-Hsuan Yang<sup>2</sup></p><p><sup>1</sup>Dalian University of Technology <sup>2</sup>University of California at Merced</p><p>Abstract. The human face is one of the most interesting subjects involved in numerous applications. Significant progress has been made towards the image deblurring problem; however, existing generic deblurring methods are not able to achieve satisfying results on blurry face images. The success of the state-of-the-art image deblurring methods stems mainly from implicit or explicit restoration of salient edges for kernel estimation. When there is not much texture in the blurry image (e.g., face images), existing methods are less effective as only a few edges can be used for kernel estimation. Moreover, recent methods are usually jeopardized by selecting ambiguous edges, which are imaged from the same edge of the object after blur, for kernel estimation due to local edge selection strategies. In this paper, we address these problems of deblurring face images by exploiting facial structures. We propose a maximum a posteriori (MAP) deblurring algorithm based on an exemplar dataset, without using the coarse-to-fine strategy or ad-hoc edge selections. Extensive evaluations against state-of-the-art methods demonstrate the effectiveness of the proposed algorithm for deblurring face images. We also show the extendability of our method to other specific deblurring tasks.</p><p>1 Introduction</p><p>The goal of image deblurring is to recover the sharp image and the corresponding blur kernel from one blurred input image. The process under a spatially-invariant model is usually formulated as</p><p>\[ B = I \ast k + \varepsilon, \tag{1} \]</p><p>where \(I\) is the latent sharp image, \(k\) is the blur kernel, \(B\) is the blurred input image, \(\ast\) is the convolution operator, and \(\varepsilon\) is the noise term. The single image deblurring problem has attracted much attention with significant advances in recent years [5, 16, 21, 3, 23, 13, 14, 6, 25]. 
As image deblurring is an ill-posed problem, additional information is required to constrain the solutions. One common approach is to utilize prior knowledge from the statistics of natural images, such as heavy-tailed gradient distributions [5, 16, 21, 15], the L1/L2 prior [14], and sparsity constraints [1]. While these priors perform well for generic cases, they are not designed to capture image properties for specific object classes, e.g., text and face images. The methods that exploit specific object properties are likely to perform well, e.g., text images [2, 20] and low-light images [10]. As the human face is one of the most interesting objects that finds numerous applications, we focus on face image deblurring in this work.</p><p>* Both authors contributed equally to this work.</p></li><li><p>2 J. Pan, Z. Hu, Z. Su, and M.-H. Yang</p><p>Fig. 1. A challenging example. (a) Blurred face image. (b)-(d) are the results of Cho and Lee [3], Krishnan et al. [14], and Xu et al. [25]. (e)-(f) are the intermediate results of Krishnan et al. [14] and Xu et al. [25]. (g) Our predicted salient edges visualized by Poisson reconstruction. (h) Our results (with the support size of 75 × 75 pixels).</p><p>The success of state-of-the-art image deblurring methods hinges on implicit or explicit extraction of salient edges for kernel estimation [3, 23, 13, 25]. Those algorithms employ sharp-edge prediction steps, mainly based on local structure, while not considering the structural information of an object class. This inevitably brings ambiguity to salient-edge selection if only considering local appearance, since multiple blurred edges from the same latent edge could be selected for kernel estimation. 
Moreover, for blurred images with less texture, the edge prediction step is less likely to provide robust results and usually requires parameter tuning, which would degrade the performance of these methods. For example, face images have similar components and skin complexion with less texture than natural images, and existing deblurring methods do not perform well on face images. Fig. 1(a) shows a challenging face example which contains scarce texture due to large motion blur. For such images, it is difficult to restore a sufficient number of sharp edges for kernel estimation using the state-of-the-art methods. Fig. 1(b) and (c) show that the state-of-the-art methods based on explicit edge prediction [3] and a sparsity prior [14] do not deblur this image well.</p><p>In this work, we propose an exemplar-based method for face image deblurring to address the above-mentioned issues. To express the structural information, we collect an exemplar dataset of face images and extract important structures from exemplars. For each test image, we compare it with the exemplar structures and find the best matched one. The matched structure is used to reconstruct salient edges and guide the kernel estimation process. The proposed method is able to extract good facial structures (Fig. 1(g)) for kernel estimation, and better restore this heavily blurred image (Fig. 1(h)). We will also demonstrate its ability to extend to other objects.</p><p>2 Related Work</p><p>Image deblurring has been studied extensively and numerous algorithms have been proposed. In this section we discuss the most relevant algorithms and put this work in proper context.</p></li><li><p>Since blind deblurring is an ill-posed problem, it requires certain assumptions or prior knowledge to constrain the solution space. Early approaches, e.g., [26], usually use the assumptions of simple parametric blur kernels to deblur images, which cannot deal with complex motion blur. 
As image gradients of natural images can be described well by a heavy-tailed distribution, Fergus et al. [5] use a mixture of Gaussians to learn the prior for deblurring. Similarly, Shan et al. [21] use a parametric model to approximate the heavy-tailed prior of natural images. In [1], Cai et al. assume that the latent images and kernels can be sparsely represented by an over-complete dictionary based on wavelets. On the other hand, it has been shown that the most favorable solution for a MAP deblurring method with a sparse prior is usually a blurred image rather than a sharp one [15]. Consequently, an efficient approximation of the marginal likelihood deblurring method is proposed in [16]. In addition, different sparsity priors have been introduced for image deblurring. Krishnan et al. [14] present a normalized sparsity prior and Xu et al. [25] use an L0 constraint on image gradients for kernel estimation. Recently, non-parametric patch priors that model the appearance of image edges and corners have also been proposed [22] for blur kernel estimation. We note that although the use of sparse priors facilitates kernel estimation, it is likely to fail when the blurred images do not contain rich texture.</p><p>In addition to statistical priors, numerous blind deblurring methods explicitly exploit edges for kernel estimation [3, 23, 13, 4]. Joshi et al. [13] and Cho et al. [4] directly use the restored sharp edges from a blurred image for kernel estimation. In [3], Cho and Lee utilize a bilateral filter together with a shock filter to predict sharp edges. The blur kernel is determined by alternating between restoring sharp edges and estimating the blur kernel in a coarse-to-fine manner. As strong edges extracted from a blurred image are not necessarily useful for kernel estimation, Xu and Jia [23] develop a method to select informative ones for deblurring. 
Despite demonstrated success, these methods rely largely on heuristic image filtering methods (e.g., shock and bilateral filters) for restoring sharp edges, which are less effective for objects with known geometric structures.</p><p>For face image deblurring, there are a few algorithms proposed to boost recognition performance. Nishiyama et al. [18] learn subspaces from blurred face images with known blur kernels for recognition. As the set of blur kernels is pre-defined, the application domain of this approach is limited. Zhang et al. [27] propose a joint image restoration and recognition method based on a sparse representation prior. However, this method is most effective for well-cropped face images with limited alignment errors and simple motion blurs.</p><p>Recently, HaCohen et al. [8] propose a deblurring method which uses a sharp reference example for guidance. The method requires a reference image with the same content as the input and builds up dense correspondence for reconstruction. It has shown decent results on deblurring specific images; however, the usage of the same-content reference image restrains its applications. Different from this method, we do not require the exemplar to be similar to the input. The blurred face image can be of different identity and background compared to any exemplar images.</p></li><li><p>Fig. 2. The influence of salient edges in kernel estimation. (a) True image and kernel. (h) Blurred image. (b)-(f) are extracted salient edges from the clear images visualized by Poisson reconstruction. (g) shows the ground truth edges of (a). (i)-(n) are the results by using edges (b)-(g), respectively.</p><p>Moreover, our method only needs the matched contours that encode the global structure of the exemplar for kernel estimation, instead of using dense corresponding pixels. 
In this sense, our method is more general on the object deblurring task with fewer constraints.</p><p>3 Proposed Algorithm</p><p>As the kernel estimation problem is non-convex [5, 16], most state-of-the-art deblurring methods use coarse-to-fine approaches to refine the estimated kernels. Furthermore, explicit or implicit edge selections are adopted to constrain and converge to feasible solutions. Notwithstanding demonstrated success in deblurring images, these methods are less effective for face images that contain fewer textures. To address these issues, we propose an exemplar-based algorithm to estimate blur kernels for face images. The proposed method extracts good structural information from exemplars to facilitate estimating accurate kernels.</p><p>3.1 Structure of Face Images</p><p>We first determine the types and number of salient edges for kernel estimation within the context of face deblurring. For face images, the salient edges that capture the object structure could come from the lower face contour, mouth, eyes, nose, eyebrows and hair. As human eyebrows and hair have small edges which could jeopardize the performance [23, 11], combined with their large variation, we do not take them into consideration as useful structures. Fig. 2 shows several components extracted from a clear face image as approximations of the latent image for kernel estimation (the extraction step will be described later). We test those edges by posing them as the predicted salient edges in the deblurring framework and estimate the blur kernels according to [16] by</p><p>\[ k^{*} = \arg\min_{k} \|\nabla S \ast k - \nabla B\|_2^2 + \gamma \|k\|_{0.5}, \tag{2} \]</p></li><li><p>[Plot: kernel similarity (KS), on a vertical axis from 0.5 to 1, for each salient-edge configuration (b)-(g). Legend: average KS on the dataset; KS of Figure 2.]</p><p>Fig. 3. 
The relationship between extracted salient edges and kernel estimation accuracy (KS is the abbreviation of kernel similarity). The notation (b)-(g) for salient edges represents the six edge configurations in Fig. 2(b)-(g).</p><p>In Eq. (2), \(\nabla S\) is the gradient of the salient edges extracted from an exemplar image as shown in Fig. 2(b)-(g), \(\nabla B\) is the gradient computed from the blurred input (Fig. 2(h)), \(k\) is the blur kernel, and \(\gamma\) is a weight (e.g., 0.005 in this work) for the kernel constraint. The sparse deconvolution method [16] with a hyper-Laplacian prior \(L_{0.8}\) is employed to recover the images (Fig. 2(i)-(n)). The results show that the deblurred result using the above-mentioned components (e.g., Fig. 2(l) and (m)) is comparable to that using the ground truth edges (Fig. 2(n)), which is the ideal case for salient edge prediction.</p><p>To validate the above-mentioned point, we collect 160 images generated from 20 images (10 images from the CMU PIE dataset [7] and 10 images from the Internet) by convolving them with 8 blur kernels, and extract their corresponding edges from different component combinations (i.e., Fig. 2(b)-(g)). We conduct the same experiment as in Fig. 2, and compute the average accuracy of the estimated kernels in terms of kernel similarity [11]. The curve in Fig. 3 depicts the relationship between the edges of facial components and the accuracy of the estimated kernel. As shown in the figure, the metric tends to converge as all the mentioned components (e.g., Fig. 2(e)) are included, and the set of those edges is sufficient (kernel similarity value of 0.9 in Fig. 3) for accurate kernel estimation.</p><p>For real-world applications, the ground-truth edges are not available. Recent methods adopt thresholding or similar techniques to select salient edges for kernel estimation, and this inevitably introduces some incorrect edges from a blurred image. 
Furthermore, the edge selection strategies, either explicitly or implicitly, consider only local edges rather than structural information of a particular object class, e.g., facial components and contour. In contrast, we consider the geometric structures of a face image for kernel estimation. From the experiments with different facial components, we determine that the set of lower face contour, mouth and eyes is sufficient to achieve high-quality kernel estimation and deblurred results. More importantly, these components can also be robustly extracted [29], unlike the other parts (e.g., eyebrows or nose in Fig. 2(a)). Thus, we use these three components as the informative structures for face image deblurring.</p></li><li><p>(a) Input image (b) Initial contour (c) Refined contour</p><p>Fig. 4. Extracted salient edges (see Sec. 3.2 for details).</p><p>3.2 Exemplar Structures</p><p>We collect 2,435 face images from the CMU PIE dataset [7] as our exemplars. The selected face images are from different identities with various facial expressions and poses. For each exemplar, we extract the informative structures (i.e., lower face contour, eyes and mouth) as discussed in Sec. 3.1. We manually locate the initial contours of the informative components (Fig. 4(b)), and use the guided filter [9] to refine the contours. The optimal threshold, computed by the Otsu method [19], is applied to each filtered image to obtain the refined contour mask \(M\) of the facial components (Fig. 4(c)). 
Thus, a set of 2,435 exemplar structures is generated as the potential facial structures for kernel estimation.</p><p>Given a blurred image \(B\), we search for its best matched exemplar structure. We use the maximum response of normalized cross-correlation as the measure to find the best candidate based on their gradients</p><p>\[ v_i = \max_{t} \left\{ \frac{\sum_{x} \nabla B(x)\, \nabla T_i(x+t)}{\|\nabla B(x)\|_2 \, \|\nabla T_i(x+t)\|_2} \right\}, \tag{3} \]</p><p>where \(i\) is the index of the exemplar, \(T_i(x)\) is the \(i\)-th exemplar, and \(t\) is the possible shift between image gradients \(\nabla B(x)\) and \(\nabla T_i(x)\). If \(\nabla B(x)\) is similar to \(\nabla T_i(x)\), \(v_i\) is large; otherwise, \(v_i\) is small. To deal with different scales, we resize each exemplar with sampl...</p></li></ul>
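<p>The matching score of Eq. (3) can be illustrated with a minimal sketch in pure Python. It is not the authors' implementation: the function names <code>ncc_max_response</code> and <code>best_exemplar</code>, the <code>max_shift</code> search range, the use of a single scalar gradient map per image, and the zero treatment of samples that fall outside the exemplar are all assumptions made for illustration.</p>

```python
import math


def ncc_max_response(grad_b, grad_t, max_shift=2):
    """Largest normalized cross-correlation between two gradient maps
    over all integer shifts t = (dy, dx) with |dy|, |dx| <= max_shift."""
    rows, cols = len(grad_b), len(grad_b[0])
    norm_b = math.sqrt(sum(v * v for row in grad_b for v in row))
    best = -1.0  # NCC lies in [-1, 1]
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            dot = 0.0
            norm_t_sq = 0.0
            for y in range(rows):
                for x in range(cols):
                    ty, tx = y + dy, x + dx
                    # Assumption: shifted exemplar samples outside its
                    # support contribute zero.
                    if 0 <= ty < len(grad_t) and 0 <= tx < len(grad_t[0]):
                        t_val = grad_t[ty][tx]
                        dot += grad_b[y][x] * t_val
                        norm_t_sq += t_val * t_val
            denom = norm_b * math.sqrt(norm_t_sq)
            if denom > 0.0:
                best = max(best, dot / denom)
    return best


def best_exemplar(grad_b, exemplar_grads, max_shift=2):
    """Return (index of the best matched exemplar, list of scores v_i)."""
    scores = [ncc_max_response(grad_b, g, max_shift) for g in exemplar_grads]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


# Toy data (hypothetical): a vertical edge in the blurred input, one exemplar
# with the same edge shifted by one column, one exemplar with a horizontal edge.
blurred = [[1.0 if x == 2 else 0.0 for x in range(5)] for _ in range(5)]
shifted_vertical = [[1.0 if x == 3 else 0.0 for x in range(5)] for _ in range(5)]
horizontal = [[1.0 if y == 2 else 0.0 for _ in range(5)] for y in range(5)]

index, scores = best_exemplar(blurred, [shifted_vertical, horizontal])
print(index, scores)
```

<p>In this toy case the shifted vertical edge is selected, because the maximum over shifts in Eq. (3) absorbs the one-column misalignment. A practical version would likely evaluate the correlation with FFTs rather than an explicit loop over shifts.</p>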