

Temporal Coherence-Based Deblurring Using Non-Uniform Motion Optimization

Congbin Qiao, Rynson W. H. Lau, Senior Member, IEEE, Bin Sheng, Member, IEEE, Benxuan Zhang, and Enhua Wu, Member, IEEE

Abstract— Non-uniform motion blur due to object movement or camera jitter is a common phenomenon in videos. However, the state-of-the-art video deblurring methods used to deal with this problem can introduce artifacts, and may sometimes fail to handle motion blur due to the movements of the object or the camera. In this paper, we propose a non-uniform motion model to deblur video frames. The proposed method is based on superpixel matching in the video sequence to reconstruct sharp frames from blurry ones. To identify a suitable sharp superpixel to replace a blurry one, we enrich the search space with a non-uniform motion blur kernel, and use a generalized PatchMatch algorithm to handle rotation, scale, and blur differences in the matching step. Instead of using a pixel-based or regular patch-based representation, we adopt a superpixel-based representation, and use color and motion to gather similar pixels. Our non-uniform motion blur kernels are estimated from the motion field of these superpixels, and our spatially varying motion model considers spatial and temporal coherence to find sharp superpixels. Experimental results show that the proposed method can reconstruct sharp video frames from blurred frames caused by complex object and camera movements, and performs better than the state-of-the-art methods.

Index Terms— Video deblurring, non-uniform blur kernel, image deblurring, video processing.

I. INTRODUCTION

VIDEO capture has become very popular in recent years, largely because of the widespread availability of digital cameras. However, motion blur is unavoidable in casual video capture due to camera shake and object motion during exposure.

Manuscript received May 3, 2016; revised November 26, 2016 and July 19, 2017; accepted July 19, 2017. Date of publication July 24, 2017; date of current version August 7, 2017. This work was supported in part by the National Natural Science Foundation of China under Grant 61632003, Grant 61572316, Grant 61672502, and Grant 61671290, in part by the National High-tech R&D Program of China (863 Program) under Grant 2015AA015904, in part by the Key Program for International S&T Cooperation Project of China under Grant 2016YFE0129500, in part by the Science and Technology Commission of Shanghai Municipality under Grant 16DZ0501100, in part by a SRG Grant from the City University of Hong Kong under Grant 7004415, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20140404, and in part by the Interdisciplinary Program of Shanghai Jiao Tong University under Grant 14JCY10. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Jean-Francois Aujol. (Corresponding author: Bin Sheng.)

C. Qiao, B. Sheng, and B. Zhang are with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]).

R. W. H. Lau is with the Department of Computer Science and Engineering, the City University of Hong Kong, Hong Kong.

E. H. Wu is with the Institute of Software, Chinese Academy of Sciences, Beijing 100190, China.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TIP.2017.2731206

While camera shake may cause different degrees of frame blur, object motion can aggravate the spatiotemporal complexity of the blur. Thus, video deblurring is an important but challenging task in video processing.

As blur kernel estimation and image deconvolution are both ill-posed problems, most state-of-the-art deblurring methods may impose specific requirements, such as special hardware [1] to obtain motion, or assume that a blurry image is convolved with a global blur kernel [2]. With multiple input images, the problem can be simplified by assuming a static scene [3] or one with few moving objects [4], [5]. While these deblurring methods may produce good results under the specified conditions, they cannot represent the spatially varying motion due to camera shake and object motion using a single blur kernel, which is the key for deconvolution-based image/video deblurring methods. In addition, image deconvolution can cause additional artifacts due to small changes in noise or inaccuracy in blur kernel estimation. Thus, a method that can deblur casual videos is desired.

Since different frames, as well as different parts of a frame, in a video sequence may have different degrees of sharpness, sharp frames or regions may be used to restore blurry neighbor frames or regions based on temporal coherence. In [6], blurred-unblurred frame pairs in a video are used to refine blur kernel estimation. However, this method still suffers from the same limitation as deconvolution-based methods. In [7], sharp frames are used to replace blurry frames based on patch-based synthesis, removing blurry artifacts in scenes without moving objects or with only minor object movement. However, such homography-based methods may fail to handle large-scale object movements, which typically involve complex scale/rotation transformations. Although some generalized homography-based methods [8], [9] may handle large-scale motions better, they still treat the image as a whole and cannot handle object motions in video frames.

In general, objects in a video are likely to have different motions. A global motion model, or a limited spatially varying motion model such as the homography-based motion model, cannot represent the actual motion blur. In this paper, we propose a non-uniform motion blur model to solve this problem. First, we use an adapted Simple Linear Iterative Clustering (SLIC) segmentation algorithm to divide each image into superpixels by considering image texture and motion, which preserves object outlines better than a regular patch-based image representation. Second, we compute a non-uniform motion blur model, estimated from the motion vector of each superpixel,



to approximate object motion in each blurry frame. Third, we search for a sharp version of each blurry superpixel temporally (i.e., from a sharp neighbor frame) and spatially (i.e., from the same frame) using a local blur kernel. We then apply a texture synthesis algorithm to reconstruct the deblurred frame, to avoid introducing blurry or blocky artifacts.

This paper makes two main contributions. First, we propose a non-uniform motion model to represent the movements of superpixels based on the estimated motion field. Second, we improve the PatchMatch algorithm [10], [11] using the motion blur kernels, and extend the search both spatially and temporally to find the most appropriate sharp superpixels. Experimental results show that our non-uniform motion model produces more accurate deblurred frames than state-of-the-art methods and avoids generating extra artifacts, especially ringing artifacts.

II. RELATED WORK

Image and video deblurring have been extensively studied, and many proposed methods have yielded great success. Although a blurry image can be sharpened by deconvolution with different point-spread functions (PSFs) or blur kernels, e.g., spatially varying PSFs, restoring a blurry image is inherently an ill-posed problem. While video deblurring can make use of extra information from adjacent frames in the deblurring process, it must also consider additional problems, including spatial and temporal coherence. Here, we summarize the main studies on image and video deblurring.

A. Single-Image Deblurring

This problem has been extensively studied, and a number of approaches have recently been proposed. Most of them are based on a uniform blur model with different image priors [12], [13]. The assumptions imposed on the motion blur, e.g., that the entire image has a uniform blur, make these methods difficult to apply directly to video frames, which typically involve complex motion blur.

There are methods that attempt to estimate spatially varying motion blur. Whyte et al. [14] estimate a non-uniform blur kernel for 3D camera rotation, but do not consider motion blur caused by object movements. Xu et al. [15] proposed a motion deblurring model based on an unnatural representation, which can be applied to uniform and non-uniform motion blur. However, since it only considers camera rotation and translation, it shares the same limitation as [14]. Levin [16] proposed a deblurring approach for images with few moving objects. It assumes that only part of the image is blurred, such that the image can be segmented into blurred and unblurred layers, and uses image statistics to distinguish and correct the motion blur against a stable background. Shan et al. [17] proposed a rotational motion blur model using a Fourier transform with an estimated transparency map. The motion kernel is space-variant but requires user interaction to indicate object motions. Dai and Wu [18] proposed an α-motion blur constraint model to estimate a blur kernel for each pixel based on matting extraction. However, it relies

strongly on the accuracy of the mattes. Kim and Lee [19] proposed an energy model to estimate the motion flow and the deblurred image based on the robust total variation (TV)-L1 model. This method can handle abrupt motion changes without segmentation.

As these single-image deblurring methods focus on deblurring a single image, directly applying them to recover blurry videos frame by frame cannot guarantee temporal consistency.

B. Multi-Image Deblurring

With multiple input images, more information can be obtained to help improve the quality of the deblurred results. Rav-Acha and Peleg [20] use two blurry images with different blur directions to obtain deblurred results superior to those of the single-image deblurring methods. Several methods have recently been proposed to reconstruct a sharp image from two or more blurry images by deconvolution using different motion kernels, e.g., [2], [21], [22]. However, all these methods assume a global motion kernel for each image and share the same limitation as the uniform motion model in single-image deblurring methods. To handle non-uniform motion blur introduced by moving objects, Onogi and Saito [4] proposed a video sequence restoration method that segments a blurry image into background and foreground based on moving objects. However, it assumes that the blur kernels estimated from the motion are accurate, and uses them to directly deblur the image. Cho et al. [5] segment a blurry image into background and a few moving objects with similar motion. Their spatially varying motion blur kernels are defined by a uniform motion blur of the moving objects. However, real object motion is typically non-uniform. Tai et al. [1] proposed a non-uniform motion model using a hybrid camera system. They estimate spatially varying motion blur kernels using optical flow. While our method is similar to [1], it does not require a special camera. We compute optical flow from adjacent frames in order to estimate non-uniform motion blur kernels. Although the estimated kernels are not as accurate as those of [1], they are sufficiently robust to model motion in the blurry frame with our deconvolution-free deblurring.

C. Video Deblurring

Video deblurring typically involves more complex motion. Matsushita et al. [23] proposed an interpolation-based video deblurring method. They align a sharp frame and a blurry frame and replace the blurry pixels using the sharp ones. However, they do not consider the motion difference between these frames. Li et al. [3] proposed a deblurring method using a homography-based model in their panorama generating system. Since they treat the homography-based motion model as an accurate blur kernel, they fail to deconvolve moving objects. He et al. [24] proposed to segment moving objects into small regions and track their corner points across frames. By interpolating the motion within each region, spatially varying blur kernels can then be computed for deblurring. While this method offers a new way to address the accuracy problem of optical flow on blurred videos, relying on the detection of sharp corner


Fig. 1. Overview of the proposed video deblurring method.

points across frames is not reliable, e.g., under camera motions. Takeda and Milanfar [25] proposed a shift-invariant space-time point spread function (PSF) for video deblurring. Its advantage is that no object segmentation or motion information is needed. However, it requires the camera parameters to be fixed throughout the video and the spatial and temporal PSFs to be known, which may not be practical for most applications. Lee et al. [6] proposed a residual deconvolution deblurring method using a blurred-unblurred frame pair. They iteratively refine the blur kernel to reduce ringing artifacts caused by conventional deconvolution. However, their estimated blur kernel is uniform for a frame and cannot handle complex motion blur.

In [26], Wulff and Black proposed a layered model to jointly estimate parametric motion and scene segmentation. This layered model separates a frame into a few layers, and each layer can only handle one or two moving objects. However, this affine model cannot approximate motion blur in complex scenes and has difficulties in handling boundaries between layers. Delbracio and Sapiro [27] proposed a uniform deblurring method based on first consistently registering adjacent frames and then fusing them blockwise in the Fourier domain. While this method is very efficient, it assumes the input video to have a uniform motion blur and therefore aims at handling only camera shake. Kim and Lee [28] proposed a single energy function to estimate the optical flow and deblur the frames. While this method can handle non-uniform motion blur, it requires minimizing a complex energy function. In contrast, our proposed method restores the image by replacing blurred superpixels with sharp ones obtained from other frames. Cho et al. [7] proposed a deblurring method based on the motion model proposed in [3], which uses patches from nearby sharp frames to restore blurry frames through texture synthesis. Although this method can effectively reconstruct blurry videos, it has two limitations. First, it assumes homography-based motion, which cannot handle diverse object movements, as mentioned earlier. Second, its regular patch-based search and synthesis do not distinguish between moving objects and background. In contrast, we propose a non-uniform motion model to handle motion blur due to camera and object movements. Although our method also involves patch-based deblurring, we consider motion in segmentation.

In summary, our method differs from previous work in that our main goal is to handle not only blurs caused by a shaky camera, but also complex blurs due to moving objects.

Algorithm 1: Summary of the Proposed Method

III. FRAME DEBLURRING ALGORITHM

Our approach is based on the assumptions that the blur kernel varies spatially within a frame and temporally among frames, and that sharp frames or superpixels are dispersed throughout the video sequence. We let $I_0, \ldots, I_n$ be the input video frames, and $L_0, \ldots, L_n$ the latent frames to be reconstructed. We also let $I_i$ be a blurry frame, and $I_j$ an adjacent frame with sharp superpixels as candidates to replace the blurry ones in $I_i$. Our goal here is to reconstruct $L_i$, a sharp version of the blurry frame $I_i$ in the video sequence. Thus, our method needs to find proper sharp superpixels in $I_j$ to replace the blurry superpixels in $I_i$. We propose a PatchMatch-based search strategy with spatial and temporal coherence to find these sharp superpixels.

Fig. 1 shows an overview of the proposed deblurring method, which has four steps. First, we compute the motion field to describe pixel motion between adjacent frames (Section III-A). Second, we propose a motion-based segmentation method to segment each frame into superpixels (Section III-B). Third, we present our non-uniform motion blur model to describe the motion of each superpixel in each frame (Section III-C). Finally, we present our PatchMatch-based search method, which uses the non-uniform motion model to find sharp superpixels, and our superpixel synthesis strategy, which replaces blurry superpixels and smooths superpixel boundaries (Section III-D). Algorithm 1 summarizes our method.

A. Computing the Motion Field

As the motion field of two adjacent frames roughly indicates the trajectories of local motions during the exposure time, we compute the motion field based on optical flow [29].


For each pixel in frame $I_i$, we calculate the backward motion vector from $I_i$ to $I_{i-1}$ as $(u^0, v^0)$, and the forward motion vector from $I_i$ to $I_{i+1}$ as $(u^1, v^1)$, where $u$ and $v$ are the horizontal and vertical motions. The motion vector for a superpixel is then the average of all pixel motion vectors within the superpixel.

This motion field plays a number of important roles in our motion model. First, our segmentation method uses it to gather pixels of similar motion to form superpixels. Second, we use it to estimate the spatially varying motion blur. Third, we use it to compensate for the misalignment among neighbor frames to locate the search window.

The degree of motion blur of a pixel can be approximated by its displacement during the exposure time. As in [7], we estimate the sharpness of pixel $p$ using the forward and backward motion vectors, which approximate the displacement of $p$ between adjacent frames:

$$\alpha_p = \exp\left(-\frac{(u_p^0)^2 + (v_p^0)^2 + (u_p^1)^2 + (v_p^1)^2}{4\delta^2}\right), \qquad (1)$$

where $\delta$ is a constant such that $\alpha_p$ ranges from 0 to 1; $\delta$ is set to 10 in our implementation. The sharpness of a frame or a superpixel is defined as the average sharpness value of all pixels within it.
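To make this concrete, the following minimal Python sketch (an illustration only; the array layout and helper names are our assumptions, not part of the published implementation) computes the per-pixel sharpness map of Eq. 1 from the backward and forward flow fields:

import numpy as np

def sharpness_map(flow_bwd, flow_fwd, delta=10.0):
    # flow_bwd, flow_fwd: (H, W, 2) arrays of (u, v) motion vectors to the
    # previous and next frame. Returns the (H, W) map of alpha_p in Eq. 1,
    # where values near 1 indicate sharp pixels and values near 0 blurry ones.
    sq = (flow_bwd ** 2).sum(axis=2) + (flow_fwd ** 2).sum(axis=2)
    return np.exp(-sq / (4.0 * delta ** 2))

def region_sharpness(alpha, mask):
    # Sharpness of a frame or superpixel: mean alpha over its pixels.
    return float(alpha[mask].mean())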

B. Motion-Based Segmentation

Simply dividing a frame into regular patches may increase matching errors when finding sharp patches from other frames. Here, we adopt the SLIC algorithm [30] to segment each frame into superpixels. The original SLIC algorithm is an adaptation of k-means clustering for superpixel generation with a small search space, and takes color and spatial distance into account in the segmentation to preserve object boundaries. Our motion-based segmentation method adds the motion vectors to the distance metric for the following two reasons. First, the motion vectors describe local motions between adjacent frames and thus help distinguish different moving objects. Second, the motion vectors are used to estimate the motion blur kernels, and therefore the amount of deblurring. Hence, the distance between two pixels $p$ and $q$ is computed as:

$$D(p, q) = \sqrt{\left(\frac{d_c(p, q)}{N_c}\right)^2 + \left(\frac{d_s(p, q)}{N_s}\right)^2 + \delta \left(\frac{d_m(p, q)}{N_m}\right)^2}, \qquad (2)$$

where $d_c(p, q)$ and $d_s(p, q)$ are the color and spatial distances, respectively, between $p$ and $q$, and $N_c$ and $N_s$ are the maximum color and spatial distances, respectively. We define $N_c$ separately for each superpixel from the previous iteration, to maintain a superpixel size of approximately $N_s = 25$ (the average superpixel size in our implementation). $d_m(p, q)$ is the motion distance between $p$ and $q$ and is defined as:

$$d_m(p, q) = \sqrt{(u_p^0 - u_q^0)^2 + (v_p^0 - v_q^0)^2 + (u_p^1 - u_q^1)^2 + (v_p^1 - v_q^1)^2}, \qquad (3)$$

where $(u_p^0, v_p^0)$ and $(u_p^1, v_p^1)$ are the motion vectors for $p$, while $(u_q^0, v_q^0)$ and $(u_q^1, v_q^1)$ are those for $q$.

Fig. 2. Comparing SLIC superpixel segmentation with our motion-based SLIC segmentation. The first row shows the zero-parameter version of SLIC [30]. The second row is our motion-based SLIC. Our segmentation can distinguish moving objects and segment blurry images better. (a) Original video frame and its segmentations. (b) Enlarged views of the segmentation region for "biker." (c) Enlarged views of the segmentation region for "Car-II." (d) Enlarged views of the segmentation region for "Car-I."

As with the zero-parameter version of SLIC, we set the maximum motion distance $N_m$ adaptively for each superpixel, using the maximum $d_m$ value obtained from the previous iteration.

To control the effect of the motion field on segmentation, we introduce $\delta$ in Eq. 2. A larger $\delta$ can better distinguish pixels with different motions, but may also destroy object structure. Our experiments show that setting $\delta = 3$ provides a good balance among color distance, spatial distance, and motion in the segmentation. Fig. 2 compares our motion-based SLIC with the zero-parameter version of SLIC. We see that motion information helps generate more accurate segmentation of moving objects. After segmentation, $I_i$ can be represented as a superpixel set $\Phi_i$, and we use $I_i^t \in \Phi_i$ to denote a superpixel.

Note that a blurry frame may contain mostly blurry superpixels. The blur becomes much more severe if a superpixel represents a fast-moving object. Even though we have considered the motion vectors, it may still be difficult to accurately segment a significantly blurred frame into superpixels. However, unlike other region-based image deblurring methods [4], [5], which rely on accurate segmentation of moving objects, we only need to obtain superpixels that can roughly separate different motion parts, and then estimate a blur kernel for each superpixel for matching. Since our superpixels are small (around 25 × 25 pixels), only a small amount of edge information is enough to improve the accuracy of superpixel matching.
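For clarity, the following Python sketch (our own illustration; the dict-based pixel representation and the normalizer arguments are assumptions, not the authors' data structures) evaluates the motion-augmented distance of Eqs. 2 and 3 between a pixel and a cluster center:

import numpy as np

def slic_distance(p, q, Nc, Ns, Nm, delta=3.0):
    # p, q: dicts with 'lab' (CIELAB color, 3-vector), 'xy' (position,
    # 2-vector) and 'motion' (u0, v0, u1, v1, 4-vector).
    # Nc, Ns, Nm are the adaptive normalizers described in the text.
    dc = np.linalg.norm(np.subtract(p['lab'], q['lab']))        # color distance
    ds = np.linalg.norm(np.subtract(p['xy'], q['xy']))          # spatial distance
    dm = np.linalg.norm(np.subtract(p['motion'], q['motion']))  # Eq. 3
    return np.sqrt((dc / Nc) ** 2 + (ds / Ns) ** 2 + delta * (dm / Nm) ** 2)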

C. Non-Uniform Motion Blur Kernel Estimation

After a frame is segmented into superpixels, we estimate a blur kernel for each superpixel and combine the blur kernels of all superpixels to form the spatially varying motion blur kernels. To minimize estimation errors due to severely blurred moving objects, we optimize the blur kernels from neighbor superpixels to find globally smoothed blur kernels. The estimated motion blur kernels serve as the initial blur filter in the superpixel matching step.

We compute the average backward and forward motion vectors for each superpixel $I_i^t \in \Phi_i$ as $(u_t^0, v_t^0)$ and $(u_t^1, v_t^1)$, respectively. In a video, the motion of a pixel across frames can be described by a complex motion kernel, which is difficult to estimate. A motion vector indicates the total displacement during the interval between one frame and the next.


Fig. 3. Estimation of blur kernels. (a) shows blur kernels generated using the original motion field, whereas (b) shows the smoothed blur kernels generated after global optimization. (b) and (c) show the estimation process, where we vary the parameter θ from 0 to 0.5 in Eq. 4 to optimize the final blur kernels.

In general, the video frame rate is over 24 fps, so the time interval between frames is smaller than about 40 ms. As the exposure time should be shorter than this interval, we can simplify the blur model by assuming constant-velocity motion. Like [3] and [17], we integrate the per-superpixel motion vectors to form spatially varying blur kernels. Since the motion vectors obtained in Section III-A are between two frames, we need to estimate the motion vector during the exposure time, which is often less than 20 ms. We thus search for a ratio between 0 and 0.5 for an optimal value.

To estimate the motion blur for all superpixels in $\Phi_i$, we need to minimize the following error:

$$E_{blur} = \min_\theta \sum_{I_i^t \in \Phi_i} \left(\left\| L_i^t \otimes K_i^t(\theta) - I_i^t \right\|\right)^2, \qquad (4)$$

where $I_i^t$ is a superpixel in $\Phi_i$ of $I_i$, and $L_i^t$ is the corresponding superpixel in $L_i$, obtained by adding the motion vector of $I_i^t$ to $I_i^t$. $K_i^t$ is the motion kernel that we need to estimate for $I_i^t$. $\theta$ is the ratio of the exposure time to the frame interval, and is used to determine the kernel length. We increase $\theta$ from 0 to 0.5 in increments of 0.05 to find the optimum kernel length for each superpixel.
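The grid search over θ can be sketched as follows; this is a simplified illustration under our own assumptions (grayscale patches, a straight-line kernel, and a sharp candidate already aligned to the blurry superpixel), whereas the authors run the per-superpixel search on the GPU:

import numpy as np
from scipy.ndimage import convolve

def line_kernel(u, v, theta, size=15):
    # Linear motion blur kernel: a normalized line of length
    # theta * |(u, v)| drawn along the direction (u, v).
    k = np.zeros((size, size))
    c = size // 2
    steps = max(int(round(theta * np.hypot(u, v))), 1)
    for s in np.linspace(0.0, 1.0, steps + 1):
        x, y = int(round(c + s * theta * u)), int(round(c + s * theta * v))
        if 0 <= x < size and 0 <= y < size:
            k[y, x] = 1.0
    return k / k.sum()

def best_theta(aligned_sharp_sp, blurry_sp, u, v):
    # Eq. 4: blur the aligned sharp superpixel with K(theta) and keep the
    # theta that minimizes the residual against the blurry superpixel.
    best, best_err = 0.0, np.inf
    for theta in np.arange(0.0, 0.51, 0.05):
        residual = convolve(aligned_sharp_sp, line_kernel(u, v, theta)) - blurry_sp
        err = np.sum(residual ** 2)
        if err < best_err:
            best, best_err = theta, err
    return best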

Since $L_i$ is the latent frame that we need to reconstruct, we find a sharp neighbor frame $I_j$ to approximate it. We set the sharpness threshold to 0.85 in our implementation; a frame or a superpixel is considered sharp if its sharpness is greater than 0.85. In a video sequence, however, moving objects or camera shake may cause significant misalignment errors between $I_i$ and $I_j$. If we let $c$ be the motion vector between $I_i^t$ and $I_j^t$, where $I_j^t \in \Phi_j$ is the corresponding superpixel of $I_i^t$, we may define $f$ as an alignment function using $c$ to compensate for the motion; thus, $f(I_j^t)$ approximates $I_i^t$. In addition, since the alignment error may increase significantly as the frame distance increases, not all video frames with sharpness greater than 0.85 can be candidates to estimate the motion model for $I_i$. Hence, we set a temporal window:

$$W_i = \{k : \max(i - \omega, 0) < k < \min(i + \omega, n - 1)\}, \qquad (5)$$

where $\omega$ is the window size and is set to 5 in our implementation. The motion blur kernel for $I_i$ is then estimated as:

$$E_{blur} = \min_\theta \sum_{\substack{j \in W_i \\ \alpha_j > 0.85}} \; \sum_{\substack{I_i^t \in \Phi_i \\ I_j^t \in \Phi_j}} \left(\left\| f(I_j^t) \otimes K_i^t(\theta) - I_i^t \right\|\right)^2. \qquad (6)$$

Fig. 3 shows the spatially varying blur kernel estimation process, which finds a proper kernel length from the motion field. Since the estimated blur kernels may contain errors, due to the inaccuracy of the motion field and the local energy minimization, we use the blur kernels of neighbor superpixels to optimize the blur kernel for $I_i^t$. We aim to find an optimal $\lambda$ to smooth the kernels estimated from the original motion field:

$$E_{smo} = \min \sum_{I_i^t \in \Phi_i} \left( \|\hat{K}_i^t - K_i^t\|^2 + \lambda \sum_{I_i^n \in N(I_i^t)} \|\hat{K}_i^t - \hat{K}_i^n\|^2 \right), \qquad (7)$$

where $\hat{K}_i^t$ is the globally optimized blur kernel of $K_i^t$ for $I_i^t$, obtained by minimizing $E_{smo}$. $N(I_i^t)$ is the set of superpixels adjacent to $I_i^t$, and $\lambda$ is a smoothing factor that determines the influence of the neighbor kernels $\hat{K}_i^n$. $\| \cdot \|$ denotes the L2-norm of two blur kernels computed from the motion vectors. $\|\hat{K}_i^t - K_i^t\|$ can be written as $\|\hat{R}_i^t - R_i^t\|$, where $R = (u^0, v^0, u^1, v^1)$ is the motion field of $K$. Thus, Eq. 7 can be optimized as:

$$(E + \lambda G)\hat{R}_i^t = R_i^t, \qquad (8)$$

where $E$ is the identity matrix and $G$ is the Laplacian matrix of a graph with each superpixel as a node. In our implementation, we set $\lambda = 0.5$. While we could perform this optimization in every kernel estimation step of Eq. 6, the optimization has a high computational cost and may not improve the performance. Hence, we optimize the kernels once after we have obtained the motion field of a frame. During the kernel estimation step, we simply use the proper $\theta$ value to reduce the alignment error between $f(I_j^t)$ and $I_i^t$.
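Eq. 8 is a linear system over the superpixel graph. A dense Python sketch of the solve (our own illustration; a real implementation would use a sparse solver) is:

import numpy as np

def smooth_motion_field(R, adjacency, lam=0.5):
    # R: (N, 4) array with one (u0, v0, u1, v1) row per superpixel.
    # adjacency: list of neighbor-index lists from the superpixel graph.
    # Solves (E + lam * G) R_hat = R, where G is the graph Laplacian (Eq. 8).
    n = len(adjacency)
    G = np.zeros((n, n))
    for i, nbrs in enumerate(adjacency):
        G[i, i] = len(nbrs)
        for j in nbrs:
            G[i, j] = -1.0
    return np.linalg.solve(np.eye(n) + lam * G, R)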


Fig. 4. Comparing our blur kernels with the homography-based blur kernels [3], [7]. (a) shows a blurry frame. (b) shows the homography-based blur kernels, which mainly model the camera shake. (c) shows our blur kernels, which model the camera shake as well as the moving car on the left.

Fig. 4 shows an example of a blurry frame caused by camera shake and a car moving from right to left. Fig. 4(b) shows that homography-based methods [3], [7] fail to approximate the car motion, as they use a homography matrix with 8 degrees of freedom to build the motion model. On the other hand, Fig. 4(c) shows that our method can approximate motion blur caused by camera shake and moving objects well.

D. PatchMatch-Based Frame Deblurring

PatchMatch [10] and its generalized version [11] are known to be efficient in finding similar patches for image completion, retargeting, and reshuffling. However, their pixel-wise global search is not effective for our superpixel-based search problem. The PatchMatch Filter algorithm (PMF) [31] involves a search strategy based on superpixels. It first builds an adjacency graph for the input image, where each superpixel is a graph node and an edge connects two adjacent superpixels. Similar to PatchMatch, PMF iterates two search strategies, i.e., neighborhood propagation and random search. However, PMF improves efficiency by replacing the pixelwise search with a superpixelwise search. It considers neither spatial and temporal coherence nor motion blur. To address these limitations, we present a PatchMatch-based search strategy, built on the original PatchMatch algorithm [10], that searches for a sharp superpixel to replace a blurry one using our estimated motion model. Given a superpixel, we adapt the PatchMatch algorithm to search for an irregularly shaped region, rather than a fixed square region, based on a new distance function that considers color, spatial, and temporal constraints, as shown in Eqs. 9 to 12. After the PatchMatch initialization step, we iterate through the propagation step and the random search step. However, instead of performing the random search over whole frames, we confine the search to a small region to save time. Our argument is that most of the best matching superpixels are near the original superpixels, as shown in Fig. 5. We elaborate this idea as follows.

To find the most suitable sharp superpixel and maintain spatial coherence, we propose a distance function for two superpixels, instead of the SSD (sum of squared differences) used in the original PatchMatch algorithm:

$$E_{data} = E_{color} + E_{temporal} + E_{spatial}, \qquad (9)$$

where $E_{color}$ is the distance between two patches, and $E_{temporal}$ and $E_{spatial}$ enforce temporal and spatial coherence in patch matching.

Given superpixel $I_i^t$, we find the nearest-neighbor field from a candidate frame $I_j$, $j \in W_i$, over a specified rotation angle, scale, and blur kernel length. $I_j^t(x, y, r, s, k)$ represents the transformation result with offsets $x$ and $y$ along the x-axis and y-axis, rotation $r$, and scale $s$; $k$ is the length of the blur kernel, applied to the pixels of the transformation result obtained by the generalized PatchMatch algorithm with search space $(x, y, r, s)$. Here, we add the motion vector of $I_i^t$ to the superpixel to get the corresponding superpixel (of the same shape), $I_j^t$, in $I_j$. The color term $E_{color}$ is then expressed as:

$$E_{color} = \| I_i^t - I_j^t(x, y, r, s, k) \|^2. \qquad (10)$$

We assume that the motion field approximately indicates the location of the sharp superpixel. Thus, we apply the motion field constraint between blurry superpixel $I_i^t$ and sharp superpixel $I_j^t$, and define the temporal term as:

$$E_{temporal} = \sqrt{(x_t - u_{i,j}^t)^2 + (y_t - v_{i,j}^t)^2}, \qquad (11)$$

where $(x_t, y_t)$ is the offset of superpixel $I_i^t$ along the x-axis and the y-axis, and $(u_{i,j}^t, v_{i,j}^t)$ is the motion vector from $I_i^t$ to $I_j^t$. Therefore, we minimize the temporal error to maintain temporal coherence when searching for a sharp superpixel.

We also consider the neighbor superpixels and use their motion vectors to find a sharp superpixel, in a way similar to the spatial smoothness regularization term of [32]. The idea is that $I_i^t$ and its neighbor superpixels should have similar motion vectors, due to spatial coherence. Thus, we apply the motion field constraint between candidate superpixel $I_i^t$ and its neighbor superpixels $I_i^n \in N(I_i^t)$. Our spatial term is:

$$E_{spatial} = \sum_{n \in N(I_i^t)} \sqrt{(x_t - u_{i,j}^n)^2 + (y_t - v_{i,j}^n)^2}, \qquad (12)$$

As in the motion model estimation step in Section III-C, we use the motion vector to determine the initial search position in all neighbor frames in the PatchMatch initialization step. The initial values for rotation and scale are set to zero, and the initial blur kernel for each superpixel is the one computed in Section III-C. This initialization helps find proper superpixels within a few iterations.

A random search for superpixel $I_i^t$ is performed during each PatchMatch iteration. The original algorithm [11] randomly samples locations over all possible positions. Since our experiments in Fig. 5 show that most superpixels can be found in a limited local window with a small range of rotation and scale values, we use a $w \times w$ spatial window centered at the initial pixel position of each superpixel as the search window. In our implementation, we set $w = 15$, which means that the search window for each pixel ranges from −7 to 7. We set the rotation angle from $-\frac{1}{20}\pi$ to $\frac{1}{20}\pi$ and the scale from 0.8 to 1.25. For the blur kernels, we vary the length ratio of the initial blur kernels from 0.8 to 1.2. We can then obtain the rotation angle, scale, blur kernel, and position from neighbor sharp frames by minimizing $E_{data}$ (for all $j \in W_i$ with frame sharpness $\alpha_j > 0.8$ and superpixel sharpness $\alpha^t > 0.85$) for each blurry superpixel.
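The following sketch (our own illustration; the flat pixel-stack representation and all names are assumptions) evaluates the match cost of Eqs. 9–12 for one candidate, and draws a random-search proposal confined to the window and ranges described above:

import numpy as np

rng = np.random.default_rng(0)

def e_data(blurry_sp, candidate_sp, offset, motion_vec, neighbor_motions):
    # blurry_sp, candidate_sp: equal-shape pixel stacks; the candidate is
    # assumed already rotated/scaled and blurred by its kernel (Eq. 10).
    # offset: (x_t, y_t); motion_vec: flow from I_i^t to I_j^t (Eq. 11);
    # neighbor_motions: flows of the adjacent superpixels N(I_i^t) (Eq. 12).
    e_color = np.sum((blurry_sp - candidate_sp) ** 2)
    e_temporal = np.hypot(offset[0] - motion_vec[0], offset[1] - motion_vec[1])
    e_spatial = sum(np.hypot(offset[0] - m[0], offset[1] - m[1])
                    for m in neighbor_motions)
    return e_color + e_temporal + e_spatial          # Eq. 9

def random_candidate(init_x, init_y, w=15):
    # Random-search proposal confined to the local w x w window and the
    # rotation / scale / kernel-length ranges given in the text.
    dx, dy = rng.integers(-(w // 2), w // 2 + 1, size=2)
    rot = rng.uniform(-np.pi / 20, np.pi / 20)
    scale = rng.uniform(0.8, 1.25)
    k_ratio = rng.uniform(0.8, 1.2)
    return (init_x + dx, init_y + dy, rot, scale, k_ratio)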


Fig. 5. The best matching nearest-neighbor field found in all frames of a video sequence. (a) shows the offsets along the x-axis and y-axis, (b) shows the best scale, and (c) shows the best rotation angle. The dotted line in each diagram marks the search window chosen in our implementation. Note that most of the best matching superpixels can be found within this search window.

Fig. 6. Blending strategies. (a) shows two adjacent superpixels. (b) shows the enlargement of the adjacent superpixels to overlap each other. (c) shows the blending strategy for a pixel, computed from its distances to the two superpixels.

Simply replacing the blurry superpixels with the matched sharp superpixels inevitably introduces noise or seams, especially around the boundaries of a moving object. Traditional methods [33] solve this problem by blending: they sample their patches densely to ensure that neighbor patches overlap with one another, and calculate a weighted average of the overlapping pixels. However, such blending introduces blending blur, which is exactly the artifact that we want to remove. To address this problem, we apply texture synthesis to combine the overlapping sharp superpixels at each pixel, using their spatial distances as weights. Suppose that $I_{i,p}^t$ denotes a pixel $p$ in superpixel $I_i^t$, and $I_i^n \in N(I_i^t)$ is a neighbor superpixel. Let the minimum spatial Euclidean distance between $I_{i,p}^t$ and $I_i^n$ be $d_n$. We sample the pixels of all neighbor superpixels and reconstruct the new pixel as:

$$L_{i,p}^t = \frac{\sum_{n \in N(t)} e^{-0.1 d_n} I_{i,p}^n + e^{-0.1 d_t} I_{i,p}^t}{\sum_{n \in N(t)} e^{-0.1 d_n} + e^{-0.1 d_t}}, \qquad (13)$$

where $I_{i,p}^n$ is the pixel of the enlarged region of superpixel $I_i^n$ at $p$. Thus, $p$ is in superpixel $I_i^t$ and in the enlarged region of $I_i^n$. We calculate the nearest spatial distance $d_t$ from $p$ to $I_i^t$, and $d_n$ from $p$ to $I_i^n$, as shown in Fig. 6. The final blended pixel is a weighted average of superpixel $I_i^t$ and all its neighbor superpixels at pixel $p$. We set the spatial distance $d_t$ to 0, as $p$ is on $I_i^t$. We only consider the neighbor superpixels close to $p$, and set $d_n = \infty$ if $d_n$ is larger than $\frac{1}{2}w$. This is because the larger $d_n$ is, the greater the difference between the value of $I_i^t$ and that of the enlarged region of $I_i^n$ if superpixels $I_i^t$ and $I_i^n$ are obtained from different sharp frames. This approach produces smooth blended results and avoids blending blur.
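A per-pixel sketch of Eq. 13 (our own illustration; the argument layout is an assumption):

import numpy as np

def blend_pixel(center_val, neighbor_vals, neighbor_dists, w=15):
    # center_val: pixel value from the superpixel containing p (d_t = 0,
    # so its weight is e^0 = 1). neighbor_vals / neighbor_dists: values of
    # the enlarged neighbor superpixels at p and the nearest distances d_n.
    # Neighbors farther than w/2 get weight 0 (d_n = infinity in the text).
    weights, values = [1.0], [center_val]
    for val, d in zip(neighbor_vals, neighbor_dists):
        if d <= w / 2.0:
            weights.append(np.exp(-0.1 * d))
            values.append(val)
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, values) / weights.sum())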

Once a blurry superpixel is sharpened, we update its sharpness value. As a result, the sharpness of the frame may also increase. If it rises above the sharpness threshold, the frame may then be used to reconstruct other blurry frames.

IV. IMPLEMENTATION ISSUES

To speed up the proposed deblurring algorithm, we considered the following issues.

A. Spreading of Sharp Superpixels

As the order of deblurring video frames may affect processing time, Cho et al. [7] process the sharpest frames first. While this quickly produces frames with high sharpness values, it may not always be easy to find sharp superpixels if sharp frames appear only sparsely in the video.

To quickly spread sharp superpixels, we instead order the video frames by the number of sharp neighbor frames that they have. The deblurring algorithm is performed first on the frame with the highest number of sharp neighbor frames in its temporal window. After a frame has been deblurred, its neighbor frames may have a different number of sharp neighbor frames, so we update the sorted list to reflect this. This approach ensures that we first process the frames that are most likely to find good matches, as sketched below.
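A minimal scheduling sketch (our own simplification: it assumes a frame becomes fully sharp once processed):

def deblur_order(sharpness, window=5, threshold=0.85):
    # sharpness: mutable list of per-frame sharpness values (mean alpha).
    # Yields frame indices, always picking the pending frame with the most
    # sharp neighbors, and recounting after each frame is deblurred.
    n = len(sharpness)

    def sharp_neighbors(i):
        lo, hi = max(i - window, 0), min(i + window, n - 1)
        return sum(1 for k in range(lo, hi + 1)
                   if k != i and sharpness[k] > threshold)

    pending = set(range(n))
    while pending:
        i = max(pending, key=sharp_neighbors)
        yield i
        pending.discard(i)
        sharpness[i] = 1.0  # simplifying assumption: deblurring made it sharp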

B. GPU-Based Acceleration

Since all deblurring steps are performed on individual superpixels, they can be easily parallelized to run on GPUs. Except for the optical flow estimation step [29], we have implemented all other steps in the OpenGL Shading Language (GLSL).

1) Segmentation: The original SLIC requires several iterations to find the cluster centers and to assign a cluster label to each pixel. This is difficult to perform in parallel on GPUs, as the assignment step is done iteratively based on the cluster centers. To address this problem, we treat each pixel as an independent unit and find a cluster center for it, instead of finding which pixels belong to a cluster. In addition, due to temporal coherence, we can transfer the cluster centers and labels of one frame as input to the next to improve efficiency.
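A CPU sketch of this inverted, per-pixel assignment (our own illustration of the idea; the actual implementation is a GLSL shader):

import numpy as np

def assign_pixels_to_centers(feat, center_feat, center_xy, S=25):
    # feat: (H, W, F) per-pixel features (color, position, motion).
    # center_feat: (K, F) cluster-center features; center_xy: (K, 2)
    # center positions. Each pixel independently picks the best center
    # within a 2S x 2S neighborhood, so all pixels can run in parallel.
    H, W, _ = feat.shape
    labels = np.zeros((H, W), dtype=np.int64)
    for y in range(H):
        for x in range(W):
            near = np.where((np.abs(center_xy[:, 0] - x) <= 2 * S) &
                            (np.abs(center_xy[:, 1] - y) <= 2 * S))[0]
            if near.size == 0:          # fall back to all centers
                near = np.arange(center_feat.shape[0])
            d = np.linalg.norm(center_feat[near] - feat[y, x], axis=1)
            labels[y, x] = near[np.argmin(d)]
    return labels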


Fig. 7. Comparing the homography-based model with the proposed model, as in Fig. 4. (a) Deblurring result using homography-based blur kernels similar to [7]. (b) Deblurring result using our blur model. (c)–(e) are magnified views of the input image, the deblurring result using homography-based blur kernels, and the deblurring result using our blur model.

Fig. 8. Comparing different image segmentation approaches for deblurring. (a) Deblurring result using a regular patch-based representation. (b) Deblurring result using SLIC superpixels. (c) Deblurring result using our motion-based SLIC superpixels. Note that the front wheel is much smoother.

2) Kernel Estimation and Patch Matching: As these steps are applied to each superpixel, they can be performed after the segmentation step. However, the global smoothing step (Eq. 8) needs to solve a linear system and thus runs on the CPU, before tasks are assigned to the GPUs. We compute all values of θ, i.e., from 0 to 0.5 with increments of 0.05, on the GPU to find the proper kernel length for each superpixel, which is used as the initial value for superpixel matching. In superpixel matching, we apply the jump flooding scheme [34] with 4 neighbors in the propagation step, as in [10], on a spatial window of $w \times w$. The random search step of patch matching requires random numbers; however, as GLSL does not provide a random number generator, we use Poisson noise to synthesize random numbers.

Fig. 9. Comparing different deblurring approaches on a bicycle video frame. From top to bottom: input frame, followed by the results of [7], [12], [14], [15], [27], and [28], and our method. (a) Input video frames. (b) Enlarged view of "biker."

V. EXPERIMENTAL RESULTS

We have implemented the proposed method in C++ and tested it on a PC with an i3 3.4 GHz CPU. Using only a single thread, it takes about 10 minutes to deblur an 800 × 600 video frame. We have also tested it on the same PC with a GeForce


Fig. 10. Comparing different deblurring approaches on a car video frame. (a) Input frame. (b) Deblurred frame using our method. (c) to (j) show the magnified views of the input frame, followed by the results of [7], [12], [14], [15], [27], and [28], and our method. Note that although the background of (g) is sharper, the moving car is actually more blurred than in (a), since [7] is unable to correctly handle moving objects.

GTX 750Ti graphics card, which reduces the computation time to 20 seconds. In this section, we evaluate the proposed method in detail.

A. Deblurring Model Testing

We first compare the proposed motion model with the homography-based model [3], [7]. Homography-based motion models may produce good deblurred results for blurs caused by camera motion; however, as shown in Fig. 4(b), they fail to estimate object motion, i.e., the moving car. Fig. 7 compares the two results. Although the background motion blur caused by camera shake is sharpened in both Fig. 7(d) and 7(e), the homography-based method fails to recover the motion of the car correctly. Hence, it also fails to find suitable patches to deblur the moving car. On the contrary, our result, which is obtained using a spatially varying motion model, is more convincing.

We have also studied the effects of using different segmentation methods in our deblurring model. As the regular patch-based representation divides each image regularly, it does not handle object segmentation well, and may fail in motion estimation and in patch matching along object boundaries, as shown in Fig. 8(a). The superpixel-based segmentation methods shown in Fig. 8(b) and 8(c) produce better segmentation, which helps improve the accuracy of superpixel matching. Moreover, as our method considers object motion, it produces more consistent segmentation across frames. Hence, our reconstructed frame preserves the object structure, e.g., the front wheel, better.

B. Visual Evaluation

We compare the proposed method with different methods, including single-image deblurring methods using a uniform blur kernel [12], [35] and a non-uniform blur kernel [14], [15], the video deblurring method of [7], and two state-of-the-art methods [27], [28]. As we do not have the code for [7] and [27], we can only compare with their results in Fig. 9 and 10.


From Fig. 9 and 10, we observe that the results from [12] contain obvious artifacts, as it applies a uniform blur kernel to restore the frames. Although a non-uniform blur kernel is able to handle camera shake, the results from [14] and [15] have ringing artifacts; the complex motion blur increases the difficulty of kernel estimation. The image background from [7] is much better than those from the single-image deblurring methods. Like ours, [7] is a patch-based video deblurring method. However, it cannot handle moving objects, and may fail to find proper sharp patches for moving objects using the homography-based motion model. Thus, the blended result of these mismatched patches may not show any improvement, or may sometimes be worse than the original (see the car in Fig. 10(g)). In addition, as it segments the image regularly, it increases the chance of mismatches along the moving object boundaries, which may destroy the structure of the moving object, as shown in the fifth row of Fig. 9(b). Although the shock filter is used to sharpen the patches, the frame quality is only slightly improved. The proposed motion deblurring model uses appropriate blur kernels to blur the sharp superpixels found before superpixel matching. This helps find suitable sharp superpixels more accurately, as shown in Fig. 10(j). Compared with [28], our method shows sharper results. Although [27] produces better background details, its video still exhibits motion blur; further, in Fig. 10, while the tire is clearer, its structure is wrong.

In Fig. 11, we compare the proposed method with state-of-the-art single-image deblurring methods, which use uniform [12] and non-uniform [14], [15] motion models, and with the pixel-wise deblurring method [28]. While deconvolution with a uniform blur kernel may improve the quality of the image background to a certain extent, it introduces severe ringing artifacts on the moving objects, as shown in the second row of Fig. 11. Although the non-uniform motion models [14], [15] may produce fewer ringing artifacts, it is very difficult to recover accurate blur kernels from a single image for deconvolution. The proposed method produces better results by avoiding this deconvolution step.

Figs. 12, 13 and 14 show video frames involving significant blur caused by fast moving objects, while the backgrounds are blurred to different degrees. In Fig. 12, most methods do not perform very well. Our method also fails to deblur the cyclist/bicycle, because they are blurred throughout the video. However, it recovers the building, in particular the window, much better than the other methods. In Fig. 13, all the other methods produce obvious artifacts in the car wheels, while ours performs very well, in particular on the text on the car. However, our method fails to deblur the background, as the background is blurred throughout the video clip. In Fig. 14, our method deblurs both the foreground cyclist (the text on the body in particular) and the background very well, compared with all the other methods.

Images or videos captured by image sensors may be noisy when filmed in a dim environment. Most deconvolution-based image deblurring methods are sensitive to image noise. As our method is deconvolution-free, it avoids such problems. Figs. 15 and 16 demonstrate this feature. Although [35]

Fig. 11. Deblurring a car-bicycle frame. From top to bottom: the input frame, followed by the results of [12], [14], [15], and [28], and our method.

handles noise in deblurring and [12] is robust to noise, they both produce ringing artifacts. Fig. 16 is also noisy; as a result, their outputs may appear worse than the inputs. The proposed method, on the other hand, is able to deblur the input frames in both situations.

C. Quality Assessment Evaluation

We evaluate the deblurred results from [7], [12], [15], [27], and [28] and the proposed method using two no-reference image quality assessment methods, NIQE [36] and BRISQUE [37]. These two methods perform better than most recent methods and correlate well with human perception [38]. NIQE mainly extracts the highly sharp parts of an image using a spatial-domain natural scene statistics model, and can


Fig. 12. Deblurring a bicycle frame. (a) Input frame. (b) Magnified view of the input frame. (c) Result of [12]. (d) Result of [15]. (e) Result of [28]. (f) Result of our method.

Fig. 13. Deblurring a fast car frame. (a) Input frame. (b) Result of [12]. (c) Result of [15]. (d) Result of [28]. (e) Result of our method.

Fig. 14. Deblurring another bicycle frame. (a) Input frame. (b) Result of [12]. (c) Result of [15]. (d) Result of [28]. (e) Result of our method.

evaluate the image quality without training. In contrast, BRISQUE is a machine learning based approach that is independent of database content. It uses the statistics of normalized local luminance coefficients of natural scenes, and can evaluate image quality holistically without requiring the user to supply specific features, such as ringing, blur or blocking, in advance.

Table I shows the average image quality of the deblurring results. Our method outperforms the other three video deblurring methods [7], [27], and [28] on the first three rows.

TABLE I

Quality assessment of results in Figs. 9 and 10, using NIQE [36] and BRISQUE [37]. (Larger values indicate lower quality, and numbers in bold indicate the best results.)

TABLE II

Quality assessment of results in Figs. 11–16, using NIQE [36] and BRISQUE [37]. (Larger values indicate lower quality, and numbers in bold indicate the best results.)

Reference [27] outperforms ours in the last row. Table II shows that our method performs mostly better than the image deblurring methods [12] and [15] on both image quality assessment metrics. It also outperforms the other methods on six test cases, while [28] outperforms the other methods on four test cases. Although image deconvolution helps improve image quality, it introduces ringing artifacts, which affect the


Fig. 15. Deblurring a blurry frame with noise. (a) Input frame. (b) Result from a non-uniform blur kernel with noise handling [35]. (c) Result from [12]. (d) Result from our method.

However, for Fig. 12, [12] and [15] can restore sharp silhouettes with few ringing artifacts. Our method, on the other hand, can only reconstruct a frame to be as sharp as the other frames in the video allow. As this entire video is blurred, our method performs worse than [12] and [15] on BRISQUE. For Fig. 13, while our method is able to reconstruct the fast car, the background remains nearly unchanged.

Fig. 16. Deblurring a blurry frame with noise. (a) Input frame. (b) Result from a non-uniform blur kernel with noise handling [35]. (c) Result from [12]. (d) Result from our method.

[12], on the other hand, produces a slightly sharper background, and hence yields slightly better scores on both NIQE and BRISQUE. It is interesting to note that [12] and [15] perform even worse than the input video on Fig. 14. This is because the background in this video was already sharp; applying image deconvolution to these sharp regions leads to ringing artifacts, which affect the performance of these methods on the two metrics.


TABLE III
AVERAGE COMPUTATION TIME PER FRAME (SEC) FOR DIFFERENT METHODS. THE RIGHT HALF SHOWS VARIATIONS OF OUR METHOD


D. Computation Time Evaluation

We have compared the average computation time per frame of each video sequence using different deblurring methods, as shown in Table III. Reference [12] is the most efficient method, as it only estimates a global blur kernel for the input image. Although [35] also estimates a uniform blur kernel for deconvolution, it needs to apply a series of directional filters to refine the kernel, resulting in a longer computation time. Both [15] and [14] require higher computation times in order to estimate non-uniform blur kernels. The CPU implementation of our method is also slow, but the GPU implementation achieves a speedup of roughly 25 times. We have further evaluated the computation time of the GPU implementation in relation to the speedup technique discussed in Section IV-A. As shown in the last column of Table III, it takes at least double the time to process a video in most situations when the frame ordering technique proposed in [7] is used instead.

VI. CONCLUSION

In this paper, we have proposed a structure-preserving video deblurring algorithm that uses irregular motion-based segmentation to estimate a non-uniform motion model and perform superpixel deblurring. The proposed method assumes that sharp frames or superpixels are sparsely spread across the video sequence, and uses sharp superpixels to reconstruct blurry ones. The superpixels are essential for effectively distinguishing moving objects from the background, and are more robust than the regular patches employed by previous methods. Experimental results demonstrate that the proposed method can reconstruct blurry frames containing both static and moving objects while maintaining spatial and temporal coherence, without introducing artifacts.

In general, our method fails if no sharp superpixels can be found in other frames of the video. In future work, we intend to address this limitation. One possible approach that we are considering is to apply deconvolution to frames to recover blur kernels more reliably, and thus form the initial sharp frames.

ACKNOWLEDGMENTS

The authors would like to thank all reviewers for their helpful suggestions and constructive comments.

REFERENCES

[1] Y.-W. Tai, H. Du, M. S. Brown, and S. Lin, “Image/video deblurring using a hybrid camera,” in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.

[2] H. Zhang, D. Wipf, and Y. Zhang, “Multi-image blind deblurring using a coupled adaptive sparse prior,” in Proc. IEEE CVPR, Jun. 2013, pp. 1051–1058.

[3] Y. Li, S. B. Kang, N. Joshi, S. Seitz, and D. Huttenlocher, “Generating sharp panoramas from motion-blurred videos,” in Proc. IEEE CVPR, Jun. 2010, pp. 2424–2431.

[4] M. Onogi and H. Saito, “Mosaicing and restoration from blurred image sequence taken with moving camera,” in Proc. ICAPR, 2005, pp. 598–607.

[5] S. Cho, Y. Matsushita, and S. Lee, “Removing non-uniform motion blur from images,” in Proc. IEEE ICCV, Oct. 2007, pp. 1–8.

[6] D.-B. Lee, S.-C. Jeong, Y.-G. Lee, and B. C. Song, “Video deblurring algorithm using accurate blur kernel estimation and residual deconvolution based on a blurred-unblurred frame pair,” IEEE Trans. Image Process., vol. 22, no. 3, pp. 926–940, Mar. 2013.

[7] S. Cho, J. Wang, and S. Lee, “Video deblurring for hand-held cameras using patch-based synthesis,” ACM Trans. Graph., vol. 31, no. 4, Jul. 2012, Art. no. 64.

[8] Z. Liu, L. Yuan, X. Tang, M. Uyttendaele, and J. Sun, “Fast burst images denoising,” ACM Trans. Graph., vol. 33, no. 6, Nov. 2014, Art. no. 232.

[9] S. Liu, L. Yuan, P. Tan, and J. Sun, “Bundled camera paths for video stabilization,” ACM Trans. Graph., vol. 32, no. 4, Jul. 2013, Art. no. 78.

[10] C. Barnes, E. Shechtman, A. Finkelstein, and D. B. Goldman, “PatchMatch: A randomized correspondence algorithm for structural image editing,” ACM Trans. Graph., vol. 28, no. 3, Aug. 2009, Art. no. 24.

[11] C. Barnes, E. Shechtman, D. Goldman, and A. Finkelstein, “The generalized PatchMatch correspondence algorithm,” in Proc. ECCV, 2010, pp. 29–43.

[12] L. Xu and J. Jia, “Two-phase kernel estimation for robust motion deblurring,” in Proc. ECCV, 2010, pp. 157–170.

[13] M. Ben-Ezra and S. Nayar, “Motion deblurring using hybrid imaging,” in Proc. IEEE CVPR, Jun. 2003, pp. I-657–I-664.

[14] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce, “Non-uniform deblurring for shaken images,” Int. J. Comput. Vis., vol. 98, no. 2, pp. 168–186, 2012.

[15] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in Proc. IEEE CVPR, Jun. 2013, pp. 1107–1114.

[16] A. Levin, “Blind motion deblurring using image statistics,” in Proc. NIPS, 2006, pp. 841–848.

[17] Q. Shan, W. Xiong, and J. Jia, “Rotational motion deblurring of a rigid object from a single image,” in Proc. IEEE ICCV, Oct. 2007, pp. 1–8.

[18] S. Dai and Y. Wu, “Motion from blur,” in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.

[19] T. H. Kim and K. M. Lee, “Segmentation-free dynamic scene deblurring,” in Proc. IEEE CVPR, Jun. 2014, pp. 2766–2773.


[20] A. Rav-Acha and S. Peleg, “Two motion-blurred images are better than one,” Pattern Recognit. Lett., vol. 26, no. 3, pp. 311–317, Feb. 2005.

[21] J. Chen, L. Yuan, C.-K. Tang, and L. Quan, “Robust dual motion deblurring,” in Proc. IEEE CVPR, Jun. 2008, pp. 1–8.

[22] F. Sroubek and P. Milanfar, “Robust multichannel blind deconvolution via fast alternating minimization,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1687–1700, Apr. 2012.

[23] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum, “Full-frame video stabilization with motion inpainting,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 7, pp. 1150–1163, Jul. 2006.

[24] X. C. He, T. Luo, S. C. Yuk, K. P. Chow, K.-Y. K. Wong, and R. H. Y. Chung, “Motion estimation method for blurred videos and application of deblurring with spatially varying blur kernels,” in Proc. ICCIT, Nov. 2010, pp. 355–359.

[25] H. Takeda and P. Milanfar, “Removing motion blur with space-time processing,” IEEE Trans. Image Process., vol. 20, no. 10, pp. 2990–3000, Oct. 2011.

[26] J. Wulff and M. J. Black, “Modeling blurred video with layers,” in Proc. ECCV, Sep. 2014, pp. 236–252.

[27] M. Delbracio and G. Sapiro, “Hand-held video deblurring via efficient Fourier aggregation,” IEEE Trans. Comput. Imag., vol. 1, no. 4, pp. 270–283, Dec. 2015.

[28] T. H. Kim and K. M. Lee, “Generalized video deblurring for dynamic scenes,” in Proc. IEEE CVPR, Jun. 2015, pp. 5426–5434.

[29] L. Bao, Q. Yang, and H. Jin, “Fast edge-preserving PatchMatch for large displacement optical flow,” IEEE Trans. Image Process., vol. 23, no. 12, pp. 4996–5006, Dec. 2014.

[30] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, “SLIC superpixels compared to state-of-the-art superpixel methods,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 11, pp. 2274–2282, Nov. 2012.

[31] J. Lu, H. Yang, D. Min, and M. N. Do, “Patch match filter: Efficient edge-aware filtering meets randomized search for fast correspondence field estimation,” in Proc. IEEE CVPR, Jun. 2013, pp. 1854–1861.

[32] T. Portz, L. Zhang, and H. Jiang, “Optical flow in the presence of spatially-varying motion blur,” in Proc. IEEE CVPR, Jun. 2012, pp. 1752–1759.

[33] V. Kwatra, I. Essa, A. Bobick, and N. Kwatra, “Texture optimization for example-based synthesis,” ACM Trans. Graph., vol. 24, no. 3, pp. 795–802, 2005.

[34] G. Rong and T. Tan, “Jump flooding in GPU with applications to Voronoi diagram and distance transform,” in Proc. ACM I3D, 2006, pp. 109–116.

[35] L. Zhong, S. Cho, D. Metaxas, S. Paris, and J. Wang, “Handling noise in single image deblurring using directional filters,” in Proc. IEEE CVPR, Jun. 2013, pp. 612–619.

[36] A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett., vol. 20, no. 3, pp. 209–212, Mar. 2013.

[37] A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Trans. Image Process., vol. 21, no. 12, pp. 4695–4708, Dec. 2012.

[38] A. Nouri, C. Charrier, A. Saadane, and C. Fernandez-Maloigne, “Statistical comparison of no-reference images quality assessment algorithms,” in Proc. IEEE CVCS, Sep. 2013, pp. 1–5.

Congbin Qiao received the B.Eng. degree from the Software College, Shandong University. He is currently a Graduate Student with the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests include image/video processing.

Rynson W. H. Lau received the Ph.D. degree from the University of Cambridge. He was on the faculty of Durham University and The Hong Kong Polytechnic University. He is currently with the City University of Hong Kong. His research interests include computer graphics and computer vision. In addition, he has served on the committees of a number of conferences, including as Program Co-Chair of ACM VRST 2004, ACM MTDL 2009, and IEEE U-Media 2010, and as Conference Co-Chair of CASA 2005, ACM VRST 2005, ICWL 2007, ACM MDI 2009, ACM VRST 2010, and ACM VRST 2014. He serves on the editorial boards of Computer Graphics Forum and Computer Animation and Virtual Worlds. He has served as Guest Editor of a number of journal special issues, including the ACM Transactions on Internet Technology, IEEE MultiMedia, the IEEE Transactions on Multimedia, the IEEE Transactions on Visualization and Computer Graphics, and IEEE Computer Graphics and Applications.

Bin Sheng received the B.A. degree in English and the B.E. degree in computer science from the Huazhong University of Science and Technology in 2004, the M.S. degree in software engineering from the University of Macau in 2007, and the Ph.D. degree in computer science from The Chinese University of Hong Kong in 2011. He is currently an Associate Professor with the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests include virtual reality and computer graphics. He is an Associate Editor of IET Image Processing.

Benxuan Zhang received the B.Eng. degree in computer science from Shanghai Jiao Tong University. He is currently a Graduate Student with the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His research interests include image processing and deep learning.

Enhua Wu (M’12) received the B.S. degree from Tsinghua University in 1970 and the Ph.D. degree from The University of Manchester, U.K., in 1984. He is currently a Research Professor with the State Key Laboratory of Computer Science, Chinese Academy of Sciences. He has also been teaching at the University of Macau, where he is currently an Emeritus Professor. His research interests include realistic image synthesis, physically-based simulation, and virtual reality. He is a member of ACM and a fellow of the China Computer Federation. He has been invited to chair or co-chair various conferences or program committees, including SIGGRAPH Asia 2016, ACM VRST 2010, 2013, and 2015, and WSCG 2012. He is an Associate Editor-in-Chief of the Journal of Computer Science and Technology, and an Editorial Board Member of TVC, JVI, CAVW, IJIG, and IJSI.

