Improving Video Stabilization Using Multi-Resolution MSER Features

This article was downloaded by: [Selcuk Universitesi] on 21 December 2014, at 14:10.
Publisher: Taylor & Francis. Informa Ltd, registered in England and Wales, registered number 1072954; registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK.

IETE Journal of Research. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/tijr20

Improving Video Stabilization Using Multi-Resolution MSER Features
Manish Okade (a) and Prabir Kumar Biswas (b)
(a) Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela 769008, India
(b) Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721302, India
Published online: 14 Oct 2014.

To cite this article: Manish Okade & Prabir Kumar Biswas (2014) Improving Video Stabilization Using Multi-Resolution MSER Features, IETE Journal of Research, 60:5, 373-380, DOI: 10.1080/03772063.2014.962627
To link to this article: http://dx.doi.org/10.1080/03772063.2014.962627

Taylor & Francis makes every effort to ensure the accuracy of all the information (the "Content") contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions



Improving Video Stabilization Using Multi-Resolution MSER Features

Manish Okade1 and Prabir Kumar Biswas2

1Department of Electronics and Communication Engineering, National Institute of Technology, Rourkela 769008, India; 2Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur 721302, India

ABSTRACT

In this paper, we investigate the application of multi-resolution maximally stable extremal region (MSER) features for improving video stabilization performance. MSER features have been used for many computer vision applications, such as wide-baseline stereo, object recognition, video object tracking, and video stabilization, with very good results compared to other features such as the scale invariant feature transform (SIFT) and Kanade-Lucas-Tomasi (KLT) features. However, a limitation of the MSER feature in the stabilization application was observed when the input video frames were severely blurred; the same limitation was also observed when KLT and SIFT features were used under blurring conditions. In this paper we propose to overcome this drawback for the video stabilization application by extracting and matching MSERs in a scale-pyramid fashion instead of detecting and matching them at a single image resolution. The duplicate MSERs resulting from the pyramid-style detection are removed, after which MSER feature matching establishes correspondences between video frames to estimate the global motion parameters. Once the global motion parameters are estimated, the accumulated transformation is smoothed and motion compensation is applied to construct the stabilized frame. Comparative analysis with state-of-the-art stabilization methods shows improvement in stabilization performance as well as robustness to blurring degradations. The proposed approach can easily be ported to other feature detectors such as KLT and SIFT, making it generic to any feature detector.

Keywords: Feature matching, Global motion estimation, Motion compensation, MSER, Multi-resolution analysis, Video stabilization.

1. INTRODUCTION

Video stabilization is an important task in video processing, as hand-held cameras are extensively used in a variety of modern-day applications. These cameras are prone to hand jitter and platform vibrations, which make the image frames appear jerky. Such videos have to be pre-processed, otherwise any further video processing stage (such as object tracking) would fail. The goal of a good stabilization technique is to align image frames that have been misaligned by these factors. To achieve this goal, stabilization algorithms use three steps which form the stabilization pipeline: global motion estimation (GME), motion smoothing, and motion compensation. GME is a key step in this pipeline, as its accuracy determines the stabilization performance; this paper concentrates on improving this stage. GME determines the movement of the camera and can be carried out using either direct intensity-based methods [1,2] or feature-based methods [3-8]. Direct methods are more accurate but time consuming, as all pixels participate in the GME process. Feature-based methods, on the other hand, use image features, which are generally fewer in number, thereby achieving faster estimation at the expense of some accuracy in the global parameter estimation. Once the global motion parameters are estimated, the next step is to separate the desired motions (pan, tilt, zoom) from the undesired motions (jitter, platform vibrations), which is accomplished in the motion smoothing step. Finally, the stabilized frame is constructed in the motion compensation step, which applies the correction factor to perform frame alignment.

The literature on video stabilization is quite rich; a brief review is given here. Research in the field has mainly focussed on exploring intensity-based and feature-based techniques for estimating the global (camera) motion parameters. In addition, considerable study has been devoted to new techniques for the motion smoothing operation. A robust and efficient two-dimensional (2D) affine GME algorithm based on phase correlation in the Fourier-Mellin domain with least-squares model fitting of sparse motion vectors, and its application to digital image stabilization, was explored in [1]. A robust image alignment algorithm based on refinement of the


local motion vectors using a voting-based approach to calculate the global interframe motion for video stabilization was explored in [2]. Corner matching [3], edge pattern matching [4], SIFT point matching [5,6], maximally stable extremal region (MSER) [9] feature matching [7], and KLT feature tracking [8] are some examples of features which have been used to estimate the global motion for video stabilization. However, all these feature-based methods have reported a drawback when the input video frames are severely blurred. In the motion smoothing stage, techniques such as motion vector integration [10,11], frame position smoothing [12], Gaussian filtering [13], Kalman filtering [14], and extended Kalman filtering [15] have been used to separate the desired motion from the undesired motion. In practice, the stabilized video images often do not overlap perfectly with the boundary of the desired image frame due to the motion compensation operation; as a result, missing regions are formed along with impairment in resolution. Mosaicing techniques to fill the missing image regions were used in [14,15]. A more comprehensive post-processing was proposed in [13], where the combination of motion inpainting and image deblurring provided an effective enhancement of the quality of stabilized image sequences. Presently, three-dimensional (3D) stabilization [8,16] is being explored in the research community, where the camera motion estimation block is replaced by structure from motion (SFM) as the first step in the stabilization pipeline. SFM is fundamentally a difficult problem in computer vision and is challenged when the camera pans without parallax, zooms, or exhibits rolling shutter effects. In addition, SFM suffers from heavy computational time, making 3D stabilization impractical. Recently, however, Liu et al. [8] combined the positives of 2D and 3D stabilization by placing subspace constraints on feature trajectories. They replaced the SFM step of 3D stabilization with 2D tracking of feature points, computed where the tracked feature points should be located in the output, and finally rendered an output video that followed those feature point locations. However, even this method failed when the input frames were blurred, as feature trajectories could not be tracked due to discontinuities caused by a dramatic reduction in factorization windows, thereby degrading stabilization performance. In addition, their method relied on 3D reconstruction, which was error prone and time intensive.

In this paper a novel method for video stabilization that is robust to image degradations and transformations is presented. We extend our earlier work in which single-resolution (traditional) MSERs were used for video stabilization [7]. Although we could achieve robust stabilization using these region features, a limitation was observed when severe blurring was present in the input video frames: the traditional MSER could not account for this blurring, and as a result the stabilization performance suffered. Although we improved the stabilization performance using an auxiliary method in our previous work [17], the concept of using deblurring to improve stabilization was naive as well as time intensive. A better approach is to improve the sensitivity of the MSER features to blurring. In this work we investigate this concept by applying multi-resolution MSERs to the stabilization problem. It was also reported in [18] that such multi-resolution MSERs are capable of handling scale changes as well as image blurring degradations. The data set used in this paper is available at http://www.facweb.iitkgp.ernet.in/~pkb/; the research results are made public so that they are useful to fellow researchers.

2. KEY CONTRIBUTIONS

The focus of the current work is to analyse the usefulness of multi-resolution MSERs in the feature extraction and matching (GME) stage of the video stabilization pipeline. Extensive experiments conducted on a number of real-life video sequences show that multi-resolution MSERs achieve better feature matching than single-resolution MSERs. They are also able to counter image degradations such as blurring in the input video frames, which is a major limitation of state-of-the-art feature-based video stabilization methods. Effectively, we show that by utilizing multi-resolution analysis for feature matching, the performance of video stabilization methods can be improved with very good results.

3. CAMERA MODEL

A 2D affine camera model [6] with six parameters is used in this study for estimating the camera motion. The model is given by

\[
\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}
=
\begin{pmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
\]

where (x, y) is a point on the reference frame and (x', y') is the corresponding point on the transformed frame, as given by the six-parameter transformation matrix m = [m1, m2, ..., m6]. Parameters m3 and m6 represent translation, while the remaining parameters represent rotation, zooming, and shearing. GME estimates this matrix m between each pair of consecutive frames over the entire video sequence.
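To make the model concrete, the sketch below applies the six-parameter matrix to a single point. The function name `affine_transform` and the parameter layout are illustrative, not from the paper's code.

```python
import numpy as np

def affine_transform(m, x, y):
    """Apply the six-parameter affine model m = [m1, ..., m6] to a point (x, y).

    Following the paper's layout, m3 and m6 are the translation terms, and the
    remaining parameters encode rotation, zooming, and shearing.
    """
    M = np.array([[m[0], m[1], m[2]],
                  [m[3], m[4], m[5]],
                  [0.0,  0.0,  1.0]])
    xp, yp, _ = M @ np.array([x, y, 1.0])
    return xp, yp
```

For example, a pure translation m = [1, 0, 5, 0, 1, -3] maps (10, 20) to (15, 17), since only m3 and m6 contribute.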


4. PROPOSED METHOD

Figure 1 shows the video stabilization pipeline, which consists of three stages: GME, motion smoothing, and motion compensation. The novelty in this paper is to explore multi-resolution analysis for feature extraction and matching in the GME stage and to investigate the resulting improvement in stabilization performance using multi-resolution MSER features. The motion smoothing and compensation stages remain the same as in previous traditional stabilization schemes. Video captured by a hand-held camera, containing jittery hand motions and/or platform vibrations due to the camera being mounted on a moving vehicle, forms the input to the stabilization pipeline. Due to the fast and irregular movement of the camera, the input frames are blurred. The GME stage uses MSER features extracted on a scale pyramid (multi-resolution) with one octave difference between resolutions. This improves the accuracy of the feature extraction and matching stage and provides a more robust estimate of the global motion parameters than the single-resolution MSER. Once GME is performed, motion smoothing is carried out to retain the desired motions. Finally, motion compensation is performed to construct the stabilized frame. A detailed explanation of each block is given below.

4.1 Global Motion Estimation (GME)

This block estimates the movement of the camera using feature extraction and matching between successive frames of the input video. Our contribution in this paper is to explore the usefulness of multi-resolution analysis for GME, as explained below.

4.1.1 MSER Feature Extraction and Matching on a Pyramid Level

This section describes in detail our contribution of applying multi-resolution MSERs to estimate the global motion parameters and analyses the advantages this technique offers over single-resolution MSERs. The experiments we have conducted using MSERs can also be adapted to any other feature, such as the KLT. The steps are as follows:

Step 1. A scale pyramid with one octave difference between scales is constructed separately for two successive frames, i.e. frame k and frame (k + 1).

Step 2. MSERs are detected separately at each resolution of the pyramid for frame k as well as for frame (k + 1).

Step 3. Since Step 2 leads to duplicate MSERs as a result of multi-resolution detection, these duplicates have to be removed prior to feature matching, or else they would hamper the accuracy of feature matching. This is accomplished by removing the fine-scale MSERs which have the same location and size as the next detected coarser-scale MSER. The location criterion and area criterion used in this paper are listed in the "Results" section.

Step 4. The multi-resolution MSERs retained after Step 3 are matched between frame k and frame (k + 1), and the transformation matrix m (i.e. the global motion parameters) is calculated by model fitting. This shows how the camera has moved between the two frames.

Step 5. Steps 1-4 are repeated for the next pair of frames until the entire video sequence is covered.
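The detection side of Steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the one-octave pyramid is built with simple 2x2 block averaging, and the single-scale MSER detector itself (in practice, e.g. OpenCV's) is passed in as a stand-in `detector` callable returning (cx, cy, area) triples. Coarse-level detections are mapped back to base-resolution coordinates so that Step 3's duplicate removal can compare regions across scales.

```python
import numpy as np

def build_pyramid(frame, levels=3):
    """Step 1: scale pyramid with one-octave (2x) spacing via 2x2 block
    averaging (any decimation filter could be substituted)."""
    pyramid = [frame.astype(float)]
    for _ in range(levels - 1):
        f = pyramid[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2  # crop to even size
        f = f[:h, :w]
        pyramid.append((f[0::2, 0::2] + f[1::2, 0::2] +
                        f[0::2, 1::2] + f[1::2, 1::2]) / 4.0)
    return pyramid

def detect_multires(pyramid, detector):
    """Step 2: run a single-scale region detector at every pyramid level and
    map each region's centroid and area back to base-resolution coordinates."""
    regions = []
    for level, image in enumerate(pyramid):
        scale = 2 ** level
        for (cx, cy, area) in detector(image):
            # centroid scales linearly, area quadratically, with the octave
            regions.append((cx * scale, cy * scale, area * scale * scale, level))
    return regions
```

With a toy detector that reports one region per level, the returned centroids and areas are already expressed in base-resolution units, ready for the duplicate check of Step 3.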

The advantage of using multi-resolution MSERs over single-resolution (traditional) MSERs is explained with a video sequence that has blurring degradations in the input frames. Analysis is also carried out to understand the effect of blurring on stabilization performance. We take sequence 00016 as an example, on which the subspace method [8] as well as our earlier work [7] using single-resolution MSERs failed due to the severe blurring present in this video. This sequence can be found at http://web.cecs.pdx.edu/fliu/project/subspace_stabilization/.

Figures 2 and 3 show the comparative scenario when two successive image frames are matched using the traditional MSERs and the multi-resolution MSERs, respectively. As observed from Figure 3, the number of correspondences found using multi-resolution MSERs (15 correspondences) is higher than in Figure 2, where traditional (i.e. single-resolution) MSERs have been used for establishing

Figure 1: System framework for video stabilization. The pipeline takes the input video through feature extraction and feature matching (together forming global motion estimation), followed by motion smoothing and motion compensation, to produce the stabilized video.

Figure 2: MSER correspondences between frame numbers #300 and #301 for sequence 00016. The ellipses marked in green represent the MSERs and the red lines across the two frames represent the correspondences. Six correspondences are observed in this case.


correspondences (6 correspondences). We now investigate the reasons for the increase in the number of correspondences using multi-resolution MSER features. Since MSER detection is done in a scale-pyramid fashion (i.e. coarse to fine), robustness to blurring increases: coarse-scale MSERs, which are less affected by blurring, are picked instead of fine-scale MSERs, which are generally affected by blurring, as observed from Figures 2 and 3. The increase in the number of correspondences improves the accuracy of the feature matching block, thereby improving the GME estimate.

Figure 2 shows MSER feature correspondences between frame numbers #300 and #301 of sequence 00016 using the single-resolution MSER. It is observed that the features are not localized over the entire image due to the severe blurring present in these two frames; moreover, frame #301 shows a decrease in the repeatability of the MSERs as a result of excessive blurring. Such a correspondence set, not localized over the entire image frame, would bias the GME, leading to incorrect global motion parameters. This effect is compounded when previous and future frames are also severely blurred, leading to failure of stabilization techniques that employ feature matching or tracking for GME. Figure 3 shows MSER feature correspondences between frames #300 and #301 of the same sequence 00016, but now using multi-resolution MSERs. The MSER features are better localized over the entire image frame compared to the previous case. In addition, the repeatability of these MSER features increases, as seen by the increase in the number of correspondences (15 compared to 6). This improves the camera transformation matrix estimate m, thereby improving the accuracy of the feature matching block in the stabilization pipeline.

4.2 Motion Smoothing and Compensation

The accumulative transform between frames is calculated and a Gaussian filter is used to smooth the transformation chain using Matsushita et al.'s [13] method, as described below.

The transformation from a frame to the corresponding motion-compensated frame is computed directly using only the neighbouring transformation matrices. The global transformation $m_t$ is smoothed using a Gaussian filter by

\[ S_t = \sum_{i \in N_t} m_t^i \ast G(k_t) \tag{1} \]

where $N_t = \{ j \mid t - k_t \le j \le t + k_t \}$ denotes the temporal neighbourhood of frame $t$, $S_t$ is the smoothed corrective transformation, and

\[ G(k_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( \frac{-k_t^2}{2\sigma^2} \right) \tag{2} \]

is a Gaussian filter with $\sigma = \sqrt{k_t}$. The window size $k_t$ varies over time and enables the smoothing operation to adapt to the different magnitudes of motion that may exist in video sequences. The smoothness of the camera motion parameters can be controlled by varying the window size $k_t$. Finally, in the motion compensation stage, the original frame is shifted by an appropriate amount, namely the correction vector, to bring the frame back into alignment. Using the smoothed corrective transformation matrices $S_0, \ldots, S_t$ obtained in the motion smoothing stage, the stabilized frame $I'_t$ is constructed using

\[ I'_t = I_t(S_t) \tag{3} \]

Missing areas are formed opposite to the direction of unwanted camera movement due to the motion compensation operation.
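A minimal numpy sketch of the adaptive Gaussian smoothing idea behind Equations (1) and (2); the function name and the fixed half-window `k` are illustrative. It uses a standard normalized Gaussian weight per temporal neighbour, and the element-wise averaging of 3x3 affine matrices is an approximation that is reasonable for small inter-frame motions.

```python
import numpy as np

def smooth_transforms(transforms, k=5, sigma=None):
    """Gaussian smoothing of a chain of 3x3 global transforms.

    transforms[t] is the accumulated transform for frame t. Each output S_t is
    a normalized Gaussian-weighted average over the temporal neighbourhood
    t-k..t+k; sigma defaults to sqrt(k), following the paper's sigma = sqrt(k_t).
    """
    if sigma is None:
        sigma = np.sqrt(k)
    T = np.asarray(transforms, dtype=float)   # shape (n, 3, 3)
    n = len(T)
    smoothed = np.empty_like(T)
    for t in range(n):
        lo, hi = max(0, t - k), min(n - 1, t + k)   # clip window at sequence ends
        idx = np.arange(lo, hi + 1)
        w = np.exp(-((idx - t) ** 2) / (2.0 * sigma ** 2))
        w /= w.sum()                                # normalized Gaussian weights
        smoothed[t] = np.tensordot(w, T[idx], axes=1)
    return smoothed
```

A constant camera path is left unchanged by this filter, while a jittery path is pulled toward its local mean; enlarging `k` gives stronger smoothing.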

5. RESULTS

MATLAB R2010a is used on a Pentium dual-core CPU running at 3 GHz with 2 GB RAM for experimentation. The location criterion used for removing duplicate MSERs is that the distance between centroids must be less than six pixels between the finer scale and the next coarser scale in the scale pyramid. The area criterion used for removing duplicate MSERs is that, for two region areas r1 and r2, |r1 - r2| / max(r1, r2) < 0.5. The choice of these two thresholds, i.e. the area criterion and the location criterion, was made on experimental heuristics; the chosen values worked well for our video stabilization problem. However, the choice of these thresholds may differ if these multi-resolution MSERs are used in some other application. The data set used in this paper is available on
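The two thresholds can be wrapped in a small predicate; `is_duplicate` is an illustrative helper, with regions assumed to be expressed in base-resolution coordinates as (centroid_x, centroid_y, area) triples.

```python
def is_duplicate(fine, coarse, dist_thresh=6.0, area_thresh=0.5):
    """Step-3 duplicate test between a fine-scale and a coarser-scale MSER.

    Implements the paper's criteria: centroid distance < 6 pixels and
    |r1 - r2| / max(r1, r2) < 0.5. Both conditions must hold for the
    fine-scale region to be discarded.
    """
    (x1, y1, a1), (x2, y2, a2) = fine, coarse
    dist = ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    area_ratio = abs(a1 - a2) / max(a1, a2)
    return dist < dist_thresh and area_ratio < area_thresh
```

For example, regions centred 2 pixels apart with areas 100 and 120 are flagged as duplicates, while regions 20 pixels apart, or with areas 100 and 300, are kept separate.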

Figure 3: MSER correspondences between frame numbers #300 and #301 for sequence 00016. The ellipses marked in green represent the MSERs and the red lines across the two frames represent the correspondences. Fifteen correspondences are observed in this case.


http://www.facweb.iitkgp.ernet.in/~pkb/ for visualization by the reader and for researchers to use in their own study and experimentation, with the motivation of making the data set reproducible. The proposed method is compared with (1) the particle filtering based motion estimation (PFME) method proposed by Yang et al. [6], which uses SIFT feature points for camera motion estimation; the Yang et al. test videos and results are available at http://www.junlanyang.net/VideoStabilization.html; (2) the MSER based video stabilization method [7] proposed in our earlier work, which uses single-resolution MSERs for camera motion estimation; and (3) the subspace video stabilization method proposed by Liu et al. [8], which places subspace constraints on KLT feature trajectories by filtering the tracked feature matrix, factoring it into two low-rank matrices in a low-dimensional feature space; the Liu et al. test videos and results are available at http://web.cecs.pdx.edu/fliu/project/subspace_stabilization/. Subjective and objective evaluations are carried out to validate the proposed stabilization method using multi-resolution MSERs against state-of-the-art stabilization methods.

5.1 Subjective Evaluation

Comparative analysis is performed with the subspace video stabilization method proposed by Liu et al. [8], which is among the state-of-the-art 3D stabilization methods. Sequence 00016 is used for comparison. Figure 4(a) shows input frames #300, #301, #302, and #303 of sequence 00016. Due to the high amount of blur in frame #301, the subspace method [8] fails to track feature trajectories due to a dramatic reduction in factorization windows, thereby degrading stabilization performance, as seen from Figure 4(b). MSER based video stabilization [7] also failed in this case, the MSER feature extraction and matching process being hampered by the severe blurring degradation, as shown in Figure 4(c). The reasons for the failure of stabilization techniques under blurring degradations were discussed in Section 4.1.1. The proposed method, in which multi-resolution MSERs are used for feature extraction and matching, is effectively able to deal with blurring degradations, as seen from the stabilization results in Figure 4(d). The robustness is due to the better sensitivity (i.e. good localization over the entire image frame) along with the increase in the number of correspondences of these region features when detected in a scale-pyramid fashion.

Performance evaluation is now carried out on test sequence OUTDOOR_2, captured by us at the IIT Kharagpur campus. Figure 5(a) shows four frames of input sequence OUTDOOR_2, corresponding to frame numbers #123, #139, #180, and #219. Stabilization results using the PFME, MSER, and proposed methods are shown in Figure 5(b), 5(c), and 5(d), respectively. As observed from Figure 5(d), the proposed method using multi-resolution MSERs achieves better stabilization results than both the PFME and single-resolution MSER methods. Finally, the sequence

Figure 4: (a) Five frames of the input blurred video (sequence 00016). (b) Failure of the subspace method. (c) Failure of the MSER based stabilization method. (d) Proposed method using multi-resolution MSERs.

Figure 5: (a) Five frames of the input video (sequence OUTDOOR_2). Stabilized frames from (b) the PFME method, (c) the MSER method, and (d) the proposed method using multi-resolution MSERs.


STREET, used by Yang et al. [6], is tested for performance evaluation. This sequence consists of a scene with a car moving across the camera and a building in the background. As reported by Yang et al., this scene is challenging because of the fast-moving car and high jitter. Figure 6(a) shows four frames of the input sequence, while Figure 6(b) shows stabilization results based on their PFME approach and Figure 6(c) shows the results of the MSER based stabilization method. Figure 6(d) shows the stabilization results of the proposed method, which subjectively achieves superior results to the PFME and MSER based video stabilization methods.

5.2 Objective Evaluation

Interframe transformation fidelity (ITF) [2] measures the peak signal to noise ratio (PSNR) between successive frames and is used in this paper for objective evaluation. Measures based on PSNR do not require knowledge of the ground truth. ITF is given by

\[ \mathrm{ITF} = \frac{1}{N_{\mathrm{frame}} - 1} \sum_{k=1}^{N_{\mathrm{frame}} - 1} \mathrm{PSNR}(I_k, I_{k+1}) \tag{4} \]

where $N_{\mathrm{frame}}$ is the total number of frames in the video sequence and $\mathrm{PSNR}(I_k, I_{k+1})$ is the peak signal to noise ratio between two successive frames, given by

\[ \mathrm{PSNR}(I_k, I_{k+1}) = 10 \log_{10} \frac{I_{\max}^2}{\mathrm{MSE}(I_k, I_{k+1})} \tag{5} \]

where $I_{\max}$ is the maximum intensity value of a pixel and $\mathrm{MSE}(I_k, I_{k+1})$ is the mean square error between consecutive frames, given by

\[ \mathrm{MSE}(I_k, I_{k+1}) = \frac{1}{wh} \sum_{x=1}^{h} \sum_{y=1}^{w} \left( I_k(x, y) - I_{k+1}(x, y) \right)^2 \tag{6} \]

where w and h are the width and height of the image frame, respectively. Typically, a stabilized sequence has a higher ITF value than the unstable sequence. As observed from Table 1, the proposed method achieves superior ITF values in comparison to the PFME and MSER based 2D stabilization methods. This is because of the improvement obtained in the feature extraction and matching block as a result of using multi-resolution MSERs. Table 2 shows the comparative analysis of the proposed method with the subspace video stabilization method, one of the competing state-of-the-art 3D video stabilization methods. As observed from Table 2, stabilization failed due to the failure of feature tracking in the subspace method in regions having an excessive amount of blurring (sequence 00016); as a result, the ITF value is lower than the input ITF value, signifying degradation in performance. Our proposed method does not fail here, as shown by the higher ITF value for sequence 00016. The reasons for the superior performance were discussed in the "Subjective evaluation" section. A similar observation is made for sequence SANY0025: due to an excessive amount of scene dynamics, the subspace method [8] failed because a single subspace could not account for the feature trajectories in both the motion area and the background area. The proposed method is able to stabilize the sequence due to the robustness of the multi-resolution MSERs, as observed from Table 2.
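Equations (4)-(6) reduce to a few lines of numpy; `itf` below is a hypothetical helper, not the authors' evaluation code. Note that identical consecutive frames give zero MSE and hence an infinite PSNR, so real evaluation code should guard against that case.

```python
import numpy as np

def itf(frames, i_max=255.0):
    """Interframe transformation fidelity (Eq. (4)): the mean PSNR over all
    consecutive frame pairs. A higher ITF indicates a steadier sequence."""
    psnrs = []
    for a, b in zip(frames[:-1], frames[1:]):
        mse = np.mean((a.astype(float) - b.astype(float)) ** 2)  # Eq. (6)
        psnrs.append(10.0 * np.log10(i_max ** 2 / mse))          # Eq. (5)
    return float(np.mean(psnrs))
```

For two frames differing by exactly one grey level everywhere, MSE = 1 and the ITF equals 10 log10(255^2), roughly 48.13 dB.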

Figure 6: (a) Input frames (sequence STREET). Stabilization results from (b) the PFME method, (c) the MSER method, and (d) the proposed method using multi-resolution MSERs.

Table 1: ITF comparison with 2D stabilization methods

                                    Stabilized ITF (dB)
Sequence name  Original ITF (dB)   PFME method  MSER method  Proposed method
OUTDOOR_1      19.71               25.45        25.34        26.89
OUTDOOR_2      22.88               24.53        25.75        27.10
STREET         18.43               18.89        20.23        22.23
ONROAD         14.95               17.57        18.50        20.81
ONDESK         21.75               23.61        24.67        25.92

Table 2: ITF comparison with 3D stabilization method

                                    Stabilized ITF (dB)
Sequence name  Original ITF (dB)   Subspace method  Proposed method
00016          22.88               17.69            25.77
SANY0025       18.43               20.44            24.94

Okade M and Biswas P K: Improving Video Stabilization Using Multi-Resolution MSER Features

378 IETE JOURNAL OF RESEARCH | VOL 60 | NO 5 | SEP-OCT 2014



5.3 Computational Complexity

In this section the computational complexity of the proposed method is evaluated. The average processing time per frame for the proposed method is measured on a number of test sequences, and a comparative analysis is performed with state-of-the-art stabilization methods. The frame resolution is highlighted while performing the comparative analysis, as the feature extraction and matching stage depends heavily on it: a larger frame resolution means more features are extracted and matched, thereby consuming more processing time. As seen from Table 3, the proposed method using multi-resolution MSER features competes with the PFME and MSER methods. The additional overhead of extracting MSER features across the scale pyramid and eliminating the duplicate MSERs is minimal, as seen from the marginal rise in computational complexity for the proposed method in comparison to the PFME and MSER based stabilization methods. This marginal rise is justified by the improvement obtained in the stabilization performance, as seen from the ITF values as well as the subjective results in the previous sections. The proposed method outperforms the subspace video stabilization method in terms of processing time because the subspace method uses 3D rendering, which is computationally expensive. Although the subspace method replaces the computationally intensive structure-from-motion (SFM) step of 3D stabilization with 2D tracking of feature points, the rendering of the output video by 3D reconstruction still consumes considerable processing time.
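The duplicate-elimination step described above can be sketched as a greedy overlap filter on the bounding boxes that an MSER detector returns at each pyramid level, mapped back to the base resolution. The function names and the 0.7 IoU threshold below are illustrative assumptions, not the paper's exact procedure:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def dedup_boxes(boxes, iou_thresh=0.7):
    """Keep the largest boxes first and drop any later box whose overlap
    with an already-kept box exceeds iou_thresh; the same region detected
    at several pyramid levels maps back to nearly the same box."""
    kept = []
    for b in sorted(boxes, key=lambda b: -b[2] * b[3]):
        if all(iou(b, k) < iou_thresh for k in kept):
            kept.append(b)
    return kept
```

This filter is quadratic in the number of candidate boxes, which is consistent with the marginal per-frame overhead reported in Table 3.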

6. CONCLUSION

This paper investigated the application of multi-resolution MSERs in the feature extraction and matching stage for achieving robust stabilization performance. The motivation for investigating this problem was the limitation observed in state-of-the-art stabilization methods in cases where the input video frames were blurred. This area required more attention, as any camera under unsteady and fast motion experiences blurring, which is a practical problem under such conditions. Experimental evaluation conducted on a number of hand-held video sequences, along with comparison against state-of-the-art stabilization methods, showed how multi-resolution feature extraction and matching could overcome this limitation. In addition, it also improved the accuracy of stabilization in comparison to state-of-the-art methods. The computational cost of the multi-resolution analysis is not high, as observed in the computational complexity evaluation section. The proposed multi-resolution approach can also be easily ported to the subspace video stabilization method, where the traditional KLT features can be replaced by multi-resolution KLT features before performing feature matrix tracking. Our future work is focused on using eigen decomposition of these features and exploring whether this could also perform better than traditional feature matching.

ACKNOWLEDGEMENTS

We would like to thank Prof. Subhashis Banerjee of IIT Delhi and Prof. Uma Mudenagudi of BVBCET Hubli for their valuable feedback on our work. We would also like to thank the anonymous reviewers as well as the editor who coordinated the review of our paper.

REFERENCES

1. S. Kumar, H. Azartash, M. Biswas, and T. Nguyen, "Real-time affine global motion estimation using phase correlation and its application for digital image stabilization," IEEE Trans. Image Process., Vol. 20, no. 12, pp. 3406-18, Dec. 2011.

2. G. Puglisi, and S. Battiato, "A robust image alignment algorithm for video stabilization purposes," IEEE Trans. Circuits Syst. Video Technol., Vol. 21, no. 10, pp. 1390-400, Oct. 2011.

3. A. Censi, A. Fusiello, and V. Roberto, "Image stabilization by feature tracking," in Proceedings of the International Conference on Image Analysis and Processing, Vol. 2, Venice, Italy, 1999, pp. 665-7.

4. J. Paik, Y. Park, and D. Kim, "An adaptive motion decision system for digital image stabilizer based on edge pattern matching," IEEE Trans. Consum. Electron., Vol. 38, no. 3, pp. 607-15, Aug. 1992.

5. S. Battiato, G. Gallo, G. Puglisi, and S. Scellato, "SIFT features tracking for video stabilization," in ICIAP 2007, IEEE Computer Society, Modena, Italy, 2007, pp. 825-30.

6. J. Yang, D. Schonfeld, and M. Mohamed, "Robust video stabilization based on particle filter tracking of projected camera motion," IEEE Trans. Circuits Syst. Video Technol., Vol. 19, no. 7, pp. 945-54, Jul. 2009.

7. M. Okade, and P. K. Biswas, "Video stabilization using maximally stable extremal region features," Multimedia Tools Appl., Vol. 68, no. 3, pp. 947-68, Feb. 2014.

8. F. Liu, M. Gleicher, J. Wang, H. Jin, and A. Agarwala, "Subspace video stabilization," ACM Trans. Graph., Vol. 30, no. 1, pp. 1-10, Jan. 2011.

9. J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," in Proceedings of

Table 3: Average processing time comparison

Sequence name   Frame resolution   Time (s)
                                   PFME method   MSER method   Subspace method   Proposed method
OUTDOOR_1       352×288            0.94          0.77          2.39              0.97
OUTDOOR_2       352×288            0.89          0.81          2.14              0.94
STREET          160×120            0.27          0.21          1.15              0.32
ONROAD          160×120            0.25          0.20          1.10              0.34
ONDESK          160×120            0.25          0.22          1.14              0.32
0073YC          640×360            1.67          1.58          4.05              1.87
00016           640×360            1.62          1.49          4.20              1.91
SANY0025        640×360            1.55          1.41          4.20              1.72



the British Machine Vision Conference (BMVC), Cardiff, Wales, 2002, pp. 384-93.

10. K. Uomori, A. Morimura, and H. Ishii, "Electronic image stabilization systems for video cameras and VCRs," J. Soc. Motion Pict. Telev. Eng., Vol. 101, no. 2, pp. 66-75, 1992.

11. S. Ko, S. Lee, S. Jeon, and E. Kang, "Fast digital image stabilizer based on gray-coded bit-plane matching," IEEE Trans. Consum. Electron., Vol. 45, no. 3, pp. 598-603, Aug. 1999.

12. S. Erturk, and T. Dennis, "Image sequence stabilization based on DFT filtering," IEE Proc. Vis. Image Signal Process., Vol. 147, no. 2, pp. 95-102, Apr. 2000.

13. Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum, "Full-frame video stabilization with motion inpainting," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 28, no. 7, pp. 1150-63, Jul. 2006.

14. A. Litvin, J. Konrad, and W. Karl, "Probabilistic video stabilization using Kalman filtering and mosaicking," in Proceedings of IS&T/SPIE Symposium on Electronic Imaging, Santa Clara, CA, 2003, pp. 663-74.

15. C. Morimoto, and R. Chellappa, "Fast 3-D stabilization and mosaic construction," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), San Juan, Puerto Rico, 1997, pp. 660-5.

16. F. Liu, M. Gleicher, H. Jin, and A. Agarwala, "Content-preserving warps for 3D video stabilization," ACM Trans. Graph. (Proc. ACM SIGGRAPH 2009), Vol. 28, no. 3, pp. 44:1-44:9, Aug. 2009.

17. M. Okade, and P. Biswas, "Improving video stabilization in the presence of motion blur," in 2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), Hubli, India, 2011, pp. 78-81.

18. P.-E. Forssen, and D. Lowe, "Shape descriptors for maximally stable extremal regions," in IEEE 11th International Conference on Computer Vision (ICCV), Rio de Janeiro, 2007, pp. 1-8.

Authors

Manish Okade was born in Hubli, Karnataka, India, in 1980. He received his BE degree in electronics and communication engineering from BVB College of Engineering and Technology, Hubli, MTech degree in automation and computer vision from Indian Institute of Technology (IIT), Kharagpur, and PhD degree in computer vision and image processing from IIT, Kharagpur. He was awarded the IBM "The Great Mind Challenge Award" in the year 2008 for the project OSPEDALE. He was a senior lecturer in the Dept. of Computer Science and Engineering at BVB College of Engineering and Technology, Hubli, from 2006 to 2009. Currently he is working as an assistant professor in the Dept. of Electronics and Communication Engineering, National Institute of Technology (NIT), Rourkela, India. His interests include image and video processing, pattern recognition, assistive technologies, and road safety technologies.

E-mail: [email protected]

Prabir K. Biswas received the BTech degree (with honors) in electronics and electrical communication engineering, the MTech degree in automation and control engineering, and the PhD degree in computer vision from the Department of Electronics and Electrical Communication Engineering, IIT, Kharagpur, India, in 1985, 1989, and 1991, respectively. From 1985 to 1987, he was with Bharat Electronics Ltd, Ghaziabad, India, as a deputy engineer. Since 1991, he has been working as a faculty member in the Department of Electronics and Electrical Communication Engineering, IIT Kharagpur, where he is currently a professor. He has several video lectures hosted on NPTEL, India's premier knowledge hub supported by the Ministry of HRD, India. His interests include pattern recognition, image processing, and operating systems.

E-mail: [email protected]

DOI: 10.1080/03772063.2014.962627; Copyright © 2014 by the IETE
