[ieee 2013 fourth national conference on computer vision, pattern recognition, image processing and...

Mean Shift Clustering based Outlier Removal for Global Motion Estimation

Manish Okade∗, Prabir Kumar Biswas†

∗Department of Electronics and Communication Engineering, BVB College of Engineering and Technology, Hubli, India

†Department of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India

Email: ∗[email protected], †[email protected]

Abstract: This paper investigates a novel motion vector outlier rejection method based on mean shift clustering of block motion vectors. The accuracy of compressed domain global motion estimation techniques is largely influenced by their ability to counter outlier motion vectors. These outliers occur in the block motion vector field due to moving objects, noise, or large matching errors resulting from the encoder's priority on rate-distortion optimization. In the present work it is shown that, by applying mean shift clustering to block motion vectors, the clusters that correspond to outlier motion vectors can be identified. Once detected, these clusters are kept out of the global motion estimation process, thereby increasing the robustness of the estimated camera parameters. The proposed method is compared with existing state-of-the-art outlier removal methods using synthetic and real video sequences to establish and validate its superiority.

Index Terms: outlier motion vectors, mean shift clustering, global motion estimation, compressed domain.

I. INTRODUCTION

Video sequences consist of two types of motion. The first is due to the movement of the camera (pan, tilt, rotate, zoom) and is referred to as global motion, while the second is due to the motion of individual objects composing the scene and is referred to as local motion. The region of support for motion representation in the case of global motion is the entire image frame, while for local motion it consists of small areas such as rectangular blocks or even a single pixel within an image frame. Distinguishing between global and local motion is advantageous from two points of view: firstly, it gives precise estimates of the types of motion underlying a dynamic scene, and secondly, it leads to more compact representations of motion information that eventually help in video analysis. As a consequence of the advantages offered by the estimation of global motion, the MPEG-4 coding standard has adopted two tools in its framework, namely dynamic sprites and global motion compensation. The task of estimating the movement of the camera is known as Global Motion Estimation (GME). It is useful in applications such as video stabilization, moving object segmentation, super resolution, structure from motion (SFM), and video indexing and annotation.

The estimation of the camera parameters can be done either in the pixel domain [1]–[3] or in the block motion vector domain, also referred to as the compressed domain [4]–[7]. GME techniques in the pixel domain minimize the prediction errors by utilizing an iterative non-linear estimation technique, thereby achieving robust estimates of the camera motion parameters. However, due to the number of pixels involved in this iterative non-linear estimation, computational complexity is a major issue in this domain, which limits its use for real-time applications. On the other hand, compressed domain global motion estimation methods are fast because the motion vectors and DCT coefficients are readily available in the bitstream and do not have to be explicitly computed. The only drawback is that, although the motion vectors are readily available, they do not always represent the actual camera motion because of the outliers present in the bitstream. These outliers occur in the block motion vector field due to moving objects, noise, or large matching errors resulting from the encoder's priority on rate-distortion optimization. The outliers tend to bias the GME process and have to be removed in a pre-processing step in order to increase the accuracy of the GME techniques.

Outlier removal for efficient GME has been investigated extensively in the literature. Dante et al. [8] investigated a novel outlier rejection technique for epipolar geometry by applying a threshold on the magnitude difference between a motion vector and its 8 surrounding neighbours. This work was extended by Chen et al. [6], who in addition to checking the motion vector magnitude also checked the phase difference between the motion vector and its 8 surrounding neighbours. However, their method utilized three filters with thresholds involved at each filter input. Su et al. [5] proposed a novel and robust global motion estimation technique which utilized a Newton-Raphson gradient descent optimization technique with outlier handling to minimize the squared error between the estimated and actual motion vectors. Smolic et al. [9] investigated a low-complexity global motion estimation technique which utilized an M-estimator on the P-frame motion vector fields of block-coded video. Recently, in [10], a 2-D tensor voting technique was used to exploit the similarity between the motion vectors.

In this paper we propose a novel technique which utilizes mean shift clustering for identifying outliers in the block motion vector field. Experiments conducted on synthetic and real video sequences demonstrate its superiority over state-of-the-art outlier removal methods.

II. KEY CONTRIBUTIONS

Our focus in the present work is to detect the outlier motion vectors present in the block motion vector field so that these motion vectors can be excluded while estimating the global motion. The outliers are detected by using mean shift clustering on the block motion vectors. Extensive experimentation shows significant improvement in the performance evaluation parameters as compared to state-of-the-art outlier removal techniques.

III. CAMERA MODEL

An eight-parameter perspective camera model [5] is used in this study for estimating the camera motion. The model is given by

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \\ m_7 & m_8 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

where $(x, y)$ is the point on the reference frame and $(x', y')$ is the corresponding point on the frame which has been transformed as given by the eight-parameter transformation matrix $\mathbf{m}$. $m_3$ and $m_6$ represent translation parameters, while the rest of the parameters represent rotation, zooming and shearing effects. GME estimates this matrix $\mathbf{m}$ between two consecutive frames for the entire video sequence.
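Written out component-wise, the matrix form corresponds to the following mapping. This is a sketch that assumes the usual homogeneous normalization, i.e. division by the third coordinate, which the matrix form leaves implicit; it is not quoted from [5].

$$x' = \frac{m_1 x + m_2 y + m_3}{m_7 x + m_8 y + 1}, \qquad y' = \frac{m_4 x + m_5 y + m_6}{m_7 x + m_8 y + 1}$$

When $m_7 = m_8 = 0$ the denominator equals 1 and the mapping reduces to an affine transformation, in which case the normalization has no effect.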

IV. PROPOSED METHOD

Fig. 1. Proposed outlier removal framework (input motion vector field → proposed outlier rejection strategy using mean shift clustering → inlier motion vectors → GME → global (camera) motion parameters)

Fig. 1 shows the proposed outlier removal strategy. The block motion vectors resulting from the video encoding process contain both inlier and outlier motion vectors. The outlier motion vectors occur for the reasons listed below.

i. In homogeneous regions, motion vectors have very small magnitudes (near zero). This happens because the prediction error minimization step of the encoding algorithm fails in such areas.

ii. Areas with repetitive texture patterns, boundary areas of moving objects, and motion vectors corresponding to tiny or non-rigid objects.

iii. Motion vectors with large matching errors due to the nature of the block-matching process.

iv. Objects which are in motion relative to the background.

The first three categories can be countered by pre-processing the input motion vector field using effective filtering strategies. The final category, i.e. object motion, can be handled while performing the global motion estimation using regression techniques. Our focus in this work is on the first three categories of outliers. Our proposed outlier rejection strategy uses mean shift clustering for refining the input motion vector field. The proposed outlier removal method outputs a set of inlier motion vectors which are fed to the GME algorithm. Experiments show that refining the input motion vector field in this way significantly improves the global motion estimation process. Details of the proposed method are given below.

A. Mean Shift Review

The mean shift is an iterative procedure that shifts each data point to the average of the data points within its neighborhood and is derived using density gradient estimation. A brief mathematical description is given below. More details can be found in [11], [12].

Assume that $X_1, \ldots, X_n$ is the given multivariate data set in the $d$-dimensional Euclidean space $R^d$. The multivariate kernel density estimator with kernel $K$ and window width $h$ is defined as

$$f(x) = \frac{1}{n h^d} \sum_{i=1}^{n} K\left\{ \frac{1}{h}(x - X_i) \right\} \tag{1}$$

Substituting for kernel $K$ as

$$K(x) = \begin{cases} 1 - \|x\|^2 & \text{if } \|x\| \le 1, \\ 0 & \text{if } \|x\| > 1, \end{cases} \tag{2}$$

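With this kernel, the gradient of the density estimate in equation (1) is proportional to the sample mean shift

$$M_h(x) = \frac{1}{n_x} \sum_{X_i \in S_h(x)} (X_i - x) \tag{3}$$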

where the region $S_h(x)$ is a hyper-sphere of radius $h$ centered at $x$ and containing $n_x$ data points. Equation (3) is called the mean shift at $x \in X$. The kernel $K$ used here is known as the Epanechnikov kernel. The mean shift vector always points in the direction of the maximum increase in the density. The set of all locations that converge to the same mode defines the basin of attraction of that mode. The mean shift procedure, which is obtained by successive computation of the mean shift vector $M_h(x)$ followed by translation of the kernel window by $M_h(x)$, is guaranteed to converge at a nearby point where the estimate has zero gradient, making it highly suitable for block motion vectors. The next subsection describes how we have adopted the mean shift clustering procedure for the task of outlier detection.
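The iteration described above can be sketched in a few lines of Python. This is a minimal illustration of equation (3) with a single bandwidth `h`, not the authors' implementation; the function names, the tolerance `tol`, the iteration cap and the toy data are all assumptions made for the example.

```python
import math

def mean_shift_vector(x, data, h):
    """Equation (3): average of (X_i - x) over the points within radius h of x."""
    inside = [p for p in data if math.dist(p, x) <= h]
    if not inside:
        return tuple(0.0 for _ in x)
    n_x = len(inside)
    return tuple(sum(p[k] - x[k] for p in inside) / n_x for k in range(len(x)))

def seek_mode(x, data, h, tol=1e-6, max_iter=500):
    """Translate the window by M_h(x) repeatedly until the shift is negligible."""
    y = tuple(float(v) for v in x)
    for _ in range(max_iter):
        shift = mean_shift_vector(y, data, h)
        y = tuple(a + b for a, b in zip(y, shift))
        if math.hypot(*shift) < tol:
            break
    return y

# Toy data (hypothetical): a dense group near the origin and one isolated point.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (5.0, 5.0)]
mode_a = seek_mode((0.2, 0.2), points, h=1.0)  # settles at the group centre
mode_b = seek_mode((5.0, 5.0), points, h=1.0)  # the isolated point is its own mode
```

In this toy case the isolated point converges to a mode of its own, which is the behaviour the outlier detection in the next subsection relies on.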

1) Mean Shift clustering applied on block motion vectors: This section outlines how we have utilized mean shift clustering for detecting the outlier motion vectors present in the input motion vector field. The premise is that, since the outlier motion vectors deviate from the motion model, they do not fit it and instead form small, independent clusters when the mean shift clustering procedure is applied. These clusters, which contain the outlier motion vectors, would bias the global motion estimate if included in the estimation process, thereby reducing the accuracy of the global motion parameters. Our proposed method first identifies such clusters and excludes them from the global motion estimation process. The following steps outline the proposed outlier rejection strategy.

Let $x_i$ and $z_i$, $i = 1, \ldots, n$, be the $d$-dimensional input and filtered motion vectors in the joint spatial-range domain. The bandwidth parameter $h = (h_s, h_r)$ controls the size of the kernel in the spatial and range domains respectively, thereby determining the resolution of mode detection. For each motion vector:

Step 1. Initialize $j = 1$ and $y_{i,1} = x_i$.

Step 2. Compute $y_{i,j+1}$ using equation (3) until convergence, $y = y_{i,c}$. The stationary points $y_{i,c}$ are the modes of the density.

Step 3. Store all the information about the $d$-dimensional convergence point in $z_i$, i.e. $z_i = y_{i,c}$.

Step 4. Delineate in the joint domain the clusters $\{C_p\}_{p=1 \ldots m}$ by grouping together all $z_i$ which are closer than $h_s$ in the spatial domain and $h_r$ in the range domain. This step concatenates the basins of attraction of the corresponding convergence points.

Step 5. For each $i = 1, \ldots, n$, assign $L_i = \{p \mid z_i \in C_p\}$.

Step 6. Mark those clusters which contain fewer than a threshold number of motion vectors as outlier clusters.

Step 7. Exclude these outlier clusters from the global motion estimation process.
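Steps 1 to 7 can be sketched as follows. This is an illustrative reading of the procedure under stated assumptions, not the authors' code: each motion vector is represented as a hypothetical 4-tuple `(bx, by, mvx, mvy)` of block position and motion components, the window is a flat product of a spatial ball of radius `hs` and a range ball of radius `hr`, modes are grouped transitively with a union-find, and the toy field and `min_size=3` are chosen only to keep the example small.

```python
import math
from collections import Counter

def in_window(a, b, hs, hr):
    """True if a and b lie within hs spatially and within hr in the range domain."""
    return math.dist(a[:2], b[:2]) <= hs and math.dist(a[2:], b[2:]) <= hr

def seek_mode(x, data, hs, hr, tol=1e-3, max_iter=100):
    """Steps 1-3: move x to the mean of its joint-domain window until it settles."""
    y = tuple(float(v) for v in x)
    for _ in range(max_iter):
        nbrs = [p for p in data if in_window(p, y, hs, hr)]
        if not nbrs:
            break
        new = tuple(sum(p[k] for p in nbrs) / len(nbrs) for k in range(len(y)))
        moved = math.dist(new, y)
        y = new
        if moved < tol:
            break
    return y

def flag_outliers(vectors, hs, hr, min_size):
    """Steps 4-7: group nearby modes, label each vector, flag small clusters."""
    modes = [seek_mode(v, vectors, hs, hr) for v in vectors]
    parent = list(range(len(modes)))

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(modes)):
        for j in range(i + 1, len(modes)):
            if in_window(modes[i], modes[j], hs, hr):
                parent[root(i)] = root(j)
    labels = [root(i) for i in range(len(modes))]
    sizes = Counter(labels)
    return [sizes[label] < min_size for label in labels]

# Toy field (hypothetical): 3 x 3 blocks moving by (2, 0); the centre block is corrupted.
vectors = [(bx, by, 2.0, 0.0) for by in range(3) for bx in range(3)]
vectors[4] = (1, 1, 9.0, -7.0)
flags = flag_outliers(vectors, hs=8, hr=4, min_size=3)
```

Only the vectors whose flag is `False` would be passed on to the GME stage. The quadratic grouping loop is adequate for a few thousand blocks per frame; a spatial index would be the natural refinement for larger fields.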

Fig. 2 shows the outlier clusters identified by the proposed outlier rejection strategy, marked in red and superimposed on frame #3 of the sequence Stefan. The motion vector field used was the one between frame #2 and frame #3 of the sequence. The motion vectors associated with these detected outlier clusters are excluded from the subsequent stage of global motion estimation. This improves the accuracy of the global motion estimation algorithm, as it only has to account for object motion while determining the global motion parameters.

Fig. 2. Outlier Clusters identified by the proposed outlier rejection strategy

B. Global Motion Estimation algorithm

GME is now carried out on the refined motion vector field obtained as described in the previous section. The algorithm of Su et al. [5] is utilized for estimating the global motion parameters. Su et al. used a Newton-Raphson gradient descent optimization technique along with outlier removal to minimize the difference between the estimated and actual motion vectors under a squared error criterion. Their method, which estimates the camera parameter vector $\mathbf{m}$, is briefly described below.

TABLE I
SYNTHETIC GLOBAL MOTION PARAMETERS

Model   Type          Motion Parameters
GM 1    Geometric     [0.95, 0, 10.4238, 0, 0.95, 5.7927, 0, 0]
GM 2    Affine        [0.9964, -0.0249, 1.0981, 0.0856, 0.9457, -7.2, 0, 0]
GM 3    Perspective   [0.9964, -0.0249, 6.0981, 0.0249, 0.9964, 2.5109, -2.7e-5, 1.9e-5]
GM 4    Perspective   [1, 0, 4.4154, 0, 1, 0, -1.1263e-4, 0]

Let $\mathbf{MV} = (\mathrm{MV}_X, \mathrm{MV}_Y)$, where $\mathrm{MV}_X$ is the horizontal (X) component of the motion vector and $\mathrm{MV}_Y$ is the vertical (Y) component. The horizontal and vertical components for the block centered at pixel $(x, y)$ in the current frame, corresponding to the perspective camera model of Section III, are obtained using

$$\mathrm{MV}_X(x, y; \mathbf{m}) = x' - x \tag{4}$$

$$\mathrm{MV}_Y(x, y; \mathbf{m}) = y' - y \tag{5}$$
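As an illustration of equations (4) and (5), the sketch below evaluates the model motion vector at block centres for the GM 1 parameters of Table I. It assumes the standard homogeneous normalization of the perspective model, a coordinate origin at the top-left corner of the frame, and block centres at half-block offsets; these conventions and all function names are assumptions of the example rather than details stated in the paper.

```python
def warp(m, x, y):
    """Map (x, y) through the eight-parameter model, dividing by the third coordinate."""
    m1, m2, m3, m4, m5, m6, m7, m8 = m
    w = m7 * x + m8 * y + 1.0
    return (m1 * x + m2 * y + m3) / w, (m4 * x + m5 * y + m6) / w

def model_motion_vector(m, x, y):
    """Equations (4) and (5): displacement of (x, y) predicted by the model."""
    xp, yp = warp(m, x, y)
    return xp - x, yp - y

def model_field(m, width, height, block=8):
    """Model motion vector at the centre of every block x block tile of the frame."""
    half = block / 2.0
    return {
        (bx + half, by + half): model_motion_vector(m, bx + half, by + half)
        for by in range(0, height - block + 1, block)
        for bx in range(0, width - block + 1, block)
    }

GM1 = [0.95, 0, 10.4238, 0, 0.95, 5.7927, 0, 0]  # first row of Table I
field = model_field(GM1, width=32, height=16)    # 4 x 2 blocks of size 8
mv_first = field[(4.0, 4.0)]                     # approximately (10.2238, 5.5927)
```

A field generated this way plays the role of $\mathbf{MV}(x, y; \mathbf{m})$ in the fitting criterion and is also the kind of clean synthetic input used in Section V-A.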

Equations (4) and (5) give the motion vector $\mathbf{MV}(x, y; \mathbf{m})$ at location $(x, y)$ corresponding to a given $\mathbf{m}$. However, the actual motion vector at that location is $\mathbf{MV}(x, y, t)$. Su et al. minimized the difference between these two motion vector terms (i.e. estimated and actual) by using the squared error, as given below

$$\mathbf{m}_t = \arg\min_{\mathbf{m}} \sum_{(x,y)} \left\| \mathbf{MV}(x, y, t) - \mathbf{MV}(x, y; \mathbf{m}) \right\|^2 \tag{6}$$

Estimating $\mathbf{m}_t$ using equation (6) is referred to as regression or model fitting. Gradient descent (GD), which is a robust regression technique, was used by Su et al. for the global motion parameter estimation. Those motion vectors that have the maximum deviation from the estimated model are marked as outliers and removed, and the model is re-estimated. This process is iterated until convergence. The regression handles the motion vectors corresponding to large moving objects.
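The fit, reject and refit loop can be illustrated on a deliberately simplified case. The sketch below replaces the eight-parameter model and the Newton-Raphson update of [5] with a translation-only model, for which the minimizer of equation (6) is simply the mean motion vector; the rejection fraction `drop`, the tolerance and the stopping rule are hypothetical choices and not values taken from the paper or from [5].

```python
import math

def fit_translation(mvs):
    """Least-squares translation under (6): the mean of the motion vectors."""
    n = len(mvs)
    return (sum(v[0] for v in mvs) / n, sum(v[1] for v in mvs) / n)

def robust_translation(mvs, drop=0.1, tol=1e-6, max_iter=20):
    """Fit, discard the vectors deviating most from the fit, refit, and repeat."""
    kept = list(mvs)
    est = fit_translation(kept)
    for _ in range(max_iter):
        if len(kept) <= 2:
            break
        kept.sort(key=lambda v: math.dist(v, est))   # smallest residual first
        n_drop = max(1, int(drop * len(kept)))
        kept = kept[:len(kept) - n_drop]             # reject the worst residuals
        new_est = fit_translation(kept)
        change = math.dist(new_est, est)
        est = new_est
        if change < tol:                             # estimate has stabilized
            break
    return est, kept

# Toy field (hypothetical): background moving by (2, 0), one object moving by (10, 10).
observed = [(2.0, 0.0)] * 8 + [(10.0, 10.0)] * 2
estimate, inliers = robust_translation(observed)
```

On this toy field the plain mean is (3.6, 2.0), while the iterated estimate settles on the background motion (2.0, 0.0) once both object vectors have been rejected.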

V. RESULTS

MATLAB R2010a is used for experimentation. An exhaustive search motion estimation algorithm is used to generate the block motion vector field, with a fixed block size of 8 × 8. The cluster size threshold was set to 20, i.e. clusters which contain fewer than 20 motion vectors are marked as outlier clusters. The kernel bandwidth parameters used were $(h_s, h_r) = (8, 4)$. Comparative studies are carried out with the following methods to validate our proposed outlier removal method using mean shift clustering: i) Gradient descent (Plain GD) [5], ii) Filter method (FLT GD) [8], iii) Cascade-of-rejectors (CAS GD) [6], iv) Tensor voting method (TV GD) [10].

A. Synthetic sequence evaluation

Firstly, we perform a comparative analysis on synthetic motion vector fields. Table I shows the global motion parameters used in the synthetic evaluation. This is the same set as used by Chen et al. [6], which helps in maintaining uniformity in comparison with competing methods of outlier removal. The synthetic motion vectors are corrupted by zero-mean Gaussian noise in both the horizontal and vertical directions to generate the input block motion vector field. Increasing levels of Gaussian noise, with standard deviations {0.7, 1.5, 2.2, 3.0}, are used. Global motion parameters are estimated from this corrupted input block motion vector field using the proposed method as well as the competing methods. SNR is used for benchmarking the proposed method against state-of-the-art outlier removal methods. From Table II it is observed that the proposed outlier rejection method yields a higher average SNR than the state-of-the-art outlier rejection methods at every noise level. This is attributed to the ability of the mean shift clustering strategy to reject the outliers comprehensively.
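The corruption step can be sketched as below. The helper name `corrupt`, the fixed seed and the constant toy field are assumptions made for illustration; the paper does not state how the noise was generated beyond its being zero-mean Gaussian with the listed standard deviations.

```python
import random

def corrupt(field, sigma, seed=0):
    """Add independent zero-mean Gaussian noise to both components of each vector."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [(mvx + rng.gauss(0.0, sigma), mvy + rng.gauss(0.0, sigma))
            for mvx, mvy in field]

# Toy clean field (hypothetical): every block moves by (2, 0).
clean = [(2.0, 0.0)] * 1000
noisy_fields = {sigma: corrupt(clean, sigma) for sigma in (0.7, 1.5, 2.2, 3.0)}
```

Each entry of `noisy_fields` corresponds to one noise-level column of Table II and would be passed to the outlier rejection and GME stages in turn.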

TABLE II
AVERAGE SNR COMPARISON

GM Model   GME Algorithm   Standard Deviation of Gaussian Noise
                           0.7     1.5     2.2     3.0
GM 1       proposed        35.86   29.44   25.82   23.11
           TV GD           34.46   28.37   25.07   22.68
           CAS GD          33.98   27.60   23.87   21.50
           FLT GD          28.15   15.83   11.19    8.73
           Plain GD        31.71   24.63   21.85   20.11
GM 2       proposed        38.66   33.48   29.23   26.77
           TV GD           38.12   32.65   28.57   25.73
GM 3       proposed        35.54   29.71   26.54   23.18
           TV GD           34.96   28.96   26.13   22.35
GM 4       proposed        38.21   31.75   29.17   26.31
           TV GD           37.78   31.20   28.53   25.79

B. Real test sequence evaluation

The proposed outlier removal technique is now tested on real test sequences. The sequences used are standard 4:2:0 YUV sequences. Global Motion Compensation (GMC) is performed by warping the current frame onto the reference frame using the estimated global motion parameters. The compensated frames will closely match the original frames only if the GME is accurate, and this similarity is measured by the PSNR metric. Table III shows the average PSNR for the test sequences used in this study. It is observed from Table III that the proposed method using mean shift clustering achieves the highest average PSNR on every sequence in comparison to state-of-the-art outlier rejection methods. TV GD, which is based on tensor voting between motion vectors, and CAS GD, which utilizes three filters for testing the magnitude and phase between a motion vector and its eight neighbours, compete with the proposed method. However, our proposed method is simpler than CAS GD, as a single filter is sufficient compared to the three filters used in CAS GD. Our method is also simpler than the data communication based approach followed by the tensor voting method of outlier removal.
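For reference, the PSNR between an original frame and its compensated version can be computed as sketched below. The sketch assumes 8-bit samples (peak value 255) held in flat lists and a single channel; the paper does not state which channels of the YUV frames enter the metric, so that choice is left to the caller.

```python
import math

def psnr(reference, compensated, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized sample lists."""
    if len(reference) != len(compensated) or not reference:
        raise ValueError("frames must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, compensated)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy samples (hypothetical): every sample differs by 10 grey levels, so MSE = 100.
value = psnr([100, 100, 100, 100], [110, 90, 110, 90])  # about 28.13 dB
```

Averaging this value over all frame pairs of a sequence gives a per-sequence figure comparable in kind to the entries of Table III.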

TABLE III
GLOBAL MOTION COMPENSATION (GMC) PERFORMANCE

Sequence     Average PSNR (dB)
Name         Plain GD   FLT GD   CAS GD   TV GD   proposed
Garden       22.30      21.44    22.19    25.75   26.89
Stefan       24.51      22.16    24.60    24.30   25.77
City         28.70      29.25    29.48    28.11   29.76
Tempete      26.51      24.98    27.83    27.91   28.79
Waterfall    34.71      24.25    34.86    34.85   35.91
Mobile       23.91      22.69    23.47    24.15   25.68
Coastguard   26.54      26.90    26.78    26.86   28.17

VI. CONCLUSION

In this paper, a novel motion vector outlier rejection scheme for global motion estimation is demonstrated. Mean shift clustering is carried out on the input block motion vector field as a pre-processing step to identify the clusters which correspond to the outlier motion vectors. Those clusters whose size is smaller than a fixed threshold signify outliers (i.e. noisy motion vectors as well as motion vectors corresponding to tiny moving objects) which would bias the global motion estimates. By excluding these outlier clusters from the GME process we are able to improve the accuracy of the global motion estimates. Experimental evaluation using synthetic as well as real test sequences demonstrates this improvement in average SNR and average PSNR for the proposed method in comparison to the state-of-the-art outlier removal methods. The proposed method is robust and also simpler than the cascade-of-rejectors method of Chen et al., in which a combination of three filters is used with thresholds at the input of each cascaded filter. Our future work is focused on choosing the cluster size threshold adaptively instead of the fixed threshold used in this work.

REFERENCES

[1] M. Haque, M. Biswas, and M. Pickering, “Computationally efficient global motion estimation using a multi-pass image interpolation algorithm,” in Picture Coding Symposium (PCS), May 2012, pp. 349–352.

[2] H. Alzoubi and W. D. Pan, “Efficient global motion estimation using fixed and random subsampling patterns,” in International Conference on Image Processing (ICIP), vol. 1, 2007, pp. I-477–I-480.

[3] F. Dufaux and J. Konrad, “Efficient, robust, and fast global motion estimation for video coding,” IEEE Transactions on Image Processing, vol. 9, no. 3, pp. 497–501, 2000.

[4] M. Tok, A. Glantz, M. Arvanitidou, A. Krutz, and T. Sikora, “Compressed domain global motion estimation using the Helmholtz tradeoff estimator,” in International Conference on Image Processing (ICIP), Sept. 2010, pp. 777–780.

[5] Y. Su, M.-T. Sun, and V. Hsu, “Global motion estimation from coarsely sampled motion vector field and the applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 232–242, 2005.

[6] Y.-M. Chen and I. Bajic, “Motion vector outlier rejection cascade for global motion estimation,” IEEE Signal Processing Letters, vol. 17, no. 2, pp. 197–200, Feb. 2010.

[7] R. Wang and T. Huang, “Fast camera motion analysis in MPEG domain,” in ICIP, 1999, pp. 691–694.

[8] A. Dante and M. Brookes, “Precise real-time outlier removal from motion vector fields for 3D reconstruction,” in ICIP, vol. 1, Sept. 2003, pp. I-393–6.

[9] A. Smolic, M. Hoeynck, and J.-R. Ohm, “Low-complexity global motion estimation from P-frame motion vectors for MPEG-7 applications,” in ICIP, vol. 2, Sept. 2000, pp. 271–274.

[10] T. Nguyen Dinh and G. Lee, “Efficient motion vector outlier removal for global motion estimation,” in ICME, July 2011, pp. 1–6.

[11] Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, 1995.

[12] D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
