
IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 4, NO. 11, NOVEMBER 1995 1509

Interpolation of Missing Data in Image Sequences

Anil C. Kokaram, Member, IEEE, Robin D. Morris, William J. Fitzgerald, and Peter J. W. Rayner

Abstract-This paper presents a number of model-based interpolation schemes tailored to the problem of interpolating missing regions in image sequences. These missing regions may be of arbitrary size and of random, but known, location. The problem of locating the missing regions is discussed in another paper in this issue. This problem occurs regularly with archived film material. The film is abraded or obscured in patches, giving rise to bright and dark flashes, known as dirt and sparkle in the motion picture industry. Both 3-D autoregressive models and 3-D Markov random fields are considered in the formulation of the different reconstruction processes. The models act along motion directions estimated using a multiresolution block matching (BM) scheme. It is possible to address this sort of impulsive noise suppression problem with median filters, and comparisons with earlier work using multilevel median filters are performed. These comparisons demonstrate the higher reconstruction fidelity of the new interpolators.

I. INTRODUCTION

The problem of missing data in image sequences occurs regularly in archived motion picture film as well as sequences from extremely high-speed film cameras. Particles caught in the film transport mechanism can damage the image information. The missing data regions manifest as blotches of random intensity in the sequence, called dirt and sparkle in the motion picture industry. The problem can be solved by using either a global filtering strategy or a detection/interpolation approach. The global filtering strategy suffers from the drawback that the treatment is not guaranteed to leave uncorrupted regions untouched. This paper, therefore, describes processes for interpolating missing areas in the image sequence after they have been flagged for treatment by some detection process. Various detection processes have already been described in [1] and [2]. In this paper, the SDIa detector (described in [1] and [2]) is used for examining the behavior of the interpolators in a real situation.

An important point is the size of the missing data being considered in this paper. Unlike typical impulsive noise suppression applications, it is possible for blotches on motion picture film to be larger than 20 × 20 pixels. A spatial median filtering operation thus becomes less effective in the center of such distortion, primarily because it is then considering many missing pixels in its output. Of course, one could design a median filter that uses more intraframe information, and this is illustrated in the section on 3-D multilevel filters.

In addressing the issue of data reconstruction for image sequences, it is necessary to recognize that a fully 3-D operation

Manuscript received March 19, 1994; revised January 10, 1995. This work was supported by the British Library and Cable and Wireless PLC. The associate editor coordinating the review of this paper and approving it for publication was A. Murat Tekalp.
The authors are with the Signal Processing and Communications Laboratory, Department of Engineering, Cambridge University, Cambridge, UK.
IEEE Log Number 9414601.

would hold much more potential for greater image fidelity than a 2-D operation. Of course, the problem then arises about estimating motion, and it becomes important to acknowledge the errors that will occur in this estimation process. Therefore, with respect to reconstruction, a good algorithm would take advantage of both spatial and temporal information and be able to emphasize one or the other in spatially or temporally inhomogeneous regions of the sequence.

Although it is true that one can formulate motion estimators that use the paradigms presented in this paper, we choose instead to use a simpler motion estimator: multiresolution block matching. This brings some element of practicality to the algorithms that will be discussed since there already exist block matching (BM) estimators on silicon, which conceivably could be incorporated into multiresolution schemes. The details of the motion estimation scheme used can be found in [1] and [2]. It is sufficient to note here that the multiresolution scheme is similar to the one used by Bierling [3], and the BM itself incorporates some explicit robustness to noise as discussed by Boyce [4].

A full reconstruction system would therefore involve first motion estimation, then detection of the missing regions (which have been characterized as temporal discontinuities in [1]), and, finally, reconstruction of the detected missing regions. The paper considers three interpolators that are each representative of a class of systems.

First, a 3-D multilevel median filter that is an extension of those introduced previously [5]-[7] is presented. Although strictly not an interpolator, this type of filtering operation yields acceptable results when used as part of a detector-controlled scheme. Turning the filter on and off as required limits the fading effect of the median operation to just the flagged sites, thus improving the overall quality of the resulting image when compared with a globally filtered one. Controlled median operations were also considered in [2] and [8].

Two model-based approaches are then described. The first employs a Markov random field (MRF) model of the image, and the second considers 3-D autoregressive (3-DAR) models of the image. Both of these models attempt to account for intensity variation in the image, the first employing Gibbs distributions and Bayesian estimation strategies, whereas the second employs a more traditional linear prediction approach. The goal in using some image model for reconstruction is to be able to provide interpolated samples that smoothly blend with the rest of the data at the fringes of the blotch as well as to be

Inhomogeneous due to either nontrivial motion or erroneous motion estimation.
Noncausal multidimensional autoregressive processes are considered here. Noncausal autoregressive processes are perhaps better referred to as noncausal minimum variance processes [9].

1057-7149/95$04.00 © 1995 IEEE


able to preserve detail such as edges across the blotch. This is particularly useful considering the large size of blotches that can occur.

Of course, no blotch detector is perfect, especially if, as in [1], the only criterion for detection is a region of temporal discontinuity. Therefore, one expects false alarms, and the interpolators presented in the rest of this paper are quite robust to such problems. However, in areas of nontrivial motion, such as rapid occlusion and uncovering, it is difficult to interpolate useful data unless perhaps more frames are used for motion estimation/interpolation. The algorithms discussed here operate on three frames only but can be extended in the manner of [10] and [11] to consider more frames.

The remainder of this paper first presents the various interpolators and then compares their action on known distortion in an image sequence. Finally, examples of complete restorations, i.e., motion estimation followed by detection and reconstruction, are given to show the applicability of these interpolators to the problem of removing dirt and sparkle from image sequences.

II. THE INTERPOLATORS

A. A 3-D Multilevel Median Filter

Median filters are not usually regarded as interpolators, but in this case the median operation is appropriate both because of its reported robustness and because of its success in impulsive noise suppression for images as a whole. Alp [7] and Arce [5], [6] have both previously introduced 3-D multilevel median filter (MMF) structures for removing impulsive noise. The structures were introduced without a motion-compensated implementation and without detection of impulses. Both Huang [12] and Martinez [13] implemented a three-tap motion-compensated median operation with good results. A discussion of the advantages of motion-compensated application of 3-D MMFs as opposed to a nonmotion-compensated application is given in [2]. Although one pays a higher computational cost for motion estimation, the gains in image fidelity can make the process worthwhile. This is important in the case of TV imagery in which the motion is not small, therefore causing the filters presented by Alp and Arce to tend to a purely spatial operation.

It is important to realize that the blotches typically encountered can be quite large in practice. Blotches spanning 10 pixels are often seen. With this in mind, it becomes important that the filtering operation involves information from the surrounding frames. Because of the little temporal information used by the filters introduced by Alp and Arce, they cannot completely remove this degradation after one filter pass. Several passes could be employed, but doing so would affect the rest of the uncorrupted image. Although the three-tap filter of Huang and Martinez would be very effective in removing blotches, it is also very sensitive to erroneous motion estimation.

As stated previously, the authors [8], [14] introduced a 3-D MMF that preserves detail well while being fairly robust to motion estimation errors and being able to reject large sized distortion. The proposed filter structure can be defined with the help of the subfilter windows shown in Fig. 1. The MMF is similar to that of Alp et al. [7] and is called ML3Dex for extended ML3D. The output of the filter is defined as follows:

$$z_l = \mathrm{median}[W_l], \quad 1 \le l \le 5$$

$$\text{ML3Dex filter output} = \mathrm{median}[z_1, z_2, z_3, z_4, z_5]. \tag{1}$$

Two additional windows have been incorporated that contain minimal information from the current frame and extensive information from the outer frames. Consider the situation with a large blotch covering all of the center 3 × 3 pixels. It can be seen that although the windows W3 and W4 would output blotch values inside the degraded area, the three windows W1, W2, W5 would still be able to yield a correction using image data. In other words, provided that the blotch does not occur at the same position in the three frames, the medians of windows W1, W2, W5 are not dominated by scratch data. Furthermore, the additional information improves the scratch rejection capacity of the filter. This is accompanied, of course, by a consequent loss of detail preservation when compared to the Alp et al. or Arce filters.

Fig. 1. Subfilter masks (W1 to W5) used for the new MMF: ML3Dex.

B. MRF-Based Interpolators

To interpolate very large regions of missing data in an isolated image would require a complex, adaptive model. However, missing data regions caused by blotches do not typically occur at the same motion-compensated locations in successive frames, and this means that the spatiotemporal neighborhood of the missing region contains much good information, reducing the complexity of the interpolator required.

In this section, an interpolator based on an MRF image model [15] is proposed. Although this type of model can be very general and sophisticated, because of the comments in the preceding paragraph, a simple MRF was used in this work. Only two-element cliques were used, coupled with different neighborhood structures and the quadratic potential function. This produces a smooth interpolation that is suitable for this purpose.

The nature of the problem is such that areas that are not classified as missing must not be affected: only the missing areas are to be interpolated, based on the known information in the spatiotemporal neighborhood of the missing region. An MRF formulation that embodies this is

$$p(I = i \mid D = d) = \frac{1}{Z_1} \exp\left\{ -\sum_{\vec{r}:\, d(\vec{r}) = 1} \left[ \sum_{\vec{s} \in \mathcal{N}_{\vec{r}}} \left( i(\vec{r}) - i(\vec{s}) \right)^2 + \lambda \sum_{\vec{s} \in \mathcal{T}_{\vec{r}}} \left( i(\vec{r}) - i(\vec{s}) \right)^2 \right] \right\} \tag{2}$$
3/11

KOKARAM et al.: INTERPOLATION OF MISSING DATA IN IMAGE SEQUENCES 151 I

where the model is only over the missing regions, $\{ i(\vec{r}) : d(\vec{r}) = 1 \}$, with $d(\vec{r}) = 0$ indicating known data at position $\vec{r}$ and $d(\vec{r}) = 1$ indicating missing data. $\mathcal{N}_{\vec{r}}$ is the spatial neighborhood of pixel $\vec{r}$, $\mathcal{T}_{\vec{r}}$ is the temporal neighborhood, and $\lambda$ is the relative weight given to the temporal neighbors. $Z_1$ normalizes the distribution. The spatial neighborhoods used were the first- and second-order neighborhoods (four and eight nearest neighbors), and the temporal neighborhood comprised either one or five pixels from each of the previous and following frames.
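As a rough illustration of the exponent in (2), the following Python sketch evaluates the quadratic two-pixel clique energy over the missing pixels of a three-frame, motion-compensated stack. It assumes a first-order spatial neighborhood and a single temporal neighbor in each adjacent frame. The function name `mrf_energy`, the list-of-lists frame layout, and the handling of image borders are illustrative assumptions rather than details taken from the paper.

```python
def mrf_energy(prev, cur, nxt, missing, lam=1.0):
    """Quadratic clique energy summed over missing pixels only.

    prev, cur, nxt: motion-compensated frames as lists of rows.
    missing: same shape as cur, 1 where d(r) = 1 (missing), else 0.
    lam: relative weight of the temporal neighbours (lambda in the text).
    """
    h, w = len(cur), len(cur[0])
    spatial = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # assumed first-order neighbourhood
    energy = 0.0
    for y in range(h):
        for x in range(w):
            if not missing[y][x]:
                continue  # known pixels contribute no terms of their own
            for dy, dx in spatial:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:  # assumption: skip off-image neighbours
                    energy += (cur[y][x] - cur[yy][xx]) ** 2
            for other in (prev, nxt):  # one temporal neighbour per adjacent frame
                energy += lam * (cur[y][x] - other[y][x]) ** 2
    return energy


flat = [[10.0] * 3 for _ in range(3)]
blotched = [row[:] for row in flat]
blotched[1][1] = 14.0
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(mrf_energy(flat, blotched, flat, mask, lam=1.0))
```

A candidate value that agrees with its spatial and temporal neighbors gives zero energy, so lower energy corresponds to a smoother, more probable interpolation under this model.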

The Gibbs sampler [15], [16] may be used directly with the distribution of (2) to form an interpolation. At each pixel of the missing region taken in turn, a new value is drawn from the distribution of $p(i(\vec{r}))$, conditional on the current values in the neighborhood. This will converge to a sample from the joint distribution over the missing region. However, for the large state spaces typically involved in high-quality imagery, each iteration of the Gibbs sampler is computationally expensive, and the interpolation converges very slowly.

This motivates the use of the mean field approximation [17], [18] as a more efficient, deterministic approximation to the interpolant. The distribution of (2) is replaced by a much simpler distribution with no spatiotemporal interaction between the variables, giving the joint distribution (3) over the pixels to be interpolated.

This distribution is a function of the parameters $m(\vec{r})$. The optimum interpolant is $i(\vec{r}) = m(\vec{r})$, which maximizes the distribution in (3). The problem is to select $m(\vec{r})$ to minimize the errors introduced by using the distribution of (3) rather than the distribution of (2).

The Gibbs-Bogoliubov-Feynman bound [19], [20] states that the error of this approximation is minimized when

$$\nabla_{m} \left[ -T \ln Z_0 + \langle U - U_0 \rangle_0 \right] = 0 \tag{4}$$

where $U(i)$ is the energy function of the distribution of (2), given by (5), and $U_0(i)$ is similarly the energy function for the distribution of (3). $\langle U - U_0 \rangle_0$ is the mean value with respect to the distribution of (3), that is

$$\langle U - U_0 \rangle_0 = \int_{L^M} \left( U(i) - U_0(i) \right) p_0(i) \, di \tag{6}$$

where $M$ is the number of missing pixels. The section of this integral involving $U_0(i)$ is straightforward. The temporal term in $U(i)$ also produces a simple integral. Because of the interaction between the variables, the spatial term in the $U(i)$ integral is more involved and requires a distinction to be made between neighbors that are within missing regions (and hence variable) and those that are of known, fixed values. Performing this integral results in


where
$I(\vec{r})$: pixel grey level at position $\vec{r}$ (a 3-D position vector) in the image
$a_k$: model coefficients
$\epsilon(\vec{r})$: model excitation (or ideal prediction error).

The $P$ vectors $\vec{q}_k$ are support vectors that point to each pixel in the neighborhood used for the AR model. Therefore, $I(\vec{r} + \vec{q}_k)$ is the grey level of the pixel at the $k$th support position for the pixel at $\vec{r}$. It is assumed that a set of $N$ frames is considered, and the data volume used has already been compensated for motion.

It is helpful to describe predictors by the number of pixels of support in each frame. There is no evidence for asymmetric supports; therefore, a 9:0 model refers to a model with nine pixels in a 3 × 3 square in the previous frame acting as support. A 9:0:9 model has twice that support: nine pixels in each of the previous and next frames. In general, an a:b model has a pixels of support in the previous frame and b in the current frame; an a:b:c model has, in addition, c pixels of support in the next frame.

Allowing for a border of pixels at the edge of the $L \times L$ block in the current frame, say $n$ (so that $(\vec{r} + \vec{q}_k)$ will never result in a location outside the $L \times L$ block), an equation for the error at every pixel within a centered $K \times K$ block in that frame can be written as follows:

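$$\mathbf{e} = A \mathbf{i} \tag{10}$$

where
$\mathbf{i}$: $NL^2 \times 1$ column vector of row-ordered pixels from the $N$ $L \times L$ blocks
$\mathbf{e}$: $K^2 \times 1$ column vector of errors
$A$: matrix of coefficients satisfying the model equation at all the considered points.

This coefficient matrix is of size $K^2 \times NL^2$. The vector $\mathbf{i}$ contains intensities of both known and unknown pixels. If this vector is separated into two vectors $\mathbf{i}_u$ (u for unknown) and $\mathbf{i}_k$ (k for known), which represent the unknown and known pixel intensities, then (10) can be written as

$$\mathbf{e} = A_k \mathbf{i}_k + A_u \mathbf{i}_u. \tag{11}$$

Here, $A_k$ and $A_u$ are the coefficient matrices corresponding to the known and unknown data vectors. They are submatrices of the $A$ matrix made by extracting the relevant columns. The vector $\mathbf{i}_u$ is of size $M \times 1$.

To derive an interpolation, $\mathbf{i}_u$ must be found. Following Vaseghi [22], this is done by minimizing the squared error $\mathbf{e}^T \mathbf{e}$ with respect to $\mathbf{i}_u$ as follows:

$$\begin{aligned}
\mathbf{e}^T \mathbf{e} &= [A_k \mathbf{i}_k + A_u \mathbf{i}_u]^T [A_k \mathbf{i}_k + A_u \mathbf{i}_u] \\
&= \mathbf{i}_k^T A_k^T A_k \mathbf{i}_k + \mathbf{i}_k^T A_k^T A_u \mathbf{i}_u + \mathbf{i}_u^T A_u^T A_k \mathbf{i}_k + \mathbf{i}_u^T A_u^T A_u \mathbf{i}_u \\
\frac{\partial\, \mathbf{e}^T \mathbf{e}}{\partial\, \mathbf{i}_u} &= 2 A_u^T A_k \mathbf{i}_k + 2 A_u^T A_u \mathbf{i}_u = 0 \\
\Rightarrow \mathbf{i}_u &= -[A_u^T A_u]^{-1} A_u^T A_k \mathbf{i}_k.
\end{aligned} \tag{12}$$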

Therefore, the solution for the interpolated pixels is given by (12). Of course, this solution implies knowledge of the model coefficients. These must be estimated from the corrupt data, and this estimation process is discussed next.

An important point to recognize is that (10) can be made up of error observations from blocks in frames before and after the current one. In fact, it is sensible to incorporate as many observations as possible that involve data in the missing regions to maximize the information used. This implies that for a causal model with support in the previous frame only, observations in the $K \times K$ block in the next frame also incorporate the missing pixel information $\mathbf{i}_u$ in the current frame. Therefore, the resulting interpolator incorporates information from one frame both previous to and following the current frame. For a noncausal model incorporating one frame of support in the previous and next frames, a total of five frames can be incorporated into the equations (two frames previous to and two following the considered one). In practice, however, this extra information does not yield significant improvements over using just observations from the current frame.

A useful practical point to be noted is that for a given set of missing pixels, there is a maximum area of observations around the missing region beyond which no improvement in interpolation quality is gained. This region depends on the model support. The reason is that the observation equations (i.e., the equations for the model errors) are only useful for interpolation if they contain at least one missing pixel. Therefore, if there was a region of missing pixels of size $l \times l$, then given a 9:0 causal 3-D AR model, the spatial extent of the observation equations need only be $(l+2) \times (l+2)$. This result follows from examination of the interpolation (12).
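The least squares step in (12) can be illustrated on a toy problem. The sketch below is a minimal 1-D analogue, not the 3-D implementation used in the paper: it assumes a causal AR model on a single signal, the sign convention $a_0 = 1$ so that each error is $e[n] = \sum_k a_k x[n-k]$, and known coefficients. The names `solve` and `ar_interpolate` are hypothetical helpers introduced for illustration.

```python
def solve(mat, rhs):
    """Solve mat x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ar_interpolate(signal, missing, coeffs):
    """Fill missing samples by minimising e^T e, as in (12).

    coeffs = [a_0, ..., a_P] with a_0 = 1 (assumed convention).
    """
    n, p = len(signal), len(coeffs) - 1
    # One row of A per prediction-error equation, for n = P .. N-1.
    rows = []
    for t in range(p, n):
        row = [0.0] * n
        for k, a in enumerate(coeffs):
            row[t - k] = a
        rows.append(row)
    unknown = [j for j in range(n) if missing[j]]
    known = [j for j in range(n) if not missing[j]]
    # A_k i_k: contribution of the known samples to each error.
    b = [sum(row[j] * signal[j] for j in known) for row in rows]
    # Normal equations: (A_u^T A_u) i_u = -A_u^T A_k i_k.
    ata = [[sum(row[u] * row[v] for row in rows) for v in unknown] for u in unknown]
    atb = [-sum(row[u] * bi for row, bi in zip(rows, b)) for u in unknown]
    out = list(signal)
    for j, v in zip(unknown, solve(ata, atb)):
        out[j] = v
    return out


# With coefficients [1, -1] the error is a first difference, so the
# least squares fill of a gap is a linear ramp between its neighbours.
print(ar_interpolate([1.0, 0.0, 0.0, 4.0], [0, 1, 1, 0], [1.0, -1.0]))
```

Rows of $A$ that contain no missing sample contribute nothing to $A_u$, which is the 1-D counterpart of the observation that equations far from the missing region do not improve the interpolation.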

Estimating the Model Coefficients: To solve (12), the model coefficients are required. In a real case, however, these values are unavailable. They must be estimated from the degraded image sequence data. However, as has been stated before, the block sizes that must be used are small. This is forced because of the highly nonstationary nature of image sequences, both in terms of space and time (due to errors in motion estimation). This means that the distortion can bias the model coefficients adversely. Because a detector is used to isolate a suspected distorted area, this information can be used to suppress the bias that the distorted area would cause.

The normal equations are altered to solve for the AR parameters using weighted coefficient estimation. The model coefficients are normally chosen to minimize the expected value of the squared prediction error at all points in the block considered. Because some of this data is now known to be missing, the prediction error at these points may be weighted to zero so that this data does not affect the estimation process. The general approach is to weight the prediction error by some function $w(\vec{r})$ prior to minimizing the squared weighted error.

For the purposes of coefficient estimation, the new prediction equation may be written as

$$\epsilon_w(\vec{r}) = w(\vec{r}) \sum_{k=0}^{P} a_k I(\vec{r} + \vec{q}_k) \tag{13}$$

where all the symbols have their usual meaning, $a_0 = 1.0$, and $\epsilon_w(\vec{r})$ is the weighted error at position $\vec{r}$. It is assumed that the data volume being used has already been compensated for motion; therefore, the motion parameter has been omitted from the 3-D AR model. Further, a 3-D trend (of the form $\alpha i + \beta j + \gamma k$) is subtracted from the data prior to modeling to improve the prediction [2], [25]. The least squares estimation of the trend coefficients is also weighted in an identical manner to that shown here and performed as a separate step.

Minimizing the squared error $[\epsilon_w(\vec{r})]^2$ with respect to the coefficients then yields the following set of $P + 1$ equations:

$$\sum_{k=0}^{P} a_k E\left[ (w(\vec{r}))^2 I(\vec{r} + \vec{q}_k) I(\vec{r} + \vec{q}_m) \right] = 0 \quad \text{for } m = 0 \ldots P \tag{14}$$

where $a_0 = 1.0$. Therefore, these equations may be written in matrix form as

$$C_w \mathbf{a} = -\mathbf{c}_w \tag{15}$$

where $C_w$ is a $P \times P$ matrix of correlation coefficients, and $\mathbf{c}_w$ is a $P \times 1$ vector of correlation coefficients. Equation (15) is the weighted solution for the $P$ model coefficients. The most obvious choice for the weighting function is a binary field set to 0 for all the blotch positions and 1 otherwise. This is found to be extremely effective in practice. Note that methods for optimal weighting are available; one of these is given in [26].

D. A Practical Consideration

It is necessary to choose a region of data around the detected missing region from which to estimate the AR coefficients that are then used to interpolate the missing data. For the purposes of this paper, this region was chosen to be a square area centered on the missing region such that the missing region occupied less than 10% of the data block. Of course, when the missing region is large enough to cover many statistically differing areas, the resulting coefficients do not describe well the underlying model for the particular missing region. In such cases, the interpolation is blurred. It would be better to use nonstationary AR models for these situations, but these are not considered in this work.

III. COMPUTATIONAL LOAD

Since the interpolation processes described here are independent of the choice of motion estimator and blotch detector, the load of those processes is not considered here. See [1] for discussion of the computational load of various detectors.

All arithmetic operations (e.g., +, −, ABS, <) were counted as costing one operation. The exponential function evaluation was taken as costing 20 operations, and inversion of an $N \times N$ matrix was assumed to be an $N^3$ process. Estimates for the number of operations per blotched pixel for the interpolators are as follows:

MMF = 160
3-DAR = 20 000 (assuming a block size of 8 × 8 pixels, a 9:0 model, and a 10% rate of corruption)
MRF = 22 operations per iteration.

With regard to the MRF interpolator, about 1000 iterations were needed in the following experiments. The 3-DAR operation estimate is not independent of the rate or spatial layout of the corruption since the process involves the inversion of matrices ($A_u$), the sizes of which are a function of the number of spatially connected missing pixels in a considered block of data.

Fig. 2. Frame 23 of WESTERN, size 256 × 256.
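Returning to the weighted coefficient estimate of (13)-(15) in Section II-C, the idea can be sketched on a 1-D signal. This is an illustrative analogue under stated assumptions, not the authors' 3-D implementation: the support is causal and one-dimensional, the convention $a_0 = 1$ is used so the prediction error is $x[n] + \sum_{k \ge 1} a_k x[n-k]$, and the names `solve` and `weighted_ar_coeffs` are hypothetical.

```python
def solve(mat, rhs):
    """Solve mat x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(mat, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def weighted_ar_coeffs(signal, weights, order):
    """Return [a_1, ..., a_P] minimising the sum of squared weighted errors.

    Builds the weighted correlation matrix C_w and vector c_w and
    solves C_w a = -c_w, the 1-D counterpart of (15).
    """
    n = len(signal)
    c_mat = [[0.0] * order for _ in range(order)]
    c_vec = [0.0] * order
    for t in range(order, n):
        w2 = weights[t] ** 2  # a zero weight removes this error term entirely
        for m in range(1, order + 1):
            c_vec[m - 1] += w2 * signal[t] * signal[t - m]
            for k in range(1, order + 1):
                c_mat[m - 1][k - 1] += w2 * signal[t - k] * signal[t - m]
    return solve(c_mat, [-c for c in c_vec])


# Clean signal x[n] = 0.5 x[n-1]; the last sample is replaced by a "blotch".
corrupted = [16.0, 8.0, 4.0, 2.0, 100.0]
print(weighted_ar_coeffs(corrupted, [1, 1, 1, 1, 1], 1))  # biased by the blotch
print(weighted_ar_coeffs(corrupted, [1, 1, 1, 1, 0], 1))  # binary weight at the blotch
```

In this example the corrupted sample is placed last so that it affects only its own prediction error; in general a blotched pixel also appears in the support of neighboring predictions, so a binary weight at the blotch positions reduces the bias without necessarily removing it.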

IV. RESULTS AND DISCUSSION

There are two factors to be considered in discussing the performance of these interpolators. First of all, given some missing patch and errors in motion estimation due to these patches, how accurate is the reconstruction? Second, in a real situation, errors in motion estimation will yield subsequent errors in detection of missing patches; how robust is the interpolator to these errors? Of course, the ultimate performance of the interpolators would be observed when the missing patches have been correctly detected and the motion estimation process has not been adversely affected. However, this does not give a realistic assessment of performance, and results for this case are not illustrated here in the interest of brevity.

The sequence WESTERN (60 frames of 256 × 256) is used to demonstrate the performance of the interpolators on artificially corrupted data. The probability of distortion was 0.007, and the blotches were generated as outlined in the companion paper [1]. Motion estimation was performed using the corrupted frames with a multiresolution BM algorithm employing three resolution levels: 256 × 256, 128 × 128, and 64 × 64. The details of the parameters used for the motion-estimation process are not important; it is sufficient to note that all interpolators used the same motion vectors. Integer-accurate motion estimates were used. Fig. 2 shows a full-sized picture of frame 23 of the WESTERN sequence to give a feel for the image composition.

A. Known Distortion

Fig. 3 compares the performance of various interpolators on separate frames of WESTERN based on the mean squared error (MSE) between the interpolated missing regions and the original clean frames. The missing regions have been assumed to be correctly identified in this case. The graph shows that the 9:8 AR interpolator performs best overall, with the median operation being the worst and the MRF interpolator (using first-order cliques in a 1:4:1 neighborhood with four-pixel current frame support in a + configuration with $\lambda = 1$) striking some compromise between these extremes.

Fig. 3. MSE of various interpolators of known distortion (horizontal axis: frame number, 0 to 60).

Fig. 4. MSE of various interpolators operating on distortion detected using the SDIa (horizontal axis: frame number, 0 to 60).

To illustrate this behavior, Fig. 5 shows a zoomed portion (of size 128 × 128) of three frames from the corrupted WESTERN sequence. The original (zoomed) frame 23 is shown as the bottom right-hand image in Fig. 5. The missing regions (blotches) of interest have been boxed in white in the top right-hand image (frame 23). Fig. 6 shows the results of interpolating the missing regions using a 9:8 AR model, the MRF interpolator, and the ML3Dex median filter. The three boxed regions show a good overview of the compromises within each interpolator. The blotch over the C is very well removed by the median filter. The AR interpolator does not perform as well here (although texture is reconstructed) because it is unable to reject the corrupted information in that same position in the previous frame (see Fig. 5). The MRF interpolator does not do as good a job of reconstructing texture as the AR process since the interpolated region above the C is not textured at all. That the median filter reconstructs the texture in this region well is due mainly to the fact that it rearranges existing surrounding samples and so conserves the randomness of the background texture.

Visual results from the 9:0 model are not shown since it is clear that its performance is affected by the lack of spatial support in the current frame. In this respect, it is prone to the same problems affecting ML3Dex in that the quality of interpolations depends heavily on the integrity of the motion estimates.
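The error measure behind Fig. 3 is the MSE taken only over the interpolated missing regions. A minimal sketch of such a masked MSE is given below; the function name `masked_mse` and the list-of-lists image layout are assumptions made for illustration, and any normalization the authors may have applied to the plotted values is not reproduced here.

```python
def masked_mse(restored, original, missing):
    """Mean squared error over pixels flagged as missing (mask value 1)."""
    total, count = 0.0, 0
    for r_row, o_row, m_row in zip(restored, original, missing):
        for r, o, m in zip(r_row, o_row, m_row):
            if m:  # only interpolated sites contribute to the error
                total += (r - o) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask flags no missing pixels")
    return total / count


restored = [[1.0, 2.0], [3.0, 4.0]]
original = [[1.0, 0.0], [3.0, 8.0]]
missing = [[0, 1], [0, 1]]
print(masked_mse(restored, original, missing))
```

Restricting the sum to the flagged sites keeps the measure sensitive to interpolation quality even when blotches cover well under 1% of the frame.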



Fig. 5. Zoom on degraded frames 22, 23 (top left, right), 24 (bottom left) of WESTERN. Zoom on original frame 23 (bottom right).

Fig. 6. Zoom on restored frame 23 using MRF (top left), 9:8 AR (top right), ML3Dex (bottom left). Original frame 23 (bottom right).

Fig. 7. Degraded frames 44, 45 (top left, right), 46 (bottom left) of WESTERN. Bottom right: Detection on frame 45 using SDIa indicated as bright white pixels.

Fig. 8. Restored frame 45 using MRF, AR 9:8 (top left, right), ML3Dex and original frame 45 (bottom left, right).

The median filter fails, however, when the motion estimate is not sufficiently accurate or the structures to be interpolated are more complicated. The upper right-hand highlighted blotch in the relevant image in Fig. 5 is a good example of this. The diagonal structure is well reconstructed by both the MRF and AR processes (the MRF interpolation being somewhat blurred as expected) but not by the median process. This is the basic problem with the use of the median operation in this way. Whereas the other interpolators attempt to create some smooth transition of data across the blotch, the median filter used in this manner rearranges the data with no regard to smooth transitions at the edges of the missing regions. The adaptive nature of the AR interpolator used also explains why it performs better overall when compared with the MRF interpolator, especially with regard to texture. The lack of a line process in the MRF interpolator also reduces the sharpness of the reconstructed features.

B. Unknown Distortion

Fig. 4 compares the performance of the interpolators when the blotch locations are unknown and must be detected using, for instance, one of the detectors discussed in [1]. The SDIa detector is chosen for use here because it is the cheapest computationally. This is also the simplest detector for temporal discontinuities [2].



Fig. 4 therefore represents the performance of the interpolators in a more realistic case. The interpolators used here were left unchanged from the previous experiment. The SDIa threshold $e_t$ was set to 21.0. The figure shows that the relative performance remains the same, but the absolute performance is worse than that shown previously for the obvious reason that there are more false alarms. A classic instance of a false alarm induced by fast motion is shown in Figs. 7 and 8. These figures are again zoomed portions of WESTERN of size 128 × 128. The corresponding detected missing regions are shown in the bottom right-hand side of Fig. 7. It is clear why such a large region is flagged as missing at the center of the picture: that region cannot be easily matched in the previous or the next frame. Unfortunately, only a thin ring of pixels around the missing region is undetected; therefore, the MRF interpolator, lacking any adaptivity in this implementation (Fig. 8), cannot reconstruct the missing region well. The AR process performs more respectably; however, the interpolated data is still quite blurred. The median filter also fails in the same regions as the MRF interpolator, and more severely, as it introduces sharp edges at the boundaries of the interpolated data.

In general, the MRF process is perhaps the most robust estimator and provides a good tradeoff between computation and fidelity. The AR process is much heavier computationally, but because it is easily made adaptive, it can perform better than the MRF interpolator presented here. The median filter does not compare well with either of the model-based interpolators; however, it requires by far the least computation, and provided there is not much drastic motion in an image sequence, it will perform acceptably.

V. REAL MOVIES

Two outstanding considerations remain with respect to real degradation in typical motion picture film. First of all, unlike the artificial case, blotches do not have sharp edges; therefore, it is typical for a simple detector like the SDIa to be unable to detect the periphery of a blotch. As a result, the interpolation process usually cannot remove the entire defect and, in the AR case, often replaces the missing data with data that has the intensity of the undetected blotch periphery. One solution to this problem is to examine the image data in the region of the suspected blotch and extend the detected blotch areas if necessary. Another less intelligent, but effective, approach is to dilate [9], [27], [28] the suspected blotch locations using a simple morphological operator and so obtain a pessimistic estimate of the extent of the blotch. This latter postprocessing stage is employed here.

Fig. 9. Frame 1 of FRANK.

Fig. 10. Frame 2 of FRANK with large blotches boxed.

Although the experiments thus far have been conducted with integer-accurate motion estimates, it is typical of moving objects to show some degree of fractional (i.e., subpixel) motion from frame to frame. This can have a great effect on the motion-compensated residual and cause false alarms to be flagged by the detector in a region of such motion. It is better overall to estimate motion to some fractional pixel accuracy, such as 0.5 or 0.25 pixels [29]. The three frames (of resolution 256 × 256) shown as Figs. 9-11 show just this sort of motion in the right forearm of FRANK. The corresponding detection fields (dilated using a 3 × 3 square as a structuring element) are superimposed on the second frame in Fig. 12. Bright white pixels mark the instances that the SDIa flagged as corruptions with both fractional (±0.5 pixel) and integer-accurate motion estimates. Green and red mark additional flagged pixels using fractional and integer-accurate motion estimates, respectively. It is observed that the area flagged as blotch from integer-accurate motion estimates is much larger and contains more false alarms than that from the fractional motion estimates; note the forearm. Both motion estimators used a three-level multiresolution process as mentioned earlier, with the fractionally accurate motion estimator estimating motion to ±0.5 pixels.
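The dilation of the detection field with a 3 × 3 square structuring element can be sketched as follows. This is a minimal NumPy illustration of standard binary dilation, not the authors' code; the function name `dilate3x3` and the toy detection field are hypothetical.

```python
import numpy as np

def dilate3x3(mask):
    """Binary dilation of a boolean mask with a 3 x 3 square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    h, w = mask.shape
    # OR together the nine shifted copies of the mask.
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

# Toy detection field: a single flagged pixel grows into a 3 x 3 patch.
detected = np.zeros((7, 7), dtype=bool)
detected[3, 3] = True
grown = dilate3x3(detected)
```

Each flagged pixel is extended by one pixel in every direction, which gives the pessimistic estimate of blotch extent described above at the cost of interpolating some uncorrupted pixels.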



Fig. 11. Frame 3 of FRANK.

Fig. 12. Detection on frame 2 of FRANK. White: both fractional and integer motion estimation. Green: additional flagged by fractional estimation. Red: additional flagged by integer estimation.

Fig. 13. Restored frame 2 using 9:8 AR.

Figs. 13-15 show interpolations of the missing data using a 9:8 AR model, MRF, and ML3Dex system, respectively. The MRF system used cliques in a 5:8:5 neighborhood with λ = 2. The five pixels used in the previous frame were arranged in a + configuration. The interpolated locations were flagged by the SDIa using fractional motion estimates. Note again how well all the systems perform where there is little textural detail. However, the blotch in the head of the figure is best interpolated by the AR system, with the MRF being somewhat blurred and the median filter giving a generally flat intensity.
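As an illustration of what a 5:8:5 support might look like, the sketch below enumerates spatiotemporal offsets under stated assumptions: five sites in a + configuration in the previous frame (as stated above), the eight nearest neighbours in the current frame, and, by symmetry, a + configuration in the next frame. The last two choices are assumptions made here for illustration, as is the function name `neighbourhood_5_8_5`; offsets in the neighbouring frames would be taken relative to the motion-compensated position.

```python
def neighbourhood_5_8_5():
    """Return (dt, dy, dx) offsets for an assumed 5:8:5 spatiotemporal support."""
    # '+' shape: centre plus its four horizontal/vertical neighbours (5 sites).
    plus = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
    # Eight nearest neighbours in the current frame (centre excluded).
    ring = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dy, dx) != (0, 0)]
    prev_frame = [(-1, dy, dx) for dy, dx in plus]
    cur_frame = [(0, dy, dx) for dy, dx in ring]
    next_frame = [(1, dy, dx) for dy, dx in plus]
    return prev_frame + cur_frame + next_frame

offsets = neighbourhood_5_8_5()
```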

Again, the classic motion-estimation problem arises in the petals of the flower in the picture. It is very difficult for any motion estimation algorithm to track the almost random flutterings of one of the petals; therefore, it is partially flagged as a blotch. The performance of the interpolators in this region is worse than in other areas.

Fig. 14. Restored frame 2 using MRF.

Subjective Assessment: A series of sequences with different artificial and real degradation has been processed. Informal subjective assessment of the restored sequences displayed at 25 frames/s (UK PAL television standard) was performed. It is found, in general, that it is difficult to determine any major difference in quality between the restorations at this frame rate. A closer examination allows the observer to rank the restorations in the order 3-D AR, MRF, and MMF. The AR process is more robust to motion-estimation errors and generally gives the smoothest interpolation. The MMF often causes breakup of the periphery of fast moving regions. The MRF reduces the breakup of fast moving regions but cannot reproduce texture as well as the 3-D AR process.

Fig. 15. Restored frame 2 using ML3Dex.

A. Robust Motion Estimation

The motion estimation process is adversely affected by the presence of large blotches. For block matching using a mean absolute error criterion, the extent to which it is affected depends on the block size. To some degree, using multiresolution motion estimation brings an inherent robustness to the estimation process. At the low-resolution levels, the size of the corrupting blotches is quite small and affects the motion-estimation process less. At the higher resolution levels, motion estimation would be more affected, but provided estimation at the previous levels has resulted in an estimate close to the actual motion, the error is small.

Unfortunately, it is quite feasible that the size of a blotch extends across a large enough area to cause a problem even at the low-resolution levels. These motion estimation errors do not typically cause problems in detection, but they do result in inappropriate interpolations. It would be best to address this problem specifically in the design of a motion estimator. However, this is not likely to be a simple task, and it is better to consider correcting motion vectors at the sites of suspected distortion. Although this is not addressed in this paper, operators such as the vector median or an AR-based vector interpolator appear to provide useful solutions, and these are currently being pursued.
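The two ingredients mentioned above can be sketched as follows: a single-level, integer-accurate block match under the mean absolute error criterion, and a vector median applied to a set of candidate motion vectors. This is an illustrative sketch only. The multiresolution stage is omitted, the vector median uses its standard definition (the member of the set with the smallest summed Euclidean distance to the others) rather than any scheme specified in this paper, and the names `mae_block_match` and `vector_median`, the block size, and the search range are all hypothetical.

```python
import numpy as np

def mae_block_match(cur, ref, top, left, size=8, search=4):
    """Return the (dy, dx) in [-search, search]^2 minimising mean absolute error."""
    block = cur[top:top + size, left:left + size].astype(np.float64)
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + size, x:x + size].astype(np.float64)
            err = np.mean(np.abs(block - cand))
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err

def vector_median(vectors):
    """Return the member of `vectors` with the smallest summed distance to the rest."""
    v = np.asarray(vectors, dtype=np.float64)
    dists = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2).sum(axis=1)
    return tuple(int(c) for c in v[np.argmin(dists)])  # integer vectors assumed

# Toy example: the current frame is the reference shifted down 2 and left 1.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
cur = np.roll(ref, shift=(2, -1), axis=(0, 1))
vec, err = mae_block_match(cur, ref, top=12, left=12)

# One outlying vector (e.g., from a block covered by a blotch) among neighbours.
cleaned = vector_median([(-2, 1), (-2, 1), (-2, 1), (5, -4), (-2, 1)])
```

In the toy example the outlying vector is replaced by the consensus of its neighbours, which is the kind of correction at suspected blotch sites that the text has in mind.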

VI. CONCLUSIONS

The three interpolators presented have been shown to be useful to lesser or greater extents, depending on the scenery and motion present. They can all produce good interpolations in areas of little texture, but it is the AR interpolator that is best at handling texture because of its adaptivity and the underlying random nature of its driving function. The choice of a particular interpolator for a certain application is governed by the tradeoff between fidelity and computational burden. The AR interpolator is computationally much more intensive than the median process, but it gives the best visual result. The median process requires very few operations, and in fact, despite the rough nature of the method, it is found to be effective in practice. In the short term, one would expect detector-controlled median systems to be more popular for real-time deblotch equipment for video, whereas the AR and MRF systems could be used in particular instances for high levels of degradation or where a very high quality of reconstruction is required (in slow motion sequences, for instance).

REFERENCES

[1] A. C. Kokaram, R. D. Morris, W. J. Fitzgerald, and P. J. W. Rayner, "Detection of missing data in image sequences," IEEE Trans. Image Processing, this issue, pp. 1496-1508.
[2] A. C. Kokaram, "Motion picture restoration," Ph.D. thesis, Cambridge Univ., Cambridge, UK, May 1993.
[3] M. Bierling, "Displacement estimation by hierarchical block matching," SPIE VCIP, pp. 942-951, 1988.
[4] J. Boyce, "Noise reduction of image sequences using adaptive motion compensated frame averaging," in Proc. IEEE ICASSP, vol. 3, 1992, pp. 461-464.
[5] G. R. Arce and E. Malaret, "Motion preserving ranked-order filters for image sequence processing," IEEE Int. Conf. Circuits Syst., 1989, pp. 983-986.
[6] G. R. Arce, "Multistage order statistic filters for image sequence processing," IEEE Trans. Signal Processing, vol. 39, no. 5, pp. 1146-1161, May 1991.
[7] B. Alp, P. Haavisto, T. Jarske, K. Oistamo, and Y. Neuvo, "Median-based algorithms for image sequence processing," SPIE Visual Commun. Image Processing, pp. 122-133, 1990.
[8] A. C. Kokaram and P. J. W. Rayner, "A system for the removal of impulsive noise in image sequences," SPIE Visual Commun. Image Processing, pp. 322-331, Nov. 1992.
[9] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice Hall, 1989.
[10] M. Sezan, M. Ozkan, and S. Fogel, "Temporally adaptive filtering of noisy image sequences using a robust motion estimation algorithm," in Proc. IEEE ICASSP, vol. 3, May 1991, pp. 2429-2431.
[11] M. Ozkan, M. I. Sezan, and A. M. Tekalp, "Adaptive motion-compensated filtering of noisy image sequences," IEEE Trans. Circuits Syst. Video Technol., pp. 277-290, Aug. 1993.
[12] T. Huang, Image Sequence Analysis. New York: Springer-Verlag, 1981.
[13] D. M. Martinez, "Model-based motion estimation and its application to restoration and interpolation of motion pictures," Ph.D. thesis, Mass. Inst. of Technol., Cambridge, 1986.
[14] A. C. Kokaram and P. J. W. Rayner, "Removal of impulsive noise in image sequences," in Proc. Singapore Int. Conf. Image Processing, Sept. 1992, pp. 629-633.
[15] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, no. 6, pp. 721-741, Nov. 1984.
[16] M. A. Tanner, Tools for Statistical Inference, Springer Series in Statistics, 2nd ed. New York: Springer-Verlag, 1993.
[17] H. P. Hiriyannaiah, G. L. Bilbro, and W. E. Snyder, "Restoration of piecewise-constant images by mean-field annealing," J. Opt. Soc. Amer., pp. 1901-1912, 1989.
[18] I. M. Abdelquader, S. A. Rajala, W. E. Snyder, and G. L. Bilbro, "Energy minimization approach to motion estimation," Signal Processing, vol. 28, pp. 291-309, 1992.
[19] D. Chandler, Introduction to Modern Statistical Mechanics. Oxford, UK: Oxford University Press, 1987.
[20] J. Zhang, "The application of the Gibbs-Bogoliubov-Feynman inequality in mean field calculations for Markov random fields," in SPIE Visual Commun. Image Processing, vol. 2308, pp. 982-993, 1994.
[21] R. R. Schultz and R. L. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Processing, vol. 3, no. 4, pp. 233-242, May 1994.
[22] S. V. Vaseghi, "Algorithms for the restoration of archived gramophone recordings," Ph.D. thesis, Cambridge Univ., Cambridge, UK, 1988.
[23] P. Strobach, "Quadtree-structured linear prediction models for image sequence processing," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, pp. 742-747, July 1989.
[24] S. Efstratiadis and A. Katsaggelos, "A model based, pel-recursive motion estimation algorithm," in Proc. IEEE ICASSP, 1990, pp. 1973-1976.
[25] R. Veldhuis, Restoration of Lost Samples in Digital Signals. Englewood Cliffs, NJ: Prentice Hall, 1980.
[26] E. DiClaudio, G. Orlandi, F. Piazza, and A. Uncini, "Optimal weighted LS AR estimation in presence of impulsive noise," in Proc. IEEE ICASSP, vol. E3.8, 1991, pp. 3149-3152.
[27] J. S. Lim, Two-Dimensional Signal and Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1990.
[28] R. Schalkoff, Digital Image Processing and Computer Vision. New York: Wiley, 1989.
[29] B. Girod, "Motion-compensating prediction with fractional-pel accuracy," IEEE Trans. Commun., vol. 41, pp. 604-612, 1993.

Anil C. Kokaram (S'91-M'92), for a photograph and biography, see this issue, p. 1508.

Robin D. Morris, for a photograph and biography, see this issue, p. 1508.

William J. Fitzgerald, for a photograph and biography, see this issue, p. 1508.

Peter J. W. Rayner, for a photograph and biography, see this issue, p. 1508.
