
3.11 Video Enhancement and Restoration

Reginald L. Lagendijk, Peter M.B. van Roosmalen, Jan Biemond, Andrei Rareş, and Marcel J.T. Reinders
Delft University of Technology, The Netherlands

1 Introduction
2 Spatio-Temporal Noise Filtering
  2.1 Linear Filters · 2.2 Order-Statistic Filters · 2.3 Multi-Resolution Filters
3 Blotch Detection and Removal
  3.1 Blotch Detection · 3.2 Motion Vector Repair and Interpolating Corrupted Intensities · 3.3 Restoration in Conditions of Difficult Object Motion
4 Vinegar Syndrome Removal
5 Intensity Flicker Correction
  5.1 Flicker Parameter Estimation · 5.2 Estimation on Sequences with Motion
6 Kinescope Moiré Removal
7 Concluding Remarks
8 References

1 Introduction

Even with the advancing camera and digital recording technology, there are many situations in which recorded image sequences — or video for short — may suffer from severe degradations. The poor quality of recorded image sequences may be due to, for instance, the imperfect or uncontrollable recording conditions such as one encounters in astronomy, forensic sciences, and medical imaging. Video enhancement and restoration has always been important in these application areas, not only to improve the visual quality but also to increase the performance of subsequent tasks such as analysis and interpretation.

Another important application of video enhancement and restoration is that of preserving motion pictures and video tapes recorded over the last century. These unique records of historic, artistic, and cultural developments are deteriorating rapidly due to aging effects of the physical reels of film and magnetic tapes that carry the information. The preservation of these fragile archives is of interest not only to professional archivists, but also to broadcasters as a cheap alternative to fill the many television channels that have become available with digital broadcasting. Re-using old film and video material is, however, only feasible if the visual quality meets the standards of today. First, the archived film and video is transferred from the original film reels or magnetic tape to digital media. Then, all kinds of degradations are removed from the digitized image sequences, in this way increasing the visual quality and commercial value. Because the objective of restoration is to remove irrelevant information such as noise and blotches, it restores the original spatial and temporal correlation structure of digital image sequences. Consequently, restoration may also improve the efficiency of the subsequent MPEG compression of image sequences.

An important difference between the enhancement and restoration of 2-D images and of video is the amount of data to be processed. Whereas for the quality improvement of important images elaborate processing is still feasible, this is no longer true for the absolutely huge amounts of pictorial information encountered in medical sequences and film/video archives. Consequently, enhancement and restoration methods for image sequences should have a manageable complexity, and should be semi-automatic. The term semi-automatic indicates that in the end professional operators control the visual quality of the restored image sequences by selecting values for some of the critical restoration parameters.


The most common artifact encountered in the above-mentioned applications is noise. Over the last two decades an enormous amount of research has focused on the problem of enhancing and restoring 2-D images. Clearly, the resulting spatial methods are also applicable to image sequences, but such an approach implicitly assumes that the individual pictures of the image sequence, or frames, are temporally independent. By ignoring the temporal correlation that exists, suboptimal results may be obtained, and the spatial intra-frame filters tend to introduce temporal artifacts in the restored image sequences. In this chapter we focus our attention specifically on exploiting temporal dependencies, yielding inter-frame methods. In this respect the material offered in this chapter is complementary to that on image enhancement in Chapters 3.1 to 3.4 of this Handbook. The resulting enhancement and restoration techniques operate in the temporal dimension by definition, but often have a spatial filtering component as well. For this reason, video enhancement and restoration techniques are sometimes referred to as spatio-temporal filters or 3-D filters. Section 2 of this chapter presents three important classes of noise filters for video frames, namely linear temporal filters, order-statistic filters, and multi-resolution filters.

In forensic sciences and in film and video archives a large variety of artifacts are encountered. Besides noise, we discuss the removal of other important impairments that rely on spatial or temporal processing algorithms, namely blotches (Section 3), vinegar syndrome (Section 4), intensity flicker (Section 5), and kinescope moiré (Section 6). Blotches are dark and bright spots that are often visible in damaged film image sequences. The removal of blotches is essentially a temporal detection and interpolation problem. In certain circumstances, their removal needs to be done by means of spatial algorithms, as will be shown later in this chapter. Vinegar syndrome represents a special type of impairment related to film, and it may have various appearances (e.g., partial loss of colour, blur, etc.). In some cases, blotch removal algorithms can be applied for its removal. In general, however, the particular properties of its appearance need to be taken into account. Intensity flicker refers to variations in intensity in time, caused by aging of film, by copying and format conversion (for instance from film to video), and — in the case of earlier film — by variations in shutter time. Whereas blotches are spatially highly localized artifacts in video frames, intensity flicker is usually a spatially global, but not stationary, artifact. The kinescope moiré phenomenon appears during film-to-video transfer using Telecine devices. It is caused by the superposition of (semi-)periodical signals from the film contents and the scan pattern used by the Telecine device. Because of its (semi-)periodical nature, spectrum analysis is generally used for its removal.

It becomes apparent that image sequences may be degraded by multiple artifacts. For practical reasons, restoration systems follow a sequential procedure, where artifacts are removed one by one. As an example, Fig. 1 illustrates the order in which the removal of flicker, moiré, noise, blotches, and vinegar syndrome takes place. The reasons for this modular approach are the necessity to judge the success of the individual steps (for instance by an operator), and the algorithmic and implementation complexity.

As already suggested in Fig. 1, most temporal filtering techniques require an estimate of the motion in the image sequence. Motion estimation has been discussed in detail in Chapters 3.7 and 6.1 of this Handbook. The estimation of motion from degraded image sequences is, however, problematic. We are faced with the problem that the impairments of the video disturb the motion estimator, but that at the same time correct motion estimates are assumed in developing enhancement and restoration algorithms. In this chapter we will not discuss the design of new motion estimators [24, 36, 40] that are robust to the various artifacts, but we will assume that existing motion estimators can be modified appropriately such that sufficiently correct and smooth motion fields are obtained. The reason for this approach is that even under ideal conditions motion estimates are never perfect. Usually, incorrect or unreliable motion vectors are dealt with in a few special ways. In the first place, clearly incorrect or unreliable motion vectors can be repaired. Secondly, the enhancement and restoration algorithms should be robust against a limited amount of incorrect or unreliable motion vectors. Thirdly, areas with wrong motion vectors which are impossible to repair can be protected against temporal restoration, in order to avoid an outcome which is visually more objectionable than the input sequence itself. In such a case, the unavailability of temporal information makes the spatial-only restoration more suitable than the temporal one.

FIGURE 1 Some processing steps in the removal of various video artifacts: the corrupted sequence g(n, k) passes through flicker correction, moiré removal, noise reduction, blotch removal, and vinegar syndrome removal, with a motion estimator supplying motion vectors, yielding the enhanced and restored sequence $\hat f(n,k)$.


2 Spatio-Temporal Noise Filtering

Any recorded signal is affected by noise, no matter how precise the recording equipment. The sources of noise that can corrupt an image sequence are numerous (see Chapter 4.4 of this Handbook). Examples of the more prevalent ones include camera noise, shot noise originating in electronic hardware and the storage on magnetic tape, thermal noise, and granular noise on film. Most recorded and digitized image sequences contain a mixture of noise contributions, and often the (combined) effects of the noise are non-linear in nature. In practice, however, the aggregated effect of noise is modeled as an additive white (sometimes Gaussian) process with zero mean and variance $\sigma_w^2$ that is independent from the ideal uncorrupted image sequence f(n, k). The recorded image sequence g(n, k) corrupted by noise w(n, k) is then given by

$$ g(n,k) = f(n,k) + w(n,k) \qquad (1) $$

where n = (n1, n2) refers to the spatial coordinates and k to the frame number in the image sequence. More accurate models are often much more complex but lead to little gain compared to the added complexity.

The objective of noise reduction is to make an estimate $\hat f(n,k)$ of the original image sequence given only the observed noisy image sequence g(n, k). Many different approaches toward noise reduction are known, including optimal linear filtering, non-linear filtering, scale-space processing, and Bayesian techniques. In this section we discuss successively the class of linear image sequence filters, order-statistic filters, and multi-resolution filters. In all cases the emphasis is on the temporal filtering aspects. More rigorous reviews of noise filtering for image sequences are given in [9, 11, 41].

2.1 Linear Filters

Temporally Averaging Filters

The simplest temporal filter carries out a weighted averaging of successive frames. That is, the restored image sequence is obtained by [18, 24]:

$$ \hat f(n,k) = \sum_{l=-K}^{K} h(l)\, g(n, k-l) \qquad (2) $$

Here h(l) are the temporal filter coefficients used to weight 2K+1 consecutive frames. In case the frames are considered equally important we have h(l) = 1/(2K+1). Alternatively, the filter coefficients can be optimized in a minimum mean-squared error fashion

$$ h(l) = \arg\min_{h(l)} E\Big[ \big( f(n,k) - \hat f(n,k) \big)^2 \Big] \qquad (3) $$

yielding the well-known temporal Wiener filtering solution:

$$
\begin{pmatrix} h(-K) \\ \vdots \\ h(0) \\ h(1) \\ \vdots \\ h(K) \end{pmatrix}
=
\begin{pmatrix}
R_{gg}(0) & \cdots & R_{gg}(-K) & \cdots & R_{gg}(-2K) \\
\vdots & \ddots & & & \vdots \\
R_{gg}(K) & \cdots & R_{gg}(0) & \cdots & R_{gg}(-K) \\
\vdots & & & \ddots & \vdots \\
R_{gg}(2K) & \cdots & R_{gg}(K) & \cdots & R_{gg}(0)
\end{pmatrix}^{-1}
\begin{pmatrix} R_{fg}(-K) \\ \vdots \\ R_{fg}(0) \\ R_{fg}(1) \\ \vdots \\ R_{fg}(K) \end{pmatrix}
\qquad (4)
$$

where $R_{gg}(m)$ is the temporal auto-correlation function defined as $R_{gg}(m) = E[g(n,k)\,g(n,k-m)]$, and $R_{fg}(m)$ the temporal cross-correlation function defined as $R_{fg}(m) = E[f(n,k)\,g(n,k-m)]$. The temporal window length, i.e., the parameter K, determines the maximum degree by which the noise power can be reduced. The larger the window, the greater the reduction of the noise; at the same time, however, the more visually noticeable the artifacts resulting from motion between the video frames. A dominant artifact is blur of moving objects due to the averaging of object and background information.
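To make (2)–(4) concrete, the sketch below (our illustration, not code from the chapter) estimates the Wiener taps from the noisy frames themselves. Under the model of Eq. (1) with white noise independent of f, $R_{fg}(m) = R_{gg}(m)$ for m ≠ 0 and $R_{fg}(0) = R_{gg}(0) - \sigma_w^2$, so only the noise variance must be supplied. The function names and the (T, H, W) frame layout are our own conventions.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def temporal_wiener_weights(frames, noise_var, K=2):
    # frames: (T, H, W) array of noisy frames g(n, k); noise_var: sigma_w^2.
    T = frames.shape[0]
    x = frames.reshape(T, -1).astype(np.float64)
    x -= x.mean()
    # Sample temporal auto-correlation R_gg(m), lags m = 0 .. 2K, averaged
    # over all pixel positions.
    r_gg = np.array([(x[: T - m] * x[m:]).mean() for m in range(2 * K + 1)])
    # White noise independent of f: R_fg(0) = R_gg(0) - noise_var,
    # R_fg(m) = R_gg(m) otherwise.
    b = r_gg[np.abs(np.arange(-K, K + 1))].copy()
    b[K] -= noise_var
    # Eq. (4): solve the symmetric Toeplitz system for h(-K) .. h(K).
    return solve_toeplitz(r_gg, b)

def temporal_average(frames, h, K):
    # Eq. (2): weighted average of 2K+1 consecutive frames, evaluated for
    # the central frames k = K .. T-K-1 only (no boundary handling).
    T = frames.shape[0]
    out = np.zeros_like(frames[K : T - K], dtype=np.float64)
    for i, l in enumerate(range(-K, K + 1)):
        out += h[i] * frames[K - l : T - K - l]
    return out
```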

The motion artifacts can greatly be reduced by operating the filter (2) along the picture elements (pixels) that lie on the same motion trajectory [14, 25]. Equation (2) then becomes a motion-compensated temporal filter (see Fig. 2):

$$ \hat f(n,k) = \sum_{l=-K}^{K} h(l)\, g\big( n_1 - d_x(n_1,n_2;k,l),\; n_2 - d_y(n_1,n_2;k,l),\; k-l \big) \qquad (5) $$


Here $d(n;k,l) = (d_x(n_1,n_2;k,l),\, d_y(n_1,n_2;k,l))$ is the motion vector for spatial coordinate (n1, n2) estimated between the frames k and l. It is pointed out here that the problems of noise reduction and motion estimation are inversely related as far as the temporal window length K is concerned. That is, as the length of the filter is increased temporally, the noise reduction potential increases, but so do the artifacts due to incorrectly estimated motion between frames that are temporally far apart.

FIGURE 2 Noise filter operating along the motion trajectory of the picture element (n, k).
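A minimal rendering of the motion-compensated filter (5) with equal weights, assuming the dense motion fields d(n; k, l) have already been estimated by some external motion estimator (the flows argument below is such an assumed input):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def mc_temporal_average(frames, flows, k, K=1):
    # Eq. (5) with equal weights h(l) = 1/(2K+1). frames: (T, H, W) array;
    # flows[l] is an assumed-given dense displacement field d(n; k, l) of
    # shape (2, H, W) (components along the two spatial axes) for l != 0.
    H, W = frames.shape[1:]
    n1, n2 = np.mgrid[0:H, 0:W].astype(np.float64)
    acc = frames[k].astype(np.float64)
    for l in range(-K, K + 1):
        if l == 0:
            continue
        d1, d2 = flows[l]
        # Sample g along the motion trajectory: g(n - d(n; k, l), k - l).
        acc += map_coordinates(frames[k - l].astype(np.float64),
                               [n1 - d1, n2 - d2], order=1, mode='nearest')
    return acc / (2 * K + 1)
```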

In order to avoid the explicit estimation of motion, which might be problematic at high noise levels, two alternatives are available that turn (2) into a motion-adaptive filter. In the first place, in areas where motion is detected (but not explicitly estimated) the averaging of frames should be kept to a minimum. Different ways exist to realize this. For instance, the temporal filter (2) can locally be switched off entirely, or can locally be limited to using only future or past frames, depending on the temporal direction in which motion was detected. Basically the filter coefficients h(l) are spatially adapted as a function of detected motion between frames. Secondly, the filter (2) can be operated along M a priori selected motion directions at each spatial coordinate. The finally estimated value $\hat f(n,k)$ is subsequently chosen from the M partial results according to some selection criterion, for instance as the median [18, 24]:

$$ \hat f_{i,j}(n,k) = \tfrac{1}{3}\big( g(n_1-i, n_2-j, k-1) + g(n_1, n_2, k) + g(n_1+i, n_2+j, k+1) \big) \qquad (6a) $$

$$ \hat f(n,k) = \mathrm{median}\big( \hat f_{-1,-1}(n,k),\, \hat f_{1,-1}(n,k),\, \hat f_{-1,1}(n,k),\, \hat f_{1,1}(n,k),\, \hat f_{0,0}(n,k),\, g(n,k) \big) \qquad (6b) $$

Clearly, cascading (6a) and (6b) turns the overall estimation procedure into a non-linear one, but the partial estimation results are still obtained by the linear filter operation (6a).
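The median-of-directional-averages filter (6a)/(6b) is easily expressed with array shifts; the following sketch (ours, with wrap-around at the frame borders for brevity) computes the five partial averages and the final median:

```python
import numpy as np

def motion_adaptive_median(prev, cur, nxt):
    # Eqs. (6a)/(6b): average along 5 candidate motion directions, then
    # take the median of the partial results and the noisy pixel itself.
    # prev, cur, nxt: consecutive frames as (H, W) float arrays.
    candidates = [cur]  # g(n, k) also enters the median of Eq. (6b)
    for (i, j) in [(-1, -1), (1, -1), (-1, 1), (1, 1), (0, 0)]:
        # g(n1 - i, n2 - j, k-1) and g(n1 + i, n2 + j, k+1), Eq. (6a)
        p = np.roll(prev, shift=(i, j), axis=(0, 1))
        n = np.roll(nxt, shift=(-i, -j), axis=(0, 1))
        candidates.append((p + cur + n) / 3.0)
    return np.median(np.stack(candidates), axis=0)
```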

It is easy to see that the filter (2) can be extended with a spatial filtering part. There exist many variations to this concept, basically as many as there are spatial restoration techniques for noise reduction. The most straightforward extension of (2) is the following 3-D weighted averaging filter [41]:

$$ \hat f(n,k) = \sum_{(m,l)\in S} h(m,l)\, g(n-m, k-l) \qquad (7) $$

Here S is the spatio-temporal support or window of the 3-D filter (see Fig. 3). The filter coefficients h(m, l) can be chosen to be all equal, but a performance improvement is obtained if they are adapted to the image sequence being filtered, for instance by optimizing them in the mean-squared error sense (3). In the latter case (7) becomes the theoretically optimal 3-D Wiener filter.

There are, however, two disadvantages with the 3-D Wiener filter. The first is the requirement that the 3-D autocorrelation function for the original image sequence is known a priori. The second is the 3-D wide-sense stationarity assumption, which is virtually never true because of moving objects and scene changes. These requirements are detrimental to the performance of the 3-D Wiener filter in practical situations of interest. For these reasons, simpler ways of choosing the 3-D filter coefficients are usually preferred, provided that they allow for adapting the filter coefficients. One such choice for adaptive filter coefficients is the following [29]:

$$ h(m,l;n,k) = \frac{c}{1 + \max\big( \epsilon,\; (g(n,k) - g(n-m,k-l))^2 \big)} \qquad (8) $$

Here h(m, l; n, k) weights the intensity at spatial location n−m in frame k−l for the estimation of the intensity $\hat f(n,k)$. The adaptive nature of the resulting filter can immediately be seen from (8). If the squared difference between the pixel intensity g(n, k) being filtered and the intensity g(n−m, k−l) for which the filter coefficient is calculated is less than $\epsilon$, this pixel is included in the filtering with weight $c/(1+\epsilon)$, otherwise it is weighted with a much smaller factor. In this way, pixel intensities that seem to deviate too much from g(n, k) — for instance due to moving objects within the spatio-temporal window S — are excluded from (7). As with the temporal filter (2), the spatio-temporal filter (7) can be carried out in a motion-compensated way by arranging the window S along the estimated motion trajectory.
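A direct, if naive, rendering of the adaptive 3-D filter (7)/(8) is sketched below (our code; the window is a full cube, borders wrap around, and the constant c is absorbed by the normalization). The center frame index k is assumed to satisfy half ≤ k < T − half:

```python
import numpy as np

def adaptive_3d_filter(frames, k, eps=25.0, half=1):
    # Eqs. (7)/(8): 3-D weighted average over a (2*half+1)^3 window, with
    # weights that suppress intensities deviating from g(n, k).
    # frames: (T, H, W) array; eps plays the role of epsilon in Eq. (8).
    g = frames.astype(np.float64)
    center = g[k]
    num = np.zeros_like(center)
    den = np.zeros_like(center)
    for l in range(-half, half + 1):
        for m1 in range(-half, half + 1):
            for m2 in range(-half, half + 1):
                # g(n - m, k - l); np.roll wraps at the borders (sketch only)
                shifted = np.roll(g[k - l], shift=(m1, m2), axis=(0, 1))
                w = 1.0 / (1.0 + np.maximum(eps, (center - shifted) ** 2))
                num += w * shifted
                den += w
    # Normalizing by `den` realizes the constant c of Eq. (8).
    return num / den
```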

Temporally Recursive Filters

A disadvantage of the temporal filter (2) and spatio-temporal filter (7) is that they need to buffer several frames of an image sequence. Alternatively, a recursive filter structure can be used that generally needs to buffer fewer (usually only one) frames. Furthermore, these filters are easier to adapt since there are


fewer parameters to control. The general form of a recursive temporal filter is as follows:

$$ \hat f(n,k) = \hat f_b(n,k) + \alpha(n,k) \big[\, g(n,k) - \hat f_b(n,k) \,\big] \qquad (9) $$

Here $\hat f_b(n,k)$ is the prediction of the original k-th frame on the basis of previously filtered frames, and $\alpha(n,k)$ is the filter gain for updating this prediction with the observed k-th frame. Observe that for $\alpha(n,k) = 1$ the filter is switched off, i.e., $\hat f(n,k) = g(n,k)$. Clearly, a number of different algorithms can be derived from (9) depending on the way the predicted frame $\hat f_b(n,k)$ is obtained and the gain $\alpha(n,k)$ is computed. A popular choice for the prediction $\hat f_b(n,k)$ is the previously restored frame, either in direct form

$$ \hat f_b(n,k) = \hat f(n,k-1) \qquad (10a) $$

or in motion-compensated form:

$$ \hat f_b(n,k) = \hat f\big( n - d(n;k,k-1),\, k-1 \big) \qquad (10b) $$

More elaborate variations of (10) make use of a local estimate of the signal's mean within a spatio-temporal neighborhood. Furthermore, Equation (9) can also be cast into a formal 3-D motion-compensated Kalman estimator structure [42, 20]. In this case the prediction $\hat f_b(n,k)$ depends directly on the dynamic spatio-temporal state-space equations used for modeling the image sequence.

The simplest case for selecting $\alpha(n,k)$ is by using a globally fixed value. As with the filter structures (2) and (7), it is generally necessary to adapt $\alpha(n,k)$ to the presence or correctness of the motion in order to avoid filtering artifacts. Typical artifacts of recursive filters are "comet tails" that moving objects leave behind.

A switching filter is obtained if the gain takes on the values $\alpha$ and 1, depending on the difference between the prediction $\hat f_b(n,k)$ and the actually observed signal value g(n, k):

$$ \alpha(n,k) = \begin{cases} 1 & \text{if } \big| g(n,k) - \hat f_b(n,k) \big| > \varepsilon \\[4pt] \alpha & \text{if } \big| g(n,k) - \hat f_b(n,k) \big| \le \varepsilon \end{cases} \qquad (11) $$

For areas that have a lot of motion (if the prediction (10a) is used) or for which the motion has been estimated incorrectly (if the prediction (10b) is used), the difference between the predicted intensity value and the noisy intensity value is large, causing the filter to switch off. For areas that are stationary or for which the motion has been estimated correctly, the prediction differences are small, yielding the value $\alpha$ for the filter coefficient.

A finer adaptation is obtained if the prediction gain is optimized to minimize the mean-squared restoration error (3), yielding:

$$ \alpha(n,k) = \max\left( 1 - \frac{\sigma_w^2}{\sigma_g^2(n,k)},\; 0 \right) \qquad (12) $$

Here $\sigma_g^2(n,k)$ is an estimate of the image sequence variance in a local spatio-temporal neighborhood of (n, k). If this variance is high, it indicates large motion or incorrectly estimated motion, causing the noise filter to switch off, i.e., $\alpha(n,k) = 1$. If $\sigma_g^2(n,k)$ is in the same order of magnitude as the noise variance $\sigma_w^2$, the observed noisy image sequence is obviously very unreliable, so that the predicted intensities are used without updating, i.e., $\alpha(n,k) = 0$. The resulting estimator is known as the local linear minimum mean-squared error (LLMMSE) estimator. A drawback of (12), as with any noise filter that requires the calculation of $\sigma_g^2(n,k)$, is that outliers in the windows used to calculate this variance may cause the filter to switch off. Order-statistic filters are more suitable for handling data in which outliers are likely to occur.
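The recursive structure (9) with the direct prediction (10a) and the LLMMSE gain (12) fits in a few lines. In this sketch (ours) the local variance $\sigma_g^2(n,k)$ is computed over a purely spatial window; the chapter permits a spatio-temporal neighborhood, which would be a straightforward extension:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def llmmse_recursive_filter(frames, noise_var, win=5):
    # Eqs. (9), (10a), (12), without motion compensation.
    # frames: (T, H, W) array of noisy frames g(n, k).
    out = [frames[0].astype(np.float64)]  # bootstrap with the first frame
    for k in range(1, frames.shape[0]):
        g = frames[k].astype(np.float64)
        f_b = out[-1]                      # prediction, Eq. (10a)
        # Local variance of g in a win x win spatial window per pixel.
        mean = uniform_filter(g, win)
        var_g = np.maximum(uniform_filter(g * g, win) - mean ** 2, 1e-12)
        alpha = np.maximum(1.0 - noise_var / var_g, 0.0)   # Eq. (12)
        out.append(f_b + alpha * (g - f_b))                # Eq. (9)
    return np.stack(out)
```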

2.2 Order-Statistic Filters

Order-statistic (OS) filters are non-linear variants of weighted-averaging filters. The distinction is that in OS filters the observed noisy data — usually taken from a small spatio-temporal window — is ordered before being used. Because of the ordering operation, correlation information is ignored in favor of magnitude information. Examples of simple OS filters are the minimum operator, maximum operator, and median operator. OS filters are often applied in directional filtering. In directional filtering different filter directions are considered, corresponding to different spatio-temporal edge orientations. Effectively this means that the filtering operation takes place along the spatio-temporal edges, avoiding the blurring of moving objects. The directional filtering approach may be superior to adaptive or switching filters since noise around spatio-temporal edges can effectively be eliminated by filtering along those edges, as opposed to turning off the filter in the vicinity of edges [41].

The general structure of an OS restoration filter is as follows:

$$ \hat f(n,k) = \sum_{r=1}^{|S|} h_{(r)}(n,k)\, g_{(r)}(n,k) \qquad (13) $$

Here $g_{(r)}(n,k)$ are the ordered intensities, or ranks, of the corrupted image sequence, taken from a spatio-temporal window S with finite extent centered around (n, k) (see Fig. 3). The number of intensities in this window is denoted by |S|. As with linear filters, the objective is to choose appropriate filter coefficients $h_{(r)}(n,k)$ for the ranks.


The most simple order-statistic filter is a straightforward temporal median, for instance taken over three frames:

$$ \hat f(n,k) = \mathrm{median}\big( g(n,k-1),\, g(n,k),\, g(n,k+1) \big) \qquad (14) $$

Filters of this type are very suitable for removing shot noise. In order to avoid artifacts at the edges of moving objects, Eq. (14) is normally applied in a motion-compensated way. A more elaborate OS filter is the multi-stage median filter (MMF) [3, 4]. In the MMF the outputs of basic median filters with different spatio-temporal support are combined. An example of the spatio-temporal supports is shown in Fig. 4. The outputs of these intermediate median filter results are then combined as follows:

$$ \hat f(n,k) = \mathrm{median}\big( g(n,k),\; \max(\hat f_1(n,k), \ldots, \hat f_9(n,k)),\; \min(\hat f_1(n,k), \ldots, \hat f_9(n,k)) \big) \qquad (15) $$

The advantage of this class of filters is that although it does not incorporate motion estimation explicitly, artifacts on edges of moving objects are significantly reduced. Nevertheless, the intermediate medians can also be computed in a motion-compensated way by positioning the spatio-temporal windows in Fig. 4 along motion trajectories.
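Both the temporal median (14) and the MMF combination step (15) are one-liners in array form; in this sketch (ours) the partial medians $\hat f_1 \ldots \hat f_9$ over the supports of Fig. 4 are assumed to be computed elsewhere:

```python
import numpy as np

def temporal_median(prev, cur, nxt):
    # Eq. (14): pixel-wise 3-frame temporal median.
    return np.median(np.stack([prev, cur, nxt]), axis=0)

def mmf_combine(g, partial_medians):
    # Eq. (15): combine the intermediate medians f_1 .. f_9 (computed over
    # the spatio-temporal supports of Fig. 4) with the noisy pixel itself.
    stack = np.stack(partial_medians)
    return np.median(np.stack([g, stack.max(axis=0), stack.min(axis=0)]),
                     axis=0)
```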

The filter coefficients $h_{(r)}(n,k)$ in (13) can also be statistically designed, as described in Chapter 4.4 of this Handbook. If the coefficients are optimized in the mean-squared error sense, the following general solution for the restored image sequence is obtained [21]:

$$
\begin{pmatrix} \hat f(n,k) \\ \hat\sigma_w^2(n,k) \end{pmatrix}
=
\begin{pmatrix}
h_{(1)}(n,k) & h_{(2)}(n,k) & \cdots & h_{(|S|)}(n,k) \\
n_{(1)}(n,k) & n_{(2)}(n,k) & \cdots & n_{(|S|)}(n,k)
\end{pmatrix}
\begin{pmatrix} g_{(1)}(n,k) \\ \vdots \\ g_{(|S|)}(n,k) \end{pmatrix}
=
\big( A^t C^{-1}(w) A \big)^{-1} A^t C^{-1}(w)
\begin{pmatrix} g_{(1)}(n,k) \\ \vdots \\ g_{(|S|)}(n,k) \end{pmatrix}
\qquad (16a)
$$

This expression formulates the optimal filter coefficients $h_{(r)}(n,k)$ in terms of a matrix product involving the $|S| \times |S|$ auto-covariance matrix of the ranks of the noise, denoted by C(w), and a matrix A defined as

$$
A = \begin{pmatrix}
1 & E[w_{(1)}(n,k)] \\
1 & E[w_{(2)}(n,k)] \\
\vdots & \vdots \\
1 & E[w_{(|S|)}(n,k)]
\end{pmatrix}
\qquad (16b)
$$

Here $E[w_{(r)}(n,k)]$ denotes the expectation of the ranks of the noise. The result in (16a) not only gives an estimate of the filtered image sequence, but also of the local noise variance. This quantity is of use by itself in various noise filters to regulate the noise reduction strength. In order to calculate $E[w_{(r)}(n,k)]$ and C(w), the probability density function of the noise has to be assumed known. In case the noise w(n, k) is uniformly distributed, (16a) becomes the average of the minimum and maximum observed intensity. For Gaussian distributed noise, (16a) degenerates to (2) with equal weighting coefficients.

FIGURE 3 Examples of spatio-temporal windows to collect data for noise filtering of the picture element (n, k).

FIGURE 4 Spatio-temporal windows used in the multi-stage median filter.


An additional advantage of ordering the noisy observations prior to filtering is that outliers can easily be detected. For instance, with a statistical test — such as the rank order test [21] — the observed noisy values within the spatio-temporal window S that are significantly different from the intensity g(n, k) can be detected. These significantly different values usually originate from different objects in the image sequence, for instance due to motion. By letting the statistical test reject these values, the filters (13) and (16) use locally only data from the observed noisy image sequence that is close — in intensity — to g(n, k). This further reduces the sensitivity of the noise filter (13) to outliers due to motion or incorrectly compensated motion.

The estimator (16) can also be used in a recursive structure such as the one in Equation (9). Essentially (16) is then interpreted as an estimate for the local mean of the image sequence, and the filtered value resulting from (16) is used as the predicted value $\hat f_b(n,k)$ in (9). Furthermore, instead of using only noisy observations in the estimator, previously filtered frames can be used by extending the spatio-temporal window S over the current noisy frame g(n, k) and the previously filtered frame $\hat f(n,k-1)$. The overall filter structure is shown in Fig. 5.

2.3 Multi-Resolution Filters

The multi-resolution representation of 2-D images has become quite popular for analysis and compression purposes [39]. This signal representation is also useful for image sequence restoration. The fundamental idea is that if an appropriate decomposition into bands of different spatial and temporal resolutions and orientations is carried out, the energy of the structured signal will locally be concentrated in selected bands, whereas the noise is spread out over all bands. The noise can therefore effectively be removed by mapping all small (noise) components in all bands to zero, while leaving the remaining larger components relatively unaffected. Such an operation on signals is also known as coring [13, 35]. Figure 6 shows two coring functions, namely soft- and hard-thresholding. Chapter 3.4 of this Handbook discusses 2-D wavelet-based thresholding methods for image enhancement.
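The two coring functions of Fig. 6 can be written as follows (a small sketch of ours; t is the user-chosen threshold applied to the sub-band coefficients):

```python
import numpy as np

def soft_threshold(x, t):
    # Soft-thresholding coring: shrink every coefficient toward zero by t.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def hard_threshold(x, t):
    # Hard-thresholding coring: zero out coefficients with magnitude below t.
    return np.where(np.abs(x) > t, x, 0.0)
```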

The discrete wavelet transform has been widely used for decomposing one- and multidimensional signals into bands. A problem with this transform for image sequence restoration is, however, that the decomposition is not shift-invariant. Slightly shifting the input image sequence in the spatial or temporal sense can cause significantly different decomposition results. For this reason, in [38] a shift-invariant, but overcomplete, decomposition was proposed, known as the Simoncelli pyramid. Figure 7a shows the 2-D Simoncelli pyramid decomposition scheme. The filters $L_i(\omega)$ and $H_i(\omega)$ are linear phase low- and high-pass filters, respectively. The filters $F_i(\omega)$ are fan-filters that decompose the signal into four directional bands. The resulting spectral decomposition is shown in Fig. 7b. From this spectral tessellation, the different resolutions and orientations of the spatial bands obtained by Fig. 7a can be inferred. The radial bands have a bandwidth of 1 octave.

FIGURE 5 Overall filtering structure combining (9), (16), and an outlier-removing rank order test.

FIGURE 6 Coring functions: (a) soft-thresholding, (b) hard-thresholding. Here x is a signal amplitude taken from one of the spatio-temporal bands (which carry different resolution and orientation information), and $\hat x$ is the resulting signal amplitude after coring.


The Simoncelli pyramid gives a spatial decomposition of each frame into bands of different resolution and orientation. The extension to the temporal dimension is obtained by temporally decomposing each of the spatial resolution and orientation bands using a regular wavelet transform. The low-pass and high-pass filters are operated along the motion trajectory in order to avoid blurring of moving objects. The resulting motion-compensated spatio-temporal wavelet coefficients are filtered by one of the coring functions, followed by the reconstruction of the video frame by an inverse wavelet transformation and Simoncelli pyramid reconstruction. Figure 8 shows the overall scheme.

Though multi-resolution approaches have been shown to outperform the filtering techniques described in Sections 2.1 and 2.2 for some types of noise, they generally require much more processing power due to the spatial and temporal decomposition, and — depending on the temporal wavelet decomposition — they require a significant number of frame stores.

FIGURE 7 (a) Simoncelli pyramid decomposition scheme; (b) resulting spectral decomposition, illustrating the spectral contents carried by the different resolution and directional bands.

FIGURE 8 Overall spatio-temporal multi-resolution filtering using coring.

3 Blotch Detection and Removal

Blotches are artifacts that are typically related to film. Dirt particles covering the film introduce bright or dark spots on the frames, and the mishandling or aging of film causes loss of the gelatin covering the film. Figure 11a shows a film frame containing dark and bright spots: the blotches. A model for this artifact is the following [24, 33]:

$$ g(n,k) = \big(1 - b(n,k)\big) f(n,k) + b(n,k)\, c(n,k) + w(n,k) \qquad (17) $$

Here b(n, k) is a binary mask that indicates for each spatial location in each frame whether or not it is part of a blotch. The (more or less constant) intensity values at the corrupted spatial locations are given by c(n, k). Though noise is not considered to be the dominant degrading factor in this section, it is still included in (17) as the term w(n, k). The removal of blotches is a two-step procedure. In the first step, the blotches need to be detected, i.e., an estimate for the mask b(n, k) is made [22]. In the second step, the incorrect intensities c(n, k) at the corrupted locations are spatio-temporally interpolated [23]. In case a motion-compensated interpolation is carried


out, the second step also involves the local repair of motion vectors estimated from the blotched frames. The overall blotch detection and removal scheme is shown in Fig. 9.

3.1 Blotch Detection

Blotches have three characteristic properties that are exploited by blotch detection algorithms. In the first place, blotches are temporally independent and therefore hardly ever occur at the same spatial location in successive frames. Secondly, the intensity of a blotch is significantly different from its neighboring uncorrupted intensities. Finally, blotches form coherent regions in a frame, as opposed to, for instance, spatio-temporal shot noise.

There are various blotch detectors that exploit these characteristics. The first is a pixel-based blotch detector known as the spike-detector index (SDI). This method detects temporal discontinuities by comparing pixel intensities in the current frame with motion-compensated reference intensities in the previous and following frames:

$$ \mathrm{SDI}(n,k) = \min\Big( \big( g(n,k) - g(n - d(n;k,k-1),\, k-1) \big)^2,\; \big( g(n,k) - g(n - d(n;k,k+1),\, k+1) \big)^2 \Big) \qquad (18) $$

Since blotch detectors are pixel-oriented, the motion field d(n; k, l) should have a motion vector per pixel, i.e., the motion field is dense. Observe that any motion-compensation procedure must be robust against the presence of intensity spikes; this will be discussed later in this section. A blotch pixel is detected if SDI(n, k) exceeds a threshold:

$$ b(n,k) = \begin{cases} 1 & \text{if } \mathrm{SDI}(n,k) > T \\ 0 & \text{otherwise} \end{cases} \qquad (19) $$

Since blotch detectors are essentially searching for outliers, order-statistic based detectors usually perform better. The rank-order difference (ROD) detector is one such method. It takes |S| reference pixel intensities r = {r_i | i = 1, 2, ..., |S|} from a motion-compensated spatio-temporal window S (see for instance the greyed pixels in Fig. 10), and ranks them by intensity value: $r_1 \le r_2 \le \ldots \le r_{|S|}$. It then finds the deviation between the pixel intensity g(n, k) and the reference pixels ranked by intensity value as follows:

$$ \mathrm{ROD}_i(n,k) = \begin{cases} r_i - g(n,k) & \text{if } g(n,k) \le \mathrm{median}(r) \\ g(n,k) - r_{|S|-i} & \text{if } g(n,k) > \mathrm{median}(r) \end{cases} \qquad \text{for } i = 1, 2, \ldots, \frac{|S|}{2} \qquad (20) $$

A blotch pixel is detected if any of the rank order differences exceeds a specific threshold $T_i$:

$$ b(n,k) = \begin{cases} 1 & \text{if } \exists\, i \text{ such that } \mathrm{ROD}_i(n,k) > T_i \\ 0 & \text{otherwise} \end{cases} \qquad (21) $$

More complicated blotch detectors explicitly incorporate a model for the uncorrupted frames, such as a two- or three-dimensional autoregressive model or a Markov Random Field, to develop the maximum a posteriori detector for the blotch mask b(n, k). Figure 11b illustrates the detection probability versus the false detection probability of three different detectors on a sequence of which a representative blotched frame is shown in Fig. 11a. These results indicate that for reasonable detection probabilities the false detection probability is fairly high. False detections are detrimental to the restoration process because the interpolation process itself is fallible and may introduce disturbing artifacts that were not present in the blotched image sequence.

FIGURE 9 Blotch detection and removal system.

FIGURE 10 Example of a motion-compensated spatio-temporal window for obtaining reference intensities in the ROD detector.

The blotch detectors described so far are essentially pixel-based detectors. They do not incorporate the spatial coherency of the detected blotches. The effect is illustrated by Fig. 11c, which shows the detected blotch mask b(n, k) using a simplified version of the ROD detector (with $T_i \to \infty$, $i \ge 2$). The simplified ROD (sROD) is given by the following


relations [33]:

$$ \mathrm{sROD}(n,k) = \begin{cases} \min(r) - g(n,k) & \text{if } g(n,k) < \min(r) \\ g(n,k) - \max(r) & \text{if } g(n,k) > \max(r) \\ 0 & \text{elsewhere} \end{cases} \qquad (22a) $$

$$ b(n,k) = \begin{cases} 1 & \text{if } \mathrm{sROD}(n,k) > T \\ 0 & \text{otherwise} \end{cases} \qquad (22b) $$

The sROD basically looks at the range of the reference pixel intensities obtained from the motion-compensated window, and compares it with the pixel intensity under investigation. A blotch pixel is detected if the intensity of the current pixel g(n, k) lies far enough outside that range.

FIGURE 11 (a) Video frame with blotches; (b) correct detection versus false detection for three different blotch detectors (sROD, SDI, AR); (c) blotch detection mask using the sROD (T = 0).
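The sROD test (22a)/(22b) then only needs the minimum and maximum of the reference set. In the sketch below (ours), r is taken as a column of three pixels from each motion-compensated neighboring frame, in the spirit of Fig. 10:

```python
import numpy as np

def srod_detect(cur, prev_mc, next_mc, T=0):
    # Eqs. (22a)/(22b). prev_mc, next_mc: motion-compensated neighbor frames.
    refs = []
    for f in (prev_mc, next_mc):
        for shift in (-1, 0, 1):
            refs.append(np.roll(f, shift, axis=0))  # pixel above/at/below
    refs = np.stack(refs)
    r_min, r_max = refs.min(axis=0), refs.max(axis=0)
    srod = np.where(cur < r_min, r_min - cur,
                    np.where(cur > r_max, cur - r_max, 0.0))
    return (srod > T).astype(np.uint8)   # Eq. (22b)
```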

The performance of even this simple pixel-based blotch detector can be improved significantly by exploiting the spatial coherence of blotches. This is done by postprocessing the blotch mask in Fig. 11c in two ways, namely by removing small blotches, and by completing partially detected blotches. We first discuss the removal of small blotches. The detector output (22a) is not only sensitive to intensity changes due to blotches corrupting the image sequence, but also to noise. If the probability density function of the noise — denoted by $f_W(w)$ — is known, the probability of false detection for a single pixel can be calculated. Namely, if the sROD uses |S| reference intensities in evaluating (22a), the probability that


sROD(n, k) for a single pixel is larger than T due to noiseonly is [33]:

PðsRODðn, kÞ > Tjno blotchÞ

¼ Pð gðn, kÞ �maxðrÞ > Tjno bl:Þ

þ PðminðrÞ � gðn, kÞ > Tjno bl:Þ

¼ 2

Z1

�1

Zu�T

�1

fW ðwÞdw

2

4

3

5S

fW ðuÞdu ð23Þ

In the detection mask b(n, k), blotches may consist of single pixels or of multiple connected pixels. A set of connected pixels that are all detected as (being part of a) blotch is called a spatially coherent blotch. If a coherent blotch consists of N connected pixels, the probability that this blotch is due to noise only is

$$ P\big( \mathrm{sROD}(n,k) > T \text{ for } N \text{ connected pixels} \mid \text{no blotch} \big) = \big( P(\mathrm{sROD}(n,k) > T \mid \text{no blotch}) \big)^N \qquad (24) $$

By bounding this false detection probability to a certain maximum, the minimum number of pixels identified by the sROD detector as being part of a blotch can be computed. Consequently, coherent blotches consisting of fewer pixels than this minimum are removed from the blotch mask b(n, k).

A second postprocessing technique for improving the detector performance is hysteresis thresholding. First a blotch mask is computed using a very low detection threshold T, for instance T = 0. From the detection mask the small blotches are removed as described above, yielding the mask $b_0(n,k)$. Nevertheless, due to the low detection threshold this mask still contains many false detections. Then a second detection mask $b_1(n,k)$ is obtained by using a much higher detection threshold. This mask contains fewer detected blotches, and the false detection rate in this mask is small. The second detection mask is now used to validate the detected blotches in the first mask: only those spatially coherent blotches in $b_0(n,k)$ that have a corresponding blotch in $b_1(n,k)$ are preserved; all others are removed. The result of the above two postprocessing techniques on the frame shown in Fig. 11a is shown in Fig. 12a. In Fig. 12b the detection and false detection probabilities are shown.
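Both postprocessing steps operate on connected components of the mask; here is a compact sketch (ours, with boolean masks and scipy's connected-component labeling) that removes small blotches and applies the hysteresis validation in one pass:

```python
import numpy as np
from scipy.ndimage import label

def postprocess_blotch_mask(b0, b1, min_size):
    # b0: boolean mask from a low threshold (e.g., T = 0); b1: boolean mask
    # from a much higher threshold; min_size: minimum number of connected
    # pixels derived from the false-alarm bound of Eq. (24).
    labels, n = label(b0)
    out = np.zeros_like(b0, dtype=bool)
    for i in range(1, n + 1):
        component = labels == i
        # Keep a coherent blotch only if it is large enough (Eq. (24)) and
        # is confirmed by at least one pixel in the high-threshold mask.
        if component.sum() >= min_size and (b1 & component).any():
            out |= component
    return out
```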

3.2 Motion Vector Repair and Interpolating Corrupted Intensities

Block-based motion estimators will generally find the correct motion vectors even in the presence of blotches, provided that the blotches are small enough. The disturbing effect of blotches is usually confined to small areas of the frames. Hierarchical motion estimators will experience little influence of the blotches at the lower resolution levels. At higher resolution levels, blotches covering larger parts of (at those levels) small blocks will significantly influence the motion estimation result. If the blotch mask b(n, k) has been estimated, it is also known which estimated motion vectors are unreliable.

FIGURE 12 (a) Blotch detection mask after postprocessing; (b) correct detection versus false detections obtained for sROD with postprocessing (top curve), compared to the results from Fig. 11b.

There are two strategies in recovering motion vectors that are known to be unreliable. The first approach is to take an average of surrounding motion vectors. This process — known as motion vector interpolation or motion vector repair — can be realized using, for instance, the median or average


of the motion vectors of uncorrupted regions adjacent to the corrupted blotch. Though simple, the disadvantages of averaging are that motion vectors may be created that are not present in the uncorrupted part of the image, and that no validation of the selected motion vector on the actual frame intensities takes place.

The second — more elaborate — approach circumvents this disadvantage by validating the corrected motion vectors using intensity information directly neighboring the blotched area. As a validation criterion, the motion-compensated mean squared intensity difference can be used [11]. Candidates for the corrected motion vector can be obtained either from motion vectors taken from adjacent regions, or by motion re-estimation using a spatial window containing only uncorrupted data, such as the pixels directly bordering the blotch.

The estimation of the frame intensities labeled by the mask as being part of a blotch can be done either by a spatial or temporal interpolation, or a combination of both. We concentrate on spatio-temporal interpolation. Once the motion vector for a blotched area has been repaired, the correct temporally neighboring intensities can be obtained. In a multi-stage median interpolation filter, five interpolated results are computed using the (motion-compensated) spatio-temporal neighborhoods shown in Fig. 13. Each of the five interpolated results is computed as the median over the corresponding neighborhood $S_i$:

$$ \hat f_i(n,k) = \mathrm{median}\Big( \big\{ f(n,k-1) \mid n \in S_i^{k-1} \big\} \cup \big\{ f(n,k) \mid n \in S_i^{k} \big\} \cup \big\{ f(n,k+1) \mid n \in S_i^{k+1} \big\} \Big) \qquad (25) $$

The final result is computed as the median over the five intermediate results:

$$ \hat f(n,k) = \mathrm{median}\big( \hat f_1(n,k),\, \hat f_2(n,k),\, \hat f_3(n,k),\, \hat f_4(n,k),\, \hat f_5(n,k) \big) \qquad (26) $$
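A sketch of the interpolator (25)/(26) follows (our code; the five neighborhoods $S_i$ of Fig. 13 are passed in as offset lists, an assumed encoding, since the exact window shapes come from the figure):

```python
import numpy as np

def mmf_interpolate(prev, cur, nxt, mask, windows):
    # Eqs. (25)/(26) at the pixels flagged in `mask`. `windows` is a list of
    # five offset sets, one per neighborhood S_i of Fig. 13; each offset is
    # a (frame, dy, dx) triple with frame in {-1, 0, +1} (assumed encoding).
    frames = {-1: prev, 0: cur, 1: nxt}
    out = cur.copy()
    H, W = cur.shape
    for y, x in zip(*np.nonzero(mask)):
        partial = [np.median([frames[f][np.clip(y + dy, 0, H - 1),
                                        np.clip(x + dx, 0, W - 1)]
                              for (f, dy, dx) in win])
                   for win in windows]          # Eq. (25), one value per S_i
        out[y, x] = np.median(partial)          # Eq. (26)
    return out
```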

The multi-stage median filter does not rely on any model for the image sequence. Though simple, this is at the same time a drawback of median filters. If a model for the original image sequence can be assumed, it is possible to find statistically optimal values for the missing intensities. For the sake of completeness we mention here that if one assumes the popular Markov Random Field, the following complicated expression needs to be optimized:

$$ P\big( \hat f(n,k) \mid f(n - d(n;k,k-1), k-1),\, f(n,k),\, f(n - d(n;k,k+1), k+1) \big) \propto \exp\Big( - \sum_{m:\, b(m,k)=1} \Phi(m) \Big), $$

$$ \Phi(m) = \sum_{s \in S^k} \big( \hat f(m,k) - \hat f(s,k) \big)^2 + \lambda \Bigg( \sum_{s \in S^{k-1}} \big( \hat f(m,k) - f(n - d(n;k,k-1), k-1) \big)^2 + \sum_{s \in S^{k+1}} \big( \hat f(m,k) - f(n - d(n;k,k+1), k+1) \big)^2 \Bigg) \qquad (27) $$

The first term of $\Phi$ on the right hand side of (27) forces the interpolated intensities to be spatially smooth, while the second and third terms enforce temporal smoothness. The sets $S^{k-1}$, $S^k$ and $S^{k+1}$ denote appropriately chosen spatial windows in the frames k−1, k and k+1. The temporal smoothness is calculated along the motion trajectory using the repaired motion vectors. The optimization of (27) requires an iterative optimization technique. If a simpler 3-D autoregressive model for the image sequence is assumed, the interpolated values can be calculated by solving a set of linear equations.

Instead of interpolating the corrupted intensities, it is also possible to directly copy and paste intensities from past or future frames. The simple copy-and-paste operation instead of a full spatio-temporal data regeneration is motivated by the observation that — at least on a local and motion-compensated basis — image sequences are heavily correlated. Furthermore, straightforward interpolation is not desirable in situations where part of the information in the past and future frames itself is unreliable, for instance if it was part of a blotch itself or if it is situated in an occluded area. The objective is now to determine — for each pixel being part of a detected blotch — whether intensity information from the previous or next frame should be used. This decision procedure can again be cast into a statistical framework [27, 33]. As an illustration, Fig. 14 shows the interpolated result of the blotched frame in Fig. 11a.

FIGURE 13 Five spatio-temporal windows used to compute the partial results in Eq. (25).

3.3 Restoration in Conditions of Difficult Object Motion

Due to various types of complicated object movements [31], wrong motion vectors are sometimes extracted from the


sequence. As a result, the spatio-temporal restoration process that follows may introduce unnecessary errors that are visually more disturbing than the blotches themselves. The extracted temporal information becomes unreliable, and a source of errors by itself. This triggered research on situations in which the motion vectors cannot be used.

An initial solution to the problem was to detect areas of "pathological" motion and protect them against any restoration [31]. This solution, however, preserves the artifacts which happen to lie in those areas. To restore these artifacts too, several solutions have been proposed which discard the temporal information and use only spatial information coming from the same frame. Spatial restoration is currently the subject of intensive research [6, 8, 15, 19, 32] and has proven to be a powerful alternative to temporal or spatio-temporal restoration.

Spatial restoration algorithms can be classified mainly into two categories: smoothness-preserving ones, and texture synthesis ones. The first category comprises various inpainting approaches, which usually propagate isophotes (i.e., level lines), gradients or curvatures inside the artifact by means of variational models [5, 6, 10, 28], or approaches which reconstruct the explicit structure of objects based on edge reconnection [32]. In the second category, parametric and non-parametric approaches are employed for texture synthesis based on Markov Random Fields [8, 12, 15], Bayesian autoregressive models [26], projection onto convex sets [17], and other models [1].

As their naming suggests, texture synthesis methods do a better reconstruction of the textural content. However, they do not preserve object edges properly. This is done better with smoothness-preserving methods, which, in their turn, do not reproduce textural content. In order to combine the advantages of both methods, a third category of algorithms has appeared, comprising hybrid methods that combine structure and texture reconstruction [7, 19, 30, 32].

Figure 15 presents an example of an artificially degraded image which was restored by means of a hybrid method. The restoration consisted of a texture synthesis constrained by edge-based structure reconstruction [32]. The main steps are: (1) edge detection and feature extraction; (2) image structure reconstruction; and (3) texture synthesis. In the first step, edges are detected around the artifact, based on the contours that result from a segmentation procedure. Features are extracted for each edge, such as histograms of the luminance and gradient angles on both sides of the edge. In the second step, the missing object edges within the artifact area are recovered by pairing the edges extracted around the artifact. The pairing procedure assesses the similarity of the paired edges based on the features extracted in the previous step; on an estimate of how well one edge continues into another one (assuming, for example, local circularity); and on the spatial order of the edges with respect to each other. All these factors contribute to a global consistency score of the resulting structure, and the best scoring configuration is chosen to represent the final image structure. The edges which do not belong to any pair are considered to be T-junctions (coming from edge occlusions). Finally, in the third step, the artifact is restored by texture synthesis, confined by the structure reconstructed in the previous step.

In image sequences, this restoration method can be automatically switched on to repair single frames spatially when the extracted temporal information is not reliable. This is realized by means of a detector of complex events that supervises the output of the motion estimator. The task of the complex event detector is actually twofold: besides detecting the frame areas where the motion estimator fails, it has to discover artifacts which may lie in such areas. The latter task is performed by means of spatial algorithms, since the common artifact detectors would fail due to the unreliability of the temporal information [31, 32].

4 Vinegar Syndrome Removal

Content stored on acetate-based film rolls faces deterioration at a progressive pace. These rolls can be affected by dozens of types of film artifacts, each of them having its own properties and showing up in particular circumstances. One of the major problems of all film archives is the vinegar syndrome [32]. This syndrome appears when, in the course of their chemical breakdown, the acetate-based film bases start to release acetic acid, giving a characteristic vinegar smell. It is an irreversible process, and from a certain moment on, it becomes auto-catalytic, progressively fuelling itself in the course of time.

FIGURE 14 Blotch-corrected frame resulting from Fig. 11a.

The vinegar syndrome has various appearances. It may show up as a partial loss of colour, bright or dark tree-like branches, non-uniformly blurred images, etc. [32]. Here, we only focus on one type of vinegar syndrome, namely the partial loss of colour. This manifests itself as a localized total loss of information in some of the film dye layers (e.g., green and blue). Thanks to the sandwiched structure of the film (sketched in Fig. 16), the inner layer (red) is more protected and still preserves some of the original information. This type of vinegar syndrome may be accompanied by emulsion melting (a dissolution of the dye layers). Figure 18a shows an example of vinegar syndrome combined with emulsion melting.

The detection and correction of this type of vinegar syndrome can be performed with the normal spatio-temporal techniques, as long as it has a temporally impulsive appearance, like normal blotches. However, there are cases when the vinegar syndrome happens in areas of pathological motion, or when it appears in the same place in consecutive frames. In both cases, the information from the surrounding frames is of little use and one has to rely only on the information from the current frame, which lies in and around the artifact. Moreover, the blotch removal algorithms do not exploit the information that remains inside the artifact, in the red layer.

FIGURE 15 Constrained texture synthesis. (a) Original image, with artificial artifact surrounded by a white box; (b) Structure; (c) Original content of the artifact; (d) Restoration result.

FIGURE 16 The layered structure of a film: the anti-halation layer and the film base on the outside, followed by the red, green, and blue dye layers, each separated by gelatin.

For the detection step, one may use the fact that the loss of colour makes the artifact look very bright, mainly in the green and blue layers [32]. In order to avoid the influence of irregularities inside the artifact, a hysteresis thresholding is performed as described in Section 3.1 of this chapter, with different thresholds for each colour layer and in each of the two thresholding phases. In the end, a conditional dilation makes sure that the dark border surrounding the artifact is also added to the artifact mask.
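A minimal sketch of this detection step is given below, assuming per-channel hysteresis thresholds chosen by the user. The hysteresis rule (keep low-threshold pixels only if their connected component contains a high-threshold pixel) follows the description above; the actual thresholds and the conditional dilation of [32] are only approximated here.

```python
import numpy as np
from scipy import ndimage

def detect_pca(frame, t_high, t_low, dilate_iter=2):
    """Detect partial colour artifacts (PCAs) in an RGB frame.

    frame: float array of shape (H, W, 3), values in [0, 1].
    t_high, t_low: per-channel (green, blue) hysteresis thresholds,
    with t_low < t_high elementwise.
    """
    mask = np.zeros(frame.shape[:2], dtype=bool)
    for c, (hi, lo) in zip((1, 2), zip(t_high, t_low)):  # green, blue layers
        chan = frame[..., c]
        seed = chan > hi                 # surely-artifact pixels
        cand = chan > lo                 # possibly-artifact pixels
        # Hysteresis: keep candidate components that contain a seed pixel.
        labels, _ = ndimage.label(cand)
        keep = np.unique(labels[seed])
        mask |= np.isin(labels, keep[keep > 0])
    # Plain dilation stands in for the conditional dilation of [32]
    # that absorbs the dark border around the artifact.
    return ndimage.binary_dilation(mask, iterations=dilate_iter)
```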

As opposed to conventional spatial restoration techniques, in the restoration step under consideration one may use the information that still exists inside the artifact in the red layer [32]. This information may also be partially affected by the artifact, so a Gaussian smoothing is first performed on the red layer. The isophotes that result from this smoothing are presumed to follow the shapes of the isophotes of the original, undegraded image. These isophotes are then used to "propagate" information inside the artifact from its surrounding area. The general scheme is presented in Fig. 17.

An artifact pixel p_i with smoothed red value r_i^s will be overwritten with the non-smoothed RGB values of a pixel p_k = (r_k, g_k, b_k) lying in the neighbourhood of the artifact and representing the "closest" pixel to p_i, in terms of physical distance and colour resemblance. Pixel p_k is selected from a set of pixels representing the connected neighbourhood of pixel p_i and having values in [r_i^s − Δs, r_i^s + Δs]. p_k is found using Eq. (28):

$$k = \arg\min_j \sqrt{ \left( r_i^s - r_j^s \right)^2 + \left( \frac{d(p_i, p_j)}{d(p_i, p_j) + 1} \right)^2 } \qquad (28)$$

where d(p_i, p_j) is the physical distance between pixels p_i and p_j; the term d/(d + 1) normalizes it to values in [0, 1] so that it has the same range as the colour distance (all pixel values are in the [0, 1] interval).
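The matching rule of Eq. (28) translates directly into code. The brute-force sketch below searches, for each artifact pixel, the undamaged pixels within a square window whose smoothed red value lies within Δs of the target; the window size is an illustrative assumption, and the connectivity requirement of the neighbourhood is ignored for brevity.

```python
import numpy as np

def restore_pixel(i, j, frame, red_s, mask, delta_s, radius=15):
    """Restore one artifact pixel (i, j) following Eq. (28).

    frame: (H, W, 3) RGB in [0, 1]; red_s: Gaussian-smoothed red layer;
    mask: boolean artifact mask (True inside the artifact).
    """
    h, w = mask.shape
    target = red_s[i, j]
    best, best_cost = None, np.inf
    for y in range(max(0, i - radius), min(h, i + radius + 1)):
        for x in range(max(0, j - radius), min(w, j + radius + 1)):
            if mask[y, x]:                       # only undamaged neighbours
                continue
            if abs(red_s[y, x] - target) > delta_s:
                continue                         # outside the value band
            d = np.hypot(y - i, x - j)
            d_norm = d / (d + 1.0)               # maps distance into [0, 1)
            cost = np.sqrt((target - red_s[y, x]) ** 2 + d_norm ** 2)
            if cost < best_cost:
                best, best_cost = (y, x), cost
    if best is not None:
        frame[i, j] = frame[best]                # copy non-smoothed RGB values
```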

Figure 18 presents the recovery of portions of a film sequence that were presumed to have been completely lost. A value Δs ≈ 4% of the overall greyvalue range was used for the connected neighbourhood calculation, and a standard deviation σ = 7 for the Gaussian smoothing step. The result largely reflects the image structure that has been recovered in the smoothed red layer.

5 Intensity Flicker Correction

Intensity flicker is defined as unnatural temporal fluctuations of frame intensities that do not originate from the original scene. Intensity flicker is a spatially localized effect that occurs in regions of substantial size. Figure 19 shows three successive frames from a sequence containing flicker. A model describing the intensity flicker is the following:

$$g(n, k) = \alpha(n, k)\, f(n, k) + \beta(n, k) + w(n, k) \qquad (29)$$

Here, α(n, k) and β(n, k) are the unknown multiplicative and additive flicker parameters that locally scale the intensities of the original frame. The model includes a noise term w(n, k) that is assumed to be flicker-independent. In the absence of flicker we have α(n, k) = 1 and β(n, k) = 0. The objective of flicker correction is the estimation of the flicker parameters, followed by the inversion of Equation (29). Since flicker always affects fairly large areas of a frame in the same way, the flicker parameters α(n, k) and β(n, k) are assumed to be spatially smooth functions. Temporally, the flicker parameters in one frame may not be correlated at all with those in a subsequent frame.
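For intuition, the degradation model of Eq. (29) is straightforward to simulate. The snippet below synthesizes flicker on a grayscale frame using spatially smooth random fields for α and β; the field generation is our own illustrative choice, not part of the model.

```python
import numpy as np
from scipy import ndimage

def add_flicker(f, noise_std=0.01, seed=0):
    """Degrade a frame f (float array in [0, 1]) as g = a*f + b + w, Eq. (29)."""
    rng = np.random.default_rng(seed)
    h, w = f.shape
    # Spatially smooth flicker fields: heavily low-pass filtered random values.
    a = 1.0 + ndimage.gaussian_filter(rng.normal(0, 0.4, (h, w)), sigma=40)
    b = ndimage.gaussian_filter(rng.normal(0, 0.1, (h, w)), sigma=40)
    w_noise = rng.normal(0, noise_std, (h, w))   # flicker-independent noise
    return a * f + b + w_noise
```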

The earliest attempts to remove flicker from image sequences applied intensity histogram equalization or mean-equalization on frames. These methods do not form a general solution to the problem of intensity flicker correction because they ignore changes in scene contents, and do not appreciate that intensity flicker is a localized effect. In Section 5.1 we show how the flicker parameters can be estimated on stationary image sequences. Section 5.2 addresses the more realistic case of parameter estimation on image sequences with motion [34].

5.1 Flicker Parameter Estimation

When removing intensity flicker from an image sequence, we essentially make an estimate of the original intensities, given the observed image sequence. Note that the undoing of intensity flicker is only relevant for image sequences, since flicker is a temporal effect by definition. From a single frame, intensity flicker can neither be observed nor corrected.

FIGURE 17 Restoration scheme for the vinegar syndrome: partial colour artefact (PCA) detection on the current frame produces a PCA mask; Gaussian smoothing of the red layer produces the filtered layer G; for each PCA pixel, the closest undamaged counterpart in G is found, yielding a correspondence map from which the PCAs are restored.


If the flicker parameters were known, an estimate of the original intensity could be formed from a corrupted intensity using the following straightforward linear estimator:

$$\hat{f}(n, k) = h_1(n, k)\, g(n, k) + h_0(n, k) \qquad (30)$$

In order to obtain estimates for the coefficients h_i(n, k), the mean-squared error between f(n, k) and its estimate f̂(n, k) is minimized, yielding the following optimal solution:

$$h_0(n, k) = -\frac{1}{\alpha(n, k)} \left( \beta(n, k) - \frac{\sigma_w^2(n, k)}{\sigma_g^2(n, k)}\, E[g(n, k)] \right) \qquad (31a)$$

$$h_1(n, k) = \frac{1}{\alpha(n, k)} \cdot \frac{\sigma_g^2(n, k) - \sigma_w^2(n, k)}{\sigma_g^2(n, k)} \qquad (31b)$$

If the observed image sequence does not contain any noise, then (31) degenerates to the obvious solution

$$h_0(n, k) = -\frac{\beta(n, k)}{\alpha(n, k)}, \qquad h_1(n, k) = \frac{1}{\alpha(n, k)} \qquad (32)$$

In the extreme situation that the variance of the corrupted image sequence is equal to the noise variance, the combination of (30) and (31) shows that the estimated intensity is equal to the expected value of the original intensities E[f(n, k)].
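Given known flicker parameters and the noise variance, the estimator of Eqs. (30)-(31) amounts to a few lines. The sketch below applies it elementwise, with E[g(n, k)] and σ_g²(n, k) supplied by the caller (e.g., computed over local blocks); the argument layout is our own convenience.

```python
import numpy as np

def correct_flicker(g, alpha, beta, mean_g, var_g, var_w):
    """Linear MMSE flicker correction, Eqs. (30)-(31).

    All arguments are arrays of the frame's shape (block parameters
    expanded per pixel); var_g is assumed to be >= var_w.
    """
    h1 = (var_g - var_w) / (alpha * var_g)                 # Eq. (31b)
    h0 = -(beta - (var_w / var_g) * mean_g) / alpha        # Eq. (31a)
    return h1 * g + h0                                     # Eq. (30)
```

In the noiseless limit (var_w = 0) this reduces to Eq. (32); when var_g equals var_w, h1 vanishes and the output falls back to the local mean, as noted above.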

In practice, the true values of the intensity-flicker parameters α(n, k) and β(n, k) are unknown and need to be estimated from the corrupted image sequence itself. Since the flicker parameters are spatially smooth functions, we assume that they are locally constant:

$$\alpha(n, k) = \alpha_m(k), \quad \beta(n, k) = \beta_m(k), \qquad \forall\, n \in S_m \qquad (33)$$

where S_m indicates a small frame region. This region can, in principle, be arbitrarily shaped, but in practice rectangular blocks are chosen. By computing the averages and variances of both sides of Equation (29), the following analytical expressions for the estimates of α_m(k) and β_m(k) can be obtained:

$$\hat{\alpha}_m(k) = \sqrt{ \frac{\sigma_g^2(n, k) - \sigma_w^2(n, k)}{\sigma_f^2(n, k)} }, \qquad \hat{\beta}_m(k) = E[g(n, k)] - \hat{\alpha}_m(k)\, E[f(n, k)] \qquad (34)$$

FIGURE 18 Restoration example (sequence courtesy of RTP, Radiotelevisão Portuguesa). (a) Original frame, with artifact surrounded by a white box; (b) Restored frame.

FIGURE 19 Three successive frames that contain intensity flicker.

To solve (34) in a practical situation, the mean and variance of g(n, k) are estimated within the region S_m. The only quantities that remain to be estimated are the mean and variance of the original image sequence f(n, k). If we assume that the flicker correction is done frame-by-frame, we can estimate these values from the previous corrected frame k − 1 in the temporally corresponding frame region S_m:

$$E[f(n, k)] \approx \frac{1}{|S_m|} \sum_{m \in S_m} \hat{f}(m, k-1), \qquad \sigma_f^2(n, k) \approx \frac{1}{|S_m|} \sum_{m \in S_m} \left( \hat{f}(m, k-1) - E[f(n, k)] \right)^2 \qquad (35)$$

There are situations in which the above estimates are unreliable. The first case is that of uniform intensity areas: for any original image intensity in a uniform region, there is an infinite number of combinations of α_m(k) and β_m(k) that lead to the observed intensity. The estimated flicker parameters are also potentially unreliable because the noise w(n, k) is ignored in (34) and (35). The reliability of the estimated flicker parameters can be assessed by the following measure:

$$W_m(k) = \begin{cases} 0 & \text{if } \sigma_g^2(m, k) < T \\[4pt] \sqrt{\dfrac{\sigma_g^2(m, k) - T}{T}} & \text{otherwise} \end{cases} \qquad (36)$$

The threshold T depends on the noise variance. Large values of W_m(k) indicate reliable estimates, while for the most unreliable estimates W_m(k) = 0.
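The block-wise estimation of Eqs. (34)-(36) can be sketched as follows, with the previous corrected frame supplying the statistics of Eq. (35); the block size and the fallback to α = 1, β = 0 for degenerate blocks are illustrative assumptions.

```python
import numpy as np

def estimate_flicker_params(g, f_prev, var_w, T, block=30):
    """Estimate alpha_m(k), beta_m(k) and reliability W_m(k) per block.

    g: current corrupted frame; f_prev: previous corrected frame;
    var_w: noise variance; T: reliability threshold of Eq. (36).
    """
    h, w = g.shape
    rows, cols = h // block, w // block
    alpha = np.ones((rows, cols))     # no-flicker defaults (alpha=1, beta=0)
    beta = np.zeros((rows, cols))
    W = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            sl = np.s_[r*block:(r+1)*block, c*block:(c+1)*block]
            mg, vg = g[sl].mean(), g[sl].var()
            mf, vf = f_prev[sl].mean(), f_prev[sl].var()   # Eq. (35)
            if vg > var_w and vf > 0:                      # Eq. (34)
                alpha[r, c] = np.sqrt((vg - var_w) / vf)
                beta[r, c] = mg - alpha[r, c] * mf
            if vg >= T:                                    # Eq. (36)
                W[r, c] = np.sqrt((vg - T) / T)
    return alpha, beta, W
```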

5.2 Estimation on Sequences with Motion

The results (34) and (35) assume that the image sequence intensities do not change significantly over time. Clearly this is an incorrect assumption if motion occurs. The estimation of motion on image sequences that contain flicker is, however, problematic, because virtually all motion estimators are based on the constant luminance constraint; intensity flicker violates this assumption heavily. The only motion that can be estimated with sufficient reliability is global motion, such as camera panning or zooming. In the following we assume that in the evaluation of (35) and (36) possible global motion has been compensated for. At that point we still need to detect areas with remaining, uncompensated motion, and areas that were previously occluded. For both of these cases the approximation in (35) leads to incorrect estimates, which in turn lead to visible artifacts in the corrected frames.

There are various approaches for detecting local motion. One possibility is the detection of large differences between the current and the previous (corrected) frame: if local motion occurs, the frame differences will be large. Another possibility is to compare the estimated intensity-flicker parameters to threshold values. If disagreeing temporal information has been used for computing (35), we will locally find flicker parameters that do not correspond with their spatial neighbors or with the a priori expectations of the range of the flicker parameters. An outlier detector can be used to localize these incorrectly estimated parameters.

For frame regions S_m where the flicker parameters could not be estimated reliably from the observed image sequence, the parameters are estimated on the basis of the results in spatially neighboring regions. At the same time, for the regions in which the flicker parameters could be estimated, a smoothing post-processing step needs to be applied to avoid sudden parameter changes that lead to visible artifacts in the corrected image sequence. Such an interpolation and smoothing post-processing step may exploit the reliability of the estimated parameters, as for instance given by Equation (36). Furthermore, in those frame regions where insufficient information was available for reliably estimating the flicker parameters, the flicker correction should switch itself off. Therefore, smoothed and interpolated parameters are biased toward α_m(k) = 1 and β_m(k) = 0.

In Figure 20, an example of smoothing and interpolating the estimated flicker parameter α_m(k) is shown as a 2-D matrix [34]. Each entry in this matrix corresponds to a 30 × 30 pixel region S_m in the frame shown in Fig. 19. The interpolation technique used is successive over-relaxation (SOR), a well-known iterative interpolation technique based on repeated low-pass filtering. Starting off with an initial estimate α_m^0(k) found by solving (34), at each iteration a new estimate is formed as follows:

$$r_m^{\,i+1}(k) = W_m(k) \left( \alpha_m^0(k) - \alpha_m^i(k) \right) + \lambda\, C\!\left( \alpha_m^i(k) \right), \qquad \alpha_m^{\,i+1}(k) = \alpha_m^i(k) + \omega\, \frac{r_m^{\,i+1}(k)}{W_m(k) + \lambda} \qquad (37)$$

Here W_m(k) is the reliability measure, computed by (36), and C(α_m(k)) is a function that measures the spatial smoothness of the solution α_m(k). The convergence of the iteration (37) is determined by the parameter ω, while the smoothness is determined by the parameter λ. For estimates that have a high reliability, the initial estimates α_m^0(k) are emphasized, while for initial estimates that are deemed less reliable, i.e., λ ≫ W_m(k), the emphasis is on achieving a smooth solution. Other smoothing and interpolation techniques include dilation and 2-D polynomial interpolation. The smoothing and interpolation needs to be applied not only to the multiplicative parameter α_m(k), but also to the additive parameter β_m(k).
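A compact sketch of the iteration (37) on the block-wise parameter matrix follows. Here C is taken to be the deviation of the four-neighbour average from the current value, a common choice of smoothness term, and the update is applied Jacobi-style for brevity (classic SOR sweeps in place); neither detail is necessarily that of [34].

```python
import numpy as np

def sor_smooth(alpha0, W, lam=1.0, omega=1.2, n_iter=100):
    """Smooth/interpolate flicker parameters following Eq. (37).

    alpha0: initial block estimates from Eq. (34); W: reliability, Eq. (36).
    Unreliable blocks (W == 0) are pulled toward their neighbours.
    """
    a = alpha0.copy()
    for _ in range(n_iter):
        # Four-neighbour average (edges replicated) as the smoothness term C.
        pad = np.pad(a, 1, mode='edge')
        avg = 0.25 * (pad[:-2, 1:-1] + pad[2:, 1:-1] +
                      pad[1:-1, :-2] + pad[1:-1, 2:])
        r = W * (alpha0 - a) + lam * (avg - a)   # residual of Eq. (37)
        a += omega * r / (W + lam)               # over-relaxed update
    return a
```

The same routine would be applied to the additive parameter matrix, with the unreliable entries of alpha0 and beta0 preset to 1 and 0, respectively, so that the correction switches itself off where no information is available.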

As an example, Fig. 21 shows the mean and variance as a function of the frame index k of the corrupted and corrected image sequence "Tunnel". Clearly the temporal fluctuations of the mean and variance have been greatly reduced, indicating the suppression of flicker artifacts. An assessment of the resulting visual quality, as with most results of video processing algorithms, has been done by actually viewing the corrected image sequences. Although the original sequence cannot be recovered, the flicker-corrected sequences have a much higher visual quality and are virtually without any remaining visible flicker.

6 Kinescope Moiré Removal

The moiré phenomenon is an interference of (semi-)periodic structures which gives rise to patterns that did not exist in the original signal. It appears in scanned or printed images [2], in video sequences [16, 37], etc. Moiré phenomena are caused in general by the sampling process (due to aliasing) or by a superposition of structures. Kinescope moiré was caused by the process of transferring old TV archives to magnetic tapes. In the early times of television, broadcast programs were stored on films that were recorded directly from a TV monitor. The scan lines of the monitor were thus recorded as well. The transfer from film to magnetic tape took place years later by means of a Telecine device. This machine scans the film with a flying spot that moves in horizontal lines. These scan lines and those of the initial TV monitor do not always coincide, giving rise to an aliasing phenomenon. The resulting moiré shows up as a beating pattern of horizontal lines (see Fig. 23a). Because the initial TV monitors were not always flat, some curvilinear distortion also takes place in the recorded signal, which further complicates the modeling and removal of the moiré.

Currently, the most successful approaches work in the spectral domain. The algorithm presented in [37] replaces peaks of vertical frequency in the Fourier domain with a noise floor, as explained below. The algorithm steps are as follows (a code sketch is given after the list):

1. After applying the 2D Fourier transform, a range of vertical frequencies around the origin in the spectral domain, [−Ω, +Ω], is selected. This horizontal band of the spectrum will be protected against any changes.

FIGURE 20 (a) Estimated intensity flicker parameter α_m(k) using (34) and local motion detection; (b) Smoothed and interpolated α_m(k) using SOR.

FIGURE 21 (a) Mean of the corrupted and corrected image sequence; (b) Variance of the corrupted and corrected image sequence, both plotted against the frame index.



2. A binary mask B is created that will indicate which frequencies will be changed:

$$B(\omega_h, \omega_v) = \begin{cases} 1 & \text{if } |F(\omega_h, \omega_v)| > \varepsilon\,\Theta \ \text{ and } \ |\omega_v| > \Omega \\ 0 & \text{otherwise} \end{cases} \qquad (38)$$

where |F(ω_h, ω_v)| represents the spectrum magnitude at frequency (ω_h, ω_v), and Θ = (max|F(ω_h, ω_v)| − min|F(ω_h, ω_v)|)/2 + min|F(ω_h, ω_v)|.

3. The magnitudes of the spectral components selected with mask B are replaced with a noise floor value, as shown in Fig. 22. This value represents the median magnitude of the frequencies lying outside the band defined in step 1.

4. The resulting magnitude is combined with the original phase extracted in step 1, and the inverse 2D Fourier transform is applied in order to obtain the restored image.
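The four steps translate into the sketch below, using NumPy's FFT. The centred spectrum layout and the restriction to a grayscale image are implementation conveniences; the parameter names mirror Eq. (38), but the details are not necessarily those of [37].

```python
import numpy as np

def remove_moire(img, eps=0.1, omega_band=25):
    """Spectral moiré suppression after [37] (a simplified sketch).

    img: 2-D grayscale image. eps: relative magnitude threshold;
    omega_band: half-width (in frequency bins) of the protected
    vertical-frequency band around the origin.
    """
    F = np.fft.fftshift(np.fft.fft2(img))          # step 1: centred spectrum
    mag, phase = np.abs(F), np.angle(F)
    h, _ = img.shape
    wv = np.abs(np.arange(h) - h // 2)[:, None]    # |vertical frequency| per row
    protected = np.broadcast_to(wv <= omega_band, mag.shape)
    theta = (mag.max() - mag.min()) / 2 + mag.min()
    B = (mag > eps * theta) & ~protected           # step 2: mask, Eq. (38)
    noise_floor = np.median(mag[~protected])       # step 3: replacement value
    mag = np.where(B, noise_floor, mag)
    F_new = mag * np.exp(1j * phase)               # step 4: recombine phase
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_new)))
```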

Figure 23 presents the results of the moiré removal algorithm. The parameter values used in this case were ε = 0.1, Θ = 94.4, and Ω = 25. The moiré pattern was largely removed. Some ringing effect still persists, due to the discontinuities inserted in the Fourier domain by the magnitude replacement operation shown in Fig. 22b.

The kinescope moiré phenomenon remains an open research track. The nonlinear distortions associated with it (in particular, the curved geometry) are difficult to model. Until now, the algorithms working in the frequency domain have been more successful than those working in the spatial domain. More insight into the moiré phenomenon is expected to be gained from its temporal analysis.

7 Concluding Remarks

This chapter has described methods for enhancing and restoring corrupted video and film sequences. The material offered in this chapter is complementary to the spatial enhancement and restoration techniques described in other chapters of the Handbook. For this reason, the algorithmic details concentrated mostly on the temporal processing aspects of image sequences, or on solutions for cases where the temporal information is not usable. Although the focus has been on the detection and removal of a limited number of impairments, namely noise, blotches, vinegar syndrome, flicker, and moiré, the approaches and tools described in this chapter are of a more general nature, and they can be used for developing enhancement and restoration methods for other types of degradation.

FIGURE 22 An example of spectrum magnitude at zero horizontal frequency.¹ (a) Degraded image; (b) Restored image.

FIGURE 23 (a) Image affected by moiré.² (b) Restored image.

¹Images used with permission from the authors of [37].
²Images used with permission from the authors of [37]. Courtesy of INA (Institut National de l'Audiovisuel) and BBC (British Broadcasting Corporation).

8 References

[1] S.T. Acton, D.P. Mukherjee, J.P. Havlicek, A.C. Bovik, "Oriented texture completion by AM-FM reaction-diffusion", IEEE Transactions on Image Processing, Vol. 10(6), 2001.

[2] I. Amidror, The Theory of the Moiré Phenomenon, Kluwer Academic Publishers, 2000.

[3] G.R. Arce, "Multistage order statistic filters for image sequence processing", IEEE Transactions on Signal Processing, Vol. 39(5), pp. 1147–1163, 1991.

[4] N. Balakrishnan, C.R. Rao (Editors), Handbook of Statistics 16: Order Statistics: Theory & Methods, North-Holland, 1998.

[5] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, J. Verdera, "Filling-in by joint interpolation of vector fields and gray levels", IEEE Transactions on Image Processing, Vol. 10(8), August 2001.

[6] M. Bertalmio, G. Sapiro, V. Caselles, C. Ballester, "Image inpainting", Proceedings of SIGGRAPH 2000, New Orleans, USA, July 2000.

[7] M. Bertalmio, L. Vese, G. Sapiro, S. Osher, "Simultaneous structure and texture image inpainting", IEEE Transactions on Image Processing, Vol. 12(8), August 2003.

[8] R. Bornard, Probabilistic Approaches for the Digital Restoration of Television Archives, Ph.D. Thesis, Ecole Centrale Paris, France, 2002.

[9] J.C. Brailean, R.P. Kleihorst, S. Efstratiadis, A.K. Katsaggelos, R.L. Lagendijk, "Noise reduction filters for dynamic image sequences: a review", Proceedings of the IEEE, Vol. 83(9), pp. 1272–1291, September 1995.

[10] T. Chan, J. Shen, "Mathematical models for local non-texture inpainting", SIAM Journal on Applied Mathematics, Vol. 62(3), pp. 1019–1043, 2001.

[11] M.J. Chen, L.G. Chen, R. Weng, "Error concealment of lost motion vectors with overlapped motion compensation", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7(3), 1997.

[12] A. Criminisi, P. Perez, K. Toyama, "Object removal by exemplar-based inpainting", Proceedings of CVPR 2003.

[13] D.L. Donoho, I.M. Johnstone, "Ideal spatial adaptation via wavelet shrinkage", Biometrika, Vol. 81, pp. 425–455, 1994.

[14] E. Dubois, S. Sabri, "Noise reduction using motion compensated temporal filtering", IEEE Transactions on Communications, Vol. 32, pp. 826–831, 1984.

[15] A.A. Efros, T.K. Leung, "Texture synthesis by non-parametric sampling", Proceedings of ICCV 1999, Corfu, Greece, September 1999.

[16] C. Hentschel, "Video moiré cancellation filter for high-resolution CRTs", IEEE Transactions on Consumer Electronics, Vol. 47(1), pp. 16–24, 2001.

[17] A.N. Hirani, T. Totsuka, "Combining frequency and spatial domain information for fast interactive image noise removal", ACM SIGGRAPH, pp. 269–276, 1996.

[18] T.S. Huang (ed.), Image Sequence Analysis, Springer Verlag, Berlin, 1991.

[19] J. Jia, C.-K. Tang, "Inference of segmented color and texture description by tensor voting", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 26(6), pp. 771–786, June 2004.

[20] J. Kim, J.W. Woods, "Spatio-temporal adaptive 3-D Kalman filter for video", IEEE Transactions on Image Processing, Vol. 6(3), pp. 414–424, March 1997.

[21] R.P. Kleihorst, R.L. Lagendijk, J. Biemond, "Noise reduction of image sequences using motion compensation and signal decomposition", IEEE Transactions on Image Processing, Vol. 4(3), pp. 274–284, 1995.

[22] A.C. Kokaram, R.D. Morris, W.J. Fitzgerald, P.J.W. Rayner, "Detection of missing data in image sequences", IEEE Transactions on Image Processing, Vol. 4(11), 1995.

[23] A.C. Kokaram, R.D. Morris, W.J. Fitzgerald, P.J.W. Rayner, "Interpolation of missing data in image sequences", IEEE Transactions on Image Processing, Vol. 4(11), 1995.

[24] A.C. Kokaram, Motion Picture Restoration: Digital Algorithms for Artifact Suppression in Degraded Motion Picture Film and Video, Springer Verlag, 1998.

[25] A.C. Kokaram, S. Godsill, "Joint noise reduction, motion estimation, missing data reconstruction, and model parameter estimation for degraded motion pictures", Proceedings of SPIE, San Diego, 1998.

[26] A.C. Kokaram, "Parametric texture synthesis using stochastic sampling", Proceedings of ICIP 2002 (IEEE), New York, USA, September 22–25, 2002.

[27] A.C. Kokaram, "On missing data treatment for degraded video and film archives: a survey and a new Bayesian approach", IEEE Transactions on Image Processing, Vol. 13(3), pp. 397–415, March 2004.

[28] S. Masnou, "Disocclusion: a variational approach using level lines", IEEE Transactions on Image Processing, Vol. 11(2), February 2002.

[29] M.K. Ozkan, M.I. Sezan, A.M. Tekalp, "Adaptive motion-compensated filtering of noisy image sequences", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3(4), pp. 277–290, 1993.

[30] S. Rane, M. Bertalmio, G. Sapiro, "Structure and texture filling-in of missing image blocks for wireless transmission and compression applications", IEEE Transactions on Image Processing, 2002.

[31] A. Rareş, M.J.T. Reinders, J. Biemond, "Statistical analysis of pathological motion areas", IEE Seminar on Digital Restoration of Film and Video Archives, London, UK, January 16, 2001.

[32] A. Rareş, Archived Film Analysis and Restoration, Ph.D. Thesis, Delft University of Technology, The Netherlands, 2004.

[33] P.M.B. van Roosmalen, Restoration of Archived Film and Video, Ph.D. Thesis, Delft University of Technology, The Netherlands, 1999.

[34] P.M.B. van Roosmalen, R.L. Lagendijk, J. Biemond, "Correction of intensity flicker in old film sequences", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9(7), pp. 1013–1019, 1999.

[35] P.M.B. van Roosmalen, R.L. Lagendijk, J. Biemond, "Embedded coring in MPEG video compression", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 3, pp. 205–211, 2002.

[36] M.I. Sezan, R.L. Lagendijk (eds.), Motion Analysis and Image Sequence Processing, Kluwer Academic Publishers, Boston, 1993.

[37] D.N. Sidorov, A.C. Kokaram, "Suppression of moiré patterns via spectral analysis", in Visual Communications and Image Processing 2002, C.-C. Jay Kuo (ed.), Proceedings of SPIE, Vol. 4671, pp. 895–906, 2002.

[38] E.P. Simoncelli, W.T. Freeman, E.H. Adelson, D.J. Heeger, "Shiftable multiscale transforms", IEEE Transactions on Information Theory, Vol. 38(2), pp. 587–607, 1992.

[39] E.P. Simoncelli, "Bayesian denoising of visual images in the wavelet domain", in Bayesian Inference in Wavelet Based Models, P. Muller and B. Vidakovic (eds.), Springer-Verlag, Lecture Notes in Statistics 141, 1999.

[40] C. Stiller, J. Konrad, "Estimating motion in image sequences", IEEE Signal Processing Magazine, Vol. 16(4), pp. 70–91, July 1999.

[41] A.M. Tekalp, Digital Video Processing, Prentice Hall, Upper Saddle River, NJ, 1995.

[42] J.W. Woods, J. Kim, "Motion-compensated spatio-temporal Kalman filter", in Motion Analysis and Image Sequence Processing, M.I. Sezan and R.L. Lagendijk (eds.), Kluwer Academic Publishers, Boston, 1993.
