Temporal filtering for restoration of wavelet-compressed motion imagery




Signal Processing: Image Communication 19 (2004) 701–721. doi:10.1016/j.image.2004.06.001


Mark A. Robertson*

Air Force Research Laboratory/IFEC, 525 Brooks Rd., Rome, NY 13441-4505, USA

Received 17 March 2004

*Correspondence in regard to this article may be directed to the author by post at the address listed above.

Abstract

Temporal filtering of motion imagery can alleviate the effects of noise and artifacts in the data by incorporating observations of the imagery data from several distinct frames. If the noise that is expected to occur in the data is well-modeled by independent and identically distributed (IID) Gaussian noise, then straightforward algorithms can be designed that filter along motion trajectories in an optimal fashion. This paper addresses the restoration of motion imagery that has been compressed by scalar quantization of the data's discrete wavelet transform coefficients. Noise due to compression in such situations is neither independent nor identically distributed, and thus straightforward filters designed for the IID case are suboptimal. This paper provides a statistical characterization of the quantization error and shows how the improved noise modeling can be used in temporal filtering to improve visual quality of the decompressed motion imagery. Example restoration results include the cases where the data have been compressed by quantization of two- and three-dimensional wavelet transform coefficients. Although not developed in this work, the noise model is also directly applicable to other restoration algorithms that incorporate information from other time instants, such as super-resolution.

Published by Elsevier B.V.

Keywords: Temporal filtering; Quantization noise; Compression error; Motion imagery restoration; Discrete wavelet transform

1. Introduction

As lossy imagery compression techniques continue to use discrete wavelet transforms (DWT), algorithms that process the data need to become more aware of peculiarities in the compression that may affect processing performance. When one considers the capabilities of wavelet-based image and video compression techniques [13,24,26] and the establishment of international standards [9,10], it is reasonable to expect much of the current and future imagery and motion imagery data to be compressed by DWT-based systems. More often than not, especially for the case of motion imagery sequences, the data are compressed in a lossy fashion due to bandwidth limitations. By performing lossy compression on the data one intentionally introduces an error in the reconstructed imagery in exchange for decreased bandwidth.


Although intentionally introduced, this error (or noise) is unwanted; in many applications it may be desirable to try to remove the noise, or at least alleviate it so that the visual quality is improved.

Estimation of pixel values based on noisy observations has been studied in considerable detail in the image restoration community. To reduce the noise at a given pixel, filters can incorporate information from other pixels in some manner to provide optimal pixel estimates, where optimality is defined according to some criterion, and also depends on the accuracy of the formulation's assumptions. Two-dimensional spatial filtering assumes that neighboring pixels are correlated with the pixel to be estimated. When one considers motion imagery, the additional temporal dimension allows filters to incorporate pixel data at other time instants. Pixel observations in surrounding frames are highly correlated with the pixel to be estimated, and filters that exploit these correlations can outperform pure spatial filtering. Motion-compensated temporal filters perform filtering along motion trajectories; thus if an object moves from one location in one frame to another location in a subsequent frame, the temporal filter must have knowledge of the motion so that it may properly filter along the temporal path of the object. Although not discussed in detail here, it is interesting to note that such ideas have been used in the super-resolution literature (for example, in [25]), where the goal is to improve the spatial resolution by using information from surrounding frames; here, however, the goal is to improve the quality of a frame without changing resolution.

Brailean et al. [2] present a thorough literature survey of temporal filtering in motion imagery, where they categorize algorithms into one of four classes: (1) non-motion-compensated spatio-temporal; (2) motion-compensated spatio-temporal; (3) non-motion-compensated temporal; and (4) motion-compensated temporal. According to such a classification, the filters to be presented in this paper belong to the fourth category, i.e., motion-compensated temporal filters. As will be mentioned later, extension of these filters to spatio-temporal (i.e., to include spatial smoothing as well as temporal smoothing) is straightforward. For a thorough review of temporal filters in motion imagery through about 1995, see the survey paper cited above. Several examples of more recent work can be found in [6,8,19]. Perhaps more relevant to the work here are those algorithms that apply temporal filtering to compressed motion imagery. For compressed image sequences, often the error due to lossy compression forms a significant, and even dominant, component of the overall noise observed in the decoded sequence. In such cases, the objective of temporal filtering is to alleviate the artifacts (or noise) introduced by compression. Several examples of temporal filtering in this manner for schemes based on the discrete cosine transform (DCT) can be found in the literature [5,22,29,34]. Although not equivalent to filtering as performed here, super-resolution reconstruction of compressed image sequences does share many similar properties; several examples, again for DCT-based compression, include those found in [4,7,17].

The main contribution of this work to the restoration community lies in its precise characterization of the compression error induced by quantization of the data's wavelet-domain coefficients. The following section derives a model for this compression error, the main result of which is an error covariance matrix that can be used in various forms and with various distributions for temporal filtering of compressed motion imagery. Section 3 presents example predictions of quantization noise behavior based on the noise model of Section 2, including discussions for both 2D and 3D wavelet transforms, as might be used in Motion JPEG2000 [10] or 3D SPIHT [13] compression. These predictions provide motivation and intuitive justification for one of the central themes of this paper, namely that straightforward temporal filters, as one might design for data corrupted by independently distributed noise, are undoubtedly suboptimal when the data have been compressed in a lossy fashion. Section 4 uses the noise model developed in Section 2 to design temporal filters that can be used to provide both quantitative and qualitative improvements relative to filters based on the simpler but common assumption of independently distributed noise. Experimental results of this section show a clear benefit of proper compression noise characterization compared to mean, weighted mean, median, and weighted median temporal filtering. Section 5 concludes the paper.

2. Quantization noise

Compression noise has been studied thoroughly for the case of the discrete cosine transform [23], which found and continues to find extensive use in compression of images and video. However, wavelet compression noise has not been studied in as much detail. Woods and Naveen [32] considered the compression distortion for purposes of bit allocation, where their primary concern was analyzing the overall noise energy as a sum of each subband's noise energy. While considering the error from such a collective point of view is quite appropriate for the application of [32], it is not necessarily appropriate for modeling pixel-to-pixel compression distortion. To see the limitations of such a global point of view, consider the image of Fig. 1, which shows the mean-squared error averaged over 1000 images of size 128 × 128 that were compressed according to the wavelet-based JPEG 2000 standard [9]. Bright pixels in the figure correspond to higher mean-square errors at those pixel positions. An obvious conclusion from the image is that the variances of the errors are changing according to pixel location, i.e., the pixel errors are not identically distributed.

Fig. 1. Mean-squared error for 1000 images of size 128 × 128 compressed according to JPEG-2000 at 10:1 compression ratio, with five levels of wavelet decomposition.

Analysis of the spatially varying variance evident in Fig. 1 has been conducted in the area of $L_\infty$-constrained image compression [11], where the objective is to limit the maximum pixel error (i.e., a local approach) rather than an overall global average error. However, not only are the errors induced by wavelet quantization distributed differently, but they are also correlated. Such error correlations have been observed experimentally for a situation similar to that shown in Fig. 1, where significant correlations between pixel errors were reported [20]. Together with the discussion of the previous paragraph, these results suggest that the pixel errors introduced by quantization of wavelet coefficients are neither independent nor identically distributed, and hence making an IID assumption will lead to suboptimal algorithms.

Error patterns such as that shown in Fig. 1 are readily evident when applying a three-dimensional wavelet transform to image volumes. However, in this case there are peaks and valleys in the mean-squared error in the two spatial dimensions as well as the third dimension. Examples for the three-dimensional case are given in later sections.

Without loss of generality, we represent the pixel-domain image data by the length-$N$ vector $\mathbf{z}$, which is formed by stacking the columns from either a two-dimensional image or the images of a three-dimensional imagery volume. For notational convenience it will be assumed that $\mathbf{z}$ represents a single $W \times W$ image; generalizations to non-square images and motion imagery are straightforward. The multiresolution DWT being employed is represented by the matrix $H$ of size $N \times N$, which can be either the two- or three-dimensional DWT, depending on the situation. The wavelet coefficients are given as $\mathbf{y} = H\mathbf{z}$, which are quantized to $\mathbf{y}_q = H\mathbf{z} + \mathbf{e}_y$, where $\mathbf{e}_y$ represents the error due to quantization of the wavelet coefficients. Upon application of the inverse wavelet transform, the reconstructed image becomes $\mathbf{z}_q = \mathbf{z} + \mathbf{e}_z$, where the spatial-domain quantization error is $\mathbf{e}_z = H^{-1}\mathbf{e}_y$ and $H^{-1}$ is the inverse DWT.
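To make the notation concrete, the sketch below simulates this pipeline (forward DWT, dead-zone scalar quantization, inverse DWT) and returns the pixel-domain error $\mathbf{e}_z$ for one image. It is a minimal illustration rather than the paper's implementation; it assumes the PyWavelets library, whose 'bior4.4' filters correspond to the 9/7 biorthogonal wavelet used later in the paper, and periodized boundary handling so that the transform is non-expansive.

```python
import numpy as np
import pywt

def deadzone_quantize(c, step):
    """Uniform dead-zone scalar quantizer: the zero bin spans [-step, step],
    and non-zero bins are reconstructed at their midpoints."""
    q = np.sign(c) * np.floor(np.abs(c) / step)   # quantizer indices
    return np.sign(q) * (np.abs(q) + 0.5) * step  # dequantized values (0 in the dead zone)

def simulate_quantization_error(z, step, wavelet="bior4.4", levels=4):
    """Pixel-domain quantization error e_z = H^{-1} e_y for a single image z."""
    coeffs = pywt.wavedec2(z, wavelet, mode="periodization", level=levels)
    y, slices = pywt.coeffs_to_array(coeffs)              # y = Hz
    y_q = deadzone_quantize(y, step)                      # y_q = Hz + e_y
    z_q = pywt.waverec2(pywt.array_to_coeffs(y_q, slices, output_format="wavedec2"),
                        wavelet, mode="periodization")    # z_q = z + e_z
    return z_q - z

rng = np.random.default_rng(0)
z = rng.standard_normal((128, 128))                       # stand-in for a natural image
e_z = simulate_quantization_error(z, step=0.5)
print("pixel-domain error variance:", e_z.var())
```

Averaging `e_z**2` over many test images would reproduce a spatial error-variance map of the kind shown in Fig. 1.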

An alternative but equivalent way of looking at an image is in terms of the basis images of the inverse DWT,

$$\mathbf{z} = \sum_{u,v} \mathbf{h}^{-1}_{u,v}\, Y[u,v], \qquad (1)$$

where $Y[u,v]$ is the $(u,v)$th element of the DWT decomposition represented by $\mathbf{y}$, and $\mathbf{h}^{-1}_{u,v}$ is the basis image corresponding to the $(u,v)$th wavelet coefficient, equivalent to the $(vW+u)$th column of $H^{-1}$. In a similar manner, the quantization error can be written as

$$\mathbf{e}_z = \sum_{u,v} \mathbf{h}^{-1}_{u,v}\, E_Y[u,v]. \qquad (2)$$

Watson et al. [30] base much of their quantization noise perceptibility study on the error terms $\mathbf{h}^{-1}_{u,v} E_Y[u,v]$.

While the quantization error is a deterministic function of an input image, in many practical applications once the quantized signal $\mathbf{y}_q$ is calculated, the clean signal $\mathbf{y}$ is discarded, and thus explicit information about the quantization error is lost. A commonly used theoretical tool for modeling the error signal is to treat it as a random quantity [31]. Treating the error as random provides an understanding of how the error behaves, including how much the error will vary at different pixel locations and the correlations between the errors. Such understanding further provides the theoretical framework for formulating effective schemes for alleviating the error. This statistically based model of the error signal is referred to as a quantization noise model, since the error signal represents unwanted information in the resulting image representation. In the context of DCT-based image compression, there are numerous cases that treat the compression error as a random quantity; some do so in order to analyze visibility [18] or characteristics [36] of the error, while others do so in order to formulate algorithms that attempt to remove the noise [14,15,35]. This work is interested in both the characterization and the alleviation of the compression noise.

The original image data $\mathbf{z}$ has a covariance matrix of $K_z$, which results in a covariance matrix for the wavelet coefficients of $K_y = H K_z H^t$. For natural images, it has been observed that many discrete wavelet transforms provide transform coefficients that are approximately uncorrelated.¹ Here, it is assumed that the coefficients are approximately uncorrelated, allowing the simplifying approximation that $K_y$ is diagonal. For relatively high-rate situations (rather small quantization bin sizes), it is well established that the quantization errors are also uncorrelated [11,27]. For lower-rate situations it is not immediately obvious that the quantization errors are uncorrelated, although empirical evidence has supported such an assumption: we have performed simulations with numerous test images (frames from the standard video test sequences football, mobile, bike, garden, and tennis at resolution 352 × 240), and quantization errors for the wavelet coefficients show no noticeable covariance with other coefficients. Such observations have remained consistent when tested with a variety of types of wavelet transforms (i.e., different filters; all filters tested were linear phase), and under a variety of quantization severities. However, one should remember that quantization error statistics are dependent on the statistics of the data being quantized, and it is possible that for severe quantization some types of imagery may yield wavelet-domain quantization errors that deviate from these observations.

¹ Although approximately uncorrelated, the transform coefficients are not independent [3]. An interesting area of future investigation for the work of this paper is consideration of the well-known correlations in the magnitudes of the wavelet coefficients.

For both low- and high-rate cases we assume that the quantization errors are approximately uncorrelated, which allows the approximation that the covariance matrix of $\mathbf{e}_y$, $K_{e_y}$, is diagonal and consists of the wavelet-domain quantization error variances for each coefficient.

Given $K_{e_y}$, the covariance of the quantization error in the spatial domain can then be found as

$$K_{e_z} = H^{-1} K_{e_y} H^{-t} = H^{-1}\big[H^{-1} K_{e_y}\big]^t, \qquad (3)$$
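For small images, Eq. (3) can be evaluated explicitly by building the columns of $H^{-1}$, i.e., the basis images $\mathbf{h}^{-1}_{u,v}$, one unit coefficient at a time. The following brute-force sketch assumes PyWavelets and, for the diagonal of $K_{e_y}$, the high-rate $\Delta^2/12$ model introduced at the end of this section; at realistic image sizes one works with the transform as an operator rather than an explicit $N \times N$ matrix.

```python
import numpy as np
import pywt

def inverse_dwt_matrix(shape, wavelet="bior4.4", levels=2):
    """Columns of H^{-1}: the inverse DWT applied to unit coefficient arrays
    (the basis images h^{-1}_{u,v}, raveled into columns)."""
    n = shape[0] * shape[1]
    template = pywt.wavedec2(np.zeros(shape), wavelet, mode="periodization", level=levels)
    _, slices = pywt.coeffs_to_array(template)
    Hinv = np.empty((n, n))
    for i in range(n):
        y = np.zeros(n)
        y[i] = 1.0                                        # unit error in coefficient i
        coeffs = pywt.array_to_coeffs(y.reshape(shape), slices, output_format="wavedec2")
        Hinv[:, i] = pywt.waverec2(coeffs, wavelet, mode="periodization").ravel()
    return Hinv

shape = (16, 16)                                          # small enough for explicit matrices
Hinv = inverse_dwt_matrix(shape)
var_y = np.full(shape[0] * shape[1], 0.5**2 / 12.0)       # high-rate model: Delta^2/12 each
K_ez = Hinv @ np.diag(var_y) @ Hinv.T                     # Eq. (3)
print(K_ez.shape, float(K_ez[0, 0]))
```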


showing that the error covariance in the pixel domain depends only on the error variances in the wavelet domain and the basis images of the inverse wavelet transform. Individual elements of $K_{e_z}$ can be easily expressed in terms of the basis images $\mathbf{h}^{-1}_{u,v}$,

$$K_{e_z}[m_1, m_2] = \sum_{u,v} \sigma^2_Y[u,v]\; h^{-1}_{u,v}[m_1]\; h^{-1}_{u,v}[m_2], \qquad (4)$$

where $\sigma^2_Y[u,v]$ are the DWT-domain error variances that compose $K_{e_y}$, and $h^{-1}_{u,v}[m]$ represents the $m$th element of $\mathbf{h}^{-1}_{u,v}$. A special case of (4) considers the diagonal elements of $K_{e_z}$, which represent the variances of the pixel-domain quantization errors. Quantization error variances are themselves of significant interest; the results of Fig. 1 represent estimates of these terms for one set of data. In general,

$$\mathrm{diag}(K_{e_z}) = \sum_{u,v} \sigma^2_Y[u,v]\; \big[\mathbf{h}^{-1}_{u,v}\big]^2, \qquad (5)$$

where the square of the vector indicates that each element of the vector is squared, and $\mathrm{diag}(K_{e_z})$ is the vector taken from the diagonal of $K_{e_z}$. The $[\mathbf{h}^{-1}_{u,v}]^2$ terms above can be considered ''error variance basis images,'' since the error variances of an image can be written in terms of the basis summation of (5).
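The error-variance-basis-image view of (5) translates directly into code: inverse-transform a unit coefficient, square it elementwise, weight by that coefficient's error variance, and accumulate. A brute-force sketch under the same assumptions as the earlier sketches (PyWavelets, high-rate $\Delta^2/12$ variances); the loop over every coefficient is tolerable only for small images, and in practice one would exploit the separability of the transform. The result exhibits the peaks and valleys visible in Figs. 1 and 2.

```python
import numpy as np
import pywt

def predicted_error_variance_map(shape, step, wavelet="bior4.4", levels=3):
    """diag(K_ez) via Eq. (5): sum of sigma^2_Y[u,v] * (h^{-1}_{u,v})^2."""
    template = pywt.wavedec2(np.zeros(shape), wavelet, mode="periodization", level=levels)
    arr, slices = pywt.coeffs_to_array(template)
    var_map = np.zeros(shape)
    for idx in np.ndindex(arr.shape):
        y = np.zeros(arr.shape)
        y[idx] = 1.0                                   # unit coefficient -> basis image
        basis = pywt.waverec2(pywt.array_to_coeffs(y, slices, output_format="wavedec2"),
                              wavelet, mode="periodization")
        var_map += (step**2 / 12.0) * basis**2         # error variance basis image
    return var_map

v = predicted_error_variance_map((32, 32), step=1.0)
print(float(v.min()), float(v.max()))                  # spatially varying, as in Fig. 2
```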

It is argued here that a Gaussian probability distribution function (pdf) provides a good description of the quantization error in the pixel domain. The primary justification is due to the basis image summation that forms a reconstructed pixel error—the quantization error for a single pixel will consist of the sum of quantization errors for each basis image that overlaps with the pixel, as described by (2). The number of such basis images depends on several factors, including the length of the wavelet reconstruction filters, the levels and type of wavelet decomposition, and the location of the pixel within the fixed-length transform size. For most situations of interest, the number of contributing terms is large enough that one may use the Central Limit Theorem to approximate the sum of random variables represented by noisy basis images as Gaussian. Thus, the probability distribution function of the pixel-domain quantization error is approximated as

$$p(\mathbf{e}_z) = \frac{1}{(2\pi)^{N/2}\,|K_{e_z}|^{1/2}} \exp\Big(-\tfrac{1}{2}\,\mathbf{e}_z^t K_{e_z}^{-1} \mathbf{e}_z\Big). \qquad (6)$$

The exponent in the above equation can be expanded according to (3),

$$-\tfrac{1}{2}\,\mathbf{e}_z^t K_{e_z}^{-1} \mathbf{e}_z = -\tfrac{1}{2}\,(H\mathbf{e}_z)^t K_{e_y}^{-1} (H\mathbf{e}_z), \qquad (7)$$

showing that the exponent of the probability distribution can be evaluated by simple application of the wavelet transform followed by scaling according to the diagonal $K_{e_y}^{-1}$.
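Eq. (7) is what makes the Gaussian model computationally attractive: the quadratic form never requires assembling or inverting $K_{e_z}$, since one forward DWT and a diagonal scaling suffice. A minimal sketch, again assuming PyWavelets with a periodized transform; `var_y` is a hypothetical array holding the diagonal of $K_{e_y}$ in the same layout as the coefficient array.

```python
import numpy as np
import pywt

WAVELET, LEVELS = "bior4.4", 4

def gaussian_exponent(e_z, var_y):
    """Evaluate -(1/2) e_z^t K_ez^{-1} e_z as -(1/2)(H e_z)^t K_ey^{-1} (H e_z), Eq. (7)."""
    He, _ = pywt.coeffs_to_array(
        pywt.wavedec2(e_z, WAVELET, mode="periodization", level=LEVELS))
    return -0.5 * float(np.sum(He**2 / var_y))

rng = np.random.default_rng(1)
e = 0.1 * rng.standard_normal((128, 128))              # a hypothetical error image
print(gaussian_exponent(e, var_y=np.full((128, 128), 1.0 / 12.0)))
```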

Other distributions besides Gaussian are possible, although the Gaussian distribution is certainly the most convenient, both from a computational standpoint and from the standpoint of including arbitrary covariance matrices as is necessary in the problem at hand. Although not recommended here, it is possible to use independently distributed Gaussian or Laplacian noise sources to model the quantization noise in the spatial domain. The obvious disadvantage of such a method is that the correlation structure contained in $K_{e_z}$ is lost. However, by assigning the variances of these $N$ independent noise terms according to the diagonal elements of $K_{e_z}$ as in (5), one can still capture the variance behavior that was illustrated in Fig. 1. The main advantage of such an approach is a potential reduction in computational complexity.

Given a distribution type, the individual wavelet-domain quantization error variances that compose $K_{e_y}$ determine the error in the pixel domain. For high-rate situations, the quantization step sizes used for compression are small enough that a uniformly distributed random variable accurately models the quantization error in the wavelet domain; Watson et al. [30] also use this model. In such cases, the diagonal components of $K_{e_y}$ are composed of terms $\frac{1}{12}\Delta_i^2$, where $\Delta_i$ is the quantization step size for the $i$th wavelet coefficient. For lower-rate situations, one must resort to more complicated means: a distribution function must be assumed for each wavelet coefficient, whose parameters must be estimated from the received (noisy) data; the quantization noise can then be calculated for each wavelet coefficient. See [23] for such an analysis in the case of DCT quantization noise, where a Laplacian distribution is assumed for each DCT coefficient. Although a lower-rate analysis for the case of the wavelet transform could be performed analogously to that of the DCT, such situations were not the focus of this work, and are not considered further. In general, however, the original image statistics dictate the signal energy of the wavelet coefficients, and hence the energy of the quantization errors in the wavelet domain; such analysis is an interesting area for further investigation.

3. Discussion

Fig. 2 shows predicted quantization error variances for a situation similar to that shown in Fig. 1, where the prediction is taken from the diagonal components of $K_{e_z}$. Uniform dead-zone scalar quantization is applied to each wavelet coefficient of subband $b$ with quantization step sizes

$$\Delta_b = \frac{\Delta_d}{\sqrt{\gamma_b}}, \qquad (8)$$

where $\gamma_b$ is the sum of the squared errors introduced by a unit error in subband $b$ at a location away from the boundaries, and which scales the nominal quantizer step size $\Delta_d$ [32]. The Daubechies 9/7 biorthogonal DWT [1] is the transform employed, both here and throughout the remainder of this paper. A comparison between zoomed portions of Figs. 1 and 2 is presented in Fig. 3, where the similarity between observed and predicted error variances is clearly apparent. Note that the predictions shown in Figs. 2 and 3 are based on the assumption that all wavelet coefficients are non-zero, and thus the larger quantization bins for zero-valued coefficients (due to the dead zone) do not contribute.

Fig. 2. Predicted error variance for compression of 128 × 128 images, taken from diag($K_{e_z}$).

Fig. 3. Zoomed portions of Figs. 1 (left) and 2 (right), near the bottom right corner of the images. (The ranges of both images have been normalized to the same range for display purposes.)

Fig. 4 shows the predicted PSNR for each of 16 video frames that are compressed by uniformly quantizing the coefficients from a temporal three-level wavelet decomposition with the 9/7 wavelet filters (note that this example only considers the overall frame-by-frame error, not the pixel-by-pixel error). The plot correlates very well with PSNR results reported for the 3D SPIHT video compression algorithm [12,13], which compresses imagery volumes by quantizing their 3D DWT coefficients. Such an analysis of frame-by-frame quantization error was previously performed by Xu et al. [33], who propose a wavelet transform that eliminates the low PSNRs at the group-of-frames boundaries that are evident in Fig. 4. Nevertheless, regardless of whether or not a wavelet transform exhibits boundary effects, there will still be a predicted error pattern that depends on the reconstruction transform's basis vectors.

Fig. 4. Predicted PSNR for 16 frames when the wavelet coefficients of a temporal three-level 9/7 biorthogonal wavelet decomposition are quantized with uniform quantizers at high rate.
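The weights $\gamma_b$ in (8) can be estimated numerically by injecting a unit error into the middle of each subband, reconstructing, and measuring the resulting error energy. A sketch assuming PyWavelets; the `(level, band)` keys and the mid-subband probe location are illustrative choices, not the paper's code.

```python
import copy
import numpy as np
import pywt

def subband_gains(shape, wavelet="bior4.4", levels=3):
    """gamma_b: energy of the reconstruction error caused by a unit error in one
    coefficient of subband b, placed away from subband boundaries."""
    zero = pywt.wavedec2(np.zeros(shape), wavelet, mode="periodization", level=levels)
    gains = {}
    for lev in range(len(zero)):                     # entry 0 is the approximation band
        bands = [zero[lev]] if lev == 0 else list(zero[lev])
        for k, band in enumerate(bands):
            coeffs = copy.deepcopy(zero)
            poked = np.zeros_like(band)
            poked[band.shape[0] // 2, band.shape[1] // 2] = 1.0   # unit error mid-subband
            if lev == 0:
                coeffs[0] = poked
            else:
                details = list(coeffs[lev]); details[k] = poked
                coeffs[lev] = tuple(details)
            err = pywt.waverec2(coeffs, wavelet, mode="periodization")
            gains[(lev, k)] = float(np.sum(err**2))  # gamma_b for this subband
    return gains

delta_d = 1.0
steps = {b: delta_d / np.sqrt(g) for b, g in subband_gains((64, 64)).items()}  # Eq. (8)
print(steps)
```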

While Fig. 4 showed only the temporal variation in error for compression using a three-dimensional DWT, there is also significant pixel variation, as was present in Fig. 2. Fig. 5 shows both spatial and temporal mean-square error predictions for a 64 × 64 sequence of 16 frames, which has been compressed according to quantization of a three-level (both spatial and temporal) 3D wavelet decomposition. The sixteen individual error frames have been displayed in a 4 × 4 tile, with the first image at the top-left and the last image at the bottom-right, with intermediate images occurring in row order between these two. The overall brightness of each individual tile directly corresponds to the errors depicted in Fig. 4.² Although perhaps difficult to see due to the image's small size, the pixel-to-pixel variation of each individual component is identical in form to that of Fig. 2. Thus there is significant variation in the error behavior according to both temporal and spatial pixel location.

Fig. 5. Predicted quantization error variance for a length-16 volume of 64 × 64 images, compressed using a three-level (both temporal and spatial) wavelet decomposition. The 64 × 64 error variance images are tiled from top-left to bottom-right.

² Note that Fig. 4 plotted PSNR, where higher values indicate higher quality, whereas Fig. 5 plots mean-square error, where higher values indicate poorer quality.

Fig. 6 shows two predictions of the covariance of the error for the (32, 34) position. In the figure, the peak center value corresponds to the variance of the error from the (32, 34) pixel location, while, for example, the entry below the peak corresponds to the correlation between errors at locations (32, 34) and (32, 35). The predictions have been normalized such that the variance value is 100.0. For part (a) of the figure, predictions are computed assuming no zero-valued coefficients were received by the decoder, and hence no larger quantization bins due to the quantizer dead zone. In part (b) of the figure, predictions are computed by randomly assigning zero and non-zero coefficients, and hence the larger quantization bins for dead-zone coefficients affect the result (note that this prediction is just one realization of many possible zero/non-zero coefficient scenarios, and different correlations occur for different configurations of zero/non-zero observations). Predicted correlations for part (a) of the figure indicate only minor correlations between the errors at pixel (32, 34) and its neighbors, and one might argue that approximating them as independent would be valid. However, as part (b) of the figure clearly demonstrates, different quantization bin sizes that arise as a result of dead-zone quantization can introduce significant correlations between errors, and for this particular example such correlations exist beyond the eight nearest neighbors.

(a)
 -0.77   -1.26   -2.48   -5.88   -2.48   -1.26   -0.77
 -1.18   -1.10   -0.89    5.19   -0.90   -1.10   -1.18
  0.32   -0.74   -1.28    3.91   -1.28   -0.74    0.32
 -0.83    5.55    3.71  100.00    3.71    5.55   -0.84
  0.57   -0.18   -0.43    4.93   -0.43   -0.19    0.57
 -0.73   -0.10    0.64    7.03    0.64   -0.10   -0.73
 -0.25   -0.13   -0.77   -3.83   -0.77   -0.13   -0.25

(b)
 -4.79  -10.11  -11.32   -7.59   -3.58  -11.38   -3.67
 -4.51   -8.44  -13.94    2.52   -1.24  -11.65   -3.82
 -4.53   -5.70   29.16   23.10   -7.25   11.65    3.99
  2.43   16.37   26.40  100.00   -8.09   33.06   10.75
  3.02    8.04   11.23   20.58   12.95   10.01    5.29
 -2.24   -4.81   -4.08   -1.45   -9.71    0.06    0.46
 -0.97   -2.41   -6.59  -10.06   -5.65   -2.74   -0.29

Fig. 6. Normalized prediction of error correlations for the single pixel location (32, 34) under the same conditions reported in Fig. 1. (a) Predictions are computed assuming no zero-valued quantized coefficients; (b) predictions are computed by randomly assigning zero and non-zero quantized coefficients.

It is also worth noting that the covariances shown in part (a) of the figure for pixel (32, 34) are not the same as those covariances that are observed for other pixel locations; for example, if part (a) of the figure is repeated for the (34, 34) pixel location, there are four entries that have magnitude greater than 10.0. As a final note, for the case of 3D wavelet transforms there are correlations extending in both the spatial and temporal dimensions.

4. Temporal filtering

This section demonstrates various methods of restoring motion imagery compressed by scalar quantization of both 2D and 3D wavelet coefficients. In both cases, the data of interest are motion imagery, i.e., they are sequences of images indexed by time. Filtering along the time domain allows the incorporation of information from surrounding frames for the restoration of the current frame. The section begins by formulating the general framework for temporal filtering, and continues by examining temporal filtering when using several different variations of the quantization noise model discussed in this paper. The first two subsections provide results for synthetically generated video whose individual frames have been compressed by using the 2D DWT. Section 4.3 extends these results to the case of compression with a 3D DWT. Section 4.4 considers the case of actual video, showing that the benefits of using the proposed quantization noise model apply for real video as well as for the synthetic sequences used in the first three subsections.

Suppose the original image at time $k$ is $\mathbf{z}^k$. Since a temporal filter uses pixel information from frames at different time instants, one must first describe the relationships between the images at different times. One common method of relating images at two time instants $k$ and $l$ is to model them such that

$$\mathbf{z}^l = A_{k,l}\,\mathbf{z}^k + \boldsymbol{\mu}_{k,l} + \mathbf{e}_{k,l}, \qquad (9)$$

where the matrix $A_{k,l}$ forms a prediction of $\mathbf{z}^l$ given $\mathbf{z}^k$, $\mathbf{e}_{k,l}$ is the error in such a prediction, and $\boldsymbol{\mu}_{k,l}$ is a ''mean'' term that accounts for pixels in $\mathbf{z}^l$ that are unobservable from $\mathbf{z}^k$. Note that when $l = k$, $A_{k,k} = I$, $\mathbf{e}_{k,k} = \mathbf{0}$, and $\boldsymbol{\mu}_{k,k} = \mathbf{0}$. When the original images are compressed, the model that relates the observation at time $l$ to the original image at time $k$ must be modified according to the quantization error,

$$\mathbf{z}^l_q = A_{k,l}\,\mathbf{z}^k + \boldsymbol{\mu}_{k,l} + \mathbf{e}_{k,l} + \mathbf{e}^l_z, \qquad (10)$$

where the additional noise term is the same quantization error term introduced in Section 2. For notational convenience, the two noise terms can be combined into a single noise term,

$$\mathbf{z}^l_q = A_{k,l}\,\mathbf{z}^k + \boldsymbol{\mu}_{k,l} + \mathbf{n}_{k,l}. \qquad (11)$$

The error is assumed to be normally distributed, with mean $\mathbf{0}$ and covariance $K_{k,l}$, which yields the probability distribution function of $\mathbf{z}^l_q$ given $\mathbf{z}^k$,

$$p(\mathbf{z}^l_q \mid \mathbf{z}^k) = \frac{1}{(2\pi)^{N/2}\,|K_{k,l}|^{1/2}} \exp\Big\{-\tfrac{1}{2}\,(A_{k,l}\mathbf{z}^k + \boldsymbol{\mu}_{k,l} - \mathbf{z}^l_q)^t\, K_{k,l}^{-1}\,(A_{k,l}\mathbf{z}^k + \boldsymbol{\mu}_{k,l} - \mathbf{z}^l_q)\Big\}. \qquad (12)$$

With the simplifying approximation that $\mathbf{z}^l_q \mid \mathbf{z}^k$ and $\mathbf{z}^m_q \mid \mathbf{z}^k$ are independent for $m \neq l$, the joint pdf of observations of compressed images at times $k-n,\ldots,k+n$ is

$$p(\mathbf{z}^l_q \mid \mathbf{z}^k,\; l = k-n,\ldots,k+n) = \prod_{l=k-n}^{k+n} p(\mathbf{z}^l_q \mid \mathbf{z}^k). \qquad (13)$$

The temporal filters described in this section all find maximum likelihood (ML) estimates of a single frame $\mathbf{z}^k$ given degraded observations of the sequence at time instants within $\pm n$ of $k$, i.e., $k-n,\ldots,k+n$. The maximum likelihood estimator chooses an estimate for $\mathbf{z}^k$ that maximizes the likelihood term in (13). Since maximizing a positive function is equivalent to minimizing the negative of its natural logarithm, the ML estimate can be written as

$$\hat{\mathbf{z}}^k = \arg\min_{\mathbf{z}^k} \sum_{l=k-n}^{k+n} (A_{k,l}\mathbf{z}^k + \boldsymbol{\mu}_{k,l} - \mathbf{z}^l_q)^t\, K_{k,l}^{-1}\,(A_{k,l}\mathbf{z}^k + \boldsymbol{\mu}_{k,l} - \mathbf{z}^l_q). \qquad (14)$$

The various temporal filtering schemes presented in this section differ in their dimensionality (two- or three-dimensional DWT) as well as their choice of covariance matrix $K_{k,l}$, but the general problem setup is that of (14). Eq. (14) is solved using an iterative conjugate gradient optimization algorithm, details of which are given in Appendix A. Note, however, that some of the simplified noise terms lead to simplifications that allow almost trivial implementations that are not iterative; such simplifications will be discussed as they are introduced in later subsections.

Elements of the mean term $\boldsymbol{\mu}_{k,l}$ are chosen as

$$\mu_{k,l}[m] = \begin{cases} 0, & \text{for observable pixels}, \\ z^l_q[m], & \text{for unobservable pixels}. \end{cases} \qquad (15)$$

For pixels at time $l$ that do not correspond to any pixels at time $k$, the corresponding rows of $A_{k,l}$ consist entirely of zeros. To classify a pixel at time $l$ as having or not having corresponding pixels at time $k$, a simple threshold can be applied to the difference $|\mathbf{z}^l_q - A_{k,l}\mathbf{z}^k_q|$ after motion estimation; knowledge of pixels appearing or disappearing at image edges due to camera motion can also be used. More sophisticated algorithms could be used to estimate both $A_{k,l}$ and $\boldsymbol{\mu}_{k,l}$ simultaneously, although they are not explored here.

Eq. (14) is an ML estimate that exclusively makes use of temporal filtering to remove noise. Extension of the formulation to include spatial filtering, although not developed here, is straightforward: a prior image model that encourages smoothness (although not too much smoothness—see [28]) can be incorporated in the form of a Markov random field (MRF). The restoration estimate can then be taken as the maximum a posteriori (MAP) solution, which, when using the likelihood term of (14), would perform spatio-temporal smoothing.

The following subsections discuss various methods and experiments of temporally filtering wavelet-compressed motion imagery.
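Before turning to the experiments, a compact sketch of the estimator (14) may be helpful. It is written for the special case of the experiments in Section 4.1 (known integer global shifts) and solves the normal equations of (14) with plain conjugate gradients. Two simplifying assumptions are made that the paper does not make: an orthonormal wavelet ('db4') is used so that $H^t = H^{-1}$ and the adjoint of the transform is simply the inverse transform (the 9/7 biorthogonal case requires a true transpose, as noted in Section 4.5), and all pixels are taken to be observable, so the $\boldsymbol{\mu}_{k,l}$ terms vanish.

```python
import numpy as np
import pywt

WAVELET, LEVELS = "db4", 3   # orthonormal wavelet: H^t = H^{-1}, so waverec2 is the adjoint

def apply_K_inv(v, var_y):
    """K^{-1} v = H^t K_ey^{-1} H v (cf. Eq. (7)); var_y is diag(K_ey) in array layout."""
    c, slices = pywt.coeffs_to_array(
        pywt.wavedec2(v, WAVELET, mode="periodization", level=LEVELS))
    return pywt.waverec2(
        pywt.array_to_coeffs(c / var_y, slices, output_format="wavedec2"),
        WAVELET, mode="periodization")

def ml_temporal_filter(z_q, shifts, var_y, n_iter=50):
    """Conjugate-gradient solve of the normal equations of Eq. (14) for known
    integer global shifts: A_{k,l} is a circular shift, so A^t is the reverse shift.
    z_q: list of observed frames; var_y: list of wavelet-domain variance maps."""
    A  = lambda x, s: np.roll(x, s, axis=(0, 1))
    At = lambda x, s: np.roll(x, (-s[0], -s[1]), axis=(0, 1))
    M  = lambda x: sum(At(apply_K_inv(A(x, s), var_y[l]), s)       # sum_l A^t K^{-1} A x
                       for l, s in enumerate(shifts))
    b = sum(At(apply_K_inv(z_q[l], var_y[l]), s)                   # sum_l A^t K^{-1} z_q^l
            for l, s in enumerate(shifts))
    x = np.mean([At(z_q[l], s) for l, s in enumerate(shifts)], axis=0)  # initial guess
    r = b - M(x); p = r.copy(); rs = np.sum(r * r)
    for _ in range(n_iter):                                        # plain CG iterations
        Mp = M(p)
        alpha = rs / np.sum(p * Mp)
        x += alpha * p
        r -= alpha * Mp
        rs_new = np.sum(r * r)
        if rs_new < 1e-12:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Because the shifts here are circular permutations, $A^t_{k,l}$ is simply the opposite shift; for a general motion model, $A^t_{k,l}$ is the transpose of the interpolation operator.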

4.1. Experiment 1

Consider the case where each frame of an image sequence is compressed by use of a 2D DWT, independently of the other frames. This subsection considers the simplified case where both the horizontal and vertical motion between frames is an integer number of pixels. The simplest of temporal filters assumes Gaussian noise with covariance matrix $K_{k,l}$ that is diagonal with elements $\sigma^2$ for each $l = k-n,\ldots,k+n$. Such a filter equally weights all observations of each frame at each time instant, and can be loosely interpreted as a box filter in the temporal dimension along the motion trajectories described by the $A_{k,l}$ matrices. Another simple filter assumes IID Laplace noise, and results in a median filter along the motion trajectories. More accurate versions of these two filters can be achieved by using the same noise type (i.e., independent Gaussian or Laplacian), but assigning $K_{k,l} = \mathrm{diag}(K_{e_z^l})$. For the Gaussian case, this leads to a weighted-average filter along motion trajectories such that pixels that are more accurate according to the quantization noise model are weighted more heavily than pixels that are less accurate. Similarly, the Laplace case leads to a weighted-median filter along motion trajectories. The fifth and final case considered here is to use Gaussian noise with $K_{k,l} = K_{e_z^l}$, which cannot be implemented by simple averaging or median filtering, but is instead implemented with the conjugate gradient method discussed in Appendix A. All five of these noise models for $K_{k,l}$ consider only the noise introduced by compression. A sketch of the two diagonal-model filters follows.
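For the diagonal noise models the ML estimate of (14) decouples pixel by pixel: independent Gaussian noise gives an inverse-variance weighted average along the motion trajectory, and independent Laplacian noise gives a weighted median. A sketch, assuming the observations have already been motion-aligned to frame $k$:

```python
import numpy as np

def weighted_average_filter(obs, var):
    """Per-pixel ML estimate under independent Gaussian noise (Eq. (14) with
    diagonal K): inverse-variance weighted mean of the aligned observations.
    obs, var: arrays of shape (num_frames, H, W), already motion-compensated."""
    w = 1.0 / var
    return np.sum(w * obs, axis=0) / np.sum(w, axis=0)

def weighted_median_filter(obs, var):
    """Per-pixel ML estimate under independent Laplacian noise: weighted median,
    with weights inversely proportional to each observation's noise scale."""
    w = 1.0 / np.sqrt(var)                      # Laplacian scale grows like sqrt(variance)
    F, H, W = obs.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            order = np.argsort(obs[:, i, j])
            v, ww = obs[order, i, j], w[order, i, j]
            cdf = np.cumsum(ww) / np.sum(ww)
            out[i, j] = v[np.searchsorted(cdf, 0.5)]   # first value past half the weight
    return out
```

With constant variances these reduce to the plain temporal mean and median filters; supplying `var` from diag($K_{e_z^l}$) gives the weighted variants described above.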

The initial experiment given in this subsection is designed to isolate the effect of the quantization noise model. The input sequence consists of five frames, where each frame is taken from a single source image and globally translated with integer shift; thus the only difference between these five images is that they are global translates of each other, where the global motion parameters are known. In a very simple sense, these five images form an artificially constructed ''image sequence''. Obviously, sequences with such a property will almost never occur in actual video; however, the point of this subsection is to isolate the effect of the quantization noise model, and the method discussed here eliminates the influence of other motion-related error such as $\mathbf{e}_{k,l}$ or uncertainty in the matrices $A_{k,l}$. Section 4.4 will relax the constraints imposed here so as to filter actual video.

The integer shifts that relate the five frames are (0, 0), (1, 0), (0, 1), (−1, 0), and (0, −1). The compressed images are formed by quantization of four-level DWT decompositions of the images, where the quantization is performed as described by (8). The frames used for compression are 384 × 384 portions of the original larger image, which allows the shifted frames to be extracted without missing pixels at the image borders. Fig. 7 shows the original test-pattern image along with the image to be restored, the (0, 0)-shifted frame. Although only the compressed image corresponding to (0, 0) translation is shown in the figure, the other four compressed images appear approximately the same; all five compressed images for test-pattern have PSNR in the range 32.84 ± 0.06 dB.

Fig. 7. Original and compressed images for the first temporal filtering experiment: (a) original; (b) compressed image 0, PSNR = 32.80 dB.

Fig. 8 compares the restoration results of test-pattern for the IID Gaussian noise model and for the proposed noise model. As can be seen by comparing Figs. 7 and 8, even the IID Gaussian noise model yields considerable improvement over the originally compressed image; this, however, is to be expected due to the contrived nature of the experiment. The important lesson to be learned from Fig. 8 is not the improvement relative to the image in Fig. 7(b), but rather the improvement of the image in Fig. 8(b) relative to that in part (a)—using the quantization noise model proposed here provides over 1 dB of improvement in PSNR relative to using an IID Gaussian noise model, and, most importantly, there is a significant visible improvement as well, in particular in the numbers that label the concentric circles in the upper left of the figures. Table 1 summarizes results for other test images using all five quantization noise models introduced at the beginning of this subsection; 384 × 384 portions of each of these test images were used.

Fig. 8. Comparison of restorations of the compressed image in Fig. 7 using IID and non-IID Gaussian noise models. (a) Restoration using IID Gaussian noise model, PSNR = 35.10 dB; (b) restoration using Gaussian noise model with covariance matrices $K_{e_z^l}$, PSNR = 36.14 dB.

Table 1
PSNR values for restoration using various quantization noise models

Sequence name   Compressed images   Laplacian, IID   Laplacian, diag(K_{e_z^l})   Gaussian, IID   Gaussian, diag(K_{e_z^l})   Gaussian, K_{e_z^l}
barb            32.86 ± 0.02        35.18            35.50                        36.00           36.31                       37.15
boat            30.49 ± 0.03        32.06            32.24                        32.65           32.84                       33.71
mandrill        32.29 ± 0.01        34.79            35.04                        35.52           35.87                       36.83
peppers         32.50 ± 0.03        33.48            33.55                        33.81           33.90                       34.37
test-pattern    31.10 ± 0.06        32.42            32.61                        33.26           33.42                       34.24
test-pattern    32.84 ± 0.06        34.30            34.51                        35.10           35.29                       36.14
test-pattern    34.73 ± 0.07        36.39            36.62                        37.10           37.31                       38.09

The range of PSNRs under ''Compressed images'' refers to the PSNRs for the five compressed input images. All PSNR values are in decibels.

As can be seen from Table 1, using the full quantization error covariance matrix with a Gaussian pdf produces PSNR results that are significantly better than those produced when using IID distributions or when using diagonal approximations of the full covariance matrix; this is true for either Laplace or Gaussian distributions.

The five test images used here contain various characteristics, from the text contained in the resolution chart of test-pattern, to the texture of mandrill's fur or barb's clothing, to the smooth regions contained in areas of each of the images. Although the results presented in Table 1 do not contain an exhaustive comparison between images at different compression qualities, the PSNR improvements evident in the table are representative of PSNR improvements that have been observed over a wide range of compression severities.

4.2. Experiment 2

The previous subsection demonstrated the potential improvement of using the proposed quantization noise model relative to using simple IID Gaussian or Laplacian noise models. Both the proposed noise model and the IID noise models provided significant improvements relative to the compressed images, with the proposed noise model giving approximately an additional decibel of improvement over the IID Gaussian model. The experiment presented in this subsection will demonstrate that the improvements that one obtains, for each of the noise models, are quite dependent on the actual motion between the frames.

Consider an experiment much like the one of the previous subsection, in which original frames of a synthetic video sequence are formed by taking 384 × 384 subimages from an original 512 × 512 image; each of the frames is offset by an integer global translation. In this subsection, the frame to be restored is considered as having global translation of (0, 0), and eight other frames are formed by taking shifts of ±b pixels, i.e., (b, 0), (b, b), (0, b), (−b, b), (−b, 0), (−b, −b), (0, −b), and (b, −b). Table 2 compares PSNR improvements as a function of b for Gaussian noise models making use of $K_{e_z^l}$ and the IID noise assumption; the mandrill image is used for testing. For this simple restoration example, the PSNR improvements obviously depend on the motion that occurs between frames—odd pixel displacements consistently yield better results than even displacements. These results in the table are quite interesting: for each value of b, the input frames all have (approximately) the same PSNR.

Table 2
Dependence of filtering results on integer shifts b for the mandrill image

b   ΔPSNR, Gaussian IID   ΔPSNR, Gaussian K_{e_z^l}
1   3.93                  5.69
2   0.80                  0.97
3   3.94                  5.71
4   0.25                  0.30
5   3.94                  5.71
6   0.80                  0.97
7   3.93                  5.69
8   0.14                  0.17

Here, frame 0 is being restored based on the nine total observations that are within ±b of a shift of (0, 0). Quantized input frames all have PSNR of 34.1 ± 0.05 dB.

Thus, although in each case of b the input images are shifted versions of each other with equivalent PSNRs, there are drastic differences in PSNR results when the samples are averaged. Such behavior is easily explained: due to the subsampling employed by a non-expansive discrete wavelet transform, shifts of b = 2 among frames result in the highest-frequency subbands being exact shifts of each other. Since the quantization parameters are unchanged between images, the quantized DWT coefficients in these highest-frequency subbands are identical among the different images, and hence no new information is introduced in the highest subbands of the shifted image observations. Any gains for b = 2 are due entirely to lower-frequency subbands. Similarly, for the case of b = 4 not only are the highest-frequency subbands exact shifts of each other, but so are the second-highest-frequency subbands; hence the PSNR improvements for b = 4 are worse than for b = 2. The trend continues for b = 8, and ultimately one can conclude that for a lev-level DWT decomposition and shifts of $\pm v\,2^{\mathrm{lev}}$ ($v$ an integer), there will be zero gain from filtering multiple observations; such is the case regardless of the noise model one uses. An analogous phenomenon occurs when a compression technique makes use of the block DCT: if the block size is, for example, 8 × 8, then there is no reduction in compression noise by temporal filtering when the shifts among the images are integer multiples of 8. A numerical demonstration of this zero-gain case is sketched at the end of this subsection.

These results suggest that one should not forget that the quantization noise model is merely what its name suggests—a model. True quantization error is not a random process, but rather a deterministic and repeatable quantity; compressing the same image at two different times produces quantization errors that are identical for the two images. However, as discussed in Section 2, the quantization noise model can provide a foundation upon which restoration algorithms can be built, and although the drastic results of Table 2 demonstrate a limitation of the model, the pathological conditions of this subsection's experiment (namely, that all frames differ in global translation by a constant integer shift) rarely occur in natural video. Section 4.4 presents results for actual video that demonstrate the quantization noise model's utility in temporal filtering.
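The zero-gain case is easy to verify numerically: with a periodized transform, a circular shift by a multiple of the coarsest subsampling factor shifts every subband by an integer number of samples, so quantization produces an exactly shifted copy of the same error. A small demonstration, assuming PyWavelets and an orthonormal wavelet with a plain uniform quantizer (illustrative choices, not the paper's setup):

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
z = rng.standard_normal((64, 64))
lev, step = 3, 0.5
q = lambda c: np.round(c / step) * step          # simple uniform quantizer

def quant_error(img):
    c, s = pywt.coeffs_to_array(
        pywt.wavedec2(img, "db4", mode="periodization", level=lev))
    rec = pywt.waverec2(pywt.array_to_coeffs(q(c), s, output_format="wavedec2"),
                        "db4", mode="periodization")
    return rec - img

e0 = quant_error(z)
e_shift = quant_error(np.roll(z, 2**lev, axis=1))          # shift by 2^lev pixels
print(np.allclose(np.roll(e0, 2**lev, axis=1), e_shift))   # True: same error, no new info
```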

4.3. Experiment 3

Here, we repeat the experiment of Section 4.1, replacing the two-dimensional DWT with the three-dimensional DWT. Such a situation was discussed in Section 3, where it was demonstrated that the pixel errors' statistical behavior varies both spatially and temporally. Here, three noise models will be compared, all of which are Gaussian: using the full $K_{e_z^l}$; using an IID model; and using independent noise with variances that are constant within a frame, but vary between frames in a manner predicted by the plot in Fig. 4. As was the case in Fig. 4, the length of the transform in the temporal direction is sixteen; the image sequence is synthesized by shifts applied to a prototype image of (0, 0), (1, 0), (1, 1), and continuing in an outward counter-clockwise spiral to the final shift of (−1, 2). All of the sixteen frames in this group of pictures are filtered to produce the estimate of the original image.

Quantitative results for the experiment are shown in Table 3. Significant PSNR improvements are evident relative to the received noisy images, but this is to be expected since the filtering is able to make use of sixteen noisy versions of the exact same original frame; in real-life situations, rarely will such fortunate circumstances arise.

Table 3
Restoration results for compression that uses a three-dimensional DWT

Sequence name   Quantized   Gaussian IID   Gaussian Ind   Gaussian K_{e_z^l}
barb            36.27       40.19          40.29          42.38
boat            34.01       36.69          36.79          39.17
mandrill        31.30       35.34          35.37          38.50
peppers         35.35       36.77          36.79          37.90
test-pattern    36.24       39.15          39.21          41.14

All values are PSNR in dB. The PSNR listed under ''Quantized'' is the highest of the PSNRs for the 16 input images, and corresponds to the PSNR at image 8 (compare with image 8 of Fig. 4). The noise model ''Gaussian Ind'' is an independent Gaussian noise model that assumes each received frame has a constant error variance as predicted in Fig. 4.

The important conclusions to be drawn from the results are not in regard to PSNR improvement relative to the noisy images, but rather the PSNR improvements of the noise models relative to those of the IID Gaussian noise model. One important result is that, relative to the IID noise model's results, there is practically nothing to be gained in temporal filtering by making use of the frame-to-frame error variances that were demonstrated in Fig. 4; it seems that the pixel-wise noise characteristics are considerably more important than the frame-wise noise characteristics. The second important result to be drawn from the table is the significant gain of using the full theoretic covariance matrix $K_{e_z^l}$ relative to the other noise models.

4.4. Experiment with real video

While the previous three subsections dealt with synthetically generated video, such situations are not necessarily indicative of the results one would obtain with real video sequences. This subsection considers temporal filtering of actual motion imagery that has been compressed using the 2D DWT. We will consider two cases: when the observation error is dominated by quantization noise, and hence the motion-compensation error $\mathbf{e}_{k,l}$ can be neglected; and when $\mathbf{e}_{k,l}$ is included in the formulation. For each of the two cases, two models are used for the quantization error—IID Gaussian noise with covariance matrix of $\sigma^2 I$, and Gaussian noise with covariance $K_{e_z^l}$.


In previous subsections, the motion between input frames was known exactly because of the artificial construction of the sequences. With real video, the motion must be estimated. Here, we consider video sequences that contain frames that differ by a global transformation, e.g., stationary scenes captured by a moving camera that is located at a far distance from the actual scene, as might be expected from aerial surveillance video. Stationary scenes are not a requirement, but they make motion estimation simpler; with more sophisticated motion estimation, the algorithm described here could be applied equally well. To register frames of the input imagery, an affine motion model is employed. The six parameters of the affine model that relate two frames are estimated iteratively within a coarse-to-fine multiresolution pyramid. Note that while previous sections assumed integer pixel motions, the more general model here allows for floating-point pixel motions; $A_{k,l}$ matrices are constructed based on bilinear interpolation for these fractional pixel motions.

The sequence used for testing here is the stickers sequence; the author acquired this uncompressed sequence using a Pixelink PL-A641 monochrome camera. While the sequence is certainly not the same as aerial surveillance video, it does share certain qualities—a large global-motion component, as well as significant detail at fine resolutions. The writing on the stickers will serve as a sort of resolution chart for comparison of the restoration algorithms. For this example, the compressed versions of five of these frames will be used to reconstruct the central image of the five frames. Although the five individual frames are not shown here, they are similar to the images shown throughout this subsection, but differ slightly due to the camera's motion.

Results are first presented for two compression qualities under the assumption that the quantization error dominates the overall error $\mathbf{n}_{k,l}$. As in previous subsections, the compression is simulated by scalar quantization of the coefficients from a four-level DWT decomposition. Fig. 9 presents results for images compressed to a relatively low quality of approximately 32.3 dB, while Fig. 10 presents results for images compressed at a higher quality of about 36.5 dB. Each figure contains the compressed middle frame of the five input images, along with restorations using both the full quantization noise model and the IID noise model.
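The action of $A_{k,l}$ for fractional motion can be sketched as follows: each row of $A_{k,l}$ holds the four bilinear interpolation weights of the motion-compensated position, and the operator is applied directly rather than stored as a (sparse) matrix. A hypothetical helper for the purely translational component; border pixels that leave the frame would be flagged unobservable and handled through the $\boldsymbol{\mu}_{k,l}$ term of Eq. (15).

```python
import numpy as np

def bilinear_warp(img, dx, dy):
    """Apply A_{k,l} for a global translation (dx, dy): each output pixel is a
    bilinear combination of the four neighbours of its motion-compensated
    position.  Out-of-frame positions are clamped here for simplicity."""
    H, W = img.shape
    yy, xx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    xs, ys = xx + dx, yy + dy                      # motion-compensated coordinates
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    fx, fy = xs - x0, ys - y0                      # fractional parts = bilinear weights
    x0, y0 = np.clip(x0, 0, W - 2), np.clip(y0, 0, H - 2)
    return ((1 - fy) * (1 - fx) * img[y0, x0] +
            (1 - fy) * fx * img[y0, x0 + 1] +
            fy * (1 - fx) * img[y0 + 1, x0] +
            fy * fx * img[y0 + 1, x0 + 1])
```

For the full affine model, the same four-weight structure applies, with the per-pixel displacements generated from the six estimated affine parameters.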

In Fig. 9, both the quantization noise model and the IID noise model show considerable improvement over the original compressed image, which shows the potential advantages of temporal filtering for this type of compressed sequence. The visual improvement of the quantization noise model relative to the IID noise model is evidenced by an overall sharper reconstruction, with more apparent contrast. The PSNR is also slightly higher for the restoration using the quantization noise model.

For the higher-quality compressed image set of Fig. 10, visual distinction among the images is not as clear. There are some improvements, both in visual quality and PSNR, for the restorations compared to the compressed image; for example, the ''California'' sticker, or the ''Go Bananas'' sticker. Although difficult to discern on the printed page, the restoration for the quantization noise model is slightly sharper than that of the IID noise model; the ''Go Bananas'' sticker is arguably more legible for the restoration of the quantization noise model. Unfortunately, the additional sharpness for this higher-quality case comes at the cost of a sharpening of noise as well, which accounts for the slightly lower PSNR for the restoration of the quantization noise model. In this case, the noise being sharpened is not compression noise $\mathbf{e}^l_z$, but rather the motion compensation noise $\mathbf{e}_{k,l}$. Recall that the restorations of this section thus far have not included the motion compensation noise, but rather have assumed that the quantization noise dominates. Since the quantization noise covariance matrix is non-diagonal, and hence includes correlations among the errors, an overall sharpening can result from the filtering. For high-quality compressed motion imagery, the quantization error ceases to dominate as it did in the lower-quality case, and the motion-compensation error becomes more influential. As a result, the quantization noise model can sharpen this motion-compensation noise. Such sharpening is not present for the IID case, which assumes that the noise terms at each pixel observation are independent and identically distributed. Thus the very quality that prevents the IID quantization noise model from providing sharp restorations also prevents it from enhancing noise.

Fig. 9. Restoration results for real video data. The five input images have PSNR values in the range 32.2–32.4 dB. The left column shows the full 640 × 480 images, while the right column shows zoomed portions of the images. (a,b) Compressed image 2, PSNR = 32.23 dB; (c,d) restored image 2 using quantization noise model, PSNR = 34.91 dB; (e,f) restored image 2 using IID noise model, PSNR = 34.58 dB.

Fig. 10. Close-up views of restoration results for real video data. The five input images have PSNR values in the range 36.4–36.6 dB. (a) Compressed image, PSNR = 36.44 dB; (b) restored image using quantization noise model, PSNR = 37.41 dB; (c) restored image using IID noise model, PSNR = 37.81 dB. PSNR values are for the full 640 × 480 images.

To better account for the motion compensation error, one can explicitly model the term $\mathbf{e}_{k,l}$ such that the overall noise term becomes

$$\mathbf{n}_{k,l} = \mathbf{e}^l_z + \mathbf{e}_{k,l}. \qquad (16)$$

The motion compensation noise term $\mathbf{e}_{k,l}$ has often been modeled as IID Gaussian distributed [25] with variance terms $\lambda_{k,l}$; note that $\lambda_{k,l}$ is zero for $k = l$. With such an assumption, the overall noise term for the IID quantization noise case would have covariance matrix

$$K_{k,l} = \sigma^2 I + \lambda_{k,l} I \qquad (17)$$
$$\phantom{K_{k,l}} = \hat{\lambda}_{k,l} I, \qquad \hat{\lambda}_{k,l} \equiv \sigma^2 + \lambda_{k,l}, \qquad (18)$$


which for practical purposes is nearly equivalent to the IID quantization noise model used previously; this explains why the IID quantization noise model does not enhance the motion compensation error. For the non-IID quantization noise model, the covariance matrix becomes

$$K_{k,l} = H^{-1} K_{e_l^y} H^{-t} + \lambda_{k,l} I. \qquad (19)$$
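To make the two covariance models concrete, the following sketch (not part of the original paper) implements (17)–(18) and (19) as matrix-free operators in Python. It assumes a one-level orthonormal Haar transform as a stand-in for $H$, for which $H^{-t} = H$ and $H^{-1} = H^t$; the biorthogonal wavelets typical of compression systems would require separate transpose routines, as discussed in [21]. All function names and the diagonal vector of subband variances are illustrative assumptions.

```python
import numpy as np

def haar_fwd(x):
    """One level of an orthonormal 1-D Haar DWT (the role of H here).
    Assumes an even-length input. Because this transform is orthonormal,
    H^{-t} = H, so this routine also serves where H^{-t} is needed."""
    even, odd = x[0::2], x[1::2]
    return np.concatenate((even + odd, even - odd)) / np.sqrt(2.0)

def haar_inv(c):
    """Inverse of haar_fwd, i.e. H^{-1} (= H^t in the orthonormal case)."""
    n = len(c) // 2
    s, d = c[:n], c[n:]
    x = np.empty(2 * n)
    x[0::2] = (s + d) / np.sqrt(2.0)
    x[1::2] = (s - d) / np.sqrt(2.0)
    return x

def apply_K_iid(v, lam_hat):
    """IID model of (17)-(18): K_{k,l} v = lambda_hat * v."""
    return lam_hat * v

def apply_K_full(v, subband_var, lam):
    """Non-IID model of (19): K_{k,l} v = H^{-1} K_{e_l^y} H^{-t} v + lam v,
    where subband_var holds the diagonal of K_{e_l^y}, i.e. the
    per-coefficient quantization-error variances."""
    return haar_inv(subband_var * haar_fwd(v)) + lam * v
```

Because only products $K_{k,l} v$ (and, via Appendix B, $K_{k,l}^{-1} v$) are ever needed, neither model requires forming an $N \times N$ matrix explicitly.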

The restoration algorithms require the inversion of the noise covariance matrix, and it should be readily evident that the inversion for the diagonal IID case is much easier than for the non-diagonal, non-IID case. However, since only the product of the matrix inverse with some input vector is needed, and not the explicit inverse matrix itself, iterative methods can be used in the non-IID case. Appendix B discusses the iterative implementation for inversion of the covariance matrix in (19). It is interesting to note the contrast between the covariance matrix of (19) and similar ones used elsewhere in [7,22]. Although a similar covariance matrix is used in [7], the authors make simplifying assumptions to allow easier implementation; those assumptions require $K_{e_l^y}$ to be diagonal (as it is here) but with equal elements (unlike here), and $H$ to be unitary, which is true for the DCT as used by those authors, but is not true for the wavelet transforms typically used in imaging applications. In [22], the authors make a diagonal approximation for the covariance matrix to simplify implementation. In both cases the simplifications lead to undesirable restrictions or approximations, whereas the implementation discussed here avoids these problems by solving directly as discussed in Appendix B.

Explicitly modeling the motion compensation error $e_{k,l}$ within the overall noise $n_{k,l}$ leads to improvements in the restorations for both the IID and non-IID quantization noise models. Fig. 11 shows results for the experiment first reported in Fig. 10,


Fig. 11. Close-up views of restoration results for real video data using the more complex error covariance matrices of (17) and (19). The five input images have PSNR values in the range 36.4–36.6 dB. The compressed image being restored was shown in Fig. 10(a), with PSNR = 36.44 dB, and is not re-displayed here. (a) Restored image using quantization noise model, PSNR = 38.22 dB; (b) restored image using IID noise model, PSNR = 37.88 dB. Here, $\lambda_{k,l} = 20|k - l| + 10$ and $\sigma^2 = 24.1$. Note that PSNR values are for the full 640×480 images.


but using the modified error covariance matrices of (17) and (19) instead. As indicated in the figure caption, the PSNR performance of the restoration algorithm that used the full quantization noise covariance matrix has risen above that of the IID quantization noise assumption; in both cases, PSNRs are higher than when neglecting the $e_{k,l}$ term. (Note that since the IID quantization noise assumption already modeled $e_{k,l}$ fairly well, only marginal improvements result for the IID case.) Although PSNR has improved, visual differences between these images and their counterparts presented in Fig. 10 are difficult to discern. Similarly, the visual improvement of Fig. 11(a) relative to Fig. 11(b) is comparable to that between Fig. 10(b) and Fig. 10(c).

4.5. Complexity, practicality, and applicability

Although the results presented in the preceding subsections provide strong incentive for choosing a temporal filter that uses $K_{e_l^z}$ rather than one that uses the IID assumption, there are drawbacks to the more sophisticated filters that make use of $K_{e_l^z}$. The primary difficulty is computational complexity: when using the covariance matrix of (3) within the iterative optimization of Appendix A, repeated evaluations of $H$ and $H^t$ (i.e., the forward DWT and its transpose) are required. When using the modified covariance matrix of (19), in addition to the optimization of Appendix A one must perform the inversion of Appendix B, which is itself iterative and requires repeated evaluation of $H^{-1}$ and $H^{-t}$ (i.e., the inverse DWT and its transpose). Even with relatively fast DWT implementations, the computational requirements of the algorithm preclude its use in applications that require processing at real-time video frame rates. By way of example, the results presented in Fig. 9(c) and (d) required approximately 35 s of computation time for our implementation on a 2.2 GHz PC, while the results in the same figure corresponding to the IID filtering required approximately 4 s. Both of these times can be reduced by more efficient implementations, but the difference between the run times of the two filters will remain significant. A more appropriate application for this work is the situation where a user wishes to extract a single high-quality frame (or region of interest within a frame) for closer examination and can tolerate a brief delay while the algorithm computes the estimate by temporal filtering; another is off-line restoration of compressed motion imagery, where high-quality frame estimates matter more than real-time operation.

A final observation about temporal filtering concerns the algorithms' sensitivity to errors in motion estimation.


Motion compensation error, as used here, refers to the difference between an image area and its corresponding area in a reference image. Motion estimation error refers to inaccuracies in the registration of the areas of the two images; for example, saying that an object has moved by six pixels when in fact it has moved by only five. For the same reasons that the non-IID quantization noise model amplifies noise in the presence of motion-compensation error, it also amplifies motion-estimation errors. Errors in motion estimation lead to large errors in motion compensation, which, when combined with the potential sharpening behavior of the quantization noise model, can lead to ringing-like artifacts. The IID quantization noise model does not suffer from such problems, because it only averages pixels rather than sharpening them; motion-estimation errors for the IID case simply result in over-blurring, and sometimes ghosting artifacts, in the restoration. Additionally, when simple block-based translational motion-estimation techniques are used to construct the motion-compensating $A_{k,l}$ matrices, ringing artifacts can sometimes appear near the block boundaries in restorations using the quantization noise model. Such phenomena do not necessarily indicate a limitation of the quantization noise model, but simply reinforce the necessity of accurately estimating correspondences between the input images. Artifacts can be avoided by accurately estimating and assigning the motion-compensation error covariance terms $\lambda_{k,l}$.³ However, as motion estimation accuracy decreases (or motion compensation error increases), the $\lambda_{k,l}$ by necessity become large enough to dominate the overall error covariance term $K_{k,l}$. When the $\lambda_{k,l}$ dominate, the resulting restorations are nearly identical to those achieved using the IID model. If accurate motion estimation is simply not possible, then temporal filtering with the IID quantization noise model may be more appropriate, due to its robustness to motion-estimation errors as well as its simpler implementation. With accurate motion estimation, however, the quantization noise model provides better restoration results.

³ In addition, this term can be modified such that the pixel variances vary from position to position, depending on the accuracy of the motion estimation.

5. Conclusion

Assuming independent and identically distributed noise for the error due to compression is not optimal. This paper has demonstrated the benefit derived, both quantitatively and qualitatively, from proper modeling of the compression error when performing temporal filtering of motion imagery. Although using the IID noise model for temporal filtering does indeed improve the quality of the video, significant further improvements were demonstrated for the more sophisticated noise modeling. Appropriate applications of the temporal filtering discussed here include motion imagery compressed using a 3-D DWT [13], as well as individually compressed frames of video, for example, Motion JPEG 2000 [10]. Although not discussed here, with slight modifications the work presented here could be applied to other restoration scenarios that incorporate information from surrounding frames when estimating a single frame, such as super-resolution of wavelet-compressed video.

Acknowledgements

The author would like to acknowledge the support of the Center for Integrated Transmission and Exploitation (CITE), a joint endeavor of the Information Directorate and the Air Force Office of Scientific Research, both of the Air Force Research Laboratory. The author would also like to thank Dr. Andrew Noga for helpful comments during the preparation of this manuscript.

Appendix A. Optimization for temporal filtering

To optimize (14), the method of conjugate gradients is used. For iteration $i+1$, a new estimate for $z^k$ is obtained from the estimate at iteration $i$ as

$$z^k_{(i+1)} = z^k_{(i)} - \alpha_{(i)} d_{(i)}, \qquad (A.1)$$

where $z^k_{(i)}$ is the estimate at iteration $i$, $d_{(i)}$ is a direction vector, and $\alpha_{(i)}$ is the step size that controls how far along $d_{(i)}$ from $z^k_{(i)}$ the new estimate is taken. The direction is computed as

$$d_{(i)} = g_{(i)} + \beta_{(i)} d_{(i-1)}, \qquad (A.2)$$

where $\beta_{(i)}$ is

$$\beta_{(i)} = \frac{g^t_{(i)} g_{(i)}}{g^t_{(i-1)} g_{(i-1)}} \qquad (A.3)$$

and $g_{(i)}$ is the gradient of the objective function in (14), computed as

$$g_{(i)} = \sum_{l=k-n}^{k+n} A^t_{k,l} K^{-t}_{k,l} \left( A_{k,l} z^k_{(i)} + \lambda_{k,l} - z^l_q \right). \qquad (A.4)$$

The initial estimate for the image is taken as $z^k_{(0)} = z^k_q$, and the initial direction is taken as $d_{(0)} = g_{(0)}$. The optimal step size $\alpha_{(i)}$ is computed by minimizing the objective function of (14) for the modified image of (A.1). The result of such a derivation is

$$\alpha_{(i)} = \frac{d^t_{(i)} g_{(i)}}{d^t_{(i)} \sum_{l=k-n}^{k+n} A^t_{k,l} K^{-1}_{k,l} A_{k,l} d_{(i)}}. \qquad (A.5)$$

Iterations of (A.1) are continued until the rate of decrease of the objective function falls below a small threshold. For more details on the conjugate gradient method of optimization, consult Nocedal and Wright [16].

In some cases, for example $K_{k,l} = K_{e_l^z} = H^{-1} K_{e_l^y} H^{-t}$, the DWT transpose will need to be evaluated for some of the computations above. Implementation details of such a function can be found in [21].
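The following Python sketch shows how (A.1)–(A.5) fit together as a matrix-free iteration. It is illustrative rather than the author's implementation: the callables apply_A, apply_At, and apply_Kinv are hypothetical stand-ins for $A_{k,l}$, $A^t_{k,l}$, and $K^{-1}_{k,l}$ (the last supplied, when $\lambda_{k,l} \neq 0$, by the solver of Appendix B), and the stopping test uses the gradient magnitude rather than the objective's rate of decrease, for brevity.

```python
import numpy as np

def temporal_filter_cg(z_q, k, frames, apply_A, apply_At, apply_Kinv,
                       lam, max_iter=50, tol=1e-8):
    """Matrix-free conjugate-gradient sketch of (A.1)-(A.5).

    Hypothetical interface (not the paper's code): z_q[l] is frame l of
    the decompressed sequence flattened to a 1-D float array, frames is
    the index set {k-n, ..., k+n}, apply_A(l, x) evaluates A_{k,l} x,
    apply_At(l, x) evaluates A_{k,l}^t x, apply_Kinv(l, x) evaluates
    K_{k,l}^{-1} x, and lam[l] is the scalar lambda_{k,l} of (A.4).
    """
    def gradient(z):
        # g_(i) of (A.4), using K^{-t} = K^{-1} since K is symmetric.
        g = np.zeros_like(z)
        for l in frames:
            g += apply_At(l, apply_Kinv(l, apply_A(l, z) + lam[l] - z_q[l]))
        return g

    def hessian_times(d):
        # sum_l A_{k,l}^t K_{k,l}^{-1} A_{k,l} d, the denominator of (A.5).
        s = np.zeros_like(d)
        for l in frames:
            s += apply_At(l, apply_Kinv(l, apply_A(l, d)))
        return s

    z = z_q[k].copy()                    # initial estimate z_(0) = z_q^k
    g = gradient(z)
    d = g.copy()                         # initial direction d_(0) = g_(0)
    g_dot = float(g @ g)
    for _ in range(max_iter):
        if g_dot < tol:                  # simplified stopping test
            break
        alpha = float(d @ g) / float(d @ hessian_times(d))   # (A.5)
        z = z - alpha * d                                    # (A.1)
        g = gradient(z)
        g_dot_new = float(g @ g)
        d = g + (g_dot_new / g_dot) * d                      # (A.2)-(A.3)
        g_dot = g_dot_new
    return z
```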

Appendix B. Inversion of full observation error covariance matrix

The inverse of the full observation error covariance matrix $K_{k,l}$ is needed to implement the optimization of Appendix A when $\lambda_{k,l} \neq 0$:

$$K^{-1}_{k,l} = \left[ H^{-1} K_{e_l^y} H^{-t} + \lambda_{k,l} I \right]^{-1}. \qquad (B.1)$$

Due to the combination of the quantization noise covariance matrix and the motion compensation error covariance matrix, the above inverse cannot be implemented through simple operations, as is possible when only one of the two noise terms is present. Since the inverse matrix is always applied to an input vector $v$ to yield an output vector $u$, it suffices to solve only for the vector $u$, without explicitly evaluating and storing the full inverse matrix:

$$u = \left[ H^{-1} K_{e_l^y} H^{-t} + \lambda_{k,l} I \right]^{-1} v, \qquad (B.2)$$

which is equivalent to solving the following equation for $u$:

$$\left[ H^{-1} K_{e_l^y} H^{-t} + \lambda_{k,l} I \right] u = v. \qquad (B.3)$$

Eq. (B.3) is solved here using the conjugate gradient method. For notational convenience, replace the matrix to be inverted by the matrix $P$, such that (B.3) becomes

$$P u = v. \qquad (B.4)$$

Solving (B.4) is equivalent to minimizing

$$\tfrac{1}{2} u^t P u - u^t v, \qquad (B.5)$$

whose minimum is found when the gradient $g$ with respect to $u$ is zero:

$$g = P u - v = 0. \qquad (B.6)$$

The $u$ that minimizes (B.5) is computed using the method of conjugate gradients in exactly the same manner as described in Appendix A. Eqs. (A.2) and (A.3) are used to determine a direction $d_{(i)}$, which, when used with a step size of

$$\alpha_{(i)} = \frac{d^t_{(i)} g_{(i)}}{d^t_{(i)} P d_{(i)}}, \qquad (B.7)$$

leads to an updated estimate for $u$, analogous to the update equation of (A.1).

Note that in applying $P$, the transpose of the inverse DWT must be applied, i.e., $H^{-t}$. Such a function is implemented similarly to the case for $H^t$, which is discussed in [21].
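A minimal sketch of this matrix-free solve follows, again under assumed, hypothetical names: apply_P evaluates $P x = H^{-1} K_{e_l^y} H^{-t} x + \lambda_{k,l} x$ (for example, built from wavelet routines like those sketched after (19)), and $P$ is symmetric positive definite, as the conjugate gradient method requires.

```python
import numpy as np

def solve_P(v, apply_P, max_iter=100, tol=1e-10):
    """Conjugate-gradient solve of P u = v, per (B.4)-(B.7), without
    forming P explicitly. apply_P is a hypothetical callable evaluating
    P x; both terms of P are symmetric positive (semi)definite, so CG
    applies whenever lambda_{k,l} > 0.
    """
    u = np.zeros_like(v)
    g = apply_P(u) - v              # gradient of (B.5), i.e. (B.6)
    d = g.copy()
    g_dot = float(g @ g)
    for _ in range(max_iter):
        if g_dot < tol:
            break
        Pd = apply_P(d)
        alpha = float(d @ g) / float(d @ Pd)   # step size, (B.7)
        u = u - alpha * d                      # update analogous to (A.1)
        g = g - alpha * Pd                     # keeps g equal to P u - v
        g_dot_new = float(g @ g)
        d = g + (g_dot_new / g_dot) * d        # direction via (A.2)-(A.3)
        g_dot = g_dot_new
    return u
```

In practice a routine of this form would serve as the apply_Kinv callable used by the Appendix A iteration.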


References

[1] M. Antonini, M. Barlaud, P. Mathieu, I. Daubechies, Image coding using wavelet transform, IEEE Trans. Image Process. 1 (2) (1992) 205–220.
[2] J.C. Brailean, R.P. Kleihorst, S. Efstratiadis, A.K. Katsaggelos, R.L. Lagendijk, Noise reduction filters for dynamic image sequences: a review, Proc. IEEE 83 (9) (1995) 1272–1292.
[3] R.W. Buccigrossi, E.P. Simoncelli, Image compression via joint statistical characterization in the wavelet domain, IEEE Trans. Image Process. 8 (12) (1999) 1688–1701.
[4] D. Chen, R.R. Schultz, Extraction of high-resolution video stills from MPEG image sequences, in: International Conference on Image Processing, Vol. 2, 1998, pp. 465–469.
[5] M. Choi, Y. Yang, N.P. Galatsanos, Regularized multichannel recovery of compressed video, in: International Conference on Image Processing, Vol. 1, 1997, pp. 271–274.
[6] F. Cocchia, S. Carrato, G. Ramponi, Design and real-time implementation of a 3-D rational filter for edge preserving smoothing, IEEE Trans. Consumer Electronics 43 (4) (1997) 1291–1300.
[7] B.K. Gunturk, Y. Altunbasak, R.M. Mersereau, Super-resolution reconstruction of compressed video using transform-domain statistics, IEEE Trans. Image Process. 13 (1) (2004) 33–43.
[8] G. de Haan, IC for motion-compensated de-interlacing, noise reduction, and picture-rate conversion, IEEE Trans. Consumer Electronics 45 (3) (1999) 617–624.
[9] ISO/IEC 15444-1, Information technology – JPEG 2000 image coding system – Part 1: Core coding system, 2000.
[10] ISO/IEC 15444-3, Information technology – JPEG 2000 image coding system – Part 3: Motion JPEG 2000, 2002.
[11] L. Karray, P. Duhamel, O. Rioul, Image coding with an L∞ norm and confidence interval criteria, IEEE Trans. Image Process. 7 (6) (1998) 621–631.
[12] B.-J. Kim, W.A. Pearlman, An embedded wavelet video coder using three-dimensional set partitioning in hierarchical trees (SPIHT), in: Data Compression Conference, 1997, pp. 251–260.
[13] B.-J. Kim, Z. Xiong, W.A. Pearlman, Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3-D SPIHT), IEEE Trans. Circuits Systems Video Technol. 10 (8) (2000) 1374–1387.
[14] J. Mateos, C. Ilia, B. Jiménez, R. Molina, A.K. Katsaggelos, Reduction of blocking artifacts in block transformed compressed color images, in: International Conference on Image Processing, Vol. 1, 1998, pp. 401–405.
[15] T. Meier, K.N. Ngan, G. Crebbin, Reduction of blocking artifacts in image and video coding, IEEE Trans. Circuits Systems Video Technol. 9 (3) (1999) 490–500.
[16] J. Nocedal, S.J. Wright, Numerical Optimization, Springer, New York, NY, 1999.
[17] A.J. Patti, Y. Altunbasak, Super-resolution image estimation for transform coded video with application to MPEG, in: International Conference on Image Processing, Vol. 3, 1999, pp. 179–183.
[18] H.A. Peterson, A.J. Ahumada, A.B. Watson, The visibility of DCT quantization noise, in: J. Marreale (Ed.), Digest of Technical Papers, Vol. 24, Society for Information Display, Playa del Rey, CA, 1993, pp. 942–945.
[19] A. Pižurica, V. Zlokolica, W. Philips, Noise reduction in video sequences using wavelet-domain and temporal filtering, in: SPIE Wavelet Applications in Industrial Processing, Vol. 5266, 2003, pp. 48–59.
[20] M.A. Robertson, Wavelet quantization noise in compressed images and motion imagery, in: IASTED International Conference on Signal and Image Processing, 2003, pp. 287–291.
[21] M.A. Robertson, Restoration of wavelet-compressed images and motion imagery, AFRL Technical Report AFRL-IF-RS-TR-2004-5, available from http://www.dtic.mil, 2004.
[22] M.A. Robertson, R.L. Stevenson, Restoration of compressed video using temporal information, in: SPIE Visual Communications and Image Processing, Vol. 4310, 2001, pp. 21–29.
[23] M.A. Robertson, R.L. Stevenson, DCT quantization noise in compressed images, in: International Conference on Image Processing, Vol. 1, 2001, pp. 185–188.
[24] A. Said, W.A. Pearlman, A new, fast, and efficient image codec based on set partitioning in hierarchical trees, IEEE Trans. Circuits Systems Video Technol. 6 (3) (1996) 243–250.
[25] R.R. Schultz, R.L. Stevenson, Extraction of high-resolution frames from video sequences, IEEE Trans. Image Process. 5 (6) (1996) 996–1011.
[26] J.M. Shapiro, Embedded image coding using zerotrees of wavelet coefficients, IEEE Trans. Signal Process. 41 (12) (1993) 3445–3462.
[27] A.B. Sripad, D.L. Snyder, A necessary and sufficient condition for quantization errors to be uniform and white, IEEE Trans. Acoust. Speech Signal Process. ASSP-25 (5) (1977) 442–448.
[28] R.L. Stevenson, B.E. Schmitz, E.J. Delp, Discontinuity preserving regularization of inverse visual problems, IEEE Trans. Systems Man Cybernet. 24 (3) (1994) 455–469.
[29] C.-J. Tsai, P. Karunaratne, N. Galatsanos, A. Katsaggelos, A compressed video enhancement algorithm, in: International Conference on Image Processing, Vol. 3, 1999, pp. 454–458.
[30] A.B. Watson, G.Y. Yang, J.A. Solomon, J. Villasenor, Visibility of wavelet quantization noise, IEEE Trans. Image Process. 6 (8) (1997) 1164–1175.
[31] B. Widrow, I. Kollár, M.-C. Liu, Statistical theory of quantization, IEEE Trans. Instrum. Meas. 45 (2) (1996) 353–361.
[32] J.W. Woods, T. Naveen, A filter based bit allocation scheme for subband compression of HDTV, IEEE Trans. Image Process. 1 (3) (1992) 436–440.
[33] J. Xu, Z. Xiong, S. Li, Y. Zhang, Memory-constrained 3-D wavelet transform for video coding without boundary effects, IEEE Trans. Circuits Systems Video Technol. 12 (9) (2002) 812–818.
[34] Y. Yang, M. Choi, N. Galatsanos, New results on multichannel regularized recovery of compressed video, in: International Conference on Image Processing, Vol. 1, 1998, pp. 391–395.
[35] J. Yang, H. Choi, T. Kim, Noise estimation for blocking artifacts reduction in DCT coded images, IEEE Trans. Circuits Systems Video Technol. 10 (7) (2000) 1116–1120.
[36] G.S. Yovanof, S. Liu, Statistical analysis of the DCT coefficients and their quantization error, in: Asilomar Conference on Signals, Systems and Computers, Vol. 1, 1996, pp. 601–605.