
Chapter 17
On Globally Optimal Local Modeling: From Moving Least Squares to Over-parametrization

Shachar Shem-Tov, Guy Rosman, Gilad Adiv, Ron Kimmel, and Alfred M. Bruckstein

Abstract This paper discusses a variational methodology, which involves local modeling of data from noisy samples, combined with global model parameter regularization. We show that this methodology encompasses many previously proposed algorithms, from the celebrated moving least squares methods to the globally optimal over-parametrization methods recently published for smoothing and optic flow estimation. However, the unified look at the range of problems and methods previously considered also suggests a wealth of novel global functionals and local modeling possibilities. Specifically, we show that a new non-local variational functional provided by this methodology greatly improves robustness and accuracy in local model recovery compared to previous methods. The proposed methodology may be viewed as a basis for a general framework for addressing a variety of common problem domains in signal and image processing and analysis, such as denoising, adaptive smoothing, reconstruction and segmentation.

17.1 Introduction

A fundamental problem in both image and signal processing is that of recovering a function, a curve or a surface (i.e., a signal or an image) from its noisy and distorted samples. Significant research effort was invested in this problem and the results obtained so far are quite remarkable. The most important ingredient in the success of any method that extracts signals from noise is, of course, the set of assumptions that summarizes our prior knowledge about the properties of the signal that effectively differentiates it from the noise. These assumptions range from some vague general requirements of smoothness on the signals, to quite detailed information on the structure or functional form of the signals that might be available due to prior knowledge on their sources.

S. Shem-Tov · G. Rosman · R. Kimmel · A.M. Bruckstein
Technion - Israel Institute of Technology, 2000, Haifa, Israel
e-mail: [email protected]; [email protected]; [email protected]; [email protected]

G. Adiv
Rafael, Haifa, Israel
e-mail: [email protected]

M. Breuß et al. (eds.), Innovations for Shape Analysis, Mathematics and Visualization, DOI 10.1007/978-3-642-34141-0_17, © Springer-Verlag Berlin Heidelberg 2013

The prior information on a signal/image is often expressed in the form of a parameterized model. For instance, in speech recognition [25] slowly varying coefficients of the short-time Fourier transform (STFT) are used to locally describe and model highly fluctuating spectral characteristics over time. In object recognition the choice of the correct spatial support for objects, i.e., the segmentation, is a fundamental issue [19]; hence, in general scene understanding, support maps are used to represent the segmentation of images into homogeneous chunks, enabling the representation of objects as disjoint regions with different local modeling parameters [9]. This concept of spatial support selection for estimation problems is closely related to layered models of scenes, which offer significant benefits for motion estimation, producing state of the art results [33]. In geometric modeling, B-splines (which are essentially a continuous set of piecewise polynomials) are used for local curve and surface approximation, interpolation and fitting from noisy samples [8]. In model-based texture segmentation [13], the selection of an appropriate support set for the local model is important to obtain a good local texture representation, which then serves as a basis for the segmentation process. In sparse coding [1, 23], the main goal is to model data vectors (signals) as a linear combination of a few elements (support set) from a known dictionary. Sparse coding has proven to be very effective for many signal and image processing tasks, as well as for computer vision tasks such as object recognition.

One of the widespread and successful methods for local signal modeling is the celebrated moving least squares local fitting method (MLS), which in recent years has evolved to become an important tool in both image and signal processing and in computational graphics. In [17], Levin explored the moving least-squares method and applied it to scattered-data interpolation, smoothing and gradient approximation. In [2, 4, 18] the moving least squares technique was employed for modeling surfaces from point-sampled data, and proved to be a powerful approach. This was followed by the work of Fleishman et al. [2, 11], incorporating robust statistics mechanisms for outlier removal. Common to these works is the locality of the fitting procedure and the lack of global assumptions expressing prior knowledge on the variations of local parameters.

The aim of this paper is to show how one can design variational functionals that exploit local fitting of models and global smoothness assumptions on the variations of model parameters, that are natural for various types of signals. A first attempt at such a variational methodology was made by Nir et al. [21, 22]. This work was subsequently broadened and generalized by Bruckstein in [6], by the realization that over-parametrization methods naturally follow from combining moving least squares, or other local fitting methods, with global priors on parameter variations. The local modeling relates to a wealth of classical methods, such as Haralick's and Watson's facet model for images [14], and extends them in many ways.


More importantly, the discussion and experimental results reported in this paper point at a rather general methodology for designing functionals for variational model estimation and signal reconstruction; focusing on the denoising problem is merely an illustrative test case.

The importance of the proposed variational framework lies in the fact that it allows for directly incorporating knowledge of the problem domain, hence it is easily extendable to address numerous problem areas, such as denoising, deconvolution and optical flow, in image and signal processing and various other fields of research. Moreover, due to the structure of the proposed functionals, our variational framework is able, unlike many common methods, to accurately recover the underlying model of a signal while addressing its main aim.

17.2 The Local Modeling of Data

For the sake of simplicity, we shall here limit our discussion to one dimensional signals and address the generic problem of denoising. Let $f(x)$ be a one dimensional signal and $f_{noisy}(x) = f(x) + n(x)$ be its noisy counterpart. Also, denote by $\{f_j = f(x_j) + n(x_j)\}$ the set of samples of the noisy signal. Suppose that $f$ can be (locally) described by a parameterized model of the form

$$f(x) = \sum_{i=1}^{n} A_i \phi_i(x), \qquad (17.1)$$

where $A = \{A_i\}_{i=1}^{n}$ is a set of parameters, and $\Phi = \{\phi_i\}_{i=1}^{n}$ is a set of 'basis' signals. Local modeling is the process of estimating $A = \{A_i\}$ from the noisy signal $f_{noisy}$ or its samples $\{f_j\}$, in the neighborhood of a point $x$. As a simple example, we can consider the Taylor approximation as a parameterized model with polynomial basis functions $\phi_i = x^{i-1}$.

Suppose we would like to use this local modeling for denoising our noisy data $f_{noisy}(x)$ or $\{f_j\}$. Then, around $x = x_0$, we want to estimate $A_i(x_0)$, i.e., the parameters of the model (17.1), by solving

$$\arg\min_{[A_1, A_2, \ldots, A_n]} \left\| f_{noisy}(x_0) - \sum_{i=1}^{n} A_i \phi_i(x_0) \right\|, \qquad (17.2)$$

in some local neighborhood of $x_0$ and a distance norm $\|\cdot\|$. This minimization gives us the best local estimate $\hat{f}(x_0)$:

$$\hat{f}(x_0) = \sum_{i=1}^{n} A_i(x_0)\,\phi_i(x_0). \qquad (17.3)$$

Repeating this process for every location $x$ gives us the "moving" best estimate of $f$.


The choice of the distance or measure of error is of great importance. One common choice is the weighted least squares distance, as considered, for example, by Farneback [10], as a generalization of the facet model:

$$\left\| f_{noisy}(x) - \sum_{i=1}^{n} A_i(x)\,\phi_i(x) \right\|_w = \int \left( f_{noisy}(y) - \sum_{i=1}^{n} A_i(x)\,\phi_i(y) \right)^2 w(y - x)\, dy. \qquad (17.4)$$

This is essentially a weighted $L_2$ norm, where $w(\cdot)$ is a weight function which localizes the estimation of the parameters $A_i(x_0)$ (this error measure can also be applied to a sampled signal). Both the continuous and the discrete cases, described above, yield a process to compute the local model parameters $A_i(x)$, and therefore to estimate the signal $\hat{f}(x)$, not only at $x_0$ but at all $x$. This process is the well-known moving least squares estimation process.
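As a rough illustration (not taken from the paper's implementation), the following sketch computes the weighted least squares fit (17.2)-(17.4) at every sample of a discrete signal, using a locally centered polynomial basis and a Gaussian weight window; the function name, the window shape and the normal-equations solve are all assumptions made for the example.

```python
import numpy as np

def mls_estimate(f_noisy, x, degree=1, sigma=2.0):
    """Moving least squares: at each x0, fit sum_i A_i(x0) * phi_i(y) to the
    samples f_noisy(y), weighted by w(y - x0), and evaluate the fit at x0."""
    n_basis = degree + 1
    A = np.zeros((len(x), n_basis))      # estimated parameters A_i(x0) at every location
    f_hat = np.zeros(len(x))             # reconstructed signal, eq. (17.3)
    for k, x0 in enumerate(x):
        w = np.exp(-0.5 * ((x - x0) / sigma) ** 2)          # Gaussian weights w(y - x0)
        Phi = np.vander(x - x0, n_basis, increasing=True)   # Phi[j, i] = (y_j - x0)**i
        W = np.diag(w)
        # Weighted normal equations: (Phi^T W Phi) a = Phi^T W f, cf. eq. (17.4)
        a = np.linalg.solve(Phi.T @ W @ Phi, Phi.T @ W @ f_noisy)
        A[k] = a
        f_hat[k] = a[0]    # the basis is centered at x0, so the fit at x0 is the constant term
    return f_hat, A

# Illustrative usage: denoise a noisy piecewise linear signal with a locally linear model.
x = np.arange(100, dtype=float)
f_clean = np.where(x < 50, 0.01 * x, 1.0 - 0.01 * x)
f_hat, A = mls_estimate(f_clean + 0.05 * np.random.randn(len(x)), x, degree=1, sigma=3.0)
```

With degree 1 this recovers a locally linear MLS estimate; since the basis is centered at each $x_0$, the denoised value is simply the constant coefficient of the local fit.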

We wish to emphasize that although our proposed methodology, discussed in the next section, focuses on the $L_2$ norm, this is by no means the only choice of error measure. The choice of the error measure may be adapted to each particular problem of interest. Other measures, such as the least sum of absolute values distance [34] or $L_1$ norm, can readily be substituted into the cost functionals.

17.3 Global Priors on Local Model Parameter Variations

In the previous discussion we did not impose any conditions on the model parameters $A_i$, and in the definition of the neighborhood around $x_0$, via the weight functions, we did not use any idea of adaptation to the data.

Suppose now that we are given some structural knowledge of the signal $f(x)$. We would like to use this knowledge to improve the estimation process. For example, suppose we a priori know that $f(x)$ is a piecewise polynomial signal over a set of intervals, i.e., we know that

$$f(x) = \sum_{i=1}^{n} A_i^r x^{i-1} \quad \text{for } x \in [x_r, x_{r+1}], \qquad (17.5)$$

but we do not know the sequence of breakpoints $\{x_r\}$. Using polynomial basis functions $\{1, x, x^2, \ldots, x^{n-1}\}$, we know a priori that a good estimate for $f(x)$ may be provided by piecewise constant sets of parameters $\{A\}$ over the $[x_r, x_{r+1}]$ segments, where changes in the parameters occur only at the breakpoints $\{x_r\}$. Such knowledge provides us the incentive to impose some global prior on the parameters, such that the minimum achieved by the optimization process will indeed favor them to be piecewise constant. This may be achieved by supplementing the moving least squares local fitting process with constraints on the variations of the parameters in the minimization process. Thus, we shall force the estimated parameters not only to provide the best weighted local fit to the data, but also to be consistent with the local fitting over adjacent neighborhoods.


This is where we diverge from and extend the facet model, which assigns the basis function parameters at each point using solely a local least squares fit process.

In this paper, the assumed structural knowledge of the signal implies that good estimates can be achieved by a piecewise constant model, i.e., a model whose parameters are piecewise constant. Therefore, the focus of this work will be to design functionals which also impose global priors on the model parameters. We shall demonstrate that one can design a functional that does indeed fulfill this requirement.

The mix of local fitting and global regularity is the main idea and power behind over-parameterized variational methods and is what makes them a versatile problem solving tool. By adapting the local fitting process and incorporating the global prior (in a way that will be described in the following sections), this methodology can be readily applied to address problems in various domains.

17.4 The Over-parameterized Functional

In [21] Nir and Bruckstein presented a first attempt at noise removal based on an over-parameterized functional. This functional was, similarly to many well known functionals, a combination of two terms, as follows:

$$E(f, A) = E_D(f, A) + \alpha E_S(A). \qquad (17.6)$$

Here $\alpha$ is some fixed relative weight parameter, $E_D$ is a data or fidelity term, and $E_S$ is a regularization term. The data term $E_D$ was chosen to be

$$E_D(f, A) = \int \left( f_{noisy}(x) - \sum_{i=1}^{n} A_i(x)\,\phi_i(x) \right)^2 dx. \qquad (17.7)$$

Note that this functional implies a neighborhood of size 0 in the distance measure (17.4), which means that this data term only penalizes for point-wise deviations of the estimated signal $\sum_{i=1}^{n} A_i(x)\,\phi_i(x)$ from $f_{noisy}$.

The smoothness or regularization term, $E_S$, was defined to penalize variations of the parameters $A_i$ as follows:

$$E_S(A) = \int \Psi\!\left( \sum_{i=1}^{n} A_i'(x)^2 \right) dx, \qquad (17.8)$$

where $\Psi(s^2) = \sqrt{s^2 + \varepsilon^2}$.
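For concreteness, here is a minimal discretized sketch of the energy (17.6)-(17.8) for sampled parameter functions (our own discretization, not the authors' code); the linear basis and the forward-difference approximation of $A_i'$ are assumptions.

```python
import numpy as np

def op_energy(A, f_noisy, x, alpha=0.05, eps=1e-3):
    """Discretized over-parameterized energy E = E_D + alpha * E_S, eqs. (17.6)-(17.8).
    A has shape (n_basis, len(x)); the linear basis phi_1 = 1, phi_2 = x is assumed."""
    Phi = np.vstack([np.ones_like(x), x])            # phi_i(x), shape (2, len(x))
    model = np.sum(A * Phi, axis=0)                  # sum_i A_i(x) phi_i(x)
    E_D = np.sum((f_noisy - model) ** 2)             # point-wise data term, eq. (17.7)
    dA = np.diff(A, axis=1)                          # forward differences A_i'(x), dx = 1
    E_S = np.sum(np.sqrt(np.sum(dA ** 2, axis=0) + eps ** 2))   # coupled Charbonnier, eq. (17.8)
    return E_D + alpha * E_S
```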


The resulting functional yields a channel-coupled total variation (TV) regularization process for the estimation of the model parameters. Note that (17.8) is an approximate $L_1$ type of regularizer (sometimes referred to as the Charbonnier penalty function). This regularizer causes the functional to be more robust to outliers, and allows for smaller penalties for high data differences (compared to a quadratic regularizer), while maintaining convexity and continuity [5, 20]. The regularization term was designed to impose the global prior on the parameters. It is channel-coupled to "encourage" the parameters to change simultaneously, thus preferring a piecewise constant solution as described in Sect. 17.3.

In our experiments (which are discussed in Sect. 17.7), as well as in [21], this functional displayed good performance for noise removal compared with Rudin, Osher and Fatemi's [28] classical total variation noise removal functional. A similar functional, with data term modifications, was used by Nir et al. in [22] for optical flow estimation, producing state of the art results.

17.4.1 The Over-parameterized Functional Weaknesses

Despite the good performance displayed by the over-parameterized functional, it still suffers from the following shortcomings, which were clear in our experiments:

Discontinuities smearing: As mentioned, the regularization term is an approximate $L_1$ regularizer. A precise $L_1$ regularizer is indifferent to the way signal discontinuities appear, i.e., the same penalty is given to a smooth gradual signal change and to a sharp discontinuity (as long as the total signal difference is the same). See for example Pock's Ph.D. work [24] for a detailed example. We consider this property as a shortcoming, because we expect the reconstructed parameters to be piecewise constant, where discontinuities appear as relatively few and sharp changes; hence this regularizer does not convey the intended global prior on the parameters, and does not prefer a "truly" piecewise constant solution. In practice the problem is even more severe: first, the selection of the $\varepsilon$ constant in the Charbonnier penalty function proves to be problematic. Choosing a bigger $\varepsilon$ value causes the functional to lose the ability to preserve sharp discontinuities, and actually to prefer to smooth out discontinuities. On the other hand, choosing a smaller $\varepsilon$ value degenerates the penalty function. In fact, for any choice of $\varepsilon$, this penalty function will tend to smooth sharp discontinuities. Second, as discussed above, the TV-$L_1$ model suffers from the so called staircasing effect, where smooth regions are recovered as piecewise constant staircases in the reconstruction. See the work of Savage et al. [29] and references therein for a detailed review of such effects.

Origin biasing: The over-parameterized functional's global minimum may depend on the selected origin of the model. In the over-parameterized methodology we consider basis functions which are defined globally across the signal domain. This definition requires us to fix an arbitrary origin for the basis functions. As the true parameter values may vary with a change of the origin choice, the value of the regularization term may also vary. For instance, consider the value of the constant term in a linear function, which determines the point at which the line crosses the y-axis. A change of the y-axis location, i.e., the origin location, will incur a change in the value of the constant term. This dependency on the choice of the basis function origin is termed origin biasing. A detailed example, regarding the optical flow over-parameterized functional with affine flow basis functions, is given in the work of Trobin et al. [35].

We also note that the data term presented above only penalizes for point-wise deviation from the model; hence it imposes only a point-wise constraint on the functional's minimum, relying only on the regularization term to impose the global constraint. A discussion of why this is a problematic issue is given in Sect. 17.5.3.

Overall, it is evident that, despite producing good results when applied to various applications, the over-parameterized functional model is fundamentally flawed when attempting to accomplish parameter reconstruction. On the one hand, the over-parameterized model provides a solution domain wider than the TV model for the functional to "choose" from, thus often enabling convergence to excellent denoising solutions; on the other hand, the constraints applied to the solution domain through the functional are not strong enough to impose convergence to a piecewise-constant-parameters solution, as demonstrated in Sect. 17.7.

17.5 The Non-local Over-parameterized Functional

To overcome the shortcomings described in Sect. 17.4.1, we shall modify the functional, both in the data term and in the regularization term, as described below.

17.5.1 The Modified Data Term: A Non-local Functional Implementing MLS

In order to overcome the point-wise character of the data term, and to impose a neighborhood constraint in the spirit of (17.4) in the data term, we extend it to what is commonly referred to as a non-local functional [6, 12, 16]. This is done simply by means of defining a weighting function which considers more than point-wise differences.

Making the data term a non-local functional requires the parameters to model the signal well over a neighborhood of each point. We note that the robustness of the parameter estimate increases with the size of the support set. On the other hand, increasing the size of the support set too much may reduce the functional's ability to detect discontinuities and to preserve them.

The non-local data term functional is

$$E_D = \int_x \int_y \left( f_{noisy}(y) - \sum_{i=1}^{n} A_i(x)\,\phi_i(y) \right)^2 w(x, y)\, dy\, dx, \qquad (17.9)$$

and it conforms to the functional form of the weighted least squares fit distance defined in Sect. 17.2. Note that there is yet no restriction on the size or shape of the support set around each point that is induced by the weighting function $w(x, y)$.

For the 1D case, which is thoroughly explored in this work, we defined a simple yet powerful sliding window weighting function, as described in Sect. 17.5.1.1. The generalization of the functional to the 2D case may require a more sophisticated optimal data dependent weighting function, which will be the subject of future research.

17.5.1.1 Weighting Scheme

The choice of the weighting scheme is of great importance. For each location $x_0$, the weighting scheme expresses the support set from which the parameters are to be reconstructed, i.e., the set of samples which are used for the implicit MLS fitting process calculated in the minimization process. Thus the weighting scheme should be designed in such a manner that, at points near a discontinuity, it will prevent the combination of samples from different sides of the discontinuity, thus enabling the preservation of signal discontinuities in the reconstruction process. This implies that a simple box or Gaussian window around each location $x_0$ will not suffice, as it may spread a discontinuity across several samples around the actual location of the discontinuity (which was evident in various experiments we performed).

In order to avoid this behavior, we defined an adaptive sliding window weighting function, which is closely related to concepts of ENO schemes used for the approximation of hyperbolic conservation laws, first introduced in the fundamental work of Harten et al. [15] (a detailed description was presented by Shu in [32]). For each point $x$ of the signal, we choose a window $W$ of length $N$ such that $x \in W$ and $r(x) = \sum_{j=1}^{N} r(x, y_j)$ is minimal, where $y_j \in W$ and $r(x, y_j)$ is the non-weighted least squares fit distance at $x$:

$$r(x, y_j) = \left( f_{noisy}(y_j) - \sum_{i=1}^{n} A_i(x)\,\phi_i(y_j) \right)^2. \qquad (17.10)$$

Specifically, we choose a successive set of points of size $N$ which includes $x$ and also minimizes $r(x)$. For example, for a window of size 5 there are 5 window possibilities, as illustrated below:

[Diagram: the five candidate windows $W_1, \ldots, W_5$ of length 5 over the samples $x-5, \ldots, x+5$, each containing $x$.]

We mark the chosen window by $w_x$, and the selected weight function will then be

$$w(y - x) = \begin{cases} \frac{1}{N} & \text{if } y \in w_x \\ 0 & \text{otherwise.} \end{cases} \qquad (17.11)$$

By no means do we claim that this is the best choice of the weighting function. This is but one possible, adaptive weighting function selection process, that enhances the data term to impose a non point-wise constraint, while allowing for the preservation and sharpening of discontinuities in the signal, and that was found to yield good results. Note that the choice of the window is sensitive to noise; therefore, for achieving better results, we updated the selected window at each location throughout the minimization process, as described in Sect. 17.6.
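A minimal sketch of this window selection, assuming the residuals $r(x, y_j)$ of (17.10) have already been computed for the current parameter estimate at $x$ (the function name and boundary handling are illustrative):

```python
import numpy as np

def select_window(x_idx, residual, N=10):
    """ENO-like window selection, eqs. (17.10)-(17.11): among the contiguous windows
    of length N that contain sample x_idx, pick the one with minimal summed residual
    r(x) and return the corresponding weight vector w(y - x)."""
    n = len(residual)
    best_start, best_cost = 0, np.inf
    for start in range(x_idx - N + 1, x_idx + 1):     # candidate windows [start, start + N)
        if start < 0 or start + N > n:
            continue                                  # skip windows leaving the signal domain
        cost = residual[start:start + N].sum()        # r(x) = sum_j r(x, y_j) over the window
        if cost < best_cost:
            best_start, best_cost = start, cost
    w = np.zeros(n)
    w[best_start:best_start + N] = 1.0 / N            # weight function of eq. (17.11)
    return w
```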

17.5.2 The Modified Regularization Term

We extend the over-parameterized regularization term using the Ambrosio-Tortorelli (AT) scheme [3], in a similar manner as done in [7] for the TV functional, while retaining the $L_1$ penalty function as proposed by Shah [30, 31] and applied to the over-parameterization functional by Rosman et al. [26]. This functional transforms the regularization into an implicit segmentation process, each segment with its own set of parameters. In effect, the AT scheme allows the regularization term to prefer few parameter discontinuities and to prevent discontinuities from smearing into neighboring pixels via the diffusion process, thus allowing a piecewise smooth solution. This directly addresses the discontinuities smearing effect described in Sect. 17.4.1. The choice of an $L_1$ regularizer, as opposed to the $L_2$ regularizer in the original AT scheme, is due to the fact that an $L_1$ regularizer better encourages a piecewise constant solution, which is the intended global prior we wish to impose on the solution. Exploration of different sub-$L_1$ regularizer functions is beyond the scope of this paper.

The chosen AT regularization term is

$$E_{S,AT} = \int (1 - v_{AT})^2\, \Psi\!\left( \sum_{i=1}^{n} \|A_i'\|^2 \right) + \rho_1 v_{AT}^2 + \rho_2 \|v_{AT}'\|^2 \, dx, \qquad (17.12)$$


where $v_{AT}$ is a diffusivity function, ideally serving as an indicator of the parameter discontinuity set in the signal model. Suppose we have a piecewise linear signal, and an ideal solution $(A^*, v_{AT}^*)$ where $A^*$ is piecewise constant, and the diffusivity function $v_{AT}^*$ is 1 at the linear region boundaries and 0 elsewhere. With such a solution, we expect two neighboring points, belonging to different regions, to have a very small, in fact negligible, diffusivity interaction between them. This is controlled by the value of $v_{AT}^*$ at those points, which effectively cancels the diffusivity interaction between the different sets of linear parameters. Furthermore, the cost associated with this solution is directly due to the discontinuity set measure in the signal, i.e., to

$$\int \rho_1 v_{AT}^2 + \rho_2 \|v_{AT}'\|^2 \, dx; \qquad (17.13)$$

hence the penalty no longer depends on the size of the signal difference at discontinuities. Moreover, the AT regularization addresses the origin biasing effect, described in Sect. 17.4.1, by making the functional much less sensitive to the selected origin. This is due to the fact that ideally we consider only piecewise constant parameter solutions. These solutions nullify the regularization term energy at every location except for discontinuities, where the energy depends solely on (17.13). Therefore the ideal piecewise constant solution becomes a global minimizer of the functional.
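A discretized sketch of (17.12)-(17.13) for sampled $A_i(x)$ and $v_{AT}(x)$ (an assumed forward-difference discretization, not the authors' code):

```python
import numpy as np

def at_smoothness(A, v_at, rho1, rho2, eps=1e-3):
    """Discretized AT-regularized smoothness term, eq. (17.12): the (1 - v_AT)^2 factor
    suspends the coupled Charbonnier penalty across detected discontinuities, while the
    last two terms, eq. (17.13), charge a fixed price for the discontinuity set itself."""
    dA = np.diff(A, axis=1)                                    # A_i'(x), forward differences
    dv = np.diff(v_at)                                         # v_AT'(x)
    psi = np.sqrt(np.sum(dA ** 2, axis=0) + eps ** 2)          # Psi(sum_i ||A_i'||^2)
    coupling = np.sum((1.0 - v_at[:-1]) ** 2 * psi)            # (1 - v_AT)^2 Psi(...)
    return coupling + rho1 * np.sum(v_at ** 2) + rho2 * np.sum(dv ** 2)
```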

17.5.3 Effects of the Proposed Functional Modifications

An obvious question arises: why do we need to modify both the data and regularization terms? To answer this, we first notice that using only the non-local data term improves the local parameter estimates, but cannot prevent the discontinuities smearing effect. A moving least squares process with a small window will yield excellent parameter estimates, but is unable to prevent the diffusion process from combining data from neighboring segments, thus smoothing and blurring the estimates at boundary locations. Therefore we need to add the AT scheme to the functional, which has the ability to prohibit the diffusion process at discontinuities.

But then one might expect that the AT scheme without the non-local data term would suffice, by segmenting the regularization into piecewise constant regions, and relying on the data term and the global regularization to recover the correct parameters for each segment. In practice this is not the case. Consider the following illustrative test case: let $y_s$ be a discontinuous signal, depicted in Fig. 17.1, and suppose we initialize the model parameters to a smoothed version of $y_s$, achieved by calculating the moving least squares fit solution, with a centralized window, on the clean signal. We will now discuss applying different functionals for reconstructing the model parameters from the clean signal $y_s$ and the initialized smooth parameters.

Figures 17.2 and 17.3 depict snapshots of the reconstructed signal, the parameters, and the resulting AT indicator function $v_{AT}$ at different time steps of the minimization process of the point-wise over-parameterized functional modified with the AT regularization term (MOP). We used $\alpha = 0.05$ and $\alpha = 10$, respectively.


Fig. 17.1 Solid line: $y_s$, a piecewise linear signal with one discontinuity. Dashed line: a smooth version of $y_s$ with which we initialized the model parameters.

Fig. 17.2 From left to right, snapshots at various times of the point-wise over-parameterized functional modified with the AT regularization term (MOP), with relative weight of $E_S$ $\alpha = 0.05$, $\rho_1 = 7.5$ and $\rho_2 = 5$. The top image displays the reconstructed signal and the $v_{AT}$ indicator function. The bottom image displays the reconstructed parameters.


Fig. 17.3 From left to right, snapshots at various times of the point-wise over-parameterized functional modified with the AT regularization term (MOP), with relative weight of $E_S$ $\alpha = 10$, $\rho_1 = 7.5$ and $\rho_2 = 5$. The top image displays the reconstructed signal and the $v_{AT}$ indicator function. The bottom image displays the reconstructed parameters.

Both minimizations were carried out until convergence. It is evident that the reconstructed parameters are not piecewise constant, as we would like to have from our prior information on the signal. Also, it is important to note that in the first series (Fig. 17.2) the signal is perfectly reconstructed, effectively nullifying the data term, and in the second series (Fig. 17.3) the signal is smoothed, yet the data term energy is still very low.

In contrast, Fig. 17.4 depicts a series of snapshots of the minimization process of the non-local over-parameterized functional (NLOP). Note that here the parameters are perfectly reconstructed, the AT indicator function $v_{AT}$ receives a value close to one only in the vicinity of the discontinuity, and the signal is also perfectly reconstructed.

We wanted to understand why the MOP functional converged to such a solution: is this solution a local or a global minimum of the functional? We addressed the question at hand through functional energy considerations.


Fig. 17.4 From left to right, snapshots at various times of the NLOP functional, with relative weight of $E_S$ $\alpha = 0.1$, $\rho_1 = 0.005$ and $\rho_2 = 0.0025$. The top image displays the reconstructed signal and the $v_{AT}$ indicator function. The bottom image displays the reconstructed parameters.

Using the expected global minimum of both functionals (which was achieved by the minimization process of the NLOP functional), we calculated, using interpolation, hypothetical steps of a minimization from the solution achieved by the MOP functional to the global minimum. We then calculated the energy of the MOP functional at each hypothetical step. It becomes obvious that the energy of the MOP functional rises before dropping to the energy level of the expected global minimum, indicating that indeed the MOP functional converged into a local minimum. Separate calculation of the energy of the data and the regularization terms achieved in the minimization of the MOP functional indicates that most of the functional energy is concentrated in the regularization term. In the transition between the local and global minimum solutions, the regularization term energy rises and dictates the total energy change, while the data term contribution is negligible.


The solutions to which the MOP functional converges, and these energy considerations, lead us to the conclusion that the MOP functional is converging into a local minimum solution. This "trapping" effect is alleviated in the NLOP functional, where the presumed local minimum, achieved by the MOP, is no longer a local minimum. This is due to the fact that it is much more difficult to drive the energy of the non-local data term close to zero, and it contributes significantly to drive the parameters toward their correct values. Thus, the minimization process does not halt and continues toward the global minimum.

17.5.4 Euler-Lagrange Equations

Once we have designed the functionals to be minimized, by interchangeably fixing the $A_i(x),\ i = 1 \ldots n$, and $v_{AT}(x)$ functions, we readily obtain the Euler-Lagrange equations which characterize the minimizers of the functional.

17.5.4.1 Minimization with Respect to $A_q(x),\ q = 1 \ldots n$ (Parameter Minimization Step)

Fixing $v_{AT}(x)$, we obtain

$$\forall q = 1 \ldots n, \qquad \nabla_{A_q} E_D - \frac{d}{dx}\left( \nabla_{A_q'}\, \alpha E_{S,AT} \right) = 0. \qquad (17.14)$$

The variation of the data term with respect to the model parameter functions $A_q(x)$ is given by

$$\nabla_{A_q} E_D = 2 \int_y \left( f_{noisy}(y) - \sum_{i=1}^{n} A_i(x)\,\phi_i(y) \right) \phi_q(y)\, w(x, y)\, dy. \qquad (17.15)$$

For the smoothness term, the Euler-Lagrange equations are

$$\frac{d}{dx}\left( \nabla_{A_q'}\, \alpha E_{S,AT} \right) = 4\alpha (1 - v_{AT})\, v_{AT}'\, \Psi'\!\left( \sum_{i=1}^{n} \|A_i'\|^2 \right) A_q' + 2\alpha (1 - v_{AT})^2 \frac{d}{dx}\!\left( \Psi'\!\left( \sum_{i=1}^{n} \|A_i'\|^2 \right) A_q' \right); \qquad (17.16)$$

thus, the energy is minimized by solving the following nonlinear system of equations at each point $x$, for all $q = 1 \ldots n$:

$$2 \int_y \left( f_{noisy}(y) - \sum_{i=1}^{n} A_i(x)\,\phi_i(y) \right) \phi_q(y)\, w(x, y)\, dy - 4\alpha (1 - v_{AT})\, v_{AT}'\, \Psi'\!\left( \sum_{i=1}^{n} \|A_i'\|^2 \right) A_q' - 2\alpha (1 - v_{AT})^2 \frac{d}{dx}\!\left( \Psi'\!\left( \sum_{i=1}^{n} \|A_i'\|^2 \right) A_q' \right) = 0. \qquad (17.17)$$

17.5.4.2 Minimization with Respect to $v_{AT}(x)$ (AT Minimization Step)

Fixing the functions $A_i(x),\ i = 1 \ldots n$, we obtain

$$-2\alpha (1 - v_{AT})\, E_s + 2\rho_1 v_{AT} - \rho_2 v_{AT}'' = 0. \qquad (17.18)$$

17.6 Implementation

We used central first and second derivatives and reflecting boundary conditions. In all the methods, we used various $\alpha$, $\rho_1$ and $\rho_2$ constants, depending on the noise level, as is common in noise reduction methods (this was done for all the considered algorithms, with the appropriate parameters). In all the examples, we assumed a sampling interval of $dx = 1$. For the minimization process we used gradient descent with 200,000 iterations. This minimization method is a notoriously slowly converging method, but it is fast enough for our 1D example, and we intend to pursue faster implementations in future work.

We performed an AT minimization step every 100 parameter minimization steps, and updated the weighting function every 1,000 parameter minimization steps. We used a window size of ten sample points both for the NLOP functional and for the weighted least squares fit method (used for initialization and reconstruction). We note that using a bigger window size resulted in significantly superior results on all tests, but this may not be the case for other signals. Choosing too big a window may cause an overlap between adjacent discontinuities and prevent the functional from correctly recovering them.
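The alternating schedule described above can be sketched as follows; grad_A, grad_v and update_windows stand for user-supplied discretizations of (17.17), (17.18) and the window selection of Sect. 17.5.1.1, and the clipping of $v_{AT}$ to $[0, 1]$ is our own assumption:

```python
import numpy as np

def minimize_nlop(A, v_at, f_noisy, x, grad_A, grad_v, update_windows,
                  n_iter=200_000, step=1e-3):
    """Gradient-descent schedule used in the experiments: an AT step every 100
    parameter steps and a weighting-window update every 1,000 parameter steps.
    grad_A, grad_v and update_windows are assumed user-supplied callables."""
    w = update_windows(A, f_noisy, x)                      # initial sliding-window weights
    for it in range(n_iter):
        A = A - step * grad_A(A, v_at, f_noisy, x, w)      # parameter minimization step, eq. (17.17)
        if it % 100 == 0:
            v_at = v_at - step * grad_v(A, v_at)           # AT minimization step, eq. (17.18)
            v_at = np.clip(v_at, 0.0, 1.0)                 # keep the indicator in [0, 1] (assumption)
        if it % 1000 == 0:
            w = update_windows(A, f_noisy, x)              # re-select the adaptive windows
    return A, v_at
```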

17.6.1 Initialization

In order to prevent trapping into local minima, we initialize the model parameters from the noisy data by means of a robust MLS fitting, i.e., we compute each point's parameters by choosing the best least squares fitting approximation via a sliding window least squares calculation. This was done in exactly the same manner as the sliding window weighting function described in Sect. 17.5.1.1. The parameters chosen are those which generated the minimal reconstruction error. This computation provided a robust initialization, which already preserves, to some extent, discontinuities in the signal. This claim is strengthened by experiments we performed comparing with a regular moving least squares fitting initialization. We found that with the robust initialization, the functional converges to a better solution, and nicely exposes the true discontinuities in the signal models.
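A sketch of this robust initialization under the linear basis (17.19), trying every length-$N$ window that contains the current sample and keeping the best-fitting parameters (the function name and boundary handling are assumptions):

```python
import numpy as np

def robust_mls_init(f_noisy, x, N=10):
    """Robust MLS initialization: for each sample, fit the linear model by least
    squares in every length-N window containing it and keep the parameters of the
    window with the smallest reconstruction error."""
    n = len(x)
    A = np.zeros((2, n))                                      # A_1 (offset), A_2 (slope) per sample
    for k in range(n):
        best_err, best_a = np.inf, np.zeros(2)
        for start in range(max(0, k - N + 1), min(k, n - N) + 1):
            idx = slice(start, start + N)
            Phi = np.vstack([np.ones(N), x[idx]]).T           # basis phi_1 = 1, phi_2 = x
            a, *_ = np.linalg.lstsq(Phi, f_noisy[idx], rcond=None)
            err = np.sum((Phi @ a - f_noisy[idx]) ** 2)
            if err < best_err:
                best_err, best_a = err, a
        A[:, k] = best_a
    return A
```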

17.7 Experiments and Results

We conducted various experiments in order to verify the performance of the proposed functional. In this paper we focus mainly on 1D examples, leaving 2D extensions to images for future publications; nevertheless, we exhibit some initial experiments conducted on 2D synthetic images which yield a good idea of the performance to be expected in the 2D case.

17.7.1 1D Experiments

We begin with the selection of the basis functions. We consider linear basis functions of the form

$$\phi_1 = 1, \qquad \phi_2 = x. \qquad (17.19)$$

This seemingly simple choice of functions enables us to make comprehensive tests of the functional's performance. Under this choice, the functional is expected to have the best performance on piecewise linear signals. We note that this is an arbitrary choice of basis functions, and one should choose other basis functions appropriate for the signal domain.

To perform the tests we devised a set of synthetic 1D signals, and added white Gaussian noise with standard deviation ranging from STD = 0.01 up to STD = 0.25. These signals can be separated into two main groups:

• The first group is comprised of noisy piecewise linear signals. This group is interesting because it enables us to test the parameter reconstruction performance as well as noise reduction capabilities, under the optimal basis functions. The piecewise linear signals are depicted in Fig. 17.5.

• The second group is comprised of noisy nonlinear signals, such as higher degree polynomials. This group is interesting only with regard to the noise removal performance, because generating the ground truth parameters for linear basis functions is problematic and we do not expect the linear parameters to be piecewise constant. Naturally, we could choose appropriate basis functions for these signals too, but we wanted to demonstrate that our functional performs surprisingly well even when applied with suboptimal basis functions. The nonlinear signals are depicted in Fig. 17.6.

Fig. 17.5 The piecewise linear test signals and their STD 0.05 noisy counterparts. (a) One jump, (b) one discontinuity, and (c) many jumps & discontinuities.

In order to check the performance of the NLOP functional, and to test it against other denoising algorithms, we also implemented the following noise removal algorithms. The first two are the classic TV functional and the original over-parameterized functional (OP). We chose these functionals due to their relation to the NLOP functional, enabling us to show the improvement that NLOP has over its predecessor functionals. The third and final algorithm is the state of the art K-SVD noise removal algorithm, first proposed by Aharon et al. [1]. We used the implementation published by Rubinstein et al. [27].

We compared the various algorithms' noise reduction performance and, more importantly, we compared their parameter reconstruction capability. For the latter comparison, we reconstructed parameters from the K-SVD denoised signal, using our robust least squares fitting method (see Sect. 17.6.1), and compared the results with both our NLOP functional and the original OP functional.


Fig. 17.6 The nonlinear test signals and their STD 0.05 noisy counterparts. Left: a polynomial signal of degree 2. Right: a signal of combined sine and cosine functions.

17.7.2 1D Results

Noise removal performance testing was done by comparing the $L_2$ norm of the residual noise, that is, the difference between the cleaned reconstructed signal and the original signal:

$$E_{L_2} = \left\| f(x) - \sum_{i=1}^{n} A_i(x)\,\phi_i(x) \right\|_2. \qquad (17.20)$$

Parameter reconstruction performance testing (which, with the linear basis functions, is only relevant for the piecewise linear signals) was done by calculating the $L_2$ norm of the difference between the reconstructed parameters and the original parameters with which we generated the signal. Figure 17.7 depicts an example of the recovered parameters and the $v_{AT}$ indicator function obtained for the "Many jumps & discontinuities" signal.
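Both measures are straightforward to compute from the recovered parameters; a short sketch (assuming the linear basis and sampled signals):

```python
import numpy as np

def error_norms(f_true, A_true, A_rec, x):
    """L2 residual of the denoised signal, eq. (17.20), and L2 error of the parameters."""
    Phi = np.vstack([np.ones_like(x), x])              # linear basis assumed
    f_rec = np.sum(A_rec * Phi, axis=0)                # reconstructed signal
    signal_err = np.linalg.norm(f_true - f_rec)        # E_{L2}, eq. (17.20)
    param_err = np.linalg.norm(A_true - A_rec)         # parameter reconstruction error
    return signal_err, param_err
```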

Figure 17.8 displays graphs comparing the performance of the various algorithms on piecewise linear signals. The left graphs display the noise removal error norms, while the right graphs display the parameter reconstruction error norms.

On the various noise removal performance graphs we can see the excellent performance of both the OP and NLOP functionals, reinforcing the claims of outstanding noise removal performance obtained by the OP functional and maintained by our new functional. Also, we can see that when the signal contains a discontinuity, such as in the "One discontinuity" signal (as opposed to a continuous signal with only a parameter discontinuity, such as the "One jump" signal), the NLOP functional has a greater ability to cope with the discontinuity, thus generating better results than the OP functional.


Fig. 17.7 Example of parameter reconstruction. Note that the indicator function tends to 1 where the signal has parameter discontinuities and tends to 0 in almost any other location.

In Fig. 17.9 we display a comparison of the residual noise of all the algorithms on the "One discontinuity" and the "Many jumps & discontinuities" signals.

When considering parameter reconstruction performance, we see a totally different picture. We see that in most cases, particularly for signals which contain a signal discontinuity, the NLOP functional and the K-SVD algorithm both outperform the OP functional. This result demonstrates our claim, in Sect. 17.4.1, that the OP functional lacks the ability to enforce a global prior on the reconstructed parameters. Figure 17.10 compares the reconstruction results of the NLOP functional, the OP functional and the reconstruction from the denoised K-SVD signal, on the "One discontinuity" signal. Note that the reconstruction of the NLOP functional is close to a piecewise constant solution, while the OP reconstruction is seemingly a smoothly changing function. In the K-SVD reconstruction, where at each point a set of parameters is chosen regardless of the choice made for its adjacent neighbors, the lack of enforcement of a global constraint is evident.

In Fig. 17.11, we compare the noise removal performance on the nonlinear signals. We can see that the NLOP functional still exhibits the best performance, but it is not unchallenged by the OP functional. This is due to the strong constraints the NLOP functional has in trying to enforce the linear basis functions, i.e., trying to find a piecewise linear solution suitable for the given signal.

Indeed, in order to see that our functional's performance is not restricted by the basis functions, and to verify that indeed better performance is achieved if we choose a better set of basis functions to model the signal domain, we performed several tests with higher degree polynomials. We display in Fig. 17.12 results achieved by denoising the polynomial signal displayed in Fig. 17.6, while changing the NLOP functional's basis functions to a second degree polynomial.


Fig. 17.8 Signal noise removal and parameter reconstruction comparison for the various piecewise linear signals. The left column depicts graphs comparing the $L_2$ norm of the residual noise of the various algorithms; the right column depicts graphs comparing the $L_2$ norm of the error between the various algorithms' reconstructed parameters and the expected parameters. (a) One jump signal, (b) one jump parameters, (c) one discontinuity signal, (d) one discontinuity parameters, (e) many jumps & discontinuities signal, (f) many jumps & discontinuities parameters.

In general, we expect a polynomial signal to be best recovered by polynomial basis functions of the same degree. This is clearly depicted in the graph displayed in Fig. 17.12, where we see better performance by NLOP with polynomial basis functions compared to NLOP with linear basis functions.

Another test we performed with polynomial basis functions was on a $C^1$ continuous second degree polynomial, displayed in Fig. 17.13. This is an interesting case, as both this signal and its first derivative are continuous, and only the second derivative is discontinuous. We found that this signal proved challenging for the MLS initialization method, causing it to misplace the point of discontinuity by several points.


Fig. 17.9 Residual noise comparison of noise removal. (a) "One discontinuity" signal with noise STD = 0.0375. (b) "Many jumps & discontinuities" signal with noise STD = 0.05. In each panel: top left: TV, top right: OP, bottom left: K-SVD, bottom right: NLOP.

This initialization error was not detected by the NLOP functional, which maintained it throughout the minimization process, as displayed for example in Fig. 17.13. The location of the discontinuity point depends on the random noise. We wish to emphasize that the reconstructed solutions achieved by the minimization of the NLOP functional have piecewise constant reconstructed parameters, which generate a piecewise smooth polynomial solution. This reconstructed signal may as well be the signal from which the noisy signal was generated. Also, the solution achieved by the NLOP functional outperformed the K-SVD method, as displayed in the graphs in Fig. 17.14.


Fig. 17.10 Comparison of the reconstructed second parameter across the various algorithms, when denoising the "One discontinuity" signal. Left: comparison of parameter reconstruction between the NLOP functional and the OP functional. Note how far the OP reconstruction is from a piecewise solution, while generating an excellent denoising result (seen in the relevant graph). Right: comparison of parameter reconstruction between the NLOP functional and the K-SVD algorithm. Note the apparent lack of global constraint on the parameters in the K-SVD reconstruction.

Fig. 17.11 Signal noise removal comparison of the nonlinear signals. (a) Polynomial signal and (b) trig signal.

Fig. 17.12 Comparison of noise removal performance on the polynomial signal. In this figure we compare the performance of the NLOP functional with linear basis functions (marked by non-local OP) and the NLOP functional with polynomial basis functions (marked by non-local OP poly).


Fig. 17.13 $C^1$ continuous second degree polynomial. The point of the second derivative discontinuity is marked by a red cross (or vertical black line). Top row: images of the signals. Middle row: images of the reconstructed signals and the $v_{AT}$ indicator functions. Bottom row: the reconstructed parameters. From left to right: clean signal, 0.01 STD noise, 0.0375 STD noise.

Fig. 17.14 Signal noise removal and parameter reconstruction comparison for the $C^1$ continuous second degree polynomial. (a) Signal and (b) parameters.

17.7.3 2D Example

We also ran initial tests on a 2D example. The non-local over-parameterized method extends naturally to higher dimensions. We implemented the 2D case in a similar manner to the 1D case, and chose linear basis functions of the form


$$\phi_1 = 1, \qquad \phi_2 = x, \qquad \phi_3 = y. \qquad (17.21)$$

Fig. 17.15 A 2D noise removal example of the 2D NLOP functional. (a) Original image, (b) noisy image, (c) denoised image, and (d) residual noise.

In Fig. 17.15 we show the 2D example and the NLOP functional's noise removal result. In Fig. 17.16 we display the parameters which were reconstructed in the minimization process and the generated $v_{AT}$ indicator function. We can see that the $v_{AT}$ indicator function managed to segment the signal in a good manner, although it still delineated some ghost segments, especially near the image edges. Note that the recovered parameters are almost piecewise constant, as expected.
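For intuition only, the local model behind (17.21) is an affine plane at each pixel; below is a minimal sketch of fitting it by least squares over a square neighborhood (an illustration, not the authors' 2D implementation; the function name, the patch shape, and the centered coordinates are assumptions).

```python
import numpy as np

def fit_local_plane(img, cy, cx, radius=3):
    """Least-squares fit of the 2D linear model A_1 + A_2*x + A_3*y, eq. (17.21),
    over a square neighborhood of pixel (cy, cx); coordinates are taken relative to
    the neighborhood center, and the patch is assumed to lie inside the image."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    patch = img[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1]
    Phi = np.column_stack([np.ones(xs.size), xs.ravel(), ys.ravel()])  # [phi_1, phi_2, phi_3]
    a, *_ = np.linalg.lstsq(Phi, patch.ravel(), rcond=None)
    return a        # [A_1, A_2, A_3] at this pixel
```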

A more thorough discussion of the 2D case is out of the scope of this paper. We intend to explore it extensively in the near future.


Fig. 17.16 Example of the reconstructed parameters from the noisy image and the $v_{AT}$ AT scheme indicator function. (a) Parameter 1, (b) parameter 2, (c) parameter 3, and (d) $v_{AT}$.

17.8 Conclusion

A general over-parameterized variational framework for signal and image analysis was presented. This framework can be applied to various image processing and vision problems, such as noise removal, segmentation and optical flow computation. This framework is closely related to the powerful moving least squares method, enhancing it by globally constraining the parameter variations in space based on knowledge available on the problem domain. This knowledge enables a model based reconstruction of the considered signal, by effectively recovering parameters of an a priori, assumed to be known, set of "basis" or "dictionary" functions.

The new variational framework relies on the successful over-parameterized functional, and significantly improves it by making it non-local and giving it the power not only to generate excellent results for the problem domain (such as noise removal), but also to reconstruct the underlying model parameters that might capture prior information on the problem.


This paper may be viewed as the basis of extensive future research on the subject of over-parametrization. In future work we intend to thoroughly explore the extension of the non-local over-parametrization denoising functional into 2D settings, where the choice of the support set, expressed by the weighting function, becomes a challenging matter. Also, we intend to explore a variety of different basis functions, such as higher degree polynomials and non-polynomial basis functions, which may be relevant for other applications. Finally, we would like to extend the proposed framework and to apply it to various other problem domains in computer vision.

Acknowledgements This research was supported by the Israel Science Foundation (ISF) grant no. 1551/09.

References

1. Aharon, M., Elad, M., Bruckstein, A.: K-SVD: design of dictionaries for sparse representation. In: Proceedings of SPARS'05, Rennes, pp. 9–12 (2005)
2. Alexa, M., Behr, J., Cohen-Or, D., Fleishman, S., Levin, D., Silva, C.T.: Point set surfaces. In: IEEE Visualization 2001, pp. 21–28. IEEE Computer Society, Piscataway (2001)
3. Ambrosio, L., Tortorelli, V.M.: Approximation of functional depending on jumps by elliptic functional via Γ-convergence. Commun. Pure Appl. Math. 43(8), 999–1036 (1990)
4. Amenta, N., Kil, Y.J.: Defining point-set surfaces. ACM Trans. Graph. 23, 264–270 (2004)
5. Black, M.J., Anandan, P.: The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. Comput. Vis. Image Underst. 63, 75–104 (1996)
6. Bruckstein, A.M.: On globally optimal local modeling: from moving least squares to over-parametrization. In: Workshop on Sparse Representation of Multiscale Data and Images: Theory and Applications. Institute of Advanced Study, Nanyang Technological University, Singapore (2009)
7. Chan, T.F., Shen, J.: On the role of the BV image model in image restoration. AMS Contemp. Math. 330, 25–41 (2003)
8. Cohen, E., Riesenfeld, R.F., Elber, G.: Geometric Modeling with Splines: An Introduction. AK Peters, Natick (2001)
9. Darrell, T.J., Pentland, A.P.: Cooperative robust estimation using layers of support. IEEE Trans. Pattern Anal. Mach. Intell. 17, 474–487 (1991)
10. Farneback, G.: Spatial domain methods for orientation and velocity estimation. Lic. Thesis LiU-Tek-Lic-1999:13, Dept. EE, Linkoping University, SE-581 83 Linkoping (1999). Thesis No. 755, ISBN 91-7219-441-3
11. Fleishman, S., Cohen-Or, D., Silva, C.T.: Robust moving least-squares fitting with sharp features. ACM Trans. Graph. 24, 544–552 (2005)
12. Gilboa, G., Osher, S.: Nonlocal operators with applications to image processing. Multiscale Model. Simul. 7(3), 1005–1028 (2008)
13. Haindl, M., Mikes, S.: Model-based texture segmentation. In: Proceedings of the International Conference on Image Analysis and Recognition, pp. 306–313. Springer, Berlin (2004)
14. Haralick, R.M., Watson, L.: A facet model for image data. Comput. Graph. Image Process. 15(2), 113–129 (1981)
15. Harten, A., Osher, S.: Uniformly high-order accurate nonoscillatory schemes. SIAM J. Numer. Anal. 24, 279–309 (1987)
16. Kindermann, S., Osher, S., Jones, P.W.: Deblurring and denoising of images by nonlocal functionals. Multiscale Model. Simul. 4(4), 1091–1115 (2005)
17. Levin, D.: The approximation power of moving least-squares. Math. Comput. 67, 1517–1531 (1998)
18. Levin, D.: Mesh-independent surface interpolation. In: Brunnett, G., et al. (eds.) Geometric Modeling for Scientific Visualization, pp. 37–49. Springer, Berlin/London (2003)
19. Malisiewicz, T., Efros, A.A.: Improving spatial support for objects via multiple segmentations. In: Proceedings of the British Machine Vision Conference (BMVC), Warwick (2007)
20. Memin, E., Perez, P.: A multigrid approach for hierarchical motion estimation. In: Proceedings of the Sixth International Conference on Computer Vision, 1998, Bombay, pp. 933–938 (1998)
21. Nir, T., Bruckstein, A.M.: On over-parameterized model based TV-denoising. In: International Symposium on Signals, Circuits and Systems (ISSCS 2007), Iasi (2007)
22. Nir, T., Bruckstein, A.M., Kimmel, R.: Over-parameterized variational optical flow. Int. J. Comput. Vis. 76(2), 205–216 (2008)
23. Olshausen, B.A., Fieldt, D.J.: Sparse coding with an overcomplete basis set: a strategy employed by V1. Vis. Res. 37, 3311–3325 (1997)
24. Pock, T.: Fast total variation for computer vision. Ph.D. thesis, Graz University of Technology (2008)
25. Quatieri, T.: Discrete-Time Speech Signal Processing: Principles and Practice, 1st edn. Prentice Hall, Upper Saddle River (2001)
26. Rosman, G., Shem-Tov, S., Bitton, D., Nir, T., Adiv, G., Kimmel, R., Feuer, A., Bruckstein, A.M.: Over-parameterized optical flow using a stereoscopic constraint. In: Proceedings of the Scale Space and Variational Methods in Computer Vision. Springer, Berlin/London (2011)
27. Rubinstein, R., Zibulevsky, M., Elad, M.: Efficient implementation of the K-SVD algorithm using batch orthogonal matching pursuit. Technical report, Technion – Israel Institute of Technology (2008)
28. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60, 259–268 (1992)
29. Savage, J., Chen, K.: On multigrids for solving a class of improved total variation based staircasing reduction models. In: Tai, X.-C., Lie, K.-A., Chan, T.F., Osher, S. (eds.) Proceedings of the International Conference on PDE-Based Image Processing and Related Inverse Problems, CMA, Oslo, 8–12 Aug 2005. Mathematics and Visualization, Image Processing Based on Partial Differential Equations, pp. 69–94 (2006)
30. Shah, J.: A common framework for curve evolution, segmentation and anisotropic diffusion. In: Proceedings of the 1996 Conference on Computer Vision and Pattern Recognition (CVPR '96), p. 136. IEEE Computer Society, Washington, DC (1996)
31. Shah, J.: Curve evolution and segmentation functionals: application to color images. In: Proceedings of the IEEE ICIP'96, Lausanne, pp. 461–464 (1996)
32. Shu, C.W.: Essentially non-oscillatory and weighted essentially non-oscillatory schemes for hyperbolic conservation laws. In: Cockburn, B., Shu, C.-W., Johnson, C., Tadmor, E. (eds.) Advanced Numerical Approximation of Nonlinear Hyperbolic Equations. Lecture Notes in Mathematics, vol. 1697, pp. 325–432. Springer (1997)
33. Sun, D., Sudderth, E., Black, M.: Layered image motion with explicit occlusions, temporal consistency, and depth ordering. In: Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Proceedings of the Advances in Neural Information Processing Systems, Vancouver, vol. 23, pp. 2226–2234 (2010)
34. Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics, Philadelphia (2004)
35. Trobin, W., Pock, T., Cremers, D., Bischof, H.: An unbiased second-order prior for high-accuracy motion estimation. In: Proceedings of the DAGM Symposium, pp. 396–405. Springer, Berlin/New York (2008)