

Bayesian Estimation for Continuous-Time Sparse Stochastic Processes


Abstract

We consider continuous-time sparse stochastic processes from which we have only a finite number of noisy/noiseless samples. Our goal is to estimate the noiseless samples (denoising) and the signal in-between (interpolation problem). By relying on tools from the theory of splines, we derive the joint a priori distribution of the samples and show how this probability density function can be factorized. The factorization enables us to tractably implement the maximum a posteriori and minimum mean-square error (MMSE) criteria as two statistical approaches for estimating the unknowns. We compare the derived statistical methods with well-known techniques for the recovery of sparse signals, such as the ℓ1-norm and Log (ℓ1-ℓ0 relaxation) regularization methods. The simulation results show that, under certain conditions, the performance of the regularization techniques can be very close to that of the MMSE estimator.

    INTRODUCTION

THE recent popularity of regularization techniques in signal and image processing is motivated by the sparse nature of real-world data. It has resulted in the development of powerful tools for many problems such as denoising, deconvolution, and interpolation. The emergence of compressed sensing, which focuses on the recovery of sparse vectors from highly under-sampled sets of measurements, is playing a key role in this context [1]-[3]. Assume that the signal of interest is a finite-length discrete signal (also represented as a vector) that has a sparse or almost-sparse representation in some transform or analysis domain (e.g., wavelet or DCT). Assume moreover that we only have access to measurements corrupted by additive white Gaussian noise. Then, we would like to estimate the noiseless signal. The common sparsity-promoting variational techniques rely on penalizing the sparsity in the transform/analysis domain [4], [5] by imposing a cost of the form (1), in which the data term measures the distance to the vector of noisy measurements, a penalty function reflects the sparsity constraint in the transform/analysis domain, and a weight, usually set based on the noise and signal powers, balances the two. The ℓ1-norm penalty is one of the favorite choices in compressed sensing when the signal is itself sparse [6], while the total-variation (TV) penalty is a common choice for piecewise-smooth signals that have sparse derivatives [7].

Although the estimation problem for a given set of measurements is a deterministic procedure and can be handled without recourse to statistical tools, there are benefits in viewing the problem from the stochastic perspective. For instance, one can take advantage of side information about the unobserved data to establish probability laws for all or part of the data. Moreover, a stochastic framework allows one to evaluate the performance of estimation techniques and argue about their distance from the optimal estimator. The conventional stochastic interpretation of the variational method in (1) leads to the finding that its solution is the maximum a posteriori (MAP) estimate. In this interpretation, the quadratic data term is associated with the Gaussian nature of the additive noise, while the sparsifying penalty term corresponds to the a priori distribution of the sparse input. For example, the ℓ1 penalty is associated with the MAP estimator with a Laplace prior [8], [9]. However, investigations of compressible/sparse priors have revealed that the Laplace distribution cannot be considered a sparse prior [10]-[12]. Recently, in [13], it is argued that (1) is better interpreted as the minimum mean-square error (MMSE) estimator for a sparse prior.
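The displays are not reproduced in this transcript. In standard form, the variational estimator that (1) refers to, and its usual MAP reading, can be written as follows (here ỹ is the noisy measurement vector, x the unknown signal, Φ a sparsifying penalty, λ > 0 a weight, and σ² the noise variance; these are generic placeholder symbols rather than the paper's own notation):

\[
\hat{x} \;=\; \arg\min_{x}\ \tfrac{1}{2}\,\|\tilde{y}-x\|_2^2 \;+\; \lambda\,\Phi(x),
\qquad
\Phi(x)=\|\Psi x\|_1 \ \ \text{or}\ \ \Phi(x)=\mathrm{TV}(x),
\]

and, for additive white Gaussian noise and a prior p_X, the MAP interpretation reads \(\hat{x}_{\mathrm{MAP}} = \arg\min_x \tfrac{1}{2\sigma^2}\|\tilde{y}-x\|_2^2 - \log p_X(x)\).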

Though discrete stochastic models are widely adopted for sparse signals, they only approximate the continuous nature of real-world signals. The main challenge in employing continuous models is to transpose the compressibility/sparsity concepts to the continuous domain while maintaining compatibility with the discrete domain. In [14], an extended class of piecewise-smooth signals is proposed as a candidate for continuous stochastic sparse models. This class is closely related to signals with a finite rate of innovation [15]. Based on infinitely divisible distributions, a more general stochastic framework has been recently introduced in [16], [17]. There, the continuous models include Gaussian processes (such as Brownian motion), piecewise-polynomial signals, and α-stable processes as special cases. In addition, a large portion of the introduced family is considered compressible/sparse with respect to the definition in [11], which is compatible with the discrete definition.

In this paper, we investigate the estimation problem for the samples of the continuous-time sparse models introduced in [16], [17]. We derive the a priori and a posteriori probability density functions (pdf) of the noiseless/noisy samples. We present a practical factorization of the prior distribution which enables us to perform statistical learning for denoising or interpolation problems. In particular, we implement the optimal MMSE estimator based on the message-passing algorithm. The implementation involves discretization and convolution of pdfs and is, in general, slower than the common variational techniques. We further compare the performance of the Bayesian and variational denoising methods. Among the variational methods, we consider quadratic, TV, and Log regularization techniques. Our results show that, by matching the regularizer to the statistics, one can almost replicate the MMSE performance.

The rest of the paper is organized as follows: In Section II, we introduce our signal model, which relies on the general formulation of sparse stochastic processes proposed in [16], [17]. In Section III, we explain the techniques for obtaining the probability density functions, and we derive the estimation methods in Section IV. We study the special case of Lévy processes, which is of interest in many applications, in Section V, and present simulation results in Section VI. Section VII concludes the paper.

II. SIGNAL MODEL

In this section, we adapt the general framework of [16] to the continuous-time stochastic model studied in this paper. We follow the same notational conventions and write the input argument of the continuous-time signals/processes inside parentheses,

while we employ brackets for discrete-time ones. Moreover, the tilde diacritic is used to indicate a noisy signal; typically, it marks the discrete noisy samples. In Fig. 1, we give a sketch of the model. The two main parts are the continuous-time innovation process and the linear operators. The process is generated by applying the shaping operator on the innovation process, and it can be whitened back by the operator L. (Since the whitening operator is of greater importance, it is represented by L, while the shaping operator is its right-inverse.) Furthermore, the discrete observations are formed by the noisy measurements of the process. The innovation process and the linear operators have distinct implications on the resultant process. Our model is able to handle general innovation processes that may or may not induce sparsity/compressibility. The distinction between these two cases is identified by a function that is called the Lévy exponent, as will be discussed in Section II-A. The sparsity/compressibility of the process and, consequently, of the measurements, is inherited from the innovations and is observed in a transform domain. This domain is tied to the operator L. In this paper, we deal with operators that we represent by all-pole differential systems, tuned by acting upon the poles. Although the model in Fig. 1 is rather classical for Gaussian innovations, the investigation of non-Gaussian innovations is nontrivial. While the transition from Gaussian to non-Gaussian necessitates the reinvestigation of every


    definition and result, it provides us with a more general class of stochastic processes which

    includes compressible/sparse signals.

A. Innovation Process

Of all white processes, the Gaussian innovation is undoubtedly the one that has been investigated most thoroughly. However, it represents only a tiny fraction of the large family of white processes, which is best explored by using Gelfand's theory of generalized random processes. In this approach, unlike with the conventional point-wise definition, the stochastic process is characterized through inner products with test functions. For this purpose, one first chooses a function space of test functions (e.g., the Schwartz class of smooth and rapidly decaying functions). Then, one considers the random variable given by the inner product of the innovation with a test function, where the innovation is such that

1) it is stationary, i.e., the random variables associated with two test functions are identically distributed whenever one test function is a shifted version of the other, and

2) it is white in the sense that the random variables associated with non-overlapping test functions (i.e., test functions with disjoint supports) are independent.

The characteristic form of the innovation is defined as the expected value of the complex exponential of such an inner product. The characteristic form is a powerful tool for investigating the properties of random processes. For instance, it allows one to easily infer the probability density function of a single inner product, or the joint densities of several of them. Further details regarding characteristic forms can be found in Appendix A. The key point in Gelfand's theory is to consider a characteristic form of exponential type, specified by the Lévy exponent, in order to define a generalized innovation process over the dual of the chosen test-function space. The class of admissible Lévy exponents is characterized by the Lévy-Khintchine representation theorem [19], [20],


in which the truncation indicator equals 1 for jump amplitudes of magnitude at most 1 and 0 otherwise, and the Lévy density is a real-valued density function satisfying the admissibility condition of the theorem. In this paper, we consider only symmetric real-valued Lévy exponents (i.e., with no drift term and a symmetric Lévy density). Thus, the general form of (4) is reduced to
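The displays (4)-(6) are not reproduced above. The standard Lévy-Khintchine form from [19], [20], and its reduction under the symmetry assumption, read as follows (ω is the frequency variable, μ the drift, σ² the Gaussian part, and v the Lévy density; these are generic symbols, since the paper's own notation is not preserved in this transcript):

\[
f(\omega) \;=\; i\mu\omega \;-\; \frac{\sigma^{2}\omega^{2}}{2}
\;+\; \int_{\mathbb{R}\setminus\{0\}} \Big( e^{ia\omega} - 1 - ia\omega\,\mathbb{1}_{|a|\le 1}(a) \Big)\, v(a)\,\mathrm{d}a ,
\qquad
\int_{\mathbb{R}\setminus\{0\}} \min(1,a^{2})\, v(a)\,\mathrm{d}a < \infty ,
\]

which, for a symmetric v and no drift (μ = 0), reduces to

\[
f(\omega) \;=\; -\,\frac{\sigma^{2}\omega^{2}}{2} \;+\; \int_{\mathbb{R}\setminus\{0\}} \big( \cos(a\omega) - 1 \big)\, v(a)\,\mathrm{d}a .
\]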

    Next, we discuss three particular cases of (6) which are of special interest in this paper.

    1) Gaussian Innovation: The choice turns (6) into

    which implies

    This shows that the random variable has a zero-mean Gaussian distribution with variance .
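In the Gaussian case the Lévy density vanishes, so (6) keeps only its quadratic term; using the same generic symbols as above (φ a test function, w the innovation), the standard computation gives

\[
f(\omega) = -\frac{\sigma^{2}\omega^{2}}{2}
\quad\Longrightarrow\quad
\mathbb{E}\!\left\{ e^{\,i\omega \langle w,\varphi\rangle} \right\}
= e^{-\frac{\sigma^{2}\omega^{2}}{2}\,\|\varphi\|_{2}^{2}},
\]

i.e., the inner product ⟨w, φ⟩ is zero-mean Gaussian with variance σ²‖φ‖₂².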

    2) Impulsive Poisson: Let and , where is a symmetric probability density function. The

    corresponding white process is known as the impulsive Poisson innovation. By substitution in

    (6), we obtain


where the hat denotes the Fourier transform of the amplitude density. Let the test function be the indicator that takes the value 1 on the given interval and 0 otherwise. Then, for the pdf of the resulting random variable, we know that (see Appendix A)
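The corresponding displays are not reproduced here. The standard compound-Poisson result, which presumably matches the dropped expressions, is that, with rate λ and amplitude density p_A, and a test function equal to 1 on a unit interval (placeholder symbols),

\[
f(\omega) = \lambda\big(\widehat{p}_A(\omega)-1\big),
\qquad
p_X(x) = e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!}\,p_A^{*k}(x)
       = e^{-\lambda}\,\delta(x) + \big(1-e^{-\lambda}\big)\,p_{\mathrm{ac}}(x),
\]

where p_A^{*k} denotes the k-fold convolution (with p_A^{*0} = δ) and p_ac is the absolutely continuous part; the probability mass e^{-λ} at the origin is the source of the sparsity discussed in Section II-C.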

    It is not hard to check (see Appendix II in [14] for a proof) that this distribution matches the one

    that we obtain by defining

    is a sequence of random Poisson points with parameter , and is an independent and identically

    distributed (i.i.d.) sequence with probability density independent of . The sequence is a Poisson

    point random sequence with parameter

    if, for all real values , the random


    The second important component of the model is the shaping operator (the inverse of the

    whitening operator L) that determines the correlation structure of the process. For the generalized

    stochastic definition of in Fig. 1, we expect to have

    where represents the adjoint operator of . It shows that ought to define a valid test function for

    the equalities in (15) to remain valid. In turn, this sets constraints on . The simplest choice for

    would be that of an operator which forms a continuous map from into itself, but the class of such

    operators is not rich enough to cover the desired models in this

    paper. For this reason, we take advantage of a result in [16] that extends the choice of shaping

    operators to those operators for which forms a continuous mapping from into for some .

1) Valid Inverse Operator: In the sequel, we first explain the general requirements on the inverse of a given whitening operator L. Then, we focus on a special class of operators L and study the implications for the associated shaping operators in more detail. We assume L to be a given whitening operator, which may or may not be uniquely invertible. The minimum requirement on the shaping operator is that it should form a right-inverse of L (i.e., applying L after it yields the identity operator I). Furthermore, since the adjoint operator is required in (15), the shaping operator needs to be linear. This implies the existence of a kernel such that


Linear shift-invariant shaping operators are special cases that correspond to kernels depending only on the difference of their arguments. However, some of the operators considered in this paper are not shift-invariant. We require the kernel to satisfy the following three conditions: i) applying L to the kernel with respect to its first parameter yields the Dirac impulse, ii) the kernel vanishes outside the relevant range of its arguments, and iii) the kernel is bounded for all arguments. Condition (i) is equivalent to the right-inverse property, while (iii) is a sufficient condition studied in [16] to establish the continuity of the mapping defined by the shaping operator. Condition (ii) is a constraint that we impose in this paper to simplify the statistical analysis; its implication is that the value of the process at a given point is fully determined by the innovation up to that point or, equivalently, is independent of the innovation beyond it. From now on, we focus on differential operators L given by a constant-coefficient polynomial in the derivative operator, where D is the first-order derivative and I is the identity operator. With Gaussian innovations, these operators generate the autoregressive processes. An equivalent representation of L, which helps us in the analysis, is its decomposition into first-order factors. The scalars involved are the roots of the characteristic polynomial and correspond to the poles of the inverse linear system. Here, we assume that all the poles are in the left half-plane. This assumption helps us associate the operator L with a suitable kernel, as shown in Appendix B. Every differential operator L has a unique causal Green function [22]. The linear shift-invariant system defined by this Green function satisfies conditions (i)-(ii). If all the poles strictly lie in the left half-plane, then, due to the absolute integrability of the Green function (stability of the system), it satisfies condition (iii) as well. The definition of the shaping operator given through the kernels in Appendix B achieves both linearity and stability, while losing shift-invariance when L contains poles on the imaginary axis. It is worth mentioning that the application of two different right-inverses of L on a given input produces results that differ only by an exponential polynomial that is in the null space of L.
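To make the operator notation concrete: the class of whitening operators just described, and the causal Green function of a first-order factor, are presumably of the standard form (r_n denote the poles, u(t) the unit step; generic placeholder symbols):

\[
\mathrm{L} \;=\; \mathrm{D}^{N} + c_{N-1}\mathrm{D}^{N-1} + \cdots + c_{0}\mathrm{I}
\;=\; \prod_{n=1}^{N} (\mathrm{D} - r_{n}\mathrm{I}),
\qquad
(\mathrm{D}-r\,\mathrm{I})\,\rho_{r} = \delta
\ \ \text{with}\ \
\rho_{r}(t) = e^{\,r t}\,u(t).
\]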

2) Discretization: Apart from qualifying as whitening operators, differential operators have other appealing properties, such as the existence of finite-length discrete counterparts. To explain this concept, let us first consider the first-order continuous-time derivative operator D, which is associated with the finite-difference filter. This discrete counterpart is of finite length (an FIR filter). Further, for any right-inverse of D, the composition of the discretized operator with that right-inverse is shift-invariant and its impulse response is compactly supported. Here, the discretized operator corresponds to the chosen sampling period. It should be emphasized that the discrete counterpart is a discrete-domain operator, while the discretized operator acts on continuous-domain signals. It is easy to check that this impulse response coincides with the causal B-spline of degree 0. In general, the discrete counterpart of L is defined through its first-order factors. Each factor is associated with its own discrete counterpart and a discretized operator given by the corresponding impulse response. The convolution of such impulse responses gives rise to the impulse response of the discretized operator of L for the given sampling period (up to a scaling factor). By expanding the convolution, we obtain an explicit form for the impulse response. It is now evident that the discrete counterpart corresponds to an FIR filter whose length exceeds the order of L by one. Results in spline theory confirm that, for any right-inverse of L, the composed operator is shift-invariant and the support of its impulse response is contained in a finite interval [23]. The compactly supported impulse response of this composed operator is usually referred to as the L-spline. We define the generalized finite differences by
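The displayed definition is not reproduced here; the generalized finite differences are presumably the samples filtered by the discretized whitening filter, u[k] = Σ_i d[i] s[k-i]. Under that assumption, a minimal MATLAB sketch of computing them from the poles (the pole values, sampling period, and sample vector s below are arbitrary illustrative inputs):

% Taps of the discretized whitening (finite-difference) filter for
% L = prod_n (D - r_n I) at sampling period T: convolution of (1 - exp(r_n*T) z^-1)
T = 0.05;                 % sampling period (illustrative value)
r = [-1, -3];             % poles of the system (illustrative values)
d = 1;
for n = 1:numel(r)
    d = conv(d, [1, -exp(r(n)*T)]);   % one first-order factor per pole
end
s = randn(1, 100);        % placeholder for the samples of the process
u = filter(d, 1, s);      % generalized finite differences u[k] = sum_i d(i+1)*s(k-i)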


C. Sparsity/Compressibility

The innovation process can be thought of as a concatenation of independent atoms. A consequence of this independence is that the process contains no redundancies. Therefore, it is incompressible under the unique-representation constraint. In our framework, the role of the shaping operator is to generate a specific correlation structure by mixing the atoms of the innovation. Conversely, the whitening operator L undoes the mixing and returns an incompressible set of data, in the sense that it maximally compresses the data. For a discretization corresponding to a given sampling period, the discretized operator mimics the role of L. It efficiently uncouples the sequence of samples and produces the generalized differences, where each term depends only on a finite number of other terms. Thus, its role can be compared to that of converting discrete-time signals into their transform-domain representation. As we explain now, the sparsity/compressibility properties are closely related to the Lévy exponent of the innovation. The concept is best explained by focusing on a special class known as Lévy processes (see Section V for more details). By using (3) and (53), we can check the form of the characteristic function of the increments. When the Lévy exponent is generated through a nonzero density that is absolutely integrable (i.e., the impulsive Poisson case), the pdf of the increments necessarily contains a mass at the origin [20] (Theorems 4.18 and 4.20). This is interpreted as sparsity when considering a finite number of measurements. It is shown in [11], [12] that the compressibility of the measurements depends on the tail of their pdf. In simple terms, if the pdf decays at most inverse-polynomially, then it is compressible in some corresponding norm. The interesting point is that the inverse-polynomial decay of an infinitely divisible pdf is equivalent to the inverse-polynomial decay of its Lévy density with the same order [20] (Theorem 7.3). Therefore, an innovation process with a fat-tailed Lévy density results in processes with compressible measurements. This indicates that the slower the decay rate of the Lévy density, the more compressible the measurements of the process.


D. Summary of the Parameters of the Model

We now briefly review the degrees of freedom in the model and how the dependent variables are determined. The innovation process is uniquely determined by its Lévy triplet. The sparsity/compressibility of the process can be determined through the Lévy density (see Section II-C). The coefficients of the differential operator or, equivalently, the poles of the system, serve as the free parameters of the whitening operator L. As explained in Section II-B2, the taps of the FIR filter are determined by the poles.

    MATLAB


    INTRODUCTION TO MATLAB

    What Is MATLAB?

    MATLAB

    is a high-performance language for technical computing. It integrates

    computation, visualization, and programming in an easy-to-use environment where problems and

    solutions are expressed in familiar mathematical notation. Typical uses include

    Math and computation

    Algorithm development

    Data acquisition

    Modeling, simulation, and prototyping

    Data analysis, exploration, and visualization


    Scientific and engineering graphics

    Application development, including graphical user interface building.

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or FORTRAN.

    The name MATLAB stands for matrix laboratory. MATLAB was originally written to

    provide easy access to matrix software developed by the LINPACK and EISPACK projects.

    Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of

    the art in software for matrix computation.

    MATLAB has evolved over a period of years with input from many users. In university

    environments, it is the standard instructional tool for introductory and advanced courses in

    mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-

    productivity research, development, and analysis.

    MATLAB features a family of add-on application-specific solutions called toolboxes.

Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized

    technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that

    extend the MATLAB environment to solve particular classes of problems. Areas in which

    toolboxes are available include signal processing, control systems, neural networks, fuzzy logic,

    wavelets, simulation, and many others.

    The MATLAB System:

    The MATLAB system consists of five main parts:

    Development Environment:

    This is the set of tools and facilities that help you use MATLAB functions and files. Many

    of these tools are graphical user interfaces. It includes the MATLAB desktop and Command


    Window, a command history, an editor and debugger, and browsers for viewing help, the

    workspace, files, and the search path.

The MATLAB Mathematical Function Library:

    This is a vast collection of computational algorithms ranging from elementary functions

    like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix

inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

    The MATLAB Language:

    This is a high-level matrix/array language with control flow statements, functions, data

structures, input/output, and object-oriented programming features. It allows both "programming in the small" to rapidly create quick and dirty throw-away programs, and "programming in the

    large" to create complete large and complex application programs.

    Graphics:

    MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as

    annotating and printing these graphs. It includes high-level functions for two-dimensional and

three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level functions that allow you to fully customize the appearance of graphics as

    well as to build complete graphical user interfaces on your MATLAB applications.

    The MATLAB Application Program Interface (API):

    This is a library that allows you to write C and Fortran programs that interact with

    MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling

    MATLAB as a computational engine, and for reading and writing MAT-files.

    MATLAB WORKING ENVIRONMENT:

    MATLAB DESKTOP:-


    DIGITAL IMAGE PROCESSING


    Digital image processing

    Background:

Digital image processing is an area characterized by the need for extensive experimental work to establish the viability of proposed solutions to a given problem. An important characteristic underlying the design of image processing systems is the significant level of testing & experimentation that normally is required before arriving at an acceptable solution. This characteristic implies that the ability to formulate approaches & quickly prototype candidate solutions generally plays a major role in reducing the cost & time required to arrive at a viable system implementation.

    What is DIP

    An image may be defined as a two-dimensional function f(x, y), where x & y are spatial

    coordinates, & the amplitude of f at any pair of coordinates (x, y) is called the intensity or gray

    level of the image at that point. When x, y & the amplitude values of f are all finite discrete

quantities, we call the image a digital image. The field of DIP refers to processing digital images by means of a digital computer. A digital image is composed of a finite number of elements, each of which has a particular location & value. The elements are called pixels.


Vision is the most advanced of our senses, so it is not surprising that images play the single most important role in human perception. However, unlike humans, who are limited to the visual band of the EM spectrum, imaging machines cover almost the entire EM spectrum, ranging from gamma to radio waves. They can also operate on images generated by sources that humans are not accustomed to associating with images.

There is no general agreement among authors regarding where image processing stops & other related areas such as image analysis & computer vision start. Sometimes a distinction is made by defining image processing as a discipline in which both the input & output of a process are images. This is a limiting & somewhat artificial boundary. The area of image analysis (image understanding) is in between image processing & computer vision.

There are no clear-cut boundaries in the continuum from image processing at one end to complete vision at the other. However, one useful paradigm is to consider three types of computerized processes in this continuum: low-, mid-, & high-level processes. Low-level processes involve primitive operations such as image processing to reduce noise, contrast enhancement & image sharpening. A low-level process is characterized by the fact that both its inputs & outputs are images.

Mid-level processes on images involve tasks such as segmentation, description of objects to reduce them to a form suitable for computer processing, & classification of individual objects. A mid-level process is characterized by the fact that its inputs generally are images but its outputs are attributes extracted from those images. Finally, higher-level processing involves making sense of an ensemble of recognized objects, as in image analysis, & at the far end of the continuum, performing the cognitive functions normally associated with human vision.

Digital image processing, as already defined, is used successfully in a broad range of areas of exceptional social & economic value.

    What is an image?


An image is represented as a two-dimensional function f(x, y), where x and y are spatial coordinates, and the amplitude of f at any pair of coordinates (x, y) is called the intensity of the image at that point.

Gray scale image:

A grayscale image is a function I(x, y) of the two spatial coordinates of the image plane. I(x, y) is the intensity of the image at the point (x, y) on the image plane. I(x, y) takes non-negative values; assuming the image is bounded by a rectangle [0, a] × [0, b], we have I: [0, a] × [0, b] → [0, ∞).

Color image:

It can be represented by three functions, R(x, y) for red, G(x, y) for green, and B(x, y) for blue.

An image may be continuous with respect to the x and y coordinates and also in amplitude. Converting such an image to digital form requires that the coordinates as well as the amplitude be digitized. Digitizing the coordinate values is called sampling. Digitizing the amplitude values is called quantization.

    Coordinate convention:

The result of sampling and quantization is a matrix of real numbers. We use two principal ways to represent digital images. Assume that an image f(x, y) is sampled so that the resulting image has M rows and N columns. We say that the image is of size M x N. The values of the coordinates (x, y) are discrete quantities. For notational clarity and convenience, we use integer values for these discrete coordinates.

In many image processing books, the image origin is defined to be at (x, y) = (0, 0). The next coordinate values along the first row of the image are (x, y) = (0, 1). It is important to keep in mind that the notation (0, 1) is used to signify the second sample along the first row. It does not mean that these are the actual values of physical coordinates when the image was sampled. The following figure shows the coordinate convention. Note that x ranges from 0 to M-1 and y from 0 to N-1 in integer increments.

The coordinate convention used in the toolbox to denote arrays is different from the preceding paragraph in two minor ways. First, instead of using (x, y), the toolbox uses the notation (r, c) to indicate rows and columns. Note, however, that the order of coordinates is the same as the order discussed in the previous paragraph, in the sense that the first element of a coordinate tuple, (a, b), refers to a row and the second to a column. The other difference is that the origin of the coordinate system is at (r, c) = (1, 1); thus, r ranges from 1 to M and c from 1 to N in integer increments. IPT documentation refers to these as pixel coordinates. Less frequently, the toolbox also employs another coordinate convention called spatial coordinates, which uses x to refer to columns and y to refer to rows. This is the opposite of our use of the variables x and y.

    Image as Matrices:

    The preceding discussion leads to the following representation for a digitized image

    function:

          f(0,0)      f(0,1)      ...   f(0,N-1)
          f(1,0)      f(1,1)      ...   f(1,N-1)
f(x,y) =    .           .          .       .
            .           .          .       .
          f(M-1,0)    f(M-1,1)    ...   f(M-1,N-1)

    The right side of this equation is a digital image by definition. Each element of this array

    is called an image element, picture element, pixel or pel. The terms image and pixel are used

    throughout the rest of our discussions to denote a digital image and its elements.


    A digital image can be represented naturally as a MATLAB matrix:

      f(1,1)    f(1,2)    ...   f(1,N)
      f(2,1)    f(2,2)    ...   f(2,N)
f =     .         .        .      .
        .         .        .      .
      f(M,1)    f(M,2)    ...   f(M,N)

Where f(1,1) = f(0,0) (note the use of a monospace font to denote MATLAB quantities). Clearly the two representations are identical, except for the shift in origin. The notation f(p, q) denotes the element located in row p and column q. For example, f(6,2) is the element in the sixth row and second column of the matrix f. Typically we use the letters M and N respectively to denote the number of rows and columns in a matrix. A 1xN matrix is called a row vector, whereas an Mx1 matrix is called a column vector. A 1x1 matrix is a scalar.

Matrices in MATLAB are stored in variables with names such as A, a, RGB, real_array and so on. Variables must begin with a letter and contain only letters, numerals and underscores. As noted in the previous paragraph, all MATLAB quantities are written using monospace characters. We use conventional Roman, italic notation, such as f(x, y), for mathematical expressions.

    Reading Images:

Images are read into the MATLAB environment using function imread, whose syntax is

imread('filename')

Format name   Description                          Recognized extensions

TIFF          Tagged Image File Format             .tif, .tiff
JPEG          Joint Photographic Experts Group     .jpg, .jpeg
GIF           Graphics Interchange Format          .gif
BMP           Windows Bitmap                       .bmp
PNG           Portable Network Graphics            .png
XWD           X Window Dump                        .xwd

Here filename is a string containing the complete name of the image file (including any applicable extension). For example, the command line

>> f = imread('chestxray.jpg');

reads the JPEG image chestxray (see the table above) into image array f. Note the use of single quotes (') to delimit the string filename. The semicolon at the end of a command line is used by MATLAB for suppressing output. If a semicolon is not included, MATLAB displays the results of the operation(s) specified in that line. The prompt symbol (>>) designates the beginning of a command line, as it appears in the MATLAB command window.
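A short illustrative session continuing the example above (whos and imshow are standard MATLAB/IPT commands; the file name is the same example file):

>> f = imread('chestxray.jpg');   % read the image into array f
>> whos f                         % show the size and class of f
>> imshow(f)                      % display the image in a figure window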

    Data Classes:

Although we work with integer coordinates, the values of pixels themselves are not restricted to be integers in MATLAB. The table below lists the various data classes supported by MATLAB and IPT for representing pixel values. The first eight entries in the table are referred to as numeric data classes. The ninth entry is the char class and, as shown, the last entry is referred to as the logical data class.

All numeric computations in MATLAB are done using double (double-precision) quantities, so this is also a frequent data class encountered in image processing applications. Class uint8 is also encountered frequently, especially when reading data from storage devices, as 8-bit images are the most common representations found in practice. These two data classes, class logical, and, to a lesser degree, class uint16 constitute the primary data classes on which we focus. Many IPT functions, however, support all the data classes listed in the table. Data class double requires 8 bytes to represent a number, uint8 and int8 require one byte each, uint16 and int16 require 2 bytes each, and uint32, int32 and single require 4 bytes each.

Name      Description

double    Double-precision, floating-point numbers in the approximate range ±10^308 (8 bytes per element).
uint8     Unsigned 8-bit integers in the range [0, 255] (1 byte per element).
uint16    Unsigned 16-bit integers in the range [0, 65535] (2 bytes per element).
uint32    Unsigned 32-bit integers in the range [0, 4294967295] (4 bytes per element).
int8      Signed 8-bit integers in the range [-128, 127] (1 byte per element).
int16     Signed 16-bit integers in the range [-32768, 32767] (2 bytes per element).
int32     Signed 32-bit integers in the range [-2147483648, 2147483647] (4 bytes per element).
single    Single-precision floating-point numbers in the approximate range ±10^38 (4 bytes per element).
char      Characters (2 bytes per element).
logical   Values are 0 or 1 (1 byte per element).

The char data class holds characters in Unicode representation. A character string is merely a 1xn array of characters. A logical array contains only the values 0 and 1, with each element being stored in one byte of memory; logical arrays are created by using the function logical or by using relational operators.

    Image Types:

    The toolbox supports four types of images:


1. Intensity images;
2. Binary images;
3. Indexed images;
4. RGB images.

Most monochrome image processing operations are carried out using binary or intensity images, so our initial focus is on these two image types. Indexed and RGB colour images are discussed later.

    Intensity Images:

An intensity image is a data matrix whose values have been scaled to represent intensities. When the elements of an intensity image are of class uint8 or class uint16, they have integer values in the range [0, 255] and [0, 65535], respectively. If the image is of class double, the values are floating-point numbers. Values of scaled, class double intensity images are in the range [0, 1] by convention.

    Binary Images:

Binary images have a very specific meaning in MATLAB. A binary image is a logical array of 0s and 1s. Thus, an array of 0s and 1s whose values are of data class, say, uint8, is not considered a binary image in MATLAB. A numeric array is converted to binary using function logical. Thus, if A is a numeric array consisting of 0s and 1s, we create a logical array B using the statement

B = logical(A)

If A contains elements other than 0s and 1s, use of the logical function converts all nonzero quantities to logical 1s and all entries with value 0 to logical 0s. Using relational and logical operators also creates logical arrays.

To test whether an array is logical, we use the islogical function: islogical(C). If C is a logical array, this function returns a 1; otherwise it returns a 0. Logical arrays can be converted to numeric arrays using the data class conversion functions.
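A small illustrative snippet of the conversions just described (the array values are arbitrary examples):

A = [0 1 0; 1 1 0];      % numeric array of 0s and 1s (class double)
B = logical(A);          % convert to a logical (binary) image
t = islogical(B);        % returns 1 (true), since B is logical
C = uint8(B);            % convert the logical array back to a numeric class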

    Indexed Images:

An indexed image has two components:

A data matrix of integers, X

A color map matrix, map

Matrix map is an m*3 array of class double containing floating-point values in the range [0, 1]. The length m of the map is equal to the number of colors it defines. Each row of map specifies the red, green and blue components of a single color. An indexed image uses direct mapping of pixel intensity values to color map values. The color of each pixel is determined by using the corresponding value of the integer matrix X as a pointer into map. If X is of class double, then all of its components with values less than or equal to 1 point to the first row in map, all components with value 2 point to the second row, and so on. If X is of class uint8 or uint16, then all components with value 0 point to the first row in map, all components with value 1 point to the second, and so on.
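An illustrative example of this indexing convention (the map and index values are arbitrary; ind2rgb and imshow are standard MATLAB/IPT functions):

map = [1 0 0; 0 1 0; 0 0 1];   % 3-color map: rows are red, green, blue
X   = uint8([0 1; 2 0]);       % class uint8: value 0 points to row 1 of map
rgb = ind2rgb(X, map);         % expand the indexed image into an M*N*3 RGB array
imshow(X, map)                 % display the indexed image with its color map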

    RGB Image:

An RGB color image is an M*N*3 array of color pixels, where each color pixel is a triplet corresponding to the red, green and blue components of an RGB image at a specific spatial location. An RGB image may be viewed as a stack of three gray-scale images that, when fed into the red, green and blue inputs of a color monitor, produce a color image on the screen. By convention, the three images forming an RGB color image are referred to as the red, green and blue component images. The data class of the component images determines their range of values. If an RGB image is of class double, the range of values is [0, 1]. Similarly, the range of values is [0, 255] or [0, 65535] for RGB images of class uint8 or uint16, respectively. The number of bits used to represent the pixel values of the component images determines the bit depth of an RGB image. For example, if each component image is an 8-bit image, the corresponding RGB image is said to be 24 bits deep.

Generally, the number of bits in all component images is the same. In this case, the number of possible colors in an RGB image is (2^b)^3, where b is the number of bits in each component image. For the 8-bit case the number is 16,777,216 colors.
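For instance, the color count quoted above can be checked directly in MATLAB:

b = 8;                  % bits per component image
numColors = (2^b)^3     % = 16777216 distinct colors for a 24-bit RGB image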


Model reduction techniques such as proper orthogonal decomposition (POD), reduction based on inertial manifolds, and reduced basis methods have been used in this context. However, the solution of the reduced-order optimal control problem is generally suboptimal, and reliable error estimation is thus crucial. A posteriori error bounds for reduced-order solutions of optimal control problems have been proposed for proper orthogonal decomposition (POD) and reduced basis surrogate models in [9] and [2, 7], respectively. However, all of these results have slight deficiencies: evaluation of the bounds in [9] requires a forward-backward solution of the underlying high-dimensional state and adjoint equations and is thus computationally expensive, the error estimator in [2] is not a rigorous upper bound for the error, and the result in [7] only applies to optimal control problems without control constraints involving stationary (time-independent)


PDEs. In this talk, we employ the reduced basis method [8] as a surrogate model for the solution of optimal control problems. We consider the following simplified problem setting: given a parameter in a prescribed parameter domain, we want to solve the parametrized optimal control problem (1), where the desired state and desired control are given, the observation domain is a measurable set, and the regularization parameter is positive. It follows from our assumptions that there exists a unique optimal solution to (1) [6]. In the reduced basis methodology, we next introduce a truth finite element approximation space of very large dimension and a reduced basis space of much smaller dimension contained in it. We denote by the truth and reduced basis solutions the state-control pairs obtained for the optimal control problem (1) through a Galerkin projection onto the respective spaces.

We first show, for the simple problem setting (1), how to derive rigorous and efficiently evaluable a posteriori error bounds for the optimal control and for the associated cost functional [3, 4]. We start with the bound from [9], replace the required high-dimensional state and adjoint solution by the solution of the associated reduced basis approximation, and bound the error introduced by extending the standard reduced basis a posteriori error bounds. The error bound for the cost functional is based on the standard result in [1], again by extending standard reduced basis a posteriori error bounds. The offline-online decomposition directly applies to the reduced basis optimal control problem and the associated a posteriori error bounds: the computational complexity in the online stage to evaluate the reduced solution as well as the control and cost functional error bounds depends only on the reduced dimension and is independent of the truth dimension. Our approach thus allows not only the efficient real-time solution of the reduced optimal control problem, but also the efficient real-time evaluation of the quality of the suboptimal solution. Finally, we consider various extensions of the problem setting (1) to elliptic optimal control problems with distributed controls and to parabolic problems with multiple (time-dependent) controls and control constraints [5]. We also present numerical results to confirm the validity of our approach. In the parabolic case, for example, we can guarantee, thanks to our a posteriori error bounds, a relative error in the control of less than 1% whilst obtaining an average (online) speed-up


of approximately 580 for the solution of the reduced basis optimal control problem compared to the truth finite element optimal control problem, and an average speed-up of approximately 400 for the solution of the optimal control problem and evaluation of the error bound for the control and cost functional.

A principal approach to compute (2) is the Monte Carlo method. However, it is extremely expensive to generate a large number of suitable samples and to solve the deterministic boundary value problem (1) on each sample. To overcome this obstruction, the multilevel Monte Carlo method (MLMC) has been developed in [1]. From the stochastic point of view, it is a variance reduction technique which considerably decreases the complexity. The idea is to combine the Monte Carlo quadrature of the stochastic variable with a multilevel splitting of the Bochner space which contains the random solution. Then, to compute (2), most samples can be performed on coarse spatial discretizations while only a few samples must be performed on fine spatial discretizations. This procedure is a sparse grid approximation of the expectation. If we replace the Monte Carlo quadrature by another quadrature rule for high-dimensional integrals, we obtain for example the multilevel quasi-Monte Carlo method (MLQMC) or the multilevel Gaussian quadrature method (MLGQ).
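As a concrete illustration of the multilevel idea just described, here is a minimal MATLAB sketch of an MLMC estimator of the expectation in (2); the function solve_bvp(omega, h) is a hypothetical placeholder for a deterministic boundary-value solver on a mesh of width h, and the per-level sample counts are arbitrary illustrative values:

% Multilevel Monte Carlo: telescoping sum over discretization levels,
% with many samples on coarse meshes and few on fine meshes.
L = 4;                        % finest level
N = [1000 400 160 60 20];     % samples per level (illustrative, decreasing)
h = 0.5.^(0:L);               % mesh widths, refined by a factor of 2 per level
est = 0;
for l = 0:L
    s = 0;
    for i = 1:N(l+1)
        omega = randn;                       % one draw of the random input
        Qf = solve_bvp(omega, h(l+1));       % quantity of interest, fine mesh
        if l == 0
            s = s + Qf;                      % coarsest level: plain MC term
        else
            Qc = solve_bvp(omega, h(l));     % same sample, next coarser mesh
            s = s + (Qf - Qc);               % level-l correction of the telescope
        end
    end
    est = est + s / N(l+1);                  % accumulate level averages
end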

It is a classical topic to look at orthonormal bases of polynomials on a compact set X of R^d, with respect to some Radon measure: for example, the one-dimensional interval (Jacobi polynomials), the unit sphere (spherical harmonics), the ball and the simplex (work of Petrushev, Xu, ...). In this framework, one can be interested in the best approximation of functions by polynomials of fixed degree, in L^p, and in building a suitable frame for the characterization of function spaces related to this approximation. These constructions have been carried out using special-function estimates.

We will be interested in spaces where the polynomials give the spectral spaces of some positive self-adjoint operator. Under suitable conditions, a natural metric can be defined on X so that it becomes a homogeneous space, and if the associated semi-group has a good Gaussian behavior, then we can apply the procedure developed in recent works by P. Petrushev, T. Coulhon and G.K. to build such frames and such function spaces.

Sample regularity and fast simulation of isotropic Gaussian random fields on the sphere are, for example, of interest for the numerical analysis of stochastic partial differential equations and for the simulation of ice crystals or Saharan dust particles as lognormal random fields. In what follows, we recall the results from [2], which include the approximation of isotropic Gaussian random fields with convergence rates, as well as the regularity of the samples in relation to the smoothness of the covariance, expressed in terms of the decay of the angular power spectrum. As an example, we construct isotropic Q-Wiener processes out of isotropic Gaussian random fields and discretize the stochastic heat equation with spectral methods. Before we state the results, we start with a short review of the basics.

    III. PRIOR DISTRIBUTION

To derive statistical estimation methods such as MAP and MMSE, we need the a priori distributions. In this section, by using the generalized differences in (17), we obtain the prior distribution and factorize it efficiently. This factorization is fundamental, since it makes the implementation of the MMSE and MAP estimators tractable. The general problem studied in this paper is to estimate the values of the continuous process on an arbitrarily fine uniform sampling grid, given a finite number of noisy/noiseless measurements of the process at the integers. Although we are aware that this is not equivalent to estimating the continuous process, by refining the grid we are able to make the estimation grid as fine as desired. For piecewise-continuous signals, the limit process of refining the grid can give us access to the properties of the continuous domain. To simplify the notation, let us define the corresponding vectors of samples and generalized differences.

The ultimate goal is the joint distribution of the samples (the a priori distribution). However, the samples are in general pairwise-dependent, which makes it difficult to deal with their joint distribution in high dimensions. This corresponds to a large number of samples. Meanwhile, as will be shown in Lemma 1, the sequence of generalized differences forms a Markov chain of finite order that helps in factorizing the joint probability


distributions, whereas the sequence of samples does not. The leading idea of this work is then that each generalized difference depends on only a finite number of samples. It then becomes much simpler to derive the joint distribution of the differences and link it to that of the samples. Lemma 1 helps us to factorize the joint pdf.

Lemma 1: With the differential order of L playing the role of the order, 1) the random variables (generalized differences) are identically distributed, 2) the sequence of generalized differences is a Markov chain of that order, and 3) each sample is statistically independent of the generalized differences that are sufficiently far from it.

Proof: First note that each generalized difference is the inner product of the innovation with a shifted version of the same function. Since these functions are shifts of the same function and the innovation is stationary (Definition 1), the differences are identically distributed. Recalling that this function is supported on a finite interval, we know that two shifts that are sufficiently far apart have no common support. Thus, due to the whiteness of the innovation (cf. Definition 1), the corresponding random variables are independent. Consequently, we can write

which confirms that the sequence forms a Markov chain of the stated order. Note that the choice of the order is due to the support of the kernel. If the support were infinite, it would be impossible to find a finite order such that the required variables were independent in the strict sense. To prove the second independence property, we recall that


Condition (ii) in Section II-B1 implies that the relevant kernels vanish on the appropriate ranges. Hence, they have disjoint supports. Again, due to the whiteness of the innovation, this implies that the corresponding random variables are independent. We now exploit the properties of the generalized differences to obtain the a priori distribution. Theorem 1, which is proved in Appendix C, summarizes the main result of Section III.

Theorem 1: Using the conventions of Lemma 1, the joint a priori pdf of the samples admits a factorization in terms of the conditional pdfs of the generalized differences.
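A plausible explicit form of this factorization, assuming the finite-order Markov structure of Lemma 1 (u_k denotes the generalized finite differences and r the differential order of L; placeholder notation, since the paper's own statement is not reproduced in this transcript):

\[
p(u_1,\dots,u_n) \;=\; p(u_1,\dots,u_r)\prod_{k=r+1}^{n} p\!\left(u_k \mid u_{k-1},\dots,u_{k-r}\right).
\]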

In the definition of the shaping operator proposed in Section II-B, except when all the poles are strictly included in the left half-plane, the operator fails to be shift-invariant. Consequently, neither the process nor its samples are stationary. An interesting consequence of Theorem 1 is that it relates the probability distribution of the non-stationary process to that of the stationary process plus a minimal set of transient terms.

Next, we show in Theorem 2 how the conditional probability of a generalized difference can be obtained from a characteristic form. To maintain the flow of the paper, the proof is postponed to Appendix D.

Theorem 2: The probability density function of a generalized difference conditioned on the previous ones is given in terms of the characteristic form defined above.


IV. SIGNAL ESTIMATION

MMSE and MAP estimation are two common statistical paradigms for signal recovery. Since the optimal MMSE estimator is rarely realizable in practice, MAP is often used as the next best thing. In this section, in addition to applying the two methods to the proposed class of signals, we settle the question of knowing when MAP is a good approximation of MMSE. For estimation purposes, it is convenient to assume that the sampling instances associated with the measurements are included in the uniform sampling grid on which we want to estimate the values of the process. In other words, we assume that the measurement period is a positive integer multiple of the grid period. This assumption imposes no lower bound on the resolution of the grid, because we can set the grid period arbitrarily close to zero by increasing this integer. To simplify mathematical formulations, we use vectorial notation: one vector collects the noisy/noiseless measurements, another stands for the hypothetical realization of the process on the considered grid, a subvector of the latter corresponds to the points at which we have a sample, and a final vector collects the estimated values.

A. MMSE Denoising

It is very common to evaluate the quality of an estimation method by means of the mean-square error, or SNR. In this regard, the best estimator, known as MMSE, is obtained by evaluating the posterior mean. For Gaussian processes, this expectation is easy to obtain, since it is equivalent to the best linear estimator [24]. However, there are only a few non-Gaussian cases where an explicit closed form of this expectation is known. In particular, if the additive noise is white and Gaussian (with no restriction on the distribution of the signal) and pure denoising is desired, then the MMSE estimator can be reformulated in terms of the pdf of the noisy measurements and the noise variance [25]-[27]. Note that this pdf, which is the result of convolving the a priori distribution with the Gaussian pdf of the noise, is both continuous and differentiable.
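The reformulation referred to above is not reproduced in this transcript; in its standard form (the Miyasawa/Tweedie identity for additive white Gaussian noise), with ỹ the vector of noisy samples, σ² the noise variance, and p_Ỹ its pdf (placeholder symbols), it reads

\[
\hat{x}_{\mathrm{MMSE}}(\tilde{y})
= \mathbb{E}\{x \mid \tilde{y}\}
= \tilde{y} + \sigma^{2}\,\nabla_{\tilde{y}}\log p_{\tilde{Y}}(\tilde{y}),
\]

which is well defined because, as noted above, p_Ỹ is continuous and differentiable.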

B. MAP Estimator

Searching for the MAP estimator amounts to finding the maximizer of the a posteriori distribution. It is commonplace to reformulate this conditional probability in terms of the a priori distribution. The additive discrete noise is white and Gaussian with a known variance; thus, the likelihood of the measurements given the signal is a Gaussian centered on the hypothetical signal values. In MAP estimation, we are looking for a vector that maximizes the conditional a posteriori probability, so that (26) leads to (27). The last equality is due to the fact that the omitted factors do not depend on the choice of the candidate vector; therefore, they play no role in the maximization. If the a priori pdf is bounded, the cost function (27) can be replaced with its logarithm without changing the maximizer, which yields the equivalent logarithmic maximization problem (28). By using the pdf factorization provided by Theorem 1, (28) is further simplified to
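The displays (27)-(29) are not reproduced above. Assuming white Gaussian noise of variance σ² and the Markov factorization of Theorem 1, the logarithmic MAP problem presumably takes the generic form (x is the candidate signal on the grid, x_m its entries at the measurement locations, ỹ the noisy samples, u_k the generalized finite differences of x, and r the order; placeholder notation):

\[
\hat{x}_{\mathrm{MAP}}
= \arg\min_{x}\ \frac{1}{2\sigma^{2}}\big\|\tilde{y}-x_{\mathrm{m}}\big\|_{2}^{2}
\;-\;\sum_{k}\log p\!\left(u_{k}\mid u_{k-1},\dots,u_{k-r}\right).
\]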


The resemblance to the Gaussian pdf becomes more evident as the innovation pdf decays faster. In the extreme case where the innovation is Gaussian, the noisy pdf is also Gaussian (the convolution of two Gaussians) with a different mean (which introduces another type of bias). The fact that the MAP and MMSE estimators are equivalent for Gaussian innovations indicates that the two biases act in opposite directions and cancel each other out. In summary, for super-exponentially decaying innovations, MAP seems to be consistent with MMSE. For heavy-tailed innovations, however, the MAP estimator is a biased version of the MMSE, where the effect of the bias is observable at high noise powers. The scenario in which MAP diverges most from MMSE might be the exponentially decaying innovations, where we have both a mismatch and a bias in the cost function, as will be confirmed in the experimental part of the paper.

    MMSE Interpolation

A classical problem is to interpolate the signal using noiseless samples. This corresponds to the estimation of , where ( does not divide ), by assuming . Although there is no noise in this scenario, we can still employ the MMSE criterion to estimate . We show that the MMSE interpolator of a Lévy process is the simple linear interpolator, irrespective of the type of innovation. To prove this, we assume


In the case of impulsive Poisson innovations, as shown in (10), the pdf of has a single mass probability at . Hence, the MAP estimator will choose for all , resulting in a constant signal. In other words, according to the MAP criterion and due to the boundary condition , the optimal estimate is nothing but the trivial all-zero function. For the other types of innovation, where the pdf of the increments is bounded or, equivalently, the Lévy density is singular at the origin [20], one can reformulate the MAP estimation in the form of (30) as

The parameters of the above innovations are set such that they all lead to the same entropy value. The negative log-likelihoods of the first two innovation types resemble the and regularization terms. However, the curve of for the Cauchy innovation is a nonconvex log-type function.

C. MMSE Denoising

As discussed in Section IV-C, the MMSE estimator, either in its expectation form or as a minimization problem, is not separable with respect to its inputs. This is usually a critical restriction in high dimensions. Fortunately, due to the factorization of the joint a priori distribution, we can lift this restriction by employing the powerful message-passing algorithm. The method consists of representing the statistical dependencies between the parameters as a


graph and finding the marginal distributions by iteratively passing the estimated pdfs along the edges [29]. The transmitted messages along the edges are also known as beliefs, which gives rise to the alternative name of belief propagation. In general, the marginal distributions (beliefs) are continuous-domain objects; hence, for computer simulations, we need to discretize them. In order to define a graphical model for a given joint probability distribution, we need to define two types of nodes: variable nodes, which represent the input arguments of the joint pdf, and factor nodes, which portray each one of the terms in the factorization of the joint pdf. The edges in the graph are drawn only between nodes of different types and indicate the contribution of an input argument to a given factor. For the Lévy process, the joint conditional pdf is factorized as

Note that, by definition, we have . For illustration purposes, we consider the special case of pure denoising corresponding to . We give in Fig. 4 the bipartite graph associated with the joint pdf (46). The variable nodes, depicted in the middle of the graph, stand for the input arguments . The factor nodes are placed at the right and left sides of the variable nodes, depending on whether they represent the Gaussian factors or the factors, respectively. The set of edges also indicates the participation of the variable nodes in the corresponding factor nodes. The message-passing algorithm consists of initializing the nodes of the graph with proper 1D functions and updating these functions through communications over the graph. It is desired that we eventually obtain


the marginal pdf at the th variable node, which enables us to obtain its mean. The details of the messages sent over the edges and of the updating rules are given in [30], [31].
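For experimentation, the Python sketch below runs a discretized sum-product (belief-propagation) pass over the chain factor graph described above, with the Gaussian data factors on one side of the variable nodes and the increment factors on the other. The grid discretization, the Laplace increment prior, the boundary assumption that the process starts at zero, and the helper name mmse_denoise_chain are our own illustrative choices; the implementation used in the paper is the one described in [30], [31].

    import numpy as np

    def mmse_denoise_chain(y, sigma, log_p_u, x_grid):
        # Discretized sum-product on the chain factor graph of a Levy process
        # observed in additive white Gaussian noise (pure denoising case).
        # y: noisy samples, sigma: noise std, log_p_u: log-pdf of one increment,
        # x_grid: grid on which every variable node is discretized.
        N, G = len(y), len(x_grid)
        unary = np.exp(-0.5 * ((x_grid[None, :] - y[:, None]) / sigma) ** 2)  # Gaussian factors
        trans = np.exp(log_p_u(x_grid[None, :] - x_grid[:, None]))            # increment factors

        fwd = np.zeros((N, G))
        fwd[0] = unary[0] * np.exp(log_p_u(x_grid))       # boundary: the process starts at 0
        for k in range(1, N):
            m = fwd[k - 1] @ trans                         # forward message to node k
            fwd[k] = unary[k] * m / m.max()                # rescale to avoid underflow

        bwd = np.ones((N, G))
        for k in range(N - 2, -1, -1):
            m = trans @ (unary[k + 1] * bwd[k + 1])        # backward message to node k
            bwd[k] = m / m.max()

        beliefs = fwd * bwd                                # unnormalized marginals (beliefs)
        beliefs /= beliefs.sum(axis=1, keepdims=True)
        return beliefs @ x_grid                            # posterior means = MMSE estimates

    # Toy run with Laplace-distributed increments.
    rng = np.random.default_rng(0)
    x = np.cumsum(rng.laplace(scale=1.0, size=100))
    y = x + 0.5 * rng.normal(size=x.size)
    grid = np.linspace(y.min() - 4.0, y.max() + 4.0, 400)
    x_hat = mmse_denoise_chain(y, 0.5, lambda u: -np.abs(u), grid)

Because this particular graph is a chain, one forward and one backward sweep already deliver the discretized marginals, from which the posterior means are read off.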

    SIMULATION RESULTS

For the experiments, we consider the denoising of Lévy processes for various types of innovation, including those introduced in Section II-A and the Laplace-type innovation discussed in Appendix E. Among the heavy-tailed -stable innovations, we choose the Cauchy distribution corresponding to . The four implemented denoising methods are

1) The linear minimum mean-square error (LMMSE) method, or quadratic regularization (also known as the smoothing spline [32]), defined as

For given innovation statistics and a given additive-noise variance, we search for the best for each realization by comparing the results with the oracle estimator provided by the noiseless signal. Then, we average over a number of realizations to obtain a unified and realization-independent value. This procedure is repeated each time the statistics (of either the innovation or the additive noise) change. For Gaussian processes, the LMMSE method coincides with both the MAP and MMSE estimators.

    2) Total-variation regularization represented as


where should be optimized. The optimization process is similar to the one explained for the LMMSE method.

3) Log regularization, for which, in our experiments, we keep fixed throughout the minimization steps (e.g., in the gradient-descent iterations). Unfortunately, Log is not necessarily convex, which might result in a nonconvex cost function. Hence, it is possible that gradient-descent methods get trapped in local minima rather than reaching the desired global minimum. For heavy-tailed innovations (e.g., -stables), the Log regularizer is either the exact MAP estimator or a very good approximation of it.

4) The minimum mean-square error (MMSE) denoiser, which is implemented using the message-passing technique discussed in Section V-C. A minimal sketch of the three variational denoisers 1)–3) is given below.
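The sketch below, in Python, shows how the three variational denoisers can be prototyped with a single gradient-descent loop in which only the penalty acting on the first-order finite differences changes. The regularization weight lam, the step size, and the assumed form log(eps + u^2) for the Log penalty are illustrative guesses; in the experiments the weight is tuned by the oracle search described above.

    import numpy as np

    def denoise_variational(y, lam, grad_penalty, n_iter=5000, step=1e-3):
        # Gradient-descent sketch of
        #   x_hat = argmin_x 0.5 * ||y - x||^2 + lam * sum_k penalty(x_{k+1} - x_k),
        # where grad_penalty is the derivative of the penalty, applied elementwise
        # to the first-order finite differences of x.
        x = y.copy()
        for _ in range(n_iter):
            u = np.diff(x)                        # first-order finite differences
            g = x - y                             # gradient of the quadratic data term
            g[:-1] -= lam * grad_penalty(u)       # d/dx_k of penalty(x_{k+1} - x_k)
            g[1:] += lam * grad_penalty(u)        # d/dx_{k+1} of the same term
            x = x - step * g
        return x

    eps = 0.1
    grad_quad = lambda u: 2.0 * u                   # 1) quadratic (LMMSE-like) penalty u^2
    grad_tv = lambda u: np.sign(u)                  # 2) TV penalty |u| (subgradient at 0)
    grad_log = lambda u: 2.0 * u / (eps + u ** 2)   # 3) Log penalty log(eps + u^2)

    # Example call (y_noisy is any 1D numpy array of noisy samples):
    # x_tv = denoise_variational(y_noisy, lam=1.0, grad_penalty=grad_tv)

For the quadratic penalty, the same minimizer can of course be obtained in closed form by solving a tridiagonal linear system, which is how an LMMSE/smoothing-spline estimator would normally be implemented.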

The experiments are conducted in MATLAB. We have developed a graphical user interface that facilitates the procedure of generating samples of the stochastic process and denoising them with the MMSE or variational techniques. We show in Fig. 5 the SNR improvement of a Gaussian process after denoising by the four methods. Since the LMMSE and MMSE methods are equivalent in the Gaussian case, only the MMSE curve obtained from the message-passing algorithm is plotted. As expected, the MMSE method outperforms the


TV and Log regularization techniques. The counterintuitive observation is that Log, which involves a nonconvex penalty function, performs better than TV. Another advantage of the Log regularizer is that it is differentiable and quadratic around the origin. A similar scenario is repeated in Fig. 6 for the compound-Poisson innovation with and Gaussian amplitudes (zero mean and ). As mentioned in Section V-B, since the pdf of the increments contains a mass probability at , the MAP estimator selects the all-zero signal as the most probable choice. In Fig. 6, this trivial estimator is excluded from the comparison. It can be observed that the performance of the MMSE denoiser, which is considered to be the gold standard, is very close to that of the TV regularization method at low noise powers, where the source sparsity dictates the structure. This is consistent with what was predicted in [13]. Meanwhile, it performs almost as well as the LMMSE method at large noise powers. There, the additive Gaussian noise is the dominant term and the statistics of the noisy signal are mostly determined by the Gaussian constituent, which is matched to the LMMSE method. Excluding the MMSE method, none of the other three methods outperforms the others over the entire range of noise. Heavy-tailed distributions such as -stables produce sparse or compressible sequences. With high probability, their realizations consist of a few large peaks and many insignificant samples.


Since the convolution of a heavy-tailed pdf with a Gaussian pdf is still heavy-tailed, the noisy signal looks sparse even at large noise powers. The poor performance of the LMMSE method observed in Fig. 7 for Cauchy distributions confirms this characteristic. The pdf of the Cauchy distribution, given by , is in fact the symmetric -stable distribution with . The Log regularizer corresponds to the MAP estimator of this distribution, while there is no direct link between the TV regularizer and the MAP or MMSE criteria. The SNR-improvement curves in Fig. 7 indicate that the MMSE and Log (MAP) denoisers for this sparse process perform similarly (especially at small noise powers) and outperform the corresponding -norm regularizer (TV). In the final scenario, we consider innovations with and . This results in finite differences, obtained at , that follow a Laplace distribution (see Appendix E). Since the MAP denoiser for this process coincides with TV regularization, the Laplace distribution has sometimes been considered to be a sparse prior. However, it is proved in [11], [12] that the realizations of a sequence with a Laplace prior are almost surely not compressible. The curves presented in Fig. 8 show that TV is a good approximation of the MMSE method only in light-noise conditions. For moderate to large noise, the LMMSE method is better than TV.

    CONCLUSION

In this paper, we studied continuous-time stochastic processes in which the process is defined by applying a linear operator to a white innovation process. For specific types of innovation, this procedure results in sparse processes. We derived a factorization of the joint posterior distribution for the noisy samples of the broad family ruled by fixed-coefficient stochastic differential equations. The factorization allows us to efficiently apply statistical estimation tools. A consequence of our pdf factorization is that it gives us access to the MMSE estimator, which can then be used as a gold standard for evaluating the performance of regularization techniques. This enables us to replace the MMSE method with a more tractable and computationally efficient regularization technique matched to the problem, without compromising the performance. We then focused on Lévy processes as a special case for which we studied the denoising and interpolation problems using


MAP and MMSE methods. We also compared these methods with popular regularization techniques for the recovery of sparse signals, including the norm (e.g., the TV regularizer) and the Log regularization approaches. Simulation results showed that we can almost achieve the MMSE performance by tuning the regularization technique to the type of innovation and the power of the noise. We have also developed a graphical user interface in MATLAB that generates realizations of stochastic processes with various types of innovation and allows the user to apply either the MMSE or the variational methods to denoise the samples.


    APPENDIX A

    CHARACTERISTIC FORMS

In Gelfand's theory of generalized random processes, a process is defined through its inner products with a space of test functions, rather than through its point values. For a random process and an arbitrary test function chosen from a given space, the characteristic form is defined as the characteristic function of the random variable and is given by

As an example, let be a normalized white Gaussian noise and let be an arbitrary function in . It is well known that

is a zero-mean Gaussian random variable with variance . Thus, in this example we have

$$\widehat{\mathscr{P}}_{w}(\varphi) \;=\; \mathrm{e}^{-\frac{1}{2}\,\|\varphi\|_{L_{2}}^{2}}. \qquad (52)$$

An interesting property of characteristic forms is that they help determine the joint probability density functions in arbitrary finite dimensions as

where are test functions and are scalars. Equation (53) shows that an inverse Fourier transform of the characteristic form can yield the desired pdf.
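As a one-dimensional illustration of this inverse-Fourier relation, the sketch below recovers the Cauchy pdf from its characteristic function exp(-|w|) by plain numerical quadrature; the helper name pdf_from_cf, the grid, and the tolerance are illustrative choices rather than anything prescribed by the paper.

    import numpy as np

    def pdf_from_cf(char_fun, x, w_max=200.0, n_w=20001):
        # Recover a 1D pdf as the inverse Fourier transform of its characteristic
        # function: p(x) = (1 / 2*pi) * integral cf(w) * exp(-j*w*x) dw,
        # approximated by a Riemann sum on [-w_max, w_max].
        w = np.linspace(-w_max, w_max, n_w)
        dw = w[1] - w[0]
        integrand = char_fun(w)[None, :] * np.exp(-1j * np.outer(x, w))
        return np.real(integrand.sum(axis=1)) * dw / (2.0 * np.pi)

    x = np.linspace(-5.0, 5.0, 11)
    p_est = pdf_from_cf(lambda w: np.exp(-np.abs(w)), x)   # Cauchy: cf(w) = exp(-|w|)
    p_true = 1.0 / (np.pi * (1.0 + x ** 2))
    assert np.allclose(p_est, p_true, atol=1e-2)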

Besides joint distributions, characteristic forms are also useful for generating moments:


    APPENDIX E

    WHEN DOES TV REGULARIZATION MEET MAP?

The TV-regularization technique is one of the most successful methods in denoising. Since the TV penalty is separable with respect to first-order finite differences, its interpretation as a MAP estimator is valid only for a Lévy process. Moreover, the MAP estimator of a Lévy process coincides with TV regularization only if , where and are constants such that . This condition implies that is the Laplace pdf . This pdf is a valid distribution for the first-order finite differences of the Lévy process characterized by the innovation with and because


where is the modified Bessel function of the second kind. The latter probability density function is known as the symmetric variance-gamma or sym-gamma distribution. It is not hard to check that we obtain the desired Laplace distribution for . However, this value of is the only one for which we observe this property. Should the sampling grid become finer or coarser, the MAP estimator would no longer coincide with TV regularization. We show in Fig. 9 the shifted functions for various values of for the aforementioned innovation, where .
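As a quick numerical sanity check of this reduction, the sketch below evaluates a symmetric variance-gamma density of the assumed (unnormalized, unit-scale) form |u|^(nu - 1/2) K_{nu - 1/2}(|u|) and verifies that it collapses to a Laplace shape for nu = 1, since K_{1/2}(z) = sqrt(pi/(2z)) exp(-z); this parametrization is a simplified one of our own, not the paper's exact expression.

    import numpy as np
    from scipy.special import kv   # modified Bessel function of the second kind

    u = np.linspace(0.1, 10.0, 200)
    nu = 1.0
    sym_gamma = np.abs(u) ** (nu - 0.5) * kv(nu - 0.5, np.abs(u))   # unnormalized sym-gamma
    laplace = np.sqrt(np.pi / 2.0) * np.exp(-np.abs(u))             # unnormalized Laplace shape
    assert np.allclose(sym_gamma, laplace, rtol=1e-8)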

    ACKNOWLEDGMENT

The authors would like to thank P. Thévenaz and the anonymous reviewer for their constructive comments, which greatly improved the paper.

    REFERENCES

[1] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inf. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
[2] E. Candès and T. Tao, "Near optimal signal recovery from random projections: Universal encoding strategies," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[3] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[4] J. L. Starck, M. Elad, and D. L. Donoho, "Image decomposition via the combination of sparse representations and a variational approach," IEEE Trans. Image Process., vol. 14, no. 10, pp. 1570–1582, Oct. 2005.
[5] S. J. Wright, R. D. Nowak, and M. A. T. Figueiredo, "Sparse reconstruction by separable approximation," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2479–2493, Jul. 2009.
[6] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[7] L. Rudin, S. Osher, and E. Fatemi, "Nonlinear total variation based noise removal algorithms," Phys. D, Nonlin. Phenom., vol. 60, pp. 259–268, Nov. 1992.
[8] D. P. Wipf and B. D. Rao, "Sparse Bayesian learning for basis selection," IEEE Trans. Signal Process., vol. 52, no. 8, pp. 2153–2164, Aug. 2004.
[9] T. Park and G. Casella, "The Bayesian LASSO," J. Amer. Stat. Assoc., vol. 103, pp. 681–686, Jun. 2008.
[10] V. Cevher, "Learning with compressible priors," presented at the Neural Inf. Process. Syst. (NIPS), Vancouver, BC, Canada, Dec. 8–10, 2008.
[11] A. Amini, M. Unser, and F. Marvasti, "Compressibility of deterministic and random infinite sequences," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5139–5201, Nov. 2011.
[12] R. Gribonval, V. Cevher, and M. Davies, "Compressible distributions for high-dimensional statistics," IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5016–5034, Aug. 2012.
[13] R. Gribonval, "Should penalized least squares regression be interpreted as maximum a posteriori estimation?," IEEE Trans. Signal Process., vol. 59, no. 5, pp. 2405–2410, May 2011.
[14] M. Unser and P. Tafti, "Stochastic models for sparse and piecewise-smooth signals," IEEE Trans. Signal Process., vol. 59, no. 3, pp. 989–1006, Mar. 2011.
[15] M. Vetterli, P. Marziliano, and T. Blu, "Sampling signals with finite rate of innovation," IEEE Trans. Signal Process., vol. 50, no. 6, pp. 1417–1428, Jun. 2002.
[16] M. Unser, P. Tafti, and Q. Sun, "A unified formulation of Gaussian vs. sparse stochastic processes: Part I – Continuous-domain theory," arXiv:1108.6150v1, 2012 [Online]. Available: http://arxiv.org/abs/1108.6150
[17] M. Unser, P. Tafti, A. Amini, and H. Kirshner, "A unified formulation of Gaussian vs. sparse stochastic processes: Part II – Discrete-domain theory," arXiv:1108.6152v1, 2012 [Online]. Available: http://arxiv.org/abs/1108.6152
[18] I. Gelfand and N. Y. Vilenkin, Generalized Functions, Vol. 4: Applications of Harmonic Analysis. New York: Academic, 1964.
[19] K. Sato, Lévy Processes and Infinitely Divisible Distributions. London, U.K.: Chapman & Hall, 1994.
[20] F. W. Steutel and K. V. Harn, Infinite Divisibility of Probability Distributions on the Real Line. New York: Marcel Dekker, 2003.
[21] G. Samorodnitsky and M. S. Taqqu, Stable Non-Gaussian Random Processes. London, U.K.: Chapman & Hall/CRC, 1994.
[22] M. Unser and T. Blu, "Self-similarity: Part I – Splines and operators," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1352–1363, Apr. 2007.
[23] M. Unser and T. Blu, "Cardinal exponential splines: Part I – Theory and filtering algorithms," IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1425–1438, Apr. 2005.
[24] T. Blu and M. Unser, "Self-similarity: Part II – Optimal estimation of fractal processes," IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1364–1378, Apr. 2007.
[25] C. Stein, "Estimation of the mean of a multivariate normal distribution," Ann. Stat., vol. 9, no. 6, pp. 1135–1151, 1981.
[26] M. Raphan and E. P. Simoncelli, "Learning to be Bayesian without supervision," in Proc. Neural Inf. Process. Syst. (NIPS), Cambridge, MA, Dec. 4–5, 2006, pp. 1145–1152, MIT Press.
[27] F. Luisier, T. Blu, and M. Unser, "A new SURE approach to image denoising: Interscale orthonormal wavelet thresholding," IEEE Trans. Image Process., vol. 16, no. 3, pp. 593–606, Mar. 2007.
[28] D. Madan, P. Carr, and E. Chang, "The variance gamma process and option pricing," Eur. Finance Rev., vol. 2, no. 1, pp. 79–105, 1998.
[29] H.-A. Loeliger, J. Dauwels, J. Hu, S. Korl, L. Ping, and F. R. Kschischang, "The factor graph approach to model-based signal processing," Proc. IEEE, vol. 95, no. 6, pp. 1295–1322, Jun. 2007.
[30] U. Kamilov, A. Amini, and M. Unser, "MMSE denoising of sparse Lévy processes via message passing," in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Kyoto, Japan, Mar. 25–30, 2012, pp. 3637–3640.
[31] U. S. Kamilov, P. Pad, A. Amini, and M. Unser, "MMSE estimation of sparse Lévy processes," IEEE Trans. Signal Process., vol. 61, no. 1, pp. 137–147, 2012.
[32] M. Unser and T. Blu, "Generalized smoothing splines and the optimal discretization of the Wiener filter," IEEE Trans. Signal Process., vol. 53, no. 6, pp. 2146–2159, Jun. 2005.