Johannes Kepler Universität Linz

Netzwerk für Forschung, Lehre und Praxis (Network for Research, Teaching and Practice)

Preconditioning Iterative Regularization Methods in Hilbert Scales

Dissertation

submitted for the academic degree of

Doktor der Technischen Wissenschaften (Doctor of Technical Sciences)

Prepared at the Institut für Industriemathematik (Institute for Industrial Mathematics)

Reviewers:

a.Univ.-Prof. Dr. Andreas Neubauer

Priv.-Doz. Dr. Barbara Kaltenbacher, Universität Erlangen

Submitted by:

Dipl.-Ing. Herbert Egger

Linz, July 2005

Johannes Kepler Universität

A-4040 Linz · Altenbergerstraße 69 · Internet: http://www.uni-linz.ac.at · DVR 0093696

Acknowledgments

The starting point for my doctoral thesis was when I asked Prof. A. Neubauer for his

judgment on some of my ideas on preconditioning of iterative regularization methods

in Hilbert scales last autumn. During the past year I spent hours and hours in his office

discussing various aspects of inverse problems and regularization, and asking

for his advice on establishing my proofs. And as the supervisor of my doctoral thesis,

he invested much time in answering my questions, pointing out certain difficulties,

and suggesting improvements. I want to thank him for his encouraging advice and

supervision.

Thanks also to Priv.-Doz. Dr. Barbara Kaltenbacher for the careful proof-reading

and her detailed remarks and suggestions, which surely led to an improved presentation.

I want to express my special gratitude to Prof. H. W. Engl, who has been supervising

my work at the SFB "Numerical and Symbolic Scientific Computing" over the last three

years, for his guidance and also for providing me with the freedom to follow my interests and

choose my own topics of research. Since the foundation of the "Johann Radon Institute

for Computational and Applied Mathematics" (RICAM) of the Austrian Academy of

Sciences, which Prof. Engl is probably most responsible for, and with the "Research

Institute for Symbolic Computation" (RISC) and the SFB F013, Linz has become a unique

center of mathematics, able to attract people from all over the world and providing

an inspiring, dynamic atmosphere for young researchers.

I would also like to acknowledge financial support by the Austrian Science Fund

(FWF) under the project grant SFB F013/F1308, by RICAM and the Austrian

Academy of Sciences.

Last but not least I want to say thanks to my colleagues from the Industrial Math-

ematics Institute and the Institute of Computational Mathematics at the University

Linz, from the SFB and from RICAM, for creating such an inspiring and friendly at-

mosphere. Special thanks to Andreas, Benjamin, Martin, Philipp, and Rainer, for their

help and patience with my (I am sure sometimes annoying) questions and comments.

Abstract

This thesis deals with the preconditioning of iterative regularization methods for linear

and nonlinear inverse problems, which arise in many applications in computational

mathematics, in other natural sciences, in engineering, and in industry. In many cases

such inverse problems are ill-posed, i.e., their solution is unstable with respect to data

perturbations, and stable approximations for a solution can only be found by so-called

regularization methods.

For large scale and nonlinear inverse problems, regularized solutions are typically

constructed by iterative algorithms that are used for a realization of continuous regular-

ization strategies like Tikhonov regularization on the one hand, or may be considered as

regularizing algorithms themselves if the iterations are stopped at the right time. The

stopping index may for instance be determined by the discrepancy principle, whose

implementation does not require any additional effort in case of iterative regularization

methods. The main emphasis of the presentation below is on preconditioning of such it-

erative methods, although most of the results directly apply also to other regularization

methods.

Besides the size of the data perturbations (the noise-level), the smoothness of the

solutions essentially determines the quality of the regularized approximations. Without

any smoothness, convergence (with noise-level tending to zero) will be arbitrarily slow

in general, and even under relatively strong smoothness assumptions on the solution

only Hölder (or even only logarithmic) rates can be proven.

A main disadvantage of iterative regularization methods is that, especially for non-

smooth solutions, a large number of iterations has to be performed in order to guarantee

the optimal convergence rates. To overcome this problem, certain acceleration strategies

have been proposed for the iterative solution of linear inverse problems like the ν-

methods or the method of conjugate gradients. Here, we focus on a completely different

approach, namely preconditioning in Hilbert scales. This approach can also be used for

a further acceleration of already improved methods.

Preconditioning of well-posed problems has been investigated intensively, especially

the preconditioning of linear equations arising in the application of finite element meth-

ods to PDEs, and the resulting number of iterations can be shown to grow at most

logarithmically with the desired accuracy if the preconditioner is spectrally equivalent

to the inverse of the operator in the linear system. Note that for well-posed problems,

the preconditioner is typically a bounded (respectively smoothing) operator. In case of

ill-posed problems, the situation is different: there the (forward) operator is typically

smoothing, while its inverse is unbounded. As for well-posed problems, good precon-

ditioners have to mimic the behavior of the inverse of the involved operator and thus

will usually be unbounded for ill-posed problems. This complicates preconditioning in

the presence of data noise, and the number of iterations can essentially only be reduced

to its square root. However, since the usual iteration numbers may be rather large (10000

iterations are not unusual for the Landweber method), preconditioning typically leads

to a significant speed-up.

Hilbert scales were originally introduced in regularization theory with the goal to

overcome saturation effects of certain regularization methods. Here, we use Hilbert

scales for a different reason, namely to formulate and investigate preconditioning strate-

gies for a regularized iterative solution of ill-posed operator equations. The convergence

analysis of the resulting regularization methods is kept very general such that the stan-

dard convergence results and even a convergence theory for special methods for sym-

metric problems are included. Of particular importance from a numerical point of view

is that in many cases the preconditioners correspond to simple differential operators,

and thus preconditioning does not increase the numerical effort of a single iteration step

noticeably, while the overall number of iterations is reduced significantly.

The outline of the thesis is as follows: in Chapter 1 we give a short introduction

to inverse and ill-posed problems and recall the main definitions and basic results of

regularization theory.

Chapter 2 then gives a short overview of the most important and widely used

regularization algorithms for linear and nonlinear inverse problems. For later reference

we also summarize the main convergence rate results.

Hilbert scales, which are the main ingredient for the analysis of our preconditioning

strategy, are introduced in Chapter 3. For a comparison with our results we also recall

the classical convergence results for regularization in Hilbert scales.

In Chapter 4 we formulate and analyze our preconditioning strategy for iterative

regularization methods for linear and nonlinear inverse problems. The results and con-

ditions of our convergence analysis are discussed in detail and compared to the ones of

standard regularization methods and classical regularization in Hilbert scales.

The applicability of our theoretical results is finally demonstrated in Chapter 5 for

various examples, and the effect of preconditioning is illustrated in several numerical

tests.

Zusammenfassung (Summary)

This thesis is concerned with the preconditioning of iterative regularization methods for linear and nonlinear inverse problems, which arise in applied mathematics as well as in scientific, engineering, and industrial applications. In many cases such inverse problems are ill-posed, i.e., their solution is in general unstable with respect to data errors, and therefore such problems can only be solved stably by so-called regularization methods.

For high-dimensional and/or nonlinear problems, iterative algorithms are usually employed for the solution, for instance in the realization of continuous regularization methods such as Tikhonov regularization. On the other hand, iterative methods can also be used directly for regularization if the iterations are stopped in time, for instance by means of the discrepancy principle, which can be implemented for iterative algorithms easily and without additional effort. For these reasons, this thesis mainly investigates iterative regularization methods; most of the results, however, carry over to continuous regularization methods without major changes.

Besides the size of the data error, it is above all the smoothness of the solutions that determines the quality of the approximations that can be found by regularization methods. Without any smoothness, the convergence of the regularized solutions (as the data error tends to 0) is in general arbitrarily slow, and even under relatively strong smoothness assumptions on a solution, usually only Hölder rates (for exponentially ill-posed problems even only logarithmic convergence rates) can be achieved.

One of the main disadvantages of iterative regularization methods is the relatively large number of iterations required to guarantee optimal convergence rates. The number of iterations increases as the smoothness of the solutions decreases. For linear inverse problems, several acceleration techniques are available, such as the so-called ν-methods or the method of conjugate gradients. The subject of this thesis is an entirely different approach, namely preconditioning in Hilbert scales. This technique can also be applied to the accelerated methods mentioned above.

The preconditioning of ill-conditioned systems of equations arising from the application of finite element methods to partial differential equations is relatively well studied. In many cases one can show that, with a suitable choice of the preconditioner (spectrally equivalent to the inverse of the operator in the system of equations), the number of iterations grows only logarithmically with the desired accuracy. For inverse problems the situation is quite different: typically, the operators describing the system of equations are not boundedly invertible, and therefore good preconditioners are in general unbounded operators as well. This complicates preconditioning in the presence of data errors, and the number of iterations can essentially only be reduced to its square root by preconditioning. Considering, however, that for ill-posed problems the number of iterations is usually very high (e.g., 10000 iterations are by no means unusual for the Landweber method), the speed-up achieved by preconditioning is still remarkable.

Hilbert scales were originally introduced in regularization theory in order to mitigate saturation effects of various regularization methods and thus to obtain better convergence rates for smooth solutions. In this thesis, Hilbert scales are used with an entirely different motivation, namely to formulate preconditioners and to investigate in detail their effect on the iterative solution of ill-posed problems. Our convergence analysis is kept so general that the standard results of regularization theory as well as convergence statements for less frequently discussed methods for symmetric problems are included. An important point from a numerical perspective is that in many cases one can precondition with a differential operator, so that applying the preconditioner causes practically no additional effort, while at the same time the number of iterations can be reduced significantly.

The thesis is organized as follows: Chapter 1 gives a short introduction to inverse and ill-posed problems. In addition, the essential notions and convergence statements of regularization theory are recalled.

Chapter 2 then gives a short overview of the most common regularization methods. In order to be able to compare our results with those of the standard theory, the essential convergence statements for these methods are also cited.

In Chapter 3, essential statements on Hilbert scales are collected, which form a main ingredient for the formulation and investigation of the preconditioning strategy treated later. Additionally, the classical results on regularization in Hilbert scales are cited for later comparison.

The preconditioning of iterative regularization methods in Hilbert scales is motivated and investigated in detail in Chapter 4, and convergence rates are shown for the most important methods for linear and nonlinear problems. The conditions required for the analysis are discussed in detail, and connections to the classical results are established.

Finally, Chapter 5 demonstrates the applicability of the theoretical results to practical problems. The theoretical statements on the preconditioning effect are supported by numerical test results.

Contents

1 Inverse Problems and Regularization
1.1 Inverse and Ill-posed Problems
1.2 Principles of Regularization
1.2.1 Generalized Solutions
1.2.2 Compact Operators
1.2.3 Regularization Operators
1.2.4 Continuous Regularization Methods
1.2.5 Nonlinear Problems

2 Regularization Methods
2.1 Tikhonov Regularization
2.2 Iterative Regularization Methods
2.2.1 Landweber Iteration
2.2.2 The ν-Methods
2.2.3 The Method of Conjugate Gradients
2.2.4 Landweber Iteration for Nonlinear Problems
2.2.5 Regularized Newton-type Iterations

3 Regularization in Hilbert Scales
3.1 Introduction and General Definitions
3.2 Linear Problems
3.3 Nonlinear Problems

4 Preconditioning in Hilbert Scales
4.1 Main Assumptions and Preliminary Results
4.2 Linear Problems
4.2.1 Semiiterative Methods
4.2.2 The Conjugate Gradient Method
4.3 Nonlinear Problems
4.3.1 Basic Assumptions
4.3.2 Landweber Iteration
4.3.3 Newton-type Iterations

5 Examples and Numerical Tests
5.1 Integral Equations of the First Kind
5.1.1 Fredholm Integral Equations of the First Kind
5.1.2 Radon Inversion
5.1.3 An Inverse Problem in Imaging: Deblurring
5.1.4 A Volterra-Hammerstein Integral Equation
5.2 Parameter Identification
5.2.1 An Inverse Source Problem in an Elliptic Equation
5.2.2 Identifying a Reaction Term
5.2.3 An Inverse Problem in Mathematical Finance
5.2.4 Reconstructing a Nonlinear Source Term in a Parabolic Equation
5.3 Summary

Bibliography

Eidesstattliche Erklärung (Statutory Declaration)

Curriculum Vitae

Chapter 1

Inverse Problems and Regularization

In this work we investigate the acceleration of iterative regularization algorithms for

the solution of ill-posed inverse problems by preconditioning in Hilbert scales. Before

we formulate and analyze our methods, we introduce the basic concepts and notations

of inverse (ill-posed) problems and regularization. For later reference and a comparison

with the results of this work, we give a brief overview of some important classes of

regularization methods for linear and nonlinear inverse problems and recall the most

important convergence rate results.

1.1 Inverse and Ill-posed Problems

As the name already suggests, inverse problems are always connected to direct problems.

According to Keller [51], two problems are inverse to each other if the formulation

of the one problem involves the other one. At least in a physical context, causality may

serve as a reasonable criterion for distinguishing which problem is to be considered

the direct one, and which one the inverse; e.g., one would certainly call the prediction

of the physical state or the evolution of a system depending on certain (material)

parameters a direct problem, while the identification of (some of the) parameters from

measurements or observations of the physical state or derived quantities would natu-

rally be called the inverse problem. Therefore, inverse problems are typically concerned

with determining causes for a desired or observed effect.

It turns out in practice that many relevant inverse problems are ill-posed, i.e., they

lack at least one of the features of a well-posed problem in the sense of Hadamard:

existence, uniqueness or stability of a solution (with respect to given data). The study of

concrete inverse problems frequently involves the question how to enforce uniqueness by

additional assumptions or information. Such identifiability results (cf., e.g., [44, 45]) are

however usually very problem dependent. By introducing a generalized solution concept

the requirement of uniqueness (and partly of the existence of a solution) can be relaxed, and

the aspect of instability, which may be seen as the main characteristic of an ill-posed

problem, can be treated in a rather general way. As noted by A. N. Tikhonov [72],

the lack of stability is not due to a wrong formulation of the problem, but rather

naturally arises in many physically relevant problems, typically, if the direct problem

is smoothing.

For motivation we give here a short account of some classes of inverse problems

with important applications:

• (Computerized) tomography (cf. [64]): CT involves the reconstruction of a

function, usually a density distribution, from values of its line integrals and is

important both in medical applications and in nondestructive testing [31]. Math-

ematically, this is connected with the inversion of the Radon transform (see [64]).

• Inverse heat conduction problems like solving a heat equation backwards in

time or sideways (i.e., with Cauchy data on a part of the boundary) (cf. [2, 10, 32]).

• Inverse problems in imaging like deblurring and denoising (cf. [11]).

Deblurring has successfully been used to enhance images taken by the Hubble space

telescope. More recently, inpainting, image segmentation and recognition have

gained increasing interest (cf. [4]).

• Inverse potential problems, i.e., problems where the observable quantity can

be expressed via surface or volume potentials, with application in geophysics and

geodesy. Typical problems are, e.g., the determination of a spatially varying den-

sity distribution in the earth from gravity measurements (cf. [30]), or of the gravity

potential of the earth from gravity measurements of a satellite (satellite geodesy).

Inverse potential problems also appear in connection with Maxwell’s equations

(cf. [69]); a prominent example is the synthesis problem of distributing charges in

such a way that a prescribed electric field is generated.

• Inverse scattering (cf. [18]), where one wants to reconstruct an obstacle or an

inhomogeneity from scattered waves. This is a special case of shape reconstruction

and closely connected to shape optimization.

• Parameter identification in (systems of) (partial) differential equations from

interior or boundary measurements of a physical state (cf. [9, 45]), with many applications,

e.g., in groundwater hydrology, semiconductors, structural mechanics,

polymer crystallization, or mathematical finance. Identification from boundary

observations appears, e.g., in impedance tomography. Parameter identification is

also closely related to optimal control and optimal design.

• Geometric inverse problems, like shape reconstruction [43]: there, the quan-

tity of interest is a shape or geometry. Due to the lack of vector-space structure,

the analysis of such problems is rather difficult. If a domain is considered as the

support of a characteristic function, geometric inverse problems can also be re-

formulated as parameter identification problems with jumping coefficients, where

the goal is to determine the jump. This class of problems is closely related to

optimal design and topology optimization.

Detailed references for these and many more classes of inverse problems can be found,

e.g., in [28, 35, 40, 60]. Some of the above mentioned inverse problems, e.g., computerized

tomography, deblurring, and several parameter identification problems, will be discussed

in more detail in Chapter 5.

As we will see below, the instability is an inherently infinite dimensional phenomenon

(usually caused by the compactness of the direct problem) in many situations. Never-

theless, the lack of stability also causes difficulties in a numerical solution, e.g., finite

dimensional approximations of ill-posed problems are typically ill-conditioned. In any

case, an accurate and stable approximate solution of inverse (ill-posed) problems re-

quires special methods, so-called regularization methods.
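
The ill-conditioning of finite dimensional approximations can be observed numerically. The following Python sketch is an illustration only and is not taken from the thesis: the Gaussian kernel, its width 0.1, the interval [0, 1], the midpoint rule, and the function name `discretize_kernel` are all assumptions chosen for the example. It discretizes a smoothing integral operator and prints the condition number of the resulting matrix for increasing resolution.

```python
import numpy as np

def discretize_kernel(n, width=0.1):
    """Midpoint-rule discretization of (Tx)(s) = int_0^1 k(s,t) x(t) dt with the
    assumed Gaussian kernel k(s,t) = exp(-(s-t)^2 / (2 width^2))."""
    t = (np.arange(n) + 0.5) / n                 # midpoints of n equal cells
    K = np.exp(-(t[:, None] - t[None, :])**2 / (2 * width**2))
    return K / n                                 # quadrature weight 1/n

# Condition numbers (ratio of largest to smallest singular value) for growing n.
conds = {n: np.linalg.cond(discretize_kernel(n)) for n in (8, 16, 32)}
for n, c in conds.items():
    print(f"n = {n:3d}   cond = {c:.3e}")
```

The condition number grows rapidly with n, so refining the discretization of a smoothing operator makes the resulting linear system harder, not easier, to solve stably.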

1.2 Principles of Regularization

In this section we summarize the basic notions and definitions of regularization. We

follow the presentation in [28] and recall the main definitions and results for linear

inverse problems. The importance of spectral theory for the analysis and construction of

regularization methods will become clear from our presentation. Additionally, we briefly

discuss the most important regularization methods for linear and nonlinear problems

below.

In the sequel we consider linear inverse problems in the framework of abstract operator equations of the form

$$Tx = y, \tag{1.1}$$

where $T$ denotes a bounded linear operator between Hilbert spaces $X$ and $Y$. Unless specified otherwise, we assume that the data $y$ are attainable, i.e., $y \in R(T)$. In practice, however, only approximations $y^\delta$ of the true data $y$ will be available. We further assume that at least a bound on the data error is known, i.e.,

$$\|y^\delta - y\| \le \delta. \tag{1.2}$$

Note that if $R(T)$ is non-closed, which is typically the case for ill-posed problems, $Tx = y^\delta$ may have no solution in general, and even if $y^\delta \in R(T)$, the corresponding solution may be far away from the solution of $Tx = y$ due to instability.

1.2.1 Generalized Solutions

Since it is too restrictive in many cases to require that (1.1) has a unique solution, we

will use the following generalized solution concept:


We call $x$ a least-squares solution of $Tx = y$ if

$$\|Tx - y\| = \inf\{\|Tz - y\| : z \in X\}.$$

Note that the infimum may not be attained if $R(T)$ is not closed. A least-squares solution exists for $y \in R(T) + R(T)^\perp$, but it will not be unique if $T$ is not injective. Uniqueness can be restored by an appropriate selection criterion: we call $x^\dagger$ an $x^*$-minimum-norm solution if it minimizes $\|x - x^*\|$ among all least-squares solutions. For linear problems, $x^*$ can always be replaced by $0$ by changing the right-hand side to $y - Tx^*$. In this case, the $0$-minimum-norm solution is usually called the best-approximate solution.

The operator $T^\dagger$ with domain $D(T^\dagger) = R(T) + R(T)^\perp$ that maps data $y \in D(T^\dagger)$ onto the best-approximate solution $x^\dagger = T^\dagger y$ is called the Moore-Penrose (generalized) inverse of $T$. By the open mapping theorem, the Moore-Penrose inverse is bounded, i.e., the best-approximate solution depends continuously on the data $y$, if and only if $R(T)$ is closed in $Y$ (see [28, 34, 62] for details). Otherwise, the problem is ill-posed in the sense of Hadamard; in particular, even for attainable data $y \in R(T)$, the dependence of the best-approximate solution $x^\dagger$ on the data $y$ is unstable.

A least-squares solution $x$ of $Tx = y$ may alternatively be characterized as a solution of the (Gaussian) normal equations

$$T^*Tx = T^*y. \tag{1.3}$$

In case $T^*T$ is invertible, a least-squares solution (which then, by injectivity of $T$, is unique and thus the best-approximate solution) is given by $x = (T^*T)^{-1}T^*y$. Otherwise, one can, similarly as above, use the Moore-Penrose inverse of $T^*T$ and define the solution of (1.3) by $x = (T^*T)^\dagger T^*y$. It turns out that this alternatively characterizes the best-approximate solution $x^\dagger$; in fact, one has

$$T^\dagger = (T^*T)^\dagger T^*.$$

With this generalized solution concept in mind, the ill-posedness of problem (1.1) is essentially reduced to the lack of stability of its solution, or equivalently, the unboundedness of the Moore-Penrose inverse $T^\dagger$.
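
In finite dimensions the range of every matrix is closed, so the Moore-Penrose inverse always exists as a bounded operator and the statements above can be checked directly. The following NumPy sketch is only an illustration under assumed data: the random rank-2 matrix, the seed, and the cutoff `tol` are choices made for the example. It verifies the identity between the Moore-Penrose inverse and the normal-equation form, the normal equations (1.3), and the minimum-norm property of the best-approximate solution.

```python
import numpy as np

rng = np.random.default_rng(0)
# Rank-2 matrix of size 6x4: T is neither injective nor surjective.
T = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
y = rng.standard_normal(6)                   # generic data, typically not in R(T)

tol = 1e-10                                  # relative cutoff for "zero" singular values
T_pinv = np.linalg.pinv(T, rcond=tol)
x_dagger = T_pinv @ y                        # best-approximate solution

# Moore-Penrose inverse via the normal equations: (T^* T)^dagger T^*
via_normal = np.linalg.pinv(T.T @ T, rcond=tol) @ T.T

# Another least-squares solution: add an element of the null space of T.
null_vec = np.linalg.svd(T)[2][-1]           # right singular vector for sigma = 0
x_other = x_dagger + null_vec

print("max |T^+ - (T^*T)^+ T^*| =", np.abs(T_pinv - via_normal).max())
print("normal equation residual =", np.linalg.norm(T.T @ (T @ x_dagger - y)))
print("norms:", np.linalg.norm(x_dagger), "<", np.linalg.norm(x_other))
```

Both `x_dagger` and `x_other` satisfy the normal equations, but only `x_dagger` is orthogonal to the null space of `T` and therefore has minimal norm.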

1.2.2 Compact Operators

In the sequel, we discuss ill-posedness and instability of a solution of (1.1) for compact

linear operators T in more detail, noting that most of the results below quite naturally

generalize to the non-compact case and can be analyzed by spectral theory, cf. [28] for

details.

As a prototype for linear inverse problems, with applications in geophysics,

Maxwell’s equations, or deconvolution, we consider linear integral equations of the first

kind, e.g.,

$$(Tx)(s) = \int_\Omega k(s,t)\,x(t)\,dt = y(s), \qquad s \in \Omega,$$


with kernel $k \in L^2(\Omega^2)$ and $y \in L^2(\Omega)$, over a compact domain $\Omega$. It is well-known (see, e.g., [55]) that under the above assumptions the operator $T$ is compact on $L^2(\Omega)$, and that $R(T)$ is non-closed if the problem is infinite dimensional. A compact linear operator $T$ has a singular system $\{\sigma_n, u_n, v_n\}_{n\in\mathbb{N}}$, where the $\sigma_n$ are the positive square roots of the eigenvalues $\lambda_n = \sigma_n^2$ (enumerated in decreasing order) of the selfadjoint, positive semi-definite operator $T^*T$. The corresponding eigenfunctions $u_n$ form an orthonormal basis of $\overline{R(T^*T)}$, and the functions $\{v_n\}_{n\in\mathbb{N}}$, defined by $v_n = Tu_n/\|Tu_n\|$, are a complete system of eigenfunctions of $TT^*$ and span the space $\overline{R(TT^*)}$. Moreover, one has $Tu_n = \sigma_n v_n$ and $T^*v_n = \sigma_n u_n$. Thus, the action of a compact operator on an element can be written as a singular value expansion

$$Tx = \sum_{n=1}^{\infty} \sigma_n \langle x, u_n \rangle v_n, \qquad T^*y = \sum_{n=1}^{\infty} \sigma_n \langle y, v_n \rangle u_n,$$

where the series converge in the Hilbert space norms of $Y$ and $X$, respectively. In case $T$ has a finite dimensional range, and consequently only finitely many singular values $\sigma_n$ exist, we call $T$ degenerate. Otherwise, i.e., if $R(T)$ is infinite dimensional, the singular values accumulate (only) at $0$, i.e.,

$$\lim_{n\to\infty} \sigma_n = 0. \tag{1.4}$$

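For $x \in N(T)^\perp = \overline{R(T^*)}$ one has

$$\|x\|^2 = \sum_{n=1}^{\infty} \langle x, u_n \rangle^2,$$

and hence $y = \sum_{n=1}^{\infty} \langle y, v_n \rangle v_n \in R(T)$ if (and only if) the Picard criterion is satisfied (cf., e.g., [28, Theorem 2.8]), i.e.,

$$\sum_{n=1}^{\infty} \frac{|\langle y, v_n \rangle|^2}{\sigma_n^2} < \infty.$$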

Moreover, one easily verifies that the best-approximate solution $x^\dagger$ satisfies

$$x^\dagger = T^\dagger y = \sum_{n=1}^{\infty} \frac{\langle y, v_n \rangle}{\sigma_n}\, u_n = (T^*T)^\dagger T^*y = \sum_{n=1}^{\infty} \frac{\langle T^*y, u_n \rangle}{\sigma_n^2}\, u_n, \tag{1.5}$$

which explains why (1.4) turns (1.1) into an ill-posed equation: errors in the $n$-th Fourier coefficient $\langle y, v_n \rangle$ of the data are amplified by a factor $1/\sigma_n$, which may be arbitrarily large for high frequency errors if $\dim(R(T)) = \infty$. Note that the faster $\sigma_n$ decays, the stronger the error amplification in (1.5) is, which motivates the following quantification of ill-posedness: a problem with $\sigma_n \sim n^{-\alpha}$ for some $\alpha > 0$ is usually called moderately (or mildly) ill-posed, while problems where $\sigma_n \sim q^n$ with some $q < 1$ are called severely (exponentially) ill-posed.
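
The amplification factor $1/\sigma_n$ in (1.5) can be made visible with a discrete singular value decomposition. The sketch below is an illustration under assumptions that are not part of the text: a Gaussian kernel of width 0.1 on [0, 1], a midpoint-rule discretization with 64 points, and a helper `naive_inverse` that evaluates only the first `m` terms of the expansion. A data perturbation of norm `delta` along the $k$-th singular direction is passed through the unregularized inverse, and the resulting error is compared with `delta`.

```python
import numpy as np

n = 64
t = (np.arange(n) + 0.5) / n                                   # midpoints in [0, 1]
K = np.exp(-(t[:, None] - t[None, :])**2 / (2 * 0.1**2)) / n   # assumed Gaussian kernel
U, s, Vt = np.linalg.svd(K)   # K = U @ diag(s) @ Vt; U[:, k] plays v_k, Vt[k] plays u_k

def naive_inverse(data, m):
    """First m terms of the expansion (1.5): sum_n <data, v_n> / sigma_n * u_n."""
    coeffs = (U[:, :m].T @ data) / s[:m]
    return Vt[:m].T @ coeffs

delta, m = 1e-6, 12
amplification = {}
for k in (0, 5, 10):
    perturbation = delta * U[:, k]            # data error of norm delta along v_k
    error = naive_inverse(perturbation, m)    # resulting error in the solution
    amplification[k] = np.linalg.norm(error) / delta
    print(f"k = {k:2d}   sigma_k = {s[k]:.3e}   amplification = {amplification[k]:.3e}")
```

Since the singular values of a smoothing kernel decay quickly, the same data error becomes more harmful the higher the index of the singular direction it points along.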


1.2.3 Regularization Operators

In general terms, regularization means the approximation of an ill-posed problem by a

family of neighboring well-posed problems. More formally, a regularization method can

be defined in the following way (cf. [28]):

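Definition 1.1 Let $T : X \to Y$ be a bounded linear operator between Hilbert spaces $X$ and $Y$, $\alpha_0 \in (0, \infty]$. For every $\alpha \in (0, \alpha_0)$, let

$$R_\alpha : Y \to X$$

be a continuous (not necessarily linear) operator. The family $\{R_\alpha\}$ is called a (converging) regularization or a regularization operator, if for all $y \in D(T^\dagger)$, there exists a parameter choice rule $\alpha = \alpha(\delta, y^\delta)$, such that

$$\lim_{\delta \to 0} \sup\{\|R_{\alpha(\delta, y^\delta)} y^\delta - T^\dagger y\| : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\} = 0 \tag{1.6}$$

holds. Here,

$$\alpha : \mathbb{R}^+ \times Y \to (0, \alpha_0)$$

is such that

$$\lim_{\delta \to 0} \sup\{\alpha(\delta, y^\delta) : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\} = 0.$$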

Thus, a regularization method always consists of a regularization operator and a parameter choice rule. Note that, due to the above definition, the operators $R_\alpha$ are continuous on $Y$ for $\alpha > 0$; in particular, $R_\alpha y^\delta$ is a stable approximation of $x^\dagger$ even for $y^\delta \notin R(T)$.

We want to emphasize that the convergence condition (1.6) in the above definition is rather strong, i.e., it uses a worst case error concept, namely

$$\sup\{\|R_{\alpha(\delta, y^\delta)} y^\delta - T^\dagger y\| : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\}$$

as a measure for convergence, and thus the regularized solutions

$$x_\alpha^\delta := R_{\alpha(\delta, y^\delta)} y^\delta$$

converge uniformly with respect to noise in the data $y^\delta$ towards the best-approximate solution $x^\dagger = T^\dagger y$. For an extension of the concept of regularization operators to quite general nonlinear equations in metric spaces and related discussions we refer to [7].
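
To illustrate how a regularization operator and a parameter choice rule work together, the following sketch uses a truncated singular value decomposition with the a priori rule $\alpha(\delta) = \delta$. Neither choice is taken from the text; they are assumptions made for the example, as are the Gaussian kernel, the grid, and the smooth solution $x^\dagger = T^*w$. For this setup one can check by hand that the error is bounded by $\sqrt{\delta}\,(1 + \|w\|)$ for every perturbation of norm at most $\delta$, which is a worst case estimate in the spirit of (1.6).

```python
import numpy as np

n = 64
t = (np.arange(n) + 0.5) / n
K = np.exp(-(t[:, None] - t[None, :])**2 / (2 * 0.1**2)) / n   # assumed Gaussian kernel
U, s, Vt = np.linalg.svd(K)

def R(alpha, data):
    """Truncated SVD: keep only the components with sigma_n^2 >= alpha."""
    keep = s**2 >= alpha
    return Vt[keep].T @ ((U[:, keep].T @ data) / s[keep])

w = np.cos(2 * np.pi * t)
x_dagger = K.T @ w            # assumed smooth solution of the form T^* w
y = K @ x_dagger              # exact data

rng = np.random.default_rng(0)
results = []
for delta in (1e-2, 1e-4, 1e-6):
    noise = rng.standard_normal(n)
    y_delta = y + delta * noise / np.linalg.norm(noise)   # ||y_delta - y|| = delta
    alpha = delta                                         # a priori rule alpha(delta) = delta
    err = np.linalg.norm(R(alpha, y_delta) - x_dagger)
    bound = np.sqrt(delta) * (1 + np.linalg.norm(w))
    results.append((delta, err, bound))
    print(f"delta = {delta:.0e}   error = {err:.3e}   bound = {bound:.3e}")
```

The bound splits into a propagated data error of at most $\delta/\sqrt{\alpha}$ and an approximation error of at most $\sqrt{\alpha}\,\|w\|$; choosing $\alpha$ proportional to $\delta$ balances the two.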

1.2.4 Continuous Regularization Methods

The concepts of the previous section allow to construct and analyze regularization meth-

ods for linear problems in a very general way by spectral theory. We shortly motivate

a construction of regularization methods in the compact case:


In order to enforce stability in the explicit solution formula (1.5), one has to replace

the unbounded term 1/σ2n by an appropriate filtered (bounded) approximation gα(σ2

n),

where the filter functions gα(λ) is assumed to converge pointwise to 1/λ for λ > 0 as

α → 0. (The term filter function is used here in a slightly different form than in [60],

where it denotes λgα(λ)). In the compact case, a regularized solution is then defined by

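\[
x_\alpha := \sum_{n=1}^{\infty} g_\alpha(\sigma_n^2) \langle T^* y, u_n \rangle u_n
\quad\text{and}\quad
x_\alpha^\delta := \sum_{n=1}^{\infty} g_\alpha(\sigma_n^2) \langle T^* y^\delta, u_n \rangle u_n \tag{1.7}
\]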

for exact data y and perturbed data yδ, respectively. The following theorem, which

summarizes the main statements of Theorems 4.1-4.3 in [28], clarifies under which

conditions the filter functions gα(λ) in fact define a regularization operator Rα in the

sense of Definition 1.1.
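For a matrix T, the filtered expansion (1.7) can be evaluated directly from a singular value decomposition. The following is a minimal sketch under that finite-dimensional assumption; the function names `filtered_solution` and `tikhonov_filter` and the test matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def filtered_solution(T, y, g):
    """Evaluate sum_n g(sigma_n^2) <T^* y, u_n> u_n, cf. (1.7), for a matrix T."""
    # Rows of Vt are the right singular vectors u_n; s holds the singular values sigma_n.
    W, s, Vt = np.linalg.svd(T, full_matrices=False)
    coeffs = s * (W.T @ y)              # <T^* y, u_n> = sigma_n <y, w_n>
    return Vt.T @ (g(s**2) * coeffs)

def tikhonov_filter(alpha):
    """Example filter g_alpha(lambda) = 1 / (lambda + alpha)."""
    return lambda lam: 1.0 / (lam + alpha)

# Illustrative data: a diagonal operator with rapidly decaying singular values.
T = np.diag([1.0, 0.1, 0.001])
x_true = np.array([1.0, 1.0, 1.0])
y = T @ x_true
x_alpha = filtered_solution(T, y, tikhonov_filter(1e-4))
```

Components belonging to singular values with $\sigma_n^2 \gg \alpha$ are reproduced almost exactly, while those with $\sigma_n^2 \ll \alpha$ are damped; this is the stabilizing effect of the filter.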

Theorem 1.2 Let, for all α > 0, $g_\alpha : [0, \|T\|^2] \to \mathbb{R}$ satisfy the following assumptions: gα is piecewise continuous and there is a C > 0 such that

\[
|\lambda g_\alpha(\lambda)| \le C, \qquad \lim_{\alpha \to 0} g_\alpha(\lambda) = 1/\lambda,
\]

for all $\lambda \in (0, \|T\|^2]$. Then, for all y ∈ D(T †),

\[
\lim_{\alpha \to 0} g_\alpha(T^*T) T^* y = x^\dagger \tag{1.8}
\]

holds with x† = T †y. Moreover, if y ∉ D(T †), then $\lim_{\alpha \to 0} \|g_\alpha(T^*T) T^* y\| = +\infty$.

Let xα, xδα be defined as in (1.7), and for α > 0, let

\[
G_\alpha := \sup\bigl\{ |g_\alpha(\lambda)| : \lambda \in [0, \|T\|^2] \bigr\},
\]

then

\[
\|T x_\alpha - T x_\alpha^\delta\| \le C\delta \quad\text{and}\quad \|x_\alpha - x_\alpha^\delta\| \le \delta \sqrt{C G_\alpha}. \tag{1.9}
\]

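Finally, for µ > 0 and rα(λ) := 1 − λgα(λ), let ωµ : (0, α0) → R+ be such that for all α ∈ (0, α0) and $\lambda \in [0, \|T\|^2]$,

\[
\lambda^\mu |r_\alpha(\lambda)| \le \omega_\mu(\alpha)
\]

holds. Then, for $x^\dagger \in R((T^*T)^\mu)$,

\[
\|x_\alpha - x^\dagger\| = O(\omega_\mu(\alpha)) \quad\text{and}\quad \|T x_\alpha - T x^\dagger\| = O(\omega_{\mu+1/2}(\alpha)). \tag{1.10}
\]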

We briefly discuss the assumptions and conclusions of this theorem: the first assertion (1.8) states pointwise convergence for exact data y = Tx† but without any rates. It

is a typical feature of ill-posed problems that even for exact data the convergence may

be arbitrarily slow in general (cf. [70]). The result (1.9) is a quantitative estimate of


the propagated data error ‖xα−xδα‖ and essentially reflects the stability of the approx-

imations for α > 0 with respect to perturbations in the data. (1.10) finally estimates

the approximation error ‖x† − xα‖ in terms of the modulus of convergence ωµ charac-

terizing the approximation properties of a regularization method. Typically, ωµ can be

expressed in terms of fractional powers of α, e.g.,

\[
\lambda^\mu |r_\alpha(\lambda)| \le c_\mu \alpha^\mu, \qquad 0 \le \mu \le \mu_0, \tag{1.11}
\]

holds for many regularization methods for some µ0 > 0. The maximal µ0 for which

(1.11) holds is usually called qualification of the method, see also the detailed examples

of regularization methods in the next chapter.

Figure 1.1 shows the typical behavior of the error ‖x† − xδα‖ in dependence of α:

the approximation error ‖xα − x†‖ tends to zero as α → 0 with a rate that depends

on the approximation quality ωµ of the regularization method and the smoothness µ

of the solution (see below). The propagated data error ‖xδα − xα‖ , on the other hand,

increases with α → 0. In order to balance between the two error contributions, the

regularization parameter α has to be chosen appropriately.

Figure 1.1: Approximation error ‖xα − x†‖ and propagated data error ‖xδα − xα‖ vs. regularization parameter α (logarithmic axis from 10^{-5} to 10^0) for δ fixed; the plot shows the approximation error, the propagated data error, and the total error.

The condition

x† ∈ R((T ∗T )µ) (1.12)

is called a source condition and measures the smoothness of a solution x† with respect

to the operator T . As we will see in the examples in Chapter 5, the abstract condition

(1.12) can in some cases be interpreted as a smoothness (differentiability) condition on

x†. For general x† ∈ R((T ∗T )µ) with fixed µ and general yδ ∈ Y with (1.2), the rate

\[
\|x^\dagger - x_\alpha^\delta\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr), \tag{1.13}
\]


is the best possible in terms of powers of δ, and a regularization method yielding (1.13)

is therefore called order optimal. For linear problems, reverse conclusions have been

shown in so-called converse statements (cf., e.g., [28, 67]): if $\|x^\dagger - x_\alpha^\delta\| = O(\delta^{\frac{2\mu}{2\mu+1}})$, then it follows that $x^\dagger \in \bigcup_{\nu < \mu} R((T^*T)^\nu)$.

Parameter Choice Strategies

Note that Theorem 1.2 only states results about the regularization operators Rα. In

view of Definition 1.1, we still have to combine Rα with an appropriate parameter choice

rule α(δ, yδ), in order to obtain a regularization method. The role of such a parameter

choice strategy is to balance the two different error contributions illustrated in Fig-

ure 1.1. It is meaningful to distinguish between the following two types of strategies: a

parameter choice rule in the sense of Definition 1.1 is called

(a) an a-priori rule, if α does not depend on yδ, and thus one can write α = α(δ).

(b) an a-posteriori parameter choice strategy, if α = α(δ, yδ) depends on yδ.

Besides these two classes of parameter choice strategies, so-called error free parameter

choice rules, i.e., rules which do not incorporate the noise level, are frequently used

in practice. However, as a result due to Bakushinskii [5] shows, no rule α = α(yδ)

depending only on yδ can be part of a converging regularization method in the sense of

Definition 1.1 in general.

The simplest case of an a-priori parameter choice rule is

\[
\alpha = c\,\delta^s,
\]

for some c, s > 0. As we will see below, such a choice yields order optimal convergence

rates if a-priori information on the smoothness of the solution, i.e., the precise value of

µ in the source condition (1.12), is appropriately incorporated in the choice of s.
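The appropriate exponent s can be motivated by balancing the two error bounds of Theorem 1.2. The following sketch assumes that (1.11) holds and, additionally, that $G_\alpha \le c/\alpha$ (as is the case, for instance, for Tikhonov regularization, where $G_\alpha = 1/\alpha$); the constants $c_1, c_2$ are generic.

\[
\|x_\alpha^\delta - x^\dagger\| \le \|x_\alpha - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\| \le c_1 \alpha^{\mu} + c_2 \frac{\delta}{\sqrt{\alpha}} .
\]

Both terms are of the same order if $\alpha^{\mu + 1/2} \sim \delta$, i.e., for $s = \frac{2}{2\mu+1}$, and then

\[
\|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr),
\]

which is the order optimal rate (1.13).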

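Probably the most widely used a-posteriori parameter choice rule is the discrepancy principle (cf. [28, 61])

\[
\alpha(\delta, y^\delta) := \sup\bigl\{ \alpha > 0 : \|T x_\alpha^\delta - y^\delta\| \le \tau\delta \bigr\}, \tag{1.14}
\]

where $\tau > \sup\{ |r_\alpha(\lambda)| : \alpha > 0,\ \lambda \in [0, \|T\|^2] \}$. It can be shown that the supremum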

in (1.14) is actually attained, if α → gα(λ) is continuous from the left for λ > 0. The

discrepancy principle yields order-optimal convergence rates without a-priori knowledge

on the smoothness of x† but only for µ ≤ µ0 − 1/2. The following theorem summarizes

the main convergence properties for regularization methods Rα satisfying the conditions

of Theorem 1.2 with the afore-mentioned a-priori and a-posteriori stopping rules (cf.

Corollary 4.4 and Theorem 4.12 in [28]).


Theorem 1.3 Let gα, rα satisfy the conditions of Theorem 1.2, and let (1.11) hold for some µ0 > 0. If $\alpha \sim \delta^{\frac{2}{2\mu+1}}$, then

\[
\|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr) \tag{1.15}
\]

for x† ∈ R((T ∗T )µ) with 0 < µ ≤ µ0. If alternatively, α = α(δ, yδ) is defined by (1.14)

and gα is continuous from the left, then (1.15) holds for 0 < µ ≤ µ0 − 1/2.
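In practice the supremum in (1.14) is often approximated on a grid. The sketch below does this for the Tikhonov filter $g_\alpha(\lambda) = 1/(\lambda + \alpha)$ on a small matrix problem: it scans the geometric sequence $\alpha_j = \alpha_0 q^j$ and returns the first value satisfying the discrepancy inequality. The grid, the helper names, and the test data are assumptions made for illustration; they are not part of the results cited above.

```python
import numpy as np

def tikhonov(T, y_delta, alpha):
    """Regularized solution (T^T T + alpha I)^{-1} T^T y_delta for a matrix T."""
    n = T.shape[1]
    return np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ y_delta)

def discrepancy_alpha(T, y_delta, delta, tau=2.0, alpha0=1.0, q=0.5, max_steps=60):
    """Largest alpha on the grid alpha0 * q**j with ||T x_alpha - y_delta|| <= tau * delta."""
    alpha = alpha0
    for _ in range(max_steps):
        x = tikhonov(T, y_delta, alpha)
        if np.linalg.norm(T @ x - y_delta) <= tau * delta:
            return alpha, x
        alpha *= q
    raise RuntimeError("no grid value satisfies the discrepancy principle")

# Illustrative data with a perturbation of norm exactly delta.
T = np.diag([1.0, 0.1, 0.01])
delta = 1e-2
noise = delta * np.array([1.0, -1.0, 1.0]) / np.sqrt(3.0)
y_delta = T @ np.ones(3) + noise
alpha, x = discrepancy_alpha(T, y_delta, delta)
```

Since $|r_\alpha(\lambda)| \le 1$ for the Tikhonov filter, any τ > 1 is admissible; τ = 2 is used here only as an example.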

The notion continuous regularization methods in the heading of this section refers

to the fact that the regularization parameter α is chosen from a continuum. With

slight modifications, the results can be generalized to cover also iterative regularization

methods: there, the filter functions are defined by polynomials gk(λ) of degree k, and

the stopping index k plays the role of the regularization parameter. In fact, most of the

previously cited results can be applied directly when substituting 1/k for α.

1.2.5 Remarks on nonlinear problems

Below we present some convergence results concerning regularization of nonlinear in-

verse problems, which we again investigate in the framework of abstract operator equa-

tions,

F (x) = y, (1.16)

where the (nonlinear) operator F : D(F ) ⊂ X → Y acts between Hilbert spaces X and

Y . Note that in many relevant nonlinear inverse problems, the operator F is only defined

indirectly, e.g., in parameter identification, F might be defined as the parameter-to-

output mapping that maps a parameter to the solution of a partial differential equation.

Thus, the evaluation of F is not straightforward (and usually computationally expensive), and deriving mapping properties of F may require some analytical reasoning. For

illustration, consider the following model problem:

Example 1.4 (Reconstruction of a reaction term) Consider heat conduction in

a three dimensional body Ω with spatially varying reaction term c. The evolution of

the temperature u is then governed by

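\[
\begin{aligned}
u_t - \kappa \Delta u + c u &= f, && \text{in } \Omega \times (0, T), \\
u &= g, && \text{on } \partial\Omega \times (0, T), \\
u(\cdot, 0) &= u_0, && \text{on } \Omega,
\end{aligned}
\tag{1.17}
\]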

where f denotes interior sources and g, u0 are the prescribed temperatures at the bound-

ary and at time zero, respectively. In the stationary case, the temperature distribution

u = u(x) will approximately satisfy

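\[
\begin{aligned}
-\kappa \Delta u + c u &= f, && \text{in } \Omega, \\
u &= g, && \text{on } \partial\Omega.
\end{aligned}
\tag{1.18}
\]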


Assuming that interior measurements of the temperature are available, the parameter-to-output mapping F may be defined by F : c ↦ u(c), where u(c) denotes a solution of (1.18) with parameter c. For detailed examples of parameter identification in (1.17) and (1.18) from interior and boundary measurements, respectively, we refer to Section 5.
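To make the indirect definition of F concrete, the following sketch evaluates a parameter-to-output map for a one-dimensional analogue of (1.18) on the interval (0, 1), discretized by central finite differences. The restriction to one space dimension, the uniform grid, and the name `forward_map` are assumptions for illustration only.

```python
import numpy as np

def forward_map(c, f, kappa=1.0, g_left=0.0, g_right=0.0):
    """Solve -kappa u'' + c u = f on (0, 1) with u(0) = g_left, u(1) = g_right.

    c and f hold the values at the n interior nodes of a uniform grid.
    Each evaluation of the map c -> u(c) requires one linear solve.
    """
    n = c.size
    h = 1.0 / (n + 1)
    lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    A = kappa / h**2 * lap + np.diag(c)
    b = np.array(f, dtype=float)
    b[0] += kappa * g_left / h**2       # boundary values enter the right-hand side
    b[-1] += kappa * g_right / h**2
    return np.linalg.solve(A, b)

n = 9
nodes = np.linspace(0.0, 1.0, n + 2)[1:-1]
f = 2.0 * np.ones(n)
u_zero = forward_map(np.zeros(n), f)    # c = 0: exact solution u(x) = x (1 - x)
u_one = forward_map(np.ones(n), f)      # c = 1: a different (smaller) temperature
```

Although each solve is linear in u, the dependence of u on c is nonlinear, since c enters the system matrix.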

Like in the linear case, we mean by ill-posedness of (1.16) that a solution of F (x) = y

does not depend continuously on the data y. The ill-posedness of the problem F (x) = y

may further be quantitatively characterized via its linearization, although this char-

acterization is not always appropriate: it is shown in [29] that a nonlinear ill-posed

problem may have a well-posed linearization and that well-posed nonlinear problems

may have ill-posed linearizations.

Since linear problems may always be seen as a special case of nonlinear problems,

it is clear that in general, stability has to be enforced by some regularization method,

i.e., appropriate algorithms together with suitable parameter choice strategies have to

be used to obtain reasonable approximations in case of perturbed data. Similarly as

in the linear case, one can distinguish between two main error contributions due to

approximation and noise propagation (cf. Figure 1.1), and the task of parameter choice

strategies is again to balance between accuracy (approximation) and stability (noise

amplification).

While regularization of linear inverse problems can be almost completely analyzed

by spectral theory, the situation is more involved for nonlinear problems: first of all

spectral theory is available only for linear problems and thus can be applied at most to

certain linearizations. Therefore, a comprehensive convergence analysis of regularization

for nonlinear problems requires different functional analytic tools, and no general theory,

like Theorem 1.2 for linear problems, is available in the nonlinear case.

We only mention that by Tikhonov’s Lemma the inverse of a continuous bijective

operator F is continuous if D(F ) is compact. Thus, in principle, (1.16) can always be

regularized by restricting F to a compact domain; such reasoning however does not

yield any quantitative stability estimates (convergence rates); in some cases, stability

estimates may be obtained a-priori by a detailed analysis of the problem under inves-

tigation, e.g., via Carleman estimates (cf., e.g., [25, 46, 52]).

Basic assumptions on the operator F for a reasonable convergence theory (cf. [28])

are :

(i) F is continuous and

(ii) F is weakly (sequentially) closed, i.e., for any sequence {xn} ⊂ D(F ), weak convergence of xn to x in X and weak convergence of F (xn) to some y in Y imply that x ∈ D(F ) and F (x) = y.

Note that in the linear case, (ii) already follows from (i). In general, neither existence nor

uniqueness of a solution to (1.16) can be guaranteed, and therefore the concept of x∗-

minimum-norm solutions is used like in the linear case (cf. Section 1.2.1). For nonlinear

problems however, the role of x∗ cannot be neglected. As we will see below, the above


(rather weak) conditions already allow one to prove convergence for certain regularization

methods, e.g., Tikhonov regularization.

Chapter 2

Important classes of regularization

methods

In this chapter we recall some of the most widely used regularization methods for

linear and nonlinear problems, and present the most important convergence results. In

the linear case, the results follow more or less directly from Theorem 1.2 and

Theorem 1.3 above. The analysis of nonlinear problems is more involved and requires

different reasoning.

As outlined in the previous chapter, the essential step in the construction of a

regularization method for linear problems is to approximate the unbounded function

1/λ in (1.5) by a filtered approximation gα(λ). In order to apply Theorem 1.3, it then

remains to show (1.11). We start with probably the best-known regularization method, namely Tikhonov regularization, and then turn to a discussion of frequently

used iterative methods.

2.1 Tikhonov Regularization

For linear problems, Tikhonov regularization is defined by the filter function

\[
g_\alpha(\lambda) := \frac{1}{\lambda + \alpha},
\]

and one easily verifies that gα satisfies the assumption of Theorem 1.2 with C = 1,

Gα = 1/α, and that (1.11) holds for µ ≤ 1. Hence, Tikhonov regularization has finite

qualification µ0 = 1, and the best possible convergence rate that can be guaranteed by

Theorem 1.3 for sufficiently smooth x† is $\|x_\alpha^\delta - x^\dagger\| = O(\delta^{2/3})$ if α is chosen according to the a-priori rule $\alpha \sim \delta^{\frac{2}{2\mu+1}}$, and $O(\delta^{1/2})$ if α is chosen according to the discrepancy principle (1.14). The saturation at µ = 1 (respectively µ = 1/2) can partly

be overcome by considering Tikhonov regularization in Hilbert scales, see Chapter 3

below.
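The stated constants can be checked in a few lines; the following computation is a sketch for the filter $g_\alpha(\lambda) = 1/(\lambda + \alpha)$ with $\lambda \ge 0$ and $\alpha > 0$.

\[
\lambda g_\alpha(\lambda) = \frac{\lambda}{\lambda + \alpha} \le 1, \qquad
\sup_{\lambda \ge 0} g_\alpha(\lambda) = g_\alpha(0) = \frac{1}{\alpha}, \qquad
r_\alpha(\lambda) = 1 - \lambda g_\alpha(\lambda) = \frac{\alpha}{\lambda + \alpha}.
\]

For $0 \le \mu \le 1$,

\[
\lambda^\mu r_\alpha(\lambda) = \alpha^\mu \Bigl(\frac{\lambda}{\lambda + \alpha}\Bigr)^{\mu} \Bigl(\frac{\alpha}{\lambda + \alpha}\Bigr)^{1-\mu} \le \alpha^\mu ,
\]

so (1.11) holds with $c_\mu = 1$. For $\mu > 1$ and fixed $\lambda > 0$, however, $\lambda^\mu r_\alpha(\lambda) \sim \lambda^{\mu - 1} \alpha$ as $\alpha \to 0$, which cannot be bounded by a multiple of $\alpha^\mu$; this is the saturation at $\mu_0 = 1$.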

For Tikhonov regularization the regularized solution has the form

\[
x_\alpha^\delta = (T^*T + \alpha I)^{-1} T^* y^\delta. \tag{2.1}
\]

Alternatively, xδα can be characterized as the unique minimizer of the (Tikhonov) func-

tional

\[
f_\alpha^\delta(x) := \|Tx - y^\delta\|^2 + \alpha \|x\|^2.
\]

This variational characterization is particularly important, since it allows one to formulate the Tikhonov method also for nonlinear problems F (x) = y; there, xδα is defined as a solution of the nonlinear minimization problem

\[
f_\alpha^\delta(x) := \|F(x) - y^\delta\|^2 + \alpha \|x - x^*\|^2 \to \min, \qquad x \in D(F), \tag{2.2}
\]

where α > 0 denotes the regularization parameter, yδ ∈ Y is an approximation of the

right-hand side y satisfying (1.2), and x∗ ∈ X is an appropriate a-priori guess. Under

the assumptions (i), (ii) on F , the existence of a (not necessarily unique) minimizer xδα follows from the weak lower semicontinuity of the functional f δα. One can further show

that for fixed α > 0 the minimizers xδα depend (in a set-valued sense) continuously on

the data yδ and that xδα converge (in a set-valued sense) towards an x∗-minimum-norm

solution of (1.16) if α(δ)→ 0 and δ2/α(δ)→ 0 as δ tends to zero (cf. [28] for details).

One of the fundamental convergence rate results for Tikhonov regularization for

nonlinear problems reads as follows, cf. [28, Theorem 10.4]:

Theorem 2.1 Let D(F ) be convex, x† ∈ D(F ) be an x∗-minimum-norm solution of

(1.16), and yδ ∈ Y such that (1.2) holds. Furthermore, let F be Fréchet-differentiable with

\[
\|F'(x) - F'(x^\dagger)\| \le \gamma \|x - x^\dagger\| \quad \text{for all } x \in D(F).
\]

If x† satisfies the source condition

\[
x^\dagger - x^* = F'(x^\dagger)^* w \tag{2.3}
\]

for some w ∈ Y with γ‖w‖ < 1, then for α ∼ δ the rates

\[
\|x_\alpha^\delta - x^\dagger\| = O(\sqrt{\delta}) \quad\text{and}\quad \|F(x_\alpha^\delta) - y^\delta\| = O(\delta) \tag{2.4}
\]

hold.

The source condition (2.3) is an abstract smoothness condition corresponding to

(1.12) in the linear case. In fact, (2.3) is equivalent to $x^\dagger - x^* \in R((F'(x^\dagger)^* F'(x^\dagger))^{1/2})$, and the rates (2.4) as well as the parameter choice α ∼ δ coincide with the linear case for µ = 1/2. If x† is in the interior of D(F ), then the optimal convergence rates

(1.13) even hold for µ ∈ [1/2, 1] (cf. [28, Theorem 10.7]). For (optimal) a-posteriori

stopping rules for Tikhonov regularization we refer to [28, Section 10.3].

While Tikhonov regularization can be analyzed under rather weak conditions, the

numerical implementation raises some questions and difficulties:

In contrast to the linear case, where the regularized solution xδα can be found by

(2.1), the minimization problem (2.2), which characterizes the regularized solution in the

nonlinear case, usually has to be solved in an iterative manner. Due to the nonlinearity

of F the Tikhonov functional f δα is in general non-convex and might have several local

(or even global) minima. Thus, additional conditions (on the nonlinearity of F and on

x∗) may have to be imposed in order to ensure convergence of an iterative algorithm to

a minimizer of the Tikhonov functional. The situation is analyzed in detail for a class

of nonlinear inverse problems called weakly nonlinear in [16], for which the Tikhonov

functional f δα admits a unique global (and no other local) minimum.

A second disadvantage of nonlinear Tikhonov regularization is that determining

α by an a-posteriori parameter choice rule typically requires the solution of several

optimization problems and thus is computationally expensive.

Therefore, a more direct approach to the regularization of linear and nonlinear

problems is to consider iterative methods as regularization methods themselves.

2.2 Iterative Regularization Methods

We start with a short discussion of iterative regularization methods for linear problems

and then turn to nonlinear problems:

For the iterative solution of linear inverse problems Tx = y, we consider the class

of so-called semiiterative methods (cf., e.g., [28, 36]): a basic step of such a method consists in updating in direction of the residual of the normal equations, $T^*(y^\delta - T x_{k-1}^\delta)$, followed by an averaging over all or some previous iterates. A semiiterative method can be defined recursively in the following way: for given x0 set xδ0 = x0 and let

\[
x_k^\delta = \mu_{1,k} x_{k-1}^\delta + \dots + \mu_{k,k} x_0 + \omega_k T^*(y^\delta - T x_{k-1}^\delta), \quad k \ge 1, \qquad \sum_{i=1}^{k} \mu_{i,k} = 1, \quad \omega_k \ne 0. \tag{2.5}
\]

Algorithms of this form fall into the class of Krylov-subspace methods, i.e., the k-th

iterate xδk − x0 lies in the k-th Krylov subspace Kk(T ∗T, T ∗yδ), where for some self-

adjoint operator A

\[
K_k(A, r) := \operatorname{span}\{ r, Ar, \dots, A^{k-1} r \}, \quad k \ge 1.
\]

Consequently, xδk can be written as

\[
x_k^\delta = x_0 + g_k(T^*T) T^* y^\delta, \tag{2.6}
\]

where gk is a polynomial of degree k− 1. It turns out that for an appropriate choice of

the coefficients µi,j and ωj the filter functions gk(λ) have the usual properties required

in Theorem 1.2.

Stability of iterative regularization methods is ensured by stopping the iteration

at the right time, i.e., the stopping index k∗ plays the role of the regularization parameter. The index k∗ can be determined a-priori or a-posteriori, e.g., by the discrepancy principle (k∗ = k(δ, yδ))

\[
\|y^\delta - T x_{k_*}^\delta\| \le \tau\delta < \|y^\delta - T x_k^\delta\|, \quad 0 \le k < k_*. \tag{2.7}
\]


Note that in contrast to Tikhonov regularization, the discrepancy principle requires no

additional computational effort, and may thus be considered as the natural stopping

criterion for iterative regularization methods.

The following general convergence result for linear semiiterative regularization meth-

ods can be concluded with slight modifications from Theorem 1.2 (cf. [28, Theorem

6.11]):

Theorem 2.2 Let y ∈ R(T ), and let the residual polynomials $r_k(\lambda) = 1 - \lambda g_k(\lambda)$ satisfy

\[
\omega_\mu(k) \le c_\mu k^{-\sigma\mu}, \qquad \text{for } 0 \le \mu \le \mu_0, \tag{2.8}
\]

for some µ0 > 0 and σ ∈ {1, 2}. Then the semiiterative method (2.5) with residual polynomials $\{r_k\}_{k \in \mathbb{N}}$ is a regularization method of optimal order for $T^\dagger y \in R((T^*T)^\mu)$ with 0 < µ ≤ µ0 − 1/2, provided the iteration is stopped with k∗ = k∗(δ, yδ) according to the discrepancy principle (1.14) with fixed $\tau > \sup_{k \in \mathbb{N}} \|r_k\|_{C[0,1]}$. In this case we have $k_* = O(\delta^{-\frac{2}{\sigma(2\mu+1)}})$ and $\|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{2\mu}{2\mu+1}})$. The same rate holds for 0 < µ ≤ µ0 if the iteration is stopped according to the a-priori rule $k_* \sim \delta^{-\frac{2}{\sigma(2\mu+1)}}$.

We only mention that even o(·) can be derived for ‖xδk−x†‖ , see [28] for details and

proofs.

As a first instance of a semiiterative regularization method of the form (2.5) we obtain Landweber iteration by choosing µ1,k = 1, µi,k = 0 for i ≥ 2, and ωk = 1.

2.2.1 Landweber Iteration

The recursive form of Landweber iteration for linear problems reads

\[
x_{k+1}^\delta = x_k^\delta + T^*(y^\delta - T x_k^\delta). \tag{2.9}
\]

Consequently, the iterates have the closed form representation (2.6) with

\[
g_k(\lambda) = \sum_{j=0}^{k-1} (1 - \lambda)^j \quad\text{and}\quad r_k(\lambda) = (1 - \lambda)^k.
\]

One easily verifies that rk(λ) satisfies the conditions of Theorem 2.2 with σ = 1 and

µ0 =∞. Thus, in contrast to Tikhonov regularization, Landweber iteration exhibits no

saturation.
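A minimal sketch of the iteration (2.9), stopped by the discrepancy principle (2.7), may look as follows for a matrix T. It assumes that the problem is scaled such that ‖T‖ ≤ 1 and uses x0 = 0; the function name `landweber` and the test data are hypothetical.

```python
import numpy as np

def landweber(T, y_delta, delta, tau=2.0, max_iter=10000):
    """Landweber iteration x_{k+1} = x_k + T^T (y_delta - T x_k), assuming ||T|| <= 1.

    Returns the iterate and the stopping index k_* of the discrepancy principle.
    """
    x = np.zeros(T.shape[1])
    for k in range(max_iter):
        residual = y_delta - T @ x
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x + T.T @ residual
    return x, max_iter

# Illustrative data with a perturbation of norm exactly delta.
T = np.diag([1.0, 0.5, 0.1])
delta = 1e-2
noise = delta * np.array([1.0, -1.0, 1.0]) / np.sqrt(3.0)
y_delta = T @ np.ones(3) + noise
x_stop, k_stop = landweber(T, y_delta, delta)
```

The component belonging to the smallest singular value is reduced only by the factor $1 - \sigma^2 = 0.99$ per step, which illustrates why the stopping index grows quickly as δ decreases.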

An alternative interpretation of Landweber iteration is as a gradient method for minimizing the least-squares functional ‖Tx − yδ‖². This actually allows one to formulate a corresponding nonlinear version (see below).

Note that for Landweber iteration, the number of iterations needed to obtain optimal convergence is $k_* = O(\delta^{-\frac{2}{2\mu+1}})$. For linear problems, the same convergence rates can be achieved with far fewer iterations when using faster semiiterative methods. Of

particular practical importance are iterations, whose residual polynomials rk form an


orthogonal sequence with respect to some positive weight function. In this case, the

residual polynomials satisfy a three-term-recurrence (see, e.g., [28]), which also carries

over to the iterates, i.e.,

\[
x_k^\delta = x_{k-1}^\delta + \mu_k (x_{k-1}^\delta - x_{k-2}^\delta) + \omega_k T^*(y^\delta - T x_{k-1}^\delta), \quad k \ge 1, \tag{2.10}
\]

with xδ−1 = xδ0 = x0. A specific choice of such orthogonal polynomials yields the ν-methods by Brakhage [15].

2.2.2 The ν-Methods

For xδ0 = xδ−1 = x0, let the iterates xδk be defined by (2.10) with $\mu_1 = 0$, $\omega_1 = \frac{4\nu+2}{4\nu+1}$, and for k > 1

\[
\mu_k = \frac{(k-1)(2k-3)(2k+2\nu-1)}{(k+2\nu-1)(2k+4\nu-1)(2k+2\nu-3)}, \qquad
\omega_k = 4\,\frac{(2k+2\nu-1)(k+\nu-1)}{(k+2\nu-1)(2k+4\nu-1)}.
\]

The corresponding polynomials rk(λ) satisfy the conditions of Theorem 2.2 with σ = 2

and µ0 = ν, yielding optimal rates of convergence with the stopping indices bounded

by

\[
k_* = O\bigl(\delta^{-\frac{1}{2\mu+1}}\bigr), \tag{2.11}
\]

which is only the square root of the number of iterations needed by Landweber iteration. In fact,

O(k−2µ) is the best possible estimate in terms of powers of k in (2.8) for semiiterative

methods of the form (2.5), and the bound (2.11) on the number of iterations cannot

be further reduced, in general. Hence, the ν-methods are said to have optimal speed of

convergence (cf., [36]).
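The three-term recursion (2.10) is as cheap per step as Landweber iteration. The sketch below implements it for a matrix T with ‖T‖ ≤ 1 and x0 = 0, using the coefficients $\mu_k$ and $\omega_k$ with denominator $(k+2\nu-1)(2k+4\nu-1)$ in $\omega_k$, which reproduces $\omega_1 = (4\nu+2)/(4\nu+1)$ for k = 1. The function name `nu_method` and the fixed number of iterations are assumptions for illustration; in practice the iteration would be stopped by (2.7).

```python
import numpy as np

def nu_method(T, y_delta, nu, num_iter):
    """Run num_iter steps of the nu-method recursion (2.10), starting from x_0 = 0."""
    x_prev = np.zeros(T.shape[1])       # plays the role of x_{k-2}
    x = np.zeros(T.shape[1])            # plays the role of x_{k-1}
    for k in range(1, num_iter + 1):
        if k == 1:
            mu = 0.0
            omega = (4.0 * nu + 2.0) / (4.0 * nu + 1.0)
        else:
            mu = ((k - 1) * (2 * k - 3) * (2 * k + 2 * nu - 1)) / (
                (k + 2 * nu - 1) * (2 * k + 4 * nu - 1) * (2 * k + 2 * nu - 3))
            omega = 4.0 * ((2 * k + 2 * nu - 1) * (k + nu - 1)) / (
                (k + 2 * nu - 1) * (2 * k + 4 * nu - 1))
        x_new = x + mu * (x - x_prev) + omega * (T.T @ (y_delta - T @ x))
        x_prev, x = x, x_new
    return x

# Scalar sanity check: T = 1, y = 1, nu = 1.
T = np.array([[1.0]])
y = np.array([1.0])
x_one_step = nu_method(T, y, nu=1.0, num_iter=1)
x_two_steps = nu_method(T, y, nu=1.0, num_iter=2)
```

In the scalar check the iterates oscillate around the solution x = 1, which reflects that the residual polynomials decay only algebraically at fixed λ > 0.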

Taking into account special properties of the operator T and letting the coefficients

in (2.5) depend on the data y, a further reduction of the stopping index is possible,

e.g., for the method of conjugate gradients, which is probably the most powerful iterative method for the solution of well-posed, symmetric, positive semidefinite problems.

2.2.3 The Method of Conjugate Gradients Applied to the Nor-

mal Equations (cgne)

In principle, cgne (for an algorithm, see [28, p. 177]) falls into the class of semiiterative

regularization methods. In contrast to the methods discussed previously, the iteration

polynomials gk(yδ;λ) of cgne depend on the data yδ, which makes cgne a nonlinear

method. As a matter of fact, no a-priori stopping rule k∗ = k(δ) renders cgne a regular-

ization method, cf. [27], but the iteration can be made an order optimal regularization

method by stopping according to the discrepancy principle (2.7), cf. [28, Theorem 7.12]:


Theorem 2.3 Let y ∈ R(T ), and let cgne be stopped according to the discrepancy principle (2.7) with k∗ = k(δ, yδ). Then cgne is an order optimal regularization method for all µ > 0, i.e., if $x^\dagger \in R((T^*T)^\mu)$, then

\[
\|x^\dagger - x_{k_*}^\delta\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr).
\]
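For orientation, a standard textbook form of the conjugate gradient method applied to the normal equations, combined with the stopping rule (2.7), is sketched below for a matrix T and x0 = 0. It is not claimed to coincide line by line with the algorithm in [28, p. 177]; the function name `cgne` and the test data are hypothetical.

```python
import numpy as np

def cgne(T, y_delta, delta, tau=1.5, max_iter=1000):
    """CG on T^T T x = T^T y_delta, stopped as soon as ||y_delta - T x|| <= tau * delta."""
    x = np.zeros(T.shape[1])
    r = y_delta - T @ x                 # residual in the data space
    d = T.T @ r                         # residual of the normal equations
    p = d.copy()                        # search direction
    k = 0
    while np.linalg.norm(r) > tau * delta and k < max_iter:
        dd = d @ d
        if dd == 0.0:                   # normal equations solved; no further progress
            break
        q = T @ p
        step = dd / (q @ q)
        x = x + step * p
        r = r - step * q
        d = T.T @ r
        p = d + ((d @ d) / dd) * p
        k += 1
    return x, k

# Well-posed toy problem with three distinct singular values and (almost) exact data.
T = np.diag([1.0, 0.5, 0.25])
x_true = np.array([1.0, 2.0, 3.0])
x_cg, k_cg = cgne(T, T @ x_true, delta=1e-8)
```

The step lengths depend on the data through the inner products, which is the source of the nonlinearity of cgne mentioned above.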

As for well-posed problems, cgne reduces the residual at least as fast as any other

semiiterative method of the form (2.5). For certain classes of operators T , the bound

on the stopping index defined by the discrepancy principle can be further reduced, cf.

[28, Theorem 7.14]:

Theorem 2.4 Let T be a compact operator, $T^\dagger y = (T^*T)^\mu w$ with ‖w‖ ≤ ρ and some µ, ρ > 0. If the singular values σn of T decay like $O(n^{-\alpha})$ for some α > 0, then

\[
k(\delta, y^\delta) = O\bigl(\delta^{-\frac{1}{(2\mu+1)(\alpha+1)}}\bigr).
\]

If the singular values decay like $O(q^n)$ with some q < 1, then

\[
k(\delta, y^\delta) = O(1 + |\log \delta|).
\]

Remark 2.5 For finite dimensional problems, the residuals of cgne can be shown to

be reduced by a factor

\[
q = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}
\]

in each iteration step, where κ = cond(T ∗T ). Hence, the number kε of iterations needed

to reduce the residual by a factor ε is approximately

\[
k_\varepsilon \sim \ln\frac{\sqrt{\kappa} + 1}{2} \cdot |\ln \varepsilon|,
\]

i.e., the number of iterations increases with the condition number κ. Note that this

is no contradiction to the previous theorem, which states decreasing iteration numbers

with increasing ill-posedness. In fact, one may argue that only singular values σn ≥ δ

play a role when stopping according to the discrepancy principle; and it is well-known

that cgne converges much faster if T has only few different (relevant) singular values.

Actually, this is also the reason for instability of cgne when stopping according to an

a-priori rule.

We now turn to iterative regularization of nonlinear problems:


2.2.4 Landweber Iteration for Nonlinear Problems

As already mentioned above, the characterization of Landweber iteration as a gradient

method for minimizing the least-squares functional allows to formulate the method also

for nonlinear problems, i.e., the iteration (2.9) is replaced by

\[
x_{k+1}^\delta = x_k^\delta + F'(x_k^\delta)^* \bigl( y^\delta - F(x_k^\delta) \bigr). \tag{2.12}
\]

Similarly, the discrepancy principle can be adapted to the nonlinear case, e.g., k∗ is

determined by

\[
\|F(x_{k_*}^\delta) - y^\delta\| \le \tau\delta < \|F(x_k^\delta) - y^\delta\|, \quad 0 \le k < k_*, \tag{2.13}
\]

for some τ > 2.
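The iteration (2.12) with the stopping rule (2.13) can be sketched as follows. The componentwise toy operator F(x) = 0.5 x + 0.05 x³ is purely an assumption for illustration (it is smooth and satisfies ‖F′(x)‖ ≤ 1 on the unit ball); the names `F`, `F_prime`, and `nonlinear_landweber` are hypothetical.

```python
import numpy as np

def F(x):
    """Toy nonlinear operator acting componentwise (an assumption for this sketch)."""
    return 0.5 * x + 0.05 * x**3

def F_prime(x):
    """Jacobian of the toy operator; diagonal with entries between 0.5 and 0.65 for |x_i| <= 1."""
    return np.diag(0.5 + 0.15 * x**2)

def nonlinear_landweber(F, F_prime, y_delta, delta, x0, tau=2.1, max_iter=10000):
    """Iterate x_{k+1} = x_k + F'(x_k)^T (y_delta - F(x_k)) until the discrepancy principle holds."""
    x = x0.copy()
    for k in range(max_iter):
        residual = y_delta - F(x)
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x + F_prime(x).T @ residual
    return x, max_iter

x_true = np.array([0.5, -0.3])
delta = 1e-3
noise = delta * np.array([1.0, -1.0]) / np.sqrt(2.0)    # perturbation of norm delta
y_delta = F(x_true) + noise
x_stop, k_stop = nonlinear_landweber(F, F_prime, y_delta, delta, x0=np.zeros(2))
```

Each step requires one evaluation of F and one application of the adjoint of the derivative, but no linear solve.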

A quantitative convergence analysis of Landweber iteration was carried out in [39]

under the following conditions:

(i) F is Fréchet-differentiable in a ball Bρ(x0) and satisfies ‖F ′(x)‖ ≤ 1.

(ii) F ′(x) = R(x)F ′(x†) for all x ∈ Bρ(x0), and ‖R(x) − I‖ ≤ C ‖x − x†‖.

(iii) F has a solution x† ∈ Bρ(x0) satisfying

\[
x^\dagger - x_0 = \bigl( F'(x^\dagger)^* F'(x^\dagger) \bigr)^{\mu} w,
\]

for some µ > 0 and ‖w‖ sufficiently small.

Note that under condition (ii), the range of the adjoint of the Fréchet-derivative is invariant in a neighborhood of x0, i.e., R(F ′(x)∗) = R(F ′(x̃)∗) for x, x̃ ∈ Bρ(x0).

We refer to [20] for a discussion of the importance of invariance conditions for the

convergence results for nonlinear problems.

Under the above conditions, the following convergence rates result is derived in [39]:

Theorem 2.6 Let the assumptions (i)-(iii) above hold for some ρ sufficiently small

and µ ≤ 1/2. If the iteration (2.12) is stopped according to the discrepancy principle

(2.13) with τ sufficiently large, then

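\[
k_* = O\bigl(\delta^{-\frac{2}{2\mu+1}}\bigr) \quad\text{and}\quad \|x_{k_*}^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr). \tag{2.14}
\]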

Note that under the given nonlinearity condition (ii), Landweber iteration shows

saturation at µ = 1/2, which was not the case for linear problems. A comparison with

the linear case shows that for µ ≤ 1/2, the rates (2.14) are order optimal under the

given source condition. In [39], convergence without rates is proven under the weaker

nonlinearity assumption

(ii’) ‖F (x) − F (x̃) − F ′(x)(x − x̃)‖ ≤ η‖F (x) − F (x̃)‖ for some η < 1/2

and without a source condition on x†.


2.2.5 Regularized Newton-type Iterations

A different, but quite natural step in the construction of iterative regularization methods

for nonlinear problems is to consider the Newton method for the solution of F (x) = y.

However, it turns out that already a single Newton step

F ′(xδk)(xδk+1 − xδk) = (yδ − F (xδk)) (2.15)

is usually ill-posed if the original problem F (x) = y was. Hence, (2.15) has to be solved

by some regularization method again. Applying Tikhonov regularization to (2.15) yields

the Levenberg-Marquardt method [37]. Stability can be further increased by regularizing

around some fixed element x∗, i.e., by solving

\[
F'(x_k^\delta)(x_{k+1}^\delta - x^*) = y^\delta - F(x_k^\delta) + F'(x_k^\delta)(x_k^\delta - x^*) \tag{2.16}
\]

with an appropriate regularization method. With Tikhonov regularization, one obtains

the iteratively regularized Gauß-Newton method [8, 13] in this case. Alternatively, (2.16)

can be solved by more general regularization methods, yielding regularized Newton-type

iterations of the form

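\[
x_{n+1}^\delta = x^* + g_{\alpha_n}\bigl( F'(x_n^\delta)^* F'(x_n^\delta) \bigr) F'(x_n^\delta)^* \bigl[ y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x^* - x_n^\delta) \bigr].
\]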

Especially for large scale problems, (2.16) might have to be solved by some iterative

algorithm [23, 47, 48, 38]. We will consider such methods and their acceleration by

preconditioning in Hilbert scales in detail in Section 4.3.3. Without citing a concrete

result, we only mention that under the above assumptions (i) – (iii), the same conver-

gence rates as for the nonlinear Landweber iteration hold for the iteratively regularized

Gauß-Newton method and the (accelerated) Newton-Landweber iterations ([23, 48]).

For a comprehensive discussion of various regularized Newton-type methods and their

convergence theory, we refer to [49] and the references cited therein.
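With the Tikhonov filter $g_\alpha(\lambda) = 1/(\lambda + \alpha)$, the Newton-type iteration above becomes the iteratively regularized Gauß-Newton method. The following sketch assumes a geometrically decaying sequence $\alpha_n = \alpha_0 q^n$ and a discrepancy-type stopping rule, both of which are choices made here for illustration; the toy operator and all names are hypothetical.

```python
import numpy as np

def F(x):
    """Toy nonlinear operator acting componentwise (an assumption for this sketch)."""
    return 0.5 * x + 0.05 * x**3

def F_prime(x):
    """Jacobian of the toy operator."""
    return np.diag(0.5 + 0.15 * x**2)

def irgn(F, F_prime, y_delta, delta, x_star, alpha0=1.0, q=0.5, tau=2.1, max_iter=50):
    """x_{n+1} = x* + (J^T J + alpha_n I)^{-1} J^T [y_delta - F(x_n) - J (x* - x_n)], J = F'(x_n)."""
    x = x_star.copy()
    n_dim = x.size
    for n in range(max_iter):
        if np.linalg.norm(F(x) - y_delta) <= tau * delta:
            return x, n
        J = F_prime(x)
        alpha = alpha0 * q**n
        rhs = J.T @ (y_delta - F(x) - J @ (x_star - x))
        x = x_star + np.linalg.solve(J.T @ J + alpha * np.eye(n_dim), rhs)
    return x, max_iter

x_true = np.array([0.5, -0.3])
delta = 1e-3
noise = delta * np.array([1.0, -1.0]) / np.sqrt(2.0)    # perturbation of norm delta
y_delta = F(x_true) + noise
x_stop, n_stop = irgn(F, F_prime, y_delta, delta, x_star=np.zeros(2))
```

In contrast to Landweber iteration, every step requires the solution of a linear system, which for large scale problems motivates the inner iterative solvers mentioned above.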

We end our survey on regularization methods here, noting that this presentation

is of course far from being complete. For further discussion, e.g., on regularization by

discretization, truncated singular value decomposition, or on asymptotic regularization

we refer to the literature.

Chapter 3

Regularization in Hilbert Scales

This chapter is concerned with a special aspect of regularization theory, namely regu-

larization in Hilbert scales. After introducing the concept of Hilbert scales, we derive

some elementary properties of Hilbert scales which will be needed for the analysis of

the next sections. Then, we quote the most important classical results on regulariza-

tion in Hilbert scales in order to show that saturation, which limits the approximation

properties of regularization methods, in particular for nonlinear problems, can partly

be overcome by considering the inverse problems in Hilbert scales. The results will also

serve as motivation and a starting point for the formulation of our preconditioning

strategy for iterative regularization methods.

3.1 Introduction and General Definitions

In the following, we summarize the main results on Hilbert scales needed for the sub-

sequent analysis. For details and proofs we refer to [28, Section 8.4] and [54]:

Further on, let L be a densely defined, unbounded, selfadjoint, strictly positive

operator in X , i.e., L is a closed operator in X satisfying

D(L) = D(L∗) is dense in X , (3.1)

〈Lx, y 〉 = 〈 x, Ly 〉, for all x, y ∈ D(L), (3.2)

and there exists a γ > 0 such that

‖Lx‖ ≥ γ‖x‖ for all x ∈ D(L). (3.3)

As can be shown by spectral theory, the set

\[
M := \bigcap_{k=0}^{\infty} D(L^k) \tag{3.4}
\]

is dense in X . Furthermore, $L^s$ is defined on M for all s ∈ R and

\[
M = \bigcap_{s \in \mathbb{R}} D(L^s).
\]

These properties allow us to make the following definition:

Definition 3.1 For x, y ∈M and s ∈ R, let

\[
\langle x, y \rangle_s := \langle L^s x, L^s y \rangle, \tag{3.5}
\]

\[
\|x\|_s := \|L^s x\|. \tag{3.6}
\]

Then the Hilbert spaces Xs are defined as the completion of M with respect to the norm ‖ · ‖s, and $\{X_s\}_{s \in \mathbb{R}}$ is called the Hilbert scale induced by L.

This construction implies the following properties, cf. [28, Proposition 8.19]:

Proposition 3.2 Let L be as above and let $\{X_s\}_{s \in \mathbb{R}}$ denote the Hilbert scale induced

by L. Then the following assertions hold:

(i) For −∞ < s < t <∞, the space Xt is densely and continuously embedded in Xs.

(ii) For s, t ∈ R, the operator $L^{t-s}$, defined on M, has a unique extension to Xt, which is an isomorphism from Xt onto Xs. If t > s, this extension, again denoted by $L^{t-s}$, is selfadjoint and strictly positive as restriction to Xt in Xs. Moreover, $L^{t-s} = L^t L^{-s}$ holds for the appropriate extensions, in particular, $(L^s)^{-1} = L^{-s}$.

(iii) If s ≥ 0, then Xs = D(Ls) and X−s = (Xs)′, i.e., X−s is the dual space of Xs.

(iv) For −∞ < q < r < s <∞ and x ∈ Xs, the interpolation inequality holds, i.e.,

\[
\|x\|_r \le \|x\|_q^{\frac{s-r}{s-q}} \, \|x\|_s^{\frac{r-q}{s-q}} .
\]

As in (ii) we will not distinguish between operators and their extensions below, if the

meaning is clear from the context.
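The interpolation inequality (iv) can be illustrated numerically. The sketch below uses a diagonal matrix with entries at least 1 as a finite-dimensional stand-in for the operator L, which is an assumption made only for this illustration (a genuine L in the above sense is unbounded); the name `scale_norm` is hypothetical.

```python
import numpy as np

def scale_norm(l, x, s):
    """||x||_s = ||L^s x|| for L = diag(l) with positive entries l."""
    return np.linalg.norm(l**s * x)

l = np.array([1.0, 2.0, 5.0, 10.0])     # spectrum of the stand-in for L
x = np.array([1.0, -0.5, 0.25, 0.1])
q, r, s = -1.0, 0.5, 2.0                # indices with q < r < s

lhs = scale_norm(l, x, r)
rhs = scale_norm(l, x, q) ** ((s - r) / (s - q)) * scale_norm(l, x, s) ** ((r - q) / (s - q))
```

In this diagonal setting the inequality lhs ≤ rhs is an instance of Hölder's inequality applied to the weights $l_i^{2r} = (l_i^{2q})^{\theta} (l_i^{2s})^{1-\theta}$ with $\theta = (s-r)/(s-q)$.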

Before we continue with our discussion of Hilbert scales and explain their application

to regularization methods, we give two short examples for illustration:

Example 3.3 Let T : X → Y be a compact, injective, linear operator between Hilbert

spaces X and Y . Then $L := (T^*T)^{-1}$ induces the Hilbert scale $\{X_s\}_{s \in \mathbb{R}}$ with

\[
X_s := D((T^*T)^{-s}) = R((T^*T)^{s}) .
\]

Note that in this case the spaces Xµ, µ ≥ 0, are the usual source sets for regularization

in Hilbert spaces (cf. Theorem 1.2).

Example 3.4 Let Ω ⊂ Rn, n = 2, 3, be a bounded domain with sufficiently smooth

boundary ∂Ω. Then

\[
-\Delta : H^2(\Omega) \cap H_0^1(\Omega) \subset L^2(\Omega) \to L^2(\Omega)
\]

satisfies the conditions of Definition 3.1, i.e., L := −∆ induces a Hilbert scale $\{X_s\}_{s \in \mathbb{R}}$. Furthermore, $X_s = H_0^{2s}(\Omega)$ for s ∈ [0, 3/4), which means that the Sobolev spaces $H_0^s$ are part of a Hilbert scale. Note that in general, $\{H_0^s\}_{s \in \mathbb{R}}$ is not a Hilbert scale; in particular, with the above definition,

\[
X_1 = H^2(\Omega) \cap H_0^1(\Omega) \ne H_0^2(\Omega) .
\]

See also [65] for a related discussion.

The following results are at the core of the subsequent analysis of regularization in

Hilbert scales. For details and proofs see [53] or [28, Section 8.4]:

Proposition 3.5 (Inequality of Heinz) Let $L$ and $A$ be two densely defined, unbounded, selfadjoint, strictly positive operators in $\mathcal{X}$ with $D(A) \subset D(L)$ and

\[ \|Lx\| \le \|Ax\| \quad \text{for all } x \in D(A). \]

Then, for all $\nu \in [0, 1]$, we have that $D(A^\nu) \subset D(L^\nu)$ and

\[ \|L^\nu x\| \le \|A^\nu x\| \quad \text{for all } x \in D(A^\nu). \]
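For matrices, the assumption $\|Lx\| \le \|Ax\|$ means $L^2 \le A^2$ in the sense of quadratic forms, and the inequality of Heinz states that $L^{2\nu} \le A^{2\nu}$ for $\nu \in [0,1]$. The following sketch tests this on randomly generated symmetric positive definite matrices; the matrices, the seed and the helper `sym_power` are assumptions made only for illustration.

```python
import numpy as np

def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

rng = np.random.default_rng(1)
n = 6
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(2.0, 5.0, n)) @ Q.T     # SPD, spectrum in [2, 5]

R = rng.standard_normal((n, 2))
P = R @ R.T / np.linalg.norm(R @ R.T, 2)            # PSD with spectral norm 1
L = sym_power(A @ A - P, 0.5)                       # L^2 = A^2 - P <= A^2, i.e. ||Lx|| <= ||Ax||

# Heinz: A^{2 nu} - L^{2 nu} should be positive semidefinite for nu in [0, 1].
min_eigs = [np.linalg.eigvalsh(sym_power(A, 2 * nu) - sym_power(L, 2 * nu)).min()
            for nu in np.linspace(0.0, 1.0, 11)]
print(min(min_eigs))
```

The smallest eigenvalue of the difference stays nonnegative up to rounding for every sampled $\nu$. For exponents larger than one the analogous statement fails in general, which is why the range $\nu \in [0,1]$ matters.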

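Proposition 3.6 Let $T : \mathcal{X} \to \mathcal{Y}$ be a linear operator and $L$ be as in Proposition 3.2. Assume that for some $a > 0$ and $\overline{m} > 0$

\[ \|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for all } x \in \mathcal{X} \tag{3.7} \]

holds and that the extension of $T$ to $\mathcal{X}_{-a}$ (again denoted by $T$) is injective. Then the following assertions hold: $D((B^*B)^{-\nu/2}) = R((B^*B)^{\nu/2}) \subset \mathcal{X}_{\nu(a+s)}$ for all $\nu \in [0, 1]$ with $B := TL^{-s}$ for some $s \ge -a$, and

\[ \|(B^*B)^{\nu/2}x\| \le \overline{m}^{\,\nu}\, \|x\|_{-\nu(a+s)} \quad \text{for all } x \in \mathcal{X}, \tag{3.8} \]

\[ \|(B^*B)^{-\nu/2}x\| \ge \overline{m}^{\,-\nu}\, \|x\|_{\nu(a+s)} \quad \text{for all } x \in D((B^*B)^{-\nu/2}). \tag{3.9} \]

Note also that condition (3.7) is equivalent to

\[ R(T^*) \subset \mathcal{X}_a \quad \text{and} \quad \|T^*w\|_a \le \overline{m}\,\|w\| \quad \text{for all } w \in \mathcal{Y}. \tag{3.10} \]

Now assume that in addition to (3.7)

\[ \underline{m}\,\|x\|_{-\tilde a} \le \|Tx\| \quad \text{for all } x \in \mathcal{X} \tag{3.11} \]

holds for some $\underline{m} > 0$, $\tilde a > 0$. Then it follows for all $\nu \in [0, 1]$ that

\[ \mathcal{X}_{\nu(\tilde a+s)} \subset R((B^*B)^{\nu/2}) = D((B^*B)^{-\nu/2}) \]

and

\[ \|(B^*B)^{\nu/2}x\| \ge \underline{m}^{\,\nu}\, \|x\|_{-\nu(\tilde a+s)} \quad \text{for all } x \in \mathcal{X}, \]

\[ \|(B^*B)^{-\nu/2}x\| \le \underline{m}^{\,-\nu}\, \|x\|_{\nu(\tilde a+s)} \quad \text{for all } x \in \mathcal{X}_{\nu(\tilde a+s)}. \tag{3.12} \]

Moreover, (3.11) is equivalent to

\[ \mathcal{X}_{\tilde a} \subset R(T^*) \quad \text{and} \quad \|T^*w\|_{\tilde a} \ge \underline{m}\,\|w\| \quad \text{for all } w \in N(T^*)^\perp \text{ with } T^*w \in \mathcal{X}_{\tilde a}. \tag{3.13} \]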

Proof. Assume that (3.7) holds: then (3.8) and (3.9) follow from Propositions 3.2, 3.5, interpolation and duality arguments.

Additionally, the operator $TL^{a}$ (respectively its extension) is a continuous mapping from $\mathcal{X}$ to $\mathcal{Y}$. Hence, for $y \in \mathcal{Y} \cap D(L^aT^*)$,

\[ \|L^aT^*y\| = \sup_{\|x\|=1} \langle L^aT^*y, x \rangle = \sup_{\|x\|=1} \langle y, TL^a x \rangle \le \overline{m}\,\|y\| , \]

which proves (3.10). Next, observe that

\[ \|(B^*B)^{1/2}x\| = \|Bx\| = \|TL^{-s}x\| \quad \forall x \in \mathcal{X}_{-s}. \tag{3.14} \]

By (3.7), $B = TL^{-s}$ can be extended as a continuous operator to $\mathcal{X}_{-(a+s)}$ and by (3.14)

\[ D((B^*B)^{1/2}) \subset D(TL^{-s}) = \mathcal{X}_{-(a+s)}. \]

The reverse implication follows in the same way.

The results under condition (3.11) follow similarly. □

Remark 3.7 By (3.10), it follows that $R((T^*T)^{1/2}) = R(T^*) \subset \mathcal{X}_a$, and hence one can show with similar reasoning as above that

\[ R((T^*T)^{\mu}) \subset \mathcal{X}_{2a\mu} \quad \text{for } 0 \le \mu \le \tfrac12 . \]

Note that in general, $x^\dagger \in \mathcal{X}_{2a\mu}$ will not imply $x^\dagger \in R((T^*T)^{\mu})$.

If, on the other hand, $\|Tx\| \ge \underline{m}\,\|x\|_{-a}$ for some $\underline{m} > 0$, then the converse inclusion $R((T^*T)^{\mu}) \supset \mathcal{X}_{2a\mu}$ holds.

Finally, if $T \sim L^{-a}$, i.e.,

\[ \underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for } x \in \mathcal{X} \tag{3.15} \]

holds for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$, which is the usual condition in Hilbert scale regularization, then the spaces $\mathcal{X}_{2a\mu}$ and $R((T^*T)^{\mu})$ coincide for $|\mu| \le 1/2$. Condition (3.15) is called norm equivalence and is important for preconditioning also in the well-posed situation.

The following result (cf. [28, Corollary 8.22]) is an immediate consequence of Proposition 3.6:

Corollary 3.8 Let $\{\mathcal{X}_s\}_{s\in\mathbb{R}}$ be the Hilbert scale induced by $L$, and let $T : \mathcal{X} \to \mathcal{Y}$ be a bounded linear operator satisfying (3.15) for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$. Then for $B := TL^{-s}$, $s \ge -a$, and for $|\nu| \le 1$,

\[ \underline{c}(\nu)\,\|x\|_{-\nu(a+s)} \le \|(B^*B)^{\nu/2}x\| \le \overline{c}(\nu)\,\|x\|_{-\nu(a+s)} \]

holds on $D((B^*B)^{\nu/2})$ with $\underline{c}(\nu) = \min(\underline{m}^{\nu}, \overline{m}^{\nu})$ and $\overline{c}(\nu) = \max(\underline{m}^{\nu}, \overline{m}^{\nu})$. Moreover, $R((B^*B)^{\nu/2}) = \mathcal{X}_{\nu(a+s)}$, where $(B^*B)^{\nu/2}$ has to be replaced by its extension to $\mathcal{X}$ if $\nu < 0$.

The number a in (3.15) may be interpreted as degree of ill-posedness. For illustra-

tion we check (3.15) and thus the applicability of the previous result for the following

example:

Example 3.9 Let $\mathcal{X} = \mathcal{Y} = L^2[0, 1]$, and consider $T$ defined by

\[ (Tx)(s) = \int_0^s k(t)\,x(t)\,dt \]

with some $0 < \underline{k} \le k(t) \in H^1[0, 1]$. Then $R(T) = \{x \in H^1[0, 1] : x(0) = 0\}$.

For the choice of a Hilbert scale, observe the following: if $k$ were constant, e.g. $k = 1$, then $R(T^*T) = \{x \in H^2[0, 1] : x(1) = 0,\ x'(0) = 0\}$. Hence it is reasonable to choose $\mathcal{X}_1 = \{x \in H^2[0, 1] : x(1) = 0,\ x'(0) = 0\}$ with

\[ Lx = -x'' , \]

which for constant $k$ yields $T^*T \sim L^{-2a}$ with $a = 1/2$. For nonconstant $k$ as above it still follows that $\mathcal{X}_{1/2} = \{x \in H^1[0, 1] : x(1) = 0\} = R(T^*)$. In order to show (3.7) and (3.11), we use that

\[ \|x\|_{1/2} = \|L^{1/2}x\| = \|x'\| \]

(note that $L^{1/2}x \ne x'$). Since $(T^*y)(t) = k(t)\int_t^1 y(s)\,ds$, it follows that $\|(T^*y)'\| \le c\,\|y\|$. Thus $T^*$ is a continuous mapping from $L^2[0, 1]$ onto $\mathcal{X}_{1/2}$ and has a bounded inverse by the open mapping theorem. Together with Proposition 3.6, this yields (3.15) with $a = 1/2$; in particular, $(T^*T)^{\nu/2} \sim L^{-\nu a}$ for $|\nu| \le 1/2$.

Note that by Proposition 3.6 and Remark 3.7 the Hilbert scales $\{\mathcal{X}_s\}$ and $\{\tilde{\mathcal{X}}_s\}$ induced by $L$ and $\tilde L := (T^*T)^{-1/(2a)}$ coincide for $s \in [-a, a]$. However, since

\[ (T^*Tx)(s) = k(s)\int_s^1\!\!\int_0^t k(\tau)\,x(\tau)\,d\tau\,dt , \]

it follows that $R(T^*T) \not\subset \mathcal{X}_1$ if $k \notin H^2[0, 1]$ or $k'(0) \ne 0$, and thus $T^*T \not\sim L^{-2a}$. This illustrates that two Hilbert scales may coincide (with equivalent norms) for a range of values of $s$, while they differ outside.
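For constant $k = 1$ the relation between $T^*T$ and $L$ in Example 3.9 can be made concrete on a grid. The sketch below assumes a rectangle-rule discretization of the integral operator and the standard three-point finite-difference matrix for $-x''$ with a Neumann condition at $0$ and a Dirichlet condition at $1$; the grid size and the names `T`, `L_h` and `defect` are illustrative.

```python
import numpy as np

n = 50
h = 1.0 / n

# Rectangle-rule discretization of (Tx)(s) = int_0^s x(t) dt with k = 1 (assumption).
T = h * np.tril(np.ones((n, n)))

# Finite-difference version of Lx = -x'' with x'(0) = 0 and x(1) = 0:
# tridiagonal (-1, 2, -1)/h^2, with the first diagonal entry replaced by 1/h^2.
L_h = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
L_h[0, 0] = 1.0 / h**2

# For this discretization T^T T is exactly the inverse of L_h.
defect = np.abs(T.T @ T @ L_h - np.eye(n)).max()
print(defect)
```

In this discrete model $T^{\mathsf T}T = L_h^{-1}$, which mirrors $T^*T \sim L^{-2a}$ with $a = 1/2$. For a nonconstant kernel $k$ the identity no longer holds exactly, in line with the discussion of boundary conditions in the example.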

As in the above example, boundary conditions play an essential role for Hilbert scale considerations and for regularization in general, see [65] for a related discussion.


3.2 Linear problems in Hilbert Scales

Originally, regularization in Hilbert scales was introduced by Natterer [63] for the spe-

cial case of Tikhonov regularization combined with an a-priori stopping rule. It was

shown independently in [28] and [71] that the results naturally generalize to quite gen-

eral regularization methods for linear problems. Later, in [66] and [68], the theory was

extended to Tikhonov regularization and Landweber iteration for nonlinear problems

combined with an a-posteriori parameter choice strategy. In any of these works, Hilbert

scales were used to extend the range of optimal convergence of the methods under

consideration: e.g., as mentioned in Section 2.1, Tikhonov regularization has a finite qualification $\mu_0 = 1$ and thus the best possible rate of convergence is $O(\delta^{2/3})$ provided $x^\dagger \in R((T^*T)^{\mu})$ for some $\mu \ge 1$ and the regularization parameter $\alpha$ is chosen appropriately. However, as we will see below, a better rate (in $\mathcal{X} = \mathcal{X}_0$) can be obtained for $\mu > 1$ if $T$ is considered as an operator from $\mathcal{X}_s$ to $\mathcal{Y}$ and if the regularization is performed in the (stronger) norm of $\mathcal{X}_s$ ($s > 0$). A second advantage of regularization in Hilbert scales is that the abstract source conditions $x^\dagger \in R((T^*T)^{\mu})$ can be interpreted in terms of the Hilbert scale $\{\mathcal{X}_s\}$, e.g., conditions like $x^\dagger \in \mathcal{X}_u$ are used, which usually

amount to simple differentiability conditions (including boundary conditions), see also

the examples of Hilbert scales in Chapter 5.

Next, we recall the main convergence (rates) results for Tikhonov regularization in

Hilbert scales:

For the rest of this chapter, let $\{\mathcal{X}_s\}_{s\in\mathbb{R}}$ be the Hilbert scale induced by $L$, and let $T : \mathcal{X} \to \mathcal{Y}$ be a bounded linear operator satisfying (3.15), i.e.,

\[ \underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a} \]

on $\mathcal{X}$ for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$. From (3.15) it follows that

\[ B := TL^{-s} \]

is well-defined for $s \ge -a$. For a filter function $g_\alpha : [0, \|B\|^2] \to \mathbb{R}$ satisfying the conditions of Theorem 1.2, we define the regularized solution $x_\alpha^\delta$ by

\[ x_\alpha^\delta := L^{-s} g_\alpha(B^*B) B^* y^\delta. \tag{3.16} \]

This actually corresponds to the standard definition of a regularized solution (1.7) if $T$ is considered as an operator on $\mathcal{X}_s$. To see this, consider Tikhonov regularization with the norm of $\mathcal{X}_s$: according to (2.1) the regularized solution $x_\alpha^\delta$ is defined as the solution of the regularized normal equations

\[ T^*y^\delta = (T^*T + \alpha L^{2s})\,x_\alpha^\delta = L^{s}(B^*B + \alpha I)L^{s} x_\alpha^\delta \]

or equivalently

\[ x_\alpha^\delta = L^{-s}(B^*B + \alpha I)^{-1}L^{-s}T^*y^\delta = L^{-s}(B^*B + \alpha I)^{-1}B^*y^\delta. \]
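The equivalence of the two representations of the Tikhonov solution can be verified in finite dimensions. The following sketch uses random matrices as stand-ins for $T$ and $L$ (an assumption for illustration only; `sym_power`, the seed and the values of `s` and `alpha` are not from the text) and compares the solution of the regularized normal equations with formula (3.16) for $g_\alpha(\lambda) = 1/(\lambda + \alpha)$.

```python
import numpy as np

def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

rng = np.random.default_rng(2)
n, s, alpha = 8, 0.5, 1e-2

T = rng.standard_normal((n, n))                      # stand-in for the forward operator
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T         # SPD stand-in for L (does not commute with T)
y_delta = rng.standard_normal(n)

# (a) regularized normal equations in the X_s norm: (T^T T + alpha L^{2s}) x = T^T y
x_normal = np.linalg.solve(T.T @ T + alpha * sym_power(L, 2 * s), T.T @ y_delta)

# (b) formula (3.16) with the Tikhonov filter g_alpha(lambda) = 1 / (lambda + alpha)
L_minus_s = sym_power(L, -s)
B = T @ L_minus_s
x_filter = L_minus_s @ np.linalg.solve(B.T @ B + alpha * np.eye(n), B.T @ y_delta)

print(np.linalg.norm(x_normal - x_filter))
```

Both routes give the same vector up to rounding. Form (b) is the one that generalizes to other filter functions $g_\alpha$, since it only requires functions of the symmetric matrix $B^{\mathsf T}B$.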

The standard results on regularization (cf. Theorem 1.3) yield convergence (rates) of $x_\alpha^\delta$ to $x^\dagger$ with respect to the norm in $\mathcal{X}_s$. However, in Hilbert scale regularization, one is interested in the convergence in the original space $\mathcal{X}$ (at a better rate). The following result summarizes the convergence behavior with respect to the norm in $\mathcal{X}$ for regularization of linear problems in Hilbert scales (for the details and proofs we refer to [28, Section 8.5]).

Theorem 3.10 Let $x_\alpha^\delta$ be defined by (3.16), and let the assumptions on $g_\alpha$ of Theorem 1.2 hold with $r_\alpha(\lambda) = 1 - \lambda g_\alpha(\lambda)$ satisfying (1.11) for some $\mu_0 \ge 1$. Then for $x^\dagger \in \mathcal{X}_u$ with $0 < u \le a + 2s$, and for the parameter choice

\[ \alpha \sim \delta^{\frac{2(a+s)}{a+u}} \]

the following estimate holds:

\[ \|x_\alpha^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}). \]

If $r_\alpha$ is continuous from the left for all $\lambda \in [0, \|B\|^2]$ as a function of $\alpha$, and if $\alpha$ is chosen according to the discrepancy principle (1.14) with $\tau > c_0$ ($= c_\mu$ in (1.11) with $\mu = 0$), then

\[ \|x_\alpha^\delta - x^\dagger\| = \begin{cases} o(\delta^{\frac{u}{a+u}}) & \text{if } u < a + 2s \text{ or } u = a + 2s,\ \mu_0 > 1, \\ O(\delta^{\frac{u}{a+u}}) & \text{if } u = a + 2s \text{ and } \mu_0 = 1 \end{cases} \]

holds for $u \le 2(a+s)\mu_0 - a$.

Remark 3.11 Assume, e.g., that $a = 1$, $s = a$ and $u = 3$. Then a convergence rate of $O(\delta^{3/4})$ can be expected for Tikhonov regularization. Note that due to saturation, $O(\delta^{2/3})$ is the best possible rate for standard Tikhonov regularization (cf. Section 2.1). If $L$ commutes with $T^*T$, e.g., for $L = (T^*T)^{-1/(2a)}$, or if the source condition $x^\dagger \in \mathcal{X}_u$ is replaced by $L^s x^\dagger = (B^*B)^{\frac{u-s}{2(a+s)}} w$ (see Theorem 4.10), the restriction $u \le a + 2s$ can be replaced by $u - s \le 2(a+s)\mu_0$, where $\mu_0$ denotes the qualification of the regularization method under consideration, i.e., the largest $\mu$ such that (1.11) holds. The above results naturally apply also for iterative regularization methods: there, $\alpha$ has to be replaced by $1/k$, respectively by $1/k^2$ for semiiterative regularization methods with optimal speed of convergence, cf. Section 2.2 and [26, 36].
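The arithmetic behind the rate in Remark 3.11 can be reproduced with exact fractions. The helper `hilbert_scale_rate` below is hypothetical and only evaluates the exponents of Theorem 3.10; it assumes that the exponent of $\delta$ in the parameter choice is taken positive, so that $\alpha \to 0$ as $\delta \to 0$.

```python
from fractions import Fraction

def hilbert_scale_rate(a, s, u):
    """Exponents of Theorem 3.10: (rate exponent, exponent of delta in alpha)."""
    if not (0 < u <= a + 2 * s):
        raise ValueError("Theorem 3.10 requires 0 < u <= a + 2s")
    return Fraction(u, a + u), Fraction(2 * (a + s), a + u)

# Values of Remark 3.11: a = 1, s = a, u = 3.
rate, alpha_exp = hilbert_scale_rate(a=1, s=1, u=3)
print(rate, alpha_exp)
```

For these values the rate exponent is $3/4$, which exceeds the saturation exponent $2/3$ of standard Tikhonov regularization, and $u = 3$ sits exactly at the boundary $u = a + 2s$.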

3.3 Iterative regularization of nonlinear problems

in Hilbert scales

In [68], the convergence rates of Theorem 3.10 have been generalized to Landweber iteration for nonlinear problems $F(x) = y$. The corresponding iteration has the form

\[ x_{k+1}^\delta = x_k^\delta + L^{-2s}F'(x^\dagger)^*(y^\delta - F(x_k^\delta)), \quad k \ge 0 . \tag{3.17} \]

As in the linear case, (3.17) corresponds to the usual Landweber iteration if the operator $F$ is considered as an operator on $\mathcal{X}_s$ (while, as above, ${}^*$ denotes the adjoint with respect to the spaces $\mathcal{X}$ and $\mathcal{Y}$).

For later reference and a comparison with our results below, we recall the basic

assumptions and convergence (rates) results of [68]:

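Assumption 3.12

(i) $F : D(F) \subset \mathcal{X} \to \mathcal{Y}$ is continuous and Fréchet-differentiable in $\mathcal{X}$.

(ii) (1.16) has a solution $x^\dagger$; moreover, $B_\rho(x^\dagger) := \{x \in \mathcal{X} : \|x - x^\dagger\|_0 \le \rho\} \subset D(F)$ for some $\rho > 0$.

(iii) $\|F'(x^\dagger)x\|_{\mathcal{Y}} \sim \|x\|_{-a}$ for all $x \in \mathcal{X}$ and some $a > 0$.

(iv) $\|F'(x^\dagger)^* - F'(x)^*\|_{\mathcal{Y},\mathcal{X}_b} \le c\,\|x^\dagger - x\|_0^{\beta}$ for all $x \in B_\rho(x^\dagger)$ and some $b \in [0, a]$, $\beta \in (0, 1]$, and $c > 0$.

(v) $B := F'(x^\dagger)L^{-s}$ is such that $\|B\|_{\mathcal{X},\mathcal{Y}} \le 1$.

(vi) $x^\dagger - x_0 \in \mathcal{X}_u$ for some $\frac{a-b}{\beta} < u \le b + 2s$.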

For a discussion of the conditions, see [26, 68] or Remark 4.23 below. Under Assump-

tion 3.12, the following convergence rates for Landweber iteration in Hilbert scales hold:

Theorem 3.13 Let Assumption 3.12 and $\|y - y^\delta\| \le \delta$ hold. Moreover, let $k_*$ be the termination index determined by the discrepancy principle (1.14) with $\tau$ sufficiently large, and let $\|x^\dagger - x_0\|_u$ be sufficiently small. Then

\[ k_* = O(\delta^{-\frac{2(a+s)}{a+u}}) \tag{3.18} \]

and

\[ \|x^\dagger - x_{k_*}^\delta\|_0 = O(\delta^{\frac{u}{a+u}}). \tag{3.19} \]

The convergence rate (3.19) coincides with the one of Theorem 3.10. Saturation at $u = a + 2s$ corresponds to $\mu = 1/2$ for standard Landweber iteration (cf. Theorem 2.6, and the discussion in [26, 39, 68]).

A major drawback of iterative regularization in Hilbert scales with s ≥ 0 is that the

number of iterations needed for optimal convergence, see (3.18), increases with s, i.e.,

for the reconstruction of non-smooth solutions it is numerically advantageous to choose

s as small as possible. As we will see in the next section, it is even possible to choose

s < 0, in which case the operator L−2s in (3.17) acts as a preconditioner. Additionally,

we will show below that the restrictive condition (3.15) respectively (iii) can be relaxed

substantially.

Chapter 4

Preconditioning Iterative

Regularization in Hilbert Scales

The main motivation for regularization in Hilbert scales was originally to overcome

saturation effects of the standard methods to a certain extent. One of the drawbacks of

this approach is that precise knowledge of the ill-posedness of the operators involved is required, e.g., for linear problems $Tx = y$ the condition (3.15), which is

\[ \underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a}. \]

As we will show below, this assumption can be relaxed substantially in case $s \le 0$.

Another disadvantage of iterative regularization methods in Hilbert scales with $s > 0$ is that the number of iterations needed to guarantee optimal convergence rates increases with $s$, e.g., for Landweber iteration in Hilbert scales one has (cf. Theorem 3.13)

\[ k_* = O(\delta^{-\frac{2(a+s)}{a+u}}). \]

Thus, from a numerical point of view, it is favorable to choose s as small as possible if

one does not expect the solution to be very smooth and thus saturation effects play a

minor role.

For motivation, consider Landweber iteration applied to Tx = y. The corresponding

Hilbert scale iteration (corresponding to standard Landweber iteration with T consid-

ered as operator on Xs) takes the form

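\[ x_{k+1}^\delta = x_k^\delta + L^{-2s}T^*(y^\delta - Tx_k^\delta), \quad k \ge 0, \quad x_0^\delta = x_0. \tag{4.1} \]

Note that $L^{-2s}T^*$ is the adjoint of $T$ with respect to the spaces $\mathcal{X}_s$ and $\mathcal{Y}$. One has to assume that $s \ge -a/2$ in (4.1); otherwise the iteration is not even well-defined as an iteration on $\mathcal{X}$ for general $y^\delta \in \mathcal{Y}$.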
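The effect of the factor $L^{-2s}$ in (4.1) on the stopping index can be observed in a diagonal toy model. The sketch below assumes $L = \mathrm{diag}(n)$ and $T = L^{-a}$ on $\mathbb{R}^N$, a deterministic noise vector of norm $\delta$, and stopping by the discrepancy principle with $\tau = 2$; all of these choices, and the function name `landweber`, are illustrative assumptions.

```python
import numpy as np

# Diagonal toy model (assumption): L = diag(n), T = L^{-a}, so (3.15) holds with equality.
N, a = 200, 1.0
idx = np.arange(1, N + 1, dtype=float)
T = idx**(-a)                                  # diagonal of T (T is selfadjoint here)
x_true = idx**(-2.0)
delta, tau = 1e-3, 2.0
noise = delta * (-1.0)**np.arange(N) / np.sqrt(N)   # deterministic noise with norm delta
y_delta = T * x_true + noise

def landweber(s, max_iter=100_000):
    """Iteration (4.1) with x_0 = 0, stopped by the discrepancy principle."""
    precond = idx**(-2.0 * s)                  # diagonal of L^{-2s}
    x = np.zeros(N)
    for k in range(max_iter + 1):
        residual = y_delta - T * x
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x + precond * (T * residual)
    raise RuntimeError("discrepancy principle not reached")

x_std, k_std = landweber(s=0.0)                # standard Landweber iteration
x_pre, k_pre = landweber(s=-a / 2)             # preconditioned with L^{a}
print(k_std, k_pre, np.linalg.norm(x_std - x_true), np.linalg.norm(x_pre - x_true))
```

In this model the preconditioned iteration contracts each residual component by $1 - n^{-a}$ instead of $1 - n^{-2a}$ per step, so it reaches the discrepancy level in fewer iterations. Whether the error in $\mathcal{X}$ remains of optimal order is the question addressed by the analysis, and the toy run does not replace it.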

In order to keep the number of iterations, and thus the overall numerical effort as

small as possible, we are especially interested in the case s < 0 below, in which case the

action of L−2s in (4.1) can be interpreted as preconditioning. Note, that the standard


theory for Landweber iteration (cf. Theorem 2.2) yields convergence and convergence rates only in the space $\mathcal{X}_s$, i.e., with respect to the weaker norm $\|\cdot\|_s$ ($s < 0$). Thus, the main aim of the convergence analysis below is to show that the preconditioned iterations still provide (optimal) convergence rates in the usual space $\mathcal{X}$. In contrast to the standard case of regularization in Hilbert scales ($s \ge 0$), where convergence in $\mathcal{X}_s$ already implies convergence in $\mathcal{X}$, the situation is more involved in case $s < 0$. Here, it has to be proven first that the iterations stay at least bounded in $\mathcal{X}$.

In the sequel we state and discuss the main assumptions for regularization in Hilbert

scales with s < 0 in detail. In Section 4.2, we investigate regularization of linear inverse

problems and derive the main convergence rates results for a class of semiiterative

regularization methods and the method of conjugate gradients applied to the normal

equations (cgne). Nonlinear inverse problems will be investigated in Section 4.3.

4.1 Main Assumptions and Preliminary Results

Our analysis of regularization in Hilbert scales for the case s ≤ 0 is mainly based on

the following condition (cf. Proposition 3.6)

Assumption 4.1 There exist $a > 0$ and $\overline{m} > 0$ such that (3.7) holds, i.e.,

\[ \|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for all } x \in \mathcal{X}. \]

Moreover, the extension of $T$ to $\mathcal{X}_{-a}$ (again denoted by $T$) is injective.

Usually, for the analysis of regularization methods in Hilbert scales, the stronger condition (3.15) is used, i.e.,

\[ \underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a} \]

(cf., e.g., [63, 66] and Theorem 3.10). However, (3.7) might still be satisfied even if (3.15) does not hold. It might also be possible that an estimate from below can be given in a weaker norm, e.g., there exist $\tilde a \ge a$ and $\underline{m} > 0$ such that (3.11) holds, i.e.,

\[ \|Tx\| \ge \underline{m}\,\|x\|_{-\tilde a} \quad \text{for all } x \in \mathcal{X}, \]

see Section 5.1.1 for a detailed example.

Important implications of Assumption 4.1 follow readily from the first part of Proposition 3.6, recalling that regularization in Hilbert scales means considering the operator $T$ as an operator on $\mathcal{X}_s$. The standard source condition (for regularization in $\mathcal{X}_s$) with $\mu = 1$ then reads $x^\dagger = L^{-2s}T^*T w_s$ for some $w_s \in \mathcal{X}_s$, or equivalently, with the notation $B = TL^{-s}$,

\[ x^\dagger = L^{-s}(B^*B)L^{s}w_s = L^{-s}(B^*B)w. \]

Thus, the space $R(L^{-s}(B^*B))$ is the natural source set (with $\mu = 1$) for regularization in Hilbert scales. The following definition of a shifted Hilbert scale generalizes these sets to arbitrary $\mu$, and hence will play an important role in our convergence analysis below:


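Definition 4.2 Let $a$ and $B$ be as in Assumption 4.1. For $s \ge -a/2$, we define the shifted Hilbert scale $\{\mathcal{X}_r^s\}_{r\in\mathbb{R}}$ by

\[ \mathcal{X}_r^s := D\big((B^*B)^{\frac{s-r}{2(a+s)}}L^{s}\big) \tag{4.2} \]

equipped with the norm

\[ |\!|\!|x|\!|\!|_r := \big\|(B^*B)^{\frac{s-r}{2(a+s)}}L^{s}x\big\|_{\mathcal{X}} . \tag{4.3} \]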
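Two special cases of the norm (4.3) can be checked directly with matrices: for $r = -a$ the exponent equals $1/2$ and the norm reduces to $\|Tx\|$ (see (4.9)), while for $r = s$ the exponent vanishes and the norm reduces to $\|x\|_s = \|L^s x\|$. The sketch below uses random non-commuting matrices as stand-ins for $T$ and $L$; the matrices, the seed and the helpers `sym_power` and `shifted_norm` are assumptions for illustration.

```python
import numpy as np

def sym_power(M, p):
    """M^p for a symmetric positive definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * w**p) @ V.T

rng = np.random.default_rng(3)
n, a, s = 6, 1.0, -0.25                              # any s >= -a/2

T = np.eye(n) + 0.1 * rng.standard_normal((n, n))    # invertible stand-in for T
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
L = Q @ np.diag(np.arange(1.0, n + 1)) @ Q.T         # SPD stand-in for L

B = T @ sym_power(L, -s)
BtB = B.T @ B

def shifted_norm(x, r):
    """|||x|||_r = ||(B^*B)^{(s-r)/(2(a+s))} L^s x|| as in (4.3)."""
    return np.linalg.norm(sym_power(BtB, (s - r) / (2 * (a + s))) @ sym_power(L, s) @ x)

x = rng.standard_normal(n)
print(shifted_norm(x, -a), np.linalg.norm(T @ x))                 # r = -a
print(shifted_norm(x, s), np.linalg.norm(sym_power(L, s) @ x))    # r = s
```

Each printed pair agrees up to rounding. For other values of $r$ the shifted norm and the Hilbert scale norm differ in general, since $B^{\mathsf T}B$ and $L$ do not commute here.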

Remark 4.3 Note that $\{\mathcal{X}_r^s\}_{r\in\mathbb{R}}$ is generally no Hilbert scale over $\mathcal{X}$; in particular, $\mathcal{X}_{-r}^s$ is not the dual space of $\mathcal{X}_r^s$ in general. However, it turns out that $\{\mathcal{X}_{r+s}^s\}_{r\in\mathbb{R}}$ is a Hilbert scale over $\mathcal{X}_s$ (see Proposition 4.4(v) below). This means that the spaces $\mathcal{X}_u^s$ are the natural candidates for the formulation of source conditions when the operator $T$ is considered as an operator on $\mathcal{X}_s$. In fact, one has

\[ \mathcal{X}_u^s = \big\{ (T^\sharp T)^{\frac{u-s}{2(a+s)}} w_s \;:\; w_s \in \mathcal{X}_s \big\}, \]

where $T^\sharp = L^{-2s}T^*$ denotes the adjoint of $T$ with respect to $\mathcal{X}_s$ and $\mathcal{Y}$. Also note that the spaces $\mathcal{X}_r^s$ coincide with the usual source sets $R((T^*T)^{\mu})$ in case $s = 0$ and with $r = 2a\mu$. The standard convergence results may be applied and imply (optimal) rates of convergence, i.e.,

\[ \|x_k^\delta - x^\dagger\|_s = O(\delta^{\frac{u-s}{u+a}}) \]

provided that $x^\dagger \in \mathcal{X}_u^s$. The main statement of our analysis below will be that the convergence rates can be shifted to the usual space $\mathcal{X}$, i.e., we will show that the usual (optimal) rates

\[ \|x_k^\delta - x^\dagger\|_0 = O(\delta^{\frac{u}{u+a}}) \]

also hold for regularization in Hilbert scales with $s \le 0$, even under our relaxed assumptions.

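For our analysis below, we will frequently use the next proposition, which summarizes the basic properties of the shifted Hilbert scale $\{\mathcal{X}_r^s\}_{r\in\mathbb{R}}$:

Proposition 4.4 Let Assumption 4.1 hold, let $-a/2 \le s$, and let $(\mathcal{X}_r^s)_{r\in\mathbb{R}}$ be defined as above. Then the following assertions hold:

(i) The space $\mathcal{X}_q^s$ is continuously embedded in $\mathcal{X}_p^s$ for $p < q$, i.e., for $x \in \mathcal{X}_q^s$

\[ |\!|\!|x|\!|\!|_p \le \gamma^{p-q}\, |\!|\!|x|\!|\!|_q , \tag{4.4} \]

where $\gamma$ is such that

\[ \langle (B^*B)^{-\frac{1}{2(a+s)}}x, x \rangle \ge \gamma\,\|x\|^2 \quad \text{for all } x \in D\big((B^*B)^{-\frac{1}{2(a+s)}}\big). \]

(ii) The interpolation inequality holds, i.e., for all $x \in \mathcal{X}_r^s$

\[ |\!|\!|x|\!|\!|_q \le |\!|\!|x|\!|\!|_p^{\frac{r-q}{r-p}}\, |\!|\!|x|\!|\!|_r^{\frac{q-p}{r-p}}, \quad p < q < r . \tag{4.5} \]

(iii) For $s \le r \le a + 2s$,

\[ \|x\|_r \le \overline{m}^{\frac{r-s}{a+s}}\, |\!|\!|x|\!|\!|_r \quad \text{for all } x \in \mathcal{X}_r^s \subset \mathcal{X}_r, \tag{4.6} \]

and for $-a \le r \le s$,

\[ \|x\|_r \ge \overline{m}^{\frac{r-s}{a+s}}\, |\!|\!|x|\!|\!|_r \quad \text{for all } x \in \mathcal{X}_r \subset \mathcal{X}_r^s . \tag{4.7} \]

In particular, if $-a/2 \le s \le 0$, we obtain

\[ \|x\|_0 \le \overline{m}^{\frac{-s}{a+s}}\, |\!|\!|x|\!|\!|_0 \quad \text{for all } x \in \mathcal{X}_0^s \subset \mathcal{X}_0 . \tag{4.8} \]

Moreover,

\[ |\!|\!|x|\!|\!|_{-a} = \|Tx\| \quad \text{for all } x \in \mathcal{X}. \tag{4.9} \]

(iv) If in addition (3.11) is satisfied, then with $p = s + \frac{r-s}{a+s}(\tilde a + s)$ the following estimates hold: for $s \le r \le a + 2s$,

\[ \|x\|_p \ge \underline{m}^{\frac{r-s}{a+s}}\, |\!|\!|x|\!|\!|_r \quad \text{for all } x \in \mathcal{X}_p \subset \mathcal{X}_r^s, \tag{4.10} \]

and for $-a \le r \le s$,

\[ \|x\|_p \le \underline{m}^{\frac{r-s}{a+s}}\, |\!|\!|x|\!|\!|_r \quad \text{for all } x \in \mathcal{X}_r^s \subset \mathcal{X}_p . \]

(v) $\{\mathcal{X}_{r+s}^s\}_{r\in\mathbb{R}}$ is the Hilbert scale induced by $A := L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^{s}$ over $\mathcal{X}_s$.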

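Proof. In view of Remark 4.3, the proof of (i) to (iv) follows from Proposition 3.2 and Proposition 3.6.

To prove (v) we show that $A$ is a densely defined, unbounded, selfadjoint, strictly positive operator on $\mathcal{X}_s$. First of all, any element $x \in D(A)$ has the representation $x = L^{-s}(B^*B)^{\frac{1}{2(a+s)}}z$ with $z \in \mathcal{X}$. Since $T$ is injective by assumption, the space $\{L^{s}x : x \in D(A)\} = R\big((B^*B)^{\frac{1}{2(a+s)}}\big)$ is dense in $\mathcal{X}$ and hence $D(A)$ is dense in $\mathcal{X}_s$. Next, for $x, y \in D(A)$,

\[ \langle Ax, y \rangle_s = \langle L^{s}L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^{s}x, L^{s}y \rangle_0 = \langle L^{s}x, (B^*B)^{-\frac{1}{2(a+s)}}L^{s}y \rangle_0 = \langle x, Ay \rangle_s . \]

Finally, with $\gamma$ as in (i), we obtain that

\[ \|Ax\|_s = \|L^{s}L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^{s}x\|_0 \ge \gamma\,\|L^{s}x\| = \gamma\,\|x\|_s . \]

It remains to be shown that the Hilbert scale induced by $A$ over $\mathcal{X}_s$ coincides with $\{\mathcal{X}_{r+s}^s\}$. For $r \in \mathbb{N}$ we have that

\[ \begin{aligned} D(A^r) &= D\big((L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^{s})^r\big) \\ &= D\big(L^{-s}(B^*B)^{-\frac{r}{2(a+s)}}L^{s}\big) \quad \text{as domain in } \mathcal{X}_s \\ &= D\big((B^*B)^{-\frac{r}{2(a+s)}}L^{s}\big) \quad \text{as domain in } \mathcal{X} \\ &= \mathcal{X}_{r+s}^s . \end{aligned} \]

For $r \in \mathbb{R}$ the assertion follows by spectral theory. □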

Remark 4.5 Applying (4.6), (4.7) with $r = s$ shows that the spaces $\mathcal{X}_s = \mathcal{X}_s^s$ coincide with identical norms (cf. Remark 4.3). However, in general $\mathcal{X}_r \not\subset \mathcal{X}_r^s$ for $r \ne s$; in particular, if $s < 0$, then in general $\mathcal{X}_0 \not\subset \mathcal{X}_0^s$, and the source condition $x^\dagger \in \mathcal{X}_u^s$ is usually stronger than the condition $x^\dagger \in R((T^*T)^{\frac{u}{2a}})$.

If, however, the norm equivalence (3.15) holds, then the spaces $\mathcal{X}_u$, $R((T^*T)^{\frac{u}{2a}})$, and $\mathcal{X}_u^s$ coincide for $-a \le u \le a + 2s$. In case only a weaker estimate (3.11) from below

holds, one has X su ⊂ R((T ∗T )

u2a ) ⊂ X s

u with u = aau as long as u ≤ a + 2s. The right

inequality even holds for u ≤ a+ 2s.

4.2 Linear Problems

In this section we investigate the regularizing properties of (iterative) regularization

methods in Hilbert scales for linear problems

Tx = y,

under the following (relaxed) assumptions.

Assumption 4.6 Let $T : \mathcal{X} \to \mathcal{Y}$ denote a bounded linear operator, and assume:

(L1) $Tx = y$ has a solution $x^\dagger$.

(L2) $\|Tx\| \le \overline{m}\,\|x\|_{-a}$ for all $x \in \mathcal{X}$ and some $a > 0$, $\overline{m} > 0$. Moreover, the extension of $T$ to $\mathcal{X}_{-a}$ (again denoted by $T$) is injective.

(L3) $B := TL^{-s}$ is such that $\|B\|_{\mathcal{X},\mathcal{Y}} \le 1$, where $-a/2 \le s \le 0$.

The existence of a solution (L1) will be needed for the convergence analysis in case the regularization parameter is chosen by the discrepancy principle. For the results under a-priori rules, only $y \in R(T) + R(T)^\perp$ is required. (L2) is Assumption 4.1 in Section 4.1, and (L3) is a simple scaling condition.


4.2.1 Semiiterative Regularization Methods in Hilbert Scales

A general semiiterative regularization method in Hilbert scales has the form (compare Section 2.2)

\[ x_k^\delta = \mu_{1,k}x_{k-1}^\delta + \dots + \mu_{k,k}x_0 + \omega_k L^{-2s}T^*(y^\delta - Tx_{k-1}^\delta), \quad k \ge 1, \qquad \sum_{i=1}^{k}\mu_{i,k} = 1, \quad \omega_k \ne 0. \tag{4.11} \]

As for Landweber iteration in Hilbert scales (cf. (4.1)), the only algorithmic difference to the standard iterations is that the residuals $T^*(y^\delta - Tx_k)$ are preconditioned with $L^{-2s}$. The iterates defined by (4.11) have the closed form representation

\[ x_k^\delta = x_0 + L^{-s}g_k(B^*B)B^*(y^\delta - Tx_0), \]

where $g_k$ is a polynomial of order $k - 1$ with $g_k(0) = 0$. Moreover, the following expressions for the propagated data error and the approximation error hold:

\[ x_k^\delta - x_k = L^{-s}g_k(B^*B)B^*(y^\delta - y), \qquad x_k - x^\dagger = L^{-s}r_k(B^*B)L^{s}(x_0 - x^\dagger), \tag{4.12} \]

where $r_k(\lambda) = 1 - \lambda g_k(\lambda)$. In order to ensure stability of the approximations $x_k^\delta$, the iteration (4.11) has to be stopped appropriately, e.g., according to the discrepancy principle (2.7)

\[ \|y^\delta - Tx_{k_*}^\delta\| \le \tau\delta < \|y^\delta - Tx_k^\delta\| , \quad 0 \le k < k_* \]

for some $\tau > \sup\{|r_k(\lambda)| : \lambda \in [0, 1]\}$. According to Theorem 1.2, the approximation quality of a regularization method is determined by its modulus of convergence $\omega_\mu$. Thus, we require

\[ \sup_{\lambda\in[0,1]} \lambda^{\mu}|r_k(\lambda)| =: \omega_\mu(k) \le c_\mu (k+1)^{-\sigma\mu}, \quad 0 \le \mu \le \mu_0, \tag{4.13} \]

to hold for some $\sigma > 0$ and $\mu_0 > 0$; we are especially interested in the cases $\sigma \in \{1, 2\}$ (cf. Theorem 2.2).

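For the proof of the main convergence statements, we will need the following lemma:

Lemma 4.7 Let the residual polynomial $r_k$ satisfy $|r_k| \le c_0$ for all $\lambda \in [0, 1]$. Then

\[ \lambda^{\mu}|g_k(\lambda)| \le 2c_0 k^{\sigma(1-\mu)} \quad \text{for all } 0 \le \mu \le 1,\ \lambda \in [0, 1]. \]

Proof. By $r_k(\lambda) = 1 - \lambda g_k(\lambda)$, we obtain for $0 \le \mu \le 1$ and $\lambda \ne 0$

\[ \lambda^{\mu}g_k(\lambda) = \lambda^{\mu-1}(1 - r_k(\lambda)) = [\lambda^{-1}(1 - r_k(\lambda))]^{1-\mu}\,[1 - r_k(\lambda)]^{\mu}. \]

Now, by the Mean Value Theorem and $r_k(0) = 1$, one can find a $\tilde\lambda \in [0, 1]$ such that

\[ \lambda^{-1}(1 - r_k(\lambda)) = -r_k'(\tilde\lambda), \]

which together with Markov's inequality ($|r_k'(\lambda)| \le 2c_0k^2$) and $|r_k(\lambda)| \le c_0$ for $\lambda \in [0, 1]$ yields

\[ \lambda^{\mu}g_k(\lambda) \le 2c_0k^{2(1-\mu)} \quad \text{for } \lambda \in [0, 1]. \]

Note that $c_0 \ge 1$, since $r_k(0) = 1$. □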
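As a plausibility check of the bound in Lemma 4.7, one can evaluate it for the Landweber polynomials $r_k(\lambda) = (1-\lambda)^k$, for which $c_0 = 1$ and $\sigma = 1$. The grid, the sampled values of $k$ and $\mu$, and the function name `g_landweber` are illustrative choices and not part of the text.

```python
import numpy as np

lam = np.linspace(1e-6, 1.0, 2001)

def g_landweber(k, lam):
    """Landweber filter g_k(lambda) = (1 - (1 - lambda)^k) / lambda."""
    return (1.0 - (1.0 - lam)**k) / lam

c0, sigma = 1.0, 1.0                  # |r_k| <= 1 on [0, 1]; Landweber has sigma = 1
worst = 0.0
for k in (1, 5, 50, 500):
    for mu in np.linspace(0.0, 1.0, 5):
        lhs = (lam**mu * g_landweber(k, lam)).max()
        worst = max(worst, lhs / (2.0 * c0 * k**(sigma * (1.0 - mu))))
print(worst)
```

The ratio of the left-hand side to the bound stays below one on the sampled grid. For Landweber iteration this also follows by hand from $1 - (1-\lambda)^k \le \min(1, k\lambda) \le (k\lambda)^{1-\mu}$.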

We are now in the position to state the main results:

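Proposition 4.8 Let Assumption 4.6 hold and let $x_k^\delta$ be defined by the semiiterative method (4.11) satisfying (4.13) for some $\mu_0 > 0$ and $0 < \sigma \le 2$. Additionally, assume that $x^\dagger - x_0 \in \mathcal{X}_u^s$, i.e.,

\[ x^\dagger - x_0 = L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}}w, \tag{4.14} \]

for some $w \in \mathcal{X}$ and $0 < u \le 2(a+s)\mu_0 - a$. Then

\[ \|x_k^\delta - x^\dagger\| \le c\,\big(\delta k^{\frac{\sigma a}{2(a+s)}} + k^{-\frac{\sigma u}{2(a+s)}}\|w\|\big). \]

Proof. Denote by $C_0 = \max\{c_\mu : 0 \le \mu \le \min(\mu_0, \frac{a+u}{2(a+s)})\}$. Using the source condition (4.14) and the representation (4.12), we get with (4.3), (4.14) and (4.13)

\[ |\!|\!|x_k - x^\dagger|\!|\!|_u = \big\|(B^*B)^{\frac{s-u}{2(a+s)}}L^{s}L^{-s}r_k(B^*B)(B^*B)^{\frac{u-s}{2(a+s)}}w\big\| \le \|r_k(B^*B)\|\,\|w\| \le C_0\|w\| . \]

Similarly, with (4.9) and (4.13), we derive

\[ |\!|\!|x_k - x^\dagger|\!|\!|_{-a} = \|Tx_k - Tx^\dagger\| = \big\|(B^*B)^{\frac{a+u}{2(a+s)}}r_k(B^*B)w\big\| \le C_0(k+1)^{-\frac{\sigma(a+u)}{2(a+s)}}\|w\| . \]

Now, the interpolation inequality (4.5) yields

\[ |\!|\!|x_k - x^\dagger|\!|\!|_0 \le |\!|\!|x_k - x^\dagger|\!|\!|_{-a}^{\frac{u}{a+u}}\, |\!|\!|x_k - x^\dagger|\!|\!|_u^{\frac{a}{a+u}} \le C_0(k+1)^{-\frac{\sigma u}{2(a+s)}}\|w\| . \]

Next, we estimate the propagated data error: similarly as above, we get with (4.9) for $-a \le r \le a + 2s$

\[ |\!|\!|x_k^\delta - x_k|\!|\!|_r = \big\|(B^*B)^{\frac{s-r}{2(a+s)}}g_k(B^*B)B^*(y^\delta - y)\big\| \le \delta\,\big\|(B^*B)^{\frac{a+2s-r}{2(a+s)}}g_k(B^*B)\big\| \le 2C_0\,\delta\,(k+1)^{\frac{\sigma(a+r)}{2(a+s)}}, \]

where we have used Lemma 4.7 with $\mu := \frac{a+2s-r}{2(a+s)}$, and $0 \le \mu \le 1$ for $-a \le r \le a + 2s$ since $s \ge -a/2$. Combining the estimates for the approximation error and the propagated data error, and using (4.8), i.e., $\|x\|_0 \le \overline{m}^{\frac{-s}{a+s}}\,|\!|\!|x|\!|\!|_0$ for $s \le 0$, now yields the assertion. □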


Remark 4.9 Under our assumptions, the source condition x† − x0 ∈ X su may be

stronger than the usual source condition x† − x0 ∈ Xu (cf. Proposition 3.6). As a

consequence, the usual restriction u ≤ a + 2s can be replaced by the weaker condition

0 < u ≤ 2(a+s)µ0−a (cf. also [71]). For µ0 = 1, this coincides with the usual restriction

u ≤ a + 2s. As already mentioned in Remark 4.5, the spaces Xu and X su coincide for

u ≤ a+2s if the stronger condition (3.15) holds. In case only the weaker estimate (3.11)

is valid, one still has X su ⊂ Xu, with u = (u − s) a+s

a+s+ s. In particular, since s ≤ 0, an

estimate (3.11) from below is only needed to interpret the source condition x†−x0 ∈ X su

in terms of the spaces $\mathcal{X}_s$.

As can be seen from the proof, the statement of Proposition 4.8 can be strengthened in several ways: in fact, we have even shown that

\[ |\!|\!|x_k^\delta - x^\dagger|\!|\!|_r \le C_r\big(\delta k^{\frac{\sigma(a+r)}{2(a+s)}} + k^{-\frac{\sigma(u-r)}{2(a+s)}}\|w\|\big) \]

holds for $-a \le r \le u$. However, in case $r > 0$, an additional restriction $r \le a + 2s$ is needed. Note also that the restriction $s \le 0$ was only used to derive the estimate in the original norm $\|\cdot\|$, in which we are actually interested. Hence, Proposition 4.8 can be generalized in another way, i.e., one has

\[ \|x_k^\delta - x^\dagger\|_r \le C_r\big(\delta k^{\frac{\sigma(a+r)}{2(a+s)}} + k^{-\frac{\sigma(u-r)}{2(a+s)}}\|w\|\big) \]

for $s \le r \le \min\{a + 2s, u\}$.

As an immediate consequence of Proposition 4.8, we have at least convergence if the iteration is stopped after $k_*(\delta)$ steps with $k_*(\delta) \to \infty$ and $\delta k_*^{\frac{\sigma a}{2(a+s)}} \to 0$. In order to derive convergence rates in terms of $\delta$, the number of iterations has to be bounded appropriately:

Theorem 4.10 Let the assumptions of Proposition 4.8 hold. If the iteration (4.11) is stopped according to the a priori rule $k_* \sim \delta^{-\frac{2(a+s)}{\sigma(a+u)}}$, then

\[ \|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}). \]

If, alternatively, the iteration is stopped according to the discrepancy principle (1.14), then

\[ \|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}) \quad \text{and} \quad k_* = O(\delta^{-\frac{2(a+s)}{\sigma(a+u)}}). \tag{4.15} \]
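The a priori rule is the choice that balances the two terms of the bound in Proposition 4.8. As a short check of the exponents (a worked computation, not part of the original argument), inserting $k_* \sim \delta^{-\frac{2(a+s)}{\sigma(a+u)}}$ gives:

```latex
\begin{align*}
\delta\,k_*^{\frac{\sigma a}{2(a+s)}}
  &\sim \delta\cdot\delta^{-\frac{2(a+s)}{\sigma(a+u)}\cdot\frac{\sigma a}{2(a+s)}}
   = \delta^{1-\frac{a}{a+u}} = \delta^{\frac{u}{a+u}},\\
k_*^{-\frac{\sigma u}{2(a+s)}}
  &\sim \delta^{\frac{2(a+s)}{\sigma(a+u)}\cdot\frac{\sigma u}{2(a+s)}}
   = \delta^{\frac{u}{a+u}}.
\end{align*}
```

Both terms are of the same order $\delta^{u/(a+u)}$, independently of $s$ and $\sigma$; only the stopping index depends on these two parameters.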

Proof. The first statement follows directly from Proposition 4.8. For the second, observe that by Proposition 4.4, $(x^\dagger - x_0) \in \mathcal{X}_u^s$ is equivalent to

\[ (x^\dagger - x_0) \in R\big(L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}}\big) = R\big((T^\sharp T)^{\frac{u-s}{2(a+s)}}\big), \]

where as above $T^\sharp = L^{-2s}T^*$ denotes the adjoint of (the extension of) $T$ with respect to the spaces $\mathcal{X}_s$ and $\mathcal{Y}$. Thus, for $\sigma = 1$ we obtain $k_* = k_*(\delta, y^\delta) = O(\delta^{-\frac{2(a+s)}{a+u}})$ by Theorem 6.5 in [28], when (4.11) is understood as the standard iteration for $T$ considered as an operator from $\mathcal{X}_s$ to $\mathcal{Y}$ and the iteration is stopped by the discrepancy principle (1.14). The estimates for $\sigma \ne 1$ follow similarly.

For the proof of the convergence rate, we proceed similarly as in the proof of Theorem 4.17 in [28]: by the interpolation inequality and (4.14), we can estimate $e_k = x_k - x^\dagger$ by

\[ |\!|\!|e_k|\!|\!|_0 = \big\|(B^*B)^{\frac{u}{2(a+s)}}r_k(B^*B)w\big\| \le c_0\,\|Br_k(B^*B)L^{s}(x^\dagger - x_0)\|^{\frac{u}{a+u}}\,\|w\|^{\frac{a}{a+u}} . \]

The discrepancy principle, (4.13), Proposition 4.8, and the bound on $k_*$ further yield that

\[ \|Br_{k_*}(B^*B)L^{s}(x^\dagger - x_0)\| = \|T(x_{k_*} - x^\dagger)\| \le (\tau - 1)\delta \]

holds for $u \le 2(a+s)\mu_0 - a$, and thus

\[ \|e_{k_*}\| \le c\,\delta^{\frac{u}{a+u}}\,\|w\|^{\frac{a}{a+u}} . \]

In a similar way as in Proposition 4.8 it hence follows with the estimate for $k_*$ that

\[ \|e_{k_*}^\delta\| \le \|x_{k_*}^\delta - x_{k_*}\| + \|e_{k_*}\| = O(\delta^{\frac{u}{a+u}}). \]

□

Remark 4.11 We conjecture that in analogy to Theorem 8.25 in [28] it is even possible to derive $o(\cdot)$-bounds in (4.15), which then would include the case $u = 0$ in our convergence analysis. If (3.15) is valid, the rates (4.15) are optimal (i.e., the best possible worst case error bounds under the given source condition). Note that the convergence rates do not depend on $s$, while the stopping index $k_*$ does. This suggests choosing $s$ as small as possible, i.e., $s = -a/2$, in which case the number of iterations is bounded by $k_* = O(\delta^{-\frac{a}{a+u}})$ for $x^\dagger \in R((T^*T)^{\frac{u}{2a}}) \cap \mathcal{X}_u^s$.

At this point, we want to discuss in more detail, in which sense the Hilbert scale

operator L−2s acts as a preconditioner:

Remark 4.12 For simplicity consider the standard Landweber iteration

\[ x_{k+1}^\delta = x_k^\delta + T^*(y^\delta - Tx_k^\delta). \]

If the operator $T$ is smoothing, e.g., if $T : \mathcal{X} \to \mathcal{Y}$ has a continuous extension to $\mathcal{X}_s$ for some $s < 0$, then by Proposition 4.4 the inclusion $R(T^*) \subset \mathcal{X}_{-s}$ holds, and thus $T^*(y^\delta - Tx_k^\delta) \in \mathcal{X}_{-s}$, i.e., the updates $x_{k+1}^\delta - x_k^\delta$ are smoother than actually needed. This can be exploited by preconditioning with $L^{-2s}$. If, e.g., $s = -a/2$ and if (3.15) holds, then the backprojection operator $L^{-2s}T^* = L^{a}T^*$ appearing in the preconditioned iteration

\[ x_{k+1}^\delta = x_k^\delta + L^{a}T^*(y^\delta - Tx_k^\delta) \]

is not smoothing any more; to be more specific, for $s = -a/2$ we obtain

\[ \|L^{a}T^*y\| \sim \|(T^*T)^{-1/2}T^*y\| \sim \|y\| . \]

This means that $L^{a}$ is an optimal preconditioner for $T^*$, and the iteration operator $M_a$ of Landweber iteration, appearing in the preconditioned normal equation

\[ M_a x := L^{a}T^*Tx = L^{a}T^*y^\delta , \]

has the same smoothing properties as the operator $T$ in the original equation $Tx = y$, while being selfadjoint as an operator on $\mathcal{X}_{-a/2}$. Moreover, if the operator $T$ is selfadjoint and if we use $(T^*T)^{-1/2}$ as a preconditioner, then the preconditioned Landweber iteration amounts to stationary Richardson iteration for the original equation $Tx = y$.

Note that it is not possible to choose $s < -a/2$, e.g., $s = -a$, in which case one would have $\|M_{2a}x\| = \|L^{2a}T^*Tx\| \sim \|x\|$. If $s < -a/2$, the iteration (4.11) is not even well-defined as an iteration in $\mathcal{X}$ for general $y^\delta \in \mathcal{Y}$, but only as an iteration in $\mathcal{X}_{-a}$.

4.2.2 The Conjugate Gradient Method in Hilbert Scales

The method of conjugate gradients is known as a powerful method for solving selfad-

joint, positive (semi-)definite linear problems. The question of applicability to ill-posed

problems, i.e., to the normal equations

T ∗Tx = T ∗y (4.16)

has first been addressed by Kammerer and Nashed [50]. For details of the convergence analysis in case of noisy data $y^\delta \ne y$ and for convergence rates we refer to [28, Chapter 7] and the references cited therein. We briefly summarize the main properties of the conjugate gradient method applied to the normal equations (cgne), see also Section 2.2:

First of all, cgne is a Krylov subspace method, hence the $k$-th iterate can be written as

\[ x_k^\delta = x_0 + g_k(T^*T; y^\delta)\,T^*(y^\delta - Tx_0). \]

Note that, in contrast to the (linear) semiiterative methods (2.10), the iteration polynomials $g_k$ themselves depend on the data $y^\delta$, which makes cgne a nonlinear method. The main convergence result is that cgne is an order-optimal regularization method when stopped according to the discrepancy principle (1.14), i.e., the optimal rates

\[ \|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{2\mu}{2\mu+1}}) \]

hold. Another remarkable property of cgne, which alternatively characterizes the method, is that the iterates $x_k^\delta$ satisfy the following optimality condition (cf. [28, Theorem 7.3]):

\[ \|y^\delta - Tx_k^\delta\| = \min\{\|y^\delta - Tx\| : x - x_0 \in K_k(T^*T, T^*(y^\delta - Tx_0))\}, \]


which implies that cgne is the method with minimal stopping index $k_*$ among all Krylov-subspace
methods, in particular, among all semiiterative regularization methods stopped
according to the discrepancy principle (2.7). As we will show below, this optimality
property carries over to the preconditioned version. It is not clear a priori
whether cgne in Hilbert scales still yields (optimal) convergence rates in the usual space $\mathcal{X}$,
and whether the preconditioning will actually reduce the number of iterations compared to
cgne in the standard spaces. The reason for that is the following (cf. [28, Theorem
7.14]): in case $T$ is compact and a decay rate for the singular values of the operator
$T$ is known, it is possible to derive stronger bounds on the size of the stopping index
$k(\delta, y^\delta)$, i.e., if the singular values $\sigma_n$ of $T$ decay like $O(n^{-\alpha})$ for some $\alpha > 0$, then

$$k(\delta, y^\delta) = O\big(\delta^{-\frac{1}{(2\mu+1)(\alpha+1)}}\big), \tag{4.17}$$

and if the singular values decay like $O(q^n)$ with some $q < 1$, then

$$k(\delta, y^\delta) = O(1 + |\log\delta|). \tag{4.18}$$

As we will see below, the operator $B = TL^{-s}$ determines the behavior of the preconditioned
iteration. However, in case $s < 0$, the singular values $\sigma_n$ of $B$ decay at a slower
rate than the ones of $T$, hence the term $(\alpha + 1)$ in (4.17) will in general decrease with
decreasing $s$.

The aim of the subsequent analysis is twofold: first, we want to show that in the case of
preconditioning it is still possible to lift the convergence rates from $\mathcal{X}_s$ to the original
space $\mathcal{X}$; secondly, we show that the bounds on the number of iterations (4.17), (4.18) can be
further improved when the iteration is preconditioned in Hilbert scales. In the sequel, we
consider the following (preconditioned) Hilbert scale version of the cgne method:

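Algorithm 4.13 (hscgne)

$x_0 = x_*$; $d_0 = y^\delta - Tx_0$; $w_0 = T^*d_0$; $p_1 = s_0 = L^{-2s}w_0$

for $k = 1, 2, \ldots$, unless $s_{k-1} = 0$, compute

    $q_k = Tp_k$
    $\alpha_k = \langle s_{k-1}, w_{k-1}\rangle / \|q_k\|^2$
    $x_k = x_{k-1} + \alpha_k p_k$
    $d_k = d_{k-1} - \alpha_k q_k$
    $w_k = T^*d_k$
    $s_k = L^{-2s}w_k$
    $\beta_k = \langle s_k, w_k\rangle / \langle s_{k-1}, w_{k-1}\rangle$
    $p_{k+1} = s_k + \beta_k p_k$

end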


In case $L = I$, Algorithm 4.13 reduces to the standard cgne algorithm (cf. [28]).
The following convergence analysis of the hscgne method follows the one of cgne
presented in [28, Chapter 7].
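To make the recursion of Algorithm 4.13 concrete, the following is a minimal Python sketch under simplifying assumptions: the setting is finite-dimensional, $T$ is a real matrix stored as a list of rows, and the action of $L^{-2s}$ is passed in as a function. The names `hscgne`, `apply_Lm2s`, `tau` and `max_iter`, the built-in discrepancy test, and the diagonal toy operator are illustrative choices and are not taken from the thesis.

```python
import math

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def hscgne(T, y_delta, apply_Lm2s, x0, delta=0.0, tau=1.5, max_iter=50):
    """Sketch of Algorithm 4.13 (hypothetical finite-dimensional version).

    apply_Lm2s(v) must return L^{-2s} v. The loop stops when s_k = 0, when
    ||y_delta - T x_k|| <= tau * delta (assumed form of the discrepancy
    principle), or after max_iter steps.
    """
    Tt = [list(col) for col in zip(*T)]                      # T^* as transpose
    x = list(x0)
    d = [yi - ti for yi, ti in zip(y_delta, matvec(T, x))]   # d_0
    w = matvec(Tt, d)                                        # w_0 = T^* d_0
    s = apply_Lm2s(w)                                        # s_0 = L^{-2s} w_0
    p = list(s)                                              # p_1 = s_0
    sw = dot(s, w)                                           # <s_0, w_0>
    k = 0
    while k < max_iter and sw > 0.0 and math.sqrt(dot(d, d)) > tau * delta:
        q = matvec(T, p)                                     # q_k = T p_k
        alpha = sw / dot(q, q)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        d = [di - alpha * qi for di, qi in zip(d, q)]
        w = matvec(Tt, d)
        s = apply_Lm2s(w)
        sw_new = dot(s, w)
        beta = sw_new / sw
        p = [si + beta * pi for si, pi in zip(s, p)]
        sw = sw_new
        k += 1
    return x, k

# Toy problem (assumption): T = diag(1/i), L = diag(i), s = -1/2, exact data.
n = 8
T = [[1.0 / (i + 1) if i == j else 0.0 for j in range(n)] for i in range(n)]
x_true = [1.0 / (i + 1) for i in range(n)]
y = matvec(T, x_true)
precond = lambda v: [(i + 1) * vi for i, vi in enumerate(v)]  # L^{-2s} = L here
x_rec, steps = hscgne(T, y, precond, [0.0] * n, max_iter=n)
```

Passing the identity for `apply_Lm2s` gives the standard cgne recursion, which mirrors the remark that the algorithm reduces to cgne for $L = I$.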

As already observed in Section 4.2.1, Algorithm 4.13 can be viewed as standard

cgne iteration for Tx = y when T is considered as operator on Xs. This observation

immediately yields

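Corollary 4.14 Let Assumption 4.6 hold, let $x_k^\delta$ be defined by Algorithm 4.13, and let
$k(\delta, y^\delta)$ be determined by the discrepancy principle (1.14). If $x_0 - x^\dagger \in \mathcal{X}_u^s$, then with
$\mu = \frac{u-s}{2(a+s)}$,

$$\|x_{k(\delta,y^\delta)}^\delta - x^\dagger\|_s = O\big(\delta^{\frac{2\mu}{2\mu+1}}\big). \tag{4.19}$$

Moreover, the following optimality condition holds for the iterates $x_k^\delta$:

$$\|y^\delta - Tx_k^\delta\| = \min\{\|y^\delta - Tx\| : x - x_0 \in \mathcal{K}_k(L^{-2s}T^*(y^\delta - Tx_0), L^{-2s}T^*T)\}. \tag{4.20}$$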

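Proof. We assume for brevity that $x_0 = 0$. Then (4.19) follows directly from Theorem 2.3
by observing that Algorithm 4.13 corresponds to cgne for $Bz = y$ with
$z = L^sx$, and that $\mathcal{X}_u^s = \mathcal{R}((T^\sharp T)^{\frac{u-s}{2(a+s)}})$ (cf. Remark 4.3). Secondly, with $Tx_k^\delta = Bz_k^\delta$ it
follows from [28, Theorem 7.3] that

$$\begin{aligned}
\|y^\delta - Tx_k^\delta\| &= \|y^\delta - Bz_k^\delta\| \\
&= \min\{\|y^\delta - Bz\| : z - z_0 \in \mathcal{K}_k(B^*(y^\delta - Bz_0), B^*B)\} \\
&= \min\{\|y^\delta - Tx\| : x - x_0 \in \mathcal{K}_k(L^{-2s}T^*(y^\delta - Tx_0), L^{-2s}T^*T)\}.
\end{aligned}$$

□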

We will now show that, as for linear semiiterative methods, the convergence rates in
$\mathcal{X}_s$ can be lifted to the usual space $\mathcal{X}$. In the following, let $\kappa$ denote the smallest index
with $s_\kappa = 0$, respectively $\kappa = +\infty$ if $s_k \neq 0$ for all $k > 0$. By $g_k$, $r_k$ we denote the
iteration polynomials and residual polynomials, respectively, of the hscgne method. For the proof of the
main convergence theorem, we will require some auxiliary results (cf. [28, Section 7]):

Lemma 4.15 Let Assumption 4.6 and (4.14) hold and $|||x^\dagger - x_0|||_u \le \rho$. Then, for
$0 < k \le \kappa$,

$$\|y^\delta - Tx_k^\delta\| \le \delta + c\,|r_k'(0)|^{-\frac{u+a}{2(a+s)}}\rho.$$

Proof. Let $z = L^sx$ and rewrite $Tx = y$ as $Bz = y$ with $B$ as in Assumption 4.6. Now,
with (4.14), $z^\dagger - z_0 = L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w$. The result then follows analogously
to Lemma 7.10 in [28] by replacing $T$, $x$ and $\mu$ by $B$, $z$ and $\frac{u-s}{2(a+s)}$, respectively. □

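Lemma 4.16 Under the assumptions of Lemma 4.15

$$\|x_k^\delta - x^\dagger\| \le c\,\Big(\rho^{\frac{a}{u+a}}\,\delta_k^{\frac{u}{u+a}} + |r_k'(0)|^{\frac{a}{2(a+s)}}\,\delta_k\Big),$$

where $\delta_k := \max(\|Tx_k^\delta - y^\delta\|, \delta)$.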

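Proof. Let $E_\lambda$ denote the spectral family associated with the operator $B^*B$ and
let $z = L^sx$ be as above. Then we have, cf. [28, Lemma 7.11],

$$\begin{aligned}
\|x_k^\delta - x^\dagger\| &= \|L^{-s}(z_k^\delta - z^\dagger)\| \le \|(B^*B)^{\frac{s}{2(a+s)}}(z_k^\delta - z^\dagger)\| \\
&\le \|E_\varepsilon(B^*B)^{\frac{s}{2(a+s)}}r_k(B^*B)(B^*B)^{\frac{u-s}{2(a+s)}}w\| \\
&\quad + \|E_\varepsilon(B^*B)^{\frac{s}{2(a+s)}}g_k(B^*B)B^*(y - y^\delta)\| + \varepsilon^{-\frac{a}{2(a+s)}}\|y - Bz_k^\delta\| \\
&\le \|\lambda^{\frac{u}{2(a+s)}}r_k(\lambda)\|_{C[0,\varepsilon]}\,\rho + \|\lambda^{\frac{a+2s}{2(a+s)}}g_k(\lambda)\|_{C[0,\varepsilon]}\,\delta + \varepsilon^{-\frac{a}{2(a+s)}}(\|y^\delta - Tx_k^\delta\| + \delta).
\end{aligned}$$

Similarly as in the proof of Lemma 7.11 in [28], we have for $\varepsilon < \lambda_{1,k}$ (where $\{\lambda_{i,k}\}_{i=1,\ldots,k}$
denotes the strictly increasing sequence of real roots of the polynomial $r_k$) that $r_k$ is
convex in $[0, \varepsilon]$ and hence for $\lambda \in [0, \varepsilon]$

$$0 \le \lambda^{\frac{a+2s}{a+s}}g_k^2(\lambda) = \left|\frac{1 - r_k(\lambda)}{\lambda}\right|^{\frac{a}{a+s}}|1 - r_k(\lambda)|^{\frac{a+2s}{a+s}} \le |r_k'(0)|^{\frac{a}{a+s}}.$$

Therefore,

$$\|x_k^\delta - x^\dagger\| \le \varepsilon^{\frac{u}{2(a+s)}}\rho + |r_k'(0)|^{\frac{a}{2(a+s)}}\delta + 2\varepsilon^{-\frac{a}{2(a+s)}}\delta_k =: f(\varepsilon).$$

Note that $f(\varepsilon)$ is monotonically decreasing in $(0, \varepsilon_*)$ and increasing in $(\varepsilon_*, \infty)$, where $\varepsilon_*$
is defined by

$$\varepsilon_*^{\frac{u+a}{2(a+s)}} = \frac{2a}{u}\,\frac{\delta_k}{\rho}.$$

The rest of the proof follows the lines of the one of Lemma 7.11 in [28]. □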

Combining the previous lemmata yields the following convergence rates result:

Theorem 4.17 Let Assumption 4.6 hold and let Algorithm 4.13 be stopped according
to the discrepancy principle (1.14) with $k_* = k(\delta, y^\delta)$. If $x_0 - x^\dagger \in \mathcal{X}_u^s$, i.e.,
(4.14) holds, then

$$\|x_{k_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big).$$

Proof. By Lemma 4.15 with $k = k(\delta, y^\delta)$ and with $\rho$ such that $|||x^\dagger - x_0|||_u \le \rho$, we
have

$$|r_{k-1}'(0)| \le c\Big(\frac{\rho}{\delta}\Big)^{\frac{2(a+s)}{u+a}}.$$

The result now follows with obvious modifications of the proof of Theorem 7.12 in [28]
by replacing $T$, $x$, and $\frac{2}{2\mu+1}$ by $B$, $z$, and $\frac{2(a+s)}{u+a}$, respectively. □

Remark 4.18 Similarly as for linear semiiterative methods, the convergence rates in

Theorem 4.17 can be strengthened: in fact, one easily derives the same rate in the norm

||| · |||0. In view of Remark 4.9, it is even possible to derive the corresponding rates for

||| · |||r for −a ≤ r ≤ a+2s. Furthermore, these rates are (order) optimal under the given

conditions. Like Landweber iteration, and in contrast to general semiiterative methods,

hscgne has no saturation, i.e., Theorem 4.17 holds for all u > 0.


Next we show that like for standard cgne, improved bounds on the number of

iterations, depending on the ill-posedness of the operators B = TL−s (cf. Theorem 2.4)

hold:

Theorem 4.19 Let Assumption 4.6 and (4.14) hold, and let $B$ be compact. If the singular
values $\sigma_n$ of $B$ decay like $O(n^{-\alpha})$ with some $\alpha > 0$, then

$$k(\delta, y^\delta) = O\big(\delta^{-\frac{a+s}{(a+u)(\alpha+1)}}\big).$$

If the singular values decay like $O(q^n)$ with some $q < 1$, then

$$k(\delta, y^\delta) = O(1 + |\log\delta|).$$

Proof. The proof follows the lines of the proof of Theorem 7.14 in [28]; we only
mention the main differences: for given $y^\delta$, denote by $r_k$ the residual polynomials of
cgne applied to $Bz = y$ with $z = L^sx$ and, for simplicity, $x_0 = z_0 = 0$, in which case we
have

$$y^\delta - Tx_k^\delta = r_k(BB^*)y^\delta.$$

Together with (4.20) this implies that

$$\|y^\delta - Tx_k^\delta\| \le \|p_k(BB^*)y^\delta\|$$

holds for arbitrary polynomials $p_k$ with $p_k(0) = 1$. The rest follows the lines of the
proof of Theorem 7.14 in [28]. □

Remark 4.20 If the stronger condition (3.15) holds, and $0 < u \le a + 2s$, then one
can compare the above estimate with the ones of Theorem 2.4 in the following way: by
Proposition 3.6 we have in this case $(B^*B)^{\frac{\nu}{2}} \sim L^{-\nu(a+s)}$ and $(T^*T)^{\frac{\nu}{2}} \sim L^{-\nu a}$ for $|\nu| \le 1$.
Consequently, if the singular values of $T$ decay like $O(n^{-\alpha})$, the corresponding singular
values of $B$ decay like $O(n^{-\tilde\alpha})$ with $\tilde\alpha = \alpha\,\frac{a+s}{a}$, which can be seen in the following way:
let $A_1$ and $A_2$ be two compact, selfadjoint, spectrally equivalent operators, i.e.,

$$\underline{c}\,\langle A_1x, x\rangle \le \langle A_2x, x\rangle \le \overline{c}\,\langle A_1x, x\rangle \qquad \forall x \in \mathcal{X},$$

with eigenvalues $\{\lambda_n(A_1)\}_{n\in\mathbb{N}}$, $\{\lambda_n(A_2)\}_{n\in\mathbb{N}}$ sorted in decreasing order. Then by a characterization
of the $n$-th eigenvalue due to Courant and Fischer [19], one has

$$\lambda_n(A_1) = \sup_{V_n \subset \mathcal{X}}\ \inf_{x \in V_n\setminus\{0\}} \frac{\langle A_1x, x\rangle}{\|x\|^2} \le c\,\sup_{V_n \subset \mathcal{X}}\ \inf_{x \in V_n\setminus\{0\}} \frac{\langle A_2x, x\rangle}{\|x\|^2} = c\,\lambda_n(A_2).$$

Here $V_n$ denote arbitrary subspaces of $\mathcal{X}$ with $\dim(V_n) = n$. Hence the eigenvalues of
$A_1$ and $A_2$ decay at the same rate. The estimate for the singular values of $T$ and $B$
then follows easily.


Under condition (3.15), $\mathcal{X}_u^s = \mathcal{R}((T^*T)^\mu)$ with $\mu = \frac{u}{2a}$ (cf. Remark 4.18), which
implies that

$$k(\delta, y^\delta; s) \le O(\delta^{-f(s)}), \qquad\text{with}\quad f(s) = \frac{a+s}{(a+u)\big(\alpha\frac{a+s}{a} + 1\big)}.$$

Note that $f(s)$ is strictly monotonically increasing in $s$; thus, for $s < 0$, the number of
hscgne iterations needed to reach the discrepancy stopping criterion can be expected
to be smaller than that for standard cgne. For $s = 0$ (no preconditioning) we have

$$f(0) = \frac{a}{(a+u)(\alpha+1)} = \frac{1}{(2\mu+1)(\alpha+1)},$$

which coincides with the estimate of Theorem 2.4.
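The effect of $s$ on the exponent $f(s)$ can be checked numerically. The following short Python sketch evaluates the formula from Remark 4.20; the function name `iteration_exponent` and the sample values of $a$, $u$ and $\alpha$ are illustrative assumptions.

```python
def iteration_exponent(s, a, u, alpha):
    """Exponent f(s) from Remark 4.20, so that k(delta) = O(delta**(-f(s)))."""
    alpha_tilde = alpha * (a + s) / a      # decay rate of the singular values of B
    return (a + s) / ((a + u) * (alpha_tilde + 1.0))

# Illustrative values (assumptions): a = 1, u = 1, alpha = 2.
a, u, alpha = 1.0, 1.0, 2.0
f_std = iteration_exponent(0.0, a, u, alpha)       # s = 0, no preconditioning
f_pre = iteration_exponent(-a / 2, a, u, alpha)    # s = -a/2
mu = u / (2 * a)                                   # smoothness index for s = 0
```

For these values `f_std` agrees with $1/((2\mu+1)(\alpha+1))$, and `f_pre` is smaller, in line with the monotonicity of $f$.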

Summarizing the results of this section, we have shown that for linear inverse problems
$Tx = y$, preconditioning semiiterative methods or cgne in Hilbert scales yields
order-optimal regularization methods under Assumption 4.6. Furthermore, the number
of iterations needed to obtain the optimal rates is reduced substantially in comparison
to the standard methods. This will be illustrated in several numerical examples in
Chapter 5.

4.3 Nonlinear Problems

As we have seen in Section 2.2, additional conditions are required for the convergence

analysis of iterative regularization methods for nonlinear inverse problems. We will

now turn to a discussion of adequate conditions for the preconditioned iterations and

investigate the convergence of iterative regularization methods in Hilbert scales in detail.

Before we state our assumptions for the convergence analysis of preconditioned

iterative regularization methods for nonlinear problems

F (x) = y,

we briefly recall the conditions and results for the standard iterations. Below, we will
investigate preconditioning of Landweber iteration

$$x_{k+1}^\delta = x_k^\delta + F'(x_k^\delta)^*[y^\delta - F(x_k^\delta)], \tag{4.21}$$

and of a class of regularized Newton-type iterations of the form

$$x_{n+1}^\delta = x_0 + g_{\alpha_n}(F'(x_n^\delta)^*F'(x_n^\delta))\,F'(x_n^\delta)^*[y^\delta - F(x_n^\delta) + F'(x_n^\delta)(x_n^\delta - x_0)]. \tag{4.22}$$

Convergence of iterative regularization methods for nonlinear inverse problems is usually
derived under the following (or similar) nonlinearity conditions (cf. [49] for details and
further references)

$$F'(\tilde x) = R(\tilde x, x)F'(x) + Q(\tilde x, x), \tag{4.23}$$

with

$$\|I - R(\tilde x, x)\| \le C_R < 1 \quad\text{and}\quad \|Q(\tilde x, x)\| \le C_Q\|F'(x)(\tilde x - x)\| \tag{4.24}$$

for all $x, \tilde x \in \mathcal{B}_\rho(x^\dagger)$.

It is clear that if the iterations (4.21) or (4.22) are preconditioned
by $L^{-2s}$, the operators $L^{-2s}$ will also appear in the nonlinearity conditions.
We now state and discuss the appropriate conditions in more detail:

4.3.1 Basic Assumptions

Similarly to the conditions in Assumption 4.6 for linear problems, we require:

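Assumption 4.21

(N1) $F : \mathcal{D}(F)(\subset \mathcal{X}) \to \mathcal{Y}$ is continuous and Fréchet-differentiable in $\mathcal{X}$.

(N2) $F(x) = y$ has a solution $x^\dagger$.

(N3) $\|F'(x^\dagger)x\| \le \overline{m}\|x\|_{-a}$ for all $x \in \mathcal{X}$, some $a > 0$, and $\overline{m} > 0$. Moreover, the
extension of $F'(x^\dagger)$ to $\mathcal{X}_{-a}$ is injective.

(N4) $B := F'(x^\dagger)L^{-s}$ is such that $\|B\|_{\mathcal{X},\mathcal{Y}} \le 1$, where $s \ge -a/2$.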

Note that under Assumption 4.21, the results of Proposition 3.6 and 4.4 hold verba-

tim for the linearized operator T := F ′(x†). For the following convergence rates analysis

for nonlinear problems in Hilbert scales, we need a smoothness condition on the solution

x† and additional conditions on the Frechet-derivative of F :

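Assumption 4.22

(N5) $x_0 \in \mathcal{B}_\rho(x^\dagger) := \{x \in \mathcal{X} : x - x^\dagger \in \mathcal{X}_0^s \wedge |||x - x^\dagger|||_0 \le \rho\} \subset \mathcal{D}(F)$ for some $\rho > 0$.

(N6) For all $x \in \mathcal{B}_\rho(x^\dagger)$ there exist linear operators $R(x, x^\dagger)$ and $Q(x, x^\dagger)$ such that

$$F'(x) = R(x, x^\dagger)F'(x^\dagger) + Q(x, x^\dagger),$$

with

$$\|I - R(x, x^\dagger)\| \le C_R < 1, \qquad \|Q(x, x^\dagger)\|_{\mathcal{X}_{-b}^s,\mathcal{Y}} \le C_Q\,|||x - x^\dagger|||_{b-a},$$

for some $b \in [0, a]$, $\beta \in (0, 1]$, and $C_R, C_Q > 0$ independent of $x$.

(N7) $x^\dagger - x_0 \in \mathcal{X}_u^s$ for some $0 < u \le b + 2s$, i.e., there exists an element $w \in \mathcal{X}$ so that

$$L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w. \tag{4.25}$$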


Before we start our analysis we want to discuss the conditions above.

Remark 4.23 Condition (N6) is very similar to the nonlinearity conditions (4.23),
(4.24) used in [23, 47, 48] for the convergence analysis of Newton-type regularization
methods and (with $Q = 0$) for Landweber iteration in [39]. In fact, if (3.15) and (4.23),
(4.24) hold, i.e., $\|Q(x, x^\dagger)\|_{\mathcal{X},\mathcal{Y}} \le C_Q\|F'(x^\dagger)(x - x^\dagger)\|$, then for $b = 0$ we have

$$\|Q(x, x^\dagger)\|_{\mathcal{X}_{-b}^s,\mathcal{Y}} \le c\,\|Q(x, x^\dagger)\|_{\mathcal{X},\mathcal{Y}} \le C\,\|F'(x^\dagger)(x - x^\dagger)\| = C\,|||x - x^\dagger|||_{-a}.$$

Thus, in this case (N6) with $b = 0$ is implied by the conditions in [48]. We just mention
that, with minor modifications, our results hold true if we replace the estimate on
$Q(x, x^\dagger)$ by $\|Q(x, x^\dagger)\|_{\mathcal{X}_{-b}^s,\mathcal{Y}} \le C_Q\|x - x^\dagger\|_c^\beta$, where $c$ and $\beta$ will affect the range of
values for $u$ for which the results actually hold, cf. [26] for details, where the condition

$$\|F'(x^\dagger) - F'(x)\|_{\mathcal{X}_{-b}^s,\mathcal{Y}} \le c_\beta\,|||x^\dagger - x|||_0^\beta$$

has been used instead. Now, assume that $a = b$ or that $C_Q = 0$ and that a stronger
estimate $\|I - R(x, x^\dagger)\| \le C_R\,|||x - x^\dagger|||_0$ holds; then (N6) reduces to

$$\|F'(x^\dagger) - F'(x)\|_{\mathcal{X}_{-a}^s,\mathcal{Y}} \le C\,|||x^\dagger - x|||_0.$$

Due to (4.9), the operator $F'(x^\dagger)$ has a continuous extension to $\mathcal{X}_{-a}^s \supset \mathcal{X}_{-a}$. Therefore,
condition (N6) implies that $F'(x)$ also has a continuous extension to $\mathcal{X}_{-a}^s \supset \mathcal{X}$ in a
neighborhood of $x^\dagger$. By definition of the space $\mathcal{X}_{-a}^s$, this condition is equivalent to

$$\|(B^*B)^{-\frac{a+s}{2(a+s)}}L^{-s}(F'(x^\dagger)^* - F'(x)^*)\|_{\mathcal{Y},\mathcal{X}} \le C\,|||x^\dagger - x|||_0. \tag{4.26}$$

By virtue of (3.10) and Proposition 4.4 (iii), this implies that $L^{-2s}F'(x)^*$ maps $\mathcal{Y}$ at
least into $\mathcal{X}_{a+2s}^s \subset \mathcal{X}_{a+2s}$ and hence $F'(x)^*$ maps $\mathcal{Y}$ at least into $\mathcal{X}_a$. Observe that for
$s = 0$ and under the above assumptions (N6) reduces to

$$\|(F'(x^\dagger)^*F'(x^\dagger))^{-\frac12}(F'(x^\dagger)^* - F'(x)^*)\|_{\mathcal{Y},\mathcal{X}} \le c\,\|x^\dagger - x\|_0,$$

cf. [39, (3.18)]. Moreover, this condition is equivalent to (N6) with $C_Q = 0$ and $\|R_x - I\|$
replaced by $\|(R_x - I)P\|$, where $P$ is the orthogonal projector from $\mathcal{Y}$ onto $\mathcal{R}(F'(x^\dagger))$.

Condition (N7) finally is a smoothness condition for the exact solution corresponding
to (4.14) for linear problems. Under the usual assumption for regularization in Hilbert
scales, i.e., $\|F'(x)h\| \sim \|h\|_{-a}$, this condition is equivalent to $x^\dagger - x_0 \in \mathcal{X}_u$, i.e., the
source condition

$$L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w$$

can be interpreted in terms of the Hilbert scale $\mathcal{X}_s$. If $b = a$, then $u \le a + 2s$ is
allowed, which is the usual restriction for regularization in Hilbert scales. For $s = 0$ and
if (3.15) is valid, $u \le a + 2s$ reduces to $\mu \le 1/2$.


4.3.2 Landweber Iteration in Hilbert Scales

We now turn to the analysis of preconditioned Landweber iteration in Hilbert scales,
namely

$$x_{k+1}^\delta = x_k^\delta + L^{-2s}F'(x_k^\delta)^*(y^\delta - F(x_k^\delta)), \qquad k = 0, 1, 2, \ldots. \tag{4.27}$$

As in the linear case, (4.27) can be interpreted as standard Landweber iteration with
$F$ considered as operator on $\mathcal{X}_s$. For $s = 0$ (no preconditioning) the iterations coincide.
For a comparison of our convergence results to those of the standard methods and
Landweber iteration in Hilbert scales with $s > 0$ under the stronger assumption

$$\underline{m}\|h\|_{-a} \le \|F'(x^\dagger)h\| \le \overline{m}\|h\|_{-a}, \tag{4.28}$$

(cf. (3.15)), we refer to Sections 2.2.4 and 3.3, and the references cited there.
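The iteration (4.27) can be tried out on a small model problem. The sketch below is a toy illustration under strong assumptions that are not from the thesis: $\mathcal{X} = \mathcal{Y} = \mathbb{R}^n$, $L = \mathrm{diag}(1, \ldots, n)$, a diagonal, mildly nonlinear map $F(x)_i = \sigma_i(x_i + 0.1x_i^2)$ with $\sigma_i = 0.8\,i^{-a}$ (so that $\|F'(x^\dagger)L^{-s}\| \le 1$ for $s = -a/2$), deterministic noise of norm exactly $\delta$, and a discrepancy stopping rule with $\tau = 2.1$. All function and variable names are hypothetical.

```python
import math

def F(x, sigma):
    """Toy nonlinear forward map (assumption): F(x)_i = sigma_i * (x_i + 0.1 * x_i**2)."""
    return [s * (xi + 0.1 * xi * xi) for s, xi in zip(sigma, x)]

def dF_adjoint(x, sigma, r):
    """F'(x)^* r for the diagonal toy map above."""
    return [s * (1.0 + 0.2 * xi) * ri for s, xi, ri in zip(sigma, x, r)]

def hs_landweber(y_delta, sigma, s, delta, tau=2.1, max_iter=100000):
    """Iteration (4.27) with L = diag(1, ..., n), stopped by the discrepancy principle."""
    n = len(y_delta)
    weights = [float(i + 1) ** (-2.0 * s) for i in range(n)]   # action of L^{-2s}
    x = [0.0] * n                                              # x_0 = 0
    for k in range(max_iter):
        r = [yi - fi for yi, fi in zip(y_delta, F(x, sigma))]
        if math.sqrt(sum(ri * ri for ri in r)) <= tau * delta:
            return x, k
        g = dF_adjoint(x, sigma, r)
        x = [xi + wi * gi for xi, wi, gi in zip(x, weights, g)]
    return x, max_iter

n, a, delta = 20, 1.0, 1e-2
sigma = [0.8 * (i + 1) ** (-a) for i in range(n)]
x_true = [(i + 1) ** (-1.5) for i in range(n)]
noise = [delta / math.sqrt(n) * (-1) ** i for i in range(n)]   # ||noise|| = delta
y_delta = [yi + ni for yi, ni in zip(F(x_true, sigma), noise)]

x_std, k_std = hs_landweber(y_delta, sigma, 0.0, delta)        # standard Landweber, s = 0
x_pre, k_pre = hs_landweber(y_delta, sigma, -a / 2, delta)     # preconditioned, s = -a/2
```

In this toy setting the preconditioned run reaches the discrepancy level in fewer steps than the run with $s = 0$, because the weights $i^{-2s}$ enlarge the step in the slowly converging components.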

For the proof of the main convergence results, we will need the following two lem-

mata:

Lemma 4.24 ([68, Lemma 2.9]) Let $\mu, \nu > 0$. Then there exists a positive constant
$c_{\mu,\nu}$ independent of $k$ such that

$$\sum_{j=0}^{k-1}(k-j)^{-\mu}(j+1)^{-\nu} \le c_{\mu,\nu}\,(k+1)^{1-\mu-\nu}\begin{cases} 1, & \max(\mu,\nu) < 1, \\ \ln(k+1), & \max(\mu,\nu) = 1, \\ (k+1)^{\max(\mu,\nu)-1}, & \max(\mu,\nu) > 1. \end{cases}$$
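The bound of Lemma 4.24 can be probed numerically. The following Python sketch computes the sum on the left-hand side and the growth factor on the right-hand side (without the constant $c_{\mu,\nu}$); the function names `lemma_sum` and `lemma_rate` and the sample exponents are illustrative assumptions.

```python
import math

def lemma_sum(k, mu, nu):
    """Left-hand side of Lemma 4.24: sum_{j=0}^{k-1} (k-j)^(-mu) * (j+1)^(-nu)."""
    return sum((k - j) ** (-mu) * (j + 1) ** (-nu) for j in range(k))

def lemma_rate(k, mu, nu):
    """Right-hand side of Lemma 4.24 without the constant c_{mu,nu}."""
    m = max(mu, nu)
    base = (k + 1) ** (1.0 - mu - nu)
    if m < 1.0:
        return base
    if m == 1.0:
        return base * math.log(k + 1)
    return base * (k + 1) ** (m - 1.0)

# Ratios lhs / rhs for growing k; the lemma asserts that they stay bounded.
ratios = [lemma_sum(k, 0.5, 0.5) / lemma_rate(k, 0.5, 0.5) for k in (1, 10, 100, 1000)]
```

For $\mu = \nu = 1/2$ the sum is a Riemann sum for $\int_0^1 (t(1-t))^{-1/2}\,dt = \pi$, so the ratios remain below $\pi$.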

The second lemma provides two estimates of eδk := xδk − x† in terms of k and δ (cf.

Proposition 4.8 for the linear case).

Lemma 4.25 Let Assumptions 4.21 and 4.22 hold and $y^\delta \in \mathcal{Y}$ satisfy $\|y^\delta - y\| \le \delta$.
Moreover, let $k_* = k_*(\delta, y^\delta)$ be chosen according to the stopping rule (2.13) with $\tau > 2$,
and assume that $|||e_j^\delta|||_0 \le \rho$ for all $0 \le j < k \le k_*$ and some $\rho > 0$, where $e_j^\delta := x_j^\delta - x^\dagger$.
Then there is a positive constant $C$ (independent of $k$ and $\delta$) such that for all $0 \le k \le k_*$
the following estimates hold:

$$|||e_k^\delta|||_u \le |||x^\dagger - x_0|||_u + C\sum_{j=0}^{k-1}(k-j)^{-\frac{a+2s-u}{2(a+s)}}|||e_j^\delta|||_{-a} + C\sum_{j=0}^{k-1}(k-j)^{-\frac{b+2s-u}{2(a+s)}}|||e_j^\delta|||_{-a} + \delta k^{\frac{a+u}{2(a+s)}} \tag{4.29}$$

and

$$|||e_k^\delta|||_{-a} \le (k+1)^{-\frac{a+u}{2(a+s)}}|||x^\dagger - x_0|||_u + C\sum_{j=0}^{k-1}(k-j)^{-1}|||e_j^\delta|||_{-a} + C\sum_{j=0}^{k-1}(k-j)^{-\frac{b+a+2s}{2(a+s)}}|||e_j^\delta|||_{-a} + \delta. \tag{4.30}$$


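Proof. From (4.27) we immediately obtain the representation

$$e_{k+1}^\delta = (I - L^{-2s}F'(x^\dagger)^*F'(x^\dagger))e_k^\delta + L^{-2s}F'(x^\dagger)^*(y^\delta - y - q_k^\delta) + L^{-2s}p_k^\delta$$

with

$$q_k^\delta := F(x_k^\delta) - F(x^\dagger) - F'(x^\dagger)e_k^\delta \tag{4.31}$$

and

$$p_k^\delta := (F'(x_k^\delta)^* - F'(x^\dagger)^*)(y^\delta - F(x_k^\delta)) \tag{4.32}$$

$$= F'(x^\dagger)^*(R^* - I)(y^\delta - F(x_k^\delta)) + Q^*(y^\delta - F(x_k^\delta)) \tag{4.33}$$

$$=: F'(x^\dagger)^*p_{1,k}^\delta + p_{2,k}^\delta, \tag{4.34}$$

where we used the notations $R = R(x_k^\delta, x^\dagger)$ and $Q = Q(x_k^\delta, x^\dagger)$. Furthermore, we get
the closed form expression

$$e_k^\delta = L^{-s}(I - B^*B)^kL^s(x_0 - x^\dagger) + \sum_{j=0}^{k-1}L^{-s}(I - B^*B)^{k-j-1}\big[B^*(y^\delta - y - q_j^\delta + p_{1,j}^\delta) + L^{-s}p_{2,j}^\delta\big].$$

Together with (4.2) and (4.25) we now obtain the following estimates

$$\begin{aligned}
|||e_k^\delta|||_u &\le \|(I - B^*B)^k\|\,\|w\| \\
&\quad + \sum_{j=0}^{k-1}\|(B^*B)^{\frac{a+2s-u}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,(\delta + \|q_j^\delta\| + \|p_{1,j}^\delta\|) \\
&\quad + \sum_{j=0}^{k-1}\|(B^*B)^{\frac{b+2s-u}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|
\end{aligned}$$

and

$$\begin{aligned}
|||e_k^\delta|||_{-a} &\le \|(B^*B)^{\frac{a+u}{2(a+s)}}(I - B^*B)^k\|\,\|w\| \\
&\quad + \sum_{j=0}^{k-1}\|(B^*B)(I - B^*B)^{k-j-1}\|\,(\delta + \|q_j^\delta\| + \|p_{1,j}^\delta\|) \\
&\quad + \sum_{j=0}^{k-1}\|(B^*B)^{\frac{a+b+2s}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|_0.
\end{aligned}$$

Next, we derive estimates for $\|q_j^\delta\|$, $\|p_{1,j}^\delta\|$, and $\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|$. Assumption
(N6), (4.5), and (4.31) imply that

$$\begin{aligned}
\|q_j^\delta\| &\le \int_0^1\|(F'(x_t) - F'(x^\dagger))e_j^\delta\|\,dt \\
&\le \int_0^1\|R(x_t, x^\dagger) - I\|\,\|F'(x^\dagger)e_j^\delta\|\,dt + \int_0^1\|Q(x_t, x^\dagger)\|_{\mathcal{X}_{-b}^s,\mathcal{Y}}\,|||e_j^\delta|||_{-b}\,dt \\
&\le C_R\,|||e_j^\delta|||_{-a} + C_Q\,|||e_j^\delta|||_{b-a}\,|||e_j^\delta|||_{-b} \le |||e_j^\delta|||_{-a}\,(C_R + C_Q\,|||e_j^\delta|||_0),
\end{aligned} \tag{4.35}$$

with $x_t = x^\dagger + t\,(x_j^\delta - x^\dagger)$.

Since $\tau > 2$, (2.13) implies that for all $0 \le k < k_*$

$$\|y^\delta - F(x_k^\delta)\| < 2\|y - F(x_k^\delta)\|.$$

Thus, we obtain that

$$\|p_{1,j}^\delta\| \le \|R(x_j^\delta, x^\dagger) - I\|\,\|y^\delta - F(x_j^\delta)\| \tag{4.36}$$

$$\le 2C_R(\|q_j^\delta\| + \|F'(x^\dagger)e_j^\delta\|) \le C\,|||e_j^\delta|||_{-a}\,(1 + |||e_j^\delta|||_0). \tag{4.37}$$

Finally, (4.9), (4.26), (4.31), (4.32), and $F(x^\dagger) = y$ (cf. Assumption (N2)) imply that

$$\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|_0 \le 2C_Q\,|||e_j^\delta|||_{b-a}\,\|y - F(x_j^\delta)\| \le c\,|||e_j^\delta|||_{b-a}\,|||e_j^\delta|||_{-a}\,(1 + |||e_j^\delta|||_0) \tag{4.38}$$

for all $0 \le j < k$. Combining the estimates, using spectral theory, Lemma 4.24, and
$|||e_j^\delta|||_0 \le \rho$ for all $0 \le j < k$ now yields the assertions (4.29) and (4.30). □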

We are now in the position to prove the main convergence (rates) results for the

preconditioned Landweber iteration in Hilbert scales for s ≤ 0 and under our relaxed

assumptions (cf. [26]):

Proposition 4.26 Let Assumptions 4.21, 4.22 hold and $\|y^\delta - y\| \le \delta$. Additionally, let
$k_* = k_*(\delta, y^\delta)$ be chosen according to the discrepancy principle (2.13) with $\tau$ sufficiently
large, and let $|||x^\dagger - x_0|||_u$ be sufficiently small. Then

$$|||x_k^\delta - x^\dagger|||_0 \le \frac{4(\tau-1)}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{u}{2(a+s)}}, \tag{4.39}$$

and

$$\|y^\delta - F(x_k^\delta)\| \le \frac{2\tau^2}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{a+u}{2(a+s)}}$$

for all $0 \le k < k_*$. In the case of exact data ($\delta = 0$), the above estimates hold for all
$k \ge 0$.

Proof. We proceed similarly as in the proof of Theorem 2.3 in [68] and show by
induction that

$$|||e_j^\delta|||_u \le \eta\,|||x^\dagger - x_0|||_u, \qquad 0 \le j \le k_*, \tag{4.40}$$

and

$$|||e_j^\delta|||_{-a} \le \eta\,(j+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u, \qquad 0 \le j < k_*, \tag{4.41}$$

hold with

$$\eta = \frac{4(\tau-1)}{\tau-2} \tag{4.42}$$

if $|||x^\dagger - x_0|||_u$ is sufficiently small.

First note that the assertions hold for j = 0, if |||x† − x0|||u is small enough. Fur-

thermore, if |||x† − x0|||u is so small that C−uη |||x† − x0|||u ≤ ρ then by (4.4) xj ∈ Bρ(x†)


and the iteration (4.27) is well-defined. Now assume that (4.40), (4.41) are valid for
$0 \le j < k \le k_*$. Then, by virtue of Lemma 4.24 and 4.25, the estimates

$$|||e_k^\delta|||_u \le (1 + C_2\,|||x^\dagger - x_0|||_u)\,|||x^\dagger - x_0|||_u + \delta k^{\frac{a+u}{2(a+s)}} \tag{4.43}$$

$$|||e_k^\delta|||_{-a} \le (1 + C_2\,|||x^\dagger - x_0|||_u)\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{a+u}{2(a+s)}} + \delta$$

hold for some $C_2 > 0$ (independent of $k$). Here, we used the restriction $0 \le u \le b + 2s$.
Next, we derive an estimate for $k$ in terms of $\delta$: similarly to (4.38) we get

$$(\tau - 1)\delta \le \|y^\delta - F(x_j^\delta)\| \le c\,|||e_j^\delta|||_{-a}\,(1 + |||e_j^\delta|||_0)$$

for all $0 \le j < k \le k_*$, and hence (4.40) and (4.41) for $j = k - 1$ yield that

$$\delta \le \frac{\tau}{2(\tau-1)}\,\eta\,k^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u \tag{4.44}$$

provided that $2c\,(1 + \eta\,|||x^\dagger - x_0|||_u) \le \tau$. Together with (4.42) and (4.43) we obtain

$$|||e_k^\delta|||_u \le |||x^\dagger - x_0|||_u\Big(1 + C_2\,|||x^\dagger - x_0|||_u + \frac{\tau}{2(\tau-1)}\,\eta\Big) \le \eta\,|||x^\dagger - x_0|||_u$$

if $C_2\,|||x^\dagger - x_0|||_u \le 1$, which we again assume to hold in the following. In the same way,
we obtain that

$$|||e_k^\delta|||_{-a} \le \frac{2(\tau-1)}{\tau-2}\,(k+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u\,(1 + C_2\,|||x^\dagger - x_0|||_u) \le \eta\,(k+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u.$$

The estimate (4.39) now follows by the interpolation inequality (4.5), and the estimate (4.41) follows
similarly as (4.38). Thus, if $|||x^\dagger - x_0|||_u$ is sufficiently small, then the assertions hold for
all $j \le k_*$. In the case of exact data ($\delta = 0$), the estimates hold for all $k \ge 0$, since then
Lemma 4.25 holds for all $k \ge 0$. □

Combining the results of Proposition 4.26 now yields the following

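Theorem 4.27 Under the assumptions of Proposition 4.26, we have

$$k_* = O\big(\delta^{-\frac{2(a+s)}{a+u}}\big) \tag{4.45}$$

and

$$\|x_{k_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big).$$

Proof. The estimate on $k_*$ follows from (4.44). Secondly, it follows from (4.31), (4.35)
and (2.13) with $K = F'(x^\dagger)$ that

$$\begin{aligned}
\|Ke_k^\delta\| &\le \|F(x^\dagger) - F(x_k^\delta) + Ke_k^\delta\| + \|F(x_k^\delta) - F(x^\dagger)\| \\
&\le (C_R + C_Q\,|||e_k^\delta|||_0)\,|||e_k^\delta|||_{-a} + \delta + \|F(x_k^\delta) - y^\delta\| \\
&\le (C_R + C_Q\,|||e_k^\delta|||_0)\,\|Ke_k^\delta\| + (\tau + 1)\delta.
\end{aligned}$$

Consequently, if $C_R + C_Q\,|||e_k^\delta|||_0 < 1$, which can always be achieved if $|||x^\dagger - x_0|||_u$ is small
enough, then

$$|||e_k^\delta|||_{-a} = \|Ke_k^\delta\| \le C\delta.$$

The convergence rate then follows by (4.40) and the interpolation inequality (4.5). □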

Remark 4.28 The condition $u \le b + 2s$ in (N7) corresponds to the saturation of
Landweber iteration for nonlinear problems at $\mu = 1/2$ (cf. [39] and Theorem 2.6).
Note that for $a = b$, and if (4.28) is valid, i.e.,

$$F'(x^\dagger) \sim L^{-a},$$

then $L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}} \sim L^{-u} \sim (T^*T)^{\frac{u}{2a}}$ holds for $-a \le u \le \min\{a, a+2s\}$; in particular,
the source sets $\mathcal{X}_u^s$ and $\mathcal{R}((T^*T)^{\frac{u}{2a}})$ coincide for $u \le a + 2s$.

In contrast to the case $s \ge 0$, where the range of optimal convergence is extended by
performing the iteration in Hilbert scales, the restriction $u \le a + 2s$ yields $\mu \le \frac{a+2s}{2a} < \frac12$
if $s < 0$, i.e., optimal convergence can only be proven for a smaller range. However, the
discrepancy principle is reached with far fewer iterations, i.e., $k_* = O(\delta^{-\frac{2(a+s)}{a+u}})$ instead of
$O(\delta^{-\frac{2a}{a+u}})$ for the standard iteration ($s = 0$). Thus, preconditioning is especially attractive
for reconstructing solutions $x^\dagger$ which are not very smooth.

We only mention that with obvious modifications of the proofs one can actually
show the stronger rates

$$|||x_k^\delta - x^\dagger|||_r \le \frac{4(\tau-1)}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{u-r}{2(a+s)}} = O\big(\delta^{\frac{u-r}{a+u}}\big)$$

for $-a \le r \le 0$. Furthermore, we claim that with the same methods of proof and
under the same restrictions, it is possible to derive the corresponding rates even for
$-a \le r \le u$ (cf. [49]).

For $u \le a + 2s$, the rates of Theorem 4.27 coincide with the rates for linear problems
and, as mentioned in the linear case, are order optimal under the given conditions. We
want to emphasize once more that under our relaxed assumptions, in particular (N2),
the source condition $x^\dagger - x_0 \in \mathcal{X}_u^s$ might be stronger than the usual source condition
$x^\dagger - x_0 \in \mathcal{X}_u$. However, $x^\dagger - x_0 \in \mathcal{X}_u^s$ is the natural source condition if the iteration is
considered as iteration in $\mathcal{X}_s$ (cf. Remark 4.9 and Corollary 4.14).

This concludes our convergence rates analysis of the preconditioned Landweber it-

eration in Hilbert scales for nonlinear problems. As a final topic in our investigation of

iterative regularization methods in Hilbert scales, we discuss preconditioning of certain

Newton-type regularization methods, which are well-known for their excellent conver-

gence behavior for well-posed as well as for ill-posed problems (cf., e.g., [6, 8, 23, 47, 48]).

4.3.3 Newton-type Regularization in Hilbert Scales

The basic step of a Newton-type iteration under consideration consists in the stable

solution of the linearized equation

F ′(xn)(xn+1 − x0) = y − F (xn) + F ′(xn)(xn − x0),


by a regularization method Rα (with filter function gα), which yields

xn+1 = x0 + gαn(F ′(xn)∗F ′(xn))F ′(xn)∗[y − F (xn) + F ′(xn)(xn − x0)]. (4.46)

The iteratively regularized Gauß-Newton method and the Newton Landweber iterations

are of this form (cf. Section 2.2.5). Here, we consider preconditioning of the resulting

methods by the Hilbert scale approach, i.e., solving the preconditioned normal equations

L−2sF ′(xδn)∗F ′(xδn)(xδn+1 − x0) = L−2sF ′(xδn)∗(yδ − F (xδn) + F ′(xn)(xδn − x0)).

Using the notation

Bn := F ′(xδn)L−s,

the preconditioned iteration corresponding to (4.46) can be reformulated as

xδn+1 = x0 + L−sgαn(B∗nBn)B∗n(yδ − F (xδn) + F ′(xδn)(xδn − x0)). (4.47)

Here, the regularization parameters $\alpha_n$ usually decrease during the iteration, e.g., a
choice $\alpha_0 > 0$ and

$$0 < q \le \frac{\alpha_n}{\alpha_{n-1}} \le 1 \ \text{ for } n \in \mathbb{N}, \qquad\text{and}\qquad \lim_{n\to\infty}\alpha_n = 0, \tag{4.48}$$

is used in [48] for the analysis of Newton-type methods in the standard spaces, and we
will assume this behavior of $\alpha_n$ from now on.

The aim of this section is to give a detailed convergence (rates) analysis for the
preconditioned iterations of the form (4.47), where we assume that the filter functions
$g_\alpha$ (and $r_\alpha(\lambda) = 1 - \lambda g_\alpha(\lambda)$) satisfy the conditions of Theorem 1.2 with $G_\alpha = O(\alpha^{-1})$
and

$$\lambda^\mu|r_\alpha(\lambda)| \le c_\mu\alpha^\mu \qquad\text{for all } \lambda \in [0, 1],\ 0 \le \mu \le \mu_0 \tag{4.49}$$

for some $\mu_0 > 0$. We start by investigating the convergence behavior of the iteration
(4.47) when stopped by an a-priori rule. In our analysis we use the following result:

Lemma 4.29 Let $A$, $B$, $R$ be bounded linear operators between Hilbert spaces $\mathcal{X}$ and
$\mathcal{Y}$. If $B = RA$ with $\|I - R\| < 1$, then for every $|\nu| \le 1/2$ and $w \in \mathcal{X}$ there exist
positive constants $\underline{c}$, $\overline{c}$ and an element $v \in \mathcal{X}$ such that

$$(A^*A)^\nu w = (B^*B)^\nu v,$$

with $\underline{c}\,\|w\| \le \|v\| \le \overline{c}\,\|w\|$.

Proof. Observing that $\mathcal{R}((A^*A)^{1/2}) = \mathcal{R}(A^*) = \mathcal{R}(B^*) = \mathcal{R}((B^*B)^{1/2})$, the result
follows by the inequality of Heinz and duality arguments (cf. [48, 49] for details). □

Proposition 4.30 Let Assumptions 4.21, 4.22 hold with $C_Q = 0$ and $C_R$ sufficiently
small, and let $x^\dagger$ denote a solution of (1.16) and $x^\dagger - x_0 \in \mathcal{X}_u^s$, i.e.,

$$L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w, \qquad 0 < u \le a + 2s, \tag{4.50}$$

with $\|w\| = |||x^\dagger - x_0|||_u \le \omega$ and $\omega$ sufficiently small. Assume that $y^\delta \in \mathcal{Y}$ is such that
(1.2) holds, i.e., $\|y^\delta - y\| \le \delta$, and let $x_n^\delta$ denote the iterates defined by (4.47) with
$g_\alpha$, $r_\alpha$ satisfying the conditions of Theorem 1.2 and (4.49) for some $\mu_0 \ge 1$, and let $\alpha_n$
satisfy (4.48).

Moreover, let $\eta > 0$ and denote by $N(\delta)$ the largest integer such that

$$\alpha_n \ge \Big(\frac{\delta}{\eta\omega}\Big)^{\frac{2(a+s)}{a+u}} \tag{4.51}$$

for all $0 \le n \le N(\delta)$.

Then there exists a positive constant $C_\eta$ such that for all $-a \le r \le 0$

$$|||x_n^\delta - x^\dagger|||_r \le C_\eta\,\alpha_n^{\frac{u-r}{2(a+s)}}\,\omega, \qquad 0 \le n \le N(\delta). \tag{4.52}$$

Additionally, $x_n^\delta \in \mathcal{B}_\rho(x^\dagger)$ for $n \le N(\delta)$.

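Proof. We prove the assertions by induction: since $|||x_0 - x^\dagger|||_r \le \omega$, (4.52) holds for
$n = 0$ if $C_\eta \ge \alpha_0^{\frac{r-u}{2(a+s)}}$, which we assume in the following.

Now let (4.52) hold for some $0 < n < N(\delta)$ and assume that $x_n^\delta \in \mathcal{B}_\rho(x^\dagger)$. Then
with the notation $e_n^\delta := x_n^\delta - x^\dagger$ and (4.47), we get the closed form representation

$$e_{n+1}^\delta = L^{-s}r_{\alpha_n}(B_n^*B_n)L^s(x_0 - x^\dagger) + L^{-s}g_{\alpha_n}(B_n^*B_n)B_n^*(y^\delta - y + l_n),$$

with $l_n = \int_0^1(F'(x^\dagger + te_n^\delta) - F'(x_n^\delta))e_n^\delta\,dt$. Now, by the nonlinearity condition (N6) with
$C_Q = 0$ and Lemma 4.29, there exists a $w_n$ with $\|w_n\| \sim \|w\|$ such that

$$(B^*B)^{\frac{u-s}{2(a+s)}}w = (B_n^*B_n)^{\frac{u-s}{2(a+s)}}w_n$$

and

$$\begin{aligned}
|||e_{n+1}^\delta|||_0 &\le c\,\|(B_n^*B_n)^{\frac{s}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)(B^*B)^{\frac{u-s}{2(a+s)}}w\| + c\,\|(B_n^*B_n)^{\frac{s}{2(a+s)}}g_{\alpha_n}(B_n^*B_n)B_n^*(y^\delta - y + l_n)\| \\
&\le c\,\|(B_n^*B_n)^{\frac{u}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)w_n\| + c\,\|(B_nB_n^*)^{\frac{a+2s}{2(a+s)}}g_{\alpha_n}(B_nB_n^*)(y^\delta - y + l_n)\|
\end{aligned}$$

for some $c > 0$. This together with Lemma 4.7 (with $\mu = \frac{a+2s}{2(a+s)}$ and $k$ replaced by $1/\alpha$),

$$\|l_n\| = \Big\|\int_0^1(F'(x^\dagger + te_n^\delta) - F'(x_n^\delta))e_n^\delta\,dt\Big\| \le 2C_R\|F'(x^\dagger)e_n^\delta\| = 2C_R\,|||e_n^\delta|||_{-a} \le 2C_RC\,\alpha_{n-1}^{\frac{a+u}{2(a+s)}}\,\omega,$$

(4.49), and (4.51) yields

$$|||e_{n+1}^\delta|||_0 \le \alpha_n^{\frac{u}{2(a+s)}}\,\omega\,(c_1 + c_2C_RC_\eta)$$

for some positive constants $c_1$ and $c_2$. Now (4.52) holds for $n + 1$ for any $C_\eta$ satisfying

$$C_\eta \ge \max\Big\{\max_{-a \le r \le 0}\alpha_0^{\frac{r-u}{2(a+s)}},\ \frac{c_1}{q^{\frac{u}{2(a+s)}} - c_2C_R}\Big\},$$

which is always possible as long as $C_R$ is smaller than $q^{\frac{u}{2(a+s)}}/c_2$. Finally, if $\omega$ is sufficiently
small, then $x_{n+1}^\delta$ remains in $\mathcal{B}_\rho(x^\dagger)$. This finishes the induction. □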

Proposition 4.30 immediately implies the following convergence rates in terms of δ:

Corollary 4.31 Let the assumptions of Proposition 4.30 be valid and let $N(\delta)$ be chosen
as in (4.51). Then the following rates hold for $-a \le r \le 0$:

$$|||x_{N(\delta)}^\delta - x^\dagger|||_r = O\big(\delta^{\frac{u-r}{a+u}}\big).$$

Proof. The assertion follows immediately with (4.48) and (4.52). □
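For a concrete parameter sequence, the a-priori index $N(\delta)$ from (4.51) is easy to compute. The sketch below assumes the geometric choice $\alpha_n = \alpha_0 q^n$, which is one sequence satisfying (4.48); the function name and the sample values are illustrative assumptions.

```python
def a_priori_stopping_index(delta, alpha0, q, eta, omega, a, s, u):
    """Largest N with alpha_n >= (delta / (eta * omega)) ** (2(a+s)/(a+u)) for all n <= N,
    for the geometric sequence alpha_n = alpha0 * q**n (an assumed choice)."""
    threshold = (delta / (eta * omega)) ** (2.0 * (a + s) / (a + u))
    if alpha0 < threshold:
        raise ValueError("alpha0 is already below the threshold")
    n, alpha = 0, alpha0
    while alpha * q >= threshold:      # alpha_{n+1} still admissible
        alpha *= q
        n += 1
    return n

# Illustrative values (assumptions): alpha0 = 1, q = 1/2, eta = omega = 1, a = u = 1, s = 0.
N_coarse = a_priori_stopping_index(1e-3, 1.0, 0.5, 1.0, 1.0, a=1.0, s=0.0, u=1.0)
N_fine = a_priori_stopping_index(1e-6, 1.0, 0.5, 1.0, 1.0, a=1.0, s=0.0, u=1.0)
```

Reducing $\delta$ by a factor of $10^3$ increases the index only by about ten here, which reflects the logarithmic growth of $N(\delta)$ for geometrically decaying $\alpha_n$.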

Remark 4.32 Convergence (without rates) for $s = 0$ and $u = 0$ has been proven under
the weaker nonlinearity condition

$$\|F(x) - F(x^\dagger) - F'(x^\dagger)(x - x^\dagger)\| \le \eta\,\|F(x) - F(x^\dagger)\|, \qquad \eta < 1/2,$$

for the Levenberg-Marquardt method [37], a Newton-CG method [38], and for the IRGN
and the Newton Landweber iteration in [47]. At least under the stronger condition
(4.28), it should be possible to extend these results also to $s < 0$.

Note that the rates of Corollary 4.31 apply for arbitrary regularization methods $g_\alpha$
satisfying the conditions of Theorem 1.2 with $G_\alpha = O(\alpha^{-1})$; in particular, the results
apply to iterative regularization methods $g_\alpha = g_k$ by replacing $\alpha_n$ by $k_n^{-1}$ (respectively
$k_n^{-2}$ for semiiterative methods with optimal speed of convergence). The resulting
completely iterative methods can be viewed as two-level iterations. The outer iteration
corresponds to a Newton-type method, whereas the inner iteration is used for the
regularized solution of the linearized equation in each Newton step.

As the above analysis shows, the number of (outer) Newton iterations is bounded
by $O(1 + |\log\delta|)$. In order to compare the overall computational complexity to the one
of Landweber iteration, we also consider the total number of inner iterations: if the
increasing sequence of inner iterations $k_n$ satisfies

$$k_n \sim q^{-n}, \qquad n \ge 0,$$

with some $q < 1$, then the overall number of inner iterations is bounded by

$$k_* = \sum_{n=0}^{N(\delta)}k_n = O\big(\delta^{-\frac{2(a+s)}{a+u}}\big), \qquad\text{respectively}\qquad k_* = \sum_{n=0}^{N(\delta)}k_n = O\big(\delta^{-\frac{a+s}{a+u}}\big)$$

for the preconditioned Newton-Landweber iteration and the preconditioned Newton-ν-methods,
respectively. With $s = -a/2$, the resulting iteration numbers can again be
reduced to the square root by preconditioning. Moreover, the preconditioned Newton-ν-methods
yield optimal convergence with only the square root of iterations that would be
needed for the preconditioned Landweber iteration and only the fourth root of iterations
needed for the standard Landweber iteration. We will demonstrate this substantial
speed-up in several numerical examples in Chapter 5.
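The square-root and fourth-root statements can be read off from the exponents of $\delta$. The following Python sketch tabulates the exponents quoted in Remark 4.32; the function name, the method labels and the sample values $a = 2$, $u = 1$ are illustrative assumptions.

```python
def total_inner_iterations_exponent(a, s, u, method):
    """Exponent e such that the total number of inner iterations is O(delta**(-e)).

    method = "landweber" gives 2(a+s)/(a+u), method = "nu" gives (a+s)/(a+u),
    as stated in Remark 4.32 (s = 0 with "landweber" is the standard iteration).
    """
    if method == "landweber":
        return 2.0 * (a + s) / (a + u)
    if method == "nu":
        return (a + s) / (a + u)
    raise ValueError("unknown method")

a, u = 2.0, 1.0                                                        # assumed sample values
e_lw_std = total_inner_iterations_exponent(a, 0.0, u, "landweber")     # standard Landweber
e_lw_pre = total_inner_iterations_exponent(a, -a / 2, u, "landweber")  # preconditioned, s = -a/2
e_nu_pre = total_inner_iterations_exponent(a, -a / 2, u, "nu")         # preconditioned nu-methods
```

With $s = -a/2$ the Landweber exponent is halved (square root of the iteration count), and the ν-method exponent is a quarter of the standard Landweber exponent (fourth root).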

The a-priori results of Proposition 4.30 and Corollary 4.31 are not of great use per se,
since in general one does not know the smoothness of the solution, i.e., for which
$u$ the source condition $x^\dagger - x_0 \in \mathcal{X}_u^s$ holds. However, the estimate of Proposition 4.30
will be used to prove convergence rates when the iteration is stopped according to the
following a-posteriori stopping rule (cf. [23, 48]):

For a sufficiently large $\tau > 1$ let $n_* = n(\delta, y^\delta)$ be the smallest integer such that

$$\max\{\|y^\delta - F(x_{n_*-1}^\delta)\|,\ \|y^\delta - F(x_{n_*}^\delta)\|\} \le \tau\delta. \tag{4.53}$$

According to (4.53), the (outer) Newton iteration is stopped as soon as two
consecutive residuals are at most $\tau\delta$ for the first time. The following proposition guarantees stability of our
class of preconditioned Newton-type methods (4.47) equipped with the above criterion:

Proposition 4.33 Let the assumptions of Proposition 4.30 be valid, and the iteration

(4.47) be stopped according to (4.53) with τ > 1 sufficiently large. Then, the iteration

is well-defined and n∗ ≤ N(δ) with N(δ) as in (4.51).

Proof. Note that (1.2), (4.52) ($r = -a$), and (N6) (with $C_Q = 0$) imply that

$$\|F(x_n^\delta) - y^\delta\| \le \delta + (1 + C_R)\|F'(x^\dagger)e_n^\delta\| \le \delta + (1 + C_R)\,C_\eta\,\alpha_n^{\frac{u+a}{2(a+s)}}\,\omega$$

for all $0 \le n \le N(\delta)$. This together with (4.48) and (4.51) yields the estimate

$$\|F(x_n^\delta) - y^\delta\| \le \delta\,\big(1 + (1 + C_R)\,C_\eta\,\eta^{-1}q^{-\frac{a+u}{a+s}}\big)$$

for $n = N(\delta) - 1$ and $n = N(\delta)$. Thus, if $\tau$ is larger than the constant in the brackets
above, then obviously $n_* \le N(\delta)$ and, due to Proposition 4.30, the iteration is well-defined. □
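The stopping rule (4.53) differs from the plain discrepancy principle in that it requires two consecutive small residuals. A minimal Python sketch of the index selection follows, assuming the residual norms $\|y^\delta - F(x_n^\delta)\|$ have been recorded in a list; the function name and the sample residual history are hypothetical.

```python
def newton_stopping_index(residuals, tau, delta):
    """Smallest n >= 1 with max(residuals[n-1], residuals[n]) <= tau * delta, cf. (4.53).

    residuals[n] is assumed to hold ||y_delta - F(x_n)||; returns None if the
    criterion is not met within the recorded iterations.
    """
    bound = tau * delta
    for n in range(1, len(residuals)):
        if max(residuals[n - 1], residuals[n]) <= bound:
            return n
    return None

# Hypothetical residual history: the isolated small residual at n = 2 does not stop the iteration.
history = [1.0, 0.5, 0.09, 0.3, 0.08, 0.07]
n_star = newton_stopping_index(history, tau=2.0, delta=0.05)
```

In this example the iteration is stopped at $n_* = 5$, the first index at which both the current and the previous residual are at most $\tau\delta = 0.1$.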

We are now in the position to prove the following convergence rates result:

Theorem 4.34 Let the assumptions of Proposition 4.30 be satisfied, and the iteration
(4.47) be stopped after $n_* = n(\delta, y^\delta)$ steps according to the stopping rule (4.53) with
some $\tau$ sufficiently large. Then

$$\|x_{n_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big).$$


Proof. We use the notation K := F ′(x†) and Kn := F ′(xδn). Observe, that by (1.2),

(4.23) and (4.24) with CQ = 0 the following estimate holds for n = n∗ and n = n∗ − 1:

‖Kneδn‖ ≤ (1 + CR)‖Keδn‖

≤ 2‖y − F (xδn)−∫ 1

0

[F ′(xδn − teδn)− F ′(x†)]eδndt‖

≤ 2[δ + ‖F (xδn)− yδ‖ + CR1−CR ‖Kne

δn‖ ],

and hence with (4.53)

‖Kneδn‖ ≤ c1δ, for n ∈ n∗ − 1, n∗ (4.54)

for some positive constant c1. Next, by (4.47), and denoting n = n∗ − 1, Bn = KnL−s

we have

Kneδn∗ = Bnrαn(B∗nBn)Ls(x0 − x†) + Bngαn(B∗nBn)B∗n[yδ − F (xδn) +Kne

δn)] .

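Thus, we obtain with (N6) with $C_Q = 0$, (4.54), and (4.53) that

\[
\begin{aligned}
\|B_n r_{\alpha_n}(B_n^* B_n) L^s (x_0 - x^\dagger)\| &= \|K_n e^\delta_{n_*} - B_n B_n^* g_{\alpha_n}(B_n^* B_n) [\,y^\delta - F(x^\delta_n) + K_n (x^\delta_n - x^\dagger)\,]\| \\
&\le \frac{1 + C_R}{1 - C_R}\, \|K_{n_*} e^\delta_{n_*}\| + c_2 \big( \|y^\delta - F(x^\delta_n)\| + \|K_n e^\delta_n\| \big) \le c_3 \delta,
\end{aligned}
\]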

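for some $c_2, c_3 > 0$. Finally, using the above estimates, the representation (4.47), (N7), and $n = n_* - 1$, the error can be estimated as follows:

\[
\begin{aligned}
\|e^\delta_{n_*}\| &\le \big\|L^{-s} r_{\alpha_n}(B_n^* B_n)(B^* B)^{\frac{u-s}{2(a+s)}} w\big\| + \big\|L^{-s} g_{\alpha_n}(B_n^* B_n) B_n^* (y^\delta - F(x^\delta_n) + K_n e^\delta_n)\big\| \\
&\le c_4 \Big( \big\|(B_n^* B_n)^{\frac{u}{2(a+s)}} r_{\alpha_n}(B_n^* B_n) w_n\big\| + \big\|g_{\alpha_n}(B_n B_n^*)(B_n B_n^*)^{\frac{a+2s}{2(a+s)}}\big\|\, (\tau\delta + \|K_n e^\delta_n\|) \Big) \\
&\le c_5 \Big( \big\|r_{\alpha_n}(B_n^* B_n)(B_n^* B_n)^{\frac{u}{2(a+s)}} w_n\big\| + \alpha_n^{-\frac{a}{2(a+s)}}\, \delta \Big)
\end{aligned}
\]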

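for some constants $c_4, c_5 > 0$. Using the interpolation inequality, the above estimates, (N7), and (4.49), we obtain with Lemma 4.29

\[
\begin{aligned}
\big\|(B_n^* B_n)^{\frac{u}{2(a+s)}} r_{\alpha_n}(B_n^* B_n) w_n\big\| &\le c_6\, \big\|(B_n^* B_n)^{\frac{u+a}{2(a+s)}} r_{\alpha_n}(B_n^* B_n) w_n\big\|^{\frac{u}{a+u}}\, \|w_n\|^{\frac{a}{a+u}} \\
&\le c_6\, \|B_n r_{\alpha_n}(B_n^* B_n) L^s (x_0 - x^\dagger)\|^{\frac{u}{a+u}}\, \|w_n\|^{\frac{a}{a+u}} \\
&\le c_7\, \delta^{\frac{u}{a+u}}\, \omega^{\frac{a}{a+u}}
\end{aligned}
\]

for some positive constants $c_6$ and $c_7$. This together with Proposition 4.33 and (4.51) completes the proof. □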

Chapter 5

Examples and numerical test results

In this chapter we investigate the applicability of our results for several examples. In particular, we discuss the main assumptions needed for the convergence rates analysis in the previous chapter, i.e., the conditions (L2) for linear, and (N3) and (N6) for nonlinear problems. As we will show with our examples, the convergence theory of Chapter 4 is still applicable to problems where the standard theory of regularization in Hilbert scales cannot be applied, i.e., where (3.15) does not hold. Some of the numerical test results below even indicate that some of the conditions we needed to prove our results, e.g., the restriction u ≤ b + 2s in (N7), might be relaxed in some cases, and a further investigation in this direction might be interesting.

While some of our examples below have the character of model problems, we also present some examples that stem from applications and try to motivate their relevance in practice. Besides a discussion of the conditions needed for application of our convergence analysis, we also present the results of several numerical tests, which in most cases agree very well with the theory and clearly illustrate the effect of preconditioning. We also demonstrate the limits of our approach: when boundary conditions are neglected, preconditioning might even lead to an increase of iteration numbers.

Throughout our numerical tests, noisy data $y^\delta$ are generated by adding uniformly distributed random noise with $\|y - y^\delta\| = \delta$ to the true data $y$. The problems are discretized by standard finite difference and finite element methods. In the one dimensional case, the numbers of elements of the space and time domains are denoted by $N_x$ and $N_t$, respectively. In higher dimensions, $N_p$ denotes the number of grid points of the FE-mesh. The discretization is chosen so fine that discretization errors are dominated by the added data noise. If the true data $y$ cannot be calculated analytically, they are computed on finer grids, in order to avoid so-called inverse crimes.
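The noise model described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper name `add_uniform_noise`, the uniform grid with mesh size `h`, and the discrete $L^2$ norm $\sqrt{h}\,\|\cdot\|_2$ are choices made here and are not taken from the text.

```python
import numpy as np


def add_uniform_noise(y, delta, h=1.0, rng=None):
    """Return y + noise, where noise is uniformly distributed and rescaled
    such that the discrete L2 norm sqrt(h) * ||noise||_2 equals delta."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.uniform(-1.0, 1.0, size=y.shape)
    noise *= delta / (np.sqrt(h) * np.linalg.norm(noise))
    return y + noise


# Usage on a hypothetical data vector: 4% relative noise.
n_cells = 200
h = 1.0 / n_cells
grid = (np.arange(n_cells) + 0.5) * h
y_exact = np.sin(np.pi * grid)
delta = 0.04 * np.sqrt(h) * np.linalg.norm(y_exact)
y_noisy = add_uniform_noise(y_exact, delta, h=h, rng=np.random.default_rng(1))
```

Rescaling the noise vector after sampling guarantees that the noise level is exactly δ, which matches the convention $\|y - y^\delta\| = \delta$ used for the tables.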

Our examples may be roughly divided into two blocks: since many (linear) inverse

problems can be formulated as integral equations, we first investigate some integral

equations of the first kind. A second block of examples is then concerned with parameter

identification problems in partial differential equations, which are another important

class of (mostly nonlinear) inverse problems arising in natural sciences, in industrial


applications, or even in mathematical finance.

5.1 Integral Equations of the First Kind

In the following we investigate some integral equations of the first kind, e.g.,

\[ (Tx)(s) = \int_G k(s,t)\, x(t)\, dt = y(s), \quad s \in G. \tag{5.1} \]

It is well known [55] that $T : L^2(G) \to L^2(G)$ is compact if $k \in L^2(G \times G)$, and that the range $R(T)$ is not closed unless it is finite dimensional. Hence, problems like (5.1) are in general ill-posed, and their solution requires regularization. We will discuss Fredholm integral equations with applications in computerized tomography (CT) and in the reconstruction of blurred images, and then turn to a simple model problem for nonlinear evolution.

5.1.1 Fredholm Integral Equations of the First Kind

With the first example we want to demonstrate that, due to our relaxed assumptions, the results of the previous sections are still applicable to problems where the standard theory of regularization in Hilbert scales cannot be applied:

Example 5.1 Let $T : L^2[0,1] \to L^2[0,1]$ be defined by

\[ (Tx)(s) = \int_0^1 s^{1/2} k(s,t)\, x(t)\, dt, \]

with the standard Green's kernel

\[ k(s,t) = \begin{cases} s(1-t), & t > s, \\ t(1-s), & s \ge t. \end{cases} \]

For application of our theory, we have to verify the conditions of Assumption 4.6, in particular (L2), i.e., we have to show that there exists an $a > 0$ such that

\[ \|Tx\| \le \overline{m}\, \|x\|_{-a} \quad \text{for all } x \in X. \]

This can be done in the following way: first note that

\[ (T^*y)(t) = (1-t)\int_0^t s^{3/2}\, y(s)\, ds + t \int_t^1 s^{1/2}(1-s)\, y(s)\, ds, \]

with $(T^*y)(0) = (T^*y)(1) = 0$. Furthermore, differentiating twice shows that $(T^*y)'' = -(\cdot)^{1/2} y$. Hence,

\[ R(T^*) = \{ w \in H^2[0,1] \cap H^1_0[0,1] : (\cdot)^{-1/2} w'' \in L^2[0,1] \}. \]


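Next, we define the Hilbert scale operator $L$ by

\[ L^s x := \sum_{n=1}^{\infty} (n\pi)^s \langle x, x_n \rangle x_n, \qquad x_n := \sqrt{2}\, \sin(n\pi\,\cdot), \]

which yields $L^2 x = -x''$. With this choice, we have

\[ R(T^*) \subsetneq X_2 := H^2[0,1] \cap H^1_0[0,1] \]

and additionally,

\[ R(T^*) \supset X_{2.5} := \{ w \in H^{2.5}[0,1] \cap H^1_0[0,1] : \rho^{-1/2} w'' \in L^2[0,1] \}, \]

with $\rho(t) = t(1-t)$. By Theorem 11.7 in [58], it follows that

\[ \|w\|_{2.5}^2 \sim \|w''\|_{H^{1/2}}^2 + \|\rho^{-1/2} w''\|_{L^2}^2 \]

and thus for $T^*y \in X_{2.5}$,

\[
\begin{aligned}
\|T^*y\|_{2.5}^2 &\sim \|(\cdot)^{1/2} y\|_{H^{1/2}}^2 + \|\rho^{-1/2} (\cdot)^{1/2} y\|_{L^2}^2 \\
&\ge \|(\cdot)^{1/2} y\|_{L^2}^2 + \|\rho^{-1/2} (\cdot)^{1/2} y\|_{L^2}^2 \\
&= \int_0^1 t\, y(t)^2\, dt + \int_0^1 (1-t)^{-1} y(t)^2\, dt \;\ge\; c\, \|y\|_{L^2}^2.
\end{aligned}
\]

Together with $\|T^*y\|_2 = \|(\cdot)^{1/2} y\| \le \|y\|$ and Proposition 3.6 it follows that there exist constants $0 < \underline{m} \le \overline{m} < \infty$ such that

\[ \underline{m}\, \|x\|_{-2.5} \le \|Tx\| \le \overline{m}\, \|x\|_{-2}. \tag{5.2} \]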
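The Hilbert scale operator $L^s$ of this example is defined through a sine expansion, so a discrete version can be sketched directly. The following Python code is an illustration under assumptions: the interior grid $t_j = j/(n+1)$, the discrete inner product with weight $h = 1/(n+1)$, and the function names are choices made here; a production code would rather use a fast sine transform than dense matrices.

```python
import numpy as np


def sine_basis(n):
    """Interior nodes t_j = j/(n+1) and the matrix S[k-1, j-1] = sqrt(2) sin(k pi t_j).

    With h = 1/(n+1) the rows are orthonormal in the discrete inner product
    <u, v>_h = h * sum_j u_j v_j, i.e. h * S @ S.T is the identity."""
    h = 1.0 / (n + 1)
    t = h * np.arange(1, n + 1)
    k = np.arange(1, n + 1)
    S = np.sqrt(2.0) * np.sin(np.pi * np.outer(k, t))
    return t, h, S


def hilbert_scale_power(S, h, p):
    """Matrix of L^p, where L^p x = sum_k (k pi)^p <x, x_k>_h x_k."""
    k = np.arange(1, S.shape[0] + 1)
    return h * (S.T * (k * np.pi) ** p) @ S


t, h, S = sine_basis(64)
L2 = hilbert_scale_power(S, h, 2.0)        # discrete analogue of -d^2/dt^2
L_plus = hilbert_scale_power(S, h, 1.0)
L_minus = hilbert_scale_power(S, h, -1.0)
x = np.sin(np.pi * t)                      # first eigenfunction (up to scaling)
```

On this grid the discrete sine vectors are exactly orthogonal, so `L2 @ x` reproduces $\pi^2 x$ up to rounding, in line with $L^2 x = -x''$.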

For a numerical test, we consider the reconstruction of the unknown function

\[ x^\dagger(t) = 2t - \operatorname{sign}(2t - 1) - 1, \]

and choose $s = -1$ and $x_0 = 0$. Note that $x^\dagger$ is discontinuous at $t = 1/2$, and thus we only have $x^\dagger \in H^{1/2-\varepsilon}[0,1]$ for arbitrary $\varepsilon > 0$, which by (4.6) implies that $x^\dagger$ lies at most in $X_{1/2} \supset X^s_{1/2}$. Thus, one cannot expect faster convergence than $\|x^\delta_k - x^\dagger\| = O(\delta^{1/5})$. On the other hand, by (4.10) and (5.2) we immediately obtain that $x^\dagger \in X^s_{-\varepsilon}$. Moreover, since $x^\dagger(0) = (x^\dagger)''(0) = 0$, we even have $x^\dagger \in X^s_{1/2-\varepsilon}$ for arbitrary $\varepsilon > 0$. In view of Theorems 4.10 and 4.17, we therefore expect to get the optimal convergence rate $O(\delta^{1/5})$ also for the preconditioned iterates.

As Table 5.1 shows, the theoretically predicted convergence rates can also be observed numerically. Due to the preconditioning, the approximations $x^\delta_k$ generated by the preconditioned iterations are somewhat rougher (cf. Figure 5.1). Note, however, that the errors $\|e^\delta_{k_*}\|$ are measured in the norm of $X = L^2$, and the oscillations are small in amplitude.

The numbers of iterations needed by several preconditioned methods and their standard counterparts are listed in Table 5.2 and illustrate the effect of preconditioning. As predicted by the theory, the number of iterations can be reduced to about the square root for Landweber iteration and the ν-methods; for the conjugate gradient method, the effect is less dramatic.
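To make the comparison concrete, the following Python sketch runs Landweber iteration and a Hilbert-scale preconditioned variant on a coarse discretization of Example 5.1. It rests on several assumptions that are not taken from the text: the preconditioned update is taken to be $x_{k+1} = x_k + \omega L^{-2s} T^*(y^\delta - T x_k)$ with $s = -1$, the step size is $\omega = \|T L^{-s}\|^{-2}$, the operator is discretized by a simple rectangle rule on $n = 100$ interior nodes, and the iteration is stopped by the discrepancy principle with τ = 2.1. All function names are hypothetical, and the resulting iteration counts need not coincide with those of Table 5.2.

```python
import numpy as np


def sine_basis(n):
    """Interior nodes t_j = j/(n+1) and S[k-1, j-1] = sqrt(2) sin(k pi t_j)."""
    h = 1.0 / (n + 1)
    t = h * np.arange(1, n + 1)
    k = np.arange(1, n + 1)
    return t, h, np.sqrt(2.0) * np.sin(np.pi * np.outer(k, t))


def hilbert_scale_power(S, h, p):
    """Matrix of L^p with L^p x = sum_k (k pi)^p <x, x_k>_h x_k."""
    k = np.arange(1, S.shape[0] + 1)
    return h * (S.T * (k * np.pi) ** p) @ S


def green_operator(t, h):
    """Rectangle-rule matrix for (Tx)(s) = int_0^1 s^(1/2) k(s, t) x(t) dt."""
    s = t[:, None]
    tt = t[None, :]
    kernel = np.where(tt > s, s * (1.0 - tt), tt * (1.0 - s))
    return h * np.sqrt(s) * kernel


def landweber(T, y_delta, delta, h, tau=2.1, L_minus_s=None, max_iter=200_000):
    """Landweber iteration stopped by the discrepancy principle.

    If L_minus_s (the matrix of L^{-s}) is given, the update is preconditioned
    with L^{-2s}; otherwise the standard iteration is used."""
    n = T.shape[1]
    if L_minus_s is None:
        L_minus_s = np.eye(n)
    omega = 1.0 / np.linalg.norm(T @ L_minus_s, 2) ** 2   # scaling ||T L^{-s}|| <= 1
    P = L_minus_s @ L_minus_s                              # L^{-2s}
    x = np.zeros(n)
    for k in range(max_iter):
        r = y_delta - T @ x
        if np.sqrt(h) * np.linalg.norm(r) <= tau * delta:
            return x, k
        x = x + omega * (P @ (T.T @ r))
    return x, max_iter


n = 100
t, h, S = sine_basis(n)
T = green_operator(t, h)
x_true = 2.0 * t - np.sign(2.0 * t - 1.0) - 1.0
y = T @ x_true

rng = np.random.default_rng(0)
noise = rng.uniform(-1.0, 1.0, n)
delta = 0.04 * np.sqrt(h) * np.linalg.norm(y)              # 4% relative noise
y_delta = y + delta * noise / (np.sqrt(h) * np.linalg.norm(noise))

x_lw, k_lw = landweber(T, y_delta, delta, h)
x_hs, k_hs = landweber(T, y_delta, delta, h,
                       L_minus_s=hilbert_scale_power(S, h, 1.0))  # s = -1
print("stopping indices (lw, hs-lw):", k_lw, k_hs)
```

The preconditioner only enters through the matrix `P`, so other choices of $s$ can be tried by changing the exponent passed to `hilbert_scale_power`.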


δ/‖y‖    lw      hs-lw   ν       hs-ν    cgne    hs-cgne
0.040    0.5187  0.5158  0.4397  0.4230  0.5192  0.4286
0.020    0.4414  0.4407  0.3890  0.3863  0.4349  0.4156
0.010    0.3860  0.3871  0.3518  0.3427  0.3874  0.3460
0.005    0.3395  0.3401  0.3080  0.3012  0.3329  0.2932
0.002    0.2793  0.2808  0.2516  0.2516  0.2738  0.2515
0.001    0.2435  0.2448  0.2220  0.2214  0.2317  0.2065
κ        0.20    0.19    0.20    0.18    0.21    0.20

Table 5.1: Iteration errors $\|e^\delta_{k_*}\|$ for iterative regularization methods and their Hilbert-scale equivalents and the resulting convergence rates $\|e^\delta_{k_*}\| = O(\delta^\kappa)$; parameters τ = 2.1, ν = 2 and discretization $N_t = 200$ elements.
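The text does not state how the rates κ and η in the tables were obtained. One plausible way, assumed here, is a least-squares fit of a straight line in a log-log plot. The sketch below applies this to the Landweber columns (errors from Table 5.1 and iteration numbers from Table 5.2); the function name `fitted_rate` is hypothetical.

```python
import numpy as np


def fitted_rate(delta, values):
    """Slope of the least-squares line through (log delta, log values),
    i.e. the exponent p in values ~ delta^p."""
    slope, _intercept = np.polyfit(np.log(delta), np.log(values), 1)
    return slope


delta = np.array([0.040, 0.020, 0.010, 0.005, 0.002, 0.001])
err_lw = np.array([0.5187, 0.4414, 0.3860, 0.3395, 0.2793, 0.2435])  # Table 5.1, column lw
its_lw = np.array([94, 328, 936, 2743, 12253, 36714])                # Table 5.2, column lw

kappa_lw = fitted_rate(delta, err_lw)   # close to the tabulated kappa = 0.20
eta_lw = fitted_rate(delta, its_lw)     # close to the tabulated eta = -1.60
```

With only six noise levels such fits are sensitive to individual entries, which should be kept in mind when comparing observed and predicted exponents.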

[Figure 5.1: six panels titled lw, hslw, nu, hsnu, cg, hscg; horizontal axis from 0 to 1, vertical axis from −1 to 1.]

Figure 5.1: Iterates xδk after 1 and 14 iterations (= stopping index for hscgne) with

δ = 0.01 and τ = 1.1.

δ/‖y‖    lw      hs-lw   ν       hs-ν    cgne    hs-cgne
0.04     94      27      11      9       3       3
0.02     328     51      16      10      4       3
0.01     936     86      26      14      5       4
0.005    2743    148     47      19      6       5
0.002    12253   313     102     27      10      6
0.001    36714   544     172     36      14      8
η        -1.60   -0.77   -0.80   -0.40   -0.41   -0.28

Table 5.2: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2 and $N_t = 200$.

5.1.2 Radon Inversion

An inverse problem of special interest in medical applications, but also in nondestructive testing, arises in transmission computerized tomography (see [64]):

Let $\Omega \subset \mathbb{R}^n$, $n = 2, 3$, be a compact domain with spatially varying density $f$. In a simple physical model the relative intensity loss along a distance $\Delta x$ is assumed to satisfy

\[ \frac{\Delta I}{I} = f(x)\, \Delta x. \]

We denote by $I_1(\theta, s)$ and $I_0(\theta, s)$ the intensities of the X-ray beams measured at the detector and emitter, which are located outside of the domain $\Omega$ and connected by the line parameterized by its distance $s$ to the origin and the direction $\theta$. Then one obtains

\[ (Rf)(\theta, s) := \int_{x \cdot \theta = s} f(x)\, dx = -\log \frac{I_1(\theta, s)}{I_0(\theta, s)} =: g(\theta, s), \tag{5.3} \]

for $\theta \in \mathbb{R}^2$ with $\|\theta\| = 1$ and $s > 0$. $R$ is called the Radon transform, and determining the unknown density $f$ from measurements of the intensity drop $g(\theta, s)$ corresponds to the inversion of the Radon transform. By [64, Theorem 5.1], we know that for $\alpha \ge 0$ there exist positive constants $c(\alpha, n)$ and $C(\alpha, n)$ such that for $f \in C_0^\infty(\Omega^n)$

\[ c(\alpha, n)\, \|f\|_{H^\alpha_0(\Omega^n)} \le \|Rf\|_{H^{\alpha + (n-1)/2}(Z)} \le C(\alpha, n)\, \|f\|_{H^\alpha_0(\Omega^n)}, \]

with $\Omega^n \subset \mathbb{R}^n$ denoting the unit ball, and $Z$ the cylinder $S^{n-1} \times \mathbb{R}$. This already proves (3.15) and in particular (L2) for an appropriate choice of spaces; e.g., for $X = L^2(\Omega^n)$ and $Y = L^2(Z)$, we see that inverting the Radon transform behaves like differentiation of order one half in dimension $n = 2$, and like differentiation of order one in dimension $n = 3$.

A related problem is single photon emission computerized tomography (SPECT), cf., e.g., [64], where the aim is to reconstruct the distribution $f$ of a radiopharmaceutical inside a (human) body from measurements of the radiation outside the body. As a model for the direct problem, the attenuated Radon transformation is used:

\[ y = R(f, \mu)(s, \omega) = \int_{\mathbb{R}} f(s\omega^\perp + t\omega)\, \exp\Big( -\int_t^\infty \mu(s\omega^\perp + \tau\omega)\, d\tau \Big)\, dt, \]

for $s \in \mathbb{R}$ and $\omega \in S^{n-1}$. If the attenuation map $\mu$ is known (e.g., from an additional CT scan), then SPECT reduces to a linear inverse problem similar to the Radon inversion discussed above.


As a test example, let us consider the rotationally symmetric case of CT in more detail: let $\Omega \subset \mathbb{R}^2$ be a disc with radius $\rho$, and $f$ be rotationally symmetric with respect to the origin, i.e., $f(x) = F(s, \theta) = F(s)$, and (consequently) $g(s, \theta) = g(s)$. Then it suffices to measure the intensity drop $g(s, \theta)$ along one direction $\theta_0$, e.g., parallel to the y-axis, which yields

\[
\begin{aligned}
(Rf)(\theta_0, s) &= \int_{x \cdot \theta_0 = s} f(x)\, dx = \int_{y \in \theta_0^\perp} f(s\,\theta_0 + y)\, dy \\
&= \int_{-\sqrt{\rho^2 - s^2}}^{\sqrt{\rho^2 - s^2}} F\big(\sqrt{s^2 + t^2}\big)\, dt = 2 \int_0^{\sqrt{\rho^2 - s^2}} F\big(\sqrt{s^2 + t^2}\big)\, dt \\
&= 2 \int_s^\rho \frac{r F(r)}{\sqrt{r^2 - s^2}}\, dr = g(s), \quad 0 \le s \le \rho,
\end{aligned}
\]

where we used the substitution $t = \sqrt{r^2 - s^2}$. The further substitutions $t = r^2$ and $\tau = s^2$ yield

\[ \int_\tau^{\rho^2} \frac{F(\sqrt{t})}{\sqrt{t - \tau}}\, dt = g(\sqrt{\tau}). \]

Thus, (5.3) can essentially be reduced to the solution of an Abel integral equation of the first kind, which can be solved stably, e.g., by iterative regularization methods. We now turn to our numerical example:

Example 5.2 (An Abel integral equation) Let $T : L^2[0,1] \to L^2[0,1]$ be defined by

\[ (Tx)(s) := \frac{1}{\sqrt{\pi}} \int_0^s \frac{x(t)}{\sqrt{s - t}}\, dt, \tag{5.4} \]

and consider the approximate reconstruction of $x$ from noisy data $y^\delta$ with $\|y - y^\delta\| \le \delta$, where $y = Tx^\dagger$ denotes the unperturbed data. One can show that

\[ (T^2 x)(s) = \int_0^s x(t)\, dt, \tag{5.5} \]

and thus inverting $T$ essentially amounts to differentiation of half order; more precisely, cf. [33],

\[ R(T) \subset H^r[0,1] \quad \text{for all } 0 \le r < 1/2. \tag{5.6} \]
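The identity (5.5) can be checked numerically. The following Python sketch uses a product-integration discretization of (5.4); this scheme (piecewise constant $x$ on $n$ cells, collocation at the right cell endpoints, exact integration of the weakly singular kernel) and the name `abel_matrix` are assumptions made for illustration and are not the discretization used in the thesis.

```python
import numpy as np


def abel_matrix(n):
    """Product-integration matrix for (Tx)(s) = pi^(-1/2) int_0^s x(t) (s-t)^(-1/2) dt.

    x is treated as piecewise constant on the cells [t_j, t_{j+1}], t_j = j/n,
    and the result is evaluated at s_i = (i+1)/n. The kernel is integrated
    exactly over each cell:
        int_a^b (s - t)^(-1/2) dt = 2 (sqrt(s - a) - sqrt(s - b)).
    """
    nodes = np.arange(n + 1) / n
    s = nodes[1:]
    T = np.zeros((n, n))
    for i in range(n):
        left = nodes[: i + 1]        # left cell edges  t_0, ..., t_i
        right = nodes[1 : i + 2]     # right cell edges t_1, ..., t_{i+1} = s_i
        T[i, : i + 1] = 2.0 / np.sqrt(np.pi) * (np.sqrt(s[i] - left) - np.sqrt(s[i] - right))
    return s, T


s, T = abel_matrix(400)
ones = np.ones(400)
half_integral = T @ ones             # exact at the nodes: 2 sqrt(s / pi)
full_integral = T @ half_integral    # approximates int_0^s 1 dt = s
max_dev = np.max(np.abs(full_integral - s))
```

Applying the discrete operator twice to the constant function reproduces $s$ up to a small discretization error, consistent with $T$ acting as an integration of order one half.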

Consider the Hilbert scale induced by

\[ L^{2s} x = \sum_{n=0}^{\infty} \lambda_n^s \langle x, x_n \rangle x_n, \qquad x_n(t) = \sqrt{2}\, \sin(\lambda_n (1 - t)), \quad \lambda_n = (n + 1/2)\pi, \tag{5.7} \]

over $X = L^2[0,1]$ with $D(L^2) = X_2 = \{ x \in H^1[0,1] : x(1) = 0 \}$. Then one can show that $R(T^*T) \subset X_r$ for all $r < 2$, and Proposition 3.5 and Corollary 3.8 yield that (L2) holds for $a = 1 - \varepsilon$ with any $\varepsilon > 0$.

This allows the choice $s = -(1 - \varepsilon)/2$, and hence the iterations can be preconditioned with $L^{1-\varepsilon}$, which for small $\varepsilon$ essentially corresponds to differentiation of order 1/2, and can be realized efficiently via (5.7) and FFT.

As a numerical test, we try to identify the density

\[ x^\dagger(t) = 2t - \operatorname{sign}(2t - 1) - 1 \]

from noisy measurements of $y = Tx^\dagger$ and an initial guess $x_0 = 0$. We set s = −1, which is the limiting case of allowed choices.

With the Hilbert scale as above one has $x^\dagger \in X_u$ for all u < 3/2, thus we expect the following iteration numbers: $k_* \sim \delta^{-1}$ for Landweber iteration, $k_* \sim \delta^{-1/2}$ for the ν-method and the preconditioned Landweber iteration, and $k_* \sim \delta^{-1/4}$ for the Hilbert-scale ν-method. Since the ν-methods have finite qualification $\mu_0 = \nu$, Theorem 4.10 then yields optimal convergence rates only for

\[ \nu = \mu_0 \ge \frac{u - s}{2(a + s)} + \frac{1}{2} = 2, \]

(cf. Proposition 4.8) when the iteration is stopped with the discrepancy principle (1.14). Therefore we have to choose ν ≥ 2 in this example in order to guarantee optimal convergence rates for the Hilbert scale ν-method. By (5.6) it follows that the singular values of $T$ decay like $\sigma_n \sim n^{-1/2}$, which in view of Theorem 2.4 yields $k_* \sim \delta^{-1/3}$ for cgne. Note that we expect even fewer iterations for the preconditioned ν-method. Finally, by Theorem 4.19 we have $k_* \sim \delta^{-1/5}$ for hscgne. The iteration numbers realized in our numerical tests are listed in Table 5.3.

The numerically observed convergence rate for the iteration error $\|e^\delta_{k_*}\|$ is approximately $O(\delta^{0.45})$ for all methods.


0.04 26 6 14 6 4 2

0.02 53 8 20 7 5 3

0.01 103 12 28 9 6 4

0.005 224 19 42 11 9 5

0.002 609 34 69 16 14 7

0.001 1342 58 103 21 19 9

η -1.07 -0.62 -0.55 -0.34 -0.43 -0.39



Nt = 500.


5.1.3 An Inverse Problem in Imaging: Deblurring

Many (mostly linear) inverse problems appear in signal and image processing, e.g., image restoration, denoising, inpainting, or deblurring. We briefly discuss the latter (cf. [11]):

In complex imaging systems, blurring may occur at several places: e.g., if an image is taken in motion (linear motion blur), if an object is out of focus, by diffraction effects of the medium (atmospheric blurring) or of the optical system (diffraction-limited systems). The mathematical model of blurring is given by

\[ g(x) = (Tf)(x) = \int_{\mathbb{R}^n} K(x, x')\, f(x')\, dx', \quad x, x' \in \mathbb{R}^n, \tag{5.8} \]

where $f$ and $g$ are the original and the blurred image, respectively, and $K(x, x')$ is the impulse response function, also called point spread function. The effect of recording an image can be modeled as an additive noise contribution; thus instead of $g$ only a noisy version $g^\delta = g + w^\delta$ of the image is recorded, and instead of (5.8) one has

\[ g^\delta(x) = \int_{\mathbb{R}^n} K(x, x')\, f(x')\, dx' + w^\delta. \tag{5.9} \]

Here, $w^\delta$ denotes the noise contribution. In case the imaging system is spatially invariant, (5.9) reduces to a convolution equation $g^\delta = K * f + w^\delta$, i.e.,

\[ g^\delta(x) = \int_{\mathbb{R}^n} K(x - x')\, f(x')\, dx' + w^\delta, \tag{5.10} \]

where $K(x) = K(x, 0)$, and the Fourier transform $\hat{K}$ of the point spread function is called the transfer function of the system. Here we used the following definition of the Fourier transform:

\[ \hat{f}(\omega) = \mathcal{F}(f)(\omega) := \int_{\mathbb{R}^n} e^{-i\omega \cdot x} f(x)\, dx. \]

With this scaling, the inverse Fourier transform is given by

\[ f(x) = (\mathcal{F}^{-1} \hat{f})(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{i x \cdot \omega} \hat{f}(\omega)\, d\omega, \]

and by the Fourier convolution theorem, (5.10) is then equivalent to

\[ \hat{g}^\delta(\omega) = \hat{K}(\omega)\, \hat{f}(\omega) + \hat{w}^\delta(\omega). \]

In case of a perfect recording, i.e., $w^\delta = 0$, the reconstruction of the image $f$ from the blurred image $g$ can be calculated explicitly using the Fourier transformation, namely by

\[ f = \mathcal{F}^{-1}\Big( \frac{\hat{g}}{\hat{K}} \Big). \]

If, however, $\|w^\delta\| \ne 0$, then the explicit reconstruction formula

\[ f^\delta = \mathcal{F}^{-1}\Big( \frac{\hat{g}^\delta}{\hat{K}} \Big) \]

is unstable, i.e., high frequency components of the data error $w^\delta$ are amplified arbitrarily if $\hat{K}(\omega) \to 0$ as $|\omega| \to \infty$, which is usually the case.
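The instability of the explicit reconstruction formula can be reproduced with a small periodic 1D experiment. The Python sketch below is an illustration under assumptions: the periodic setting on [0, 1), the Gaussian transfer function with constant `c = 2e-5`, the box-shaped test signal, and the Gaussian noise of size 1e-3 are all choices made here and are not taken from the text.

```python
import numpy as np

n = 256
x = np.arange(n) / n
omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)   # angular frequencies 2*pi*k
c = 2e-5                                             # hypothetical blur constant
K_hat = np.exp(-c * omega ** 2)                      # Gaussian transfer function

f = (np.abs(x - 0.5) < 0.2).astype(float)            # sharp test signal
g = np.fft.ifft(K_hat * np.fft.fft(f)).real          # blurred signal, K * f

rng = np.random.default_rng(0)
w = 1e-3 * rng.standard_normal(n)                    # data noise


def inverse_filter(data):
    """Naive reconstruction F^{-1}(data_hat / K_hat), without regularization."""
    return np.fft.ifft(np.fft.fft(data) / K_hat).real


err_exact = np.linalg.norm(inverse_filter(g) - f)      # exact data: accurate
err_noisy = np.linalg.norm(inverse_filter(g + w) - f)  # noisy data: error blows up
noise_norm = np.linalg.norm(w)
```

For exact data the division by $\hat{K}$ recovers $f$ almost to machine precision, while for noisy data the high-frequency noise components are divided by values of $\hat{K}$ of order $10^{-6}$, so the reconstruction error exceeds the noise level by several orders of magnitude.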

We now consider blurring by the atmosphere in more detail (cf. [11, Section 3.5]):

Atmospheric Turbulence Blur

Atmospheric blurring is caused by inhomogeneities in the refraction index of the air, which result from turbulent velocity fluctuations and the resulting statistical temperature field in the atmosphere. The effect of atmospheric blurring is important for imaging through an atmosphere, e.g., in optical and radio astronomy, in remote sensing, or target identification. In optical astronomy, the overall transfer function of the imaging system (telescope + atmosphere) is given by

\[ \hat{K}(\omega) = \hat{H}(\omega)\, \hat{B}(\omega), \]

where $\hat{H}$, $\hat{B}$ are the transfer functions of the optical system (the telescope) and the atmosphere, respectively. Furthermore, $\hat{B}$ has approximately the form (cf. [11])

\[ \hat{B}(\omega) = \exp\Big[ -3.44 \Big( \frac{\lambda |\omega|}{r_0} \Big)^{5/3} \Big], \]

where $\lambda$ is the wavelength of the observed radiation and $r_0$ is a parameter called critical wavelength. As suggested in [11], the transfer function of the atmospheric turbulence blur can be roughly approximated by a Gaussian, i.e.,

\[ \hat{B}(\omega) \simeq \exp(-c |\omega|^2). \]

Note the relation to heat transfer, i.e., $B$ is the fundamental solution of the heat equation evaluated for some fixed time lag $t$, and thus the blurring operator

\[ T : f \mapsto B * f = u \]

essentially coincides with the solution operator of the heat equation

\[ u_t = c\, \Delta u, \quad u(\cdot, 0) = f, \tag{5.11} \]

evaluated at time $t_1$, where $c$ has to be chosen appropriately.

For a numerical test, we consider the following domain restricted version of (5.11):

Example 5.3 Consider the operator $T : L^2[0,1] \to L^2[0,1]$ defined by $Tf = u(\cdot, t_1)$, where $u$ is the solution of

\[ -u_t + u_{xx} = 0, \quad u(0, t) = u(1, t) = 0, \quad u(x, 0) = f(x). \tag{5.12} \]

The operator $T$ is selfadjoint, with eigenvalues $\lambda_n = e^{-n^2 \pi^2 t_1}$ and associated eigenfunctions $\psi_n = \sqrt{2/\pi}\, \sin(n\pi\,\cdot)$. Consequently, the inverse problem of solving $Tf = u$ is exponentially ill-posed.

According to Remark 4.12, the optimal preconditioner is given by $L^{-2s} = (T^*T)^{-1/2}$. For symmetric problems, the applications of the operators $L^{-2s}$ and $T^*$ cancel each other, yielding the corresponding iterations for symmetric problems, e.g., standard Richardson or cg, which are included in our theory.

For non-symmetric problems, the application of $(T^*T)^{-1/2}$ is in general too complicated and therefore other preconditioners are required. In order to get a feeling for the performance of preconditioning in Hilbert scales induced by differential operators, but applied to (not necessarily symmetric) exponentially ill-posed problems, we alternatively investigate the preconditioning with

\[ Lf = \sum_{n=0}^{\infty} n\pi\, \langle f, \psi_n \rangle \psi_n, \qquad \psi_n = \sqrt{2}\, \sin(n\pi\,\cdot) \]

over $X = L^2[0,1]$ and with $X_1 = H^1_0[0,1]$. This choice implies that for all $0 \le a < 2.5$ there exists an $m_r > 0$ such that

\[ \|Tf\| \le m_r\, \|f\|_{-a}. \]

Thus, (L2) holds for arbitrary $0 \le a < 2.5$. On the other hand, an estimate (3.11) below cannot be satisfied for any $a$.

We want to mention that a source condition Lsf † ∈ R((B∗B)µ) or f † ∈ R((T ∗T )µ)

for some µ > 0 is of course very strong, i.e., it means that f † has to be analytic. Thus,

for exponentially ill-posed problems usually logarithmic source conditions are used, and

only logarithmic convergence rates can be expected (cf. [41, 42]). It would be interesting

to extend our theory also to this case.

As a concrete numerical test, we set

\[ f^\dagger(x) := 2x - \operatorname{sign}(2x - 1) - 1, \]

and consider the reconstruction of $f$ from noisy measurements of $u(\cdot, 1)$, where $u$ satisfies (5.12) with c = 0.01. As initial guess, we choose $f_0 = 0$. In Table 5.4, we list the iteration numbers of the numerical reconstructions for the preconditioned iterations and compare them with the results for the symmetric iterations (= optimally preconditioned with $(T^*T)^{-1/2}$).

According to Theorem 7.14 in [28], the stopping index for the conjugate gradient method can be bounded by $k(\delta, y^\delta) \le c\,(1 + |\log \delta|)$ for exponentially ill-posed problems, if the singular values $\sigma_n$ of $T$ decay like $O(q^n)$ with some $q < 1$, which is the case in our example. We conjecture that a similar bound also holds for the symmetric iteration. This then explains why the iteration numbers for cg and hscgne are almost independent of the noise level.



0.04 10 8 5 3 2 2

0.02 18 11 6 4 2 2

0.01 29 14 7 5 3 2

0.005 230 44 17 12 5 3

0.002 777 77 56 20 5 3

0.001 1212 93 86 24 5 3

η -1.43 -0.84 -0.73 -0.61 -0.30 -0.14



nt = 500.

The observed convergence rates are about $\|e^\delta_{k_*}\| \sim \delta^{0.05}$ for all methods. We only mention that the numerically observed rates further decreased when we continued the test with smaller noise levels, which is in accordance with the theoretically predicted logarithmic rates. For a (rather low) noise level of δ = 0.1%, the relative error $\|e^\delta_{k_*}\| / \|x^\dagger\|$ of the reconstructions is still about 42%; since the problem is exponentially ill-posed, such a poor reconstruction has to be expected, however. As the above example suggests, even exponentially ill-posed problems can be efficiently preconditioned by differential operators.

5.1.4 A Volterra-Hammerstein Integral Equation

Hammerstein integral equations appear frequently in the modeling of dynamical sys-

tems, e.g., in biology. The following equation of the first kind is a special case from an

example discussed in [68]:

Example 5.4 Let $F : H^1[0,1] \to L^2[0,1]$ be defined by

\[ (F(x))(s) = \int_0^s x(t)^2\, dt. \]

The adjoint of the Fréchet derivative is then given by

\[ F'(x)^* w = 2 A^{-1} \Big[ x(\cdot) \int_{\cdot}^{1} w(t)\, dt \Big], \]

where $A : D(A) = \{ \psi \in H^2[0,1] : \psi'(0) = \psi'(1) = 0 \} \to L^2[0,1]$ is defined by $A\psi := -\psi'' + \psi$; note that $A^{-1}$ is the adjoint of the embedding operator from $H^1[0,1]$ into $L^2[0,1]$. Assuming that $x^\dagger \ge \gamma > 0$ a.e., we get

\[ R(F'(x^\dagger)^*) = \{ w \in H^3[0,1] : w'(0) = w'(1) = 0,\ w(1) = w''(1) \}, \]

and

\[ \|F'(x^\dagger)^* w\|_{H^3} \sim \|w\| \quad \text{for all } w \in Y. \]


As a Hilbert scale we choose the one induced by $L^2 x := -x'' + x$ over the space $X = H^1[0,1]$ with $X_1 = \{ x \in H^2[0,1] : x'(0) = x'(1) = 0 \}$. With this choice, we have

\[ R(F'(x^\dagger)^*) \subset X_2 \]

and hence, by Proposition 3.6, (N3) holds with $a = 2$. Therefore, we set $s = -1$ (to be rigorous, one would have to choose $s = -1 + \varepsilon$ for some $\varepsilon > 0$), which yields

\[ L^{-2s} F'(x)^* w = 2 x(\cdot) \int_{\cdot}^{1} w(t)\, dt; \tag{5.13} \]

in particular, we have

\[ F'(x) = R(x, x^\dagger)\, F'(x^\dagger) \]

with

\[ \|R(x, x^\dagger) - I\| \le C\, \|x - x^\dagger\|_0 \le c\, |||x - x^\dagger|||_0, \]

which proves (N6) with $C_Q = 0$ and $C_R$ arbitrarily small if $x$ is sufficiently close to $x^\dagger$.

Note that in this example the application of the Hilbert scale operator $L^{-2s}$ in fact makes the iteration even simpler, i.e., the application of $A^{-1}$, which is the main numerical effort in calculating $F'(x)^*$ for Landweber iteration, can be avoided.
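The operators of this example are simple enough to be sketched in Python. The discretization below (cell values on a uniform grid with mesh size `h`, cumulative sums as quadrature) and all function names are assumptions made for illustration. The function `preconditioned_adjoint` is the discrete analogue of the right-hand side of (5.13), so no solve with $A$ is needed.

```python
import numpy as np


def forward(x, h):
    """(F(x))(s_i) ~ h * sum_{j <= i} x_j^2, a discrete int_0^s x(t)^2 dt."""
    return h * np.cumsum(x ** 2)


def derivative(x, v, h):
    """(F'(x) v)(s_i) ~ 2 h * sum_{j <= i} x_j v_j."""
    return 2.0 * h * np.cumsum(x * v)


def preconditioned_adjoint(x, w, h):
    """Discrete analogue of (5.13): 2 x(t_j) * h * sum_{i >= j} w_i."""
    tail_sums = np.cumsum(w[::-1])[::-1]
    return 2.0 * x * h * tail_sums


n = 200
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
rng = np.random.default_rng(0)
x = 1.5 - np.sqrt(np.abs(2.0 * t - 1.0))      # x-dagger of Test 1, used as sample point
v = rng.standard_normal(n)
w = rng.standard_normal(n)

# <F'(x) v, w> and <v, 2 x int w> in the discrete L2 inner product
lhs = h * np.dot(derivative(x, v, h), w)
rhs = h * np.dot(v, preconditioned_adjoint(x, w, h))

# finite-difference check of the derivative
eps = 1e-6
fd = (forward(x + eps * v, h) - forward(x, h)) / eps
```

With this quadrature the two inner products agree up to rounding, i.e. `preconditioned_adjoint` is the exact adjoint of `derivative` with respect to the discrete $L^2$ inner product.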

We note that the choice $s = -a/2$ is actually not allowed in our theory when considering convergence rates in $X$, since we require $0 < u \le a + 2s$. However, as we already remarked above, our results actually hold in different norms, e.g., in $X_r$ for $s \le r < 0$ or in $X^s_r$ for $-a \le r < 0$. In particular, for $r = -1$ and $u = 0$, Theorem 4.27 and Remark 4.28 yield $X_{-1} = X^s_{-1} = L^2[0,1]$, and

\[ \|x^\dagger - x^\delta_{k_*}\|_{L^2} = \|x^\dagger - x^\delta_{k_*}\|_{-1} = O\big(\delta^{\frac{u - r}{a + u}}\big) = O(\delta^{1/2}). \]

According to Proposition 4.4 and Remark 4.5, the condition $x^\dagger \in X_u$ does not automatically imply $x^\dagger \in X^s_u$ for $u > s$. Thus, a condition $x^\dagger \in X_u$ may be too weak to get the expected convergence rates for the Hilbert scale iterations. We will illustrate this in one of the numerical examples below. In our test, we compare Landweber iteration, the Newton-Landweber and the Newton-ν method with their preconditioned equivalents.

Test 1. For the first test, we set $x^\dagger(t) := 3/2 - \sqrt{|2t - 1|}$, and $x_0 = 1/2$. Note that $x^\dagger - x_0 \notin H^1[0,1]$, but only in $H^{1-\varepsilon}[0,1]$ for arbitrary $\varepsilon > 0$. This can be seen in the following way: one shows easily that $(x^\dagger)' \in L^p[0,1]$ for $1 \le p < 2$. This yields $x^\dagger \in W^1_p[0,1]$ and, by standard Sobolev embedding theorems [1], $x^\dagger - x_0 \in H^{1-\varepsilon}$ for $\varepsilon > 0$. Thus we can expect convergence rates at most in spaces $X_r$ with $-a \le r < 0$, e.g., in $X_{-1} = L^2[0,1]$. In fact, also in the numerical tests, the error measured in the norm of $X = H^1[0,1]$ decreases only very slowly, while still good rates $\|e^\delta_{k_*}\|_{-1}$ can be observed for all methods (see Table 5.5). By Remark 4.28, one would expect the rates $O(\delta^{\frac{u-r}{a+u}}) = O(\delta^{1/2})$ in the norm of $X_{-1} = X^s_{-1}$ with $s = -1$ in this case.


δ/‖y‖    lw      hs-lw    new-lw   hs-new-lw  new-ν    hs-new-ν
0.02     0.3174  0.2077   0.2081   0.1115     0.2825   0.09871
0.01     0.1894  0.1390   0.1360   0.1101     0.1256   0.09671
0.005    0.1249  0.09659  0.112    0.07497    0.09839  0.05227
0.002    0.0821  0.06422  0.07735  0.05293    0.07421  0.05019
0.001    0.0548  0.04455  0.04854  0.03641    0.04660  0.02600
κ        0.57    0.50     0.45     0.40       0.53     0.43

Table 5.5: Iteration errors $\|e^\delta_{k_*}\|_{-1} = \|e^\delta_{k_*}\|_{L^2}$ for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $\|e^\delta_{k_*}\| = O(\delta^\kappa)$; parameters τ = 2.1, ν = 2 and $N_t = 501$.

Test 2. With the second test example, which is taken from [68], we want to demonstrate that preconditioning can disturb the convergence behavior if boundary conditions are not taken into account properly: let

\[ x^\dagger(t) = t + 10^{-6}\,(196145 - 41286\,t^2 + 19775\,t^4 + 70\,t^6 + 436\,t^7), \]

and $x_0 = t$. It was shown in [68] that standard Landweber iteration yields a convergence rate of $\|x^\delta_k - x^\dagger\| = O(\delta^{1/2})$. For the Hilbert scale iteration with $s = -1$ we cannot guarantee convergence in $X_0 = H^1[0,1]$, since it follows from (5.13) that

\[ X^s_0 \subset \{ x \in H^1[0,1] : x(1) = 0 \} \]

and thus $x^\dagger - x_0 \notin X^s_0$. In fact, the preconditioned iterations do not converge in $H^1[0,1]$, while the standard iterations do. However, Theorem 4.27 and Remark 4.28 yield convergence rates for the preconditioned iterations at least in $X_s = X_{-1}$, which are also observed numerically (see Table 5.6).


δ/‖y‖    lw       hs-lw   new-lw   hs-new-lw  new-ν    hs-new-ν
0.02     0.1249   0.4520  0.1719   0.4406     0.1473   0.4235
0.01     0.06333  0.4229  0.04090  0.4090     0.03929  0.4266
0.005    0.06435  0.3286  0.04041  0.3136     0.03840  0.2712
0.002    0.04914  0.2671  0.04107  0.2459     0.03953  0.2205
0.001    0.02433  0.2210  0.01630  0.2148     0.01210  0.2166
κ        0.46     0.24    0.60     0.25       0.64     0.26

Table 5.6: Iteration errors $\|e^\delta_{k_*}\|_{-1} = \|e^\delta_{k_*}\|_{L^2}$ for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $\|e^\delta_{k_*}\|_{-1} = O(\delta^\kappa)$; parameters τ = 2.1, ν = 2 and $N_t = 501$.

Note that, as predicted by the theory, the convergence rates (in $X_{-1}$) for the preconditioned iterations are significantly smaller than the ones for the standard iterations. Additionally, the iteration numbers even increase due to the (wrong) preconditioning (see Table 5.7). This behavior illustrates that boundary conditions play an important role in regularization theory.



δ/‖y‖    lw     hs-lw  new-lw  hs-new-lw  new-ν  hs-new-ν
0.02     3      4      1(5)    1(5)       1(5)   2(15)
0.01     4      7      2(15)   2(15)      2(15)  2(15)
0.005    4      36     2(15)   4(75)      2(15)  4(75)
0.002    6      120    2(15)   6(315)     2(15)  5(155)
0.001    84     338    6(315)  7(635)     4(75)  5(155)
η        -0.93  -1.54  -1.07   -1.67      -0.70  -0.92

Table 5.7: Iteration numbers (outer and total inner iterations) for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2 and $N_t = 501$.

5.2 Parameter Identification in Elliptic and Parabolic PDEs

The second block of examples is concerned with parameter identification, which is a

main field of inverse (and ill-posed) problems. Here, we restrict ourselves to problems

connected with partial differential equations and point out various applications.

5.2.1 An Inverse Source Problem in an Elliptic Equation

As a first model problem, we consider the identification of a source term in an elliptic equation from distributed measurements. This problem is linear and closely related to differentiation.

Example 5.5 Let $\Omega$ be a bounded domain in $\mathbb{R}^n$, $n = 2, 3$, with sufficiently smooth boundary (e.g., $\partial\Omega \in C^{1,1}$, or $\partial\Omega \in C^{0,1}$ and $\Omega$ convex), or let $\Omega$ be a parallelepiped. We consider the operator $T : L^2(\Omega) \to L^2(\Omega)$ defined by $Tf = u$, with

\[ Au := -\nabla \cdot (q \nabla u) + p \cdot \nabla u + c\,u = f, \quad u|_{\partial\Omega} = 0, \tag{5.14} \]

and given, sufficiently smooth parameters $q$, $p$ and $c$. Assume that $A$ is uniformly elliptic; then a solution $u$ of (5.14) lies in $H^2(\Omega) \cap H^1_0(\Omega)$ and satisfies $\|u\|_{H^2} \sim \|f\|_{L^2}$, i.e., $A$ is an isomorphism between $H^2(\Omega) \cap H^1_0(\Omega)$ and $L^2(\Omega)$.

For preconditioning, we consider the Hilbert scale induced by $L^2 u = -\Delta u$ over the space $X = L^2(\Omega)$ with $X_2 = H^2(\Omega) \cap H^1_0(\Omega)$. Then we have $T \sim L^{-2}$, and thus (L2) holds with $a = 2$. Moreover, the stronger condition (3.15) holds.

For a numerical test, we set Ω = [0, 1]2, q = c = 1, p = 0, s = −a/2 = −1, and try

to identify the function

f † = sign(x− 0.5) · sign(y − 0.5)

from $f_0 = 0$ as a starting value. In this setting, we have $f^\dagger - f_0 \in R((T^*T)^\mu)$ for all $0 \le \mu < 1/8$, or equivalently, $f^\dagger \in X^s_r$ for all $r < 1/2$; thus the expected iteration numbers are $k_* \sim \delta^{-8/5}$ for Landweber iteration, $k_* \sim \delta^{-4/5}$ for Landweber iteration in Hilbert scales and the ν-methods, and $k_* \sim \delta^{-2/5}$ for the Hilbert scale ν-method. Observing that the singular values of $T$ behave like $\sigma_n = O(1/n)$, we obtain that the stopping index for cgne can be bounded by $k_* = O(\delta^{-2/5})$, while for hscgne the smaller bound $k_* = O(\delta^{-4/15})$ holds. The iteration numbers realized in the numerical tests are listed in Table 5.8. We want to mention that the rates for the iteration numbers do not exactly match the predictions, which, in our opinion, is primarily due to the very low number of iterations, i.e., the rates cannot be determined very accurately from only 5 points.
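The exponents quoted for Landweber iteration and the ν-methods can be reproduced by simple arithmetic. The Python sketch below assumes the following pattern, which is inferred here from the numbers quoted in this chapter and is not stated in this section: error rate $\delta^{u/(a+u)}$, and iteration numbers $\delta^{-2a/(a+u)}$ (Landweber), $\delta^{-2(a+s)/(a+u)}$ (Landweber in Hilbert scales), $\delta^{-a/(a+u)}$ (ν-method) and $\delta^{-(a+s)/(a+u)}$ (Hilbert scale ν-method). The function name is hypothetical.

```python
from fractions import Fraction


def predicted_exponents(a, u, s):
    """Exponents p in k_* ~ delta^p and in ||e|| ~ delta^p under the
    assumed pattern described in the lead-in (not taken verbatim from the text)."""
    a, u, s = Fraction(a), Fraction(u), Fraction(s)
    return {
        "error": u / (a + u),
        "lw": -2 * a / (a + u),
        "hs-lw": -2 * (a + s) / (a + u),
        "nu": -a / (a + u),
        "hs-nu": -(a + s) / (a + u),
    }


# Example 5.5: a = 2, u = 1/2 (i.e. mu < 1/8 with u = 2 a mu), s = -1
rates = predicted_exponents(2, "1/2", -1)
```

For $a = 2$, $u = 1/2$ and $s = -1$ this gives the exponents $-8/5$, $-4/5$, $-4/5$ and $-2/5$ for the iteration numbers and $1/5$ for the error, which are the values quoted above.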

δ/‖u^δ − u_0‖   lw     hs-lw  ν      hs-ν   cgne   hscgne
0.04            88     10     26     7      6      2
0.02            332    18     50     11     9      5
0.01            963    34     87     15     14     6
0.005           3089   59     155    20     18     8
0.002           9342   123    316    29     35     11
η               -1.55  -0.84  -0.83  -0.46  -0.57  -0.52



Np = 40257.

As predicted by the theory, the iteration error behaves like $\|f^\delta_{k_*} - f^\dagger\| \sim \delta^{1/5}$ for all methods. We have chosen ν = 2 here to guarantee optimal convergence rates for the preconditioned ν-method: note that ν ≥ 3/2 is necessary to apply Theorem 4.10 for $u = 2a\mu = 1/2$.

In the presented 2D example, the bound on the iteration numbers for cgne is of the same order as the one for the preconditioned ν-method. In 3D, the situation is different: there we have $\sigma_N \sim N^{-2/3}$, which yields $k_* = O(\delta^{-12/25})$ for cgne, while the rate for the Hilbert scale ν-method is not affected; hence the preconditioned ν-methods will even outperform cgne in 3D.

5.2.2 Identifying a Reaction Term

The following examples of parameter identification problems in elliptic respectively

parabolic PDEs are nonlinear inverse problems, and additional instabilities may arise

due to the nonlinearity.

Example 5.6 In this example, which is taken from [39], we try to identify the parameter $c$ in

\[
\begin{aligned}
-\Delta u + c\,u &= f \quad \text{in } \Omega, \\
u &= g \quad \text{on } \partial\Omega,
\end{aligned}
\tag{5.15}
\]

from distributed measurements of the state $u$. Here, $\Omega$ is an interval in $\mathbb{R}$ or a bounded domain in $\mathbb{R}^2$ or $\mathbb{R}^3$ with smooth boundary (or a parallelepiped). The right hand side is assumed to satisfy $f \in L^2(\Omega)$ and the boundary data $g \in H^{3/2}(\partial\Omega)$. If $u$ were known exactly, then one could reconstruct $c$ by

\[ c = \frac{f + \Delta u}{u}, \tag{5.16} \]

which in case of noisy measurements $u^\delta$ is unstable due to differentiation. Formula (5.16) already reveals another possible source of instability, namely division by $u$, which may cause noise amplification where $u$ is close to zero. Note that if $u = 0$ on a subdomain, then $c$ is not uniquely determined by (5.15) there.

We consider the inverse problem as an abstract operator equation, and define the nonlinear operator

\[ F : D(F) \subset L^2(\Omega) \to L^2(\Omega) \]

by $F(c) = u(c)$, where $u = u(c)$ is the solution of (5.15) with parameter $c$. One can show (cf. [17]) that the parameter-to-solution map $F$ is well-defined and Fréchet differentiable on

\[ D(F) := \{ c \in L^2(\Omega) : \|c - \hat{c}\| \le \gamma \text{ for some } \hat{c} \ge 0 \text{ a.e.} \}, \]

where $u(c)$ denotes the solution of (5.15), and $\gamma > 0$ has to be sufficiently small. By standard arguments one can show that

\[ F'(c)^* w = u(c)\, A(c)^{-1} w, \]

where $A(c) : H^2(\Omega) \cap H^1_0(\Omega) \to L^2(\Omega)$ is defined by $A(c)u = -\Delta u + c\,u$.

Next we choose an appropriate Hilbert scale namely the one induced by L2 = −∆

over X = L2(Ω) with X2 := H2(Ω) ∩H10 (Ω). This yields

R(F ′(c)∗) ⊂ X2,

which already proves (N3). If furthermore u† ≥ γ > 0 a.e., then we even have

\[ \|F'(c^\dagger)^* w\|_2 \sim \|w\|_0, \]

and according to Remarks 4.5 and 4.23, (N7) can be interpreted in terms of the spaces $X_u$. In order to show (N6), we use that

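\[ [F'(c)^* - F'(d)^*]\,w = u(c)\,[A(c)^{-1} - A(d)^{-1}]\,w + [u(c) - u(d)]\,A(d)^{-1}w =: r_1 + r_2. \]

The terms r1 and r2 can be further estimated by

\[ \|r_1\|_2 \le C\,\|u(c)\|_{H^2}\,\|[A(c)^{-1} - A(d)^{-1}]\,w\|_{H^2} \le C_1\,\|u(c)\|_{H^2}\,\|c - d\|_{L^2}\,\|w\|_{L^2} \]

and

\[ \|r_2\|_2 \le \|u(d) - u(c)\|_{H^2}\,\|A(d)^{-1}w\|_{H^2} \le C_2\,\|c - d\|_{L^2}\,\|w\|_{L^2}. \]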


Here we have used that A(c) is an isomorphism between $L^2(\Omega)$ and $H^2(\Omega) \cap H_0^1(\Omega)$. If u† ≥ γ > 0, this yields (N6) with $C_Q = 0$ and $C_R \le C\,\|c_0 - c^\dagger\|$, and thus the preconditioned Landweber iteration as well as the Newton-type methods can be applied.

For our numerical test, we set s = −1. Note that in this case the restriction 0 < u ≤ a + 2s is formally violated, so the results of Section 4.3 do not formally yield convergence rates in X. Nevertheless, the (optimal) rates are observed numerically, which indicates that the restriction u ≤ a + 2s might be weakened here. In order to ensure convergence rates also theoretically, one could alternatively set s > −1 and use a multilevel technique to implement $L^{-2s}$.
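The effect of the preconditioner $L^{-2s}$ on the stopping index can be sketched in a toy spectral model. The following is not the PDE problem of this example: it assumes a linear diagonal operator with singular values σ_n = n⁻², a scale operator L = diag(n), s = −1, an exact solution with coefficients 1/n, and a deterministic perturbation of norm δ; the function name `landweber` and all parameter values are hypothetical. The iteration x_{k+1} = x_k + L^{-2s} A*(y^δ − A x_k) is stopped by the discrepancy principle with τ = 2.1.

```python
import math

def landweber(sigma, weights, y_delta, delta, tau=2.1, max_iter=200000):
    """Iterate x_{k+1} = x_k + W A^*(y_delta - A x_k) for A = diag(sigma),
    W = diag(weights); stop as soon as ||y_delta - A x_k|| <= tau * delta."""
    x = [0.0] * len(sigma)
    for k in range(max_iter + 1):
        residual = [yd - s * xi for yd, s, xi in zip(y_delta, sigma, x)]
        if math.sqrt(sum(r * r for r in residual)) <= tau * delta:
            return k, x
        x = [xi + w * s * r for xi, w, s, r in zip(x, weights, sigma, residual)]
    raise RuntimeError("discrepancy principle not reached within max_iter")

n_modes = 50
modes = range(1, n_modes + 1)
sigma = [n ** -2.0 for n in modes]            # assumed singular values of A
x_true = [1.0 / n for n in modes]             # assumed (not very smooth) solution
delta = 1e-3
noise = [delta * (-1) ** n / math.sqrt(n_modes) for n in modes]   # ||noise|| = delta
y_delta = [s * x + e for s, x, e in zip(sigma, x_true, noise)]

s_index = -1.0
plain = [1.0] * n_modes                                  # standard Landweber
hilbert = [float(n) ** (-2.0 * s_index) for n in modes]  # L^{-2s} with L = diag(n)

k_lw, x_lw = landweber(sigma, plain, y_delta, delta)
k_hs, x_hs = landweber(sigma, hilbert, y_delta, delta)
print(k_lw, k_hs)
```

Both step operators have norm one in this model, so the two runs differ only through the weights; the preconditioned run needs far fewer iterations to reach the same discrepancy level.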

In the numerical tests, we try to reconstruct the reaction term

c† = sign(x− 0.5) · sign(y − 0.5)

on Ω = [0, 1]2 and start with the initial guess c0 = 0.

δ/‖uδ−u(c0)‖   lw      hs-lw   new-lw      hs-new-lw   new-ν     hs-new-ν
0.08           30      7       4(75)       3(35)       2(15)     2(15)
0.04           99      10      6(315)      4(75)       4(75)     3(35)
0.02           421     20      8(1275)     4(75)       5(155)    3(35)
0.01           1038    37      9(2555)     5(155)      5(155)    3(35)
0.005          3404    62      11(10235)   6(315)      6(315)    4(75)
0.002          14037   133     13(40955)   7(635)      7(635)    5(155)
η              -1.66   -0.82   -1.67       -0.77       -0.90     -0.55



Np = 2577.

The iteration numbers for this nonlinear problem (see Table 5.9) essentially coincide with the ones of the previous linear example. As there, the source condition (4.50) is satisfied for all u < 1/2 (respectively µ < 1/8) and, as predicted by the theory, the observed convergence rates are $\|e^\delta_{k_*}\| \sim O(\delta^{1/5})$, see Table 5.10.
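The exponents η and κ reported in Tables 5.9 and 5.10 can be reproduced from the tabulated values. The sketch below assumes that the rates were obtained by a least-squares fit of a straight line in log-log scale; the text does not state the fitting procedure, and the function name `fitted_rate` is hypothetical. The data are copied from the lw and hs-lw columns of the two tables.

```python
import math

def fitted_rate(deltas, values):
    """Slope of the least-squares line through the points (log delta, log value)."""
    xs = [math.log(d) for d in deltas]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    return s_xy / s_xx

deltas = [0.08, 0.04, 0.02, 0.01, 0.005, 0.002]
k_lw = [30, 99, 421, 1038, 3404, 14037]                     # Table 5.9, column lw
k_hs_lw = [7, 10, 20, 37, 62, 133]                          # Table 5.9, column hs-lw
err_lw = [0.5604, 0.5116, 0.4152, 0.3670, 0.3137, 0.2595]   # Table 5.10, column lw

eta_lw = fitted_rate(deltas, k_lw)
eta_hs_lw = fitted_rate(deltas, k_hs_lw)
kappa_lw = fitted_rate(deltas, err_lw)
print(round(eta_lw, 2), round(eta_hs_lw, 2), round(kappa_lw, 2))
```

Under this assumption the fitted slopes agree with the tabulated values η = −1.66, η = −0.82 and κ = 0.21 to within rounding.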

Until now we have been able to show the conditions needed for the application of the

convergence rates results of the previous chapter. This is not the case in the following

examples, where we are only able to verify some of the conditions: in particular, we will

show (N3), which at least ensures well-definedness of a single iteration step.

While in the above examples, interior measurements are available for identification

of the parameters, in many relevant problems the state can only be measured at (a part

of) the boundary of the domain of interest.

δ/‖uδ−u(c0)‖   lw       hs-lw    new-lw   new-hslw   new-ν    new-hsν
0.080          0.5604   0.5037   0.5477   0.4785     0.5600   0.4686
0.040          0.5116   0.4599   0.4839   0.4017     0.4260   0.3862
0.020          0.4152   0.3906   0.3907   0.3915     0.3581   0.3318
0.010          0.3670   0.3302   0.3577   0.3247     0.3564   0.3121
0.005          0.3137   0.2914   0.2970   0.2762     0.2939   0.2475
0.002          0.2595   0.2406   0.2475   0.2318     0.2455   0.2138
κ              0.21     0.20     0.22     0.19       0.21     0.21

Table 5.10: Relative iteration errors $\|e^\delta_{k_*}\|/\|e_0\|$ for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $\|e^\delta_{k_*}\| = O(\delta^\kappa)$; parameters τ = 2.1, ν = 2 and Np = 2577.

5.2.3 An Inverse Problem in Mathematical Finance

The prices of European call options $C = C_{K,T}(S, t)$ with strike K and maturity T, considered as functions of the spot price S of the underlying and time t, satisfy the well-known Black-Scholes differential equation [12],

\[ C_t + \frac{\sigma^2}{2}\,S^2 C_{SS} + (r - q)\,S\,C_S - r\,C = 0, \qquad S > 0, \]

where σ = σ(S, t) is called volatility (of the underlying process), r(t) is the risk-free

interest rate and q(t) is the dividend rate. At maturity T , C(S, T ) has to equal the

payoff of the option, i.e.,

C(S, T ) = max(S −K, 0).

The inverse problem of option pricing (cf., e.g., [14, 24, 57]) now consists in determining

a volatility surface σ(S, t) from market prices $C_{K,T}(S^*, 0)$ of European call options,

where S∗ denotes the current (spot) price of the underlying asset, and t = 0 corresponds

to today.

According to [14, 21], the option prices $C = C_{S,t}(K, \tau)$ as a function of the strike K and maturity τ satisfy the following (Dupire) equation

\[ -C_\tau + \frac{\sigma^2(K,\tau)}{2}\,K^2 C_{KK} - (r - q)\,K\,C_K - q\,C = 0, \qquad K > 0, \tag{5.17} \]

with initial condition C(K, 0) = max(S − K, 0) and boundary condition C(0, τ) = 0.

Hence, σ(S, t) is to be determined from observations of the option prices C(K, τ) satisfying (5.17) at different strikes K and maturities τ. Since in reality option prices are only available for a few maturities T, but for many strikes K, one has to make some additional assumptions on the structure of the volatility surface, e.g., σ(S, t) = ρ(t)σ(S). It turns out that in particular the identification of the volatility smile σ(S) is of interest in finance. For simplicity we assume ρ(t) = 1 and r = q = 0 in the sequel, and focus on the identification of σ(S) below:

Example 5.7 By substituting $K = S e^y$ and $u = C/S$, (5.17) transforms into

\[ -u_\tau + a(y)\,(u_{yy} - u_y) = 0, \qquad u(y, 0) = \max(1 - e^y, 0). \tag{5.18} \]
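The transformation can be checked by a short computation. The following sketch assumes r = q = 0 and a time-independent volatility, as stated above, and it identifies the coefficient as $a(y) = \tfrac12\sigma^2(Se^y)$, which the text leaves implicit. With $C(K,\tau) = S\,u(y,\tau)$ and $y = \ln(K/S)$ one has

\[
C_\tau = S\,u_\tau, \qquad C_K = \frac{S}{K}\,u_y, \qquad C_{KK} = \frac{S}{K^2}\,(u_{yy} - u_y),
\]

so that (5.17) becomes

\[
-S\,u_\tau + \frac{\sigma^2(K)}{2}\,K^2\,\frac{S}{K^2}\,(u_{yy} - u_y) = 0
\quad\Longleftrightarrow\quad
-u_\tau + a(y)\,(u_{yy} - u_y) = 0, \qquad a(y) := \tfrac12\,\sigma^2(S e^y),
\]

and the initial condition $C(K,0) = \max(S - K, 0)$ turns into $u(y,0) = \max(1 - e^y, 0)$.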


Following the calculations in the appendix of [24], one sees that for sufficiently smooth a, the solution u(a) of (5.18) is in $u^* + H^{2,1}(Q_T)$, where $Q_T = \mathbb{R} \times (0, T)$ and $u^*$ is a solution of the equation with a constant parameter $a^*$, which can be expressed analytically. Hence, the parameter-to-output mapping

\[ F : K(a^*) \to u^* + L^2(\mathbb{R}), \qquad a \mapsto u(\cdot, T) \]

is well-defined for $a \in K(a^*) := \{\, a \in a^* + H^1(\mathbb{R}) : 0 < \underline{a} \le a \le \overline{a} \,\}$, and we obtain (see [22, 24] for details) that $F'(a)h = w(\cdot, T)$, where w satisfies

\[ -w_\tau + a\,(w_{yy} - w_y) = -h\,(u_{yy} - u_y) \quad \text{on } Q_T \]

with homogeneous initial condition. Furthermore, $p = F'(a)^* r$ satisfies

\[ \langle p, h \rangle_{H^1} = -\Big\langle h, \int_0^T (u_{yy} - u_y)\,R \, d\tau \Big\rangle_{L^2}, \]

where R solves

\[ R_\tau + (aR)_{yy} + (aR)_y = 0 \]

with terminal condition R(T) = r. Under our regularity assumption on a, one can show that $\int_0^T (u_{yy} - u_y)\,R \, d\tau \in L^2(Q_T)$. Hence, $F'^*$ maps $Y = L^2(\mathbb{R})$ into $H^2(\mathbb{R})$.

Next, we consider some aspects of a numerical implementation: according to [59], a solution of (5.18) has the following asymptotic behavior

\[ |u(y, \tau) - H(y)| = O(e^{-|y|}), \qquad |u_y(y)| = O(e^{-|y|}), \qquad |y| \to \infty, \]

where H denotes the Heaviside function. Hence, it is reasonable to approximate (5.17) by an equation on a finite domain, which we do in the sequel, i.e., we assume (5.18) holds for $(y, \tau) \in Q_T = \Omega_M \times (0, T)$ with $\Omega_M = (-M_1, M_2)$ and impose additional boundary conditions, e.g.,

u(−M1, τ) = g0(τ), u(M2, τ) = g1(τ),

or

uy(−M1, τ) = h0(τ), uy(M2, τ) = h1(τ),

with $g_0, h_0, h_1 \sim 1$ and $g_1 \sim 0$. As a Hilbert scale we choose the one defined by $L^2 q = -q_{yy}$ over $X_0 = H_0^1$ with $X_{-1} = L^2$ and $X_1 = H^2 \cap H_0^1$. As we have seen above, $\|F'(a)^* r\|_1 = \|F'(a)^* r\|_{H^2} \le C\,\|r\|_Y$ and consequently (N3) holds with

\[ \|F'(q)h\|_Y \le C\,\|h\|_{-a}, \qquad a = 1. \]

Although we cannot verify (N6), we obtain (with $C_R = 0$) that

\[ \|Q(x, \tilde x)\|_{X_{-a},Y} = \|F'(x) - F'(\tilde x)\|_{X_{-a},Y} \le c\,\|x - \tilde x\|, \]

which implies the Lipschitz continuity of the Fréchet derivative considered as an operator

on X−a. Note that if X−a and X s−a were equivalent (which is the case for s = 0), or if

at least an estimate from below (3.11) was available, this would in fact show (N6).
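The truncation of (5.18) to a finite domain $\Omega_M$ described above can be sketched with an explicit finite-difference scheme and checked against the closed-form solution for a constant parameter. The sketch rests on several assumptions that are not taken from the text: Dirichlet values g0 = 1 − e^{−M1} and g1 = 0, a constant test parameter a = 0.02, M1 = M2 = 2, an explicit Euler time stepping, and the hypothetical function names `u_constant_a` and `solve_truncated`.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def u_constant_a(y, tau, a):
    """Closed-form solution of (5.18) for constant a (Black-Scholes call price
    divided by S, with r = q = 0, sigma^2 = 2a and K = S e^y); requires tau > 0."""
    s = math.sqrt(2.0 * a * tau)
    d1 = (-y + a * tau) / s
    d2 = d1 - s
    return normal_cdf(d1) - math.exp(y) * normal_cdf(d2)

def solve_truncated(a_func, m1, m2, n_y, t_final, n_t):
    """Explicit scheme for u_tau = a(y) (u_yy - u_y) on (-m1, m2) x (0, t_final)
    with assumed Dirichlet values g0 = 1 - exp(-m1) and g1 = 0."""
    h = (m1 + m2) / n_y
    dt = t_final / n_t
    ys = [-m1 + j * h for j in range(n_y + 1)]
    a = [a_func(y) for y in ys]
    if dt * max(a) > 0.5 * h * h:
        raise ValueError("time step too large for the explicit scheme")
    u = [max(1.0 - math.exp(y), 0.0) for y in ys]       # initial condition
    g0, g1 = 1.0 - math.exp(-m1), 0.0
    for _ in range(n_t):
        new = u[:]
        for j in range(1, n_y):
            u_yy = (u[j + 1] - 2.0 * u[j] + u[j - 1]) / (h * h)
            u_y = (u[j + 1] - u[j - 1]) / (2.0 * h)
            new[j] = u[j] + dt * a[j] * (u_yy - u_y)
        new[0], new[-1] = g0, g1
        u = new
    return ys, u

ys, u_num = solve_truncated(lambda y: 0.02, 2.0, 2.0, 200, 1.0, 200)
u_exact_at_money = u_constant_a(0.0, 1.0, 0.02)
print(u_num[100], u_exact_at_money)
```

For these parameters the truncation boundaries lie about ten standard deviations away from y = 0, so the boundary values have no visible influence on the computed at-the-money value.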

For an implementation of the preconditioned iterative methods we have to apply the operator $L^{-2s}F'(q)^*$ to elements r ∈ Y. By the above representation, one gets for s = −a/2 that $L^{-2s}F'(q)^* r = L^{-1}\big(\int_0^T (u_{yy} - u_y)\,R \, d\tau\big)$, where $L^{-1}$ amounts to a single integration and can be implemented efficiently by Fourier transformation.
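How a fractional power of L can be applied in a Fourier basis is sketched below. The sketch assumes the unit interval with homogeneous Dirichlet conditions, so that $L^2 = -d^2/dy^2$ has the eigenfunctions sin(kπy) with eigenvalues (kπ)², and it uses a naive O(n²) discrete sine transform instead of a fast transform; the function name `apply_L_power` is hypothetical, and the domain and boundary conditions of the actual example would have to be substituted.

```python
import math

def apply_L_power(values, power):
    """Apply L^power, where L^2 = -d^2/dy^2 on (0, 1) with zero Dirichlet
    conditions, to samples given at the interior nodes y_j = j/(n+1)."""
    n = len(values)
    h = 1.0 / (n + 1)
    # Discrete sine coefficients (the sine vectors are orthogonal on this grid).
    coeffs = [2.0 * h * sum(values[j] * math.sin((k + 1) * math.pi * (j + 1) * h)
                            for j in range(n))
              for k in range(n)]
    # L acts on the k-th sine mode as multiplication by k * pi.
    scaled = [coeffs[k] * ((k + 1) * math.pi) ** power for k in range(n)]
    return [sum(scaled[k] * math.sin((k + 1) * math.pi * (j + 1) * h)
                for k in range(n))
            for j in range(n)]

n = 31
grid = [(j + 1) / (n + 1) for j in range(n)]
first_mode = [math.sin(math.pi * y) for y in grid]
smoothed = apply_L_power(first_mode, -1.0)      # equals sin(pi y) / pi
restored = apply_L_power(smoothed, 1.0)         # L L^{-1} is the identity
```

A production implementation would replace the two nested sums by a fast sine transform; the diagonal scaling in between is the only place where the exponent enters.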

In the numerical example we set $S^* = 100$ and try to identify a typical smile given by $a^\delta(y) = \tfrac{1}{2} - \tfrac{1}{10}\,\mathrm{erf}(2(e^y - 1))$ from option values for strikes $K = S^* e^y \in [0, 300]$ and maturity T = 1 year.

We want to mention at this point that in [24] sufficient conditions for a convergence rate of $O(\delta^{1/2})$ for Tikhonov regularization applied to the inverse problem $F(a) = u^\delta$ have been derived, which essentially are differentiability conditions and fast decay of $a^\dagger - a^*$ for |y| → ∞. In view of these results, we expect similar convergence rates in our numerical tests.

Table 5.11 lists the iteration numbers for Landweber iteration and two Newton-type iterations and their preconditioned versions. Although we could not prove (N6), and thus formally our theory cannot be applied, the iteration numbers are still reduced quite dramatically by preconditioning. Furthermore, the convergence rates of the iteration errors are approximately $\|e^\delta_{k_*}\| \sim O(\delta^{1/2})$ for all methods.

δ/‖uδ−u(a0)‖   lw      hs-lw   new-lw     hs-new-lw   new-ν     hs-new-ν
0.04           30      18      6(111)     4(43)       5(70)     3(25)
0.02           51      29      7(173)     5(70)       6(111)    4(43)
0.01           143     67      10(616)    8(266)      7(173)    5(70)
0.005          310     90      12(1404)   8(266)      8(266)    5(70)
0.0025         630     113     13(2114)   10(616)     9(406)    6(111)
η              -1.13   -0.69   -1.15      -0.96       -0.63     -0.50

Table 5.11: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2, Nk = 500 and Nt = 250.

5.2.4 Reconstructing a Nonlinear Source Term in a Parabolic Equation

In the final example, we investigate the identification of the nonlinearity in a parabolic equation from boundary measurements.

Nonlinear parabolic equations appear, e.g., in the modeling of cooling processes for

steel and glass in liquids or gases, in the modeling of phase transitions, for instance in

crystallization processes of polymers, or in reaction kinetics of chemical systems. In all


these areas, the inverse problem of determining the nonlinearity from measurements is

of interest.

Example 5.8 As a model problem, we consider in the following the reconstruction of a nonlinear source term q(u) in the parabolic equation

\[ -u_t + u_{xx} + q(u) = f \quad \text{on } Q_T = [0, 1] \times (0, T), \tag{5.19} \]

\[ u(0, t) = \varphi_0(t), \quad u(1, t) = \varphi_1(t) \quad \text{for } t \in (0, T), \qquad u(x, 0) = u_0(x) \quad \text{for } x \in [0, 1], \tag{5.20} \]

from measurements of the Neumann data

\[ u_x(0, t) = \psi_0(t), \quad u_x(1, t) = \psi_1(t) \quad \text{for } t \in (0, T). \]

According to [3] even such one-dimensional problems have important applications in

heat transfer at high temperatures, and we will therefore call u the temperature in the

sequel.
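The forward problem behind this example, i.e. the map from a source term q to the Neumann data of the solution of (5.19), (5.20), can be sketched with an explicit finite-difference scheme. The discretization, the function name `neumann_data`, the one-sided difference quotients at the boundary, and the manufactured test case (u = 1 − x + t, q(y) = sin(πy), f = q(u) − 1) are assumptions made for illustration; they are not the setup used for the numerical results of this section.

```python
import math

def neumann_data(q, f, phi0, phi1, u0, n_x=20, t_final=0.1, n_t=100):
    """Solve u_t = u_xx + q(u) - f on [0,1] x (0,t_final) by explicit Euler and
    return one-sided difference approximations of u_x(0,t) and u_x(1,t)."""
    h = 1.0 / n_x
    dt = t_final / n_t
    if dt > 0.5 * h * h:
        raise ValueError("time step violates the stability bound dt <= h^2/2")
    xs = [i * h for i in range(n_x + 1)]
    u = [u0(x) for x in xs]
    left, right = [], []
    for step in range(n_t):
        t = step * dt
        new = u[:]
        for i in range(1, n_x):
            u_xx = (u[i + 1] - 2.0 * u[i] + u[i - 1]) / (h * h)
            new[i] = u[i] + dt * (u_xx + q(u[i]) - f(xs[i], t))
        new[0], new[-1] = phi0(t + dt), phi1(t + dt)    # Dirichlet data (5.20)
        u = new
        left.append((u[1] - u[0]) / h)
        right.append((u[-1] - u[-2]) / h)
    return left, right

# Manufactured test case: u(x,t) = 1 - x + t solves (5.19) for this pair (q, f).
q_test = lambda y: math.sin(math.pi * y)
f_test = lambda x, t: q_test(1.0 - x + t) - 1.0
psi0, psi1 = neumann_data(q_test, f_test,
                          phi0=lambda t: 1.0 + t,
                          phi1=lambda t: t,
                          u0=lambda x: 1.0 - x)
```

In this test case the state satisfies the monotonicity conditions u_x < 0 and ϕ_i′ > 0, and the computed Neumann data equal −1 at both ends up to rounding, since the scheme reproduces solutions that are linear in x and t.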

For further investigation we assume that the functions ϕ0, ϕ1, u0, and f are sufficiently smooth, and that

\[ u_x \le -\gamma < 0 \quad \text{and} \quad \varphi_i'(t) \ge \gamma > 0. \tag{5.21} \]

By means of the maximum principle, this monotonicity assumption can rather easily be transferred to simple conditions on f, ϕi, and ψi. In [25], uniqueness and conditional stability of a solution (u, q(u)) to this inverse problem have been proven by Carleman estimates. Note that under the monotonicity assumption (5.21) the range of temperature is known a priori, i.e.,

\[ u(x, t) \in \{\, y \in \mathbb{R} : \varphi_1(0) \le y \le \varphi_0(T) \,\} =: [a, b]. \]

Thus, q can be considered as a function on [a, b]. Without (5.21), the range of values

of u is not known, and therefore has to be identified simultaneously (cf., e.g., [56] for a

related discussion). Below, we also assume that q is known at low temperature, i.e.,

\[ q \in K = \{\, p \in H^1[a, b] : p(a) = p_a \,\}. \]

Such a condition was required for the stability analysis in [25]. We will now investigate

the conditions of Assumptions 4.21 and 4.22:

For $q \in H^1[a, b]$, $f \in L^2(Q_T)$, and ϕ0, ϕ1, u0 sufficiently smooth, (5.19) – (5.20) has a unique solution $u \in H^{2,1}(Q_T)$, hence the operator

\[ F : K \to (L^2(0, T))^2, \qquad q \mapsto [u_x(0, \cdot), u_x(1, \cdot)] \]

is well-defined. Furthermore, F is Fréchet differentiable and the derivative in direction h is given by $F'(q)h = [w_x(0, \cdot), w_x(1, \cdot)]$, where $w = u'(q)[h]$ denotes the solution of the linearized (at q in direction h) equation

\[ -w_t + w_{xx} + q'(u)\,w = -h(u), \qquad u = u(q), \]


with homogeneous initial and boundary conditions. By our assumptions, $h(u) \in H^1(Q_T)$ and consequently $w \in H^{2,1}(Q_T)$. Next we show that $F'(q)$ can be extended to a bounded linear operator on $L^2[a, b]$: one has

\[ \|F'(q)h\|_Y^2 = \|w_x(0, \cdot)\|^2_{L^2(0,T)} + \|w_x(1, \cdot)\|^2_{L^2(0,T)} \le c\,\|w\|^2_{H^{2,1}(Q_T)} \le C\,\|h(u)\|^2_{L^2(Q_T)}, \]

and it remains to estimate $\|h(u)\|_{L^2}$. By the monotonicity assumption (5.21), we obtain

\[ \|h(u)\|^2_{L^2(Q_T)} = \int_{Q_T} (h(u))^2 \, dx\,dt = \int_0^T \int_{u(1,t)}^{u(0,t)} (h(y))^2 \, |u_x^{-1}| \, dy\,dt \le C\,\|h\|^2_{L^2[a,b]}. \]
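The constant in the last estimate can be made explicit. The following lines are a sketch under the monotonicity assumption (5.21), using that the values u(x, t) lie in [a, b]: since $|u_x| \ge \gamma$, the substitution $y = u(x, t)$ gives

\[
\int_0^T\!\!\int_{u(1,t)}^{u(0,t)} (h(y))^2\,|u_x^{-1}|\,dy\,dt
\;\le\; \frac{1}{\gamma}\int_0^T\!\!\int_a^b (h(y))^2\,dy\,dt
\;=\; \frac{T}{\gamma}\,\|h\|^2_{L^2[a,b]},
\]

so one may take $C = T/\gamma$ here.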

Now consider the Hilbert scale induced by $L^2 p = -p_{xx}$ over

\[ X_0 = \{\, p \in H^1[a, b] : p(a) = 0 \,\} \]

with $X_1 = \{\, p \in H^2[a, b] : p(a) = p'(b) = 0 \,\}$. By differentiation one obtains that $X_{-1} = L^2[a, b]$, and hence (N3) holds at least with a = 1. As in the previous example, we are not able to verify (N6), but with the same arguments as above, one can show at least that

\[ \|F'(x) - F'(\tilde x)\|_{X_{-a},Y} \le c\,\|x - \tilde x\|_0, \]

which implies the Lipschitz continuity of the Fréchet derivative considered as an operator on $X_{-a} = L^2[a, b]$ here.

For the implementation of the preconditioned iterations, we have to calculate the action of the operator $L^{-2s}F'(q)^*$ on some element r ∈ Y: one can show that $F'(q)^*$ is given by $p = F'(q)^* r$ with $p \in X_0$ such that

\[ \langle p_x, h_x \rangle_{L^2[a,b]} = \langle h(u), R \rangle_{L^2(Q_T)} \tag{5.22} \]

and R satisfies the adjoint equation

\[ R_t + R_{xx} + q'(u)\,R = 0 \]

with homogeneous terminal condition and boundary conditions

\[ R(0, t) = -(u_x(0, t) - \psi_0^\delta(t)), \qquad R(1, t) = u_x(1, t) - \psi_1^\delta(t). \]

Instead of (5.22), one can alternatively solve

\[ \langle \tilde p, h \rangle_{L^2[a,b]} = \langle h(u), R \rangle_{L^2(Q_T)}, \qquad L^2 p = \tilde p, \]

yielding, for s = −1/2, $L^{-2s}F'(q)^* r = L^1 p = L^{-1}\tilde p$. As in the previous examples, the action of $L^{-1}$ can be implemented efficiently by Fourier transformation.


We now turn to a numerical test: let ϕ0 = 1 + t, ϕ1 = t, u0 = 1 − x, f = sin(π(1 − x + t)), and consider the reconstruction of

q†(y) = sin(32πy),

from the Neumann data

\[ u_x(0, t) = u_x(1, t) = -1. \]

For preconditioning we choose s = −1/2 with L as above. As the iteration numbers listed in Table 5.12 show, the rates for the stopping indices are not improved in all cases by preconditioning. Note that formally our theory cannot be applied, since we could not verify condition (N6). However, throughout our numerical tests, the preconditioned iterations perform much faster than their standard variants.

δ/‖uδ−u(q0)‖   lw      hs-lw   new-lw      hs-new-lw   new-ν     hs-new-ν
0.04           431     29      12(1404)    6(111)      7(173)    3(25)
0.02           737     54      14(3179)    7(173)      8(266)    4(43)
0.01           1210    90      15(4777)    8(266)      8(266)    5(70)
0.005          1983    161     16(7174)    10(616)     9(406)    6(111)
0.0025         3800    346     18(16164)   12(1404)    10(616)   7(173)
η              -0.77   -0.87   -0.82       -0.91       -0.42     -0.69

Table 5.12: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2, Nx = 200, Nt = 100 and Nu = 100.

Note that the rates at which the iteration numbers of the numerical test grow with decreasing noise level are not improved by preconditioning in Hilbert scales. Nevertheless, the number of iterations is reduced by a factor of more than 10 for Landweber iteration (cf. Table 5.12). The numerical reconstructions of all methods are comparable and the numerically observed convergence rates for the iteration errors are approximately $O(\delta^{1/5})$ throughout. More numerical tests for this example can be found in [25], where Hölder stability of the inverse problem is shown, essentially under simple smoothness assumptions on $q^\dagger$.
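The size of the reduction can be read off Table 5.12 directly. The short computation below copies the iteration numbers from the table; it assumes that the numbers in parentheses for the Newton-type methods are the total numbers of inner iterations, which the caption does not state explicitly, and the helper name `reduction_factors` is hypothetical.

```python
# Iteration numbers copied from Table 5.12 (noise levels 0.04 down to 0.0025).
lw = [431, 737, 1210, 1983, 3800]
hs_lw = [29, 54, 90, 161, 346]
new_lw_inner = [1404, 3179, 4777, 7174, 16164]      # values in parentheses
hs_new_lw_inner = [111, 173, 266, 616, 1404]
new_nu_inner = [173, 266, 266, 406, 616]
hs_new_nu_inner = [25, 43, 70, 111, 173]

def reduction_factors(standard, preconditioned):
    """Ratio of standard to preconditioned iteration numbers per noise level."""
    return [s / p for s, p in zip(standard, preconditioned)]

factors_lw = reduction_factors(lw, hs_lw)
factors_new_lw = reduction_factors(new_lw_inner, hs_new_lw_inner)
factors_new_nu = reduction_factors(new_nu_inner, hs_new_nu_inner)
print([round(x, 1) for x in factors_lw])
print([round(x, 1) for x in factors_new_lw])
print([round(x, 1) for x in factors_new_nu])
```

With these numbers the factor exceeds 10 at every noise level for the two Landweber-based methods, while for the ν-method based variants it lies between roughly 3.5 and 7.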

5.3 Summary

We have seen in various examples that preconditioning in Hilbert scales may drastically accelerate iterative regularization methods, in particular if the solutions are not very smooth. The relaxed assumptions, i.e., (L2) and (N3), can usually be verified easily. The verification of the nonlinearity condition (N6) is more subtle. However, even if (N6) cannot be proven, (N3) at least guarantees well-definedness of a single iteration step. As the results of our numerical examples suggest, some of the conditions in the convergence analysis might still be weakened.


As the second test in Section 5.1.4 suggests, the choice of the Hilbert scale, in partic-

ular, the incorporation of the appropriate boundary conditions is essential. Otherwise,

the application of Hilbert scale preconditioning might even lead to non-convergence (in

the strong norm X0).

Bibliography

[1] R. A. Adams, Sobolev Spaces, Academic Press, London, 1975.

[2] O. M. Alifanov, Inverse Heat Transfer Problems, Springer, New York, 1994.

[3] O. M. Alifanov, E. A. Artyukhin, and S. V. Rumyantsev, Extreme Meth-

ods for Solving Ill-Posed Problems with Applications to Inverse Heat Transfer Prob-

lems, Begell House Inc., New York, 1995.

[4] G. Aubert and P. Kornprobst, Mathematical Problems in Image Processing, Springer, Berlin, 2001.

[5] A. B. Bakushinskii, Remarks on choosing the regularization parameter using the quasi-optimality and ratio criterion, USSR Comp. Math. Math. Phys. 24 (1984), 181–182.

[6] A. B. Bakushinskii and A. V. Goncharskii, Iterative methods for the solution

of incorrect problems, Nauka, Moscow, 1989.

[7] , Ill-Posed Problems: Theory and Applications, Kluwer, Dordrecht, 1994.

[8] A. W. Bakushinskii, The problem of the convergence of the iteratively regularized

Gauß-Newton method, Comput. Math. Math. Phys. 32 (1992), 1353–1359.

[9] H. Banks and K. Kunisch, Parameter Estimation Techniques for Distributed Systems, Birkhäuser, Braunschweig, 1989.

[10] J. Beck, B. Blackwell, and C. R. St. Clair, Inverse Heat Conduction, Wiley, Sussex, 1985.

[11] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, Institute of Physics Publishing, London, 1998.

[12] F. Black and M. Scholes, The pricing of options and corporate liabilities, J.

of Political Economy 81 (1973), 637–659.

[13] B. Blaschke, A. Neubauer, and O. Scherzer, On convergence rates for

the iteratively regularized Gauss-Newton method, IMA Journal of Numer. Anal. 17

(1997), 421–436.


[14] I. Bouchouev and V. Isakov, Uniqueness, stability and numerical methods for

the inverse problem that arises in financial markets, Inverse Problems 15 (1999),

R95–116.

[15] H. Brakhage, On ill-posed problems and the method of conjugate gradients, in:

H. W. Engl and C. W. Groetsch, eds., Inverse and Ill-posed Problems, Academic

Press, Boston, New York, London, 1987, 165–175.

[16] G. Chavent and K. Kunisch, On weakly nonlinear inverse problems, SIAM J.

Appl. Math. 56,2 (1996), 542–572.

[17] F. Colonius and K. Kunisch, Stability for parameter estimation in two point

boundary value problems, J. Reine Angew. Math. 370 (1986), 1–29.

[18] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering

Theory, Springer, Berlin, 1992.

[19] R. Courant, Über die Eigenwerte bei Differentialgleichungen der mathematischen Physik, Math. Z. 7 (1920), 1–57.

[20] P. Deuflhard, H. W. Engl, and O. Scherzer, A convergence analysis of

iterative methods for the solution of nonlinear ill-posed problems under affinely

invariant conditions., Inverse Problems 14 (1998), 1081–1106.

[21] B. Dupire, Pricing with a smile, RISK 7 (1994), 18–20.

[22] H. Egger, Identification of Volatility Smiles in the Black-Scholes Equation via

Tikhonov Regularization, Master’s thesis, Johannes Kepler University Linz, 2002.

[23] , Accelerated Newton-Landweber iterations for regularizing nonlinear inverse

problems, SFB-Report 2005-3, Linz, January 2005.

[24] H. Egger and H.W. Engl, Tikhonov regularization applied to the inverse prob-

lem of option pricing: Convergence analysis and rates, Inverse Problems 21 (2005),

1027–1045.

[25] H. Egger, H. W. Engl, and M. V. Klibanov, Global uniqueness and Hölder stability for recovering a nonlinear source term in a parabolic equation, Inverse Problems 21 (2005), 271–290.

[26] H. Egger and A. Neubauer, Preconditioning Landweber iteration in Hilbert

scales, Numer. Math. (2005), to appear.

[27] B. Eicke, A. K. Louis, and R. Plato, The instability of some gradient methods

for ill-posed problems, Numer. Math. 58 (1990), 129–134.

[28] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Prob-

lems, Kluwer Academic Publishers, 1996.


[29] H. W. Engl, K. Kunisch, and A. Neubauer, Convergence rates for Tikhonov

regularization of nonlinear ill-posed problems, Inverse Problems 5 (1989), 523–540.

[30] H. W. Engl, A. K. Louis, and W. Rundell, eds., Inverse Problems in Geo-

physics, SIAM, Philadelphia, 1996.

[31] , eds., Inverse Problems in Medical Imaging and Nondestructive Testing, Springer, Wien, New York, 1996.

[32] H. W. Engl and W. Rundell, eds., Inverse Problems in Diffusion Processes,

SIAM, Philadelphia, 1995.

[33] R. Gorenflo and S. Vessella, Abel Integral Equations: Analysis and Applications, Lecture Notes in Math. 1461, Springer, Berlin, 1991.

[34] C. W. Groetsch, Generalized Inverses of Linear Operators, Dekker, New York,

Basel, 1977.

[35] , Inverse Problems in the Mathematical Sciences, Vieweg, Braunschweig, 1993.

[36] M. Hanke, Accelerated Landweber iterations for the solution of ill-posed equations,

Numer. Math. 60 (1991), 341–373.

[37] , A regularizing Levenberg-Marquardt scheme, with applications to inverse groundwater filtration problems, Inverse Problems 13 (1997), 79–95.

[38] , Regularizing properties of a truncated Newton-CG algorithm for nonlinear

inverse problems, Numer. Func. Anal. Optim. 18 (1997), 971–993.

[39] M. Hanke, A. Neubauer, and O. Scherzer, A convergence analysis of the

Landweber iteration for nonlinear ill-posed problems, Numer. Math. 72 (1995), 21–

37.

[40] B. Hofmann, Mathematik inverser Probleme, Teubner, Stuttgart, Leipzig, 1999.

[41] T. Hohage, Logarithmic convergence rates of the iteratively regularized Gauß-

Newton method for an inverse potential and an inverse scattering problem, Inverse

Problems 13 (1997), 1279–1299.

[42] , Regularization of exponentially ill-posed problems, Numer. Funct. Anal. Op-

tim. 21 (2000), 439–464.

[43] D. Isaacson and J. C. Newell, Electrical impedance tomography, SIAM Review

41 (1999), 85–101.

[44] V. Isakov, Inverse Source Problems, Vol. 34 of Mathematical Surveys and Mono-

graphs, American Mathematical Society, Providence, RI, 1990.


[45] , Inverse Problems in Partial Differential Equations, Springer, Berlin, 1998.

[46] , Carleman type estimates and their applications, in: K. Bingham et al., ed.,

New analytic and geometric methods in inverse problems, Springer, Berlin, 2004,

93–125.

[47] B. Kaltenbacher, Some Newton-type methods for the regularization of nonlinear

ill-posed problems, Inverse Problems 13 (1997), 729–753.

[48] , A-posteriori parameter choice strategies for some Newton type methods for

the regularization of nonlinear ill-posed problems, Numer. Math. 79 (1998), 501–

528.

[49] B. Kaltenbacher, A. Neubauer, and O. Scherzer, Iterative Regularization

Methods for Nonlinear Problems, Springer, Dordrecht, 2005, to appear.

[50] W. J. Kammerer and M. Z. Nashed, Iterative methods for best approximate

solutions of linear integral equations of the first and second kinds, J. Math. Anal.

Appl. 40 (1972), 547–573.

[51] J. Keller, Inverse problems, Amer. Math. Monthly 83 (1976), 107–118.

[52] M. V. Klibanov and A. Timonov, Carleman Estimates for Coefficient Inverse

Problems and Numerical Applications, Inverse and Ill-Posed Problems Series, VSP,

Netherlands, 2004.

[53] M. A. Krasnoselskii, P. P. Zabreiko, E. I. Pustylnik, and P. E. Sobolevskii, Integral Operators in Spaces of Summable Functions, Noordhoff International Publishing, Leyden, 1976.

[54] S. G. Krein and J. I. Petunin, Scales of Banach spaces, Russian Math. Surveys

21 (1966), 85–160.

[55] R. Kreß, Linear Integral Equations, Springer, Berlin, 1989.

[56] P. Kugler and H. W. Engl, Identification of a temperature dependent heat

conductivity by Tikhonov regularization, J. Inv. Ill-posed Problems 10 (2002), 67–

90.

[57] R. Lagnado and S. Osher, A technique for calibrating derivative security pric-

ing models: numerical solution of the inverse problem, J. Computational Finance

1 (1997), 13–25.

[58] J. L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and

Applications: Volume I, Springer, Berlin - Heidelberg, 1972.

[59] J. Lishang and T. Youshan, Identifying the volatility of underlying assets from

option prices, Inverse Problems 17 (2001), 137–155.


[60] A. K. Louis, Inverse und schlecht gestellte Probleme, Teubner, Stuttgart, 1989.

[61] V. A. Morozov, On the solution of functional equations by the method of regu-

larization, Soviet Math. Dokl. 7 (1966), 414–417.

[62] M. Z. Nashed, Generalized Inverses and Applications, Academic Press, 1976.

[63] F. Natterer, Error bounds for Tikhonov regularization in Hilbert scales, Appl.

Anal. 18 (1984), 29–37.

[64] , The Mathematics of Computerized Tomography, Teubner, Stuttgart, 1986.

[65] A. Neubauer, When do Sobolev spaces form a Hilbert scale, Proc. Amer. Math.

Soc. 103 (1988), 557–562.

[66] , Tikhonov regularization of nonlinear ill-posed problems in Hilbert scales,

Appl. Anal. 46 (1992), 59–72.

[67] , On converse and saturation results for regularization methods, in: E. Schock, ed., Beiträge zur Angewandten Analysis und Informatik, Helmut Brakhage zu Ehren, Shaker, Aachen, 1994, 262–270.

[68] , On Landweber iteration for nonlinear ill-posed problems in Hilbert scales,

Numer. Math. 85 (2000), 309–328.

[69] V. G. Romanov and S. I. Kabanikhin, Inverse Problems for Maxwell’s Equa-

tions, VSP, Utrecht, 1994.

[70] E. Schock, Approximate solution of ill-posed equations: arbitrarily slow convergence vs. superconvergence, in: G. Hämmerlein and K. H. Hoffmann, eds., Constructive Methods for the Practical Treatment of Integral Equations, Birkhäuser, Basel, 1985, 234–243.

[71] U. Tautenhahn, Error estimates for regularization methods in Hilbert scales,

SIAM J. Numer. Anal. 33 (1996), 2120–2130.

[72] A. N. Tikhonov, Regularization of incorrectly posed problems, Soviet Math. Dokl.

4 (1963), 1624–1627.

Statutory Declaration

I, Herbert Egger, declare under oath that I have written the present dissertation independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked as such all passages taken verbatim or in substance from other sources.

Linz, July 2004

Herbert Egger


Curriculum Vitae

Personal Data

Name: Herbert Alexander Egger

Date of Birth: June 23, 1973

Place of Birth: 4400 Steyr, Austria

Nationality: Austrian

Education

1979 – 1983 Primary school in Enns

1983 – 1991 High school at ”Bischöfliches Gymnasium am Kollegium Petrinum”

1991 – 1995 Technical Chemistry (WITECH) at the J. Kepler University,

Linz

1996 – 2002 Studies in Technical Mathematics, specialization ”Industrial Mathematics”, J. Kepler University, Linz; Diploma Jan, 2002

since 2002 Doctoral student at J. Kepler University, Linz

Awards

06/2002 Master’s thesis awarded the Ludwig Scharinger Prize, Raiffeisen Landesbank OÖ

Professional Career

02/2002 – 03/2004 Scientific staff at the Special Research Project ”Numerical

and Symbolic Computation”, SFB013, University Linz

since 04/2004 Research Scientist at the Inverse Problems Group of the

J. Radon Institute for Computational and Applied Mathe-

matics, Austrian Academy of Sciences


Miscellaneous

07/1995 – 02/1996 Military Service.

07/2000 Participation at the ECMI-Modeling Week in Lund, Sweden

01/2001 – 06/2002 Foreign semester at the Oxford Centre for Industrial and Applied Mathematics (OCIAM), UK

07/2001 Participation at the summer school ”Industrial Mathematics”, ISAM 2001 in Siena, Italy

08/2002 SFB-Conference on ”Computational Methods for Inverse

Problems” in Strobl, Austria

09/2002 ECMI Conference, Riga, Latvia

02/2003 MATHMOD 2003, Vienna

09/2003 Inverse Problems Workshop series during the Special

Semester on ”Computational Methods and Emerging Appli-

cations”, IPAM, UCLA, USA

06/2004 Invited Colloquium Talk at University Chemnitz.

01/2005 Workshop on ”Symmetries, Inverse Problems and Image Pro-

cessing”, RICAM, Linz

04/2005 GAMM Conference, Luxembourg

06/2005 Applied Inverse Problems (AIP) Conference, Cirencester, UK
