Johannes Kepler Universität Linz
Network for Research, Teaching and Practice
Preconditioning Iterative Regularization
Methods in Hilbert Scales
Dissertation
for the attainment of the academic degree
Doctor of Technical Sciences (Doktor der Technischen Wissenschaften)
Prepared at the Institute for Industrial Mathematics (Institut für Industriemathematik)
Reviewers:
a.Univ.-Prof. Dr. Andreas Neubauer
Priv.-Doz. Dr. Barbara Kaltenbacher, Universität Erlangen
Submitted by:
Dipl.-Ing. Herbert Egger
Linz, July 2005
Johannes Kepler Universität
A-4040 Linz · Altenbergerstraße 69 · Internet: http://www.uni-linz.ac.at · DVR 0093696
Acknowledgments
The starting point for my doctoral thesis was when I asked Prof. A. Neubauer for his
judgment on some of my ideas on preconditioning of iterative regularization methods
in Hilbert scales last autumn. During the past year I spent hours and hours in his office
discussing various aspects of inverse problems and regularization, and asking
for his advice on establishing my proofs. And as the supervisor of my doctoral thesis,
he invested much time in answering my questions, pointing out certain difficulties,
and suggesting improvements. I want to thank him for his encouraging advice and
supervision.
Thanks also to Priv.-Doz. Dr. Barbara Kaltenbacher for the careful proofreading
and her detailed remarks and suggestions, which surely led to an improved presentation.
I want to express my special gratitude to Prof. H. W. Engl, who has been supervising
my work at the SFB "Numerical and Symbolic Scientific Computing" over the last three
years, for his guidance and also for giving me the freedom to follow my interests and
choose my own topics of research. Since the foundation of the "Johann Radon Institute
for Computational and Applied Mathematics" (RICAM) of the Austrian Academy of
Sciences, for which Prof. Engl is probably most responsible, and with the "Research
Institute for Symbolic Computation" (RISC) and the SFB F013, Linz has become a unique
center of mathematics that is able to attract people from all over the world and provides
an inspiring, dynamic atmosphere for young researchers.
I would also like to acknowledge financial support by the Austrian Science Fund
(FWF) under the project grant SFB F013/F1308, by RICAM and the Austrian
Academy of Sciences.
Last but not least I want to say thanks to my colleagues from the Industrial Mathematics
Institute and the Institute of Computational Mathematics at the University of
Linz, from the SFB and from RICAM, for creating such an inspiring and friendly
atmosphere. Special thanks to Andreas, Benjamin, Martin, Philipp, and Rainer, for their
help and patience with my (I am sure sometimes annoying) questions and comments.
Abstract
This thesis deals with the preconditioning of iterative regularization methods for linear
and nonlinear inverse problems, which arise in many applications in computational
mathematics, in other natural sciences, in engineering, and in industry. In many cases
such inverse problems are ill-posed, i.e., their solution is unstable with respect to data
perturbations, and stable approximations for a solution can only be found by so-called
regularization methods.
For large-scale and nonlinear inverse problems, regularized solutions are typically
constructed by iterative algorithms. On the one hand, such algorithms are used to realize
continuous regularization strategies like Tikhonov regularization; on the other hand, they
may be considered as regularizing algorithms themselves if the iteration is stopped at the
right time. The stopping index may, for instance, be determined by the discrepancy
principle, whose implementation requires no additional effort for iterative regularization
methods. The main emphasis of the presentation below is on preconditioning of such
iterative methods, although most of the results also apply directly to other regularization
methods.
Besides the size of the data perturbations (the noise-level), the smoothness of the
solutions essentially determines the quality of the regularized approximations. Without
any smoothness, convergence (with noise-level tending to zero) will be arbitrarily slow
in general, and even under relatively strong smoothness assumptions on the solution
only Hölder (or even only logarithmic) rates can be proven.
A main disadvantage of iterative regularization methods is that, especially for
nonsmooth solutions, a large number of iterations has to be performed in order to guarantee
the optimal convergence rates. To overcome this problem, certain acceleration strategies
have been proposed for the iterative solution of linear inverse problems, such as the
ν-methods or the method of conjugate gradients. Here, we focus on a completely different
approach, namely preconditioning in Hilbert scales. This approach can also be used for
a further acceleration of already improved methods.
Preconditioning of well-posed problems has been investigated intensively, especially
the preconditioning of linear equations arising in the application of finite element
methods to PDEs, and the resulting number of iterations can be shown to grow at most
logarithmically with the desired accuracy if the preconditioner is spectrally equivalent
to the inverse of the operator in the linear system. Note that for well-posed problems,
the preconditioner is typically a bounded (respectively smoothing) operator. In case of
ill-posed problems, the situation is different: there the (forward) operator is typically
smoothing, while its inverse is unbounded. As for well-posed problems, good
preconditioners have to mimic the behavior of the inverse of the involved operator and thus
will usually be unbounded for ill-posed problems. This complicates preconditioning in
the presence of data noise, and the number of iterations can only be reduced to about
its square root. However, since the usual iteration numbers may be rather large (10000
iterations are not unusual for the Landweber method), preconditioning typically leads
to a significant speed-up.
Hilbert scales were originally introduced in regularization theory with the goal of
overcoming saturation effects of certain regularization methods. Here, we use Hilbert
scales for a different reason, namely to formulate and investigate preconditioning
strategies for a regularized iterative solution of ill-posed operator equations. The
convergence analysis of the resulting regularization methods is kept very general, such that
the standard convergence results and even a convergence theory for special methods for
symmetric problems are included. Of particular importance from a numerical point of view
is that in many cases the preconditioners correspond to simple differential operators,
and thus preconditioning does not increase the numerical effort of a single iteration step
noticeably, while the overall number of iterations is reduced significantly.
The outline of the thesis is as follows: in Chapter 1 we give a short introduction
to inverse and ill-posed problems and recall the main definitions and basic results of
regularization theory.
Chapter 2 then gives a short overview of the most important and widely used
regularization algorithms for linear and nonlinear inverse problems. For later reference
we also summarize the main convergence rate results.
Hilbert scales, which are the main ingredient for the analysis of our preconditioning
strategy, are introduced in Chapter 3. For a comparison with our results we also recall
the classical convergence results for regularization in Hilbert scales.
In Chapter 4 we formulate and analyze our preconditioning strategy for iterative
regularization methods for linear and nonlinear inverse problems. The results and
conditions of our convergence analysis are discussed in detail and compared to the ones of
standard regularization methods and classical regularization in Hilbert scales.
The applicability of our theoretical results is finally demonstrated in Chapter 5 for
various examples, and the effect of preconditioning is illustrated in several numerical
tests.
Summary (Zusammenfassung)
This thesis deals with the preconditioning of iterative regularization methods for
linear and nonlinear inverse problems, which arise in applied mathematics as well as in
scientific, engineering, and industrial applications. In many cases such inverse problems
are ill-posed, i.e., their solution is in general unstable with respect to data errors,
and therefore such problems can only be solved stably by so-called regularization
methods.
For high-dimensional and/or nonlinear problems, iterative algorithms are usually
employed for the solution, for instance in the realization of continuous regularization
methods such as Tikhonov regularization. On the other hand, iterative methods can also
be used directly for regularization if the iteration is stopped in time, for instance by
means of the discrepancy principle, which can be implemented easily and without
additional effort for iterative algorithms. For these reasons, this thesis mainly
investigates iterative regularization methods; most of the results, however, carry over to
continuous regularization methods without major changes.
Besides the size of the data error, it is above all the smoothness of the solutions
that determines the quality of the approximations that can be found by regularization
methods. Without any smoothness, the convergence of the regularized solutions (as the
data error tends to 0) is in general arbitrarily slow, and even under relatively strong
smoothness assumptions on a solution, usually only Hölder rates (for exponentially
ill-posed problems even only logarithmic convergence rates) can be achieved.
One of the main disadvantages of iterative regularization methods is the relatively
large number of iterations needed to guarantee optimal convergence rates. The number of
iterations increases as the smoothness of the solutions decreases. For linear inverse
problems, several acceleration techniques are available, such as the so-called ν-methods
or the method of conjugate gradients. The subject of this thesis is an entirely different
approach, namely preconditioning in Hilbert scales. This technique can also be applied
to the accelerated methods mentioned above.
The preconditioning of ill-conditioned systems of equations that stem from the
application of finite element methods to partial differential equations has been studied
relatively well. In many cases one can show that, with a suitable choice of the
preconditioner (spectrally equivalent to the inverse of the operator in the system of
equations), the number of iterations grows only logarithmically with the desired
accuracy. For inverse problems the situation is quite different: typically, the operators
describing the system of equations are not boundedly invertible, and therefore good
preconditioners are in general unbounded operators as well. This complicates
preconditioning in the presence of data errors, and the number of iterations can
essentially only be reduced to its square root by preconditioning. However, considering
that the number of iterations for ill-posed problems is usually very high (e.g., 10000
iterations are by no means unusual for the Landweber method), the speed-up achieved by
preconditioning is still remarkable.
Hilbert scales were originally introduced in regularization theory in order to mitigate
saturation effects of various regularization methods and thus to obtain better
convergence rates for smooth solutions. In this thesis, Hilbert scales are used with an
entirely different motivation, namely to formulate preconditioners and to investigate in
detail their effect on the iterative solution of ill-posed problems. Our convergence
analysis is kept so general that the standard results of regularization theory, as well as
convergence statements for less frequently discussed methods for symmetric problems, are
included. An important point from a numerical perspective is that in many cases a
differential operator can be used as preconditioner, so that applying the preconditioner
causes practically no additional effort, while at the same time the number of iterations
can be reduced significantly.
The thesis is organized as follows: Chapter 1 gives a short introduction to inverse
and ill-posed problems. In addition, the essential notions and convergence statements of
regularization theory are recalled.
Chapter 2 then gives a brief overview of the most common regularization methods. In
order to be able to compare our results with those of the standard theory, the main
convergence statements for these methods are also cited.
Chapter 3 collects essential statements about Hilbert scales, which are a main
ingredient for the formulation and investigation of the preconditioning strategy treated
later. In addition, the classical results on regularization in Hilbert scales are cited
for later comparison.
The preconditioning of iterative regularization methods in Hilbert scales is motivated
and investigated in detail in Chapter 4, and convergence rates are shown for the most
important methods for linear and nonlinear problems. The conditions required for the
analysis are discussed in detail, and connections to the classical results are established.
Finally, Chapter 5 is concerned with demonstrating the applicability of the theoretical
results to practical problems. The theoretical statements about the preconditioning
effect are supported by numerical test results.
Contents
1 Inverse Problems and Regularization
1.1 Inverse and Ill-posed Problems
1.2 Principles of Regularization
1.2.1 Generalized Solutions
1.2.2 Compact Operators
1.2.3 Regularization Operators
1.2.4 Continuous Regularization Methods
1.2.5 Nonlinear Problems
2 Regularization Methods
2.1 Tikhonov Regularization
2.2 Iterative Regularization Methods
2.2.1 Landweber Iteration
2.2.2 The ν-Methods
2.2.3 The Method of Conjugate Gradients
2.2.4 Landweber Iteration for Nonlinear Problems
2.2.5 Regularized Newton-type Iterations
3 Regularization in Hilbert Scales
3.1 Introduction and General Definitions
3.2 Linear Problems
3.3 Nonlinear Problems
4 Preconditioning in Hilbert Scales
4.1 Main Assumptions and Preliminary Results
4.2 Linear Problems
4.2.1 Semiiterative Methods
4.2.2 The Conjugate Gradient Method
4.3 Nonlinear Problems
4.3.1 Basic Assumptions
4.3.2 Landweber Iteration
4.3.3 Newton-type Iterations
5 Examples and Numerical Tests
5.1 Integral Equations of the First Kind
5.1.1 Fredholm Integral Equations of the First Kind
5.1.2 Radon Inversion
5.1.3 An Inverse Problem in Imaging: Deblurring
5.1.4 A Volterra-Hammerstein Integral Equation
5.2 Parameter Identification
5.2.1 An Inverse Source Problem in an Elliptic Equation
5.2.2 Identifying a Reaction Term
5.2.3 An Inverse Problem in Mathematical Finance
5.2.4 Reconstructing a Nonlinear Source Term in a Parabolic Equation
5.3 Summary
Bibliography
Statutory Declaration (Eidesstattliche Erklärung)
Curriculum Vitae
Chapter 1
Inverse Problems and Regularization
In this work we investigate the acceleration of iterative regularization algorithms for
the solution of ill-posed inverse problems by preconditioning in Hilbert scales. Before
we formulate and analyze our methods, we introduce the basic concepts and notations
of inverse (ill-posed) problems and regularization. For later reference and a comparison
with the results of this work, we give a brief overview of some important classes of
regularization methods for linear and nonlinear inverse problems and recall the most
important convergence (rate) results.
1.1 Inverse and Ill-posed Problems
As the name already suggests, inverse problems are always connected to direct
problems. According to Keller [51], two problems are inverse to each other if the formulation
of the one problem involves the other one. At least in a physical context, causality may
serve as a reasonable criterion for distinguishing which problem is to be considered
the direct one, and which one the inverse; e.g., one would certainly call the prediction
of the physical state or the evolution of a system depending on certain (material)
parameters a direct problem, while the identification of (some of the) parameters from
measurements or observations of the physical state or derived quantities would
naturally be called the inverse problem. Therefore, inverse problems are typically concerned
with determining causes for a desired or observed effect.
It turns out in practice that many relevant inverse problems are ill-posed, i.e., they
lack at least one of the features of a well-posed problem in the sense of Hadamard:
existence, uniqueness, or stability of a solution (with respect to the given data). The study
of concrete inverse problems frequently involves the question of how to enforce uniqueness
by additional assumptions or information. Such identifiability results (cf., e.g., [44, 45])
are, however, usually very problem dependent. By introducing a generalized solution concept,
the requirement of uniqueness (and partly of the existence of a solution) can be relaxed, and
the aspect of instability, which may be seen as the main characteristic of an ill-posed
problem, can be treated in a rather general way. As noted by A. N. Tikhonov [72],
the lack of stability is not due to a wrong formulation of the problem, but rather
arises naturally in many physically relevant problems, typically if the direct problem
is smoothing.
For motivation we give here a short account of some classes of inverse problems
with important applications:
• (Computerized) tomography (cf. [64]): CT involves the reconstruction of a
function, usually a density distribution, from values of its line integrals and is
important both in medical applications and in nondestructive testing [31].
Mathematically, this is connected with the inversion of the Radon transform (see [64]).
• Inverse heat conduction problems, like solving a heat equation backwards in
time or sideways (i.e., with Cauchy data on a part of the boundary) (cf. [2, 10, 32]).
• Inverse problems in imaging, like deblurring and denoising (cf. [11]).
Deblurring has successfully been used to enhance images taken by the Hubble space
telescope. More recently, inpainting, image segmentation, and recognition have
gained increasing interest (cf. [4]).
• Inverse potential problems, i.e., problems where the observable quantity can
be expressed via surface or volume potentials, with applications in geophysics and
geodesy. Typical problems are, e.g., the determination of a spatially varying
density distribution in the earth from gravity measurements (cf. [30]), or of the gravity
potential of the earth from gravity measurements of a satellite (satellite geodesy).
Inverse potential problems also appear in connection with Maxwell's equations
(cf. [69]); a prominent example is the synthesis problem of distributing charges in
such a way that a prescribed electric field is generated.
• Inverse scattering (cf. [18]), where one wants to reconstruct an obstacle or an
inhomogeneity from scattered waves. This is a special case of shape reconstruction
and closely connected to shape optimization.
• Parameter identification in (systems of) (partial) differential equations from
interior or boundary measurements of a physical state (cf. [9, 45]), with many
applications, e.g., in groundwater hydrology, semiconductors, structural mechanics,
polymer crystallization, or mathematical finance. Identification from boundary
observations appears, e.g., in impedance tomography. Parameter identification is
also closely related to optimal control and optimal design.
• Geometric inverse problems, like shape reconstruction [43]: there, the
quantity of interest is a shape or geometry. Due to the lack of vector-space structure,
the analysis of such problems is rather difficult. If a domain is considered as the
support of a characteristic function, geometric inverse problems can also be
reformulated as parameter identification problems with jumping coefficients, where
the goal is to determine the jump. This class of problems is closely related to
optimal design and topology optimization.
Detailed references for these and many more classes of inverse problems can be found,
e.g., in [28, 35, 40, 60]. Some of the above-mentioned inverse problems, e.g., computerized
tomography, deblurring, and several parameter identification problems, will be discussed
in more detail in Chapter 5.
As we will see below, the instability is in many situations an inherently infinite
dimensional phenomenon (usually caused by the compactness of the direct problem).
Nevertheless, the lack of stability also causes difficulties in a numerical solution, e.g., finite
dimensional approximations of ill-posed problems are typically ill-conditioned. In any
case, an accurate and stable approximate solution of inverse (ill-posed) problems
requires special methods, so-called regularization methods.
1.2 Principles of Regularization
In this section we summarize the basic notions and definitions of regularization. We
follow the presentation in [28] and recall the main definitions and results for linear
inverse problems. The importance of spectral theory for the analysis and construction of
regularization methods will become clear from our presentation. Additionally, we briefly
discuss the most important regularization methods for linear and nonlinear problems
below.
In the sequel we consider linear inverse problems in the framework of abstract
operator equations of the form
$$Tx = y, \tag{1.1}$$
where $T$ denotes a bounded linear operator between Hilbert spaces $X$ and $Y$. Unless
specified otherwise, we assume that the data $y$ are attainable, i.e., $y \in R(T)$. In practice,
however, only approximations $y^\delta$ of the true data $y$ will be available. We further assume
that at least a bound on the data error is known, i.e.,
$$\|y^\delta - y\| \le \delta. \tag{1.2}$$
Note that if $R(T)$ is non-closed, which is typically the case for ill-posed problems,
$Tx = y^\delta$ may have no solution in general, and even if $y^\delta \in R(T)$, the corresponding
solution may be far away from the solution of $Tx = y$ due to instability.
1.2.1 Generalized Solutions
Since it is too restrictive in many cases to require that (1.1) has a unique solution, we
will use the following generalized solution concept:
We call $x$ a least-squares solution of $Tx = y$, if
$$\|Tx - y\| = \inf\{\|Tz - y\| : z \in X\}.$$
Note that the infimum may not be attained if $R(T)$ is not closed. A least-squares
solution exists for $y \in R(T) + R(T)^\perp$, but is not unique if $T$ is not injective. Uniqueness
can be restored by an appropriate selection criterion: we call $x^\dagger$ an $x^*$-minimum-norm
solution if it minimizes $\|x - x^*\|$ among all least-squares solutions $x$. For linear
problems $x^*$ can always be replaced by $0$ by changing the right-hand side to $y - Tx^*$.
In this case, the $0$-minimum-norm solution is usually called the best-approximate solution.
The operator $T^\dagger$ with domain $D(T^\dagger) = R(T) + R(T)^\perp$ that maps data $y \in D(T^\dagger)$
onto the best-approximate solution $x^\dagger = T^\dagger y$ is called the Moore-Penrose (generalized)
inverse of $T$. By the open mapping theorem, the Moore-Penrose inverse is bounded,
i.e., the best-approximate solution depends continuously on the data $y$, if and only if
$R(T)$ is closed in $Y$ (see [28, 34, 62] for details). Otherwise, the problem is ill-posed in
the sense of Hadamard; in particular, even for attainable data $y \in R(T)$, the dependence
of the best-approximate solution $x^\dagger$ on the data $y$ is unstable.
A least-squares solution $x$ of $Tx = y$ may alternatively be characterized as a solution
of the (Gaussian) normal equations
$$T^*Tx = T^*y. \tag{1.3}$$
In case $T^*T$ is invertible, a least-squares solution (which then by injectivity of $T$ is
unique and thus the best-approximate solution) is given by $x = (T^*T)^{-1}T^*y$. Otherwise,
one can, similarly as above, use the Moore-Penrose inverse of $T^*T$ and define the solution
of (1.3) by $x = (T^*T)^\dagger T^*y$. It turns out that this alternatively characterizes the
best-approximate solution $x^\dagger$; in fact, one has
$$T^\dagger = (T^*T)^\dagger T^*.$$
With this generalized solution concept in mind, the ill-posedness of problem (1.1) is
essentially reduced to the lack of stability of its solution, or equivalently, the
unboundedness of the Moore-Penrose inverse $T^\dagger$.
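The algebraic part of these statements can be checked in finite dimensions. The following is a minimal sketch, assuming NumPy and a small rank-deficient matrix chosen purely for illustration (it does not come from the thesis). For matrices the range is always closed, so $T^\dagger$ is bounded here; the sketch only illustrates the identity $T^\dagger = (T^*T)^\dagger T^*$, the normal equations (1.3), and the minimum-norm property.

```python
import numpy as np

# Hypothetical rank-deficient 4 x 3 matrix standing in for T
# (its third column is the sum of the first two, so N(T) is nontrivial).
T = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
y = np.array([1.0, 2.0, 0.0, 1.0])   # generic data, not in the range of T

# Moore-Penrose inverse computed directly and via (T^* T)^dagger T^*.
# An explicit cutoff is passed so that the zero singular value is discarded.
T_pinv = np.linalg.pinv(T, rcond=1e-10)
via_normal = np.linalg.pinv(T.T @ T, rcond=1e-10) @ T.T

# Best-approximate solution x_dagger = T^dagger y.
x_dagger = T_pinv @ y

# x_dagger satisfies the normal equations T^* T x = T^* y.
normal_residual = np.linalg.norm(T.T @ T @ x_dagger - T.T @ y)

# Any other least-squares solution x_dagger + c z with z in N(T) has larger norm.
z = np.array([1.0, 1.0, -1.0])        # T @ z == 0
other = x_dagger + 0.5 * z
```

Here `other` has the same residual as `x_dagger` but a strictly larger norm, since `x_dagger` is orthogonal to the null space of `T`.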
1.2.2 Compact Operators
In the sequel, we discuss ill-posedness and instability of a solution of (1.1) for compact
linear operators T in more detail, noting that most of the results below quite naturally
generalize to the non-compact case and can be analyzed by spectral theory, cf. [28] for
details.
As a prototype for linear inverse problems, with applications in geophysics,
Maxwell's equations, or deconvolution, we consider linear integral equations of the first
kind, e.g.,
$$(Tx)(s) = \int_\Omega k(s,t)\,x(t)\,dt = y(s), \qquad s \in \Omega,$$
with kernel $k \in L^2(\Omega^2)$ and $y \in L^2(\Omega)$, over a compact domain $\Omega$. It is well-known (see,
e.g., [55]) that under the above assumptions the operator $T$ is compact on $L^2(\Omega)$, and
that $R(T)$ is non-closed if the problem is infinite dimensional. A compact linear operator
$T$ has a singular system $\{\sigma_n, u_n, v_n\}_{n\in\mathbb{N}}$, where the $\sigma_n$ are the positive square roots of the
eigenvalues $\lambda_n = \sigma_n^2$ (enumerated in decreasing order) of the selfadjoint, positive
semidefinite operator $T^*T$. The corresponding eigenfunctions $u_n$ form an orthonormal basis
of $\overline{R(T^*T)}$, and the functions $\{v_n\}_{n\in\mathbb{N}}$, defined by $v_n = Tu_n/\|Tu_n\|$, are a complete system of
eigenfunctions of $TT^*$ and span the space $\overline{R(TT^*)}$. Moreover, one has $Tu_n = \sigma_n v_n$ and
$T^*v_n = \sigma_n u_n$. Thus, the action of a compact operator on an element can be written as
a singular value expansion
$$Tx = \sum_{n=1}^{\infty} \sigma_n \langle x, u_n \rangle v_n, \qquad T^*y = \sum_{n=1}^{\infty} \sigma_n \langle y, v_n \rangle u_n,$$
where the series converge in the Hilbert space norms of $Y$ and $X$, respectively. In case $T$
has a finite dimensional range, and consequently only finitely many singular values $\sigma_n$
exist, we call $T$ degenerate. Otherwise, i.e., if $R(T)$ is infinite dimensional, the singular
values accumulate (only) at 0, i.e.,
$$\lim_{n\to\infty} \sigma_n = 0. \tag{1.4}$$
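For $x \in N(T)^\perp = \overline{R(T^*)}$ one has
$$\|x\|^2 = \sum_{n=1}^{\infty} \langle x, u_n \rangle^2,$$
and hence $y = \sum_{n=1}^{\infty} \langle y, v_n \rangle v_n \in R(T)$ if (and only if) the Picard criterion is satisfied (cf.,
e.g., [28, Theorem 2.8]), i.e.,
$$\sum_{n=1}^{\infty} \frac{|\langle y, v_n \rangle|^2}{\sigma_n^2} < \infty.$$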
Moreover, one easily verifies that the best-approximate solution $x^\dagger$ satisfies
$$x^\dagger = T^\dagger y = \sum_{n=1}^{\infty} \frac{\langle y, v_n \rangle}{\sigma_n}\, u_n = (T^*T)^\dagger T^*y = \sum_{n=1}^{\infty} \frac{\langle T^*y, u_n \rangle}{\sigma_n^2}\, u_n, \tag{1.5}$$
which explains why (1.4) turns (1.1) into an ill-posed equation: errors in the $n$-th Fourier
coefficient $\langle y, v_n \rangle$ of the data are amplified by a factor $1/\sigma_n$, which may be arbitrarily
large for high frequency errors if $\dim(R(T)) = \infty$. Note that the faster $\sigma_n$ decays, the
stronger the error amplification in (1.5) is, which motivates the following quantification
of ill-posedness: a problem with $\sigma_n \sim n^{-\alpha}$ for some $\alpha > 0$ is usually called moderately (or
mildly) ill-posed, while problems where $\sigma_n \sim q^n$ with some $q < 1$ are called severely
(exponentially) ill-posed.
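The amplification factor $1/\sigma_n$ in (1.5) can be observed numerically. The following is a minimal sketch, assuming NumPy, a midpoint-rule discretization on $[0,1]$, and a Gaussian kernel of width 0.05; the kernel, the grid size, and the helper name `naive_inverse` are illustrative choices that do not come from the thesis. Only the first `m` terms of (1.5) are evaluated, so that rounding errors in the smallest singular values do not dominate.

```python
import numpy as np

# Assumed model problem: (Tx)(s) = int_0^1 k(s, t) x(t) dt, discretized by the
# midpoint rule, with a hypothetical Gaussian kernel k of width 0.05.
n = 100
t = (np.arange(n) + 0.5) / n
K = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.05 ** 2)) / n

# numpy returns K = V diag(sigma) Ut. In the notation of the text, the columns
# of V are the v_n (data space) and the rows of Ut are the u_n (solution space).
V, sigma, Ut = np.linalg.svd(K)

def naive_inverse(data, m):
    """Evaluate the first m terms of (1.5): sum_n <data, v_n> / sigma_n * u_n."""
    coeff = (V[:, :m].T @ data) / sigma[:m]
    return Ut[:m].T @ coeff

delta = 1e-3
m = 20
# Data errors of equal size delta, once along v_1 and once along v_m.
amplified_low = np.linalg.norm(naive_inverse(delta * V[:, 0], m))
amplified_high = np.linalg.norm(naive_inverse(delta * V[:, m - 1], m))
```

Both perturbations have norm `delta` in the data, but the induced change in the solution is `delta / sigma[0]` in the first case and the much larger `delta / sigma[m - 1]` in the second.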
1.2.3 Regularization Operators
In general terms, regularization means the approximation of an ill-posed problem by a
family of neighboring well-posed problems. More formally, a regularization method can
be defined in the following way (cf. [28]):
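Definition 1.1 Let $T : X \to Y$ be a bounded linear operator between Hilbert spaces $X$
and $Y$, and let $\alpha_0 \in (0, \infty]$. For every $\alpha \in (0, \alpha_0)$, let
$$R_\alpha : Y \to X$$
be a continuous (not necessarily linear) operator. The family $\{R_\alpha\}$ is called a (converging)
regularization or a regularization operator, if for all $y \in D(T^\dagger)$, there exists a
parameter choice rule $\alpha = \alpha(\delta, y^\delta)$, such that
$$\lim_{\delta\to 0} \sup\{\|R_{\alpha(\delta,y^\delta)} y^\delta - T^\dagger y\| : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\} = 0 \tag{1.6}$$
holds. Here,
$$\alpha : \mathbb{R}^+ \times Y \to (0, \alpha_0)$$
is such that
$$\lim_{\delta\to 0} \sup\{\alpha(\delta, y^\delta) : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\} = 0.$$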
Thus, a regularization method always consists of a regularization operator and a
parameter choice rule. Note that, by the above definition, the operators $R_\alpha$ are
continuous on $Y$ for $\alpha > 0$; in particular, $R_\alpha y^\delta$ is a stable approximation of $x^\dagger$ even for
$y^\delta \notin R(T)$.
We want to emphasize that the convergence condition (1.6) in the above definition
is rather strong, i.e., it uses a worst-case error concept, namely
$$\sup\{\|R_{\alpha(\delta,y^\delta)} y^\delta - T^\dagger y\| : y^\delta \in Y,\ \|y^\delta - y\| \le \delta\}$$
as a measure for convergence, and thus the regularized solutions
$$x_\alpha^\delta := R_{\alpha(\delta,y^\delta)} y^\delta$$
converge uniformly with respect to noise in the data $y^\delta$ towards the best-approximate
solution $x^\dagger = T^\dagger y$. For an extension of this concept of regularization operators to
quite general nonlinear equations in metric spaces and related discussions we refer to
[7].
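The interplay of regularization operator and parameter choice rule in Definition 1.1 can be sketched on a small model problem. The following is only an illustration under explicit assumptions: the operator is a hypothetical diagonal matrix with $\sigma_n = 1/n$, the family $R_\alpha$ is Tikhonov regularization, and the a priori rule $\alpha(\delta) = \delta$ is chosen for simplicity; none of these choices is prescribed by the definition. The worst-case error in (1.6) is bounded by the approximation error plus $\delta \|R_\alpha\|$, and for this family $\|R_\alpha\| \le 1/(2\sqrt{\alpha})$.

```python
import numpy as np

# Hypothetical diagonal model problem: T = diag(sigma_n) with sigma_n = 1/n.
N = 200
n = np.arange(1, N + 1)
sigma = 1.0 / n
x_dagger = 1.0 / n                 # assumed exact solution
y = sigma * x_dagger               # exact data y = T x_dagger

def R(alpha, data):
    """Tikhonov operator R_alpha = (T^*T + alpha I)^{-1} T^* (illustrative choice)."""
    return sigma / (sigma ** 2 + alpha) * data

def alpha_rule(delta):
    """A priori parameter choice alpha(delta) = delta (an assumption of this sketch)."""
    return delta

rng = np.random.default_rng(0)
deltas = [1e-1, 1e-2, 1e-3, 1e-4]
errors, bounds = [], []
for delta in deltas:
    noise = rng.standard_normal(N)
    noise *= delta / np.linalg.norm(noise)          # ||y_delta - y|| = delta
    alpha = alpha_rule(delta)
    errors.append(np.linalg.norm(R(alpha, y + noise) - x_dagger))
    # Bound on the worst-case error: approximation error + delta * ||R_alpha||,
    # using ||R_alpha|| <= 1 / (2 sqrt(alpha)) for the Tikhonov operator.
    approximation = np.linalg.norm(R(alpha, y) - x_dagger)
    bounds.append(approximation + delta / (2.0 * np.sqrt(alpha)))
```

Each entry of `errors` is the error for one particular noise realization, while `bounds` holds for every admissible $y^\delta$ and decreases as $\delta$ decreases, which is the uniform convergence required in (1.6).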
1.2.4 Continuous Regularization Methods
The concepts of the previous section make it possible to construct and analyze
regularization methods for linear problems in a very general way by spectral theory. We briefly
motivate a construction of regularization methods in the compact case:
In order to enforce stability in the explicit solution formula (1.5), one has to replace
the unbounded term $1/\sigma_n^2$ by an appropriately filtered (bounded) approximation $g_\alpha(\sigma_n^2)$,
where the filter function $g_\alpha(\lambda)$ is assumed to converge pointwise to $1/\lambda$ for $\lambda > 0$ as
$\alpha \to 0$. (The term filter function is used here in a slightly different form than in [60],
where it denotes $\lambda g_\alpha(\lambda)$.) In the compact case, a regularized solution is then defined by
$$x_\alpha := \sum_{n=1}^{\infty} g_\alpha(\sigma_n^2) \langle T^*y, u_n \rangle u_n \quad\text{and}\quad x_\alpha^\delta := \sum_{n=1}^{\infty} g_\alpha(\sigma_n^2) \langle T^*y^\delta, u_n \rangle u_n \tag{1.7}$$
for exact data $y$ and perturbed data $y^\delta$, respectively. The following theorem, which
summarizes the main statements of Theorems 4.1-4.3 in [28], clarifies under which
conditions the filter functions $g_\alpha(\lambda)$ in fact define a regularization operator $R_\alpha$ in the
sense of Definition 1.1.
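Theorem 1.2 Let, for all α > 0, $g_\alpha : [0, \|T\|^2] \to \mathbb{R}$ satisfy the following assumptions:
$g_\alpha$ is piecewise continuous and there is a C > 0 such that
\[ |\lambda g_\alpha(\lambda)| \le C, \qquad \lim_{\alpha \to 0} g_\alpha(\lambda) = 1/\lambda, \]
for all $\lambda \in (0, \|T\|^2]$. Then, for all y ∈ D(T†),
\[ \lim_{\alpha \to 0} g_\alpha(T^*T) T^*y = x^\dagger \tag{1.8} \]
holds with x† = T†y. Moreover, if y ∉ D(T†), then $\lim_{\alpha \to 0} \|g_\alpha(T^*T)T^*y\| = +\infty$.
Let xα, xδα be defined as in (1.7), and for α > 0, let
\[ G_\alpha := \sup\{ |g_\alpha(\lambda)| : \lambda \in [0, \|T\|^2] \}, \]
then
\[ \|Tx_\alpha - Tx_\alpha^\delta\| \le C\delta \quad\text{and}\quad \|x_\alpha - x_\alpha^\delta\| \le \delta \sqrt{C G_\alpha}. \tag{1.9} \]
Finally, for µ > 0 and $r_\alpha(\lambda) := 1 - \lambda g_\alpha(\lambda)$, let $\omega_\mu : (0, \alpha_0) \to \mathbb{R}^+$ be such that for
all α ∈ (0, α0) and $\lambda \in [0, \|T\|^2]$,
\[ \lambda^\mu |r_\alpha(\lambda)| \le \omega_\mu(\alpha) \]
holds. Then, for $x^\dagger \in R((T^*T)^\mu)$,
\[ \|x_\alpha - x^\dagger\| = O(\omega_\mu(\alpha)) \quad\text{and}\quad \|Tx_\alpha - Tx^\dagger\| = O(\omega_{\mu+1/2}(\alpha)). \tag{1.10} \]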
We briefly discuss the assumptions and conclusions of this theorem: the first assertion
(1.8) states pointwise convergence for exact data y = Tx†, but without any rates. It
is a typical feature of ill-posed problems that even for exact data the convergence may
be arbitrarily slow in general (cf. [70]). The result (1.9) is a quantitative estimate of
the propagated data error ‖xα − xδα‖ and essentially reflects the stability of the
approximations for α > 0 with respect to perturbations in the data. Finally, (1.10) estimates
the approximation error ‖x† − xα‖ in terms of the modulus of convergence ωµ, which
characterizes the approximation properties of a regularization method. Typically, ωµ can be
expressed in terms of fractional powers of α, e.g.,
\[ \lambda^\mu |r_\alpha(\lambda)| \le c_\mu \alpha^\mu, \qquad 0 \le \mu \le \mu_0, \tag{1.11} \]
holds for many regularization methods for some µ0 > 0. The maximal µ0 for which
(1.11) holds is usually called the qualification of the method; see also the detailed examples
of regularization methods in the next chapter.
Figure 1.1 shows the typical behavior of the error ‖x† − xδα‖ as a function of α:
the approximation error ‖xα − x†‖ tends to zero as α → 0 with a rate that depends
on the approximation quality ωµ of the regularization method and the smoothness µ
of the solution (see below). The propagated data error ‖xδα − xα‖, on the other hand,
increases as α → 0. In order to balance the two error contributions, the
regularization parameter α has to be chosen appropriately.
[Plot: curves for the approximation error, the propagated data error, and the total error over α from 10^−5 to 10^0 (logarithmic axis); vertical axis from 0 to 3.5.]
Figure 1.1: Approximation error ‖xα − x†‖ and propagated data error ‖xδα − xα‖ vs.
regularization parameter α for δ fixed.
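The qualitative behavior in Figure 1.1 can be reproduced on a toy problem. The following is a minimal sketch and not the computation behind the figure: it assumes a diagonal operator with singular values σ_n = 1/n, a hypothetical solution x† with coefficients 1/n, and the filter g_α(λ) = 1/(λ + α) as one concrete choice. All names (`sigma`, `x_dag`, `error_split`) are illustrative.

```python
import numpy as np

def error_split(sigma, x_dag, noise, alphas):
    """Return approximation and propagated data errors for each alpha.

    Works in the singular basis of T, so T acts as multiplication by sigma.
    The filter g_alpha(lam) = 1/(lam + alpha) is an assumed example choice.
    """
    y = sigma * x_dag                      # exact data y = T x_dag
    y_delta = y + noise                    # perturbed data
    approx, prop = [], []
    for alpha in alphas:
        g = 1.0 / (sigma**2 + alpha)       # g_alpha(sigma_n^2)
        x_alpha = g * sigma * y            # formula (1.7), exact data
        x_alpha_delta = g * sigma * y_delta
        approx.append(np.linalg.norm(x_alpha - x_dag))
        prop.append(np.linalg.norm(x_alpha_delta - x_alpha))
    return np.array(approx), np.array(prop)

rng = np.random.default_rng(0)
n = 200
sigma = 1.0 / np.arange(1, n + 1)          # assumed singular values
x_dag = 1.0 / np.arange(1, n + 1)          # assumed exact solution
delta = 1e-3
noise = rng.standard_normal(n)
noise *= delta / np.linalg.norm(noise)     # enforce ||y_delta - y|| = delta
alphas = np.logspace(-5, 0, 11)            # increasing values of alpha
approx, prop = error_split(sigma, x_dag, noise, alphas)
for a, e1, e2 in zip(alphas, approx, prop):
    print(f"alpha={a:.1e}  approx={e1:.3e}  propagated={e2:.3e}")
```

In this sketch the approximation error grows with α while the propagated data error shrinks, so their sum is smallest at some intermediate α.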
The condition
\[ x^\dagger \in R((T^*T)^\mu) \tag{1.12} \]
is called a source condition and measures the smoothness of a solution x† with respect
to the operator T. As we will see in the examples in Chapter 5, the abstract condition
(1.12) can in some cases be interpreted as a smoothness (differentiability) condition on
x†. For general $x^\dagger \in R((T^*T)^\mu)$ with fixed µ and general yδ ∈ Y with (1.2), the rate
\[ \|x^\dagger - x_\alpha^\delta\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr) \tag{1.13} \]
is the best possible in terms of powers of δ, and a regularization method yielding (1.13)
is therefore called order optimal. For linear problems, reverse conclusions have been
shown in so-called converse statements (cf., e.g., [28, 67]): if $\|x^\dagger - x_\alpha^\delta\| = O(\delta^{2\mu/(2\mu+1)})$,
then it follows that $x^\dagger \in \bigcup_{\nu<\mu} R((T^*T)^\nu)$.
Parameter Choice Strategies
Note that Theorem 1.2 only states results about the regularization operators Rα. In
view of Definition 1.1, we still have to combine Rα with an appropriate parameter choice
rule α(δ, yδ) in order to obtain a regularization method. The role of such a parameter
choice strategy is to balance the two error contributions illustrated in Figure 1.1.
It is meaningful to distinguish between the following two types of strategies: a
parameter choice rule in the sense of Definition 1.1 is called
(a) an a-priori rule, if α does not depend on yδ, so that one can write α = α(δ);
(b) an a-posteriori parameter choice strategy, if α = α(δ, yδ) depends on yδ.
Besides these two classes of parameter choice strategies, so-called error-free parameter
choice rules, i.e., rules which do not incorporate the noise level, are frequently used
in practice. However, as a result due to Bakushinskii [5] shows, no rule α = α(yδ)
depending only on yδ can be part of a convergent regularization method in the sense of
Definition 1.1 in general.
The simplest case of an a-priori parameter choice rule is
\[ \alpha = c\,\delta^{s}, \]
for some c, s > 0. As we will see below, such a choice yields order optimal convergence
rates if a-priori information on the smoothness of the solution, i.e., the precise value of
µ in the source condition (1.12), is appropriately incorporated in the choice of s.
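The appropriate exponent s can be motivated by a short balancing argument. The following sketch assumes, for illustration only, that ωµ(α) = cµ α^µ as in (1.11) and that Gα ≤ c/α (as is the case, e.g., for the filter 1/(λ + α)); the constants c1, c2 are hypothetical.

```latex
\begin{align*}
\|x_\alpha^\delta - x^\dagger\|
  &\le \|x_\alpha - x^\dagger\| + \|x_\alpha^\delta - x_\alpha\|
   \le c_1\,\alpha^{\mu} + c_2\,\frac{\delta}{\sqrt{\alpha}}, \\
\alpha^{\mu} \sim \frac{\delta}{\sqrt{\alpha}}
  &\iff \alpha \sim \delta^{\frac{2}{2\mu+1}}
   \quad\Longrightarrow\quad
   \|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr).
\end{align*}
```

Under these assumptions the balancing choice is s = 2/(2µ + 1), which reproduces the order optimal rate (1.13).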
Probably the most widely used a-posteriori parameter choice rule is the discrepancy
principle (cf. [28, 61])
\[ \alpha(\delta, y^\delta) := \sup\{ \alpha > 0 : \|Tx_\alpha^\delta - y^\delta\| \le \tau\delta \}, \tag{1.14} \]
where $\tau > \sup\{ |r_\alpha(\lambda)| : \alpha > 0,\ \lambda \in [0, \|T\|^2] \}$. It can be shown that the supremum
in (1.14) is actually attained if α ↦ gα(λ) is continuous from the left for λ > 0. The
discrepancy principle yields order-optimal convergence rates without a-priori knowledge
of the smoothness of x†, but only for µ ≤ µ0 − 1/2. The following theorem summarizes
the main convergence properties of regularization methods Rα satisfying the conditions
of Theorem 1.2 with the aforementioned a-priori and a-posteriori parameter choice rules (cf.
Corollary 4.4 and Theorem 4.12 in [28]).
Theorem 1.3 Let gα, rα satisfy the conditions of Theorem 1.2, and let (1.11) hold for
some µ0 > 0. If $\alpha \sim \delta^{\frac{2}{2\mu+1}}$, then
\[ \|x_\alpha^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr) \tag{1.15} \]
for $x^\dagger \in R((T^*T)^\mu)$ with 0 < µ ≤ µ0. If, alternatively, α = α(δ, yδ) is defined by (1.14)
and gα is continuous from the left, then (1.15) holds for 0 < µ ≤ µ0 − 1/2.
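The discrepancy principle (1.14) can be tried out numerically. The sketch below is illustrative only: it assumes a diagonal operator with singular values 1/n, the filter g_α(λ) = 1/(λ + α) (so that r_α(λ) = α/(λ + α) and any τ > 1 is admissible), and it replaces the supremum by a search over a finite grid of α values. The function names are hypothetical.

```python
import numpy as np

def tikhonov_residual(sigma, y_delta, alpha):
    """||T x_alpha^delta - y^delta|| for the filter 1/(lam + alpha)."""
    r = alpha / (sigma**2 + alpha)          # r_alpha(sigma_n^2)
    return np.linalg.norm(r * y_delta)

def discrepancy_alpha(sigma, y_delta, delta, tau, alphas):
    """Largest alpha on the grid whose residual is at most tau * delta.

    Grid-based stand-in for the supremum in (1.14); returns None if no
    grid value qualifies.
    """
    for alpha in sorted(alphas, reverse=True):
        if tikhonov_residual(sigma, y_delta, alpha) <= tau * delta:
            return alpha
    return None

rng = np.random.default_rng(1)
n = 200
sigma = 1.0 / np.arange(1, n + 1)           # assumed singular values
x_dag = 1.0 / np.arange(1, n + 1)           # assumed exact solution
delta, tau = 1e-3, 1.5                      # tau > sup |r_alpha| = 1
noise = rng.standard_normal(n)
noise *= delta / np.linalg.norm(noise)
y_delta = sigma * x_dag + noise
alphas = np.logspace(-10, 0, 101)
alpha_star = discrepancy_alpha(sigma, y_delta, delta, tau, alphas)
x_rec = sigma * y_delta / (sigma**2 + alpha_star)
print("alpha chosen:", alpha_star)
print("error:", np.linalg.norm(x_rec - x_dag))
```

Since the residual of this filter is monotonically increasing in α, scanning the grid from large to small values finds the grid analogue of the supremum; a bisection on log α would be a natural refinement.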
The term continuous regularization methods in the heading of this section refers
to the fact that the regularization parameter α is chosen from a continuum. With
slight modifications, the results can be generalized to cover iterative regularization
methods as well: there, the filter functions are polynomials gk(λ) of degree k − 1, and
the stopping index k plays the role of the regularization parameter. In fact, most of the
previously cited results can be applied directly when substituting 1/k for α.
1.2.5 Remarks on nonlinear problems
Below we present some convergence results concerning regularization of nonlinear in-
verse problems, which we again investigate in the framework of abstract operator equa-
tions,
F (x) = y, (1.16)
where the (nonlinear) operator F : D(F ) ⊂ X → Y acts between Hilbert spaces X and
Y. Note that in many relevant nonlinear inverse problems, the operator F is only defined
indirectly; e.g., in parameter identification, F might be defined as the parameter-to-output
mapping that maps a parameter to the solution of a partial differential equation.
Thus, the evaluation of F is not straightforward (and usually computationally expensive),
and deriving mapping properties of F may require some analytical reasoning. For
illustration, consider the following model problem:
Example 1.4 (Reconstruction of a reaction term) Consider heat conduction in
a three dimensional body Ω with spatially varying reaction term c. The evolution of
the temperature u is then governed by
\[ \begin{aligned} u_t - \kappa\Delta u + cu &= f && \text{in } \Omega \times (0,T), \\ u &= g && \text{on } \partial\Omega \times (0,T), \\ u(\cdot, 0) &= u_0 && \text{in } \Omega, \end{aligned} \tag{1.17} \]
where f denotes interior sources and g, u0 are the prescribed temperatures at the boundary
and at time zero, respectively. In the stationary case, the temperature distribution
u = u(x) will approximately satisfy
\[ \begin{aligned} -\kappa\Delta u + cu &= f && \text{in } \Omega, \\ u &= g && \text{on } \partial\Omega. \end{aligned} \tag{1.18} \]
Assuming that interior measurements of the temperature are available, the parameter-to-output
mapping F may be defined by F : c ↦ u(c), where u(c) denotes a solution
of (1.18) with parameter c. For detailed examples of parameter identification in (1.17)
and (1.18) from interior and boundary measurements, respectively, we refer to Section 5.
Like in the linear case, we mean by ill-posedness of (1.16) that a solution of F (x) = y
does not depend continuously on the data y. The ill-posedness of the problem F (x) = y
may further be quantitatively characterized via its linearization, although this char-
acterization is not always appropriate: it is shown in [29] that a nonlinear ill-posed
problem may have a well-posed linearization and that well-posed nonlinear problems
may have ill-posed linearizations.
Since linear problems may always be seen as a special case of nonlinear problems,
it is clear that in general, stability has to be enforced by some regularization method,
i.e., appropriate algorithms together with suitable parameter choice strategies have to
be used to obtain reasonable approximations in case of perturbed data. Similarly as
in the linear case, one can distinguish between two main error contributions due to
approximation and noise propagation (cf. Figure 1.1), and the task of parameter choice
strategies is again to balance between accuracy (approximation) and stability (noise
amplification).
While regularization of linear inverse problems can be almost completely analyzed
by spectral theory, the situation is more involved for nonlinear problems: first of all
spectral theory is available only for linear problems and thus can be applied at most to
certain linearizations. Therefore, a comprehensive convergence analysis of regularization
for nonlinear problems requires different functional analytic tools, and no general theory,
like Theorem 1.2 for linear problems, is available in the nonlinear case.
We only mention that, by Tikhonov’s Lemma, the inverse of a continuous bijective
operator F is continuous if D(F) is compact. Thus, in principle, (1.16) can always be
regularized by restricting F to a compact domain. Such reasoning, however, does not
yield any quantitative stability estimates (convergence rates). In some cases, stability
estimates may be obtained a-priori by a detailed analysis of the problem under investigation,
e.g., via Carleman estimates (cf., e.g., [25, 46, 52]).
Basic assumptions on the operator F for a reasonable convergence theory (cf. [28])
are:
(i) F is continuous and
(ii) F is weakly (sequentially) closed, i.e., for any sequence {xn} ⊂ D(F), weak convergence
of xn to x in X and weak convergence of F(xn) to some y in Y imply
that x ∈ D(F) and F(x) = y.
Note that in the linear case, (ii) already follows from (i). In general, neither existence nor
uniqueness of a solution to (1.16) can be guaranteed, and therefore the concept of x∗-
minimum-norm solutions is used like in the linear case (cf. Section 1.2.1). For nonlinear
problems however, the role of x∗ cannot be neglected. As we will see below, the above
(rather weak) conditions already allow one to prove convergence for certain regularization
methods, e.g., Tikhonov regularization.
Chapter 2
Important classes of regularization
methods
In this chapter we recall some of the most widely used regularization methods for
linear and nonlinear problems, and present the most important convergence results. In
the linear case, the results follow more or less directly from Theorem 1.2 and
Theorem 1.3 above. The analysis of nonlinear problems is more involved and requires
different reasoning.
As outlined in the previous chapter, the essential step in the construction of a
regularization method for linear problems is to approximate the unbounded function
1/λ in (1.5) by a filtered approximation gα(λ). In order to apply Theorem 1.3, it then
remains to show (1.11). We start with what is probably the best-known regularization
method, namely Tikhonov regularization, and then turn to a discussion of frequently
used iterative methods.
2.1 Tikhonov Regularization
For linear problems, Tikhonov regularization is defined by the filter function
\[ g_\alpha(\lambda) := \frac{1}{\lambda + \alpha}, \]
and one easily verifies that gα satisfies the assumptions of Theorem 1.2 with C = 1,
Gα = 1/α, and that (1.11) holds for µ ≤ 1. Hence, Tikhonov regularization has finite
qualification µ0 = 1, and the best possible convergence rate that can be guaranteed by
Theorem 1.3 for sufficiently smooth x† is $\|x_\alpha^\delta - x^\dagger\| = O(\delta^{2/3})$ if α is chosen according
to the a-priori rule $\alpha \sim \delta^{\frac{2}{2\mu+1}}$, and $O(\delta^{1/2})$ if α is chosen according to the
discrepancy principle (1.14). The saturation at µ = 1 (respectively µ = 1/2) can partly
be overcome by considering Tikhonov regularization in Hilbert scales, see Chapter 3
below.
For Tikhonov regularization the regularized solution has the form
\[ x_\alpha^\delta = (T^*T + \alpha I)^{-1} T^* y^\delta. \tag{2.1} \]
Alternatively, xδα can be characterized as the unique minimizer of the (Tikhonov)
functional
\[ f_\alpha^\delta(x) := \|Tx - y^\delta\|^2 + \alpha\|x\|^2. \]
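The equivalence of the closed form (2.1) and the variational characterization can be checked numerically in finite dimensions. This is a minimal sketch under assumed data: `T` is a random matrix standing in for a discretized operator, and `y_delta` and `alpha` are arbitrary illustrative values.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 30, 20
T = rng.standard_normal((m, n))            # hypothetical discretized operator
y_delta = rng.standard_normal(m)           # hypothetical noisy data
alpha = 0.1

# Formula (2.1): solve (T^T T + alpha I) x = T^T y_delta.
x_normal = np.linalg.solve(T.T @ T + alpha * np.eye(n), T.T @ y_delta)

# Variational form: minimize ||T x - y_delta||^2 + alpha ||x||^2, written as
# an ordinary least-squares problem for the stacked system.
A = np.vstack([T, np.sqrt(alpha) * np.eye(n)])
b = np.concatenate([y_delta, np.zeros(n)])
x_min, *_ = np.linalg.lstsq(A, b, rcond=None)

print("difference:", np.linalg.norm(x_normal - x_min))
```

The stacked least-squares form avoids forming T^T T explicitly, which is often preferable numerically when T is ill-conditioned.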
This variational characterization is particularly important, since it allows one to formulate
the Tikhonov method also for nonlinear problems F(x) = y; there, xδα is defined as a
solution of the nonlinear minimization problem
\[ f_\alpha^\delta(x) := \|F(x) - y^\delta\|^2 + \alpha\|x - x^*\|^2 \to \min, \quad x \in D(F), \tag{2.2} \]
where α > 0 denotes the regularization parameter, yδ ∈ Y is an approximation of the
right-hand side y satisfying (1.2), and x∗ ∈ X is an appropriate a-priori guess. Under
the assumptions (i), (ii) on F, the existence of a (not necessarily unique) minimizer xδα
follows from the weak lower semicontinuity of the functional $f_\alpha^\delta$. One can further show
that for fixed α > 0 the minimizers xδα depend (in a set-valued sense) continuously on
the data yδ and that xδα converge (in a set-valued sense) towards an x∗-minimum-norm
solution of (1.16) if α(δ) → 0 and δ²/α(δ) → 0 as δ tends to zero (cf. [28] for details).
One of the fundamental convergence rate results for Tikhonov regularization for
nonlinear problems reads as follows, cf. [28, Theorem 10.4]:
Theorem 2.1 Let D(F) be convex, x† ∈ D(F) be an x∗-minimum-norm solution of
(1.16), and yδ ∈ Y such that (1.2) holds. Furthermore, let F be Fréchet-differentiable
with
\[ \|F'(x) - F'(x^\dagger)\| \le \gamma \|x - x^\dagger\| \quad \text{for all } x \in D(F). \]
If x† satisfies the source condition
\[ x^\dagger - x^* = F'(x^\dagger)^* w \tag{2.3} \]
for some w ∈ Y with γ‖w‖ < 1, then for α ∼ δ the rates
\[ \|x_\alpha^\delta - x^\dagger\| = O(\sqrt{\delta}) \quad\text{and}\quad \|F(x_\alpha^\delta) - y^\delta\| = O(\delta) \tag{2.4} \]
hold.
The source condition (2.3) is an abstract smoothness condition corresponding to
(1.12) in the linear case. In fact, (2.3) is equivalent to $x^\dagger - x^* \in R((F'(x^\dagger)^* F'(x^\dagger))^{1/2})$,
and the rates (2.4) as well as the parameter choice α ∼ δ coincide with the linear
case for µ = 1/2. If x† is in the interior of D(F), then the optimal convergence rates
(1.13) even hold for µ ∈ [1/2, 1] (cf. [28, Theorem 10.7]). For (optimal) a-posteriori
parameter choice rules for Tikhonov regularization we refer to [28, Section 10.3].
While Tikhonov regularization can be analyzed under rather weak conditions, the
numerical implementation raises some questions and difficulties:
In contrast to the linear case, where the regularized solution xδα can be found by
(2.1), the minimization problem (2.2), which characterizes the regularized solution in the
nonlinear case, usually has to be solved in an iterative manner. Due to the nonlinearity
of F the Tikhonov functional f δα is in general non-convex and might have several local
(or even global) minima. Thus, additional conditions (on the nonlinearity of F and on
x∗) may have to be imposed in order to ensure convergence of an iterative algorithm to
a minimizer of the Tikhonov functional. The situation is analyzed in detail for a class
of nonlinear inverse problems called weakly nonlinear in [16], for which the Tikhonov
functional f δα admits a unique global (and no other local) minimum.
A second disadvantage of nonlinear Tikhonov regularization is that determining
α by an a-posteriori parameter choice rule typically requires the solution of several
optimization problems and thus is computationally expensive.
Therefore, a more direct approach to the regularization of linear and nonlinear
problems is to consider iterative methods as regularization methods themselves.
2.2 Iterative Regularization Methods
We start with a short discussion of iterative regularization methods for linear problems
and then turn to nonlinear problems:
For the iterative solution of linear inverse problems Tx = y, we consider the class
of so-called semiiterative methods (cf., e.g., [28, 36]): a basic step of such a method
consists in updating in the direction of the residual of the normal equations, $T^*(y^\delta - Tx_{k-1}^\delta)$,
followed by an averaging over all or some previous iterates. A semiiterative method can
be defined recursively in the following way: for given x0 set xδ0 = x0 and let
\[ x_k^\delta = \mu_{1,k} x_{k-1}^\delta + \dots + \mu_{k,k} x_0 + \omega_k T^*(y^\delta - Tx_{k-1}^\delta), \quad k \ge 1, \qquad \sum_{i=1}^{k} \mu_{i,k} = 1, \quad \omega_k \ne 0. \tag{2.5} \]
Algorithms of this form fall into the class of Krylov-subspace methods, i.e., the
difference xδk − x0 lies in the k-th Krylov subspace $K_k(T^*T, T^*y^\delta)$, where for some
self-adjoint operator A
\[ K_k(A, r) := \operatorname{span}\{ r, Ar, \dots, A^{k-1}r \}, \quad k \ge 1. \]
Consequently, xδk can be written as
\[ x_k^\delta = x_0 + g_k(T^*T) T^* y^\delta, \tag{2.6} \]
where gk is a polynomial of degree k − 1. It turns out that for an appropriate choice of
the coefficients µi,j and ωj the filter functions gk(λ) have the usual properties required
in Theorem 1.2.
Stability of iterative regularization methods is ensured by stopping the iteration
at the right time, i.e., the stopping index k∗ plays the role of the regularization
parameter. The index k∗ can be determined a-priori or a-posteriori, e.g., by the discrepancy principle
(k∗ = k(δ, yδ))
\[ \|y^\delta - Tx_{k_*}^\delta\| \le \tau\delta < \|y^\delta - Tx_k^\delta\|, \quad 0 \le k < k_*. \tag{2.7} \]
Note that in contrast to Tikhonov regularization, the discrepancy principle requires no
additional computational effort, and may thus be considered as the natural stopping
criterion for iterative regularization methods.
The following general convergence result for linear semiiterative regularization meth-
ods can be concluded with slight modifications from Theorem 1.2 (cf. [28, Theorem
6.11]):
Theorem 2.2 Let y ∈ R(T), and let the residual polynomials $r_k(\lambda) = 1 - \lambda g_k(\lambda)$ satisfy
\[ \omega_\mu(k) \le c_\mu k^{-\sigma\mu} \quad \text{for } 0 \le \mu \le \mu_0 \tag{2.8} \]
for some µ0 > 0 and σ ∈ {1, 2}. Then the semiiterative method (2.5) with residual
polynomials $\{r_k\}_{k \in \mathbb{N}}$ is a regularization method of optimal order for $T^\dagger y \in R((T^*T)^\mu)$
with 0 < µ ≤ µ0 − 1/2, provided the iteration is stopped with k∗ = k∗(δ, yδ) according
to the discrepancy principle (2.7) with fixed $\tau > \sup_{k \in \mathbb{N}} \|r_k\|_{C[0,1]}$. In this case we have
$k_* = O(\delta^{-\frac{2}{\sigma(2\mu+1)}})$ and $\|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{2\mu}{2\mu+1}})$. The same rate holds for 0 < µ ≤ µ0 if
the iteration is stopped according to the a-priori rule $k_* \sim \delta^{-\frac{2}{\sigma(2\mu+1)}}$.
We only mention that even rates of order o(·) can be derived for ‖xδk − x†‖; see [28] for details and
proofs.
As a first instance of a semiiterative regularization method of the form (2.5) we
obtain Landweber iteration by choosing µ1,k = 1, µi,k = 0 for i ≥ 2, and ωk = 1.
2.2.1 Landweber Iteration
The recursive form of Landweber iteration for linear problems reads
\[ x_{k+1}^\delta = x_k^\delta + T^*(y^\delta - Tx_k^\delta). \tag{2.9} \]
Consequently, the iterates have the closed form representation (2.6) with
\[ g_k(\lambda) = \sum_{j=0}^{k-1} (1 - \lambda)^j \quad\text{and}\quad r_k(\lambda) = (1 - \lambda)^k. \]
One easily verifies that rk(λ) satisfies the conditions of Theorem 2.2 with σ = 1 and
µ0 = ∞. Thus, in contrast to Tikhonov regularization, Landweber iteration exhibits no
saturation.
An alternative interpretation of Landweber iteration is as a gradient method for
minimizing the least-squares functional ‖Tx − yδ‖². This actually allows one to formulate
a corresponding nonlinear version (see below).
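The iteration (2.9) combined with the stopping rule (2.7) can be sketched in a few lines. The example is illustrative: it assumes a diagonal test operator with ‖T‖ = 1, the starting value x0 = 0, and τ = 1.1 (any τ > 1 is admissible here, since |rk(λ)| ≤ 1 on [0, 1]); the function name `landweber` is hypothetical.

```python
import numpy as np

def landweber(T, y_delta, delta, tau=1.1, max_iter=100000):
    """Landweber iteration (2.9) from x_0 = 0, stopped by rule (2.7).

    Assumes ||T|| <= 1. Returns the final iterate and the stopping index.
    """
    x = np.zeros(T.shape[1])
    for k in range(max_iter):
        residual = y_delta - T @ x
        if np.linalg.norm(residual) <= tau * delta:
            return x, k                    # discrepancy principle satisfied
        x = x + T.T @ residual             # gradient step for ||Tx - y||^2 / 2
    return x, max_iter

rng = np.random.default_rng(5)
n = 50
sigma = 1.0 / np.arange(1, n + 1)          # assumed singular values
T = np.diag(sigma)
x_dag = 1.0 / np.arange(1, n + 1)          # assumed exact solution
delta = 1e-2
noise = rng.standard_normal(n)
noise *= delta / np.linalg.norm(noise)
y_delta = T @ x_dag + noise
x_rec, k_star = landweber(T, y_delta, delta)
print("stopping index:", k_star)
print("error:", np.linalg.norm(x_rec - x_dag))
```

The residual needed for the stopping test is computed in every step anyway, so the discrepancy principle adds no extra operator evaluations.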
Note that for Landweber iteration, the number of iterations needed to obtain optimal
convergence is $k_* = O(\delta^{-\frac{2}{2\mu+1}})$. For linear problems, the same convergence rates
can be achieved with far fewer iterations when using faster semiiterative methods. Of
particular practical importance are iterations whose residual polynomials rk form an
orthogonal sequence with respect to some positive weight function. In this case, the
residual polynomials satisfy a three-term recurrence (see, e.g., [28]), which also carries
over to the iterates, i.e.,
\[ x_k^\delta = x_{k-1}^\delta + \mu_k (x_{k-1}^\delta - x_{k-2}^\delta) + \omega_k T^*(y^\delta - Tx_{k-1}^\delta), \quad k \ge 1, \tag{2.10} \]
with $x_{-1}^\delta = x_0^\delta = x_0$. A specific choice of such orthogonal polynomials yields the
ν-methods by Brakhage [15].
2.2.2 The ν-Methods
For $x_0^\delta = x_{-1}^\delta = x_0$, let the iterates xδk be defined by (2.10) with µ1 = 0, $\omega_1 = \frac{4\nu+2}{4\nu+1}$, and
for k > 1
\[ \mu_k = \frac{(k-1)(2k-3)(2k+2\nu-1)}{(k+2\nu-1)(2k+4\nu-1)(2k+2\nu-3)}, \qquad \omega_k = 4\,\frac{(2k+2\nu-1)(k+\nu-1)}{(k+2\nu-1)(2k+4\nu-1)}. \]
The corresponding polynomials rk(λ) satisfy the conditions of Theorem 2.2 with σ = 2
and µ0 = ν, yielding optimal rates of convergence with the stopping indices bounded
by
\[ k_* = O\bigl(\delta^{-\frac{1}{2\mu+1}}\bigr), \tag{2.11} \]
which is only the square root of the number of iterations required by Landweber iteration. In fact,
$O(k^{-2\mu})$ is the best possible estimate in terms of powers of k in (2.8) for semiiterative
methods of the form (2.5), and the bound (2.11) on the number of iterations cannot
be further reduced, in general. Hence, the ν-methods are said to have optimal speed of
convergence (cf. [36]).
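The speed-up over Landweber iteration can be observed on a small example. The sketch below implements the three-term recurrence (2.10) with the coefficients of the ν-method in the form µk = (k−1)(2k−3)(2k+2ν−1) / ((k+2ν−1)(2k+4ν−1)(2k+2ν−3)) and ωk = 4(2k+2ν−1)(k+ν−1) / ((k+2ν−1)(2k+4ν−1)), which reduces to ω1 = (4ν+2)/(4ν+1) for k = 1. The diagonal test operator, the exact data, and the choice ν = 1 are assumptions made for illustration.

```python
import numpy as np

def nu_method(T, y, nu, n_iter):
    """Run n_iter steps of the nu-method (recurrence (2.10)) from x_0 = 0.

    Assumes ||T|| <= 1.
    """
    x_prev = np.zeros(T.shape[1])          # x_{-1}
    x = np.zeros(T.shape[1])               # x_0
    for k in range(1, n_iter + 1):
        if k == 1:
            mu, omega = 0.0, (4 * nu + 2) / (4 * nu + 1)
        else:
            mu = ((k - 1) * (2 * k - 3) * (2 * k + 2 * nu - 1)) / (
                (k + 2 * nu - 1) * (2 * k + 4 * nu - 1) * (2 * k + 2 * nu - 3))
            omega = 4 * (2 * k + 2 * nu - 1) * (k + nu - 1) / (
                (k + 2 * nu - 1) * (2 * k + 4 * nu - 1))
        x_new = x + mu * (x - x_prev) + omega * (T.T @ (y - T @ x))
        x_prev, x = x, x_new
    return x

n, n_iter = 100, 100
sigma = 1.0 / np.arange(1, n + 1)          # assumed singular values
T = np.diag(sigma)
x_dag = 1.0 / np.arange(1, n + 1)          # assumed exact solution
y = T @ x_dag                              # exact data for a clean comparison

x_nu = nu_method(T, y, nu=1.0, n_iter=n_iter)
x_lw = np.zeros(n)
for _ in range(n_iter):                    # plain Landweber iteration (2.9)
    x_lw = x_lw + T.T @ (y - T @ x_lw)

err_nu = np.linalg.norm(x_nu - x_dag)
err_lw = np.linalg.norm(x_lw - x_dag)
print("nu-method error:", err_nu)
print("Landweber error:", err_lw)
```

Both methods use one application of T and one of its adjoint per step, so the comparison at equal iteration counts is also a comparison at roughly equal cost.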
Taking into account special properties of the operator T and letting the coefficients
in (2.5) depend on the data y, a further reduction of the stopping index is possible,
e.g., for the method of conjugate gradients, which is known as probably the most powerful
iterative method for the solution of well-posed, symmetric, positive semidefinite
problems.
2.2.3 The Method of Conjugate Gradients Applied to the Normal Equations (cgne)
In principle, cgne (for an algorithm, see [28, p. 177]) falls into the class of semiiterative
regularization methods. In contrast to the methods discussed previously, the iteration
polynomials gk(yδ;λ) of cgne depend on the data yδ, which makes cgne a nonlinear
method. As a matter of fact, no a-priori stopping rule k∗ = k(δ) renders cgne a regular-
ization method, cf. [27], but the iteration can be made an order optimal regularization
method by stopping according to the discrepancy principle (2.7), cf. [28, Theorem 7.12]:
Theorem 2.3 Let y ∈ R(T), and let cgne be stopped according to the discrepancy
principle (2.7) with k∗ = k(δ, yδ). Then cgne is an order optimal regularization method
for all µ > 0, i.e., if $x^\dagger \in R((T^*T)^\mu)$, then
\[ \|x^\dagger - x_{k_*}^\delta\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr). \]
As for well-posed problems, cgne reduces the residual at least as fast as any other
semiiterative method of the form (2.5). For certain classes of operators T , the bound
on the stopping index defined by the discrepancy principle can be further reduced, cf.
[28, Theorem 7.14]:
Theorem 2.4 Let T be a compact operator, $T^\dagger y = (T^*T)^\mu w$ with ‖w‖ ≤ ρ and some
µ, ρ > 0. If the singular values σn of T decay like $O(n^{-\alpha})$ for some α > 0, then
\[ k(\delta, y^\delta) = O\bigl(\delta^{-\frac{1}{(2\mu+1)(\alpha+1)}}\bigr). \]
If the singular values decay like $O(q^n)$ with some q < 1, then
\[ k(\delta, y^\delta) = O(1 + |\log\delta|). \]
Remark 2.5 For finite dimensional problems, the residuals of cgne can be shown to
be reduced by a factor
\[ q = \frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1} \]
in each iteration step, where κ = cond(T∗T). Hence, the number kε of iterations needed
to reduce the residual by a factor ε is approximately
\[ k_\varepsilon \sim \frac{\sqrt{\kappa} + 1}{2} \cdot |\ln\varepsilon|, \]
i.e., the number of iterations increases with the condition number κ. Note that this
does not contradict the previous theorem, which states decreasing iteration numbers
with increasing ill-posedness. In fact, one may argue that only singular values σn ≥ δ
play a role when stopping according to the discrepancy principle; and it is well-known
that cgne converges much faster if T has only few different (relevant) singular values.
Actually, this is also the reason for the instability of cgne when stopping according to an
a-priori rule.
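For orientation, a common formulation of conjugate gradients on the normal equations, stopped by the discrepancy principle (2.7), is sketched below. It is a standard CGLS-style variant starting from x0 = 0 and is not copied from the algorithm in [28]; the test operator, noise level, and the value τ = 1.1 are assumptions made for illustration.

```python
import numpy as np

def cgne(T, y_delta, delta, tau=1.1, max_iter=1000):
    """Conjugate gradients for T^T T x = T^T y_delta, stopped by rule (2.7).

    Returns the final iterate and the stopping index.
    """
    x = np.zeros(T.shape[1])
    r = y_delta.copy()                     # residual y_delta - T x
    s = T.T @ r                            # residual of the normal equations
    d = s.copy()                           # search direction
    s_norm2 = s @ s
    for k in range(max_iter):
        if np.linalg.norm(r) <= tau * delta or s_norm2 == 0.0:
            return x, k
        q = T @ d
        step = s_norm2 / (q @ q)           # exact line search along d
        x = x + step * d
        r = r - step * q
        s = T.T @ r
        s_new2 = s @ s
        d = s + (s_new2 / s_norm2) * d     # conjugate direction update
        s_norm2 = s_new2
    return x, max_iter

rng = np.random.default_rng(6)
n = 50
sigma = 1.0 / np.arange(1, n + 1)          # assumed singular values
T = np.diag(sigma)
x_dag = 1.0 / np.arange(1, n + 1)          # assumed exact solution
delta = 1e-2
noise = rng.standard_normal(n)
noise *= delta / np.linalg.norm(noise)
y_delta = T @ x_dag + noise
x_rec, k_star = cgne(T, y_delta, delta)
print("stopping index:", k_star)
print("error:", np.linalg.norm(x_rec - x_dag))
```

The step sizes depend on the data through the inner products, which is the data dependence of the iteration polynomials mentioned in the text.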
We now turn to iterative regularization of nonlinear problems:
2.2.4 Landweber Iteration for Nonlinear Problems
As already mentioned above, the characterization of Landweber iteration as a gradient
method for minimizing the least-squares functional allows one to formulate the method also
for nonlinear problems, i.e., the iteration (2.9) is replaced by
\[ x_{k+1}^\delta = x_k^\delta + F'(x_k^\delta)^* \bigl( y^\delta - F(x_k^\delta) \bigr). \tag{2.12} \]
Similarly, the discrepancy principle can be adapted to the nonlinear case, e.g., k∗ is
determined by
\[ \|F(x_{k_*}^\delta) - y^\delta\| \le \tau\delta < \|F(x_k^\delta) - y^\delta\|, \quad 0 \le k < k_*, \tag{2.13} \]
for some τ > 2.
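A minimal sketch of (2.12) with the stopping rule (2.13) is given below. The forward operator is a purely hypothetical componentwise map, chosen only so that F′(x) is diagonal and ‖F′(x)‖ ≤ 1 near the solution; it is not one of the examples treated in the text, and τ = 2.5 is an arbitrary admissible value.

```python
import numpy as np

c = 0.5 / np.arange(1, 11)                 # hypothetical decaying weights

def F(x):
    """Toy nonlinear forward operator (illustrative, not from the text)."""
    return c * (x + 0.1 * x**2)

def F_prime_adjoint(x, v):
    """Apply F'(x)^* to v; F'(x) is diagonal for this toy operator."""
    return c * (1.0 + 0.2 * x) * v

def nonlinear_landweber(y_delta, x0, delta, tau=2.5, max_iter=100000):
    """Iteration (2.12), stopped by the discrepancy principle (2.13)."""
    x = x0.copy()
    for k in range(max_iter):
        residual = y_delta - F(x)
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        x = x + F_prime_adjoint(x, residual)
    return x, max_iter

rng = np.random.default_rng(3)
x_dag = np.ones(10)                        # assumed exact solution
delta = 1e-2
noise = rng.standard_normal(10)
noise *= delta / np.linalg.norm(noise)
y_delta = F(x_dag) + noise
x_rec, k_star = nonlinear_landweber(y_delta, np.zeros(10), delta)
print("stopping index:", k_star)
print("error:", np.linalg.norm(x_rec - x_dag))
```

Only the action of the adjoint derivative on a vector is needed, which is why the sketch passes `F_prime_adjoint` as a function rather than forming a Jacobian matrix.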
A quantitative convergence analysis of Landweber iteration was carried out in [39]
under the following conditions:
(i) F is Fréchet-differentiable in a ball Bρ(x0) and satisfies ‖F′(x)‖ ≤ 1.
(ii) F′(x) = R(x)F′(x†) for all x ∈ Bρ(x0), and ‖R(x) − I‖ ≤ C‖x − x†‖.
(iii) F has a solution x† ∈ Bρ(x0) satisfying
\[ x^\dagger - x_0 = \bigl( F'(x^\dagger)^* F'(x^\dagger) \bigr)^\mu w \]
for some µ > 0 and ‖w‖ sufficiently small.
Note that under condition (ii), the range of the adjoint of the Fréchet-derivative
is invariant in a neighborhood of x0, i.e., R(F′(x)∗) = R(F′(x̃)∗) for x, x̃ ∈ Bρ(x0).
We refer to [20] for a discussion of the importance of invariance conditions for the
convergence results for nonlinear problems.
Under the above conditions, the following convergence rate result is derived in [39]:
Theorem 2.6 Let the assumptions (i)-(iii) above hold for some ρ sufficiently small
and µ ≤ 1/2. If the iteration (2.12) is stopped according to the discrepancy principle
(2.13) with τ sufficiently large, then
\[ k_* = O\bigl(\delta^{-\frac{2}{2\mu+1}}\bigr) \quad\text{and}\quad \|x_{k_*}^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr). \tag{2.14} \]
Note that under the given nonlinearity condition (ii), Landweber iteration shows
saturation at µ = 1/2, which was not the case for linear problems. A comparison with
the linear case shows that for µ ≤ 1/2, the rates (2.14) are order optimal under the
given source condition. In [39], convergence without rates is proven under the weaker
nonlinearity assumption
(ii’) ‖F(x) − F(x̃) − F′(x)(x − x̃)‖ ≤ η‖F(x) − F(x̃)‖ for some η < 1/2
and without a source condition on x†.
2.2.5 Regularized Newton-type Iterations
A different, but quite natural step in the construction of iterative regularization methods
for nonlinear problems is to consider the Newton method for the solution of F(x) = y.
However, it turns out that already a single Newton step
\[ F'(x_k^\delta)(x_{k+1}^\delta - x_k^\delta) = y^\delta - F(x_k^\delta) \tag{2.15} \]
is usually ill-posed if the original problem F(x) = y was. Hence, (2.15) has to be solved
by some regularization method again. Applying Tikhonov regularization to (2.15) yields
the Levenberg-Marquardt method [37]. Stability can be further increased by regularizing
around some fixed element x∗, i.e., by solving
\[ F'(x_k^\delta)(x_{k+1}^\delta - x^*) = y^\delta - F(x_k^\delta) + F'(x_k^\delta)(x_k^\delta - x^*) \tag{2.16} \]
with an appropriate regularization method. With Tikhonov regularization, one obtains
the iteratively regularized Gauß-Newton method [8, 13] in this case. Alternatively, (2.16)
can be solved by more general regularization methods, yielding regularized Newton-type
iterations of the form
\[ x_{n+1}^\delta = x^* + g_{\alpha_n}\bigl( F'(x_n^\delta)^* F'(x_n^\delta) \bigr) F'(x_n^\delta)^* \bigl[ y^\delta - F(x_n^\delta) - F'(x_n^\delta)(x^* - x_n^\delta) \bigr]. \]
Especially for large scale problems, (2.16) might have to be solved by some iterative
algorithm [23, 47, 48, 38]. We will consider such methods and their acceleration by
preconditioning in Hilbert scales in detail in Section 4.3.3. Without citing a concrete
result, we only mention that under the above assumptions (i) – (iii), the same conver-
gence rates as for the nonlinear Landweber iteration hold for the iteratively regularized
Gauß-Newton method and the (accelerated) Newton-Landweber iterations ([23, 48]).
For a comprehensive discussion of various regularized Newton-type methods and their
convergence theory, we refer to [49] and the references cited therein.
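As an illustration of the Newton-type iterations above, the following sketch applies the filter g_α(λ) = 1/(λ + α) to (2.16), which gives a simple variant of the iteratively regularized Gauß-Newton method. The toy operator, the geometric sequence αn = α0 qⁿ, the discrepancy-based stopping with τ = 2.5, and all function names are assumptions made for this example only.

```python
import numpy as np

c = 0.5 / np.arange(1, 11)                 # hypothetical decaying weights

def F(x):
    """Toy nonlinear forward operator (illustrative, not from the text)."""
    return c * (x + 0.1 * x**2)

def jacobian(x):
    """Jacobian F'(x); diagonal for this toy operator."""
    return np.diag(c * (1.0 + 0.2 * x))

def irgn(y_delta, x_star, delta, tau=2.5, alpha0=1.0, q=0.5, max_iter=50):
    """Regularized Newton-type iteration with Tikhonov filter for (2.16)."""
    x = x_star.copy()
    alpha = alpha0
    for n in range(max_iter):
        if np.linalg.norm(F(x) - y_delta) <= tau * delta:
            return x, n
        J = jacobian(x)
        rhs = J.T @ (y_delta - F(x) - J @ (x_star - x))
        x = x_star + np.linalg.solve(J.T @ J + alpha * np.eye(len(x)), rhs)
        alpha *= q                         # assumed geometric decay of alpha_n
    return x, max_iter

rng = np.random.default_rng(7)
x_dag = np.ones(10)                        # assumed exact solution
delta = 1e-2
noise = rng.standard_normal(10)
noise *= delta / np.linalg.norm(noise)
y_delta = F(x_dag) + noise
x_star = np.zeros(10)                      # a-priori guess x^*
res_initial = np.linalg.norm(F(x_star) - y_delta)
x_rec, n_stop = irgn(y_delta, x_star, delta)
res_final = np.linalg.norm(F(x_rec) - y_delta)
print("stopping index:", n_stop, " residual:", res_final)
```

For large-scale problems the dense solve in each step would be replaced by an inner iterative method, which is the setting in which preconditioning becomes relevant.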
We end our survey on regularization methods here, noting that this presentation
is of course far from being complete. For further discussion, e.g., on regularization by
discretization, truncated singular value decomposition, or on asymptotic regularization
we refer to the literature.
Chapter 3
Regularization in Hilbert Scales
This chapter is concerned with a special aspect of regularization theory, namely regu-
larization in Hilbert scales. After introducing the concept of Hilbert scales, we derive
some elementary properties of Hilbert scales which will be needed for the analysis of
the next sections. Then, we quote the most important classical results on regulariza-
tion in Hilbert scales in order to show that saturation, which limits the approximation
properties of regularization methods, in particular for nonlinear problems, can partly
be overcome by considering the inverse problems in Hilbert scales. The results will also
serve as motivation and a starting point for the formulation of our preconditioning
strategy for iterative regularization methods.
3.1 Introduction and General Definitions
In the following, we summarize the main results on Hilbert scales needed for the sub-
sequent analysis. For details and proofs we refer to [28, Section 8.4] and [54]:
In the following, let L be a densely defined, unbounded, selfadjoint, strictly positive
operator in X, i.e., L is a closed operator in X satisfying
\[ D(L) = D(L^*) \text{ is dense in } X, \tag{3.1} \]
\[ \langle Lx, y \rangle = \langle x, Ly \rangle \quad \text{for all } x, y \in D(L), \tag{3.2} \]
and there exists a γ > 0 such that
\[ \|Lx\| \ge \gamma\|x\| \quad \text{for all } x \in D(L). \tag{3.3} \]
As can be shown by spectral theory, the set
\[ M := \bigcap_{k=0}^{\infty} D(L^k) \tag{3.4} \]
is dense in X. Furthermore, $L^s$ is defined on M for all s ∈ R and
\[ M = \bigcap_{s \in \mathbb{R}} D(L^s). \]
These properties allow us to make the following definition:
Definition 3.1 For x, y ∈ M and s ∈ R, let
\[ \langle x, y \rangle_s := \langle L^s x, L^s y \rangle, \tag{3.5} \]
\[ \|x\|_s := \|L^s x\|. \tag{3.6} \]
Then the Hilbert spaces $X_s$ are defined as the completion of M with respect to the norm
$\|\cdot\|_s$, and $\{X_s\}_{s \in \mathbb{R}}$ is called the Hilbert scale induced by L.
This construction implies the following properties, cf. [28, Proposition 8.19]:
Proposition 3.2 Let L be as above and let $\{X_s\}_{s \in \mathbb{R}}$ denote the Hilbert scale induced
by L. Then the following assertions hold:
(i) For −∞ < s < t < ∞, the space $X_t$ is densely and continuously embedded in $X_s$.
(ii) For s, t ∈ R, the operator $L^{t-s}$, defined on M, has a unique extension to $X_t$,
which is an isomorphism from $X_t$ onto $X_s$. If t > s, this extension, again denoted
by $L^{t-s}$, is selfadjoint and strictly positive as restriction to $X_t$ in $X_s$. Moreover,
$L^{t-s} = L^t L^{-s}$ holds for the appropriate extensions, in particular, $(L^s)^{-1} = L^{-s}$.
(iii) If s ≥ 0, then $X_s = D(L^s)$ and $X_{-s} = (X_s)'$, i.e., $X_{-s}$ is the dual space of $X_s$.
(iv) For −∞ < q < r < s < ∞ and x ∈ $X_s$, the interpolation inequality holds, i.e.,
\[ \|x\|_r \le \|x\|_q^{\frac{s-r}{s-q}} \, \|x\|_s^{\frac{r-q}{s-q}}. \]
As in (ii) we will not distinguish between operators and their extensions below, if the
meaning is clear from the context.
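The norms (3.6) and the interpolation inequality in (iv) are easy to examine in a finite-dimensional model. The sketch below assumes that L is diagonal with eigenvalues k², k = 1, …, n (so that γ = 1 in (3.3)); this choice, the test vector, and the indices q, r, s are illustrative only.

```python
import numpy as np

n = 100
# Hypothetical discrete model: L is diagonal with eigenvalues k^2 >= 1.
eigs = np.arange(1, n + 1, dtype=float) ** 2

def scale_norm(x, s):
    """||x||_s = ||L^s x||, computed in the eigenbasis of L, cf. (3.6)."""
    return np.linalg.norm(eigs**s * x)

rng = np.random.default_rng(4)
x = rng.standard_normal(n) / np.arange(1, n + 1) ** 3   # arbitrary test vector
q, r, s = -1.0, 0.25, 1.0
theta = (s - r) / (s - q)                  # exponent of ||x||_q in (iv)
lhs = scale_norm(x, r)
rhs = scale_norm(x, q) ** theta * scale_norm(x, s) ** (1.0 - theta)
print("||x||_r =", lhs)
print("interpolation bound =", rhs)
```

In this diagonal setting the inequality is an instance of Hölder's inequality applied to the weighted coefficient sequence.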
Before we continue with our discussion of Hilbert scales and explain their application
to regularization methods, we give two short examples for illustration:
Example 3.3 Let T : X → Y be a compact, injective, linear operator between Hilbert
spaces X and Y. Then $L := (T^*T)^{-1}$ induces the Hilbert scale $\{X_s\}_{s \in \mathbb{R}}$ with
\[ X_s := D((T^*T)^{-s}) = R((T^*T)^s). \]
Note that in this case the spaces $X_\mu$, µ ≥ 0, are the usual source sets for regularization
in Hilbert spaces (cf. Theorem 1.2).
Example 3.4 Let $\Omega \subset \mathbb{R}^n$, n = 2, 3, be a bounded domain with sufficiently smooth
boundary ∂Ω. Then
\[ -\Delta : H^2(\Omega) \cap H_0^1(\Omega) \subset L^2(\Omega) \to L^2(\Omega) \]
satisfies the conditions of Definition 3.1, i.e., L := −∆ induces a Hilbert scale $\{X_s\}_{s \in \mathbb{R}}$.
Furthermore, $X_s = H_0^{2s}(\Omega)$ for s ∈ [0, 3/4), which means that the Sobolev spaces $H_0^s$
are part of a Hilbert scale. Note that in general, $\{H_0^s\}_{s \in \mathbb{R}}$ is not a Hilbert scale; in
particular, with the above definition,
\[ X_1 = H^2(\Omega) \cap H_0^1(\Omega) \ne H_0^2(\Omega). \]
See also [65] for a related discussion.
The following results are at the core of the subsequent analysis of regularization in
Hilbert scales. For details and proofs see [53] or [28, Section 8.4]:
Proposition 3.5 (Inequality of Heinz) Let L and A be two densely defined, un-
bounded, selfadjoint, strictly positive operators in X with D(A) ⊂ D(L) and
‖Lx‖ ≤ ‖Ax‖ , for all x ∈ D(A).
Then, for all ν ∈ [0, 1], we have that D(Aν) ⊂ D(Lν) and
‖Lνx‖ ≤ ‖Aνx‖ , for all x ∈ D(Aν).
Proposition 3.6 Let T : X → Y be a linear operator and L be as in Proposition 3.2.
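Assume that for some $a > 0$ and $\overline{m} > 0$
$$\|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for all } x \in X \tag{3.7}$$
holds and that the extension of $T$ to $X_{-a}$ (again denoted by $T$) is injective. Then the
following assertions hold: $D((B^*B)^{-\nu/2}) = R((B^*B)^{\nu/2}) \subset X_{\nu(a+s)}$ for all $\nu \in [0,1]$ with
$B := TL^{-s}$ for some $s \ge -a$, and
$$\|(B^*B)^{\nu/2}x\| \le \overline{m}^{\nu}\,\|x\|_{-\nu(a+s)} \quad \text{for all } x \in X, \tag{3.8}$$
$$\|(B^*B)^{-\nu/2}x\| \ge \overline{m}^{-\nu}\,\|x\|_{\nu(a+s)} \quad \text{for all } x \in D((B^*B)^{-\nu/2}). \tag{3.9}$$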
Note also that condition (3.7) is equivalent to
$$R(T^*) \subset X_a \quad\text{and}\quad \|T^*w\|_a \le \overline{m}\,\|w\| \quad \text{for all } w \in Y. \tag{3.10}$$
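Now assume that in addition to (3.7)
$$\underline{m}\,\|x\|_{-\tilde a} \le \|Tx\| \quad \text{for all } x \in X \tag{3.11}$$
holds for some $\underline{m} > 0$, $\tilde a > 0$. Then it follows for all $\nu \in [0,1]$ that
$$X_{\nu(\tilde a+s)} \subset R((B^*B)^{\nu/2}) = D((B^*B)^{-\nu/2})$$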
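and
$$\|(B^*B)^{\nu/2}x\| \ge \underline{m}^{\nu}\,\|x\|_{-\nu(\tilde a+s)} \quad \text{for all } x \in X,$$
$$\|(B^*B)^{-\nu/2}x\| \le \underline{m}^{-\nu}\,\|x\|_{\nu(\tilde a+s)} \quad \text{for all } x \in X_{\nu(\tilde a+s)}. \tag{3.12}$$
Moreover, (3.11) is equivalent to
$$X_{\tilde a} \subset R(T^*) \quad\text{and}\quad \|T^*w\|_{\tilde a} \ge \underline{m}\,\|w\| \quad \text{for all } w \in N(T^*)^{\perp} \text{ with } T^*w \in X_{\tilde a}. \tag{3.13}$$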
Proof. Assume that (3.7) holds: then (3.8) and (3.9) follow from Propositions 3.2,
3.5, interpolation and duality arguments.
Additionally, the operator TLa (respectively its extension) is a continuous mapping
from $X$ to $Y$. Hence, for $y \in Y \cap D(L^aT^*)$,
$$\|L^aT^*y\| = \sup_{\|x\|=1} \langle L^aT^*y, x \rangle = \sup_{\|x\|=1} \langle y, TL^ax \rangle \le \overline{m}\,\|y\| ,$$
which proves (3.10). Next, observe that
$$\|(B^*B)^{1/2}x\| = \|Bx\| = \|TL^{-s}x\| \quad \text{for all } x \in X_{-s}. \tag{3.14}$$
By (3.7), $B = TL^{-s}$ can be extended as a continuous operator to $X_{-(a+s)}$ and by (3.14)
$$D((B^*B)^{1/2}) \subset D(TL^{-s}) = X_{-(a+s)}.$$
The reverse implication follows in the same way.
The results under condition (3.11) follow similarly. □
Remark 3.7 By (3.10), it follows that $R((T^*T)^{1/2}) = R(T^*) \subset X_a$, and hence one can
show with similar reasoning as above that
$$R((T^*T)^{\mu}) \subset X_{2a\mu} \quad \text{for } 0 \le \mu \le \tfrac{1}{2} .$$
Note that in general, x† ∈ X2aµ will not imply x† ∈ R((T ∗T )µ).
If, on the other hand, ‖Tx‖ ≥ m‖x‖−a for some m > 0, then the converse inclusion
R((T ∗T )µ) ⊃ X2aµ holds.
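Finally, if $T \sim L^{-a}$, i.e.,
$$\underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for } x \in X \tag{3.15}$$
holds for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$, which is the usual condition in Hilbert scale
regularization, then the spaces $X_{2a\mu}$ and $R((T^*T)^{\mu})$ coincide for $|\mu| \le 1/2$. Condition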
(3.15) is called norm equivalence and is important for preconditioning also in the well-
posed situation.
The following result (cf. [28, Corollary 8.22]) is an immediate consequence of Propo-
sition 3.6:
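Corollary 3.8 Let $\{X_s\}_{s\in\mathbb{R}}$ be the Hilbert scale induced by $L$, and let $T : X \to Y$ be
a bounded linear operator satisfying (3.15) for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$. Then
for $B := TL^{-s}$, $s \ge -a$, and for $|\nu| \le 1$,
$$\underline{c}(\nu)\,\|x\|_{-\nu(a+s)} \le \|(B^*B)^{\nu/2}x\| \le \overline{c}(\nu)\,\|x\|_{-\nu(a+s)}$$
holds on $D((B^*B)^{\nu/2})$ with $\underline{c}(\nu) = \min(\underline{m}^{\nu}, \overline{m}^{\nu})$ and $\overline{c}(\nu) = \max(\underline{m}^{\nu}, \overline{m}^{\nu})$. Moreover,
$R((B^*B)^{\nu/2}) = X_{\nu(a+s)}$, where $(B^*B)^{\nu/2}$ has to be replaced by its extension to $X$ if $\nu < 0$.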
The number a in (3.15) may be interpreted as degree of ill-posedness. For illustra-
tion we check (3.15) and thus the applicability of the previous result for the following
example:
Example 3.9 Let $X = Y = L^2[0,1]$, and consider $T$ defined by
$$(Tx)(s) = \int_0^s k(t)\,x(t)\,dt,$$
with some $k \in H^1[0,1]$ satisfying $0 < \underline{k} \le k(t)$. Then $R(T) = \{x \in H^1[0,1] : x(0) = 0\}$.
For the choice of a Hilbert scale, observe the following: if $k$ were constant, e.g.,
$k = 1$, then $R(T^*T) = \{x \in H^2[0,1] : x(1) = 0,\ x'(0) = 0\}$. Hence it is reasonable to
choose $X_1 = \{x \in H^2[0,1] : x(1) = 0,\ x'(0) = 0\}$ with
$$Lx = -x'',$$
which for constant $k$ yields $T^*T \sim L^{-1}$. For non-constant $k$ as above it still follows that
$X_{1/2} = \{x \in H^1[0,1] : x(1) = 0\} = R(T^*)$. In order to show (3.7) and (3.11), we use
that
$$\|x\|_{1/2} = \|L^{1/2}x\| = \|x'\| ;$$
note that $L^{1/2}x \neq x'$. Since $(T^*y)(t) = k(t)\int_t^1 y(s)\,ds$, it follows that $\|(T^*y)'\| \le c\,\|y\|$.
Thus $T^*$ is a continuous mapping from $L^2[0,1]$ onto $X_{1/2}$ and has a bounded inverse
by the Open Mapping Theorem. Together with Proposition 3.6, this yields (3.15) with
$a = 1/2$, in particular, $(T^*T)^{\nu/2} \sim L^{-\nu a}$ for $|\nu| \le 1/2$.
Note that by Proposition 3.6 and Remark 3.7 the Hilbert scales $\{X_s\}$ and $\{\tilde X_s\}$
induced by $L$ and $\tilde L := (T^*T)^{-1/(2a)}$ coincide for $s \in [-a, a]$. However, since
$$(T^*Tx)(s) = k(s)\int_s^1 \int_0^t k(\tau)\,x(\tau)\,d\tau\,dt,$$
it follows that $R(T^*T) \not\subset X_1$ if $k \notin H^2[0,1]$ or $k'(0) \neq 0$, and thus $T^*T \not\sim L^{-2a}$. This
illustrates that two Hilbert scales may coincide (with equivalent norms) for a range of
values of $s$, while they differ outside.
As in the above example, boundary conditions play an essential role for Hilbert scale
considerations and for regularization in general; see [65] for a related discussion.
3.2 Linear problems in Hilbert Scales
Originally, regularization in Hilbert scales was introduced by Natterer [63] for the spe-
cial case of Tikhonov regularization combined with an a-priori stopping rule. It was
shown independently in [28] and [71] that the results naturally generalize to quite gen-
eral regularization methods for linear problems. Later, in [66] and [68], the theory was
extended to Tikhonov regularization and Landweber iteration for nonlinear problems
combined with an a-posteriori parameter choice strategy. In any of these works, Hilbert
scales were used to extend the range of optimal convergence of the methods under
consideration: e.g., as mentioned in Section 2.1, Tikhonov regularization has a finite
qualification $\mu_0 = 1$ and thus the best possible rate of convergence is $O(\delta^{2/3})$ provided
$x^\dagger \in R((T^*T)^{\mu})$ for some $\mu \ge 1$ and the regularization parameter $\alpha$ is chosen
appropriately. However, as we will see below, a better rate (in $X = X_0$) can be obtained
for µ > 1 if T is considered as an operator from Xs to Y and if the regularization is
performed in the (stronger) norm of Xs (s > 0). A second advantage of regularization in
Hilbert scales is that the abstract source conditions x† ∈ R((T ∗T )µ) can be interpreted
in terms of the Hilbert scale Xs, e.g., conditions like x† ∈ Xu are used, which usually
amount to simple differentiability conditions (including boundary conditions), see also
the examples of Hilbert scales in Chapter 5.
Next, we recall the main convergence (rates) results for Tikhonov regularization in
Hilbert scales:
For the rest of this chapter, let $\{X_s\}_{s\in\mathbb{R}}$ be the Hilbert scale induced by $L$, and let
$T : X \to Y$ be a bounded linear operator satisfying (3.15), i.e.,
$$\underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a}$$
on $X$ for some $a > 0$ and $0 < \underline{m} < \overline{m} < \infty$. From (3.15) it follows that
$$B := TL^{-s}$$
is well-defined for $s \ge -a$. For a filter function $g_\alpha : [0, \|B\|^2] \to \mathbb{R}$ satisfying the
conditions of Theorem 1.2, we define the regularized solution $x_\alpha^\delta$ by
$$x_\alpha^\delta := L^{-s}g_\alpha(B^*B)B^*y^\delta. \tag{3.16}$$
This actually corresponds to the standard definition of a regularized solution (1.7) if T
is considered as an operator on Xs. To see this, consider Tikhonov regularization with
the norm of Xs: according to (2.1) the regularized solution xδα is defined as the solution
of the regularized normal equations
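$$T^*y^\delta = (T^*T + \alpha L^{2s})\,x_\alpha^\delta = L^s(B^*B + \alpha I)L^s x_\alpha^\delta$$
or equivalently
$$x_\alpha^\delta = L^{-s}(B^*B + \alpha I)^{-1}L^{-s}T^*y^\delta = L^{-s}(B^*B + \alpha I)^{-1}B^*y^\delta.$$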
The standard results on regularization (cf. Theorem 1.3) yield convergence (rates) of
xδα to x† with respect to the norm in Xs. However, in Hilbert scale regularization,
one is interested in the convergence in the original space X (at a better rate). The
following result summarizes the convergence behavior with respect to the norm in X
for regularization of linear problems in Hilbert scales (for the details and proofs we refer
to [28, Section 8.5]).
Theorem 3.10 Let xδα be defined by (3.16), and let the assumptions on gα of Theo-
rem 1.2 hold with rα(λ) = 1 − λgα(λ) satisfying (1.11) for some µ0 ≥ 1. Then for
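$x^\dagger \in X_u$ with $0 < u \le a + 2s$, and for the parameter choice
$$\alpha \sim \delta^{\frac{2(a+s)}{a+u}}$$
the following estimate holds:
$$\|x_\alpha^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}).$$
If $r_\alpha$ is continuous from the left for all $\lambda \in [0, \|B\|^2]$ as a function of $\alpha$, and if $\alpha$ is
chosen according to the discrepancy principle (1.14) with $\tau > c_0$ ($= c_\mu$ in (1.11) with
$\mu = 0$), then
$$\|x_\alpha^\delta - x^\dagger\| = \begin{cases} o(\delta^{\frac{u}{a+u}}) & \text{if } u < a + 2s \text{ or } u = a + 2s,\ \mu_0 > 1, \\ O(\delta^{\frac{u}{a+u}}) & \text{if } u = a + 2s \text{ and } \mu_0 = 1 \end{cases}$$
holds for $u \le 2(a+s)\mu_0 - a$.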
Remark 3.11 Assume, e.g., that a = 1, s = a and u = 3. Then a convergence rate
of O(δ3/4) can be expected for Tikhonov regularization. Note that due to saturation,
O(δ2/3) is the best possible rate for standard Tikhonov regularization (cf. Section 2.1).
If L commutes with T ∗T , e.g., for L = (T ∗T )−a2 , or if the source condition x† ∈ Xu is
replaced by $L^s x^\dagger = (B^*B)^{\frac{u-s}{2(a+s)}}w$ (see Theorem 4.10), the restriction $u \le a + 2s$ can be
replaced by u− s ≤ 2(a+ s)µ0, where µ0 denotes the qualification of the regularization
method under consideration, i.e., the largest µ such that (1.11) holds. The above results
naturally apply also for iterative regularization methods: there, α has to be replaced
by $1/k$, respectively $1/k^2$ for semiiterative regularization methods with optimal speed
of convergence, cf. Section 2.2 and [26, 36].
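The exponents quoted at the beginning of Remark 3.11 can be verified by direct substitution. The following worked computation is only a check of that arithmetic; it uses the rate $\delta^{u/(a+u)}$ of Theorem 3.10 and, as an assumption about what Section 2.1 states, the rate $\delta^{2\mu/(2\mu+1)}$ for standard Tikhonov regularization evaluated at the saturation index $\mu = \mu_0 = 1$.

```latex
\begin{align*}
a = 1,\ s = 1,\ u = 3: \qquad & u \le a + 2s = 3, \\
& \frac{u}{a+u} = \frac{3}{1+3} = \frac{3}{4}, \\
\mu = \mu_0 = 1: \qquad & \frac{2\mu}{2\mu+1} = \frac{2}{3} < \frac{3}{4}.
\end{align*}
```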
3.3 Iterative regularization of nonlinear problems
in Hilbert scales
In [68], the convergence rates of Theorem 3.10 have been generalized to Landweber
iteration for nonlinear problems F (x) = y. The corresponding iteration has the form
$$x_{k+1}^\delta = x_k^\delta + L^{-2s}F'(x^\dagger)^*(y^\delta - F(x_k^\delta)), \qquad k \ge 0. \tag{3.17}$$
As in the linear case, (3.17) corresponds to the usual Landweber iteration if the operator
F is considered as an operator on Xs (while as above ∗ denotes the adjoint with respect
to the spaces X and Y).
For later reference and a comparison with our results below, we recall the basic
assumptions and convergence (rates) results of [68]:
Assumption 3.12
(i) $F : D(F) \subset X \to Y$ is continuous and Fréchet-differentiable in $X$.
(ii) (1.16) has a solution $x^\dagger$; moreover, $B_\rho(x^\dagger) := \{x \in X : \|x - x^\dagger\|_0 \le \rho\} \subset D(F)$
for some $\rho > 0$.
(iii) $\|F'(x^\dagger)x\|_Y \sim \|x\|_{-a}$ for all $x \in X$ and some $a > 0$.
(iv) $\|F'(x^\dagger)^* - F'(x)^*\|_{Y,X_b} \le c\,\|x^\dagger - x\|_0^{\beta}$ for all $x \in B_\rho(x^\dagger)$ and some $b \in [0, a]$,
$\beta \in (0, 1]$, and $c > 0$.
(v) $B := F'(x^\dagger)L^{-s}$ is such that $\|B\|_{X,Y} \le 1$.
(vi) $x^\dagger - x_0 \in X_u$ for some $\frac{a-b}{\beta} < u \le b + 2s$.
For a discussion of the conditions, see [26, 68] or Remark 4.23 below. Under Assump-
tion 3.12, the following convergence rates for Landweber iteration in Hilbert scales hold:
Theorem 3.13 Let Assumption 3.12 and ‖y − yδ‖ ≤ δ hold. Moreover, let k∗ be the
termination index determined by the discrepancy principle (1.14) with τ sufficiently
large, and let ‖x† − x0‖u be sufficiently small. Then
$$k_* = O\bigl(\delta^{-\frac{2(a+s)}{a+u}}\bigr) \tag{3.18}$$
and
$$\|x^\dagger - x_k^\delta\|_0 = O\bigl(\delta^{\frac{u}{a+u}}\bigr). \tag{3.19}$$
The convergence rate (3.19) coincides with the one of Theorem 3.10. Saturation at
u = a+ 2s corresponds to µ = 1/2 for standard Landweber iteration (cf. Theorem 2.6,
and the discussion in [26, 39, 68]).
A major drawback of iterative regularization in Hilbert scales with s ≥ 0 is that the
number of iterations needed for optimal convergence, see (3.18), increases with s, i.e.,
for the reconstruction of non-smooth solutions it is numerically advantageous to choose
s as small as possible. As we will see in the next section, it is even possible to choose
s < 0, in which case the operator L−2s in (3.17) acts as a preconditioner. Additionally,
we will show below that the restrictive condition (3.15) respectively (iii) can be relaxed
substantially.
Chapter 4
Preconditioning Iterative
Regularization in Hilbert Scales
The main motivation for regularization in Hilbert scales was originally to overcome
saturation effects of the standard methods to a certain extent. One of the drawbacks of
this approach is that a precise knowledge of the ill-posedness of the involved operators
is required, e.g., for linear problems Tx = y the condition (3.15), which is
$$\underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a}.$$
As we will show below, this assumption can be relaxed substantially in case s ≤ 0.
Another disadvantage of iterative regularization methods in Hilbert scales with s > 0
is that the number of iterations needed to guarantee optimal convergence rates increases
with s, e.g., for Landweber iteration in Hilbert scales one has (cf. Theorem 3.13)
$$k_* = O\bigl(\delta^{-\frac{2(a+s)}{a+u}}\bigr).$$
Thus, from a numerical point of view, it is favorable to choose s as small as possible if
one does not expect the solution to be very smooth and thus saturation effects play a
minor role.
For motivation, consider Landweber iteration applied to Tx = y. The corresponding
Hilbert scale iteration (corresponding to standard Landweber iteration with T consid-
ered as operator on Xs) takes the form
$$x_{k+1}^\delta = x_k^\delta + L^{-2s}T^*(y^\delta - Tx_k^\delta), \qquad k \ge 0, \quad x_0^\delta = x_0. \tag{4.1}$$
Note that $L^{-2s}T^*$ is the adjoint of $T$ with respect to the spaces $X_s$ and $Y$. One has
to assume that s ≥ −a/2 in (4.1), otherwise the iteration is not even well-defined as
iteration on X for general yδ ∈ Y .
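The iteration (4.1) can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the thesis: operators are passed as callables acting on lists of floats, the stopping test is the discrepancy principle with a hypothetical default `tau = 2.0`, and `diagonal_demo` assumes a toy model with T = diag(1/n) and L = diag(n), so that a = 1 and s = -1/2 gives the preconditioner L^{-2s} = L.

```python
import math

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def hs_landweber(apply_T, apply_Tadj, apply_precond, y_delta, x0,
                 delta, tau=2.0, max_iter=10000):
    """Iterate x <- x + L^{-2s} T^*(y_delta - T x) as in (4.1).

    apply_precond plays the role of L^{-2s}; pass the identity for s = 0.
    Stops as soon as ||y_delta - T x|| <= tau * delta (discrepancy principle).
    Returns the final iterate and the number of steps taken.
    """
    x = list(x0)
    for k in range(max_iter):
        residual = [yi - ti for yi, ti in zip(y_delta, apply_T(x))]
        if norm(residual) <= tau * delta:
            return x, k
        update = apply_precond(apply_Tadj(residual))
        x = [xi + ui for xi, ui in zip(x, update)]
    return x, max_iter

def diagonal_demo(n_dim=50, delta=1e-3, s=-0.5):
    """Toy model (assumption): T = diag(1/n), L = diag(n), hence a = 1."""
    idx = range(1, n_dim + 1)
    T = lambda v: [vi / n for vi, n in zip(v, idx)]          # T is symmetric here
    precond = lambda v: [vi * n ** (-2.0 * s) for vi, n in zip(v, idx)]
    identity = lambda v: list(v)
    x_true = [1.0 / n ** 2 for n in idx]
    # Deterministic perturbation with Euclidean norm exactly delta.
    noise = [delta * (-1.0) ** n / math.sqrt(n_dim) for n in idx]
    y_delta = [ti + ei for ti, ei in zip(T(x_true), noise)]
    x0 = [0.0] * n_dim
    _, k_std = hs_landweber(T, T, identity, y_delta, x0, delta)
    _, k_pre = hs_landweber(T, T, precond, y_delta, x0, delta)
    return k_std, k_pre
```

In this diagonal model each residual component is damped by the factor 1 - 1/n per step with preconditioning and by 1 - 1/n^2 without, so the preconditioned run cannot need more steps than the standard one.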
In order to keep the number of iterations, and thus the overall numerical effort as
small as possible, we are especially interested in the case s < 0 below, in which case the
action of L−2s in (4.1) can be interpreted as preconditioning. Note, that the standard
theory for Landweber iteration (cf. Theorem 2.2) yields convergence and convergence
rates only in the space Xs, i.e., with respect to the weaker norm ‖ · ‖s (s < 0). Thus,
the main aim of the convergence analysis below is to show that the preconditioned
iterations still provide (optimal) convergence rates in the usual space X . In contrast to
the standard case of regularization in Hilbert scales (s ≥ 0), where convergence in Xs
already implies convergence in X , the situation is more involved in case s < 0. Here, it
has to be proven first that the iterations stay at least bounded in X .
In the sequel we state and discuss the main assumptions for regularization in Hilbert
scales with s < 0 in detail. In Section 4.2, we investigate regularization of linear inverse
problems and derive the main convergence rates results for a class of semiiterative
regularization methods and the method of conjugate gradients applied to the normal
equations (cgne). Nonlinear inverse problems will be investigated in Section 4.3.
4.1 Main Assumptions and Preliminary Results
Our analysis of regularization in Hilbert scales for the case s ≤ 0 is mainly based on
the following condition (cf. Proposition 3.6)
Assumption 4.1 There exist $a > 0$ and $\overline{m} > 0$ such that (3.7) holds, i.e.,
$$\|Tx\| \le \overline{m}\,\|x\|_{-a} \quad \text{for all } x \in X.$$
Moreover, the extension of $T$ to $X_{-a}$ (again denoted by $T$) is injective.
Usually, for the analysis of regularization methods in Hilbert scales, the stronger
condition (3.15) is used, i.e.,
$$\underline{m}\,\|x\|_{-a} \le \|Tx\| \le \overline{m}\,\|x\|_{-a}$$
(cf., e.g., [63, 66] and Theorem 3.10). However, (3.7) might still be satisfied, even if
(3.15) does not hold. It might also be possible that an estimate from below can be
given in a weaker norm, e.g., there exist $\tilde a \ge a$ and $\underline{m} > 0$ such that (3.11) holds, i.e.,
$$\|Tx\| \ge \underline{m}\,\|x\|_{-\tilde a} \quad \text{for all } x \in X,$$
see Section 5.1.1 for a detailed example.
Important implications of Assumption 4.1 follow readily from the first part of Propo-
sition 3.6 recalling that regularization in Hilbert scales means to consider the operator
T as operator on Xs. The standard source condition (for regularization in Xs) with
$\mu = 1$ then reads $x^\dagger = L^{-2s}T^*Tw_s$ for some $w_s \in X_s$, or equivalently with the notation
$B = TL^{-s}$
$$x^\dagger = L^{-s}(B^*B)L^s w_s = L^{-s}(B^*B)w.$$
Thus, the space $R(L^{-s}(B^*B))$ is the natural source set (with $\mu = 1$) for regularization
in Hilbert scales. The following definition of a shifted Hilbert scale generalizes these sets
to arbitrary $\mu$, and hence will play an important role in our convergence analysis below:
Definition 4.2 Let $a$ and $B$ be as in Assumption 4.1. For $s \ge -a/2$, we define the
shifted Hilbert scale $\{X_r^s\}_{r\in\mathbb{R}}$ by
$$X_r^s := D\bigl((B^*B)^{\frac{s-r}{2(a+s)}}L^s\bigr) \tag{4.2}$$
equipped with the norm
$$|||x|||_r := \bigl\|(B^*B)^{\frac{s-r}{2(a+s)}}L^s x\bigr\|_X . \tag{4.3}$$
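The shifted norm (4.3) can be evaluated explicitly when T and L are simultaneously diagonal. The sketch below makes that simplifying assumption (it is not required by Definition 4.2) and is only meant to illustrate two special cases: for r = -a the shifted norm reduces to ‖Tx‖, and for r = s it reduces to the Hilbert scale norm ‖x‖_s. The function names and the sample data are hypothetical.

```python
import math

def shifted_norm(x, t, l, a, s, r):
    """|||x|||_r = ||(B^*B)^{(s-r)/(2(a+s))} L^s x|| for diagonal T and L.

    t and l hold the diagonal entries of T and L, so that B = T L^{-s}
    has the diagonal entries t_n * l_n^{-s}.
    """
    if a + s <= 0:
        raise ValueError("need a + s > 0")
    power = (s - r) / (2.0 * (a + s))
    total = 0.0
    for xn, tn, ln in zip(x, t, l):
        b_sq = (tn * ln ** (-s)) ** 2            # diagonal entry of B^*B
        total += (b_sq ** power * ln ** s * xn) ** 2
    return math.sqrt(total)

def scale_norm(x, l, s):
    """||x||_s = ||L^s x|| for L = diag(l)."""
    return math.sqrt(sum((ln ** s * xn) ** 2 for xn, ln in zip(x, l)))

# Hypothetical sample: L = diag(n), T = diag(1 / (n * (1 + n % 3))), a = 1.
L_DIAG = [float(n) for n in range(1, 31)]
T_DIAG = [1.0 / (n * (1 + n % 3)) for n in range(1, 31)]
X_SAMPLE = [1.0 / n ** 2 for n in range(1, 31)]
```

With these data `shifted_norm(X_SAMPLE, T_DIAG, L_DIAG, 1.0, -0.5, -1.0)` agrees with the Euclidean norm of T applied to `X_SAMPLE` up to rounding, although T is not a power of L.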
Remark 4.3 Note that $\{X_r^s\}_{r\in\mathbb{R}}$ is generally no Hilbert scale over $X$; in particular,
$X_{-r}^s$ is not the dual space of $X_r^s$ in general. However, it turns out that $\{X_{r+s}^s\}_{r\in\mathbb{R}}$ is a
Hilbert scale over $X_s$ (see Proposition 4.4(v) below). This means that the spaces $X_u^s$ are
the natural candidates for the formulation of source conditions, when the operator $T$ is
considered as operator on $X_s$. In fact, one has
$$X_u^s = \bigl\{ (T^\sharp T)^{\frac{u-s}{2(a+s)}}w_s : w_s \in X_s \bigr\},$$
where T ] = L−2sT ∗ denotes the adjoint of T with respect to Xs and Y . Also note, that
the spaces X sr coincide with the usual source sets R((T ∗T )µ) in case s = 0 and with
$r = 2a\mu$. The standard convergence results may be applied and imply (optimal) rates
of convergence, i.e.,
$$\|x_k^\delta - x^\dagger\|_s = O\bigl(\delta^{\frac{u-s}{u+a}}\bigr)$$
provided that $x^\dagger \in X_u^s$. The main statement of our analysis below will be that the
convergence rates can be shifted to the usual space $X$, i.e., we will show that the usual
(optimal) rates
$$\|x_k^\delta - x^\dagger\|_0 = O\bigl(\delta^{\frac{u}{u+a}}\bigr)$$
also hold for regularization in Hilbert scales with $s \le 0$, even under our relaxed
assumptions.
For our analysis below, we will frequently use the next proposition, which summa-
rizes the basic properties of the shifted Hilbert scale X sr r∈R:
Proposition 4.4 Let Assumption 4.1 hold, let $-a/2 \le s$, and let $\{X_r^s\}_{r\in\mathbb{R}}$ be defined
as above. Then the following assertions hold:
(i) The space $X_q^s$ is continuously embedded in $X_p^s$ for $p < q$, i.e., for $x \in X_q^s$
$$|||x|||_p \le \gamma^{p-q}\,|||x|||_q , \tag{4.4}$$
where $\gamma$ is such that
$$\langle (B^*B)^{-\frac{1}{2(a+s)}}x, x \rangle \ge \gamma\|x\|^2 \quad \text{for all } x \in D\bigl((B^*B)^{-\frac{1}{2(a+s)}}\bigr).$$
(ii) The interpolation inequality holds, i.e., for all $x \in X_r^s$
$$|||x|||_q \le |||x|||_p^{\frac{r-q}{r-p}}\,|||x|||_r^{\frac{q-p}{r-p}} , \qquad p < q < r. \tag{4.5}$$
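(iii) For $s \le r \le a + 2s$,
$$\|x\|_r \le \overline{m}^{\frac{r-s}{a+s}}\,|||x|||_r \quad \text{for all } x \in X_r^s \subset X_r, \tag{4.6}$$
and for $-a \le r \le s$,
$$\|x\|_r \ge \overline{m}^{\frac{r-s}{a+s}}\,|||x|||_r \quad \text{for all } x \in X_r \subset X_r^s. \tag{4.7}$$
In particular, if $-a/2 \le s \le 0$, we obtain
$$\|x\|_0 \le \overline{m}^{\frac{-s}{a+s}}\,|||x|||_0 \quad \text{for all } x \in X_0^s \subset X_0. \tag{4.8}$$
Moreover,
$$|||x|||_{-a} = \|Tx\| \quad \text{for all } x \in X. \tag{4.9}$$
(iv) If in addition (3.11) is satisfied, then with $p = s + \frac{r-s}{a+s}(\tilde a + s)$ the following estimates
hold:
for $s \le r \le a + 2s$
$$\|x\|_p \ge \underline{m}^{\frac{r-s}{a+s}}\,|||x|||_r \quad \text{for all } x \in X_p \subset X_r^s, \tag{4.10}$$
and for $-a \le r \le s$,
$$\|x\|_p \le \underline{m}^{\frac{r-s}{a+s}}\,|||x|||_r \quad \text{for all } x \in X_r^s \subset X_p.$$
(v) $\{X_{r+s}^s\}_{r\in\mathbb{R}}$ is the Hilbert scale induced by $A := L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^s$ over $X_s$.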
Proof. In view of Remark 4.3, the proof of (i) – (iv) follows from Proposition 3.2
and Proposition 3.6.
To prove (v) we show that A is a densely defined, unbounded, self-adjoint, strictly
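positive operator on $X_s$. First of all, any element $x \in D(A)$ has the representation
$x = L^{-s}(B^*B)^{\frac{1}{2(a+s)}}z$ with $z \in X$. Since $T$ is injective by assumption, the space
$\{L^s x : x \in D(A)\} = R\bigl((B^*B)^{\frac{1}{2(a+s)}}\bigr)$ is dense in $X$ and hence $D(A)$ is dense in $X_s$.
Next, for $x, y \in D(A)$,
$$\langle Ax, y \rangle_s = \langle L^sL^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^s x, L^s y \rangle_0 = \langle L^s x, (B^*B)^{-\frac{1}{2(a+s)}}L^s y \rangle_0 = \langle x, Ay \rangle_s.$$
Finally, with $\gamma$ as in (i), we obtain that
$$\|Ax\|_s = \|L^sL^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^s x\|_0 \ge \gamma\|L^s x\| = \gamma\|x\|_s ,$$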
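where $\gamma$ is as in (i). It remains to be shown that the Hilbert scale induced by $A$ over
$X_s$ coincides with $\{X_{r+s}^s\}$. For $r \in \mathbb{N}$ we have that
$$\begin{aligned} D(A^r) &= D\bigl((L^{-s}(B^*B)^{-\frac{1}{2(a+s)}}L^s)^r\bigr) \\ &= D\bigl(L^{-s}(B^*B)^{-\frac{r}{2(a+s)}}L^s\bigr) \quad \text{as domain in } X_s \\ &= D\bigl((B^*B)^{-\frac{r}{2(a+s)}}L^s\bigr) \quad \text{as domain in } X \\ &= X_{r+s}^s . \end{aligned}$$
For $r \in \mathbb{R}$ the assertion follows by spectral theory. □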
Remark 4.5 Applying (4.6), (4.7) with $r = s$ shows that the spaces $X_s = X_s^s$ coincide
with identical norms (cf. Remark 4.3). However, in general $X_r \not\subset X_r^s$ for $r \neq s$; in
particular, if $s < 0$, then in general $X_0 \not\subset X_0^s$, and the source condition $x^\dagger \in X_u^s$ is
usually stronger than the condition $x^\dagger \in R((T^*T)^{\frac{u}{2a}})$.
If, however, the norm equivalence (3.15) holds, then the spaces Xu, R((T ∗T )u2a ), and
X su coincide for −a ≤ u ≤ a + 2s. In case only a weaker estimate (3.11) from below
holds, one has X su ⊂ R((T ∗T )
u2a ) ⊂ X s
u with u = aau as long as u ≤ a + 2s. The right
inequality even holds for u ≤ a+ 2s.
4.2 Linear Problems
In this section we investigate the regularizing properties of (iterative) regularization
methods in Hilbert scales for linear problems
Tx = y,
under the following (relaxed) assumptions.
Assumption 4.6 Let T : X → Y denote a bounded linear operator, and assume:
(L1) Tx = y has a solution x†.
(L2) ‖Tx‖ ≤ m‖x‖−a for all x ∈ X and some a > 0,m > 0. Moreover, the extension
of T to X−a (again denoted by T ) is injective.
(L3) B := TL−s is such that ‖B‖X ,Y ≤ 1, where −a/2 ≤ s ≤ 0.
The existence of a solution (L1) will be needed for the convergence analysis in case
the regularization parameter is chosen by the discrepancy principle. For the results
under a-priori rules, only y ∈ R(T ) + R(T )⊥ is required. (L2) is Assumption 4.1 in
Section 4.1, and (L3) is a simple scaling condition.
4.2.1 Semiiterative Regularization Methods in Hilbert Scales
A general semiiterative regularization method in Hilbert scales has the form (compare
Section 2.2)
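$$x_k^\delta = \mu_{1,k}x_{k-1}^\delta + \dots + \mu_{k,k}x_0 + \omega_k L^{-2s}T^*(y^\delta - Tx_{k-1}^\delta), \qquad k \ge 1, \quad \sum_{i=1}^{k}\mu_{i,k} = 1, \quad \omega_k \neq 0. \tag{4.11}$$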
As for Landweber iteration in Hilbert scales (cf. (4.1)), the only algorithmic difference
to the standard iterations is that the residuals $T^*(y^\delta - Tx_k)$ are preconditioned with
$L^{-2s}$. The iterates defined by (4.11) have the closed form representation
$$x_k^\delta = x_0 + L^{-s}g_k(B^*B)B^*(y^\delta - Tx_0),$$
where gk is a polynomial of order k− 1 with gk(0) = 0. Moreover, the following expres-
sions for the approximation and the propagated data error hold:
$$x_k^\delta - x_k = L^{-s}g_k(B^*B)B^*(y^\delta - y),$$
$$x_k - x^\dagger = L^{-s}r_k(B^*B)L^s(x_0 - x^\dagger), \tag{4.12}$$
where rk(λ) = 1 − λgk(λ). In order to ensure stability of the approximations xδk, the
iteration (4.11) has to be stopped appropriately, e.g., according to the discrepancy
principle (2.7)
$$\|y^\delta - Tx_{k_*}^\delta\| \le \tau\delta < \|y^\delta - Tx_k^\delta\| , \qquad 0 \le k < k_*$$
for some $\tau > \sup\{|r_k(\lambda)| : \lambda \in [0,1]\}$. According to Theorem 1.2, the approximation
quality of a regularization method is determined by its modulus of convergence $\omega_\mu$.
Thus, we require
$$\sup_{\lambda\in[0,1]} \lambda^{\mu}|r_k(\lambda)| =: \omega_\mu(k) \le c_\mu(k+1)^{-\sigma\mu}, \qquad 0 \le \mu \le \mu_0, \tag{4.13}$$
to hold for some $\sigma > 0$ and $\mu_0 > 0$; we are especially interested in the cases $\sigma \in \{1, 2\}$
(cf. Theorem 2.2).
For the proof of the main convergence statements, we will need the following
lemma:
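Lemma 4.7 Let the residual polynomial $r_k$ satisfy $|r_k(\lambda)| \le c_0$ for all $\lambda \in [0,1]$. Then
$$\lambda^{\mu}|g_k(\lambda)| \le 2c_0k^{\sigma(1-\mu)} \quad \text{for all } 0 \le \mu \le 1,\ \lambda \in [0,1].$$
Proof. By $r_k(\lambda) = 1 - \lambda g_k(\lambda)$, we obtain for $0 \le \mu \le 1$ and $\lambda \neq 0$
$$\lambda^{\mu}g_k(\lambda) = \lambda^{\mu-1}(1 - r_k(\lambda)) = [\lambda^{-1}(1 - r_k(\lambda))]^{1-\mu}\,[1 - r_k(\lambda)]^{\mu}.$$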
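Now, by the Mean Value Theorem, one can find a $\tilde\lambda \in [0,1]$ such that
$$\lambda^{-1}(1 - r_k(\lambda)) = -r_k'(\tilde\lambda),$$
which together with Markov's inequality ($|r_k'(\lambda)| \le 2c_0k^2$) and $|r_k(\lambda)| \le c_0$ for $\lambda \in [0,1]$
yields
$$\lambda^{\mu}g_k(\lambda) \le 2c_0k^{2(1-\mu)} \quad \text{for } \lambda \in [0,1].$$
Note that $c_0 \ge 1$, since $r_k(0) = 1$. □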
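The bound derived in the proof can be checked numerically for a concrete method. The sketch below assumes Landweber iteration, for which $r_k(\lambda) = (1-\lambda)^k$, $c_0 = 1$ and $g_k(\lambda) = \sum_{j<k}(1-\lambda)^j$; this choice and the function names are illustrative assumptions and are not taken from the text.

```python
def landweber_g(k, lam):
    """g_k(lam) = sum_{j<k} (1 - lam)^j, so that 1 - lam * g_k(lam) = (1 - lam)^k."""
    return sum((1.0 - lam) ** j for j in range(k))

def worst_ratio(k, mu, grid=200, c0=1.0):
    """Largest value of lam^mu * g_k(lam) / (2 c0 k^{2(1-mu)}) on a grid in (0, 1]."""
    bound = 2.0 * c0 * k ** (2.0 * (1.0 - mu))
    return max((lam ** mu) * landweber_g(k, lam) / bound
               for lam in (i / grid for i in range(1, grid + 1)))
```

A ratio of at most one on the grid is consistent with the estimate $\lambda^{\mu}g_k(\lambda) \le 2c_0k^{2(1-\mu)}$; for Landweber iteration the sharper bound $k^{1-\mu}$ holds, so the ratio stays well below one.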
We are now in the position to state the main results:
Proposition 4.8 Let Assumption 4.6 hold and let xδk be defined by the semiiterative
method (4.11) satisfying (4.13) for some µ0 > 0 and 0 < σ ≤ 2. Additionally, assume
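that $x^\dagger - x_0 \in X_u^s$, i.e.,
$$x^\dagger - x_0 = L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}}w, \tag{4.14}$$
for some $w \in X$ and $0 < u \le 2(a+s)\mu_0 - a$. Then
$$\|x_k^\delta - x^\dagger\| \le c\,\bigl(\delta k^{\frac{\sigma a}{2(a+s)}} + k^{-\frac{\sigma u}{2(a+s)}}\|w\|\bigr).$$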
Proof. Denote by $C_0 = \max\{c_\mu : 0 \le \mu \le \min(\mu_0, \tfrac{a+u}{2(a+s)})\}$. Using the source
condition (4.14) and the representation (4.12), we get with (4.3), (4.14) and (4.13)
$$|||x_k - x^\dagger|||_u = \|(B^*B)^{\frac{s-u}{2(a+s)}}L^sL^{-s}r_k(B^*B)(B^*B)^{\frac{u-s}{2(a+s)}}w\| \le \|r_k(B^*B)\|\,\|w\| \le C_0\|w\| .$$
Similarly, with (4.9) and (4.13), we derive
$$|||x_k - x^\dagger|||_{-a} = \|Tx_k - Tx^\dagger\| = \|(B^*B)^{\frac{a+u}{2(a+s)}}r_k(B^*B)w\| \le C_0(k+1)^{-\frac{\sigma(a+u)}{2(a+s)}}\|w\| .$$
Now, the interpolation inequality (4.5) yields
$$|||x_k - x^\dagger|||_0 \le |||x_k - x^\dagger|||_{-a}^{\frac{u}{a+u}}\,|||x_k - x^\dagger|||_u^{\frac{a}{a+u}} \le C_0(k+1)^{-\frac{\sigma u}{2(a+s)}}\|w\| .$$
Next, we estimate the propagated data error: similarly as above, we get with (4.9) for
$-a \le r \le a + 2s$
$$|||x_k^\delta - x_k|||_r = \|(B^*B)^{\frac{s-r}{2(a+s)}}g_k(B^*B)B^*(y^\delta - y)\| \le \delta\,\|(B^*B)^{\frac{a+2s-r}{2(a+s)}}g_k(B^*B)\| \le 2C_0\delta(k+1)^{\frac{\sigma(a+r)}{2(a+s)}} ,$$
where we have used Lemma 4.7 with $\mu := \frac{a+2s-r}{2(a+s)}$ and $0 \le \mu \le 1$ for $-a \le r \le a + 2s$ since
$s \ge -a/2$. Combining the estimates for the approximation error and the propagated
data error, and using (4.8), i.e., $\|x\|_0 \le |||x|||_0$ for $s \le 0$, now yields the assertion. □
Remark 4.9 Under our assumptions, the source condition x† − x0 ∈ X su may be
stronger than the usual source condition x† − x0 ∈ Xu (cf. Proposition 3.6). As a
consequence, the usual restriction u ≤ a + 2s can be replaced by the weaker condition
0 < u ≤ 2(a+s)µ0−a (cf. also [71]). For µ0 = 1, this coincides with the usual restriction
u ≤ a + 2s. As already mentioned in Remark 4.5, the spaces Xu and X su coincide for
u ≤ a+2s if the stronger condition (3.15) holds. In case only the weaker estimate (3.11)
is valid, one still has X su ⊂ Xu, with u = (u − s) a+s
a+s+ s. In particular, since s ≤ 0, an
estimate (3.11) from below is only needed to interpret the source condition x†−x0 ∈ X su
in terms of the spaces $X_s$.
As can be seen from the proof, the statement of Proposition 4.8 can be strengthened
in several ways: in fact, we have even shown that
$$|||x_k^\delta - x^\dagger|||_r \le C_r\bigl(\delta k^{\frac{\sigma(a+r)}{2(a+s)}} + k^{-\frac{\sigma(u-r)}{2(a+s)}}\|w\|\bigr)$$
holds for $-a \le r \le u$. However, in case $r > 0$, an additional restriction $r \le a + 2s$ is
needed. Note also that the restriction $s \le 0$ was only used to derive the estimate in
the original norm $\|\cdot\|$, in which we are actually interested. Hence, Proposition 4.8 can
be generalized in another way, i.e., one has
$$\|x_k^\delta - x^\dagger\|_r \le C_r\bigl(\delta k^{\frac{\sigma(a+r)}{2(a+s)}} + k^{-\frac{\sigma(u-r)}{2(a+s)}}\|w\|\bigr)$$
for $s \le r \le \min\{a + 2s, u\}$.
As an immediate consequence of Proposition 4.8, we have at least convergence if
the iteration is stopped after $k_*(\delta)$ steps with $k_*(\delta) \to \infty$ and $\delta k_*^{\frac{\sigma a}{2(a+s)}} \to 0$. In order
to derive convergence rates in terms of $\delta$, the number of iterations has to be bounded
appropriately:
Theorem 4.10 Let the assumptions of Proposition 4.8 hold. If the iteration (4.11) is
stopped according to the a priori rule $k_* \sim \delta^{-\frac{2(a+s)}{\sigma(a+u)}}$, then
$$\|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}).$$
If, alternatively, the iteration is stopped according to the discrepancy principle (1.14),
then
$$\|x_{k_*}^\delta - x^\dagger\| = O(\delta^{\frac{u}{a+u}}) \quad\text{and}\quad k_* = O\bigl(\delta^{-\frac{2(a+s)}{\sigma(a+u)}}\bigr). \tag{4.15}$$
Proof. The first statement follows directly from Proposition 4.8. For the second,
observe that by Proposition 4.4, $(x^\dagger - x_0) \in X_u^s$ is equivalent to
$$(x^\dagger - x_0) \in R\bigl(L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}}\bigr) = R\bigl((T^\sharp T)^{\frac{u-s}{2(a+s)}}\bigr),$$
where as above $T^\sharp = L^{-2s}T^*$ denotes the adjoint of (the extension of) $T$ with respect to
the spaces $X_s$ and $Y$. Thus, for $\sigma = 1$ we obtain $k_* = k_*(\delta, y^\delta) = O\bigl(\delta^{-\frac{2(a+s)}{a+u}}\bigr)$ by Theorem
6.5 in [28], when (4.11) is understood as the standard iteration for $T$ considered as
operator from $X_s \to Y$ and the iteration is stopped by the discrepancy principle (1.14).
The estimates for $\sigma \neq 1$ follow similarly.
For the proof of the convergence rate, we proceed similarly as in the proof of Theorem
4.17 in [28]: by the interpolation inequality and (4.14), we can estimate $e_k = x_k - x^\dagger$ by
$$|||e_k|||_0 = \|(B^*B)^{\frac{u}{2(a+s)}}r_k(B^*B)w\| \le c_0\,\|Br_k(B^*B)L^s(x^\dagger - x_0)\|^{\frac{u}{a+u}}\,\|w\|^{\frac{a}{a+u}} .$$
The discrepancy principle, (4.13), Proposition 4.8, and the bound on $k_*$ further yield
that
$$\|Br_{k_*}(B^*B)L^s(x^\dagger - x_0)\| = \|T(x_{k_*} - x^\dagger)\| \le (\tau - 1)\delta$$
holds for $u \le 2(a+s)\mu_0 - a$, and thus
$$\|e_{k_*}\| \le c\,\delta^{\frac{u}{a+u}}\,\|w\|^{\frac{a}{a+u}} .$$
In a similar way as in Proposition 4.8 it hence follows with the estimate for $k_*$ that
$$\|e_{k_*}^\delta\| \le \|x_{k_*}^\delta - x_{k_*}\| + \|e_{k_*}\| = O(\delta^{\frac{u}{a+u}}).$$
□
Remark 4.11 We conjecture that in analogy to Theorem 8.25 in [28] it is even possible
to derive o(·)-bounds in (4.15), which then would include the case u = 0 in our con-
vergence analysis. If (3.15) is valid, the rates (4.15) are optimal (i.e., the best possible
worst case error bounds under the given source condition). Note, that the convergence
rates do not depend on $s$, while the stopping index $k_*$ does. This suggests choosing $s$
as small as possible, i.e., $s = -a/2$, in which case the number of iterations is bounded
by $k_* = O\bigl(\delta^{-\frac{a}{a+u}}\bigr)$ for $x^\dagger \in R((T^*T)^{\frac{u}{2a}}) \cap X_u^s$.
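The dependence of the iteration count on s can be made concrete by evaluating the exponents in (4.15). The helper functions below are a small illustrative sketch with hypothetical names; they only restate the formulas $u/(a+u)$ and $2(a+s)/(\sigma(a+u))$ and carry no information beyond Theorem 4.10.

```python
def rate_exponent(a, u):
    """Exponent of delta in the error bound O(delta^{u/(a+u)})."""
    return u / (a + u)

def stopping_exponent(a, s, u, sigma=1.0):
    """Exponent e such that k_* = O(delta^{-e}), with e = 2(a+s) / (sigma (a+u))."""
    if s < -a / 2.0:
        raise ValueError("the analysis requires s >= -a/2")
    return 2.0 * (a + s) / (sigma * (a + u))
```

For a = 1 and u = 1, for instance, `stopping_exponent(1.0, 0.0, 1.0)` gives 1 while `stopping_exponent(1.0, -0.5, 1.0)` gives 1/2 = a/(a+u), and `rate_exponent(1.0, 1.0)` is the same in both cases.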
At this point, we want to discuss in more detail, in which sense the Hilbert scale
operator L−2s acts as a preconditioner:
Remark 4.12 For simplicity consider the standard Landweber iteration
$$x_{k+1}^\delta = x_k^\delta + T^*(y^\delta - Tx_k^\delta).$$
If the operator $T$ is smoothing, e.g., if $T : X \to Y$ has a continuous extension to $X_s$
for some $s < 0$, then by Proposition 4.4 the inclusion $R(T^*) \subset X_{-s}$ holds, and thus
$T^*(y^\delta - Tx_k^\delta) \in X_{-s}$, i.e., the updates $x_{k+1}^\delta - x_k^\delta$ are smoother than actually needed.
This can be exploited by preconditioning with $L^{-2s}$. If, e.g., $s = -a/2$ and if (3.15)
holds, then the backprojection operator $L^{-2s}T^* = L^aT^*$ appearing in the preconditioned
iteration
$$x_{k+1}^\delta = x_k^\delta + L^aT^*(y^\delta - Tx_k^\delta)$$
is not smoothing any more; to be more specific, for $s = -a/2$ we obtain
$$\|L^aT^*y\| \sim \|(T^*T)^{-1/2}T^*y\| \sim \|y\|.$$
This means that $L^a$ is an optimal preconditioner for $T^*$ and the iteration operator $M_a$
of Landweber iteration, appearing in the preconditioned normal equation
$$M_a x := L^aT^*Tx = L^aT^*y^\delta$$
has the same smoothing properties as the operator T in the original equation Tx = y,
while being selfadjoint as operator on X−a/2. Moreover, if the operator T is selfad-
joint and if we use (T ∗T )−1/2 as a preconditioner, then the preconditioned Landweber
iteration amounts to stationary Richardson iteration for the original equation Tx = y.
Note that it is not possible to choose $s < -a/2$, e.g., $s = -a$, in which case one
would have $\|M_{2a}x\| = \|L^{2a}T^*Tx\| \sim \|x\|$. If $s < -a/2$, the iteration (4.11) is not even
well-defined as iteration in $X$ for general $y^\delta \in Y$, but only as iteration in $X_{-a}$.
4.2.2 The Conjugate Gradient Method in Hilbert Scales
The method of conjugate gradients is known as a powerful method for solving selfad-
joint, positive (semi-)definite linear problems. The question of applicability to ill-posed
problems, i.e., to the normal equations
T ∗Tx = T ∗y (4.16)
has first been addressed by Kammerer and Nashed [50]. For details of the convergence
analysis in case of noisy data $y \neq y^\delta$ and for convergence rates we refer to [28, Chapter
7] and the references cited therein. We shortly summarize the main properties of the
conjugate gradient method applied to the normal equations (cgne), see also Section 2.2:
First of all, cgne is a Krylov-subspace method, hence the $k$-th iterate can be written
as
$$x_k^\delta = x_0 + g_k(T^*T; y^\delta)\,T^*(y^\delta - Tx_0).$$
Note that in contrast to the (linear) semiiterative methods (2.10), the iteration
polynomials $g_k$ themselves depend on the data $y^\delta$, which makes cgne a nonlinear method.
The main convergence result is that cgne is an order-optimal regularization method
when stopped according to the discrepancy principle (1.14), i.e., the optimal rates
$$\|x_{k_*}^\delta - x^\dagger\| = O\bigl(\delta^{\frac{2\mu}{2\mu+1}}\bigr)$$
hold. Another remarkable property of cgne, which alternatively characterizes the
method, is that the iterates $x_k^\delta$ satisfy the following optimality condition (cf. [28,
Theorem 7.3]):
$$\|y^\delta - Tx_k^\delta\| = \min\{\|y^\delta - Tx\| : x - x_0 \in K_k(T^*T, T^*(y^\delta - Tx_0))\},$$
which implies that cgne is the method with minimal stopping index $k_*$ among all
Krylov-subspace methods, in particular, among all semiiterative regularization methods stopped
according to the discrepancy principle (2.7). As we will show below, this optimality
property carries over to the preconditioned version. It is not clear from the beginning
whether cgne in Hilbert scales still yields (optimal) convergence rates in the usual space $X$
and whether the preconditioning will actually reduce the number of iterations compared to
cgne in the standard spaces. The reason for that is the following (cf. [28, Theorem
7.14]): in case $T$ is compact and a decay rate for the singular values of the operator
$T$ is known, it is possible to derive stronger bounds on the size of the stopping index
$k(\delta, y^\delta)$, i.e., if the singular values $\sigma_n$ of $T$ decay like $O(n^{-\alpha})$ for some $\alpha > 0$, then
$$k(\delta, y^\delta) = O\bigl(\delta^{-\frac{1}{(2\mu+1)(\alpha+1)}}\bigr), \tag{4.17}$$
and if the singular values decay like $O(q^n)$ with some $q < 1$, then
$$k(\delta, y^\delta) = O(1 + |\log\delta|). \tag{4.18}$$
As we will see below, the operator $B = TL^{-s}$ determines the behavior of the
preconditioned iteration. However, in case $s < 0$, the singular values $\sigma_n$ of $B$ decay at a slower
rate than the ones of $T$, hence the term $(\alpha + 1)$ in (4.17) will in general decrease with
decreasing $s$.
The aim of the subsequent analysis is twofold: first we want to show that in case of
preconditioning it is still possible to lift the convergence rates from $X_s$ to the original
space $X$; secondly, the bounds on the number of iterations (4.17), (4.18) can be
further improved when the iteration is preconditioned in Hilbert scales. In the sequel, we
consider the following (preconditioned) Hilbert scale version of the cgne method:
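Algorithm 4.13 (hscgne)
$x_0 = x_*$; $d_0 = y^\delta - Tx_0$; $w_0 = T^*d_0$; $p_1 = s_0 = L^{-2s}w_0$
for $k = 1, 2, \ldots$, unless $s_{k-1} = 0$, compute
    $q_k = Tp_k$
    $\alpha_k = \langle s_{k-1}, w_{k-1}\rangle / \|q_k\|^2$
    $x_k = x_{k-1} + \alpha_k p_k$
    $d_k = d_{k-1} - \alpha_k q_k$
    $w_k = T^*d_k$
    $s_k = L^{-2s}w_k$
    $\beta_k = \langle s_k, w_k\rangle / \langle s_{k-1}, w_{k-1}\rangle$
    $p_{k+1} = s_k + \beta_k p_k$
end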
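For a finite-dimensional discretization, the steps of Algorithm 4.13 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the numerical tests: the function name `hscgne`, the dense matrices `T` and `L2s_inv` (standing in for $T$ and $L^{-2s}$, the latter assumed symmetric positive definite), and the parameters `tau` and `max_iter` are hypothetical, and the loop is stopped by a discrepancy test $\|y^\delta - Tx_k\| \le \tau\delta$.

```python
import numpy as np

def hscgne(T, L2s_inv, y_delta, x0, delta, tau=1.1, max_iter=200):
    """Sketch of Algorithm 4.13 for matrices; L2s_inv plays the role of L^{-2s}."""
    x = x0.copy()
    d = y_delta - T @ x            # d_0 = y^delta - T x_0
    w = T.T @ d                    # w_0 = T^* d_0
    s = L2s_inv @ w                # s_0 = L^{-2s} w_0
    p = s.copy()                   # p_1 = s_0
    sw = s @ w                     # <s_{k-1}, w_{k-1}>
    k = 0
    # stop by the discrepancy test, or if s_{k-1} = 0 (breakdown)
    while k < max_iter and np.linalg.norm(d) > tau * delta and sw > 0:
        q = T @ p
        alpha = sw / (q @ q)
        x = x + alpha * p
        d = d - alpha * q
        w = T.T @ d
        s = L2s_inv @ w
        sw_new = s @ w
        beta = sw_new / sw
        p = s + beta * p
        sw = sw_new
        k += 1
    return x, k
```

Passing the identity matrix for `L2s_inv` gives a plain cgne iteration, which mirrors the case $L = I$.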
In case $L = I$, Algorithm 4.13 reduces to the standard cgne algorithm (cf. [28]).
The following convergence analysis of the hscgne method follows the one of cgne
presented in [28, Chapter 7].
As already observed in Section 4.2.1, Algorithm 4.13 can be viewed as the standard
cgne iteration for $Tx = y$ when $T$ is considered as an operator on $\mathcal{X}_s$. This observation
immediately yields
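Corollary 4.14 Let Assumption 4.6 hold, let $x_k^\delta$ be defined by Algorithm 4.13, and let
$k(\delta, y^\delta)$ be determined by the discrepancy principle (1.14). If $x_0 - x^\dagger \in \mathcal{X}_u^s$, then with
$\mu = \frac{u-s}{2a}$,
\[ \|x^\delta_{k(\delta,y^\delta)} - x^\dagger\|_s = O\big(\delta^{\frac{2\mu}{2\mu+1}}\big). \tag{4.19} \]
Moreover, the following optimality condition holds for the iterates $x_k^\delta$:
\[ \|y^\delta - Tx_k^\delta\| = \min\{\|y^\delta - Tx\| : x - x_0 \in \mathcal{K}_k(L^{-2s}T^*(y^\delta - Tx_0), L^{-2s}T^*T)\}. \tag{4.20} \]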
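Proof. We assume for brevity that $x_0 = 0$. Then (4.19) follows directly from Theorem 2.3 by observing that Algorithm 4.13 corresponds to cgne for $Bz = y$ with
$z = L^sx$, and that $\mathcal{X}_u^s = \mathcal{R}\big((T^\sharp T)^{\frac{u-s}{2(a+s)}}\big)$ (cf. Remark 4.3). Secondly, with $Tx_k^\delta = Bz_k^\delta$ it
follows from [28, Theorem 7.3] that
\[
\begin{aligned}
\|y^\delta - Tx_k^\delta\| &= \|y^\delta - Bz_k^\delta\| \\
&= \min\{\|y^\delta - Bz\| : z - z_0 \in \mathcal{K}_k(B^*(y^\delta - Bz_0), B^*B)\} \\
&= \min\{\|y^\delta - Tx\| : x - x_0 \in \mathcal{K}_k(L^{-2s}T^*(y^\delta - Tx_0), L^{-2s}T^*T)\}.
\end{aligned}
\]
□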
We will now show that, as for linear semiiterative methods, the convergence rates in
$\mathcal{X}_s$ can be lifted to the usual space $\mathcal{X}$. In the following, let $\kappa$ denote the smallest index
with $s_\kappa = 0$, respectively $\kappa = +\infty$ if $s_k \neq 0$ for all $k > 0$. By $g_k$, $r_k$ we denote the
iteration and residual polynomials of the hscgne method, respectively. For the proof of the
main convergence theorem, we will require some auxiliary results (cf. [28, Section 7]):
Lemma 4.15 Let Assumption 4.6 and (4.14) hold and $|||x^\dagger - x_0|||_u \le \rho$. Then, for
$0 < k \le \kappa$,
\[ \|y^\delta - Tx_k^\delta\| \le \delta + c\,|r_k'(0)|^{-\frac{u+a}{2(a+s)}}\rho. \]
Proof. Let $z = L^sx$ and rewrite $Tx = y$ as $Bz = y$ with $B$ as in Assumption 4.6. Now,
with (4.14), $z^\dagger - z_0 = L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w$. The result then follows analogously
to Lemma 7.10 in [28] by replacing $T$, $x$ and $\mu$ by $B$, $z$ and $\frac{u-s}{2(a+s)}$, respectively. □
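Lemma 4.16 Under the assumptions of Lemma 4.15
\[ \|x_k^\delta - x^\dagger\| \le c\,\Big(\rho^{\frac{a}{u+a}}\,\delta_k^{\frac{u}{u+a}} + |r_k'(0)|^{\frac{a}{2(a+s)}}\,\delta_k\Big), \]
where $\delta_k := \max(\|Tx_k^\delta - y^\delta\|, \delta)$.
Proof. Let $E_\lambda$ denote the spectral family associated with the operator $B^*B$ and
let $z = L^sx$ be as above. Then we have, cf. [28, Lemma 7.11],
\[
\begin{aligned}
\|x_k^\delta - x^\dagger\| &= \|L^{-s}(z_k^\delta - z^\dagger)\| \le \|(B^*B)^{\frac{s}{2(a+s)}}(z_k^\delta - z^\dagger)\| \\
&\le \|E_\varepsilon (B^*B)^{\frac{s}{2(a+s)}} r_k(B^*B)(B^*B)^{\frac{u-s}{2(a+s)}}w\|
 + \|E_\varepsilon (B^*B)^{\frac{s}{2(a+s)}} g_k(B^*B)B^*(y - y^\delta)\|
 + \varepsilon^{-\frac{a}{2(a+s)}}\|y - Bz_k^\delta\| \\
&\le \|\lambda^{\frac{u}{2(a+s)}} r_k(\lambda)\|_{C[0,\varepsilon]}\,\rho
 + \|\lambda^{\frac{a+2s}{2(a+s)}} g_k(\lambda)\|_{C[0,\varepsilon]}\,\delta
 + \varepsilon^{-\frac{a}{2(a+s)}}\big(\|y^\delta - Tx_k^\delta\| + \delta\big).
\end{aligned}
\]
Similarly as in the proof of Lemma 7.11 in [28], we have for $\varepsilon < \lambda_{1,k}$ (where $\{\lambda_{i,k}\}_{i=1,\ldots,k}$
denotes the strictly increasing sequence of real roots of the polynomial $r_k$) that $r_k$ is
convex in $[0, \varepsilon]$ and hence for $\lambda \in [0, \varepsilon]$
\[ 0 \le \lambda^{\frac{a+2s}{a+s}} g_k^2(\lambda) = \Big|\frac{1 - r_k(\lambda)}{\lambda}\Big|^{\frac{a}{a+s}} |1 - r_k(\lambda)|^{\frac{a+2s}{a+s}} \le |r_k'(0)|^{\frac{a}{a+s}}. \]
Therefore,
\[ \|x_k^\delta - x^\dagger\| \le \varepsilon^{\frac{u}{2(a+s)}}\rho + |r_k'(0)|^{\frac{a}{2(a+s)}}\delta + 2\varepsilon^{-\frac{a}{2(a+s)}}\delta_k =: f(\varepsilon). \]
Note that $f(\varepsilon)$ is monotonically decreasing in $(0, \varepsilon_*)$ and increasing in $(\varepsilon_*, \infty)$, where $\varepsilon_*$
is defined by $\varepsilon_*^{\frac{u+a}{2(a+s)}} = \frac{2a}{u}\,\frac{\delta_k}{\rho}$. The rest of the proof follows the lines of the one of Lemma
7.11 in [28]. □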
Combining the previous lemmata yields the following convergence rates result:
Theorem 4.17 Let Assumption 4.6 hold and let Algorithm 4.13 be stopped according to the discrepancy principle (1.14) with $k_* = k(\delta, y^\delta)$. If $x_0 - x^\dagger \in \mathcal{X}_u^s$, i.e.,
(4.14) holds, then
\[ \|x_{k_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big). \]
Proof. By Lemma 4.15 with $k = k(\delta, y^\delta)$ and with $\rho$ such that $|||x^\dagger - x_0|||_u \le \rho$, we
have
\[ |r_{k-1}'(0)| \le c\,\Big(\frac{\rho}{\delta}\Big)^{\frac{2(a+s)}{u+a}}. \]
The result now follows with obvious modifications of the proof of Theorem 7.12 in [28]
by replacing $T$, $x$, and $\frac{2}{2\mu+1}$ by $B$, $z$, and $\frac{2(a+s)}{u+a}$, respectively. □
Remark 4.18 Similarly as for linear semiiterative methods, the convergence rates in
Theorem 4.17 can be strengthened: in fact, one easily derives the same rate in the norm
||| · |||0. In view of Remark 4.9, it is even possible to derive the corresponding rates for
||| · |||r for −a ≤ r ≤ a+2s. Furthermore, these rates are (order) optimal under the given
conditions. Like Landweber iteration, and in contrast to general semiiterative methods,
hscgne has no saturation, i.e., Theorem 4.17 holds for all u > 0.
Next we show that, as for standard cgne, improved bounds on the number of
iterations hold, depending on the ill-posedness of the operator $B = TL^{-s}$ (cf. Theorem 2.4):
Theorem 4.19 Let Assumption 4.6 and (4.14) hold, and let $B$ be compact. If the singular values $\sigma_n$ of $B$ decay like $O(n^{-\alpha})$ with some $\alpha > 0$, then
\[ k(\delta, y^\delta) = O\big(\delta^{-\frac{a+s}{(a+u)(\alpha+1)}}\big). \]
If the singular values decay like $O(q^n)$ with some $q < 1$, then
\[ k(\delta, y^\delta) = O(1 + |\log\delta|). \]
Proof. The proof follows the lines of the proof of Theorem 7.14 in [28]; we only
mention the main differences: for given $y^\delta$, denote by $r_k$ the residual polynomials of
cgne applied to $Bz = y$ with $z = L^sx$ and, for simplicity, $x_0 = z_0 = 0$, in which case we
have
\[ y^\delta - Tx_k^\delta = r_k(BB^*)y^\delta. \]
Together with (4.20) this implies that
\[ \|y^\delta - Tx_k^\delta\| \le \|p_k(BB^*)y^\delta\| \]
holds for arbitrary polynomials $p_k$ with $p_k(0) = 1$. The rest follows the lines of the
proof of Theorem 7.14 in [28]. □
Remark 4.20 If the stronger condition (3.15) holds, and $0 < u \le a + 2s$, then one
can compare the above estimate with the ones of Theorem 2.4 in the following way: by
Proposition 3.6 we have in this case $(B^*B)^{\nu/2} \sim L^{-\nu(a+s)}$ and $(T^*T)^{\nu/2} \sim L^{-\nu a}$ for $|\nu| \le 1$.
Consequently, if the singular values of $T$ decay like $O(n^{-\alpha})$, the corresponding singular
values of $B$ decay like $O(n^{-\tilde\alpha})$, with $\tilde\alpha = \alpha\frac{a+s}{a}$, which can be seen in the following way:
let $A_1$ and $A_2$ be two compact, selfadjoint, spectrally equivalent operators, i.e.,
\[ \underline{c}\,\langle A_1x, x\rangle \le \langle A_2x, x\rangle \le \overline{c}\,\langle A_1x, x\rangle \quad \forall x \in \mathcal{X} \]
with eigenvalues $\{\lambda_n(A_1)\}_{n\in\mathbb{N}}$, $\{\lambda_n(A_2)\}_{n\in\mathbb{N}}$ sorted in decreasing order. Then by a characterization of the $n$-th eigenvalue due to Courant and Fischer [19], one has
\[ \lambda_n(A_1) = \sup_{V_n\subset\mathcal{X}}\ \inf_{x\in V_n\setminus\{0\}} \frac{\langle A_1x, x\rangle}{\|x\|^2} \le c\,\sup_{V_n\subset\mathcal{X}}\ \inf_{x\in V_n\setminus\{0\}} \frac{\langle A_2x, x\rangle}{\|x\|^2} = c\,\lambda_n(A_2). \]
Here $V_n$ denotes arbitrary subspaces of $\mathcal{X}$ with $\dim(V_n) = n$. Hence the eigenvalues of
$A_1$ and $A_2$ decay at the same rate. The estimate for the singular values of $T$ and $B$
then follows easily.
Under condition (3.15), $\mathcal{X}_u^s = \mathcal{R}((T^*T)^\mu)$ with $\mu = \frac{u}{2a}$ (cf. Remark 4.18), which
implies that
\[ k(\delta, y^\delta; s) \le O\big(\delta^{-f(s)}\big), \quad\text{with}\quad f(s) = \frac{a+s}{(a+u)\big(\alpha\frac{a+s}{a} + 1\big)}. \]
Note that $f(s)$ is strictly monotonically increasing in $s$; thus, for $s < 0$, the number of
hscgne iterations needed to reach the discrepancy stopping criterion can be expected
to be smaller than that for standard cgne. For $s = 0$ (no preconditioning) we have
\[ f(0) = \frac{a}{(a+u)(\alpha+1)} = \frac{1}{(2\mu+1)(\alpha+1)}, \]
which coincides with the estimate of Theorem 2.4.
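The dependence of the exponent $f(s)$ on the preconditioning parameter can be made concrete with a few lines of Python. The sketch below only evaluates the formula of Remark 4.20; the function name `iteration_exponent` and the sample values of $a$, $u$ and $\alpha$ are illustrative assumptions, chosen so that $0 < u \le a + 2s$ holds for all listed $s$.

```python
def iteration_exponent(s, a, u, alpha):
    """Exponent f(s) in k(delta, y_delta; s) <= O(delta**(-f(s))), cf. Remark 4.20."""
    return (a + s) / ((a + u) * (alpha * (a + s) / a + 1.0))

# Illustrative (assumed) parameters: smoothing index a, smoothness u,
# and decay rate alpha of the singular values of T.
a, u, alpha = 1.0, 0.5, 2.0
for s in (0.0, -0.1, -0.25):
    print(f"s = {s:5.2f}:  f(s) = {iteration_exponent(s, a, u, alpha):.4f}")
```

A smaller exponent for $s < 0$ corresponds to a slower growth of the iteration bound as $\delta \to 0$.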
Summarizing the results of this section, we have shown that for linear inverse problems $Tx = y$, preconditioning semiiterative methods or cgne in Hilbert scales yields
order-optimal regularization methods under Assumption 4.6. Furthermore, the number
of iterations needed to obtain the optimal rates is reduced substantially in comparison to the standard methods. This will be illustrated in several numerical examples in
Chapter 5.
4.3 Nonlinear Problems
As we have seen in Section 2.2, additional conditions are required for the convergence
analysis of iterative regularization methods for nonlinear inverse problems. We will
now turn to a discussion of adequate conditions for the preconditioned iterations and
investigate the convergence of iterative regularization methods in Hilbert scales in detail.
Before we state our assumptions for the convergence analysis of preconditioned
iterative regularization methods for nonlinear problems
\[ F(x) = y, \]
we briefly recall the conditions and results for the standard iterations. Below, we will
investigate preconditioning of Landweber iteration
\[ x_{k+1}^\delta = x_k^\delta + F'(x_k^\delta)^*[y^\delta - F(x_k^\delta)], \tag{4.21} \]
and of a class of regularized Newton-type iterations of the form
\[ x_{n+1}^\delta = x_0 + g_{\alpha_n}\big(F'(x_n^\delta)^*F'(x_n^\delta)\big)F'(x_n^\delta)^*[y^\delta - F(x_n^\delta) + F'(x_n^\delta)(x_n^\delta - x_0)]. \tag{4.22} \]
Convergence of iterative regularization methods for nonlinear inverse problems is usually
derived under the following (or similar) nonlinearity conditions (cf. [49] for details and
further references)
\[ F'(\tilde x) = R(\tilde x, x)F'(x) + Q(\tilde x, x), \tag{4.23} \]
with
\[ \|I - R(\tilde x, x)\| \le C_R < 1 \quad\text{and}\quad \|Q(\tilde x, x)\| \le C_Q\|F'(x)(\tilde x - x)\| \tag{4.24} \]
for all $x, \tilde x \in B_\rho(x^\dagger)$.
It is clear that, in case the iterations (4.21) or (4.22) are preconditioned
by $L^{-2s}$, the operators $L^{-2s}$ will also appear in the nonlinearity conditions.
We state and discuss the appropriate conditions in more detail now:
4.3.1 Basic Assumptions
Similarly to the conditions in Assumption 4.6 for linear problems, we require:
Assumption 4.21
(N1) $F : D(F)(\subset \mathcal{X}) \to \mathcal{Y}$ is continuous and Fréchet-differentiable in $\mathcal{X}$.
(N2) $F(x) = y$ has a solution $x^\dagger$.
(N3) $\|F'(x^\dagger)x\| \le m\|x\|_{-a}$ for all $x \in \mathcal{X}$, some $a > 0$, and $m > 0$. Moreover, the
extension of $F'(x^\dagger)$ to $\mathcal{X}_{-a}$ is injective.
(N4) $B := F'(x^\dagger)L^{-s}$ is such that $\|B\|_{\mathcal{X},\mathcal{Y}} \le 1$, where $s \ge -a/2$.
Note that under Assumption 4.21, the results of Propositions 3.6 and 4.4 hold verbatim for the linearized operator $T := F'(x^\dagger)$. For the following convergence rates analysis
for nonlinear problems in Hilbert scales, we need a smoothness condition on the solution
$x^\dagger$ and additional conditions on the Fréchet derivative of $F$:
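Assumption 4.22
(N5) $x_0 \in B_\rho(x^\dagger) := \{x \in \mathcal{X} : x - x^\dagger \in \mathcal{X}_0^s \wedge |||x - x^\dagger|||_0 \le \rho\} \subset D(F)$ for some $\rho > 0$.
(N6) For all $x \in B_\rho(x^\dagger)$ there exist linear operators $R(x, x^\dagger)$ and $Q(x, x^\dagger)$ such that
\[ F'(x) = R(x, x^\dagger)F'(x^\dagger) + Q(x, x^\dagger), \]
with
\[ \|I - R(x, x^\dagger)\| \le C_R < 1, \qquad \|Q(x, x^\dagger)\|_{\mathcal{X}^s_{-b},\mathcal{Y}} \le C_Q\,|||x - x^\dagger|||_{b-a}, \]
for some $b \in [0, a]$, $\beta \in (0, 1]$, and $C_R, C_Q > 0$ independent of $x$.
(N7) $x^\dagger - x_0 \in \mathcal{X}_u^s$ for some $0 < u \le b + 2s$, i.e., there exists an element $w \in \mathcal{X}$ so that
\[ L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w. \tag{4.25} \]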
Before we start our analysis, we want to discuss the conditions above.
Remark 4.23 Condition (N6) is very similar to the nonlinearity conditions (4.23),
(4.24) used in [23, 47, 48] for the convergence analysis of Newton-type regularization
methods and (with $Q = 0$) for Landweber iteration in [39]. In fact, if (3.15) and (4.23),
(4.24) hold, i.e., $\|Q(x, x^\dagger)\|_{\mathcal{X},\mathcal{Y}} \le C_Q\|F'(x^\dagger)(x - x^\dagger)\|$, then for $b = 0$ we have
\[ \|Q(x, x^\dagger)\|_{\mathcal{X}^s_{-b},\mathcal{Y}} \le c\,\|Q(x, x^\dagger)\|_{\mathcal{X},\mathcal{Y}} \le C\,\|F'(x^\dagger)(x - x^\dagger)\| = C\,|||x - x^\dagger|||_{-a}. \]
Thus, in this case (N6) with $b = 0$ is implied by the conditions in [48]. We just mention
that, with minor modifications, our results hold true if we replace the estimate on
$Q(x, x^\dagger)$ by $\|Q(x, x^\dagger)\|_{\mathcal{X}^s_{-b},\mathcal{Y}} \le C_Q\|x - x^\dagger\|_c^\beta$, where $c$ and $\beta$ will affect the range of
values of $u$ for which the results actually hold; cf. [26] for details, where the condition
\[ \|F'(x^\dagger) - F'(x)\|_{\mathcal{X}^s_{-b},\mathcal{Y}} \le c_\beta\,|||x^\dagger - x|||_0^\beta \]
has been used instead. Now, assume that $a = b$ or that $C_Q = 0$ and that a stronger
estimate $\|I - R(x, x^\dagger)\| \le C_R\,|||x - x^\dagger|||_0$ holds; then (N6) reduces to
\[ \|F'(x^\dagger) - F'(x)\|_{\mathcal{X}^s_{-a},\mathcal{Y}} \le C\,|||x^\dagger - x|||_0. \]
Due to (4.9), the operator $F'(x^\dagger)$ has a continuous extension to $\mathcal{X}^s_{-a} \supset \mathcal{X}_{-a}$. Therefore,
condition (N6) implies that $F'(x)$ also has a continuous extension to $\mathcal{X}^s_{-a} \supset \mathcal{X}$ in a
neighborhood of $x^\dagger$. By definition of the space $\mathcal{X}^s_{-a}$, this condition is equivalent to
\[ \|(B^*B)^{-\frac{a+s}{2(a+s)}}L^{-s}(F'(x^\dagger)^* - F'(x)^*)\|_{\mathcal{Y},\mathcal{X}} \le C\,|||x^\dagger - x|||_0. \tag{4.26} \]
By virtue of (3.10) and Proposition 4.4 (iii), this implies that $L^{-2s}F'(x)^*$ maps $\mathcal{Y}$ at
least into $\mathcal{X}^s_{a+2s} \subset \mathcal{X}_{a+2s}$ and hence $F'(x)^*$ maps $\mathcal{Y}$ at least into $\mathcal{X}_a$. Observe that for
$s = 0$ and under the above assumptions (N6) reduces to
\[ \|(F'(x^\dagger)^*F'(x^\dagger))^{-\frac12}(F'(x^\dagger)^* - F'(x)^*)\|_{\mathcal{Y},\mathcal{X}} \le c\,\|x^\dagger - x\|_0, \]
cf. [39, (3.18)]. Moreover, this condition is equivalent to (N6) with $C_Q = 0$ and $\|R_x - I\|$
replaced by $\|(R_x - I)P\|$, where $P$ is the orthogonal projector from $\mathcal{Y}$ onto $\mathcal{R}(F'(x^\dagger))$.
Condition (N7) finally is a smoothness condition for the exact solution corresponding
to (4.14) for linear problems. Under the usual assumption for regularization in Hilbert
scales, i.e., $\|F'(x)h\| \sim \|h\|_{-a}$, this condition is equivalent to $x^\dagger - x_0 \in \mathcal{X}_u$, i.e., the
source condition
\[ L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w \]
can be interpreted in terms of the Hilbert scale $\mathcal{X}_s$. If $b = a$, then $u \le a + 2s$ is
allowed, which is the usual restriction for regularization in Hilbert scales. For $s = 0$ and
if (3.15) is valid, $u \le a + 2s$ reduces to $\mu \le 1/2$.
4.3.2 Landweber Iteration in Hilbert Scales
We now turn to the analysis of preconditioned Landweber iteration in Hilbert scales,
namely
\[ x_{k+1}^\delta = x_k^\delta + L^{-2s}F'(x_k^\delta)^*(y^\delta - F(x_k^\delta)), \quad k = 0, 1, 2, \ldots. \tag{4.27} \]
As in the linear case, (4.27) can be interpreted as standard Landweber iteration with
$F$ considered as an operator on $\mathcal{X}_s$. For $s = 0$ (no preconditioning) the iterations coincide.
For a comparison of our convergence results to those of the standard methods and
Landweber iteration in Hilbert scales with $s > 0$ under the stronger assumption
\[ \underline{m}\,\|h\|_{-a} \le \|F'(x^\dagger)h\| \le \overline{m}\,\|h\|_{-a}, \tag{4.28} \]
(cf. (3.15)), we refer to Sections 2.2.4 and 3.3, and the references cited there.
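In a discretized setting, iteration (4.27) combined with a discrepancy stopping test can be sketched as follows. The fragment is a minimal illustration under stated assumptions: the function name `hs_landweber`, the callables `F` and `dF` (returning the forward map and its Jacobian matrix), the matrix `L2s_inv` standing in for $L^{-2s}$, and the parameters `tau` and `max_iter` are all hypothetical, and the problem is assumed to be scaled so that the step size 1 is admissible (cf. (N4)).

```python
import numpy as np

def hs_landweber(F, dF, L2s_inv, y_delta, x0, delta, tau=2.5, max_iter=10000):
    """Sketch of the preconditioned Landweber iteration (4.27).

    Stops as soon as ||y_delta - F(x_k)|| <= tau * delta.
    """
    x = x0.copy()
    for k in range(max_iter):
        residual = y_delta - F(x)
        if np.linalg.norm(residual) <= tau * delta:
            return x, k
        # x_{k+1} = x_k + L^{-2s} F'(x_k)^* (y_delta - F(x_k))
        x = x + L2s_inv @ (dF(x).T @ residual)
    return x, max_iter
```

With `L2s_inv` equal to the identity this is the standard Landweber iteration (4.21).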
For the proof of the main convergence results, we will need the following two lem-
mata:
Lemma 4.24 ([68, Lemma 2.9]) Let $\mu, \nu > 0$. Then there exists a positive constant
$c_{\mu,\nu}$ independent of $k$ such that
\[
\sum_{j=0}^{k-1}(k-j)^{-\mu}(j+1)^{-\nu} \le c_{\mu,\nu}\,(k+1)^{1-\mu-\nu}
\begin{cases}
1, & \max(\mu,\nu) < 1, \\
\ln(k+1), & \max(\mu,\nu) = 1, \\
(k+1)^{\max(\mu,\nu)-1}, & \max(\mu,\nu) > 1.
\end{cases}
\]
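The bound of Lemma 4.24 can be checked numerically for particular exponents. The following sketch, with the hypothetical helper `lemma_4_24_ratio`, computes the quotient of the left-hand side and the $k$-dependent factor on the right-hand side; the lemma asserts that this quotient stays bounded in $k$. It is an illustration for chosen values of $\mu$ and $\nu$, not a proof.

```python
import numpy as np

def lemma_4_24_ratio(k, mu, nu):
    """Left-hand side of Lemma 4.24 divided by the k-dependent right-hand factor."""
    j = np.arange(k, dtype=float)
    lhs = np.sum((k - j) ** (-mu) * (j + 1.0) ** (-nu))
    m = max(mu, nu)
    if m < 1:
        factor = 1.0
    elif m == 1:
        factor = np.log(k + 1.0)
    else:
        factor = (k + 1.0) ** (m - 1.0)
    return lhs / ((k + 1.0) ** (1.0 - mu - nu) * factor)

# Example: mu = nu = 1/2, where the quotient remains bounded as k grows.
for k in (1, 10, 100, 1000):
    print(k, lemma_4_24_ratio(k, 0.5, 0.5))
```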
The second lemma provides two estimates of $e_k^\delta := x_k^\delta - x^\dagger$ in terms of $k$ and $\delta$ (cf.
Proposition 4.8 for the linear case).
Lemma 4.25 Let Assumptions 4.21 and 4.22 hold and let $y^\delta \in \mathcal{Y}$ satisfy $\|y^\delta - y\| \le \delta$.
Moreover, let $k_* = k_*(\delta, y^\delta)$ be chosen according to the stopping rule (2.13) with $\tau > 2$,
and assume that $|||e_j^\delta|||_0 \le \rho$ for all $0 \le j < k \le k_*$ and some $\rho > 0$, where $e_j^\delta := x_j^\delta - x^\dagger$.
Then there is a positive constant $C$ (independent of $k$ and $\delta$) such that for all $0 \le k \le k_*$
the following estimates hold:
\[
\begin{aligned}
|||e_k^\delta|||_u \le{}& |||x^\dagger - x_0|||_u + C\sum_{j=0}^{k-1}(k-j)^{-\frac{a+2s-u}{2(a+s)}}\,|||e_j^\delta|||_{-a} \\
&+ C\sum_{j=0}^{k-1}(k-j)^{-\frac{b+2s-u}{2(a+s)}}\,|||e_j^\delta|||_{-a} + \delta\,k^{\frac{a+u}{2(a+s)}}
\end{aligned}
\tag{4.29}
\]
and
\[
\begin{aligned}
|||e_k^\delta|||_{-a} \le{}& (k+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u + C\sum_{j=0}^{k-1}(k-j)^{-1}\,|||e_j^\delta|||_{-a} \\
&+ C\sum_{j=0}^{k-1}(k-j)^{-\frac{b+a+2s}{2(a+s)}}\,|||e_j^\delta|||_{-a} + \delta.
\end{aligned}
\tag{4.30}
\]
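Proof. From (4.27) we immediately obtain the representation
\[ e_{k+1}^\delta = (I - L^{-2s}F'(x^\dagger)^*F'(x^\dagger))e_k^\delta + L^{-2s}F'(x^\dagger)^*(y^\delta - y - q_k^\delta) + L^{-2s}p_k^\delta \]
with
\[ q_k^\delta := F(x_k^\delta) - F(x^\dagger) - F'(x^\dagger)e_k^\delta \tag{4.31} \]
and
\[
\begin{aligned}
p_k^\delta &:= (F'(x_k^\delta)^* - F'(x^\dagger)^*)(y^\delta - F(x_k^\delta)) && (4.32) \\
&= F'(x^\dagger)^*(R^* - I)(y^\delta - F(x_k^\delta)) + Q^*(y^\delta - F(x_k^\delta)) && (4.33) \\
&=: F'(x^\dagger)^*p_{1,k}^\delta + p_{2,k}^\delta, && (4.34)
\end{aligned}
\]
where we used the notations $R = R(x_k^\delta, x^\dagger)$ and $Q = Q(x_k^\delta, x^\dagger)$. Furthermore, we get
the closed form expression
\[ e_k^\delta = L^{-s}(I - B^*B)^kL^s(x_0 - x^\dagger) + \sum_{j=0}^{k-1}L^{-s}(I - B^*B)^{k-j-1}\big[B^*(y^\delta - y - q_j^\delta + p_{1,j}^\delta) + L^{-s}p_{2,j}^\delta\big]. \]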
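Together with (4.2) and (4.25) we now obtain the following estimates
\[
\begin{aligned}
|||e_k^\delta|||_u \le{}& \|(I - B^*B)^k\|\,\|w\| \\
&+ \sum_{j=0}^{k-1}\|(B^*B)^{\frac{a+2s-u}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,(\delta + \|q_j^\delta\| + \|p_{1,j}^\delta\|) \\
&+ \sum_{j=0}^{k-1}\|(B^*B)^{\frac{b+2s-u}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|
\end{aligned}
\]
and
\[
\begin{aligned}
|||e_k^\delta|||_{-a} \le{}& \|(B^*B)^{\frac{a+u}{2(a+s)}}(I - B^*B)^k\|\,\|w\| \\
&+ \sum_{j=0}^{k-1}\|(B^*B)(I - B^*B)^{k-j-1}\|\,(\delta + \|q_j^\delta\| + \|p_{1,j}^\delta\|) \\
&+ \sum_{j=0}^{k-1}\|(B^*B)^{\frac{a+b+2s}{2(a+s)}}(I - B^*B)^{k-j-1}\|\,\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|_0.
\end{aligned}
\]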
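Next, we derive estimates for $\|q_j^\delta\|$, $\|p_{1,j}^\delta\|$, and $\|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|$. Assumption
(N6), (4.5), and (4.31) imply that
\[
\begin{aligned}
\|q_j^\delta\| &\le \int_0^1\|(F'(x_t) - F'(x^\dagger))e_j^\delta\|\,dt \\
&\le \int_0^1\|R(x_t, x^\dagger) - I\|\,\|F'(x^\dagger)e_j^\delta\|\,dt + \int_0^1\|Q(x_t, x^\dagger)\|_{\mathcal{X}^s_{-b},\mathcal{Y}}\,|||e_j^\delta|||_{-b}\,dt && (4.35) \\
&\le C_R\,|||e_j^\delta|||_{-a} + C_Q\,|||e_j^\delta|||_{b-a}\,|||e_j^\delta|||_{-b} \le |||e_j^\delta|||_{-a}\,(C_R + C_Q\,|||e_j^\delta|||_0),
\end{aligned}
\]
with $x_t = x^\dagger + t\,(x_j^\delta - x^\dagger)$.
Since $\tau > 2$, (2.13) implies that for all $0 \le k < k_*$
\[ \|y^\delta - F(x_k^\delta)\| < 2\|y - F(x_k^\delta)\|. \]
Thus, we obtain that
\[
\begin{aligned}
\|p_{1,j}^\delta\| &\le \|R(x_j^\delta, x^\dagger) - I\|\,\|y^\delta - F(x_j^\delta)\| && (4.36) \\
&\le C_R\,(\|q_j^\delta\| + \|F'(x^\dagger)e_j^\delta\|) \le C\,\|e_j^\delta\|_{-a}\,(1 + \|e_j^\delta\|_0). && (4.37)
\end{aligned}
\]
Finally, (4.9), (4.26), (4.31), (4.32), and $F(x^\dagger) = y$ (cf. Assumption (N2)) imply that
\[ \|(B^*B)^{-\frac{b+s}{2(a+s)}}L^{-s}p_{2,j}^\delta\|_0 \le 2C_Q\,|||e_j^\delta|||_{b-a}\,\|y - F(x_j^\delta)\| \le c\,|||e_j^\delta|||_{b-a}\,|||e_j^\delta|||_{-a}\,(1 + |||e_j^\delta|||_0) \tag{4.38} \]
for all $0 \le j < k$. Combining the estimates, using spectral theory, Lemma 4.24, and
$|||e_j^\delta|||_0 \le \rho$ for all $0 \le j < k$ now yields the assertions (4.29) and (4.30). □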
We are now in the position to prove the main convergence (rates) results for the
preconditioned Landweber iteration in Hilbert scales for $s \le 0$ and under our relaxed
assumptions (cf. [26]):
Proposition 4.26 Let Assumptions 4.21, 4.22 hold and $\|y^\delta - y\| \le \delta$. Additionally, let
$k_* = k_*(\delta, y^\delta)$ be chosen according to the discrepancy principle (2.13) with $\tau$ sufficiently
large, and let $|||x^\dagger - x_0|||_u$ be sufficiently small. Then
\[ |||x_k^\delta - x^\dagger|||_0 \le \frac{4(\tau-1)}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{u}{2(a+s)}}, \tag{4.39} \]
and
\[ \|y^\delta - F(x_k^\delta)\| \le \frac{2\tau^2}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{a+u}{2(a+s)}} \]
for all $0 \le k < k_*$. In the case of exact data ($\delta = 0$), the above estimates hold for all
$k \ge 0$.
Proof. We proceed similarly as in the proof of Theorem 2.3 in [68] and show by
induction that
\[ |||e_j^\delta|||_u \le \eta\,|||x^\dagger - x_0|||_u, \quad 0 \le j \le k_*, \tag{4.40} \]
and
\[ |||e_j^\delta|||_{-a} \le \eta\,(j+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u, \quad 0 \le j < k_*, \tag{4.41} \]
hold with
\[ \eta = \frac{4(\tau-1)}{\tau-2}, \tag{4.42} \]
if $|||x^\dagger - x_0|||_u$ is sufficiently small.
First note that the assertions hold for j = 0, if |||x† − x0|||u is small enough. Fur-
thermore, if |||x† − x0|||u is so small that C−uη |||x† − x0|||u ≤ ρ then by (4.4) xj ∈ Bρ(x†)
and the iteration (4.27) is well-defined. Now assume that (4.40), (4.41) are valid for
$0 \le j < k \le k_*$. Then, by virtue of Lemmas 4.24 and 4.25, the estimates
\[
\begin{aligned}
|||e_k^\delta|||_u &\le (1 + C_2\,|||x^\dagger - x_0|||_u)\,|||x^\dagger - x_0|||_u + \delta\,k^{\frac{a+u}{2(a+s)}}, \\
|||e_k^\delta|||_{-a} &\le (1 + C_2\,|||x^\dagger - x_0|||_u)\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{a+u}{2(a+s)}} + \delta
\end{aligned}
\tag{4.43}
\]
hold for some $C_2 > 0$ (independent of $k$). Here, we used the restriction $0 \le u \le b + 2s$.
Next, we derive an estimate for $k$ in terms of $\delta$: similarly to (4.38) we get
\[ (\tau - 1)\delta \le \|y^\delta - F(x_j^\delta)\| \le c\,|||e_j^\delta|||_{-a}\,(1 + |||e_j^\delta|||_0) \]
for all $0 \le j < k \le k_*$ and hence (4.40) and (4.41) for $j = k - 1$ yield that
\[ \delta \le \frac{\tau}{2(\tau-1)}\,\eta\,k^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u \tag{4.44} \]
provided that $2c\,(1 + \eta\,|||x^\dagger - x_0|||_u) \le \tau$. Together with (4.42) and (4.43) we obtain
\[ |||e_k^\delta|||_u \le |||x^\dagger - x_0|||_u\,\Big(1 + C_2\,|||x^\dagger - x_0|||_u + \frac{\tau}{2(\tau-1)}\,\eta\Big) \le \eta\,|||x^\dagger - x_0|||_u \]
if $C_2\,|||x^\dagger - x_0|||_u \le 1$, which we again assume to hold in the following. In the same way,
we obtain that
\[ |||e_k^\delta|||_{-a} \le \frac{2(\tau-1)}{\tau-2}\,(k+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u\,(1 + C_2\,|||x^\dagger - x_0|||_u) \le \eta\,(k+1)^{-\frac{a+u}{2(a+s)}}\,|||x^\dagger - x_0|||_u. \]
The estimate (4.39) now follows by the interpolation inequality (4.5), and the estimate (4.41) follows
similarly as (4.38). Thus, if $|||x^\dagger - x_0|||_u$ is sufficiently small, then the assertions hold for
all $j \le k_*$. In the case of exact data ($\delta = 0$), the estimates hold for all $k \ge 0$, since then
Lemma 4.25 holds for all $k \ge 0$. □
Combining the results of Proposition 4.26 now yields the following
Theorem 4.27 Under the assumptions of Proposition 4.26, we have
\[ k_* = O\big(\delta^{-\frac{2(a+s)}{a+u}}\big) \tag{4.45} \]
and
\[ \|x_{k_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big). \]
Proof. The estimate on $k_*$ follows from (4.44). Secondly, it follows from (4.31), (4.35)
and (2.13) with $K = F'(x^\dagger)$ that
\[
\begin{aligned}
\|Ke_k^\delta\| &\le \|F(x^\dagger) - F(x_k^\delta) + Ke_k^\delta\| + \|F(x_k^\delta) - F(x^\dagger)\| \\
&\le (C_R + C_Q\,|||e_k^\delta|||_0)\,|||e_k^\delta|||_{-a} + \delta + \|F(x_k^\delta) - y^\delta\| \\
&\le (C_R + C_Q\,|||e_k^\delta|||_0)\,\|Ke_k^\delta\| + (\tau + 1)\delta.
\end{aligned}
\]
Consequently, if $C_R + C_Q\,|||e_k^\delta|||_0 < 1$, which can always be achieved if $|||x^\dagger - x_0|||_u$ is small
enough, then
\[ \|e_k^\delta\|_{-a} = \|Ke_k^\delta\| \le C\delta. \]
The convergence rate then follows by (4.40) and the interpolation inequality (4.5). □
Remark 4.28 The condition $u \le b + 2s$ in (N7) corresponds to the saturation of
Landweber iteration for nonlinear problems at $\mu = 1/2$ (cf. [39] and Theorem 2.6).
Note that for $a = b$, and if (4.28) is valid, i.e.,
\[ F'(x^\dagger) \sim L^{-a}, \]
then $L^{-s}(B^*B)^{\frac{u-s}{2(a+s)}} \sim L^{-u} \sim (T^*T)^{\frac{u}{2a}}$ holds for $-a \le u \le \min\{a, a+2s\}$; in particular,
the source sets $\mathcal{X}_u^s$ and $\mathcal{R}((T^*T)^{\frac{u}{2a}})$ coincide for $u \le a + 2s$.
In contrast to the case $s \ge 0$, where the range of optimal convergence is extended by
performing the iteration in Hilbert scales, the restriction $u \le a + 2s$ yields $\mu \le \frac{a+2s}{2a} < \frac12$
if $s < 0$, i.e., optimal convergence can only be proven for a smaller range. However, the
discrepancy principle is reached with far fewer iterations, i.e., $k_* = O(\delta^{-\frac{2(a+s)}{a+u}})$ instead of
$O(\delta^{-\frac{2a}{a+u}})$ for the standard iteration ($s = 0$). Thus, preconditioning is especially attractive
for reconstructing solutions $x^\dagger$ which are not very smooth.
We only mention that with obvious modifications of the proofs one can actually
show the stronger rates
\[ |||x_k^\delta - x^\dagger|||_r \le \frac{4(\tau-1)}{\tau-2}\,|||x^\dagger - x_0|||_u\,(k+1)^{-\frac{u-r}{2(a+s)}} = O\big(\delta^{\frac{u-r}{a+u}}\big) \]
for $-a \le r \le 0$. Furthermore, we claim that with the same methods of proof and
under the same restrictions, it is possible to derive the corresponding rates even for
$-a \le r \le u$ (cf. [49]).
For $u \le a + 2s$, the rates of Theorem 4.27 coincide with the rates for linear problems
and, as mentioned in the linear case, are order optimal under the given conditions. We
want to emphasize once more that under our relaxed assumptions, in particular (N2),
the source condition $x^\dagger - x_0 \in \mathcal{X}_u^s$ might be stronger than the usual source condition
$x^\dagger - x_0 \in \mathcal{X}_u$. However, $x^\dagger - x_0 \in \mathcal{X}_u^s$ is the natural source condition if the iteration is
considered as an iteration in $\mathcal{X}_s$ (cf. Remark 4.9 and Corollary 4.14).
This concludes our convergence rates analysis of the preconditioned Landweber it-
eration in Hilbert scales for nonlinear problems. As a final topic in our investigation of
iterative regularization methods in Hilbert scales, we discuss preconditioning of certain
Newton-type regularization methods, which are well-known for their excellent conver-
gence behavior for well-posed as well as for ill-posed problems (cf., e.g., [6, 8, 23, 47, 48]).
4.3.3 Newton-type Regularization in Hilbert Scales
The basic step of the Newton-type iterations under consideration consists in the stable
solution of the linearized equation
\[ F'(x_n)(x_{n+1} - x_0) = y - F(x_n) + F'(x_n)(x_n - x_0), \]
by a regularization method $R_\alpha$ (with filter function $g_\alpha$), which yields
\[ x_{n+1} = x_0 + g_{\alpha_n}\big(F'(x_n)^*F'(x_n)\big)F'(x_n)^*[y - F(x_n) + F'(x_n)(x_n - x_0)]. \tag{4.46} \]
The iteratively regularized Gauß-Newton method and the Newton-Landweber iterations
are of this form (cf. Section 2.2.5). Here, we consider preconditioning of the resulting
methods by the Hilbert scale approach, i.e., solving the preconditioned normal equations
\[ L^{-2s}F'(x_n^\delta)^*F'(x_n^\delta)(x_{n+1}^\delta - x_0) = L^{-2s}F'(x_n^\delta)^*\big(y^\delta - F(x_n^\delta) + F'(x_n^\delta)(x_n^\delta - x_0)\big). \]
Using the notation
\[ B_n := F'(x_n^\delta)L^{-s}, \]
the preconditioned iteration corresponding to (4.46) can be reformulated as
\[ x_{n+1}^\delta = x_0 + L^{-s}g_{\alpha_n}(B_n^*B_n)B_n^*\big(y^\delta - F(x_n^\delta) + F'(x_n^\delta)(x_n^\delta - x_0)\big). \tag{4.47} \]
Here, the regularization parameters $\alpha_n$ usually decrease during the iteration, e.g., a
choice $\alpha_0 > 0$ and
\[ 0 < q \le \frac{\alpha_n}{\alpha_{n-1}} \le 1 \ \text{ for } n \in \mathbb{N}, \quad\text{and}\quad \lim_{n\to\infty}\alpha_n = 0, \tag{4.48} \]
is used in [48] for the analysis of Newton-type methods in the standard spaces, and we
will assume this behavior of $\alpha_n$ from now on.
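To make the structure of (4.47) concrete, the following sketch implements one possible instance in finite dimensions: the Tikhonov filter $g_\alpha(\lambda) = (\lambda + \alpha)^{-1}$ with the geometric choice $\alpha_n = \alpha_0 q^n$. The function name `hs_irgn`, the callables `F` and `dF` (forward map and Jacobian matrix), the matrix `Ls_inv` standing in for $L^{-s}$, and the fixed number of steps are assumptions made for illustration; no stopping rule is included here.

```python
import numpy as np

def hs_irgn(F, dF, Ls_inv, y_delta, x0, alpha0=1.0, q=0.5, n_steps=20):
    """Sketch of iteration (4.47) with the Tikhonov filter g_alpha(t) = 1/(t + alpha)."""
    x = x0.copy()
    alpha = alpha0
    for _ in range(n_steps):
        J = dF(x)
        Bn = J @ Ls_inv                            # B_n = F'(x_n) L^{-s}
        rhs = y_delta - F(x) + J @ (x - x0)
        # z = g_alpha(B_n^* B_n) B_n^* rhs = (B_n^* B_n + alpha I)^{-1} B_n^* rhs
        z = np.linalg.solve(Bn.T @ Bn + alpha * np.eye(Bn.shape[1]), Bn.T @ rhs)
        x = x0 + Ls_inv @ z
        alpha *= q                                 # alpha_n = alpha0 * q**n, cf. (4.48)
    return x
```

Other filter functions (for instance those generated by Landweber or $\nu$-method inner iterations) would replace the linear solve by the corresponding inner iteration.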
The aim of this section is to give a detailed convergence (rates) analysis for the
preconditioned iterations of the form (4.47), where we assume that the filter functions
$g_\alpha$ (and $r_\alpha(\lambda) = 1 - \lambda g_\alpha(\lambda)$) satisfy the conditions of Theorem 1.2 with $G_\alpha = O(\alpha^{-1})$
and
\[ \lambda^\mu\,|r_\alpha(\lambda)| \le c_\mu\,\alpha^\mu \quad\text{for all } \lambda \in [0, 1],\ 0 \le \mu \le \mu_0 \tag{4.49} \]
for some $\mu_0 > 0$. We start by investigating the convergence behavior of the iteration
(4.47) when stopped by an a-priori rule. In our analysis we use the following result:
Lemma 4.29 Let $A$, $B$, $R$ be bounded linear operators between Hilbert spaces $\mathcal{X}$ and
$\mathcal{Y}$. If $B = RA$ with $\|I - R\| < 1$, then for every $|\nu| \le 1/2$ and $w \in \mathcal{X}$ there exist
positive constants $\underline{c}$, $\overline{c}$ and an element $v \in \mathcal{X}$ such that
\[ (A^*A)^\nu w = (B^*B)^\nu v, \]
with $\underline{c}\,\|w\| \le \|v\| \le \overline{c}\,\|w\|$.
Proof. Observing that $\mathcal{R}((A^*A)^{1/2}) = \mathcal{R}(A^*) = \mathcal{R}(B^*) = \mathcal{R}((B^*B)^{1/2})$, the result
follows by the inequality of Heinz and duality arguments (cf. [48, 49] for details). □
Proposition 4.30 Let Assumptions 4.21, 4.22 hold with $C_Q = 0$ and $C_R$ sufficiently
small, and let $x^\dagger$ denote a solution of (1.16) and $x^\dagger - x_0 \in \mathcal{X}_u^s$, i.e.,
\[ L^s(x^\dagger - x_0) = (B^*B)^{\frac{u-s}{2(a+s)}}w, \quad 0 < u \le a + 2s \tag{4.50} \]
with $\|w\| = |||x^\dagger - x_0|||_u \le \omega$ and $\omega$ sufficiently small. Assume that $y^\delta \in \mathcal{Y}$ is such that
(1.2) holds, i.e., $\|y^\delta - y\| \le \delta$, and let $x_n^\delta$ denote the iterates defined by (4.47) with
$g_\alpha$, $r_\alpha$ satisfying the conditions of Theorem 1.2 and (4.49) for some $\mu_0 \ge 1$, and let $\alpha_n$
satisfy (4.48).
Moreover, let $\eta > 0$ and denote by $N(\delta)$ the largest integer such that
\[ \alpha_n \ge \Big(\frac{1}{\eta\omega}\,\delta\Big)^{\frac{2(a+s)}{a+u}} \tag{4.51} \]
for all $0 \le n \le N(\delta)$.
Then there exists a positive constant $C_\eta$ such that for all $-a \le r \le 0$
\[ |||x_n^\delta - x^\dagger|||_r \le C_\eta\,\alpha_n^{\frac{u-r}{2(a+s)}}\,\omega, \quad 0 \le n \le N(\delta). \tag{4.52} \]
Additionally, $x_n \in B_\rho(x^\dagger)$ for $n \le N(\delta)$.
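Proof. We prove the assertions by induction: since $|||x_0 - x^\dagger|||_r \le \omega$, (4.52) holds for
$n = 0$ if $C_\eta \ge \alpha_0^{\frac{r-u}{2(a+s)}}$, which we assume in the following.
Now let (4.52) hold for some $0 < n < N(\delta)$ and assume that $x_n^\delta \in B_\rho(x^\dagger)$. Then
with the notation $e_n^\delta := x_n^\delta - x^\dagger$ and (4.47), we get the closed form representation
\[ e_{n+1}^\delta = L^{-s}r_{\alpha_n}(B_n^*B_n)L^s(x_0 - x^\dagger) + L^{-s}g_{\alpha_n}(B_n^*B_n)B_n^*(y^\delta - y + l_n), \]
with $l_n = \int_0^1(F'(x^\dagger + te_n^\delta) - F'(x_n^\delta))e_n^\delta\,dt$. Now, by the nonlinearity condition (N6) with
$C_Q = 0$ and Lemma 4.29, there exists a $w_n$ with $\|w_n\| \sim \|w\|$ such that
\[ (B^*B)^{\frac{u-s}{2(a+s)}}w = (B_n^*B_n)^{\frac{u-s}{2(a+s)}}w_n \]
and
\[
\begin{aligned}
|||e_{n+1}^\delta|||_0 &\le c\,\|(B_n^*B_n)^{\frac{s}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)(B^*B)^{\frac{u-s}{2(a+s)}}w\| + c\,\|(B_n^*B_n)^{\frac{s}{2(a+s)}}g_{\alpha_n}(B_n^*B_n)B_n^*(y^\delta - y + l_n)\| \\
&\le c\,\|(B_n^*B_n)^{\frac{u}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)w_n\| + c\,\|(B_nB_n^*)^{\frac{a+2s}{2(a+s)}}g_{\alpha_n}(B_nB_n^*)(y^\delta - y + l_n)\|
\end{aligned}
\]
for some $c > 0$. This together with Lemma 4.7 (with $\mu = \frac{a+2s}{2(a+s)}$ and $k$ replaced by $1/\alpha$),
\[
\begin{aligned}
\|l_n\| &= \Big\|\int_0^1(F'(x^\dagger + te_n^\delta) - F'(x_n^\delta))e_n^\delta\,dt\Big\| \\
&\le 2C_R\,\|F'(x^\dagger)e_n^\delta\| = 2C_R\,|||e_n^\delta|||_{-a} \le 2C_RC\,\alpha_{n-1}^{\frac{a+u}{2(a+s)}}\,\omega,
\end{aligned}
\]
(4.49), and (4.51) yields
\[ |||e_{n+1}^\delta|||_0 \le \alpha_n^{\frac{u}{2(a+s)}}\,\omega\,(c_1 + c_2C_RC_\eta) \]
for some positive constants $c_1$ and $c_2$. Now (4.52) holds for $n + 1$ for any $C_\eta$ satisfying
\[ C_\eta \ge \max\Big\{\max_{-a\le r\le 0}\alpha_0^{\frac{r-u}{2(a+s)}},\ \frac{c_1}{q^{\frac{u}{2(a+s)}} - c_2C_R}\Big\}, \]
which is always possible as long as $C_R$ is smaller than $q^{\frac{u}{2(a+s)}}/c_2$. Finally, if $\omega$ is sufficiently small, then $x_{n+1}^\delta$ remains in $B_\rho(x^\dagger)$. This finishes the induction. □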
Proposition 4.30 immediately implies the following convergence rates in terms of $\delta$:
Corollary 4.31 Let the assumptions of Proposition 4.30 be valid and let $N(\delta)$ be chosen as in (4.51). Then the following rates hold for $-a \le r \le 0$:
\[ |||x_{N(\delta)}^\delta - x^\dagger|||_r = O\big(\delta^{\frac{u-r}{a+u}}\big). \]
Proof. The assertion follows immediately with (4.48) and (4.52). □
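For a geometric sequence $\alpha_n = \alpha_0 q^n$, the a-priori index $N(\delta)$ from (4.51) is easily computed. The helper below is a minimal sketch under this assumption; the function name `a_priori_stopping_index` and all parameter values in the usage lines are hypothetical, and the routine presupposes that $\alpha_0$ itself satisfies (4.51).

```python
def a_priori_stopping_index(delta, eta, omega, a, s, u, alpha0=1.0, q=0.5):
    """Largest n with alpha0 * q**n >= (delta / (eta * omega))**(2(a+s)/(a+u)), cf. (4.51)."""
    threshold = (delta / (eta * omega)) ** (2.0 * (a + s) / (a + u))
    n, alpha = 0, alpha0
    while alpha * q >= threshold:
        alpha *= q
        n += 1
    return n

# Illustrative comparison: the same noise level without (s = 0)
# and with (s = -0.5) preconditioning.
print(a_priori_stopping_index(1e-3, 1.0, 1.0, a=1.0, s=0.0, u=1.0))
print(a_priori_stopping_index(1e-3, 1.0, 1.0, a=1.0, s=-0.5, u=1.0))
```

Since the exponent $\frac{2(a+s)}{a+u}$ decreases with $s$, the threshold in (4.51) is reached after fewer outer steps when $s < 0$ and $\delta < \eta\omega$.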
Remark 4.32 Convergence (without rates) for $s = 0$ and $u = 0$ has been proven under
the weaker nonlinearity condition
\[ \|F(x) - F(x^\dagger) - F'(x^\dagger)(x - x^\dagger)\| \le \eta\,\|F(x) - F(x^\dagger)\|, \quad \eta < 1/2, \]
for the Levenberg-Marquardt method [37], a Newton-CG method [38], and for the IRGN
and the Newton-Landweber iteration in [47]. At least under the stronger condition
(4.28), it should be possible to extend these results also to $s < 0$.
Note that the rates of Corollary 4.31 apply for arbitrary regularization methods $g_\alpha$
satisfying the conditions of Theorem 1.2 with $G_\alpha = O(\alpha^{-1})$; in particular, the results
apply to iterative regularization methods $g_\alpha = g_k$ by replacing $\alpha_n$ by $k_n^{-1}$ (respectively $k_n^{-2}$ for semiiterative methods with optimal speed of convergence). The resulting
completely iterative methods can be viewed as two-level iterations. The outer iteration corresponds to a Newton-type method, whereas the inner iteration is used for the
regularized solution of the linearized equation in each Newton step.
As the above analysis shows, the number of (outer) Newton iterations is bounded
by $O(1 + |\log\delta|)$. In order to compare the overall computational complexity to the one
of Landweber iteration, we also consider the total number of inner iterations: if the
increasing sequence of inner iterations $k_n$ satisfies
\[ k_n \sim q^{-n}, \quad n \ge 0, \]
with some $q < 1$, then the overall number of inner iterations is bounded by
\[ k_* = \sum_{n=0}^{N(\delta)}k_n = O\big(\delta^{-\frac{2(a+s)}{a+u}}\big), \quad\text{respectively}\quad k_* = \sum_{n=0}^{N(\delta)}k_n = O\big(\delta^{-\frac{a+s}{a+u}}\big) \]
for the preconditioned Newton-Landweber iteration and the preconditioned Newton-$\nu$-methods, respectively. With $s = -a/2$, the resulting iteration numbers can again be
reduced to the square root by preconditioning. Moreover, the preconditioned Newton-$\nu$-methods yield optimal convergence with only the square root of the number of iterations that would be
needed for the preconditioned Landweber iteration and only the fourth root of the number of iterations
needed for the standard Landweber iteration. We will demonstrate this substantial
speed-up in several numerical examples in Chapter 5.
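The square-root and fourth-root statements can be read off from the exponents of $\delta^{-1}$ in the iteration bounds. The following sketch merely tabulates these exponents for $s = -a/2$; the function name `total_iteration_exponents`, the dictionary keys and the sample values of $a$ and $u$ are illustrative assumptions.

```python
def total_iteration_exponents(a, u):
    """Exponents e in k_* = O(delta**(-e)) for the methods compared in Remark 4.32."""
    s = -a / 2.0
    return {
        "Landweber, s = 0": 2.0 * a / (a + u),
        "Landweber, s = -a/2": 2.0 * (a + s) / (a + u),
        "Newton-Landweber, s = -a/2": 2.0 * (a + s) / (a + u),
        "Newton-nu, s = -a/2": (a + s) / (a + u),
    }

for name, e in total_iteration_exponents(a=2.0, u=1.0).items():
    print(f"{name:28s} exponent {e:.4f}")
```

Halving the exponent corresponds to taking the square root of the iteration bound, and quartering it to taking the fourth root.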
The a-priori results of Proposition 4.30 and Corollary 4.31 are not of great use per se,
since in general one does not know the smoothness of the solution, i.e., for which
$u$ the source condition $x^\dagger - x_0 \in \mathcal{X}_u^s$ holds. However, the estimate of Proposition 4.30
will be used to prove convergence rates when the iteration is stopped according to the
following a-posteriori stopping rule (cf. [23, 48]):
For a sufficiently large $\tau > 1$ let $n_* = n(\delta, y^\delta)$ be the smallest integer such that
\[ \max\{\|y^\delta - F(x_{n_*-1}^\delta)\|,\ \|y^\delta - F(x_{n_*}^\delta)\|\} \le \tau\delta. \tag{4.53} \]
According to (4.53), the (outer) Newton iteration is stopped as soon as two
consecutive residuals are less than or equal to $\tau\delta$ for the first time. The following proposition guarantees stability of our
class of preconditioned Newton-type methods (4.47) equipped with the above criterion:
Proposition 4.33 Let the assumptions of Proposition 4.30 be valid, and let the iteration
(4.47) be stopped according to (4.53) with $\tau > 1$ sufficiently large. Then the iteration
is well-defined and $n_* \le N(\delta)$ with $N(\delta)$ as in (4.51).
Proof. Note that (1.2), (4.52) ($r = -a$), and (N6) (with $C_Q = 0$) imply that
\[ \|F(x_n^\delta) - y^\delta\| \le \delta + (1 + C_R)\,\|F'(x^\dagger)e_n^\delta\| \le \delta + (1 + C_R)\,C_\eta\,\alpha_n^{\frac{u+a}{2(a+s)}}\,\omega \]
for all $0 \le n \le N(\delta)$. This together with (4.48) and (4.51) yields the estimate
\[ \|F(x_n^\delta) - y^\delta\| \le \delta\,\big(1 + (1 + C_R)\,C_\eta\,\eta^{-1}q^{-\frac{a+u}{a+s}}\big) \]
for $n = N(\delta) - 1$ and $n = N(\delta)$. Thus, if $\tau$ is larger than the constant in the brackets
above, then obviously $n_* \le N(\delta)$ and, due to Proposition 4.30, the iteration is well-defined. □
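The stopping rule (4.53) only requires the sequence of residual norms. A minimal sketch, assuming the norms $\|y^\delta - F(x_n^\delta)\|$ have been collected in a list (the function name `stopping_index` and the sample numbers are hypothetical):

```python
def stopping_index(residual_norms, tau, delta):
    """Smallest n >= 1 with max(res[n-1], res[n]) <= tau * delta, cf. (4.53).

    Returns None if the criterion is not met within the given sequence.
    """
    for n in range(1, len(residual_norms)):
        if max(residual_norms[n - 1], residual_norms[n]) <= tau * delta:
            return n
    return None

# A single small residual (index 1) does not stop the iteration;
# two consecutive small residuals (indices 3 and 4) do.
print(stopping_index([1.0, 0.05, 0.3, 0.08, 0.06], tau=2.0, delta=0.05))
```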
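We are now in the position to prove the following convergence rates result:
Theorem 4.34 Let the assumptions of Proposition 4.30 be satisfied, and let the iteration
(4.47) be stopped after $n_* = n(\delta, y^\delta)$ steps according to the stopping rule (4.53) with
some $\tau$ sufficiently large. Then
\[ \|x_{n_*}^\delta - x^\dagger\| = O\big(\delta^{\frac{u}{a+u}}\big). \]
Proof. We use the notation $K := F'(x^\dagger)$ and $K_n := F'(x_n^\delta)$. Observe that by (1.2),
(4.23) and (4.24) with $C_Q = 0$ the following estimate holds for $n = n_*$ and $n = n_* - 1$:
\[
\begin{aligned}
\|K_ne_n^\delta\| &\le (1 + C_R)\,\|Ke_n^\delta\| \\
&\le 2\,\Big\|y - F(x_n^\delta) - \int_0^1[F'(x_n^\delta - te_n^\delta) - F'(x^\dagger)]e_n^\delta\,dt\Big\| \\
&\le 2\,\Big[\delta + \|F(x_n^\delta) - y^\delta\| + \frac{C_R}{1-C_R}\,\|K_ne_n^\delta\|\Big],
\end{aligned}
\]
and hence with (4.53)
\[ \|K_ne_n^\delta\| \le c_1\delta, \quad\text{for } n \in \{n_* - 1, n_*\} \tag{4.54} \]
for some positive constant $c_1$. Next, by (4.47), and denoting $n = n_* - 1$, $B_n = K_nL^{-s}$
we have
\[ K_ne_{n_*}^\delta = B_nr_{\alpha_n}(B_n^*B_n)L^s(x_0 - x^\dagger) + B_ng_{\alpha_n}(B_n^*B_n)B_n^*[y^\delta - F(x_n^\delta) + K_ne_n^\delta]. \]
Thus, we obtain with (N6) with $C_Q = 0$, (4.54), and (4.53) that
\[
\begin{aligned}
\|B_nr_{\alpha_n}(B_n^*B_n)L^s(x_0 - x^\dagger)\| &= \|K_ne_{n_*}^\delta - B_nB_n^*g_{\alpha_n}(B_n^*B_n)[y^\delta - F(x_n^\delta) + K_n(x_n^\delta - x^\dagger)]\| \\
&\le \frac{1+C_R}{1-C_R}\,\|K_{n_*}e_{n_*}^\delta\| + c_2\,(\|y^\delta - F(x_n^\delta)\| + \|K_ne_n^\delta\|) \le c_3\delta,
\end{aligned}
\]
for some $c_2, c_3 > 0$. Finally, using the above estimates, the representation (4.47), (N7),
and $n = n_* - 1$ the error can be estimated as follows:
\[
\begin{aligned}
\|e_{n_*}^\delta\| &\le \|L^{-s}r_{\alpha_n}(B_n^*B_n)(B^*B)^{\frac{u-s}{2(a+s)}}w\| + \|L^{-s}g_{\alpha_n}(B_n^*B_n)B_n^*(y^\delta - F(x_n^\delta) + K_ne_n^\delta)\| \\
&\le c_4\,\Big(\|(B_n^*B_n)^{\frac{u}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)w_n\| + \|g_{\alpha_n}(B_nB_n^*)(B_nB_n^*)^{\frac{a+2s}{2(a+s)}}\|\,(\tau\delta + \|K_ne_n^\delta\|)\Big) \\
&\le c_5\,\Big(\|r_{\alpha_n}(B_n^*B_n)(B_n^*B_n)^{\frac{u}{2(a+s)}}w_n\| + \alpha_n^{-\frac{a}{2(a+s)}}\,\delta\Big)
\end{aligned}
\]
for some constants $c_4, c_5 > 0$. Using the interpolation inequality, the above estimates,
(N7), and (4.49), we obtain with Lemma 4.29
\[
\begin{aligned}
\|(B_n^*B_n)^{\frac{u}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)w_n\| &\le c_6\,\|(B_n^*B_n)^{\frac{u+a}{2(a+s)}}r_{\alpha_n}(B_n^*B_n)w_n\|^{\frac{u}{a+u}}\,\|w_n\|^{\frac{a}{a+u}} \\
&\le c_6\,\|B_nr_{\alpha_n}(B_n^*B_n)L^s(x_0 - x^\dagger)\|^{\frac{u}{a+u}}\,\|w_n\|^{\frac{a}{a+u}} \\
&\le c_7\,\delta^{\frac{u}{a+u}}\,\omega^{\frac{a}{a+u}}
\end{aligned}
\]
for some positive constants $c_6$ and $c_7$. This together with Proposition 4.33 and (4.51)
completes the proof. □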
Chapter 5
Examples and numerical test results
In this chapter we investigate the applicability of our results to several examples.
In particular, we discuss the main assumptions needed for the convergence rates
analysis in the previous chapter, i.e., the conditions (L2) for linear, and (N3) and (N6)
for nonlinear problems. As our examples show, the convergence theory of
Chapter 4 is still applicable to problems where the standard theory of regularization in
Hilbert scales cannot be applied, i.e., where (3.15) does not hold. Some of the numerical test
results below even indicate that some of the conditions we needed to prove our results,
e.g., the restriction u ≤ b + 2s in (N7), might be relaxed in some cases, and a further
investigation in this direction might be interesting.
While some of our examples below have the character of model problems, we also
present some examples that stem from applications and try to motivate their
relevance in practice. Besides a discussion of the conditions needed for the application of
our convergence analysis, we also present the results of several numerical tests,
which, in most cases, are in very good agreement with the theory and clearly illustrate
the effect of preconditioning. We also demonstrate the limits of our approach: if
boundary conditions are neglected, preconditioning might even lead to an increase
of iteration numbers.
Throughout our numerical tests, noisy data yδ are generated by adding uniformly
distributed random noise with ‖y − yδ‖ = δ to the true data y. The problems are
discretized by standard finite difference and finite element methods. The number of
elements of space and time domains are denoted by Nx or Nt in the one dimensional
case. In higher dimensions, Np denotes the number of grid points of the FE-mesh.
The discretization is chosen so fine that discretization errors are dominated by the
additionally added data noise. If the true data y cannot be calculated analytically, they
are computed on finer grids, in order to avoid so-called inverse crimes.
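The noise model described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper name `add_uniform_noise`, the placeholder data `y`, the seed and the use of the Euclidean vector norm (instead of a weighted discrete L2 norm) are choices made here, not taken from the thesis.

```python
import numpy as np

def add_uniform_noise(y, delta, rng):
    """Return y_delta = y + e, where e is uniform random noise rescaled so that ||e||_2 = delta."""
    e = rng.uniform(-1.0, 1.0, size=y.shape)
    e *= delta / np.linalg.norm(e)
    return y + e

rng = np.random.default_rng(0)      # fixed seed, only for reproducibility of this sketch
t = np.linspace(0.0, 1.0, 201)
y = np.sin(np.pi * t)               # placeholder for the exact data y = T x^dagger
rel_level = 0.01                    # relative noise level delta / ||y||
delta = rel_level * np.linalg.norm(y)
y_delta = add_uniform_noise(y, delta, rng)
```

Rescaling the random vector enforces ‖y − yδ‖ = δ exactly, so the noise level that enters the stopping rule is known and not merely estimated.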
Our examples may be roughly divided into two blocks: since many (linear) inverse
problems can be formulated as integral equations, we first investigate some integral
equations of the first kind. A second block of examples is then concerned with parameter
identification problems in partial differential equations, which are another important
class of (mostly nonlinear) inverse problems arising in natural sciences, in industrial
applications, or even in mathematical finance.
5.1 Integral Equations of the First Kind
In the following we investigate some integral equations of the first kind, e.g.,
\[
(Tx)(s) = \int_G k(s,t)\, x(t)\, dt = y(s), \qquad s \in G. \tag{5.1}
\]
It is well known [55] that $T : L^2(G) \to L^2(G)$ is compact if $k \in L^2(G \times G)$, and that the
range R(T) is not closed unless it is finite dimensional. Hence, problems like (5.1) are
ill-posed, in general, and their solution requires regularization. We will discuss
Fredholm integral equations with applications in computerized tomography (CT) and in the
reconstruction of blurred images, and then turn to a simple model problem for nonlinear
evolution.
5.1.1 Fredholm Integral Equations of the First Kind
With the first example we want to demonstrate that, due to our relaxed assumptions,
the results of the previous sections are still applicable to problems, where the standard
theory of regularization in Hilbert scales cannot be applied:
Example 5.1 Let $T : L^2[0,1] \to L^2[0,1]$ be defined by
\[
(Tx)(s) = \int_0^1 s^{1/2}\, k(s,t)\, x(t)\, dt,
\]
with the standard Green's kernel
\[
k(s,t) = \begin{cases} s(1-t), & t > s, \\ t(1-s), & s \ge t. \end{cases}
\]
For application of our theory, we have to verify the conditions of Assumption 4.6, in
particular (L2), i.e., we have to show that there exists an a > 0 such that
\[
\|Tx\| \le \overline{m}\, \|x\|_{-a} \qquad \text{for all } x \in X.
\]
This can be done in the following way: first note that
\[
(T^*y)(t) = (1-t)\int_0^t s^{3/2}\, y(s)\, ds + t \int_t^1 s^{1/2}(1-s)\, y(s)\, ds,
\]
with $(T^*y)(0) = (T^*y)(1) = 0$. Furthermore, differentiating twice shows that
$(T^*y)'' = -(\cdot)^{1/2} y$. Hence,
\[
R(T^*) = \{ w \in H^2[0,1] \cap H_0^1[0,1] : (\cdot)^{-1/2} w'' \in L^2[0,1] \}.
\]
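Next, we define the Hilbert scale operator L by
\[
L^s x := \sum_{n=1}^\infty (n\pi)^s \langle x, x_n \rangle\, x_n, \qquad x_n := \sqrt{2}\, \sin(n\pi\,\cdot),
\]
which yields $L^2 x = -x''$. With this choice, we have
\[
R(T^*) \subsetneq X_2 := H^2[0,1] \cap H_0^1[0,1]
\]
and additionally,
\[
R(T^*) \supset X_{2.5} := \{ w \in H^{2.5}[0,1] \cap H_0^1[0,1] : \rho^{-1/2} w'' \in L^2[0,1] \},
\]
with $\rho(t) = t(1-t)$. By Theorem 11.7 in [58], it follows that
\[
\|w\|_{2.5}^2 \sim \|w''\|_{H^{1/2}}^2 + \|\rho^{-1/2} w''\|_{L^2}^2
\]
and thus for $T^*y \in X_{2.5}$,
\[
\begin{aligned}
\|T^*y\|_{2.5}^2 &\sim \|(\cdot)^{1/2} y\|_{H^{1/2}}^2 + \|\rho^{-1/2} (\cdot)^{1/2} y\|_{L^2}^2 \\
&\ge \|(\cdot)^{1/2} y\|_{L^2}^2 + \|\rho^{-1/2} (\cdot)^{1/2} y\|_{L^2}^2 \\
&= \int_0^1 t\, y(t)^2\, dt + \int_0^1 (1-t)^{-1} y(t)^2\, dt \ \ge\ c\, \|y\|_{L^2}^2 .
\end{aligned}
\]
Together with $\|T^*y\|_2 = \|(\cdot)^{1/2} y\| \le \|y\|$ and Proposition 3.6 it follows that there exist
constants $0 < \underline{m} \le \overline{m} < \infty$ such that
\[
\underline{m}\, \|x\|_{-2.5} \le \|Tx\| \le \overline{m}\, \|x\|_{-2}. \tag{5.2}
\]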
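The spectral definition of the Hilbert scale operator L^s given above translates directly into a small numerical routine. The following is a minimal sketch, assuming a uniform grid of interior nodes and a dense discrete sine matrix; the function names `sine_matrix` and `apply_L_power` are illustrative, and a fast sine transform could replace the dense matrix in practice.

```python
import numpy as np

def sine_matrix(n):
    """Orthogonal matrix whose columns sample sqrt(2) sin(k pi t), k = 1..n,
    on the interior nodes t_j = j/(n+1), scaled to unit Euclidean norm."""
    j = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(j, j) / (n + 1))

def apply_L_power(x, s):
    """Approximate L^s x = sum_k (k pi)^s <x, x_k> x_k for a vector x of nodal values."""
    n = x.size
    S = sine_matrix(n)                  # symmetric and orthogonal, so S @ S = I
    k = np.arange(1, n + 1)
    return S @ ((k * np.pi) ** s * (S @ x))

n = 199
t = np.arange(1, n + 1) / (n + 1)
x = np.sin(3 * np.pi * t)
L2x = apply_L_power(x, 2.0)             # discrete analogue of L^2 x = -x''
roundtrip = apply_L_power(apply_L_power(x, -1.0), 1.0)
```

For the sampled eigenfunction sin(3πt) the routine reproduces the factor (3π)², which is the discrete counterpart of L²x = −x''. Negative powers smooth and positive powers differentiate, which is how the preconditioner L^{-2s} with s < 0 acts.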
For a numerical test, we consider the reconstruction of the unknown function
\[
x^\dagger(t) = 2t - \operatorname{sign}(2t-1) - 1,
\]
and choose s = −1 and $x_0 = 0$. Note that $x^\dagger$ is discontinuous at t = 1/2, and thus we
only have $x^\dagger \in H^{1/2-\varepsilon}[0,1]$ for arbitrary ε > 0, which by (4.6) implies that $x^\dagger$ lies at most
in $X_{1/2} \supset X^s_{1/2}$. Thus, one cannot expect faster convergence than $\|x_k^\delta - x^\dagger\| = O(\delta^{1/5})$.
On the other hand, by (4.10) and (5.2) we immediately obtain that $x^\dagger \in X^s_{-\varepsilon}$. Moreover,
since $x^\dagger(0) = (x^\dagger)''(0) = 0$, we even have $x^\dagger \in X^s_{1/2-\varepsilon}$ for arbitrary ε > 0. In view of
Theorems 4.10 and 4.17, we therefore expect to get the optimal convergence rate $O(\delta^{1/5})$
also for the preconditioned iterates.
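As a quick check of the exponent 1/5: assuming the rate has the form O(δ^{u/(a+u)}) as in Theorem 4.34 (that the linear results quoted here give the same exponent is our reading, not a statement reproduced in this chapter), the values a = 2 from (5.2) and u = 1/2 − ε give

```latex
\[
\frac{u}{a+u}\bigg|_{a=2,\ u=\frac12-\varepsilon}
= \frac{\tfrac12-\varepsilon}{\tfrac52-\varepsilon}
\;\longrightarrow\; \frac{1/2}{5/2} = \frac15
\qquad (\varepsilon \to 0).
\]
```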
As Table 5.1 shows, the theoretically predicted convergence rates are also observed
numerically. Due to the preconditioning, the approximations $x_k^\delta$ generated by the
preconditioned iterations are somewhat rougher (cf. Figure 5.1). Note, however, that
the errors $\|e_{k_*}^\delta\|$ are measured in the norm of $X = L^2$, and the oscillations are small in
amplitude.
The number of iterations needed for several preconditioned methods and their stan-
dard counterparts are listed in Table 5.2 and illustrate the effect of preconditioning. As
predicted by the theory, the number of iterations can be reduced to about the square
root for Landweber iteration and the ν-methods; for the conjugate gradient method,
the effect is less dramatic.
δ/‖y‖ lw hs-lw ν hs-ν cgne hs-cgne
0.040 0.5187 0.5158 0.4397 0.4230 0.5192 0.4286
0.020 0.4414 0.4407 0.3890 0.3863 0.4349 0.4156
0.010 0.3860 0.3871 0.3518 0.3427 0.3874 0.3460
0.005 0.3395 0.3401 0.3080 0.3012 0.3329 0.2932
0.002 0.2793 0.2808 0.2516 0.2516 0.2738 0.2515
0.001 0.2435 0.2448 0.2220 0.2214 0.2317 0.2065
κ 0.20 0.19 0.20 0.18 0.21 0.20
Table 5.1: Iteration errors $\|e_{k_*}^\delta\|$ for iterative regularization methods and their
Hilbert-scale equivalents and the resulting convergence rates $\|e_{k_*}^\delta\| = O(\delta^\kappa)$; parameters
τ = 2.1, ν = 2 and discretization Nt = 200 elements.
[Figure 5.1: six panels titled lw, hslw, nu, hsnu, cg, hscg; horizontal axis from 0 to 1, vertical axis from −1 to 1.]
Figure 5.1: Iterates $x_k^\delta$ after 1 and 14 iterations (= stopping index for hscgne) with
δ = 0.01 and τ = 1.1.
δ/‖y‖ lw hs-lw ν hs-ν cgne hs-cgne
0.04 94 27 11 9 3 3
0.02 328 51 16 10 4 3
0.01 936 86 26 14 5 4
0.005 2743 148 47 19 6 5
0.002 12253 313 102 27 10 6
0.001 36714 544 172 36 14 8
η -1.60 -0.77 -0.80 -0.40 -0.41 -0.28
Table 5.2: Iteration numbers for iterative regularization methods and their Hilbert-scale
equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2 and
Nt = 200.
5.1.2 Radon Inversion
An inverse problem of special interest in medical applications, but also in nondestructive
testing, arises in transmission computerized tomography (see [64]):
Let $\Omega \subset \mathbb{R}^n$, n = 2, 3, be a compact domain with spatially varying density f. In
a simple physical model the relative intensity loss along a distance ∆x is assumed to
satisfy
\[
\frac{\Delta I}{I} = f(x)\, \Delta x.
\]
We denote by $I_1(\theta, s)$ and $I_0(\theta, s)$ the intensities of the X-ray beams measured at the
detector and emitter, which are located outside of the domain Ω and connected by the
line parameterized by the distance to the origin s and the direction θ. Then one obtains
\[
(Rf)(\theta, s) := \int_{x\cdot\theta = s} f(x)\, dx = -\log \frac{I_1(\theta, s)}{I_0(\theta, s)} =: g(\theta, s), \tag{5.3}
\]
for $\theta \in \mathbb{R}^n$ with ‖θ‖ = 1 and s > 0. R is called the Radon transform, and determining the
unknown density f from measurements of the intensity drop g(θ, s) corresponds to the
inversion of the Radon transform. By [64, Theorem 5.1], we know that for α ≥ 0 there
exist positive constants c(α, n) and C(α, n) such that for $f \in C_0^\infty(\Omega^n)$
\[
c(\alpha, n)\, \|f\|_{H_0^\alpha(\Omega^n)} \le \|Rf\|_{H^{\alpha+(n-1)/2}(Z)} \le C(\alpha, n)\, \|f\|_{H_0^\alpha(\Omega^n)},
\]
with $\Omega^n \subset \mathbb{R}^n$ denoting the unit ball, and Z the cylinder $S^{n-1} \times \mathbb{R}$. This already proves
(3.15) and in particular (L2) for an appropriate choice of spaces; e.g., for $X = L^2(\Omega^n)$
and $Y = L^2(Z)$, we see that inverting the Radon transform behaves like differentiation of order
one half in dimension n = 2, and like differentiation of order one in dimension n = 3.
A related problem is single photon emission computerized tomography (SPECT), cf.,
e.g., [64], where the aim is to reconstruct the distribution f of a radiopharmaceutical
inside a (human) body from measurements of the radiation outside the body. As a
model for the direct problem, the attenuated Radon transform is used:
\[
y = R(f, \mu)(s, \omega) = \int_{\mathbb{R}} f(s\omega^\perp + t\omega)\, \exp\Bigl(-\int_t^\infty \mu(s\omega^\perp + \tau\omega)\, d\tau\Bigr)\, dt,
\]
for s ∈ R and ω ∈ Sn−1. If the attenuation map µ is known (e.g., from an additional CT
scan), then SPECT reduces to a linear inverse problem similar to the Radon inversion
discussed above.
As a test example, let us consider the rotationally symmetric case of CT in more
detail: let $\Omega \subset \mathbb{R}^2$ be a disk with radius ρ, and f be rotationally symmetric with
respect to the origin, i.e., f(x) = F(s, θ) = F(s), and (consequently) g(s, θ) = g(s).
Then it suffices to measure the intensity drop g(s, θ) along one direction $\theta_0$, e.g., parallel
to the y-axis, which yields
\[
\begin{aligned}
(Rf)(\theta_0, s) &= \int_{x\cdot\theta_0 = s} f(x)\, dx = \int_{y \in \theta_0^\perp} f(s\,\theta_0 + y)\, dy \\
&= \int_{-\sqrt{\rho^2-s^2}}^{\sqrt{\rho^2-s^2}} F\bigl(\sqrt{s^2+t^2}\bigr)\, dt = 2\int_0^{\sqrt{\rho^2-s^2}} F\bigl(\sqrt{s^2+t^2}\bigr)\, dt \\
&= 2\int_s^\rho \frac{r\, F(r)}{\sqrt{r^2-s^2}}\, dr = g(s), \qquad 0 \le s \le \rho,
\end{aligned}
\]
where we used the substitution $t = \sqrt{r^2 - s^2}$. Further substitutions $t = r^2$ and $\tau = s^2$
yield
\[
\int_\tau^{\rho^2} \frac{F(\sqrt{t})}{\sqrt{t-\tau}}\, dt = g(\sqrt{\tau}).
\]
Thus, (5.3) can essentially be reduced to the solution of an Abel integral equation of
the first kind, which can be solved stably, e.g., by iterative regularization methods. We
now turn to our numerical example:
Example 5.2 (An Abel integral equation) Let $T : L^2[0,1] \to L^2[0,1]$ be defined
by
\[
(Tx)(s) := \frac{1}{\sqrt{\pi}} \int_0^s \frac{x(t)}{\sqrt{s-t}}\, dt, \tag{5.4}
\]
and consider the approximate reconstruction of x from noisy data $y^\delta$ with $\|y - y^\delta\| \le \delta$,
where $y = Tx^\dagger$ denotes the unperturbed data. One can show that
\[
(T^2x)(s) = \int_0^s x(t)\, dt, \tag{5.5}
\]
and thus inverting T essentially amounts to differentiation of half order; more precisely,
cf. [33],
\[
R(T) \subset H^r[0,1] \qquad \text{for all } 0 \le r < 1/2. \tag{5.6}
\]
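Relation (5.5) can be checked numerically. The sketch below discretizes (5.4) by product integration, i.e., x is taken piecewise constant on a uniform grid and the weakly singular kernel is integrated exactly over each cell. This discretization and the name `abel_matrix` are assumptions made for illustration; the thesis only states that standard finite difference and finite element methods were used.

```python
import numpy as np

def abel_matrix(n):
    """Product-integration matrix for (Tx)(s) = pi^{-1/2} * int_0^s x(t) / sqrt(s - t) dt.

    x is piecewise constant on the cells [t_{j-1}, t_j], t_j = j/n, and the result is
    evaluated at the nodes s_i = i/n, i = 1..n. Each cell integral is computed exactly.
    """
    h = 1.0 / n
    i = np.arange(1, n + 1)[:, None]                   # row index: evaluation node s_i
    j = np.arange(1, n + 1)[None, :]                   # column index: cell [t_{j-1}, t_j]
    left = np.sqrt(np.clip(i - j + 1, 0, None) * h)    # sqrt(s_i - t_{j-1})
    right = np.sqrt(np.clip(i - j, 0, None) * h)       # sqrt(s_i - t_j)
    return np.tril(2.0 * (left - right) / np.sqrt(np.pi))

n = 200
s = np.arange(1, n + 1) / n
A = abel_matrix(n)
T1 = A @ np.ones(n)        # exact for constants: (T1)(s) = 2 sqrt(s / pi)
T2_1 = A @ T1              # approximates (T^2 1)(s) = s, cf. (5.5)
```

Applying the matrix twice to the constant function recovers s up to a discretization error of order 1/n, so two applications of T act like one integration and inverting T behaves like half a derivative.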
Consider the Hilbert scale induced by
\[
L^{2s} x = \sum_{n=0}^\infty \lambda_n^s \langle x, x_n \rangle\, x_n, \qquad x_n(t) = \sqrt{2}\, \sin(\lambda_n (1-t)), \qquad \lambda_n = (n + 1/2)\pi, \tag{5.7}
\]
over $X = L^2[0,1]$ with $D(L^2) = X_2 = \{x \in H^1[0,1] : x(1) = 0\}$. Then one can show
that $R(T^*T) \subset X_r$ for all r < 2, and Proposition 3.5 and Corollary 3.8 yield that (L2)
holds for a = 1 − ε with any ε > 0.
This allows the choice s = −(1−ε)/2, and hence the iterations can be preconditioned
with $L^{1-\varepsilon}$, which for small ε essentially corresponds to differentiation of order 1/2, and
can be realized efficiently via (5.7) and FFT.
As a numerical test, we try to identify the density
\[
x^\dagger(t) = 2t - \operatorname{sign}(2t-1) - 1
\]
from noisy measurements of $y = Tx^\dagger$ and an initial guess $x_0 = 0$. We set s = −1, which
is the limiting case of allowed choices.
With the Hilbert scale as above one has $x^\dagger \in X_u$ for all u < 3/2; thus we expect
the following iteration numbers: $k_* \sim \delta^{-1}$ for Landweber iteration, $k_* \sim \delta^{-1/2}$ for the
ν-method and the preconditioned Landweber iteration, and $k_* \sim \delta^{-1/4}$ for the
Hilbert-scale ν-method. Since the ν-methods have finite qualification $\mu_0 = \nu$, Theorem 4.10
then guarantees optimal convergence rates only for
\[
\nu = \mu_0 \ge \frac{u-s}{2(a+s)} + \frac{1}{2} = 2,
\]
(cf. Proposition 4.8) when the iteration is stopped with the discrepancy principle (1.14).
Therefore we have to choose ν ≥ 2 in this example in order to guarantee optimal
convergence rates for the Hilbert scale ν-method. By (5.6) it follows that the singular values
of T decay like $\sigma_n \sim n^{-1/2}$, which in view of Theorem 2.4 yields $k_* \sim \delta^{-1/3}$ for cgne.
Note that we expect even fewer iterations for the preconditioned ν-method. Finally, by
Theorem 4.19 we have $k_* \sim \delta^{-1/5}$ for hscgne. The iteration numbers realized in our
numerical tests are listed in Table 5.3.
The numerically observed convergence rate for the iteration error $\|e_{k_*}^\delta\|$ is
approximately $O(\delta^{0.45})$ for all methods.
δ/‖y‖ lw hs-lw ν hs-ν cgne hs-cgne
0.04 26 6 14 6 4 2
0.02 53 8 20 7 5 3
0.01 103 12 28 9 6 4
0.005 224 19 42 11 9 5
0.002 609 34 69 16 14 7
0.001 1342 58 103 21 19 9
η -1.07 -0.62 -0.55 -0.34 -0.43 -0.39
Table 5.3: Iteration numbers for iterative regularization methods and their Hilbert-scale
equivalents and the corresponding rates k∗ = O(δη); parameters τ = 2.1, ν = 2 and
Nt = 500.
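The exponents η in the last row of such a table can be estimated from the tabulated iteration numbers. The thesis does not say how η was obtained; the sketch below assumes an ordinary least-squares fit of log k∗ against log δ, with the helper name `fitted_exponent` chosen here. The data are the lw and hs-lw columns of Table 5.3.

```python
import numpy as np

def fitted_exponent(noise_levels, values):
    """Least-squares slope of log(values) against log(noise_levels)."""
    slope, _intercept = np.polyfit(np.log(noise_levels), np.log(values), 1)
    return slope

delta = np.array([0.04, 0.02, 0.01, 0.005, 0.002, 0.001])   # relative noise levels
k_lw = np.array([26, 53, 103, 224, 609, 1342])              # Table 5.3, column lw
k_hs_lw = np.array([6, 8, 12, 19, 34, 58])                  # Table 5.3, column hs-lw

eta_lw = fitted_exponent(delta, k_lw)
eta_hs_lw = fitted_exponent(delta, k_hs_lw)
print(f"eta(lw) = {eta_lw:.2f}, eta(hs-lw) = {eta_hs_lw:.2f}")
```

For these two columns the fitted slopes agree with the tabulated values −1.07 and −0.62 to within rounding. With only six data points such exponents are sensitive to individual entries, so small deviations from the predicted rates should not be over-interpreted.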
5.1.3 An Inverse Problem in Imaging: Deblurring
Many (mostly linear) inverse problems appear in signal and image processing, e.g.,
image restoration, denoising, inpainting, or deblurring. We briefly discuss the latter
(cf. [11]):
In complex imaging systems, blurring may occur at several places: e.g., if an image
is taken in motion (linear motion blur), if an object is out of focus, by diffraction
effects of the medium (atmospheric blurring) or of the optical system (diffraction-limited
systems). The mathematical model of blurring is given by
\[
g(x) = (Tf)(x) = \int_{\mathbb{R}^n} K(x, x')\, f(x')\, dx', \qquad x, x' \in \mathbb{R}^n, \tag{5.8}
\]
where f and g are the original and the blurred image, respectively, and K(x, x') is the impulse
response function, also called point spread function. The effect of the recording of an
image can be modeled as an additive noise contribution; thus instead of g only a noisy
version of the image $g^\delta = g + w^\delta$ is recorded, and instead of (5.8) one has
\[
g^\delta(x) = \int_{\mathbb{R}^n} K(x, x')\, f(x')\, dx' + w^\delta. \tag{5.9}
\]
Here, $w^\delta$ denotes the noise contribution. In case the imaging system is spatially
invariant, (5.9) reduces to a convolution equation $g^\delta = K * f + w^\delta$, i.e.,
\[
g^\delta(x) = \int_{\mathbb{R}^n} K(x - x')\, f(x')\, dx' + w^\delta, \tag{5.10}
\]
where K(x) = K(x, 0), and the Fourier transform $\hat K$ of the point spread function is called the
transfer function of the system. Here we used the following definition of the Fourier
transform:
\[
\hat f(\omega) = \mathcal{F}(f)(\omega) := \int_{\mathbb{R}^n} e^{-i\omega\cdot x}\, f(x)\, dx.
\]
With this scaling, the inverse Fourier transform is given by
\[
f(x) = (\mathcal{F}^{-1}\hat f)(x) = \frac{1}{(2\pi)^n} \int_{\mathbb{R}^n} e^{ix\cdot\omega}\, \hat f(\omega)\, d\omega,
\]
and by the Fourier convolution theorem, (5.10) is then equivalent to
\[
\hat g^\delta(\omega) = \hat K(\omega)\, \hat f(\omega) + \hat w^\delta(\omega).
\]
In case of a perfect recording, i.e., $w^\delta = 0$, the reconstruction of the image f from the
blurred image g can be calculated explicitly using the Fourier transformation, namely
by
\[
f = \mathcal{F}^{-1}\Bigl(\frac{\hat g}{\hat K}\Bigr).
\]
If, however, $\|w^\delta\| \ne 0$, then the explicit reconstruction formula
\[
f^\delta = \mathcal{F}^{-1}\Bigl(\frac{\hat g^\delta}{\hat K}\Bigr)
\]
is unstable, i.e., high frequency components of the data error $w^\delta$ are amplified arbitrarily
if $\hat K(\omega) \to 0$ as |ω| → ∞, which is usually the case.
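The instability of the explicit formula is easy to reproduce. The following sketch uses a one-dimensional periodic analogue with the discrete Fourier transform instead of the transform on R^n. The test signal, the Gaussian transfer function with constant c = 1e-4, the noise amplitude and the helper name `naive_deconvolution` are all assumptions chosen for illustration.

```python
import numpy as np

n = 128
x = np.arange(n) / n
f = np.sign(x - 0.5)                                   # 1D periodic test image (assumption)
omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / n)     # angular frequencies of the grid
K_hat = np.exp(-1e-4 * omega**2)                       # Gaussian transfer function, c = 1e-4 assumed

g = np.fft.ifft(K_hat * np.fft.fft(f)).real            # blurred image: g_hat = K_hat * f_hat

rng = np.random.default_rng(0)
w = 1e-3 * rng.uniform(-1.0, 1.0, size=n)              # small recording noise w^delta

def naive_deconvolution(data):
    """Unregularized reconstruction F^{-1}(data_hat / K_hat)."""
    return np.fft.ifft(np.fft.fft(data) / K_hat).real

err_clean = np.linalg.norm(naive_deconvolution(g) - f) / np.linalg.norm(f)
err_noisy = np.linalg.norm(naive_deconvolution(g + w) - f) / np.linalg.norm(f)
```

Without noise the division recovers f up to rounding errors, while noise of relative size about 1e-3 is amplified by the factor 1/K̂ at the highest frequencies and makes the reconstruction useless. This is the behavior that regularization, e.g., by early stopping of an iteration, is meant to control.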
We now consider blurring by the atmosphere in more detail (cf. [11, Section 3.5]):
Atmospheric Turbulence Blur
Atmospheric blurring is caused by inhomogeneities in the refraction index of the
air, which result from turbulent velocity fluctuations and the associated statistical temperature
field in the atmosphere. The effect of atmospheric blurring is important for imaging
through an atmosphere, e.g., in optical and radio astronomy, in remote sensing, or in
target identification. In optical astronomy, the overall transfer function of the imaging
system (telescope + atmosphere) is given by
\[
\hat K(\omega) = \hat H(\omega)\, \hat B(\omega),
\]
where $\hat H$, $\hat B$ are the transfer functions of the optical system (the telescope) and the
atmosphere, respectively. Furthermore, $\hat B$ has approximately the form (cf. [11])
\[
\hat B(\omega) = \exp\Bigl[-3.44 \Bigl(\frac{\lambda |\omega|}{r_0}\Bigr)^{5/3}\Bigr],
\]
where λ is the wavelength of the observed radiation and $r_0$ is a parameter called critical
wavelength. As suggested in [11], the transfer function of the atmospheric turbulence
blur can be roughly approximated by a Gaussian, i.e.,
\[
\hat B(\omega) \simeq \exp(-c|\omega|^2).
\]
Note the relation to heat transfer, i.e., B is the fundamental solution of the heat equation
evaluated for some fixed time lag t, and thus the blurring operator
\[
T : f \mapsto B * f = u
\]
essentially coincides with the solution operator of the heat equation
\[
u_t = c\,\Delta u, \qquad u(\cdot, 0) = f, \tag{5.11}
\]
and evaluation at time $t_1$, where c has to be chosen appropriately.
For a numerical test, we consider the following domain restricted version of (5.11):
Example 5.3 Consider the operator $T : L^2[0,1] \to L^2[0,1]$ defined by $Tf = u(\cdot, t_1)$,
where u is the solution of
\[
-u_t + u_{xx} = 0, \qquad u(0, t) = u(1, t) = 0, \qquad u(x, 0) = f(x). \tag{5.12}
\]
The operator T is selfadjoint, with eigenvalues $\lambda_n = e^{-n^2\pi^2 t_1}$ and associated
eigenfunctions $\psi_n = \sqrt{2}\, \sin(n\pi\,\cdot)$. Consequently, the inverse problem of solving Tf = u is
exponentially ill-posed.
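The exponential decay of the eigenvalues can be made concrete by counting how many of them exceed a given noise level. The sketch below is an illustration with hypothetical helper names. The values c = 0.01 and t1 = 1 are those quoted for the numerical test, and the way the diffusion constant c enters the exponent is an assumption based on (5.11).

```python
import numpy as np

def heat_eigenvalues(n_max, t1=1.0, c=1.0):
    """Eigenvalues exp(-c n^2 pi^2 t1), n = 1..n_max, of the solution operator f -> u(., t1)."""
    n = np.arange(1, n_max + 1)
    return np.exp(-c * (n * np.pi) ** 2 * t1)

def modes_above(level, t1=1.0, c=1.0, n_max=1000):
    """Number of eigenvalues that are still larger than the given level."""
    return int(np.sum(heat_eigenvalues(n_max, t1, c) > level))

# c enters the exponent as in u_t = c u_xx (assumption about how (5.12) is scaled)
resolved_1e3 = modes_above(1e-3, t1=1.0, c=0.01)
resolved_1e6 = modes_above(1e-6, t1=1.0, c=0.01)
```

With these values 8 eigenvalues lie above 1e-3 and 11 above 1e-6, so reducing the noise level by a factor of one thousand makes only three additional modes accessible. This slow gain is what is meant by exponential ill-posedness and is the reason why only logarithmic convergence rates can be expected.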
According to Remark 4.12, the optimal preconditioner is given by $L^{-2s} = (T^*T)^{-1/2}$.
For symmetric problems, the application of the operators $L^{-2s}$ and $T^*$ cancel each other,
yielding the corresponding iterations for symmetric problems, e.g., standard Richardson
or cg, which are included in our theory.
For non-symmetric problems, the application of $(T^*T)^{-1/2}$ is in general too
complicated and therefore other preconditioners are required. In order to get a feeling for
the performance of preconditioning in Hilbert scales induced by differential operators,
but applied to (not necessarily symmetric) exponentially ill-posed problems, we
alternatively investigate the preconditioning with
\[
Lf = \sum_{n=1}^\infty n\pi\, \langle f, \psi_n \rangle\, \psi_n, \qquad \psi_n = \sqrt{2}\, \sin(n\pi\,\cdot)
\]
over $X = L^2[0,1]$ and with $X_1 = H_0^1[0,1]$. This choice implies that for all 0 ≤ a < 2.5
there exists an $m_r > 0$ such that
\[
\|Tf\| \le m_r\, \|f\|_{-a}.
\]
Thus, (L2) holds for arbitrary 0 ≤ a < 2.5. On the other hand, an estimate from below of the
form (3.11) cannot be satisfied for any a.
We want to mention that a source condition $L^s f^\dagger \in R((B^*B)^\mu)$ or $f^\dagger \in R((T^*T)^\mu)$
for some µ > 0 is of course very strong, i.e., it means that $f^\dagger$ has to be analytic. Thus,
for exponentially ill-posed problems usually logarithmic source conditions are used, and
only logarithmic convergence rates can be expected (cf. [41, 42]). It would be interesting
to extend our theory also to this case.
As a concrete numerical test, we set
\[
f^\dagger(x) := 2x - \operatorname{sign}(2x-1) - 1,
\]
and consider the reconstruction of $f^\dagger$ from noisy measurements of u(·, 1), where u satisfies
(5.12) with c = 0.01. As initial guess, we choose $f_0 = 0$. In Table 5.4, we list the iteration
numbers of the numerical reconstructions for the preconditioned iterations and compare
them with the results for the symmetric iterations (= optimally preconditioned with
$(T^*T)^{-1/2}$).
According to Theorem 7.14 in [28], the stopping index for the conjugate gradient
method can be bounded by k(δ, yδ) ≤ c(1+ | log δ|) for exponentially ill-posed problems,
if the singular values σn of T decay like O(qn) with some q < 1, which is the case in our
example. We conjecture that a similar bound also holds for the symmetric iteration. This
then explains that the iteration numbers for cg and hscgne are almost independent
of the noise level.
δ/‖y‖ lw hs-lw ν hs-ν cgne hs-cgne
0.04 10 8 5 3 2 2
0.02 18 11 6 4 2 2
0.01 29 14 7 5 3 2
0.005 230 44 17 12 5 3
0.002 777 77 56 20 5 3
0.001 1212 93 86 24 5 3
η -1.43 -0.84 -0.73 -0.61 -0.30 -0.14
Table 5.4: Iteration numbers for iterative regularization methods and their Hilbert-scale
equivalents and the corresponding rates $k_* = O(\delta^\eta)$; parameters τ = 2.1, ν = 2 and
Nt = 500.
The observed convergence rates are about $\|e_{k_*}^\delta\| \sim \delta^{0.05}$ for all methods. We only
mention that the numerically observed rates decreased further when we continued the
test with smaller noise levels, which is in accordance with the theoretically predicted
logarithmic rates. For a (rather low) noise level of δ = 0.1%, the relative error $\|e_{k_*}^\delta\| / \|x^\dagger\|$
of the reconstructions is still about 42%; since the problem is exponentially ill-posed,
such a poor reconstruction has to be expected, however. As the above example suggests,
even exponentially ill-posed problems can be efficiently preconditioned by differential
operators.
5.1.4 A Volterra-Hammerstein Integral Equation
Hammerstein integral equations appear frequently in the modeling of dynamical sys-
tems, e.g., in biology. The following equation of the first kind is a special case from an
example discussed in [68]:
Example 5.4 Let $F : H^1[0,1] \to L^2[0,1]$ be defined by
\[
(F(x))(s) = \int_0^s x(t)^2\, dt.
\]
The adjoint of the Fréchet derivative is then given by
\[
F'(x)^* w = 2A^{-1}\Bigl[ x(\cdot) \int_{\cdot}^1 w(t)\, dt \Bigr],
\]
where $A : D(A) = \{\psi \in H^2[0,1] : \psi'(0) = \psi'(1) = 0\} \to L^2[0,1]$ is defined by
$A\psi := -\psi'' + \psi$; note that $A^{-1}$ is the adjoint of the embedding operator from $H^1[0,1]$
into $L^2[0,1]$. Assuming that $x^\dagger \ge \gamma > 0$ a.e., we get
\[
R(F'(x^\dagger)^*) = \{ w \in H^3[0,1] : w'(0) = w'(1) = 0,\ w(1) = w''(1) \},
\]
and
\[
\|F'(x^\dagger)^* w\|_{H^3} \sim \|w\| \qquad \text{for all } w \in Y.
\]
As a Hilbert scale we choose the one induced by $L^2 x := -x'' + x$ over the space
$X = H^1[0,1]$ with $X_1 = \{x \in H^2[0,1] : x'(0) = x'(1) = 0\}$. With this choice, we have
\[
R(F'(x^\dagger)^*) \subset X_2
\]
and hence, by Proposition 3.6, (N3) holds with a = 2. Therefore, we set s = −1 (to be
rigorous, one would have to choose s = −1 + ε for some ε > 0), which yields
\[
L^{-2s} F'(x)^* w = 2x(\cdot) \int_{\cdot}^1 w(t)\, dt; \tag{5.13}
\]
in particular, we have
\[
F'(x) = R(x, x^\dagger)\, F'(x^\dagger)
\]
with
\[
\|R(x, x^\dagger) - I\| \le C\, \|x - x^\dagger\|_0 \le c\, |\!|\!| x - x^\dagger |\!|\!|_0,
\]
which proves (N6) with $C_Q = 0$ and $C_R$ arbitrarily small if x is sufficiently close to $x^\dagger$.
Note that in this example the application of the Hilbert scale operator $L^{-2s}$ in fact
makes the iteration even simpler, i.e., application of $A^{-1}$, which is the main numerical
effort in calculating $F'(x)^*$ for Landweber iteration, can be avoided.
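Formula (5.13) states that, after preconditioning, only the adjoint of F'(x) with respect to the L2 inner product has to be applied. A discrete version of this adjoint relation can be verified directly. The sketch below assumes a right-endpoint quadrature on a uniform grid and random test functions; the function names are illustrative and not taken from the thesis.

```python
import numpy as np

n = 400
h = 1.0 / n
rng = np.random.default_rng(1)

def F(x):
    """(F(x))(s_i) ~ int_0^{s_i} x(t)^2 dt via a right-endpoint rule on a uniform grid."""
    return h * np.cumsum(x**2)

def dF(x, v):
    """Directional derivative F'(x) v = 2 int_0^s x(t) v(t) dt (same quadrature)."""
    return 2.0 * h * np.cumsum(x * v)

def dF_adjoint_L2(x, w):
    """Right-hand side of (5.13): 2 x(t) int_t^1 w(r) dr, the discrete L2-adjoint of dF."""
    return 2.0 * x * (h * np.cumsum(w[::-1])[::-1])

x = 1.0 + rng.uniform(0.0, 1.0, n)     # arbitrary positive test function (assumption)
v = rng.uniform(-1.0, 1.0, n)
w = rng.uniform(-1.0, 1.0, n)

lhs = h * np.dot(dF(x, v), w)              # <F'(x) v, w> in discrete L2
rhs = h * np.dot(v, dF_adjoint_L2(x, w))   # <v, 2 x int_.^1 w> in discrete L2

eps = 1e-6
fd = (F(x + eps * v) - F(x - eps * v)) / (2 * eps)   # central difference of the quadratic map F
```

Both inner products coincide up to rounding, and the central difference reproduces `dF` because F is quadratic. The preconditioned update therefore needs one cumulative sum and one pointwise multiplication per step, and no solve with the operator A.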
We note that the choice s = −a/2 is actually not allowed in our theory, when
considering convergence rates in X, since we require 0 < u ≤ a + 2s. However, as we
already remarked above, our results actually hold in different norms, e.g., in $X_r$ for
s ≤ r < 0 or in $X^s_r$ for −a ≤ r < 0. In particular, for r = −1 and u = 0, Theorem 4.27
and Remark 4.28 yield $X_{-1} = X^s_{-1} = L^2[0,1]$, and
\[
\|x^\dagger - x_{k_*}^\delta\|_{L^2} = \|x^\dagger - x_{k_*}^\delta\|_{-1} = O\bigl(\delta^{\frac{u-r}{a+u}}\bigr) = O(\delta^{1/2}).
\]
According to Proposition 4.4 and Remark 4.5, the condition $x^\dagger \in X_u$ does not
automatically imply $x^\dagger \in X^s_u$ for u > s. Thus, a condition $x^\dagger \in X_u$ may be too weak to get
the expected convergence rates for the Hilbert scale iterations. We will illustrate this in
one of the numerical examples below. In our test, we compare Landweber iteration, the
Newton-Landweber and the Newton-ν method with their preconditioned equivalents.
Test 1. For the first test, we set $x^\dagger(t) := 3/2 - \sqrt{|2t-1|}$, and $x_0 = 1/2$. Note that
$x^\dagger - x_0 \notin H^1[0,1]$, but only in $H^{1-\varepsilon}[0,1]$ for arbitrary ε > 0. This can be seen in
the following way: one shows easily that $(x^\dagger)' \in L^p[0,1]$ for 1 ≤ p < 2. This yields
$x^\dagger \in W^1_p[0,1]$ and by standard Sobolev embedding theorems [1], $x^\dagger - x_0 \in H^{1-\varepsilon}$ for
ε > 0. Thus we can expect convergence rates at most in spaces $X_r$ with −a ≤ r < 0,
e.g., in $X_{-1} = L^2[0,1]$. In fact, also in the numerical tests, the error measured in the
norm of $X = H^1[0,1]$ decreases only very slowly, while still good rates for $\|e_{k_*}^\delta\|_{-1}$ can be
observed for all methods (see Table 5.5). By Remark 4.28, one would expect the rates
$O\bigl(\delta^{\frac{u-r}{a+u}}\bigr) = O(\delta^{1/2})$ in the norm of $X_{-1} = X^s_{-1}$ with s = −1 in this case.
δ/‖y‖ lw hs-lw new-lw hs-new-lw new-ν hs-new-ν
0.02 0.3174 0.2077 0.2081 0.1115 0.2825 0.09871
0.01 0.1894 0.1390 0.1360 0.1101 0.1256 0.09671
0.005 0.1249 0.09659 0.112 0.07497 0.09839 0.05227
0.002 0.0821 0.06422 0.07735 0.05293 0.07421 0.05019
0.001 0.0548 0.04455 0.04854 0.03641 0.04660 0.02600
κ 0.57 0.50 0.45 0.40 0.53 0.43
Table 5.5: Iteration errors $\|e_{k_*}^\delta\|_{-1} = \|e_{k_*}^\delta\|_{L^2}$ for iterative regularization methods and
their Hilbert-scale equivalents and the corresponding rates $\|e_{k_*}^\delta\|_{-1} = O(\delta^\kappa)$; parameters
τ = 2.1, ν = 2 and Nt = 501.
Test 2. With the second test example, which is taken from [68], we want to
demonstrate that preconditioning can disturb the convergence behavior if boundary conditions
are not taken into account properly: let
\[
x^\dagger(t) = t + 10^{-6}\,(196145 - 41286\,t^2 + 19775\,t^4 + 70\,t^6 + 436\,t^7),
\]
and $x_0 = t$. It was shown in [68] that standard Landweber iteration yields a convergence
rate of $\|x_k^\delta - x^\dagger\| = O(\delta^{1/2})$. For the Hilbert scale iteration with s = −1 we cannot
guarantee convergence in $X_0 = H^1[0,1]$, since it follows from (5.13) that
\[
X^s_0 \subset \{x \in H^1[0,1] : x(1) = 0\}
\]
and thus $x^\dagger - x_0 \notin X^s_0$. In fact, the preconditioned iterations do not converge in $H^1[0,1]$,
while the standard iterations do. However, Theorem 4.27 and Remark 4.28 yield
convergence rates for the preconditioned iterations at least in $X_s = X_{-1}$, which are also
observed numerically (see Table 5.6).
δ/‖y‖ lw hs-lw new-lw hs-new-lw new-ν hs-new-ν
0.02 0.1249 0.4520 0.1719 0.4406 0.1473 0.4235
0.01 0.06333 0.4229 0.04090 0.4090 0.03929 0.4266
0.005 0.06435 0.3286 0.04041 0.3136 0.03840 0.2712
0.002 0.04914 0.2671 0.04107 0.2459 0.03953 0.2205
0.001 0.02433 0.2210 0.01630 0.2148 0.01210 0.2166
κ 0.46 0.24 0.60 0.25 0.64 0.26
Table 5.6: Iteration errors $\|e_{k_*}^\delta\|_{-1} = \|e_{k_*}^\delta\|_{L^2}$ for iterative regularization methods and
their Hilbert-scale equivalents and the corresponding rates $\|e_{k_*}^\delta\|_{-1} = O(\delta^\kappa)$; parameters
τ = 2.1, ν = 2 and Nt = 501.
Note, that as predicted by the theory the convergence rates (in X−1) for the precon-
ditioned iterations are significantly smaller than the ones for the standard iterations.
Additionally, the iteration numbers even increase due to the (wrong) preconditioning
(see Table 5.7). This behavior should illustrate that boundary conditions play an im-
portant role in regularization theory.
δ/‖y‖ lw hs-lw new-lw hs-new-lw new-ν hs-new-ν
0.02 3 4 1(5) 1(5) 1(5) 2(15)
0.01 4 7 2(15) 2(15) 2(15) 2(15)
0.005 4 36 2(15) 4(75) 2(15) 4(75)
0.002 6 120 2(15) 6(315) 2(15) 5(155)
0.001 84 338 6(315) 7(635) 4(75) 5(155)
η -0.93 -1.54 -1.07 -1.67 -0.70 -0.92
Table 5.7: Iteration numbers (outer and total inner iterations) for iterative regu-
larization methods and their Hilbert-scale equivalents and the corresponding rates
k∗ = O(δη); parameters τ = 2.1, ν = 2 and Nt = 501.
5.2 Parameter Identification in Elliptic and Para-
bolic PDEs
The second block of examples is concerned with parameter identification, which is a
main field of inverse (and ill-posed) problems. Here, we restrict ourselves to problems
connected with partial differential equations and point out various applications.
5.2.1 An Inverse Source Problem in an Elliptic Equation
As a first model problem, we consider the identification of a source term in an elliptic
equation from distributed measurements, a linear problem that is closely related to
differentiation.
Example 5.5 Let Ω be a bounded domain in $\mathbb{R}^n$, n = 2, 3, with sufficiently smooth
boundary (e.g., $\partial\Omega \in C^{1,1}$, or $\partial\Omega \in C^{0,1}$ and Ω convex), or let Ω be a parallelepiped. We
consider the operator $T : L^2(\Omega) \to L^2(\Omega)$ defined by Tf = u, with
\[
Au := -\nabla \cdot (q \nabla u) + p \cdot \nabla u + cu = f, \qquad u|_{\partial\Omega} = 0, \tag{5.14}
\]
and given, sufficiently smooth parameters q, p and c. Assume that A is uniformly elliptic;
then a solution u of (5.14) lies in $H^2(\Omega) \cap H_0^1(\Omega)$ and satisfies $\|u\|_{H^2} \sim \|f\|_{L^2}$, i.e., A
is an isomorphism between $H^2(\Omega) \cap H_0^1(\Omega)$ and $L^2(\Omega)$.
For preconditioning, we consider the Hilbert scale induced by $L^2 u = -\Delta u$ over the
space $X = L^2(\Omega)$ with $X_2 = H^2(\Omega) \cap H_0^1(\Omega)$. Then we have $T \sim L^{-2}$, and thus (L2)
holds with a = 2. Moreover, the stronger condition (3.15) holds.
For a numerical test, we set $\Omega = [0,1]^2$, q = c = 1, p = 0, s = −a/2 = −1, and try
to identify the function
\[
f^\dagger = \operatorname{sign}(x - 0.5) \cdot \operatorname{sign}(y - 0.5)
\]
from $f_0 = 0$ as a starting value. In this setting, we have $f^\dagger - f_0 \in R((T^*T)^\mu)$ for all
0 ≤ µ < 1/8, or equivalently, $f^\dagger \in X^s_r$ for all r < 1/2; thus the expected iteration
numbers are $k_* \sim \delta^{-8/5}$ for Landweber iteration, $k_* \sim \delta^{-4/5}$ for Landweber iteration
in Hilbert scales and the ν-methods, and $k_* \sim \delta^{-2/5}$ for the Hilbert scale ν-method.
Observing that the singular values of T behave like $\sigma_n = O(1/n)$, we obtain that the
stopping index for cgne can be bounded by $k_* \le O(\delta^{-2/5})$, while for hscgne the
smaller bound $k_* = O(\delta^{-4/15})$ holds. The iteration numbers realized in the numerical
tests are listed in Table 5.8. We want to mention that the rates for the iteration numbers
do not exactly match the predictions, which, in our opinion, is primarily due
to the very low number of iterations, i.e., the rates cannot be determined very accurately
from only 5 points.
δ/‖uδ − u0‖ lw hs-lw ν hs-ν cgne hscgne
0.04 88 10 26 7 6 2
0.02 332 18 50 11 9 5
0.01 963 34 87 15 14 6
0.005 3089 59 155 20 18 8
0.002 9342 123 316 29 35 11
η -1.55 -0.84 -0.83 -0.46 -0.57 -0.52
Table 5.8: Iteration numbers for iterative regularization methods and their Hilbert-scale
equivalents and the corresponding rates k∗ = O(δη); parameters τ = 2.1, ν = 2 and
Np = 40257.
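The effect of the preconditioner on the Landweber iteration can be reproduced in a small spectral model of this example. The sketch below is not the finite element computation behind Table 5.8. It assumes that, for q = c = 1 and p = 0 on the unit square, T and L are both diagonal in the sine basis, that the preconditioned update reads x ← x + ω L^{-2s} T^*(yδ − Tx) with a scaling ω chosen here so that no mode is amplified, and that the iteration is stopped by the discrepancy principle with τ = 2.1. The grid size, seed and function names are illustrative.

```python
import numpy as np

def sine_matrix(n):
    """Orthogonal discrete sine matrix on the interior nodes j/(n+1), j = 1..n."""
    j = np.arange(1, n + 1)
    return np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * np.outer(j, j) / (n + 1))

def stopping_index(sigma, precond, y_delta, delta, tau=2.1, max_iter=200_000):
    """Iterate x <- x + w * precond * sigma * (y_delta - sigma * x) in the eigenbasis and
    return the first k with ||y_delta - sigma * x_k|| <= tau * delta."""
    w = 1.0 / np.max(precond * sigma**2)        # scaling so that no mode is amplified
    x = np.zeros_like(y_delta)
    for k in range(max_iter + 1):
        residual = y_delta - sigma * x
        if np.linalg.norm(residual) <= tau * delta:
            return k
        x = x + w * precond * sigma * residual
    raise RuntimeError("discrepancy principle not reached within max_iter")

n = 31
t = np.arange(1, n + 1) / (n + 1)
S = sine_matrix(n)
k2 = np.arange(1, n + 1) ** 2
lap = np.pi**2 * (k2[:, None] + k2[None, :])    # eigenvalues of L^2 = -Laplace on [0,1]^2
sigma = 1.0 / (lap + 1.0)                       # eigenvalues of T = (-Laplace + 1)^{-1}

f_true = np.sign(t[:, None] - 0.5) * np.sign(t[None, :] - 0.5)
y = sigma * (S @ f_true @ S)                    # exact data in sine coefficients

rng = np.random.default_rng(0)
noise = rng.uniform(-1.0, 1.0, size=y.shape)
delta = 0.02 * np.linalg.norm(y)                # 2 % relative noise
y_delta = y + delta * noise / np.linalg.norm(noise)

k_lw = stopping_index(sigma, np.ones_like(lap), y_delta, delta)   # standard Landweber
k_hs = stopping_index(sigma, lap, y_delta, delta)                 # preconditioned, L^{-2s} = L^2
```

In this diagonal model every mode is damped at least as strongly by the preconditioned iteration as by the standard one, so `k_hs` cannot exceed `k_lw`. How large the gap is depends on the noise level and on the data, as in the table above.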
As predicted by the theory, the iteration error behaves like $\|f_{k_*}^\delta - f^\dagger\| \sim \delta^{1/5}$ for all
methods. We have chosen ν = 2 here to guarantee optimal convergence rates for the
preconditioned ν-method: note that ν ≥ 3/2 is necessary to apply Theorem 4.10 for
u = 2aµ = 1/2.
In the presented 2D example, the bound on the iteration numbers for cgne is of
the same order as the one for the preconditioned ν-method. In 3D, the situation is
different: there we have $\sigma_N \sim N^{-2/3}$, which yields $k_* = O(\delta^{-12/25})$ for cgne, while the
rate for the Hilbert scale ν-method is not affected; hence the preconditioned ν-methods
will even outperform cgne in 3D.
5.2.2 Identifying a Reaction Term
The following examples of parameter identification problems in elliptic and
parabolic PDEs are nonlinear inverse problems, and additional instabilities may arise
due to the nonlinearity.
Example 5.6 In this example, which is taken from [39], we try to identify the
parameter c in
\[
\begin{aligned}
-\Delta u + cu &= f && \text{in } \Omega, \\
u &= g && \text{on } \partial\Omega,
\end{aligned} \tag{5.15}
\]
from distributed measurements of the state u. Here, Ω is an interval in $\mathbb{R}$ or a bounded
domain in $\mathbb{R}^2$ or $\mathbb{R}^3$ with smooth boundary (or a parallelepiped). The right hand side
is assumed to satisfy $f \in L^2(\Omega)$ and the boundary data $g \in H^{3/2}(\partial\Omega)$. If u were
known exactly, then one could reconstruct c by
\[
c = \frac{f + \Delta u}{u}, \tag{5.16}
\]
which in case of noisy measurements $u^\delta$ is unstable due to differentiation. Formula (5.16) already
reveals another possible source of instability, namely division by u, which may cause
noise amplification where u is close to zero. Note that if u = 0 on a subdomain, then c
is not uniquely determined by (5.15) there.
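The instability of the explicit formula (5.16) can be seen in a one-dimensional experiment. The sketch below is an illustration under assumptions made here: Ω = (0, 1), a constant reaction term c = 1, a manufactured state u = 2 + sin(πx) that stays away from zero, second differences for the derivative and uniform noise of amplitude 1e-3.

```python
import numpy as np

n = 200
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)

c_true = np.ones_like(x)                       # constant reaction term (assumption)
u = 2.0 + np.sin(np.pi * x)                    # chosen state, bounded away from zero
f = np.pi**2 * np.sin(np.pi * x) + c_true * u  # f = -u'' + c u for this choice of u

def reconstruct_c(u_data):
    """Evaluate c = (f + u'') / u at the interior nodes, with a second difference for u''."""
    u_xx = (u_data[2:] - 2.0 * u_data[1:-1] + u_data[:-2]) / h**2
    return (f[1:-1] + u_xx) / u_data[1:-1]

rng = np.random.default_rng(2)
u_delta = u + 1e-3 * rng.uniform(-1.0, 1.0, size=u.shape)   # noisy measurement of the state

err_exact = np.max(np.abs(reconstruct_c(u) - c_true[1:-1]))
err_noisy = np.max(np.abs(reconstruct_c(u_delta) - c_true[1:-1]))
```

With exact data the formula recovers c up to the discretization error of the second difference. A perturbation of size 1e-3 in u is multiplied by a factor of order 1/h² and destroys the reconstruction, even though u is far from zero here.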
We consider the inverse problem as abstract operator equation, and define the non-
linear operator
F : D(F ) ⊂ L2(Ω)→ L2(Ω)
by F (c) = u(c), where u = u(c) is the solution of (5.15) with parameter c. One can show
(cf. [17]) that the parameter-to-solution map F is well-defined and Frechet differentiable
on
D(F ) := c ∈ L2(Ω) : ‖c− c‖ ≤ γ for some c ≥ 0 a.e.where u(c) denotes the solution of (5.15), and γ > 0 has to be sufficiently small. By
standard arguments one can show that
F ′(c)∗w = u(c)A(c)−1w,
where A(c) : H2(Ω) ∩H10 (Ω)→ L2(Ω) is defined by A(c)u = −∆u+ cu.
Next we choose an appropriate Hilbert scale, namely the one induced by $L^2 = -\Delta$ (the square of the operator L) over $X = L^2(\Omega)$ with $X_2 := H^2(\Omega) \cap H^1_0(\Omega)$. This yields
$$
R(F'(c)^*) \subset X_2,
$$
which already proves (N3). If furthermore $u^\dagger \ge \gamma > 0$ a.e., then we even have
$$
\|F'(c^\dagger)^* w\|_2 \sim \|w\|_0,
$$
and according to Remarks 4.5 and 4.23, (N7) can be interpreted in terms of the spaces $X_u$. In order to show (N6), we use that
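$$
[F'(c)^* - F'(d)^*]\,w = u(c)\,[A(c)^{-1} - A(d)^{-1}]\,w + [u(c) - u(d)]\,A(d)^{-1} w =: r_1 + r_2 .
$$
The terms $r_1$ and $r_2$ can be further estimated by
$$
\|r_1\|_2 \le C\,\|u(c)\|_{H^2}\,\|[A(c)^{-1} - A(d)^{-1}]\,w\|_{H^2}
\le C_1\,\|u(c)\|_{H^2}\,\|c - d\|_{L^2}\,\|w\|_{L^2}
$$
and
$$
\|r_2\|_2 \le \|u(c) - u(d)\|_{H^2}\,\|A(d)^{-1} w\|_{H^2}
\le C_2\,\|c - d\|_{L^2}\,\|w\|_{L^2} .
$$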
Here we have used that A(c) is an isomorphism between $L^2(\Omega)$ and $H^2(\Omega) \cap H^1_0(\Omega)$. If $u^\dagger \ge \gamma > 0$, this yields (N6) with $C_Q = 0$ and $C_R \le C\,\|c_0 - c^\dagger\|$, and thus the preconditioned Landweber iteration as well as the Newton-type methods can be applied.
For our numerical test, we set s = −1. Note that in this case the restriction $0 < u \le a + 2s$ is formally violated, so the results of Section 4.3 do not formally yield convergence rates in X. Nevertheless, the (optimal) rates are observed numerically, which indicates that the restriction $u \le a + 2s$ might be weakened here. In order to ensure convergence rates also theoretically, one could alternatively set s > −1 and use a multilevel technique to implement $L^{-2s}$.
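To make the role of the preconditioner $L^{-2s}$ concrete, the following toy sketch compares Landweber iteration with its Hilbert-scale variant, stopped by the discrepancy principle with τ = 2.1. It is not the PDE problem of Example 5.6: the diagonal model operator with singular values $j^{-a}$, the eigenvalues j of L, the exact solution, and the deterministic noise vector are hypothetical choices, and the update is written in the form $x_{k+1} = x_k + L^{-2s} T^*(y^\delta - T x_k)$ for a linear operator T.

```python
import math

def landweber(sigma, y_delta, delta, tau=2.1, precond=None, max_iter=100000):
    """Landweber iteration for a diagonal operator T = diag(sigma).

    precond holds the diagonal of L^{-2s}; None gives the standard iteration.
    Stops by the discrepancy principle ||y_delta - T x_k|| <= tau * delta.
    Returns the iterate and the stopping index.
    """
    n = len(sigma)
    if precond is None:
        precond = [1.0] * n
    x = [0.0] * n
    for k in range(max_iter):
        residual = [y_delta[i] - sigma[i] * x[i] for i in range(n)]
        if math.sqrt(sum(r * r for r in residual)) <= tau * delta:
            return x, k
        # x_{k+1} = x_k + L^{-2s} T^* (y_delta - T x_k); T^* = T for a real diagonal T.
        x = [x[i] + precond[i] * sigma[i] * residual[i] for i in range(n)]
    return x, max_iter

# Hypothetical model problem: L has eigenvalues j, T has singular values j^{-a}.
n, a, delta = 50, 1.0, 1e-2
modes = range(1, n + 1)
sigma = [j ** (-a) for j in modes]
x_true = [1.0 / j for j in modes]
# Deterministic perturbation with Euclidean norm exactly delta.
noise = [delta * (-1) ** j / math.sqrt(n) for j in modes]
y_delta = [s_j * x_j + e_j for s_j, x_j, e_j in zip(sigma, x_true, noise)]

s = -a / 2.0                                        # Hilbert-scale shift, so -2s = a
precond = [float(j) ** (-2.0 * s) for j in modes]   # diagonal of L^{-2s}

_, k_lw = landweber(sigma, y_delta, delta)
_, k_hs = landweber(sigma, y_delta, delta, precond=precond)
print(f"stopping index: lw = {k_lw}, hs-lw = {k_hs}")
```

In this diagonal setting the j-th residual component is damped by the factor $1 - j^{-2a}$ per step without preconditioning and by $1 - j^{-a}$ with s = −a/2, which is why the preconditioned iteration reaches the discrepancy level in fewer steps.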
In the numerical tests, we try to reconstruct the reaction term
$$
c^\dagger = \operatorname{sign}(x - 0.5)\cdot\operatorname{sign}(y - 0.5)
$$
on $\Omega = [0,1]^2$ and start with the initial guess $c_0 = 0$.

δ/‖u^δ − u(c_0)‖   lw      hs-lw   new-lw      hs-new-lw   new-ν     hs-new-ν
0.08               30      7       4(75)       3(35)       2(15)     2(15)
0.04               99      10      6(315)      4(75)       4(75)     3(35)
0.02               421     20      8(1275)     4(75)       5(155)    3(35)
0.01               1038    37      9(2555)     5(155)      5(155)    3(35)
0.005              3404    62      11(10235)   6(315)      6(315)    4(75)
0.002              14037   133     13(40955)   7(635)      7(635)    5(155)
η                  -1.66   -0.82   -1.67       -0.77       -0.90     -0.55

Table 5.9: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents, and the corresponding rates $k_* = O(\delta^{\eta})$; parameters τ = 2.1, ν = 2 and $N_p = 2577$.
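The exponents η in the last row of Table 5.9 can be reproduced from the tabulated stopping indices. The sketch below assumes that η is the least-squares slope of log k∗ against log δ; the text does not state how the exponents were computed, so this is an assumption, and the helper name `fitted_rate` is hypothetical. Only the columns lw and hs-lw are used.

```python
import math

def fitted_rate(deltas, counts):
    """Least-squares slope of log(counts) against log(deltas)."""
    xs = [math.log(d) for d in deltas]
    ys = [math.log(c) for c in counts]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Noise levels and stopping indices copied from Table 5.9.
deltas = [0.08, 0.04, 0.02, 0.01, 0.005, 0.002]
k_lw = [30, 99, 421, 1038, 3404, 14037]
k_hs_lw = [7, 10, 20, 37, 62, 133]

eta_lw = fitted_rate(deltas, k_lw)
eta_hs_lw = fitted_rate(deltas, k_hs_lw)
print(f"lw: eta = {eta_lw:.2f}, hs-lw: eta = {eta_hs_lw:.2f}")
```

Under this assumption the fitted slopes agree with the tabulated values −1.66 and −0.82 to within rounding, i.e. preconditioning roughly halves the exponent in this example.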
The iteration numbers for this nonlinear problem (see Table 5.9) essentially coincide with the ones of the previous linear example. As there, the source condition (4.50) is satisfied for all u < 1/2 (respectively µ < 1/8) and, as predicted by the theory, the observed convergence rates are $\|e^\delta_{k_*}\| \sim O(\delta^{1/5})$, see Table 5.10.
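The exponent 1/5 follows from a one-line calculation. The block below assumes the usual order-optimal rate $\delta^{u/(a+u)}$ for regularization in Hilbert scales, together with the correspondence $\mu = u/(2a)$ to the classical rate $\delta^{2\mu/(2\mu+1)}$; both are standard, but they are quoted here from general theory rather than from this section, and a = 2 is read off from $R(F'(c)^*) \subset X_2$.

```latex
\[
a = 2,\quad u \to \tfrac12:\qquad
\frac{u}{a+u} = \frac{1/2}{5/2} = \frac15,
\qquad\qquad
\mu = \frac{u}{2a} = \frac18:\qquad
\frac{2\mu}{2\mu+1} = \frac{1/4}{5/4} = \frac15 .
\]
```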
So far we have been able to verify the conditions needed to apply the convergence rate results of the previous chapter. This is not the case in the following examples, where we can verify only some of the conditions: in particular, we will show (N3), which at least ensures that a single iteration step is well-defined.
While in the above examples, interior measurements are available for identification
of the parameters, in many relevant problems the state can only be measured at (a part
of) the boundary of the domain of interest.

δ/‖u^δ − u(c_0)‖   lw       hs-lw    new-lw   new-hslw   new-ν    new-hsν
0.080              0.5604   0.5037   0.5477   0.4785     0.5600   0.4686
0.040              0.5116   0.4599   0.4839   0.4017     0.4260   0.3862
0.020              0.4152   0.3906   0.3907   0.3915     0.3581   0.3318
0.010              0.3670   0.3302   0.3577   0.3247     0.3564   0.3121
0.005              0.3137   0.2914   0.2970   0.2762     0.2939   0.2475
0.002              0.2595   0.2406   0.2475   0.2318     0.2455   0.2138
κ                  0.21     0.20     0.22     0.19       0.21     0.21

Table 5.10: Relative iteration errors $\|e^\delta_{k_*}\|/\|e_0\|$ for iterative regularization methods and their Hilbert-scale equivalents, and the corresponding rates $\|e^\delta_{k_*}\| = O(\delta^{\kappa})$; parameters τ = 2.1, ν = 2 and $N_p = 2577$.

5.2.3 An Inverse Problem in Mathematical Finance
The prices of European call options $C = C_{K,T}(S,t)$ with strike K and maturity T, considered as functions of the spot price S of the underlying and time t, satisfy the well-known Black-Scholes differential equation [12],
$$
C_t + \frac{\sigma^2}{2}\,S^2 C_{SS} + (r - q)\,S\,C_S - r\,C = 0, \qquad S > 0,
$$
where σ = σ(S, t) is called the volatility (of the underlying process), r(t) is the risk-free interest rate and q(t) is the dividend rate. At maturity T, C(S, T) has to equal the payoff of the option, i.e.,
$$
C(S,T) = \max(S - K, 0).
$$
The inverse problem of option pricing (cf., e.g., [14, 24, 57]) now consists in determining a volatility surface σ(S, t) from market prices $C_{K,T}(S^*, 0)$ of European call options, where $S^*$ denotes the current (spot) price of the underlying asset, and t = 0 corresponds to today.
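The forward problem behind this inverse problem maps a volatility to option prices over a range of strikes. For constant volatility and constant rates, the terminal value problem above has the classical closed-form Black-Scholes solution; the sketch below evaluates this textbook formula (it is not taken from this thesis), and the numerical values of spot, maturity, and volatility are illustrative assumptions only.

```python
import math

def norm_cdf(x):
    """Standard normal distribution function, expressed via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S, K, T, sigma, r=0.0, q=0.0):
    """Black-Scholes price at t = 0 of a European call with strike K and
    maturity T, for constant volatility sigma and constant rates r, q."""
    if T <= 0.0:
        return max(S - K, 0.0)            # payoff at maturity
    vol = sigma * math.sqrt(T)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma**2) * T) / vol
    d2 = d1 - vol
    return S * math.exp(-q * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

# Illustrative values only: spot 100, one year to maturity, 20% volatility.
S_star, T, sigma = 100.0, 1.0, 0.2
prices = {K: bs_call(S_star, K, T, sigma) for K in (50.0, 100.0, 150.0)}
print(prices)
```

Market data of this kind (one maturity, several strikes) are what the volatility has to be recovered from; with a strike- or spot-dependent volatility no closed form is available and the price has to be computed by solving the PDE numerically.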
According to [14, 21], the option prices $C = C_{S,t}(K,\tau)$ as a function of the strike K and maturity τ satisfy the following (Dupire) equation
$$
-C_\tau + \frac{\sigma^2(K,\tau)}{2}\,K^2 C_{KK} - (r - q)\,K\,C_K - q\,C = 0, \qquad K > 0, \tag{5.17}
$$
with initial condition C(K, 0) = max(S − K, 0) and boundary condition C(0, τ) = 0.
Hence, σ(S, t) shall be determined from observations of the option prices C(K, τ) satisfying (5.17) at different strikes K and maturities τ. Since in reality option prices are available only for a few maturities T, but for many strikes K, one has to make some additional assumptions on the structure of the volatility surface, e.g., σ(S, t) = ρ(t)σ(S). It turns out that the identification of the volatility smile σ(S) is of particular interest in finance. For simplicity we assume ρ(t) = 1 and r = q = 0 in the sequel, and focus on the identification of σ(S) below:
Example 5.7 By substituting $K = S e^{y}$ and $u = C/S$, (5.17) transforms into
$$
-u_\tau + a(y)\,(u_{yy} - u_y) = 0, \qquad u(y,0) = \max(1 - e^{y}, 0). \tag{5.18}
$$
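The substitution can be checked directly by the chain rule. The short calculation below uses r = q = 0 and a time-independent volatility, as assumed above, and reads the coefficient as $a(y) = \sigma^2(Se^{y})/2$; the text does not spell out this definition, so it is an assumption here.

```latex
\begin{align*}
C(K,\tau) &= S\,u(y,\tau), \qquad y = \ln(K/S), \\
C_\tau &= S\,u_\tau, \qquad
C_K = \frac{S}{K}\,u_y, \qquad
C_{KK} = \frac{S}{K^2}\,(u_{yy} - u_y), \\
0 &= -C_\tau + \frac{\sigma^2(K)}{2}\,K^2 C_{KK}
   = S\Bigl[-u_\tau + \tfrac12\,\sigma^2(Se^{y})\,(u_{yy} - u_y)\Bigr], \\
C(K,0) &= \max(S-K,0) \;\Longrightarrow\; u(y,0) = \max(1-e^{y},0).
\end{align*}
```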
Following the calculations in the appendix of [24], one sees that for sufficiently smooth a, the solution u(q) of (5.18) is in $u^* + H^{2,1}(Q_T)$, where $Q_T = \mathbb{R} \times (0,T)$ and $u^*$ is a solution of the equation with a constant parameter $a^*$, which can be expressed analytically. Hence, the parameter-to-output mapping
$$
F : K(a^*) \to u^* + L^2(\mathbb{R}), \qquad a \mapsto u(\cdot, T),
$$
is well-defined for $a \in K(a^*) := \{\, a \in a^* + H^1(\mathbb{R}) : 0 < \underline{a} \le a \le \overline{a} \,\}$, and we obtain (see [22, 24] for details) that $F'(q)h = w(\cdot, T)$, where w satisfies
$$
-w_\tau + a\,(w_{yy} - w_y) = -h\,(u_{yy} - u_y) \quad \text{on } Q_T
$$
with homogeneous initial condition. Furthermore, $p = F'(q)^* r$ satisfies
$$
\langle p, h \rangle_{H^1} = -\Bigl\langle h, \int_0^T (u_{yy} - u_y)\,R\,d\tau \Bigr\rangle_{L^2},
$$
where R solves
$$
R_\tau + (qR)_{yy} + (qR)_y = 0
$$
with terminal condition R(T) = r. Under our regularity assumption on q, one can show that $\int_0^T (u_{yy} - u_y)\,R\,d\tau \in L^2(Q_T)$. Hence, $F'^*$ maps $Y = L^2(\mathbb{R})$ into $H^2(\mathbb{R})$.
Next, we consider some aspects of a numerical implementation: according to [59], a solution of (5.18) has the following asymptotic behavior
$$
|u(y,\tau) - H(y)| = O(e^{-|y|}), \qquad |u_y(y)| = O(e^{-|y|}), \qquad |y| \to \infty,
$$
where H denotes the Heaviside function. Hence, it is reasonable to approximate (5.17) by an equation on a finite domain, which we do in the sequel, i.e., we assume that (5.18) holds for $(y,\tau) \in Q_T = \Omega_M \times (0,T)$ with $\Omega_M = (-M_1, M_2)$ and impose additional boundary conditions, e.g.,
$$
u(-M_1,\tau) = g_0(\tau), \qquad u(M_2,\tau) = g_1(\tau),
$$
or
$$
u_y(-M_1,\tau) = h_0(\tau), \qquad u_y(M_2,\tau) = h_1(\tau),
$$
with $g_0, h_0, h_1 \sim 1$ and $g_1 \sim 0$. As a Hilbert scale we choose the one defined by $L^2 q = -q_{yy}$ over $X_0 = H^1_0$ with $X_{-1} = L^2$ and $X_1 = H^2 \cap H^1_0$. As we have seen above, $\|F'(a)^* r\|_1 = \|F'(a)^* r\|_{H^2} \le C\,\|r\|_Y$, and consequently (N3) holds with
$$
\|F'(q)h\|_Y \le C\,\|h\|_{-a}, \qquad a = 1.
$$
Although we cannot verify (N6), we obtain (with $C_R = 0$) that
$$
\|Q(x,\tilde x)\|_{X_{-a},Y} = \|F'(x) - F'(\tilde x)\|_{X_{-a},Y} \le c\,\|x - \tilde x\|,
$$
which implies the Lipschitz continuity of the Fréchet derivative considered as an operator on $X_{-a}$. Note that if $X_{-a}$ and $X^s_{-a}$ were equivalent (which is the case for s = 0), or if at least an estimate from below (3.11) were available, this would in fact show (N6).
For an implementation of the preconditioned iterative methods we have to apply the operator $L^{-2s}F'(q)^*$ to elements r ∈ Y. By the above representation, one gets for s = −a/2 that $L^{-2s}F'(q)^* r = L^{-1}\bigl(\int_0^T (u_{yy} - u_y)\,R\,dt\bigr)$, where $L^{-1}$ amounts to a single integration and can be implemented efficiently by Fourier transformation.
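How a fractional power of L can be applied by a Fourier-type transform is sketched below. The sketch assumes homogeneous Dirichlet conditions on an interval (0, ℓ), so that $L^2 = -d^2/dy^2$ is diagonal in the sine basis with eigenvalues $(k\pi/\ell)^2$; the function name `apply_L_power`, the grid, and the use of the continuous eigenvalues are illustrative choices, and the transform is written as a naive O(N²) sum to keep the example self-contained.

```python
import math

def apply_L_power(values, length, power):
    """Apply L^power to samples at the interior nodes x_j = j*length/(N+1),
    where L^2 = -d^2/dy^2 with homogeneous Dirichlet conditions on (0, length).

    L is diagonal in the sine basis with eigenvalues k*pi/length, so the
    operator is applied by scaling discrete sine coefficients (DST-I).
    """
    n = len(values)

    def s(j, k):
        return math.sin(math.pi * j * k / (n + 1))

    # Forward discrete sine transform (naive O(N^2) sums).
    coeffs = [
        2.0 / (n + 1) * sum(values[j - 1] * s(j, k) for j in range(1, n + 1))
        for k in range(1, n + 1)
    ]
    # Scale the k-th coefficient by the k-th eigenvalue of L raised to `power`.
    scaled = [c * (k * math.pi / length) ** power for k, c in enumerate(coeffs, start=1)]
    # Inverse transform back to nodal values.
    return [
        sum(scaled[k - 1] * s(j, k) for k in range(1, n + 1))
        for j in range(1, n + 1)
    ]

# Check on the first eigenfunction: L^{-1} sin(pi y) = sin(pi y) / pi on (0, 1).
n = 31
nodes = [j / (n + 1) for j in range(1, n + 1)]
g = [math.sin(math.pi * y) for y in nodes]
Linv_g = apply_L_power(g, length=1.0, power=-1.0)
print(max(abs(v - math.sin(math.pi * y) / math.pi) for v, y in zip(Linv_g, nodes)))
```

In practice the two sums would be replaced by a fast sine transform (for example `scipy.fft.dst` with `type=1`), which reduces the cost to O(N log N) per application; other boundary conditions require the corresponding cosine or mixed basis.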
In the numerical example we set $S^* = 100$ and try to identify a typical smile given by
$$
a^\delta(y) = \frac12 - \frac{1}{10}\,\operatorname{erf}\bigl(2(e^{y} - 1)\bigr)
$$
from option values for strikes $K = S^* e^{y} \in [0, 300]$ and maturity T = 1 year.
We want to mention at this point that in [24] sufficient conditions for a convergence rate of $O(\delta^{1/2})$ for Tikhonov regularization applied to the inverse problem $F(a) = u^\delta$ have been derived, which essentially are differentiability conditions and fast decay of $a^\dagger - a^*$ for $|y| \to \infty$. In view of these results, we expect similar convergence rates in our numerical tests.
Table 5.11 lists the iteration numbers for Landweber iteration and two Newton-type iterations and their preconditioned versions. Although we could not prove (N6), and thus our theory formally cannot be applied, the iteration numbers are still reduced quite dramatically by preconditioning. Furthermore, the convergence rates of the iteration errors are approximately $\|e^\delta_{k_*}\| \sim O(\delta^{1/2})$ for all methods.

δ/‖u^δ − u(a_0)‖   lw     hs-lw   new-lw     hs-new-lw   new-ν    hs-new-ν
0.04               30     18      6(111)     4(43)       5(70)    3(25)
0.02               51     29      7(173)     5(70)       6(111)   4(43)
0.01               143    67      10(616)    8(266)      7(173)   5(70)
0.005              310    90      12(1404)   8(266)      8(266)   5(70)
0.0025             630    113     13(2114)   10(616)     9(406)   6(111)
η                  -1.13  -0.69   -1.15      -0.96       -0.63    -0.50

Table 5.11: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents, and the corresponding rates $k_* = O(\delta^{\eta})$; parameters τ = 2.1, ν = 2, $N_k = 500$ and $N_t = 250$.
5.2.4 Reconstructing a Nonlinear Source Term in a Parabolic Equation
In the final example, we investigate the identification of the nonlinearity in a parabolic
equation from boundary measurements:
Nonlinear parabolic equations appear, e.g., in the modeling of cooling processes for
steel and glass in liquids or gases, in the modeling of phase transitions, for instance in
crystallization processes of polymers, or in reaction kinetics of chemical systems. In all
these areas, the inverse problem of determining the nonlinearity from measurements is
of interest.
Example 5.8 As a model problem, we consider in the following the reconstruction of a nonlinear source term q(u) in the parabolic equation
$$
-u_t + u_{xx} + q(u) = f \quad \text{on } Q_T = [0,1] \times (0,T), \tag{5.19}
$$
$$
\begin{aligned}
u(0,t) &= \varphi_0(t), \quad u(1,t) = \varphi_1(t) && \text{for } t \in (0,T),\\
u(x,0) &= u_0(x) && \text{for } x \in [0,1],
\end{aligned}
\tag{5.20}
$$
from measurements of the Neumann data
$$
u_x(0,t) = \psi_0(t), \qquad u_x(1,t) = \psi_1(t) \qquad \text{for } t \in (0,T).
$$
According to [3] even such one-dimensional problems have important applications in
heat transfer at high temperatures, and we will therefore call u the temperature in the
sequel.
For further investigation we assume that the functions $\varphi_0$, $\varphi_1$, $u_0$, and f are sufficiently smooth, and that
$$
u_x \le -\gamma < 0 \quad \text{and} \quad \varphi_i'(t) \ge \gamma > 0. \tag{5.21}
$$
By means of the maximum principle, this monotonicity assumption can rather easily be transferred to simple conditions on f, $\varphi_i$, and $\psi_i$. In [25], uniqueness and conditional stability of a solution (u, q(u)) to this inverse problem have been proven by Carleman estimates. Note that under the monotonicity assumption (5.21) the range of temperature is known a priori, i.e.,
$$
u(x,t) \in \{\, y \in \mathbb{R} : \varphi_1(0) \le y \le \varphi_0(T) \,\} =: [a,b].
$$
Thus, q can be considered as a function on [a, b]. Without (5.21), the range of values of u is not known, and therefore has to be identified simultaneously (cf., e.g., [56] for a related discussion). Below, we also assume that q is known at low temperature, i.e.,
$$
q \in K = \{\, p \in H^1[a,b] : p(a) = p_a \,\}.
$$
Such a condition was required for the stability analysis in [25]. We will now investigate the conditions of Assumptions 4.21 and 4.22:
For $q \in H^1[a,b]$, $f \in L^2(Q_T)$, and $\varphi_0$, $\varphi_1$, $u_0$ sufficiently smooth, (5.19) – (5.20) has a unique solution $u \in H^{2,1}(Q_T)$, hence the operator
$$
F : K \to (L^2[0,1])^2, \qquad q \mapsto [u_x(0,\cdot),\, u_x(1,\cdot)]
$$
is well-defined. Furthermore, F is Fréchet differentiable, and the derivative in direction h is given by $F'(q)h = [w_x(0,\cdot),\, w_x(1,\cdot)]$, where $w = u'(q)[h]$ denotes the solution of the linearized (at q in direction h) equation
$$
-w_t + w_{xx} + q'(u)\,w = -h(u), \qquad u = u(q),
$$
with homogeneous initial and boundary conditions. By our assumptions, $h(u) \in H^1(Q_T)$ and consequently $w \in H^{2,1}(Q_T)$. Next we show that $F'(q)$ can be extended to a linear operator on $L^2(0,T)$, i.e., one has
$$
\|F'(q)h\|_Y^2 = \|w_x(0,\cdot)\|^2_{L^2(0,T)} + \|w_x(1,\cdot)\|^2_{L^2(0,T)}
\le c\,\|w\|^2_{H^{2,1}(Q_T)} \le C\,\|h(u)\|^2_{L^2(Q_T)},
$$
and it remains to estimate $\|h(u)\|_{L^2}$. By the monotonicity assumption (5.21), we obtain
$$
\|h(u)\|^2_{L^2(Q_T)} = \int_{Q_T} (h(u))^2\,dx\,dt
= \int_0^T \int_{u(1,t)}^{u(0,t)} (h(y))^2\,|u_x^{-1}|\,dy\,dt
\le C\,\|h\|^2_{L^2[0,1]} .
$$
Now consider the Hilbert scale induced by $L^2 p = -p_{xx}$ over
$$
X_0 = \{\, p \in H^1[a,b] : p(a) = 0 \,\}
$$
with $X_1 = \{\, p \in H^2[a,b] : p(a) = p'(b) = 0 \,\}$. By differentiation one obtains that $X_{-1} = L^2[a,b]$, and hence (N3) holds at least with a = 1. As in the previous example, we are not able to verify (N6), but with the same arguments as above, one can show at least that
$$
\|F'(x) - F'(\tilde x)\|_{X_{-a},Y} \le c\,\|x - \tilde x\|_0,
$$
which implies the Lipschitz continuity of the Fréchet derivative considered as an operator on $X_{-a} = L^2[a,b]$ here.
For the implementation of the preconditioned iterations, we have to calculate the action of the operator $L^{-2s}F'(q)^*$ on some element r ∈ Y: one can show that $F'(q)^*$ is given by $p = F'(q)^* r$ with $p \in X_0$ such that
$$
\langle p_x, h_x \rangle_{L^2[a,b]} = \langle h(u), R \rangle_{L^2(Q_T)} \tag{5.22}
$$
and R satisfies the adjoint equation
$$
R_t + R_{xx} + q'(u)\,R = 0
$$
with homogeneous terminal condition and boundary conditions
$$
R(0,t) = -\bigl(u_x(0,t) - \psi_0^\delta(t)\bigr), \qquad R(1,t) = u_x(1,t) - \psi_1^\delta(t).
$$
Instead of (5.22), one can alternatively solve
$$
\langle \tilde p, h \rangle_{L^2[a,b]} = \langle h(u), R \rangle_{L^2(Q_T)}, \qquad L^2 p = \tilde p,
$$
yielding $L^{-2s}F'(q)^* r = L^{1} p = L^{-1}\tilde p$. As in the previous examples, the action of $L^{-1}$ can be implemented efficiently by Fourier transformation.
We now turn to a numerical test: let $\varphi_0 = 1 + t$, $\varphi_1 = t$, $u_0 = 1 - x$, $f = \sin(\pi(1 - x + t))$, and consider the reconstruction of
$$
q^\dagger(y) = \sin\bigl(\tfrac{3}{2}\pi y\bigr)
$$
from the Neumann data
$$
u_x(0,t) = u_x(1,t) = -1.
$$
For preconditioning we choose s = −1/2 with L as above. As the iteration numbers listed in Table 5.12 show, preconditioning does not improve the rates for the stopping indices in all cases. Note that our theory formally cannot be applied, since we could not verify condition (N6). However, throughout our numerical tests, the preconditioned iterations perform much faster than their standard variants.

δ/‖u^δ − u(q_0)‖   lw     hs-lw   new-lw      hs-new-lw   new-ν     hs-new-ν
0.04               431    29      12(1404)    6(111)      7(173)    3(25)
0.02               737    54      14(3179)    7(173)      8(266)    4(43)
0.01               1210   90      15(4777)    8(266)      8(266)    5(70)
0.005              1983   161     16(7174)    10(616)     9(406)    6(111)
0.0025             3800   346     18(16164)   12(1404)    10(616)   7(173)
η                  -0.77  -0.87   -0.82       -0.91       -0.42     -0.69

Table 5.12: Iteration numbers for iterative regularization methods and their Hilbert-scale equivalents, and the corresponding rates $k_* = O(\delta^{\eta})$; parameters τ = 2.1, ν = 2, $N_x = 200$, $N_t = 100$ and $N_u = 100$.
Note that the rates at which the iteration numbers of the numerical test grow with decreasing noise level are not improved by preconditioning in Hilbert scales. Nevertheless, the number of iterations is reduced by a factor of more than 10. The numerical reconstructions of all methods are comparable, and the numerically observed convergence rates for the iteration errors are approximately $O(\delta^{1/5})$ throughout. More numerical tests for this example can be found in [25], where Hölder stability of the inverse problem is shown, essentially under simple smoothness assumptions on $q^\dagger$.
5.3 Summary
We have seen in various examples that preconditioning in Hilbert scales may drastically accelerate iterative regularization methods, in particular if the solutions are not very smooth. The relaxed assumptions, i.e., (L2) and (N3), can usually be verified easily. The verification of the nonlinearity condition (N6) is more subtle. However, even if (N6) cannot be proven, (N3) at least guarantees that a single iteration step is well-defined. As the results of our numerical examples suggest, some of the conditions in the convergence analysis might still be weakened.
As the second test in Section 5.1.4 suggests, the choice of the Hilbert scale, in partic-
ular, the incorporation of the appropriate boundary conditions is essential. Otherwise,
the application of Hilbert scale preconditioning might even lead to non-convergence (in
the strong norm X0).
Bibliography
[1] R. A. Adams, Sobolev Spaces, Academic Press, London, 1975.
[2] O. M. Alifanov, Inverse Heat Transfer Problems, Springer, New York, 1994.
[3] O. M. Alifanov, E. A. Artyukhin, and S. V. Rumyantsev, Extreme Meth-
ods for Solving Ill-Posed Problems with Applications to Inverse Heat Transfer Prob-
lems, Begell House Inc., New York, 1995.
[4] G. Aubin and P. Kornprobst, Mathematical Problems in Image Processing,
Springer, Berlin, 2001.
[5] A. B. Bakushinskii, Remarks on choosing the regularization parameter using the quasi-optimality and ratio criterion, USSR Comput. Math. Math. Phys. 24 (1984), 181–182.
[6] A. B. Bakushinskii and A. V. Goncharskii, Iterative methods for the solution of incorrect problems, Nauka, Moscow, 1989.
[7] A. B. Bakushinskii and A. V. Goncharskii, Ill-Posed Problems: Theory and Applications, Kluwer, Dordrecht, 1994.
[8] A. B. Bakushinskii, The problem of the convergence of the iteratively regularized Gauß-Newton method, Comput. Math. Math. Phys. 32 (1992), 1353–1359.
[9] H. Banks and K. Kunisch, Parameter Estimation Techniques for Distributed Systems, Birkhäuser, Braunschweig, 1989.
[10] J. Beck, B. Blackwell, and C. R. St. Clair, Inverse Heat Conduction: Ill-Posed Problems, Wiley, Sussex, 1985.
[11] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging, Institute of Physics Publishing, London, 1998.
[12] F. Black and M. Scholes, The pricing of options and corporate liabilities, J.
of Political Economy 81 (1973), 637–659.
[13] B. Blaschke, A. Neubauer, and O. Scherzer, On convergence rates for
the iteratively regularized Gauss-Newton method, IMA Journal of Numer. Anal. 17
(1997), 421–436.
[14] I. Bouchouev and V. Isakov, Uniqueness, stability and numerical methods for
the inverse problem that arises in financial markets, Inverse Problems 15 (1999),
R95–116.
[15] H. Brakhage, On ill-posed problems and the method of conjugate gradients, in:
H. W. Engl and C. W. Groetsch, eds., Inverse and Ill-posed Problems, Academic
Press, Boston, New York, London, 1987, 165–175.
[16] G. Chavent and K. Kunisch, On weakly nonlinear inverse problems, SIAM J.
Appl. Math. 56,2 (1996), 542–572.
[17] F. Colonius and K. Kunisch, Stability for parameter estimation in two point
boundary value problems, J. Reine Angew. Math. 370 (1986), 1–29.
[18] D. Colton and R. Kress, Inverse Acoustic and Electromagnetic Scattering
Theory, Springer, Berlin, 1992.
[19] R. Courant, Über die Eigenwerte bei Differentialgleichungen der mathematischen Physik, Math. Z. 7 (1920), 1–57.
[20] P. Deuflhard, H. W. Engl, and O. Scherzer, A convergence analysis of
iterative methods for the solution of nonlinear ill-posed problems under affinely
invariant conditions., Inverse Problems 14 (1998), 1081–1106.
[21] B. Dupire, Pricing with a smile, RISK 7 (1994), 18–20.
[22] H. Egger, Identification of Volatility Smiles in the Black-Scholes Equation via
Tikhonov Regularization, Master’s thesis, Johannes Kepler University Linz, 2002.
[23] H. Egger, Accelerated Newton-Landweber iterations for regularizing nonlinear inverse problems, SFB-Report 2005-3, Linz, January 2005.
[24] H. Egger and H.W. Engl, Tikhonov regularization applied to the inverse prob-
lem of option pricing: Convergence analysis and rates, Inverse Problems 21 (2005),
1027–1045.
[25] H. Egger, H. W. Engl, and M. V. Klibanov, Global uniqueness and Hölder stability for recovering a nonlinear source term in a parabolic equation, Inverse Problems 21 (2005), 271–290.
[26] H. Egger and A. Neubauer, Preconditioning Landweber iteration in Hilbert
scales, Numer. Math. (2005), to appear.
[27] B. Eicke, A. K. Louis, and R. Plato, The instability of some gradient methods
for ill-posed problems, Numer. Math. 58 (1990), 129–134.
[28] H. W. Engl, M. Hanke, and A. Neubauer, Regularization of Inverse Prob-
lems, Kluwer Academic Publishers, 1996.
[29] H. W. Engl, K. Kunisch, and A. Neubauer, Convergence rates for Tikhonov
regularization of nonlinear ill-posed problems, Inverse Problems 5 (1989), 523–540.
[30] H. W. Engl, A. K. Louis, and W. Rundell, eds., Inverse Problems in Geo-
physics, SIAM, Philadelphia, 1996.
[31] H. W. Engl, A. K. Louis, and W. Rundell, eds., Inverse Problems in Medical Imaging and Nondestructive Testing, Springer, Wien, New York, 1996.
[32] H. W. Engl and W. Rundell, eds., Inverse Problems in Diffusion Processes, SIAM, Philadelphia, 1995.
[33] R. Gorenflo and S. Vessella, Abel Integral Equations: Analysis and Applications, Lecture Notes in Math. 1461, Springer, Berlin, 1991.
[34] C. W. Groetsch, Generalized Inverses of Linear Operators, Dekker, New York, Basel, 1977.
[35] C. W. Groetsch, Inverse Problems in the Mathematical Sciences, Vieweg, Braunschweig, 1993.
[36] M. Hanke, Accelerated Landweber iterations for the solution of ill-posed equations,
Numer. Math. 60 (1991), 341–373.
[37] M. Hanke, A regularization Levenberg-Marquardt scheme, with application to inverse groundwater filtration problems, Inverse Problems 13 (1997), 79–95.
[38] M. Hanke, Regularizing properties of a truncated Newton-CG algorithm for nonlinear inverse problems, Numer. Func. Anal. Optim. 18 (1997), 971–993.
[39] M. Hanke, A. Neubauer, and O. Scherzer, A convergence analysis of the
Landweber iteration for nonlinear ill-posed problems, Numer. Math. 72 (1995), 21–
37.
[40] B. Hofmann, Mathematik inverser Probleme, Teubner, Stuttgart, Leipzig, 1999.
[41] T. Hohage, Logarithmic convergence rates of the iteratively regularized Gauß-
Newton method for an inverse potential and an inverse scattering problem, Inverse
Problems 13 (1997), 1279–1299.
[42] T. Hohage, Regularization of exponentially ill-posed problems, Numer. Funct. Anal. Optim. 21 (2000), 439–464.
[43] D. Isaacson and J. C. Newell, Electrical impedance tomography, SIAM Review
41 (1999), 85–101.
[44] V. Isakov, Inverse Source Problems, Vol. 34 of Mathematical Surveys and Mono-
graphs, American Mathematical Society, Providence, RI, 1990.
[45] V. Isakov, Inverse Problems in Partial Differential Equations, Springer, Berlin, 1998.
[46] V. Isakov, Carleman type estimates and their applications, in: K. Bingham et al., ed., New analytic and geometric methods in inverse problems, Springer, Berlin, 2004, 93–125.
[47] B. Kaltenbacher, Some Newton-type methods for the regularization of nonlinear
ill-posed problems, Inverse Problems 13 (1997), 729–753.
[48] B. Kaltenbacher, A-posteriori parameter choice strategies for some Newton type methods for the regularization of nonlinear ill-posed problems, Numer. Math. 79 (1998), 501–528.
[49] B. Kaltenbacher, A. Neubauer, and O. Scherzer, Iterative Regularization
Methods for Nonlinear Problems, Springer, Dordrecht, 2005, to appear.
[50] W. J. Kammerer and M. Z. Nashed, Iterative methods for best approximate
solutions of linear integral equations of the first and second kinds, J. Math. Anal.
Appl. 40 (1972), 547–573.
[51] J. Keller, Inverse problems, Amer. Math. Monthly 83 (1976), 107–118.
[52] M. V. Klibanov and A. Timonov, Carleman Estimates for Coefficient Inverse
Problems and Numerical Applications, Inverse and Ill-Posed Problems Series, VSP,
Netherlands, 2004.
[53] M. A. Krasnoselskii, P. P. Zabreiko, E. I. Pustylnik, and P. E. Sobolevskii, Integral Operators in Spaces of Summable Functions, Noordhoff International Publishing, Leyden, 1976.
[54] S. G. Krein and J. I. Petunin, Scales of Banach spaces, Russian Math. Surveys
21 (1966), 85–160.
[55] R. Kreß, Linear Integral Equations, Springer, Berlin, 1989.
[56] P. Kugler and H. W. Engl, Identification of a temperature dependent heat
conductivity by Tikhonov regularization, J. Inv. Ill-posed Problems 10 (2002), 67–
90.
[57] R. Lagnado and S. Osher, A technique for calibrating derivative security pric-
ing models: numerical solution of the inverse problem, J. Computational Finance
1 (1997), 13–25.
[58] J. L. Lions and E. Magenes, Non-Homogeneous Boundary Value Problems and
Applications: Volume I, Springer, Berlin - Heidelberg, 1972.
[59] J. Lishang and T. Youshan, Identifying the volatility of underlying assets from
option prices, Inverse Problems 17 (2001), 137–155.
[60] A. K. Louis, Inverse und schlecht gestellte Probleme, Teubner, Stuttgart, 1989.
[61] V. A. Morozov, On the solution of functional equations by the method of regu-
larization, Soviet Math. Dokl. 7 (1966), 414–417.
[62] M. Z. Nashed, Generalized Inverses and Applications, Academic Press, 1976.
[63] F. Natterer, Error bounds for Tikhonov regularization in Hilbert scales, Appl.
Anal. 18 (1984), 29–37.
[64] F. Natterer, The Mathematics of Computerized Tomography, Teubner, Stuttgart, 1986.
[65] A. Neubauer, When do Sobolev spaces form a Hilbert scale, Proc. Amer. Math.
Soc. 103 (1988), 557–562.
[66] A. Neubauer, Tikhonov regularization of nonlinear ill-posed problems in Hilbert scales, Appl. Anal. 46 (1992), 59–72.
[67] A. Neubauer, On converse and saturation results for regularization methods, in: E. Schock, ed., Beiträge zur Angewandten Analysis und Informatik, Helmut Brakhage zu Ehren, Shaker, Aachen, 1994, 262–270.
[68] A. Neubauer, On Landweber iteration for nonlinear ill-posed problems in Hilbert scales, Numer. Math. 85 (2000), 309–328.
[69] V. G. Romanov and S. I. Kabanikhin, Inverse Problems for Maxwell’s Equa-
tions, VSP, Utrecht, 1994.
[70] E. Schock, Approximate solution of ill-posed equations: arbitrarily slow convergence vs. superconvergence, in: G. Hämmerlein and K. H. Hoffmann, eds., Constructive Methods for the Practical Treatment of Integral Equations, Birkhäuser, Basel, 1985, 234–243.
[71] U. Tautenhahn, Error estimates for regularization methods in Hilbert scales,
SIAM J. Numer. Anal. 33 (1996), 2120–2130.
[72] A. N. Tikhonov, Regularization of incorrectly posed problems, Soviet Math. Dokl.
4 (1963), 1624–1627.
Statutory Declaration
I, Herbert Egger, declare in lieu of an oath that I have written this dissertation independently and without outside help, that I have not used any sources or aids other than those indicated, and that I have marked as such all passages taken verbatim or in substance from other sources.
Linz, July 2004
Herbert Egger
Curriculum Vitae
Personal Data
Name: Herbert Alexander Egger
Date of Birth: June 23, 1973
Place of Birth: 4400 Steyr, Austria
Nationality: Austrian
Education
1979 – 1983 Primary school in Enns
1983 – 1991 High school at ”Bischöfliches Gymnasium am Kollegium Petrinum”
1991 – 1995 Technical Chemistry (WITECH) at the J. Kepler University,
Linz
1996 – 2002 Studies in Technical Mathematics, Studienzweig ”Industrial
Mathematics”, J. Kepler University, Linz; Diploma Jan, 2002
since 2002 Doctoral student at J. Kepler University, Linz
Awards
06/2002 Master’s thesis awarded the Ludwig Scharinger Prize, Raiffeisen Landesbank OÖ
Professional Career
02/2002 – 03/2004 Scientific staff at the Special Research Project ”Numerical
and Symbolic Computation”, SFB013, University Linz
since 04/2004 Research Scientist at the Inverse Problems Group of the
J. Radon Institute for Computational and Applied Mathe-
matics, Austrian Academy of Sciences
Miscellaneous
07/1995 – 02/1996 Military Service.
07/2000 Participation at the ECMI-Modeling Week in Lund, Sweden
01/2001 – 06/2002 Foreign semester at the Oxford Center for Industrial and Ap-
plied Mathematics (OCIAM), UK
07/2001 Participation at the summer school ”Industrial Mathematics”, ISAM 2001 in Siena, Italy
08/2002 SFB-Conference on ”Computational Methods for Inverse
Problems” in Strobl, Austria
09/2002 ECMI Conference, Riga, Latvia
02/2003 MATHMOD 2003, Vienna
09/2003 Inverse Problems Workshop series during the Special
Semester on ”Computational Methods and Emerging Appli-
cations”, IPAM, UCLA, USA
06/2004 Invited Colloquium Talk at University Chemnitz.
01/2005 Workshop on ”Symmetries, Inverse Problems and Image Pro-
cessing”, RICAM, Linz
04/2005 GAMM Conference, Luxembourg
06/2005 Applied Inverse Problems (AIP) Conference, Cirencester, UK