arXiv:1911.09647v2 [math.NA] 17 Feb 2020

Uniform error estimates for artificial neural network approximations for heat equations

Lukas Gonon 1, Philipp Grohs 2, Arnulf Jentzen 3,4, David Kofler 5, and David Šiška 6

1 Faculty of Mathematics and Statistics, University of St. Gallen, Switzerland, e-mail: [email protected]
2 Faculty of Mathematics and Research Platform Data Science, University of Vienna, Austria, e-mail: [email protected]
3 Department of Mathematics, ETH Zurich, Switzerland, e-mail: [email protected]
4 Faculty of Mathematics and Computer Science, University of Münster, Germany, e-mail: [email protected]
5 Department of Mathematics, ETH Zurich, Switzerland, e-mail: [email protected]
6 School of Mathematics, University of Edinburgh, United Kingdom, e-mail: [email protected]

February 18, 2020

Abstract

Recently, artificial neural networks (ANNs) in conjunction with stochastic gradient descent optimization methods have been employed to approximately compute solutions of possibly rather high-dimensional partial differential equations (PDEs). Very recently, there have also been a number of rigorous mathematical results in the scientific literature which examine the approximation capabilities of such deep learning based approximation algorithms for PDEs. These mathematical results from the scientific literature prove in part that algorithms based on ANNs are capable of overcoming the curse of dimensionality in the numerical approximation of high-dimensional PDEs. In these mathematical results from the scientific literature usually the error between the solution of the PDE and the approximating ANN is measured in the L^p-sense with respect to some p ∈ [1,∞) and some probability measure. In many applications it is, however, also important to control the error in a uniform L^∞-sense. The key contribution of the main result of this article is to develop the techniques

to obtain error estimates between solutions of PDEs and approximating ANNs in the uniform L^∞-sense. In particular, we prove that the number of parameters of an ANN to uniformly approximate the classical solution of the heat equation in a region [a, b]^d for a fixed time point T ∈ (0,∞) grows at most polynomially in the dimension d ∈ N and the reciprocal of the approximation precision ε > 0. This shows that ANNs can overcome the curse of dimensionality in the numerical approximation of the heat equation when the error is measured in the uniform L^∞-norm.

Contents

1 Introduction

2 Sobolev and Monte Carlo estimates
2.1 Monte Carlo estimates
2.2 Volumes of the Euclidean unit balls
2.3 Sobolev type estimates for smooth functions
2.4 Sobolev type estimates for Monte Carlo approximations

3 Stochastic differential equations with affine coefficients
3.1 A priori estimates for Brownian motions
3.2 A priori estimates for solutions
3.3 A priori estimates for differences of solutions

4 Error estimates
4.1 Quantitative error estimates
4.2 Qualitative error estimates
4.3 Qualitative error estimates for artificial neural networks (ANNs)

5 ANN approximations for heat equations
5.1 Viscosity solutions for heat equations
5.2 Qualitative error estimates for heat equations
5.3 ANN approximations for geometric Brownian motions

1 Introduction

Artificial neural networks (ANNs) play a central role in machine learning applications such as computer vision (cf., e.g., [36, 46, 58]), speech recognition (cf., e.g., [27, 35, 61]), game intelligence (cf., e.g., [56, 57]), and finance (cf., e.g., [7, 12, 59]). Recently, ANNs in conjunction with stochastic gradient descent optimization methods have also been employed to approximately compute solutions of possibly rather high-dimensional partial differential equations (PDEs); cf., for example, [4, 5, 6, 7, 8, 10, 13, 14, 17, 18, 19, 22, 23, 26, 32, 33, 34, 37, 41, 48, 49, 50, 53, 60] and the references mentioned therein. The numerical simulation results in the above named references indicate that such deep learning based approximation


methods for PDEs have the fundamental power to overcome the curse of dimensionality (cf., e.g., Bellman [9]) in the sense that the precise number of the real parameters of the approximating ANN grows at most polynomially in both the dimension d ∈ N = {1, 2, 3, . . .} of the PDE under consideration and the reciprocal ε^{−1} of the prescribed approximation precision ε > 0. Very recently, there have also been a number of rigorous mathematical results examining the approximation capabilities of these deep learning based approximation algorithms for PDEs (see, e.g., [11, 20, 28, 30, 33, 38, 43, 47, 54, 60]). These works prove in part that algorithms based on ANNs are capable of overcoming the curse of dimensionality in the numerical approximation of high-dimensional PDEs. In particular, the works [11, 20, 28, 30, 38, 43, 47, 54] provide mathematical convergence results of such deep learning based numerical approximation methods for PDEs with dimension-independent error constants and convergence rates which depend on the dimension only polynomially.

Except for the article Elbrächter et al. [20], in each of the approximation results in the above cited articles [11, 28, 30, 33, 38, 43, 47, 54, 60] the error between the solution of the PDE and the approximating ANN is measured in the L^p-sense with respect to some p ∈ [1,∞) and some probability measure. In many applications it is, however, also important to control the error in a uniform L^∞-sense. This is precisely the subject of this article. More specifically, it is the key contribution of Theorem 5.4 in Subsection 5.2 below, which is the main result of this article, to prove that ANNs can overcome the curse of dimensionality in the numerical approximation of the heat equation when the error is measured in the uniform L^∞-norm. The arguments used to prove the approximation results in the above cited articles, where the error between the solution of the PDE and the approximating ANN is measured in the L^p-sense with respect to some p ∈ [1,∞) and some probability measure, cannot be employed for the uniform L^∞-norm approximation. Moreover, the article Elbrächter et al. [20] is concerned with a specific class of PDEs for which the PDEs can essentially be solved analytically, and the error analysis in [20] strongly exploits this explicit solution representation. The key contribution of the main result of this article, Theorem 5.4 in Subsection 5.2 below, is to develop the techniques to obtain error estimates between solutions of PDEs and approximating ANNs in the uniform L^∞-sense. To illustrate the findings of the main result of this article in more detail, we formulate in the next result a particular case of Theorem 5.4.

Theorem 1.1. Let a ∈ R, b ∈ (a,∞), c, T ∈ (0,∞), 𝔞 ∈ C^1(R, R), let A_d : R^d → R^d, d ∈ N, satisfy for all d ∈ N, x = (x_1, . . . , x_d) ∈ R^d that A_d(x) = (𝔞(x_1), . . . , 𝔞(x_d)), let N = ∪_{L ∈ N∩[2,∞)} ∪_{l_0,...,l_L ∈ N} (×_{k=1}^L (R^{l_k×l_{k−1}} × R^{l_k})), let P : N → N and R : N → ∪_{m,n ∈ N} C(R^m, R^n) satisfy for all L ∈ N ∩ [2,∞), l_0, . . . , l_L ∈ N, Φ = ((W_1, B_1), . . . , (W_L, B_L)) ∈ (×_{k=1}^L (R^{l_k×l_{k−1}} × R^{l_k})), x_0 ∈ R^{l_0}, . . . , x_{L−1} ∈ R^{l_{L−1}} with ∀ k ∈ N ∩ (0, L) : x_k = A_{l_k}(W_k x_{k−1} + B_k) that P(Φ) = Σ_{k=1}^L l_k(l_{k−1} + 1), (RΦ) ∈ C(R^{l_0}, R^{l_L}), and (RΦ)(x_0) = W_L x_{L−1} + B_L, let ϕ_d ∈ C(R^d, R), d ∈ N, let (φ_{ε,d})_{(ε,d) ∈ (0,1]×N} ⊆ N, let ‖·‖ : (∪_{d ∈ N} R^d) → [0,∞) satisfy for all d ∈ N, x = (x_1, . . . , x_d) ∈ R^d that ‖x‖ = [Σ_{j=1}^d |x_j|^2]^{1/2},


and assume for all ε ∈ (0, 1], d ∈ N, x ∈ R^d that

(Rφ_{ε,d}) ∈ C(R^d, R), |(Rφ_{ε,d})(x)| + ‖(∇(Rφ_{ε,d}))(x)‖ ≤ c d^c (1 + ‖x‖^c), (1)

and P(φ_{ε,d}) ≤ c d^c ε^{−c}, |ϕ_d(x) − (Rφ_{ε,d})(x)| ≤ ε c d^c (1 + ‖x‖^c). (2)

Then

(i) there exist unique at most polynomially growing u_d ∈ C([0, T] × R^d, R), d ∈ N, which satisfy for all d ∈ N, t ∈ (0, T], x ∈ R^d that u_d|_{(0,T]×R^d} ∈ C^{1,2}((0, T] × R^d, R), u_d(0, x) = ϕ_d(x), and

(∂/∂t u_d)(t, x) = (Δ_x u_d)(t, x) (3)

and

(ii) there exist (ψ_{ε,d})_{(ε,d) ∈ (0,1]×N} ⊆ N and κ ∈ R such that for all ε ∈ (0, 1], d ∈ N it holds that P(ψ_{ε,d}) ≤ κ d^κ ε^{−κ}, (Rψ_{ε,d}) ∈ C(R^d, R), and

sup_{x ∈ [a,b]^d} |u_d(T, x) − (Rψ_{ε,d})(x)| ≤ ε. (4)

Theorem 1.1 follows directly from Corollary 5.5 in Subsection 5.2 below. Corollary 5.5, in turn, is a consequence of Theorem 5.4 in Subsection 5.2, the main result of the article. Let us add a few comments on some of the mathematical objects appearing in Theorem 1.1 above. The real number T ∈ (0,∞) denotes the time horizon on which we consider the heat equations in (3). The function 𝔞 ∈ C^1(R, R) describes the activation function which we employ for the considered ANN approximations. In particular, Theorem 1.1 applies to ANNs with the standard logistic function as the activation function, in which case the function 𝔞 ∈ C^1(R, R) in Theorem 1.1 satisfies that for all x ∈ R it holds that 𝔞(x) = (1 + e^{−x})^{−1}. The set N contains all possible ANNs, where each ANN is described abstractly in terms of the number of hidden layers, the number of nodes in each layer, and the values of the parameters (weights and biases in each layer), the function P : N → N maps each ANN Φ ∈ N to its total number of parameters P(Φ), and the function R : N → ∪_{m,n ∈ N} C(R^m, R^n) maps each ANN Φ ∈ N to the actual function (RΦ) (its realization) associated to Φ (cf., e.g., Grohs et al. [29, Section 2.1] and Petersen & Voigtlaender [52, Section 2]). Item (ii) in Theorem 1.1 above establishes under the hypotheses of Theorem 1.1 that the solution u_d : [0, T] × R^d → R of the heat equation can at time T be approximated by means of an ANN without the curse of dimensionality. To sum up, roughly speaking, Theorem 1.1 shows that if the initial conditions of the heat equations can be approximated well by ANNs, then the number of parameters of an ANN to uniformly approximate the classical solution of the heat equation in a region [a, b]^d for a fixed time point T ∈ (0,∞) grows at most polynomially in the dimension d ∈ N and the reciprocal of the approximation precision ε > 0.
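The abstract parametrization above can be made concrete in a few lines of code. The following is a minimal sketch (our own illustration, not from the article; the function names num_params and realize are ours) of the parameter count P(Φ) = Σ_{k=1}^L l_k(l_{k−1} + 1) and the realization map RΦ with the logistic activation mentioned above:

```python
import math

def num_params(layer_dims):
    # P(Phi) = sum_{k=1}^{L} l_k (l_{k-1} + 1): each layer contributes an
    # l_k x l_{k-1} weight matrix W_k plus an l_k-dimensional bias B_k
    return sum(lk * (lprev + 1) for lprev, lk in zip(layer_dims, layer_dims[1:]))

def logistic(t):
    # the standard logistic activation (1 + e^{-x})^{-1}
    return 1.0 / (1.0 + math.exp(-t))

def realize(phi, x):
    # (R Phi)(x_0): x_k = A_{l_k}(W_k x_{k-1} + B_k) on the hidden layers,
    # and W_L x_{L-1} + B_L without activation on the output layer
    for i, (W, B) in enumerate(phi):
        x = [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(W, B)]
        if i < len(phi) - 1:  # no activation after the final affine map
            x = [logistic(t) for t in x]
    return x
```

For example, a network with layer dimensions (l_0, l_1, l_2) = (3, 5, 1) has P(Φ) = 5·4 + 1·6 = 26 parameters.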

In our proof of Theorem 1.1 and Theorem 5.4, respectively, we employ several probabilistic and analytic arguments. In particular, we use a Feynman–Kac type formula for PDEs of the Kolmogorov type (cf., for example, Hairer et al. [31, Corollary 4.17]), Monte Carlo approximations (cf. Proposition 2.3), Sobolev type estimates for Monte Carlo approximations (cf. Lemma 2.16), the fact that stochastic differential equations with affine linear coefficient functions are affine linear in the initial condition (cf. Grohs et al. [28, Proposition 2.20]), as well as an existence result for realizations of random variables (cf. Grohs et al. [28, Proposition 3.3]).
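For the heat equation (3) the Feynman–Kac representation takes a particularly simple form: u_d(t, x) = E[ϕ_d(x + √(2t) Z)] with Z a d-dimensional standard normal vector. A minimal Monte Carlo sketch of this representation (our own illustration of the general idea, not the estimator constructed in the paper; the function name heat_mc is ours):

```python
import math
import random

def heat_mc(phi, x, t, n=100_000, seed=0):
    # Feynman-Kac for du/dt = Laplacian(u), u(0, .) = phi:
    # u(t, x) = E[phi(x + sqrt(2 t) Z)] with Z ~ N(0, I_d)
    rng = random.Random(seed)
    s = math.sqrt(2.0 * t)
    total = 0.0
    for _ in range(n):
        total += phi([xi + s * rng.gauss(0.0, 1.0) for xi in x])
    return total / n
```

For ϕ(x) = ‖x‖² the exact solution is u(t, x) = ‖x‖² + 2dt, which the sample mean reproduces up to the O(n^{−1/2}) Monte Carlo error.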

The rest of this research paper is structured in the following way. In Section 2 below preliminary results on Monte Carlo approximations together with Sobolev type estimates are established. Section 3 contains preliminary results on stochastic differential equations. In Section 4 we employ the results from Sections 2–3 to obtain uniform error estimates for ANN approximations. In Section 5 these uniform error estimates are used to prove Theorem 5.4 in Subsection 5.2 below, the main result of this article.

2 Sobolev and Monte Carlo estimates

2.1 Monte Carlo estimates

In this subsection we recall in Lemma 2.2 below an estimate for the p-Kahane–Khintchine constant from the scientific literature (cf., for example, Cox et al. [16, Definition 5.4] or Grohs et al. [28, Definition 2.1]). Lemma 2.2, in particular, ensures that the p-Kahane–Khintchine constant grows at most polynomially in p. Lemma 2.2 will be employed in the proof of Corollary 4.2 in Subsection 4.1 below. Our proof of Lemma 2.2 is based on an application of Hytönen et al. [40, Theorem 6.2.4] and is a slight extension of Grohs et al. [28, Lemma 2.2]. For completeness we also recall in Definition 2.1 below the notion of the Kahane–Khintchine constant (cf., e.g., Cox et al. [16, Definition 5.4]). Proposition 2.3 below is an L^p-approximation result for Monte Carlo approximations. This L^p-approximation result for Monte Carlo approximations is one of the main ingredients in our proof of Lemma 2.16 in Subsection 2.4 below. Proposition 2.3 is well-known in the literature and is proved, e.g., as Corollary 5.12 in Cox et al. [16].

Definition 2.1. Let p ∈ (0,∞). Then we denote by K_p ∈ [0,∞] the quantity given by

K_p = sup{ c ∈ [0,∞) : ∃ probability space (Ω, F, P) : ∃ R-Banach space (E, ‖·‖_E) : ∃ k ∈ N : ∃ x_1, . . . , x_k ∈ E \ {0} : ∃ P-Rademacher family r_j : Ω → {−1, 1}, j ∈ N : (E[‖Σ_{j=1}^k r_j x_j‖_E^p])^{1/p} = c (E[‖Σ_{j=1}^k r_j x_j‖_E^2])^{1/2} } (5)

and we refer to K_p as the p-Kahane–Khintchine constant.

Lemma 2.2. Let p ∈ [1,∞). Then K_p ≤ √(max{1, p − 1}) (cf. Definition 2.1).

Proof of Lemma 2.2. Throughout this proof let (Ω, F, P) be a probability space, let (E, ‖·‖_E) be a R-Banach space, let k ∈ N, x_1, . . . , x_k ∈ E \ {0}, and let r_j : Ω → {−1, 1}, j ∈ N, be i.i.d. random variables which satisfy that

P(r_1 = −1) = P(r_1 = 1) = 1/2. (6)

Observe that Jensen's inequality ensures for all q ∈ [1, 2] that

(E[‖Σ_{j=1}^k r_j x_j‖_E^q])^{1/q} = (E[(‖Σ_{j=1}^k r_j x_j‖_E^2)^{q/2}])^{1/q} ≤ (E[‖Σ_{j=1}^k r_j x_j‖_E^2])^{1/2}. (7)

In addition, observe that [40, Theorem 6.2.4] (with q = q, p = 2 for q ∈ (2,∞) in the notation of [40, Theorem 6.2.4]) ensures for all q ∈ (2,∞) that

(E[‖Σ_{j=1}^k r_j x_j‖_E^q])^{1/q} ≤ √(q − 1) (E[‖Σ_{j=1}^k r_j x_j‖_E^2])^{1/2}. (8)

Combining this with (7) demonstrates that for all q ∈ [1,∞) it holds that

(E[‖Σ_{j=1}^k r_j x_j‖_E^q])^{1/q} ≤ √(max{1, q − 1}) (E[‖Σ_{j=1}^k r_j x_j‖_E^2])^{1/2}. (9)

The proof of Lemma 2.2 is thus completed.
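For the scalar case E = R the inequality (9) can be checked by exact enumeration of all 2^k sign patterns. The sketch below (our own illustration; the function names are ours) computes the ratio that Lemma 2.2 bounds by √(max{1, p − 1}):

```python
import itertools
import math

def rademacher_moment(xs, p):
    # exact E[|sum_j r_j x_j|^p] by averaging over all 2^k sign patterns
    k = len(xs)
    return sum(abs(sum(r * x for r, x in zip(signs, xs))) ** p
               for signs in itertools.product((-1, 1), repeat=k)) / 2.0 ** k

def khintchine_ratio(xs, p):
    # (E|S|^p)^{1/p} / (E|S|^2)^{1/2}, bounded by sqrt(max(1, p - 1)) by (9)
    return rademacher_moment(xs, p) ** (1.0 / p) / rademacher_moment(xs, 2) ** 0.5
```

For xs = (1, . . . , 1) with k = 8 and p = 4 the ratio is about 1.29, comfortably below √3 ≈ 1.73.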

Proposition 2.3. Let d, n ∈ N, p ∈ [2,∞), let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let (Ω, F, P) be a probability space, and let X_i : Ω → R^d, i ∈ {1, . . . , n}, be i.i.d. random variables with E[‖X_1‖] < ∞. Then

(E[‖E[X_1] − (1/n)(Σ_{i=1}^n X_i)‖^p])^{1/p} ≤ (2 K_p (E[‖X_1 − E[X_1]‖^p])^{1/p}) / √n (10)

(cf. Definition 2.1).
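Proposition 2.3 predicts that the L^p distance between E[X_1] and the Monte Carlo mean decays like n^{−1/2}. For p = 2 this is easy to observe directly; the sketch below (our own illustration, not part of the paper's proof) estimates the root-mean-square error for uniform samples, for which E[X_1] = 1/2 and the exact RMSE is (12n)^{−1/2}:

```python
import math
import random

def mc_rmse(n, trials=2000, seed=1):
    # root-mean-square error of the Monte Carlo mean of n Uniform(0, 1)
    # samples around the true mean 1/2; the exact value is 1 / sqrt(12 n)
    rng = random.Random(seed)
    sq = 0.0
    for _ in range(trials):
        m = sum(rng.random() for _ in range(n)) / n
        sq += (m - 0.5) ** 2
    return math.sqrt(sq / trials)
```

Multiplying the empirical error by √n recovers the n-independent constant 1/√12 ≈ 0.289, which is the 1/√n decay asserted on the right-hand side of (10).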

2.2 Volumes of the Euclidean unit balls

In this subsection we provide in Corollary 2.8 below an elementary and well-known upper bound for the volumes of the Euclidean unit balls. Corollary 2.8 will be used in our proof of Corollary 2.14 in Subsection 2.3 below. Our proof of Corollary 2.8 employs the elementary and well-known results in Lemmas 2.4–2.7 below. For completeness we also provide in this subsection detailed proofs for Lemmas 2.4–2.7 and Corollary 2.8.


Lemma 2.4. Let Γ : (0,∞) → (0,∞) and B : (0,∞)^2 → (0,∞) satisfy for all x, y ∈ (0,∞) that Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt and B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt and let x, y ∈ (0,∞). Then

(i) it holds that Γ(x + 1) = xΓ(x),

(ii) it holds that B(x, y) = Γ(x)Γ(y)/Γ(x + y),

(iii) it holds that Γ(1) = 1 and Γ(1/2) = √π, and

(iv) it holds that

√(2π) x^{x−1/2} e^{−x} ≤ Γ(x) ≤ √(2π) x^{x−1/2} e^{−x} e^{1/(12x)}. (11)

Proof of Lemma 2.4. Throughout this proof let Φ : (0,∞) × (0, 1) → (0,∞)^2 satisfy for all s ∈ (0,∞), t ∈ (0, 1) that

Φ(s, t) = (s(1 − t), st) (12)

and let f : (0,∞)^2 → (0,∞) satisfy for all s, t ∈ (0,∞) that

f(s, t) = s^{x−1} t^{y−1} e^{−(s+t)}. (13)

Observe that integration by parts shows that

Γ(x + 1) = ∫_0^∞ t^x e^{−t} dt = [−t^x e^{−t}]_{t=0}^{t=∞} + x ∫_0^∞ t^{x−1} e^{−t} dt = xΓ(x). (14)

This establishes item (i). Next note that Fubini's theorem ensures that

Γ(x)Γ(y) = (∫_0^∞ s^{x−1} e^{−s} ds)(∫_0^∞ t^{y−1} e^{−t} dt) = ∫_0^∞ ∫_0^∞ s^{x−1} t^{y−1} e^{−(s+t)} ds dt = ∫_0^∞ ∫_0^∞ f(s, t) ds dt. (15)

Moreover, note that for all s ∈ (0,∞), t ∈ (0, 1) it holds that

det(Φ′(s, t)) = s ∈ (0,∞). (16)

This, (15), the integral transformation theorem (cf., for example, [15, Theorem 6.1.7]), and Fubini's theorem prove that

Γ(x)Γ(y) = ∫_0^∞ ∫_0^1 f(Φ(s, t)) |det(Φ′(s, t))| dt ds = ∫_0^∞ ∫_0^1 (s(1 − t))^{x−1} (st)^{y−1} e^{−(s(1−t)+st)} s dt ds = (∫_0^∞ s^{x+y−1} e^{−s} ds)(∫_0^1 t^{y−1} (1 − t)^{x−1} dt) = Γ(x + y) B(y, x). (17)


Next note that the integral transformation theorem with the diffeomorphism (0, 1) ∋ t ↦ (1 − t) ∈ (0, 1) ensures that

B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt = ∫_0^1 (1 − t)^{x−1} t^{y−1} dt = B(y, x). (18)

Combining this with (17) establishes item (ii). Next note that

Γ(1) = ∫_0^∞ t^{1−1} e^{−t} dt = ∫_0^∞ e^{−t} dt = 1. (19)

Item (ii) and the integral transformation theorem with the diffeomorphism (0, π/2) ∋ t ↦ [sin(t)]^2 ∈ (0, 1) therefore show that

[Γ(1/2)]^2 / Γ(1) = B(1/2, 1/2) = ∫_0^1 t^{−1/2} (1 − t)^{−1/2} dt = ∫_0^{π/2} 2 dt = π. (20)

Combining this with (19) establishes item (iii). Next note that Artin [3, Chapter 3, (3.9)] ensures that there exists μ : (0,∞) → R which satisfies for all t ∈ (0,∞) that 0 < μ(t) < 1/(12t) and

Γ(t) = √(2π) t^{t−1/2} e^{−t} e^{μ(t)}. (21)

Hence, we obtain that

√(2π) x^{x−1/2} e^{−x} ≤ Γ(x) ≤ √(2π) x^{x−1/2} e^{−x} e^{1/(12x)}. (22)

This establishes item (iv). The proof of Lemma 2.4 is thus completed.

Lemma 2.5. Let B : (0,∞)^2 → (0,∞) satisfy for all x, y ∈ (0,∞) that B(x, y) = ∫_0^1 t^{x−1} (1 − t)^{y−1} dt. Then it holds for all p ∈ [0,∞) that

∫_0^{π/2} [sin(t)]^p dt = B((p+1)/2, 1/2) / 2. (23)

Proof of Lemma 2.5. First, note that for all t ∈ (0, 1) it holds that

arcsin′(t) = (1 − t^2)^{−1/2}. (24)

This and the integral transformation theorem with the diffeomorphism (0, 1) ∋ t ↦ arcsin(t) ∈ (0, π/2) ensure for all p ∈ [0,∞) that

∫_0^{π/2} [sin(t)]^p dt = ∫_0^1 t^p (1 − t^2)^{−1/2} dt. (25)

The integral transformation theorem with the diffeomorphism (0, 1) ∋ t ↦ √t ∈ (0, 1) hence implies for all p ∈ [0,∞) that

∫_0^{π/2} [sin(t)]^p dt = (1/2) ∫_0^1 t^{p/2−1/2} (1 − t)^{−1/2} dt = (1/2) ∫_0^1 t^{(p+1)/2−1} (1 − t)^{1/2−1} dt = B((p+1)/2, 1/2) / 2. (26)

The proof of Lemma 2.5 is thus completed.
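The identity (23) is easy to cross-check numerically, writing the Beta function via item (ii) of Lemma 2.4; the sketch below (ours; the function names are ours) compares a midpoint-rule quadrature of the left-hand side with the right-hand side:

```python
import math

def sine_integral(p, n=100_000):
    # midpoint rule for int_0^{pi/2} sin(t)^p dt
    h = (math.pi / 2.0) / n
    return h * sum(math.sin((i + 0.5) * h) ** p for i in range(n))

def beta_half(p):
    # B((p+1)/2, 1/2) / 2 with B(x, y) = Gamma(x) Gamma(y) / Gamma(x + y)
    a = (p + 1.0) / 2.0
    return math.gamma(a) * math.gamma(0.5) / math.gamma(a + 0.5) / 2.0
```

For p = 3 both sides equal 2/3, and for p = 0 both reduce to π/2.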


Lemma 2.6. Let R ∈ (0,∞], for every d ∈ N let ‖·‖_{R^d} : R^d → [0,∞) be the standard norm on R^d, for every d ∈ {2, 3, . . .} let B_d = {x ∈ R^d : ‖x‖_{R^d} < R} and

S_d = (0, 2π) for d = 2 and S_d = (0, 2π) × (0, π)^{d−2} for d ∈ {3, 4, . . .}, (27)

and let T_d : (0, R) × S_d → R^d, d ∈ {2, 3, . . .}, satisfy for all d ∈ {2, 3, . . .}, r ∈ (0, R), ϕ ∈ (0, 2π), ϑ_1, . . . , ϑ_{d−2} ∈ (0, π) that if d = 2 then T_2(r, ϕ) = r(cos(ϕ), sin(ϕ)) and if d ≥ 3 then

T_d(r, ϕ, ϑ_1, . . . , ϑ_{d−2}) = r(cos(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], sin(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], cos(ϑ_1)[Π_{i=2}^{d−2} sin(ϑ_i)], . . . , cos(ϑ_{d−3}) sin(ϑ_{d−2}), cos(ϑ_{d−2})). (28)

Then

(i) it holds for all r ∈ (0, R), ϕ ∈ (0, 2π) that

|det((T_2)′(r, ϕ))| = r, (29)

(ii) it holds for all d ∈ {3, 4, . . .}, r ∈ (0, R), ϕ ∈ (0, 2π), ϑ_1, . . . , ϑ_{d−2} ∈ (0, π) that

|det((T_d)′(r, ϕ, ϑ_1, . . . , ϑ_{d−2}))| = r^{d−1}[Π_{i=1}^{d−2} [sin(ϑ_i)]^i], (30)

and

(iii) it holds for all d ∈ {2, 3, . . .} and all B(R^d)/B([0,∞))-measurable functions f : R^d → [0,∞) that

∫_{B_d} f(x) dx = ∫_0^R ∫_{S_d} f(T_d(r, φ)) |det((T_d)′(r, φ))| dφ dr. (31)

Proof of Lemma 2.6. Throughout this proof for every d ∈ {2, 3, . . .} let λ_{R^d} : B(R^d) → [0,∞] be the Lebesgue–Borel measure on R^d. Observe that for all r ∈ (0, R), ϕ ∈ (0, 2π) it holds that

(T_2)′(r, ϕ) = ( cos(ϕ)  −r sin(ϕ)
                 sin(ϕ)   r cos(ϕ) ). (32)

Hence, we obtain that for all r ∈ (0, R), ϕ ∈ (0, 2π) it holds that

|det((T_2)′(r, ϕ))| = r[cos(ϕ)]^2 + r[sin(ϕ)]^2 = r. (33)

This establishes item (i). Next observe that Amann & Escher [2, Ch. X, Lemma 8.8] establishes item (ii). To establish item (iii) we distinguish between the case d = 2 and the case d ∈ {3, 4, . . .}. First, we consider the case d = 2. Note that T_2 : (0, R) × (0, 2π) → T_2((0, R) × (0, 2π)) is a bijective function. This and item (i) show that T_2 : (0, R) × (0, 2π) → T_2((0, R) × (0, 2π)) is a diffeomorphism. Next observe that

T_2((0, R) × (0, 2π)) = B_2 \ {(x, 0) : x ∈ [0,∞)}. (34)

The fact that T_2 : (0, R) × (0, 2π) → T_2((0, R) × (0, 2π)) is a diffeomorphism, the fact that λ_{R^2}({(x, 0) : x ∈ [0,∞)}) = 0, and the integral transformation theorem hence demonstrate that for all B(R^2)/B([0,∞))-measurable functions f : R^2 → [0,∞) it holds that

∫_{B_2} f(x) dx = ∫_0^R ∫_{S_2} f(T_2(r, φ)) |det((T_2)′(r, φ))| dφ dr. (35)

This establishes item (iii) in the case d = 2. Next we consider the case d ∈ {3, 4, . . .}. Note that Amann & Escher [2, Ch. X, Lemma 8.8] implies that T_d : (0, R) × (0, 2π) × (0, π)^{d−2} → T_d((0, R) × (0, 2π) × (0, π)^{d−2}) is a diffeomorphism with

T_d((0, R) × (0, 2π) × (0, π)^{d−2}) = B_d \ ([0,∞) × {0} × R^{d−2}). (36)

The fact that λ_{R^d}([0,∞) × {0} × R^{d−2}) = 0 and the integral transformation theorem hence show that for all B(R^d)/B([0,∞))-measurable functions f : R^d → [0,∞) it holds that

∫_{B_d} f(x) dx = ∫_0^R ∫_{S_d} f(T_d(r, φ)) |det((T_d)′(r, φ))| dφ dr. (37)

This establishes item (iii) in the case d ∈ {3, 4, . . .}. The proof of Lemma 2.6 is thus completed.
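Items (ii)–(iii) admit a quick numerical sanity check: integrating the Jacobian r^{d−1} Π_{i=1}^{d−2} [sin(ϑ_i)]^i over (0, R) × S_d must reproduce the volume of the ball B_d. A small sketch (our own illustration; the function name is ours), exploiting that the radial and angular factors separate:

```python
import math

def ball_volume_polar(d, R, n=400):
    # integrate the Jacobian r^{d-1} prod_i sin(theta_i)^i of Lemma 2.6 (ii)
    # over (0, R) x S_d; the factors separate, so midpoint rules suffice
    hr = R / n
    radial = hr * sum(((i + 0.5) * hr) ** (d - 1) for i in range(n))
    vol = 2.0 * math.pi * radial  # the phi-integral contributes 2 pi
    ht = math.pi / n
    for k in range(1, d - 1):  # angular factors int_0^pi sin(theta)^k dtheta
        vol *= ht * sum(math.sin((j + 0.5) * ht) ** k for j in range(n))
    return vol
```

For d = 3 and R = 1 this recovers 4π/3, and for d = 2 and R = 2 it recovers 4π.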

Lemma 2.7. Let d ∈ N, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let λ : B(R^d) → [0,∞] be the Lebesgue–Borel measure on R^d, let B ⊆ R^d be the set given by B = {x ∈ R^d : ‖x‖ < 1}, and let Γ : (0,∞) → (0,∞) satisfy for all x ∈ (0,∞) that Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt. Then

(i) for d ∈ {2, 3, . . .} it holds that

λ(B) = (2π/d) [Π_{i=1}^{d−2} ∫_0^π [sin(ϑ_i)]^i dϑ_i] (38)

and

(ii) it holds that

λ(B) = π^{d/2} / Γ(d/2 + 1). (39)


Proof of Lemma 2.7. To establish (38) and (39) we distinguish between the case d = 1, the case d = 2, and the case d ≥ 3. First, we consider the case d = 1. Note that items (i) and (iii) in Lemma 2.4 show that

Γ(1/2 + 1) = Γ(1/2)/2 = π^{1/2}/2. (40)

This implies that

π^{1/2} / Γ(1/2 + 1) = 2. (41)

Combining this and the fact that λ(B) = λ((−1, 1)) = 2 establishes (39) in the case d = 1. Next we consider the case d = 2. Note that items (i) and (iii) in Lemma 2.6 and Fubini's theorem prove that

λ(B) = ∫_B dx = ∫_0^{2π} ∫_0^1 r dr dϕ = π. (42)

Next note that items (i) and (iii) in Lemma 2.4 show that

Γ(2) = Γ(1 + 1) = Γ(1) = 1. (43)

This implies that

π / Γ(1 + 1) = π. (44)

Combining this with (42) establishes (38) and (39) in the case d = 2. Next we consider the case d ≥ 3. Note that items (ii)–(iii) in Lemma 2.6 and Fubini's theorem ensure that

λ(B) = ∫_B dx = ∫_0^π · · · ∫_0^π ∫_0^{2π} ∫_0^1 r^{d−1} [Π_{i=1}^{d−2} [sin(ϑ_i)]^i] dr dϕ dϑ_1 · · · dϑ_{d−2} = (1/d) ∫_0^{2π} [Π_{i=1}^{d−2} ∫_0^π [sin(ϑ_i)]^i dϑ_i] dϕ = (2π/d) [Π_{i=1}^{d−2} ∫_0^π [sin(ϑ_i)]^i dϑ_i]. (45)

This establishes (38) in the case d ∈ {3, 4, . . .}. Moreover, note that (45) and the fact that for all k ∈ N it holds that ∫_0^π [sin(t)]^k dt = 2 ∫_0^{π/2} [sin(t)]^k dt show that

λ(B) = (4/d) [∫_0^{π/2} 1 dϑ] [Π_{i=1}^{d−2} 2 ∫_0^{π/2} [sin(ϑ_i)]^i dϑ_i]. (46)

Combining this, Lemma 2.5 (with p = i for i ∈ {0, . . . , d − 2} in the notation of Lemma 2.5), and item (ii) in Lemma 2.4 demonstrates that

λ(B) = (2/d) B(1/2, 1/2) [Π_{i=1}^{d−2} B((i+1)/2, 1/2)] = (2/d) (Γ(1/2)Γ(1/2)/Γ(1)) [Π_{i=1}^{d−2} Γ((i+1)/2)Γ(1/2)/Γ((i+2)/2)] = (2/d) [Γ(1/2)]^d / Γ(d/2). (47)

Items (i) and (iii) in Lemma 2.4 hence show that

λ(B) = [Γ(1/2)]^d / ((d/2) Γ(d/2)) = π^{d/2} / Γ(d/2 + 1). (48)

This establishes (39) in the case d ∈ {3, 4, . . .}. The proof of Lemma 2.7 is thus completed.

Corollary 2.8. Let d ∈ N, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, and let λ : B(R^d) → [0,∞] be the Lebesgue–Borel measure on R^d. Then

λ({x ∈ R^d : ‖x‖ < 1}) ≤ (1/√(dπ)) [2πe/d]^{d/2}. (49)

Proof of Corollary 2.8. Throughout this proof let Γ : (0,∞) → (0,∞) satisfy for all x ∈ (0,∞) that Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt. Note that Lemma 2.7 shows that

λ({x ∈ R^d : ‖x‖ < 1}) = π^{d/2} / Γ(d/2 + 1). (50)

Moreover, note that items (i) and (iv) in Lemma 2.4 imply that for all x ∈ (0,∞) it holds that

Γ(x + 1) = xΓ(x) ≥ √(2πx) [x/e]^x. (51)

Hence, we obtain that

1 / Γ(d/2 + 1) ≤ (1/√(dπ)) [2e/d]^{d/2}. (52)

Combining this with (50) demonstrates that

λ({x ∈ R^d : ‖x‖ < 1}) ≤ (1/√(dπ)) [2πe/d]^{d/2}. (53)

The proof of Corollary 2.8 is thus completed.
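Lemma 2.7 and Corollary 2.8 are easy to check numerically with the standard-library Gamma function. The sketch below (ours; the function names are ours) evaluates the exact volume (39) and the bound (49), which also makes visible the rapid decay of the unit ball volume as d grows:

```python
import math

def unit_ball_volume(d):
    # lambda(B) = pi^{d/2} / Gamma(d/2 + 1)   (Lemma 2.7, item (ii))
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)

def unit_ball_volume_bound(d):
    # Corollary 2.8: lambda(B) <= (1/sqrt(d pi)) (2 pi e / d)^{d/2}
    return (2.0 * math.pi * math.e / d) ** (d / 2.0) / math.sqrt(d * math.pi)
```

For d = 2 and d = 3 the exact values are π and 4π/3, and the bound holds for every d.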


2.3 Sobolev type estimates for smooth functions

In this subsection we present in Corollary 2.15 below a Sobolev type estimate for smooth functions with an explicit and dimension-independent constant in the Sobolev type estimate. Corollary 2.15 will be employed in our proof of Corollary 4.2 in Subsection 4.1 below. Corollary 2.15 is a consequence of the elementary results in Lemma 2.9 and Corollary 2.14 below. Corollary 2.14, in turn, follows from Proposition 2.13 below. Proposition 2.13 is a special case of Mizuguchi et al. [51, Theorem 2.1]. For completeness we also provide in this subsection a proof for Proposition 2.13. Our proof of Proposition 2.13 employs the well-known results in Lemmas 2.10–2.12 below. Results similar to Lemmas 2.10–2.12 can, e.g., be found in Mizuguchi et al. [51, Lemma 3.1, Theorem 3.3, and Theorem 3.4] and Gilbarg & Trudinger [25, Lemma 7.16].

Lemma 2.9. Let Φ ∈ C^1((0, 1), R) satisfy that ∫_0^1 (|Φ(x)| + |Φ′(x)|) dx < ∞. Then

sup_{x ∈ (0,1)} |Φ(x)| ≤ ∫_0^1 (|Φ(x)| + |Φ′(x)|) dx. (54)

Proof of Lemma 2.9. First, note that the fundamental theorem of calculus ensures that for all x ∈ (0, 1) it holds that

Φ(x) = ∫_0^1 [Φ(s) − ∫_x^s Φ′(t) dt] ds. (55)

The triangle inequality and the hypothesis that ∫_0^1 (|Φ(x)| + |Φ′(x)|) dx < ∞ hence show that for all x ∈ (0, 1) it holds that

|Φ(x)| = |∫_0^1 [Φ(s) − ∫_x^s Φ′(t) dt] ds| ≤ ∫_0^1 [|Φ(s)| + |∫_x^s |Φ′(t)| dt|] ds ≤ ∫_0^1 |Φ(s)| ds + ∫_0^1 |Φ′(t)| dt < ∞. (56)

This implies (54). The proof of Lemma 2.9 is thus completed.
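Lemma 2.9 can be illustrated numerically on a grid; the sketch below (our own illustration, with function names of our choosing) compares the supremum with the W^{1,1}-type integral for a test function:

```python
import math

def sup_and_sobolev(phi, dphi, n=20_000):
    # compare sup_{x in (0,1)} |phi(x)| with int_0^1 (|phi| + |phi'|) dx
    # on a midpoint grid; Lemma 2.9 asserts sup <= integral
    h = 1.0 / n
    grid = [(i + 0.5) * h for i in range(n)]
    sup = max(abs(phi(x)) for x in grid)
    integral = h * sum(abs(phi(x)) + abs(dphi(x)) for x in grid)
    return sup, integral
```

For Φ(x) = sin(2πx) the supremum is 1, while the integral evaluates to 2/π + 4 ≈ 4.64, consistent with (54).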

Lemma 2.10. Let d ∈ {2, 3, . . .}, p ∈ (d,∞), let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let λ : B(R^d) → [0,∞] be the Lebesgue–Borel measure on R^d, and let W ⊆ R^d be a non-empty, bounded, and open set. Then

∫_{∪_{x∈W} {y−x : y∈W}} ‖z‖^{(1−d)p/(p−1)} dz ≤ (d(p−1)/(p−d)) [sup_{v,w∈W} ‖v − w‖]^{(p−d)/(p−1)} λ({x ∈ R^d : ‖x‖ < 1}). (57)


Proof of Lemma 2.10. Throughout this proof let ρ ∈ (0,∞) satisfy that ρ = sup_{v,w∈W} ‖v − w‖, let V ⊆ R^d be the set given by V = ∪_{x∈W} {y − x : y ∈ W}, let S ⊆ R^{d−1} be the set given by

S = (0, 2π) for d = 2 and S = (0, 2π) × (0, π)^{d−2} for d ∈ {3, 4, . . .}, (58)

let B_r ⊆ R^d, r ∈ (0,∞), be the sets which satisfy for all r ∈ (0,∞) that B_r = {x ∈ R^d : ‖x‖ < r}, and let T_R : (0, R) × S → R^d, R ∈ (0,∞], satisfy for all R ∈ (0,∞], r ∈ (0, R), ϕ ∈ (0, 2π), ϑ_1, . . . , ϑ_{d−2} ∈ (0, π) that if d = 2 then T_R(r, ϕ) = r(cos(ϕ), sin(ϕ)) and if d ∈ {3, 4, . . .} then

T_R(r, ϕ, ϑ_1, . . . , ϑ_{d−2}) = r(cos(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], sin(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], cos(ϑ_1)[Π_{i=2}^{d−2} sin(ϑ_i)], . . . , cos(ϑ_{d−3}) sin(ϑ_{d−2}), cos(ϑ_{d−2})). (59)

Observe that

(1−d)p/(p−1) + d = ((1−d)p + d(p−1))/(p−1) = (p−d)/(p−1). (60)

Next note that items (i)–(iii) in Lemma 2.6 and the fact that for all r ∈ (0, ρ), φ ∈ S it holds that ‖T_ρ(r, φ)‖ = r show that

∫_{B_ρ} ‖x‖^{(1−d)p/(p−1)} dx = ∫_0^ρ ∫_S r^{(1−d)p/(p−1)} |det((T_ρ)′(r, φ))| dφ dr = ∫_0^ρ ∫_S r^{(1−d)p/(p−1)} r^{d−1} |det((T_∞)′(1, φ))| dφ dr. (61)

The fact that V ⊆ B_ρ, Fubini's theorem, and (60) hence demonstrate that

∫_V ‖x‖^{(1−d)p/(p−1)} dx ≤ ∫_{B_ρ} ‖x‖^{(1−d)p/(p−1)} dx = ∫_S (∫_0^ρ r^{(1−d)p/(p−1)} r^{d−1} dr) |det((T_∞)′(1, φ))| dφ = ρ^{(p−d)/(p−1)} ((p−1)/(p−d)) ∫_S |det((T_∞)′(1, φ))| dφ. (62)

This, Lemma 2.6, and Lemma 2.7 hence prove that

∫_V ‖x‖^{(1−d)p/(p−1)} dx ≤ ρ^{(p−d)/(p−1)} ((p−1)/(p−d)) ∫_S |det((T_∞)′(1, φ))| dφ = ρ^{(p−d)/(p−1)} ((p−1)/(p−d)) ∫_0^{2π} ∫_0^π · · · ∫_0^π [Π_{i=1}^{d−2} [sin(ϑ_i)]^i] dϑ_1 · · · dϑ_{d−2} dϕ = ρ^{(p−d)/(p−1)} ((p−1)/(p−d)) d [(2π/d) Π_{i=1}^{d−2} ∫_0^π [sin(ϑ_i)]^i dϑ_i] = ρ^{(p−d)/(p−1)} (d(p−1)/(p−d)) λ(B_1). (63)

The proof of Lemma 2.10 is thus completed.


Lemma 2.11. Let d ∈ {2, 3, . . .}, p ∈ (d,∞), let W ⊆ R^d be a non-empty, open, bounded, and convex set, let Φ ∈ C^1(W, R), let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let λ : B(R^d) → [0,∞] be the Lebesgue–Borel measure on R^d, and assume that ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx < ∞. Then it holds for all x ∈ W that

|λ(W)Φ(x) − ∫_W Φ(y) dy| ≤ (1/d) [sup_{v,w∈W} ‖v − w‖]^d ∫_W ‖(∇Φ)(y)‖ ‖x − y‖^{1−d} dy. (64)

Proof of Lemma 2.11. Throughout this proof let x ∈ W, let ρ ∈ (0,∞) satisfy that ρ = sup_{v,w∈W} ‖v − w‖, let 〈·,·〉 : R^d × R^d → R be the d-dimensional Euclidean scalar product, let S ⊆ R^{d−1} be the set given by

S = (0, 2π) for d = 2 and S = (0, 2π) × (0, π)^{d−2} for d ∈ {3, 4, . . .}, (65)

let (ω_{v,w})_{(v,w)∈W^2} ⊆ R^d satisfy for all v, w ∈ W that

ω_{v,w} = (w − v)/‖w − v‖ if v ≠ w and ω_{v,w} = 0 if v = w, (66)

let B_r ⊆ R^d, r ∈ (0,∞), satisfy for all r ∈ (0,∞) that B_r = {y ∈ R^d : ‖y − x‖ < r}, let E : R^d → R^d satisfy for all y ∈ R^d that

E(y) = (∇Φ)(y) if y ∈ W and E(y) = 0 if y ∈ R^d \ W, (67)

and let T_R : (0, R) × S → R^d, R ∈ (0,∞], satisfy for all R ∈ (0,∞], r ∈ (0, R), ϕ ∈ (0, 2π), ϑ_1, . . . , ϑ_{d−2} ∈ (0, π) that if d = 2 then T_R(r, ϕ) = r(cos(ϕ), sin(ϕ)) and if d ≥ 3 then

T_R(r, ϕ, ϑ_1, . . . , ϑ_{d−2}) = r(cos(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], sin(ϕ)[Π_{i=1}^{d−2} sin(ϑ_i)], cos(ϑ_1)[Π_{i=2}^{d−2} sin(ϑ_i)], . . . , cos(ϑ_{d−3}) sin(ϑ_{d−2}), cos(ϑ_{d−2})). (68)

Observe that the hypothesis that ∫_W (|Φ(y)|^p + ‖(∇Φ)(y)‖^p) dy < ∞, the hypothesis that W is bounded, and Hölder's inequality ensure that

∫_W (|Φ(y)| + ‖(∇Φ)(y)‖) dy ≤ [λ(W)]^{(p−1)/p} ( [∫_W |Φ(y)|^p dy]^{1/p} + [∫_W ‖(∇Φ)(y)‖^p dy]^{1/p} ) < ∞.    (69)


Next note that the assumption that W is convex and the fundamental theorem of calculus demonstrate that for all y ∈ W it holds that

Φ(x) − Φ(y) = −(Φ(x + rω_{x,y}))|_{r=0}^{r=‖y−x‖} = −∫_0^{‖y−x‖} (d/dr) Φ(x + rω_{x,y}) dr.    (70)

The fact that λ(W) < ∞, (69), the Cauchy–Schwarz inequality, and the fact that for all y ∈ W \ {x} it holds that ‖ω_{x,y}‖ = 1 hence prove that

| λ(W)Φ(x) − ∫_W Φ(y) dy | = | ∫_W ∫_0^{‖y−x‖} (d/dr) Φ(x + rω_{x,y}) dr dy |
= | ∫_W ∫_0^{‖y−x‖} 〈(∇Φ)(x + rω_{x,y}), ω_{x,y}〉 dr dy |
≤ ∫_W ∫_0^{‖y−x‖} |〈(∇Φ)(x + rω_{x,y}), ω_{x,y}〉| dr dy
≤ ∫_W ∫_0^{‖y−x‖} ‖(∇Φ)(x + rω_{x,y})‖ ‖ω_{x,y}‖ dr dy
= ∫_W ∫_0^{‖y−x‖} ‖(∇Φ)(x + rω_{x,y})‖ dr dy.    (71)

The fact that W ⊆ Bρ and Fubini's theorem therefore show that

| λ(W)Φ(x) − ∫_W Φ(y) dy | ≤ ∫_0^∞ ∫_{Bρ} ‖E(x + rω_{x,y})‖ dy dr.    (72)

Next observe that the integral transformation theorem with the diffeomorphism {v ∈ Rd : ‖v‖ < ρ} ∋ y ↦ y + x ∈ Bρ, items (i)–(iii) in Lemma 2.6, the fact that for all r ∈ (0, ρ), φ ∈ S it holds that ‖Tρ(r, φ)‖ = r, and (68) imply that for all r ∈ (0, ρ) it holds that

∫_{Bρ} ‖E(x + rω_{x,y})‖ dy = ∫_{{v∈Rd : ‖v‖<ρ}} ‖E(x + rω_{0,y})‖ dy
= ∫_0^ρ ∫_S ‖E( x + r Tρ(s, φ)/‖Tρ(s, φ)‖ )‖ |det((Tρ)′(s, φ))| dφ ds
= ∫_0^ρ ∫_S ‖E(x + T∞(r, φ))‖ |det((T∞)′(s, φ))| dφ ds.    (73)


Items (i)–(ii) in Lemma 2.6, Fubini's theorem, and (72) therefore prove that

| λ(W)Φ(x) − ∫_W Φ(y) dy | ≤ ∫_0^∞ ∫_0^ρ ∫_S ‖E(x + T∞(r, φ))‖ |det((T∞)′(s, φ))| dφ ds dr
= ∫_0^∞ ∫_S ∫_0^ρ ‖E(x + T∞(r, φ))‖ |det((T∞)′(s, φ))| ds dφ dr
= ∫_0^∞ ∫_S ∫_0^ρ ‖E(x + T∞(r, φ))‖ |det((T∞)′(1, φ))| s^{d−1} ds dφ dr
= (ρ^d/d) ∫_0^∞ ∫_S ‖E(x + T∞(r, φ))‖ |det((T∞)′(1, φ))| dφ dr.    (74)

Combining this, (68), items (i)–(iii) in Lemma 2.6, the fact that for all r ∈ (0,∞), φ ∈ S it holds that ‖T∞(r, φ)‖ = r, and (67) demonstrates that

| λ(W)Φ(x) − ∫_W Φ(y) dy | ≤ (ρ^d/d) ∫_0^∞ ∫_S ‖E(x + T∞(r, φ))‖ |det((T∞)′(1, φ))| r^{1−d} r^{d−1} dφ dr
= (ρ^d/d) ∫_0^∞ ∫_S ‖E(x + T∞(r, φ))‖ ‖T∞(r, φ)‖^{1−d} |det((T∞)′(r, φ))| dφ dr
= (ρ^d/d) ∫_{Rd} ‖E(x + y)‖ ‖y‖^{1−d} dy
= (ρ^d/d) ∫_{Rd} ‖E(y)‖ ‖x − y‖^{1−d} dy = (ρ^d/d) ∫_W ‖(∇Φ)(y)‖ ‖x − y‖^{1−d} dy.    (75)

The proof of Lemma 2.11 is thus completed.
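Inequality (64) can be illustrated numerically on a simple example. The sketch below takes d = 2, W = (0,1)², and the test function Φ(y) = y1² + y2 evaluated at x = (0.3, 0.6); the function, the point, and the grid size are our illustrative choices, not from the paper.

```python
import math

# Numerical sanity check of Lemma 2.11 (inequality (64)) for d = 2 on
# W = (0,1)^2 with Phi(y) = y1^2 + y2 and x = (0.3, 0.6); lambda(W) = 1.
d = 2
x = (0.3, 0.6)
phi = lambda u, v: u ** 2 + v
grad_norm = lambda u, v: math.hypot(2 * u, 1.0)   # ||grad Phi(u, v)||

n = 400                                            # midpoint grid on (0,1)^2
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]

integral_phi = sum(phi(u, v) for u in pts for v in pts) * h * h
lhs = abs(phi(*x) - integral_phi)                  # |lambda(W)Phi(x) - int Phi|

rho = math.sqrt(2.0)                               # sup_{v,w in W} ||v - w||
kernel = sum(
    grad_norm(u, v) / math.hypot(x[0] - u, x[1] - v)
    for u in pts for v in pts
) * h * h                                          # int ||grad Phi(y)|| ||x-y||^{1-d} dy
rhs = (rho ** d / d) * kernel

print(lhs, rhs)
assert lhs <= rhs
```

The singular factor ‖x − y‖^{1−d} = ‖x − y‖^{−1} is integrable in dimension 2, so the midpoint grid (which never hits x exactly) gives a stable approximation of the right-hand side.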

Lemma 2.12. Let d ∈ {2, 3, . . .}, p ∈ (d,∞), let λ : B(Rd) → [0,∞] be the Lebesgue–Borel measure on Rd, let W ⊆ Rd be an open, bounded, and convex set with λ(W) > 0, let Φ ∈ C1(W,R), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, and assume that ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx < ∞. Then

sup_{x∈W} | Φ(x) − (1/λ(W)) ∫_W Φ(y) dy | ≤ ( [sup_{v,w∈W} ‖v − w‖]^d / (λ(W) d) ) [ ∫_{∪_{x∈W}{x−y : y∈W}} ‖z‖^{(1−d)p/(p−1)} dz ]^{(p−1)/p} · [ ∫_W ‖(∇Φ)(y)‖^p dy ]^{1/p} < ∞.    (76)

Proof of Lemma 2.12. Throughout this proof let ρ ∈ [0,∞) satisfy that ρ = sup_{v,w∈W} ‖v − w‖, let Wx ⊆ Rd, x ∈ W, satisfy for all x ∈ W that Wx = {x − y : y ∈ W}, let V ⊆ Rd be the set given by V = ∪_{x∈W} Wx, let ψ : Rd → R satisfy for all x ∈ Rd that

ψ(x) = ‖x‖^{1−d} if x ∈ V \ {0} and ψ(x) = 0 if x ∈ (Rd\V) ∪ {0},    (77)

and let E : Rd → Rd satisfy for all x ∈ Rd that

E(x) = (∇Φ)(x) if x ∈ W and E(x) = 0 if x ∈ Rd\W.    (78)

Observe that the hypothesis that ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx < ∞ ensures that

∫_{Rd} ‖E(x)‖^p dx = ∫_W ‖(∇Φ)(x)‖^p dx < ∞.    (79)

Moreover, the assumption that λ(W) > 0, Lemma 2.11, and the integral transformation theorem prove that for all x ∈ W it holds that

| Φ(x) − (1/λ(W)) ∫_W Φ(y) dy | ≤ (ρ^d/(λ(W) d)) ∫_W ‖(∇Φ)(y)‖ ‖x − y‖^{1−d} dy
= (ρ^d/(λ(W) d)) ∫_{Wx} ‖(∇Φ)(x − y)‖ ‖y‖^{1−d} dy
≤ (ρ^d/(λ(W) d)) ∫_V ‖E(x − y)‖ ‖y‖^{1−d} dy = (ρ^d/(λ(W) d)) ∫_{Rd} ‖E(x − y)‖ ψ(y) dy.    (80)

Lemma 2.10, (79), and Hölder's inequality therefore demonstrate that for all x ∈ W it holds that

| Φ(x) − (1/λ(W)) ∫_W Φ(y) dy | ≤ (ρ^d/(λ(W) d)) [ ∫_{Rd} ‖E(x − y)‖^p dy ]^{1/p} [ ∫_{Rd} ψ(y)^{p/(p−1)} dy ]^{(p−1)/p}
= (ρ^d/(λ(W) d)) [ ∫_W ‖(∇Φ)(y)‖^p dy ]^{1/p} [ ∫_V ‖y‖^{(1−d)p/(p−1)} dy ]^{(p−1)/p} < ∞.    (81)

The proof of Lemma 2.12 is thus completed.

Proposition 2.13. Let d ∈ {2, 3, . . .}, p ∈ (d,∞), let W ⊆ Rd be an open, bounded, and convex set, let Φ ∈ C1(W,R), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, let λ : B(Rd) → [0,∞] be the Lebesgue–Borel measure on Rd, let I be a finite and non-empty set, let Wi ⊆ Rd, i ∈ I, be open and convex sets, assume for all i ∈ I, j ∈ I\{i} that λ(Wi) > 0, Wi ∩ Wj = ∅, and W̄ = ∪_{i∈I} W̄i, assume that ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx < ∞, and let (Di)_{i∈I} ⊆ [0,∞) satisfy for all i ∈ I that

Di = ( [sup_{v,w∈Wi} ‖v − w‖]^d / (λ(Wi) d) ) [ ∫_{∪_{x∈Wi}{x−y : y∈Wi}} ‖z‖^{(1−d)p/(p−1)} dz ]^{(p−1)/p}.    (82)

Then

sup_{x∈W} |Φ(x)| ≤ 2^{(p−1)/p} max{ max_{i∈I}([λ(Wi)]^{−1/p}), max_{i∈I}(Di) } · [ ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx ]^{1/p}.    (83)

Proof of Proposition 2.13. Note that the hypothesis that W ⊆ Rd is open and convex and [55, Theorem 6.3] imply that for all i ∈ I it holds that Wi ⊆ W. Next observe that the hypothesis that ∫_W (|Φ(x)|^p + ‖(∇Φ)(x)‖^p) dx < ∞, the hypothesis that for all i ∈ I it holds that Wi is bounded, and Hölder's inequality demonstrate that for all i ∈ I it holds that

(1/λ(Wi)) ∫_{Wi} |Φ(y)| dy = ∫_{Wi} | (1/λ(Wi)) Φ(y) | dy ≤ [ ∫_{Wi} [λ(Wi)]^{−p/(p−1)} dy ]^{(p−1)/p} [ ∫_{Wi} |Φ(y)|^p dy ]^{1/p} ≤ [λ(Wi)]^{−1/p} [ ∫_{Wi} |Φ(y)|^p dy ]^{1/p} < ∞.    (84)

The hypothesis that W̄ = ∪_{i∈I} W̄i, the triangle inequality, and Lemma 2.12 (with W = Wi for i ∈ I in the notation of Lemma 2.12) hence show that

sup_{x∈W} |Φ(x)| = max_{i∈I}( sup_{x∈Wi} |Φ(x)| )
≤ max_{i∈I}( sup_{x∈Wi} | Φ(x) − (1/λ(Wi)) ∫_{Wi} Φ(y) dy | + (1/λ(Wi)) | ∫_{Wi} Φ(y) dy | )
≤ max_{i∈I}( Di [ ∫_{Wi} ‖(∇Φ)(y)‖^p dy ]^{1/p} + [λ(Wi)]^{−1/p} [ ∫_{Wi} |Φ(y)|^p dy ]^{1/p} )
≤ max{ max_{i∈I}([λ(Wi)]^{−1/p}), max_{i∈I}(Di) } · max_{i∈I}( [ ∫_{Wi} |Φ(y)|^p dy ]^{1/p} + [ ∫_{Wi} ‖(∇Φ)(y)‖^p dy ]^{1/p} ).    (85)

Next note that for all (xi)_{i∈I} ⊆ R it holds that

max_{i∈I} |xi| ≤ [ ∑_{i∈I} |xi|^p ]^{1/p}.    (86)


Combining this, the hypothesis that for all i ∈ I, j ∈ I\{i} it holds that Wi ∩ Wj = ∅ and W̄ = ∪_{i∈I} W̄i, and the fact that for all a, b ∈ [0,∞) it holds that (a + b)^p ≤ 2^{p−1}(a^p + b^p) with (85) demonstrates that

sup_{x∈W} |Φ(x)| ≤ max{ max_{i∈I}([λ(Wi)]^{−1/p}), max_{i∈I}(Di) } · [ ∑_{i∈I} | [∫_{Wi} |Φ(y)|^p dy]^{1/p} + [∫_{Wi} ‖(∇Φ)(y)‖^p dy]^{1/p} |^p ]^{1/p}
≤ 2^{(p−1)/p} max{ max_{i∈I}([λ(Wi)]^{−1/p}), max_{i∈I}(Di) } · [ ∑_{i∈I} ∫_{Wi} (|Φ(y)|^p + ‖(∇Φ)(y)‖^p) dy ]^{1/p}
= 2^{(p−1)/p} max{ max_{i∈I}([λ(Wi)]^{−1/p}), max_{i∈I}(Di) } · [ ∫_W (|Φ(y)|^p + ‖(∇Φ)(y)‖^p) dy ]^{1/p}.    (87)

The proof of Proposition 2.13 is thus completed.

Corollary 2.14. Let d ∈ {2, 3, . . .}, Φ ∈ C1((0, 1)^d, R), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, and assume that ∫_{(0,1)^d} (|Φ(x)|^{d²} + ‖(∇Φ)(x)‖^{d²}) dx < ∞. Then

sup_{x∈(0,1)^d} |Φ(x)| ≤ 8√e [ ∫_{(0,1)^d} ( |Φ(x)|^{d²} + ‖(∇Φ)(x)‖^{d²} ) dx ]^{1/d²}.    (88)

Proof of Corollary 2.14. Throughout this proof let λ : B(Rd) → [0,∞] be the Lebesgue–Borel measure on Rd, let m ∈ N satisfy that

m = min( N ∩ [ d(ln(2) + ln(π) + 1)/(2 ln(2)), ∞ ) ),    (89)

let B ⊆ Rd be the set given by B = {x ∈ Rd : ‖x‖ < 1}, let I be the set given by I = {1, . . . , 2^m}^d, let Wi, Vi ⊆ Rd, i ∈ I, be the sets which satisfy for all i = (i1, . . . , id) ∈ I that

Wi = ×_{j=1}^d ((ij − 1)2^{−m}, ij 2^{−m}) and Vi = ∪_{x∈Wi} {x − y : y ∈ Wi},    (90)

and let (Di)_{i∈I}, (ρi)_{i∈I} ⊆ (0,∞) satisfy for all i ∈ I that

ρi = sup_{v,w∈Wi} ‖v − w‖ and Di = (ρi^d/(λ(Wi) d)) [ ∫_{Vi} ‖x‖^{(1−d)d²/(d²−1)} dx ]^{(d²−1)/d²}.    (91)


Observe that (89) ensures that

[2πe]^{d/2} = e^{(d/2)[ln(2)+ln(π)+1]} = 2^{(d/(2 ln(2)))[ln(2)+ln(π)+1]} ≤ 2^m.    (92)

Hence, we obtain that

[2πe]^{d/2} 2^{−m} ≤ 1.    (93)

Next note that (90) shows that for all i ∈ I it holds that

λ(Wi) = 2^{−dm} and ρi = √d · 2^{−m}.    (94)

Moreover, note that (89) implies that

d(ln(2) + ln(π) + 1)/(2 ln(2)) ≤ m ≤ d(ln(2) + ln(π) + 1)/(2 ln(2)) + 1.    (95)

This, (94), and (92) hence show that

max_{i∈I}( [λ(Wi)]^{−1/d²} ) = 2^{m/d} ≤ 2^{(ln(2)+ln(π)+1)/(2 ln(2)) + 1/d} = 2^{1/d} √(2πe).    (96)

Next note that Lemma 2.10 (with p = d², W = Wi for i ∈ I in the notation of Lemma 2.10) assures that for all i ∈ I it holds that

[ ∫_{Vi} ‖x‖^{(1−d)d²/(d²−1)} dx ]^{(d²−1)/d²} ≤ ρi^{(d−1)/d} ( d(d²−1)/(d²−d) · λ(B) )^{(d²−1)/d²} = ρi^{(d−1)/d} ( (d+1) λ(B) )^{(d²−1)/d²}.    (97)

Corollary 2.8, the fact that (d+1)(1/√(dπ))(2πe)^{d/2} ≥ 1, the fact that d + 1 ≤ 2d, the fact that (d²−1)/d² ≤ 1, and (94) hence prove that for all i ∈ I it holds that

[ ∫_{Vi} ‖x‖^{(1−d)d²/(d²−1)} dx ]^{(d²−1)/d²} ≤ (√d · 2^{−m})^{(d−1)/d} ( (d+1)(1/√(dπ))(2πe)^{d/2} )^{(d²−1)/d²} d^{−(d/2)·(d²−1)/d²}
≤ (√d · 2^{−m})^{(d−1)/d} · 2d · (1/√(dπ)) (2πe)^{d/2} d^{−(d/2)·(d²−1)/d²}
= 2^{1+m/d−m} d^{1+(d−1)/(2d)} (1/√(dπ)) (2πe)^{d/2} d^{−(d/2)·(d²−1)/d²}
= 2^{1+m/d−m} d^{1+1/2−1/(2d)+1/(2d)−d/2} (1/√π) (2πe)^{d/2}
= 2^{1+m/d−m} d^{1−d/2} (1/√π) (2πe)^{d/2} = 2^{1+m/d−m} d^{1−d/2} (2πe)^{d/2}/√π.    (98)


This, (95), (92), and (93) therefore demonstrate that for all i ∈ I it holds that

[ ∫_{Vi} ‖x‖^{(1−d)d²/(d²−1)} dx ]^{(d²−1)/d²} ≤ 2^{1+m/d} d^{1−d/2}/√π ≤ 2^{1+(ln(2)+ln(π)+1)/(2 ln(2)) + 1/d} d^{1−d/2}/√π = 2^{1+1/d} √(2e) d^{1−d/2}.    (99)

Next note that (94) ensures that for all i ∈ I it holds that

ρi^d/(λ(Wi) d) = d^{d/2} 2^{−dm}/(2^{−dm} d) = d^{d/2−1}.    (100)

Combining this with (99) demonstrates that

max_{i∈I}(Di) = max_{i∈I}( (ρi^d/(λ(Wi) d)) [ ∫_{Vi} ‖x‖^{(1−d)d²/(d²−1)} dx ]^{(d²−1)/d²} ) ≤ 2^{1+1/d} √(2e).    (101)

Combining this and (96) with the hypothesis that d ∈ {2, 3, . . .} demonstrates that

max{ max_{i∈I}([λ(Wi)]^{−1/d²}), max_{i∈I}(Di) } ≤ max{ 2^{1/d} √(2πe), 2^{1+1/d} √(2e) } = 2^{1+1/d} √(2e) ≤ 2√2 · √(2e) = 4√e.    (102)

Next note that (90) ensures that for all i ∈ I, j ∈ I\{i} it holds that

Wi ∩ Wj = ∅ and [0, 1]^d = ∪_{i∈I} W̄i.    (103)

This, (102), (94), Proposition 2.13 (with p = d², I = I, W = (0, 1)^d, Wi = Wi for i ∈ I in the notation of Proposition 2.13), and the hypothesis that d ∈ {2, 3, . . .} hence imply that

sup_{x∈(0,1)^d} |Φ(x)| ≤ 2^{1−1/d²} max{ max_{i∈I}([λ(Wi)]^{−1/d²}), max_{i∈I}(Di) } · [ ∫_{(0,1)^d} ( |Φ(x)|^{d²} + ‖(∇Φ)(x)‖^{d²} ) dx ]^{1/d²}
≤ 8√e [ ∫_{(0,1)^d} ( |Φ(x)|^{d²} + ‖(∇Φ)(x)‖^{d²} ) dx ]^{1/d²}.    (104)

The proof of Corollary 2.14 is thus completed.
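The uniform bound (88) can be checked on a concrete function. The sketch below takes d = 2 (so d² = 4) and Φ(y) = sin(πy1)sin(πy2), whose supremum over (0,1)² is 1; the test function and grid size are our illustrative choices, not from the paper.

```python
import math

# Numerical sanity check of Corollary 2.14 for d = 2 (exponent d^2 = 4)
# with Phi(y) = sin(pi*y1)*sin(pi*y2) on (0,1)^2, for which sup |Phi| = 1.
d = 2
p = d * d                                   # d^2 = 4
phi = lambda u, v: math.sin(math.pi * u) * math.sin(math.pi * v)

def grad_norm(u, v):
    gu = math.pi * math.cos(math.pi * u) * math.sin(math.pi * v)
    gv = math.pi * math.sin(math.pi * u) * math.cos(math.pi * v)
    return math.hypot(gu, gv)               # ||grad Phi(u, v)||

n = 300                                     # midpoint grid on (0,1)^2
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]
integral = sum(
    abs(phi(u, v)) ** p + grad_norm(u, v) ** p for u in pts for v in pts
) * h * h

sup_phi = 1.0                               # attained at (1/2, 1/2)
rhs = 8 * math.sqrt(math.e) * integral ** (1 / p)
print(sup_phi, rhs)
assert sup_phi <= rhs
```

As the proof suggests, the constant 8√e is far from tight for smooth functions like this one; the point of the corollary is that the constant is independent of the dimension d.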


Corollary 2.15. Let d ∈ N, Φ ∈ C1((0, 1)^d, R), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, and assume that ∫_{(0,1)^d} (|Φ(x)|^{max{2,d²}} + ‖(∇Φ)(x)‖^{max{2,d²}}) dx < ∞. Then

sup_{x∈(0,1)^d} |Φ(x)| ≤ 8√e [ ∫_{(0,1)^d} ( |Φ(x)|^{max{2,d²}} + ‖(∇Φ)(x)‖^{max{2,d²}} ) dx ]^{1/max{2,d²}}.    (105)

Proof of Corollary 2.15. To establish (105) we distinguish between the case d = 1 and the case d ∈ {2, 3, . . .}. First, we consider the case d = 1. Note that Lemma 2.9 ensures that

sup_{x∈(0,1)} |Φ(x)| ≤ ∫_0^1 ( |Φ(x)| + |Φ′(x)| ) dx ≤ 8√e [ ∫_0^1 ( |Φ(x)|² + |Φ′(x)|² ) dx ]^{1/2}.    (106)

This establishes (105) in the case d = 1. Next we consider the case d ∈ {2, 3, . . .}. Note that Corollary 2.14 shows that

sup_{x∈(0,1)^d} |Φ(x)| ≤ 8√e [ ∫_{(0,1)^d} ( |Φ(x)|^{d²} + ‖(∇Φ)(x)‖^{d²} ) dx ]^{1/d²}.    (107)

This establishes (105) in the case d ∈ {2, 3, . . .}. The proof of Corollary 2.15 is thus completed.
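For the d = 1 case of Corollary 2.15, the two integrals on the right-hand side are available in closed form for simple test functions. The sketch below uses Φ(x) = x(1 − x), an illustrative choice of ours, for which ∫Φ² = 1/30 and ∫(Φ′)² = 1/3.

```python
import math

# The d = 1 case of Corollary 2.15:
#   sup |Phi| <= 8*sqrt(e) * ( int_0^1 (Phi^2 + (Phi')^2) dx )^(1/2).
# Test function Phi(x) = x*(1 - x): sup |Phi| = 1/4 (at x = 1/2),
# int Phi^2 = 1/30 and Phi'(x) = 1 - 2x gives int (Phi')^2 = 1/3.
sup_phi = 0.25
integral = 1.0 / 30.0 + 1.0 / 3.0
rhs = 8 * math.sqrt(math.e) * math.sqrt(integral)
print(sup_phi, rhs)
assert sup_phi <= rhs
```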

2.4 Sobolev type estimates for Monte Carlo approximations

In this subsection we provide in Lemma 2.16 below a Sobolev type estimate for Monte Carlo approximations. Lemma 2.16 is one of the main ingredients in our proof of Lemma 4.1 in Subsection 4.1 below.

Lemma 2.16. Let d, n ∈ N, ζ, a ∈ R, b ∈ (a,∞), p ∈ [1,∞), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, let K ∈ (0,∞) be the max{2, p}-Kahane–Khintchine constant (cf. Definition 2.1 and Lemma 2.2), let (Ω,F,P) be a probability space, let ξi : Rd × Ω → R, i ∈ {1, . . . , n}, be i.i.d. random fields satisfying for all i ∈ {1, . . . , n}, ω ∈ Ω that ξi(·, ω) ∈ C1(Rd,R), let ξ : Rd × Ω → R be the random field satisfying for all x ∈ Rd, ω ∈ Ω that ξ(x, ω) = ξ1(x, ω), assume for all Φ ∈ C1((0, 1)^d,R) with ∫_{(0,1)^d} (|Φ(x)|^{max{2,p}} + ‖(∇Φ)(x)‖^{max{2,p}}) dx < ∞ that

sup_{x∈(0,1)^d} |Φ(x)| ≤ ζ [ ∫_{(0,1)^d} ( |Φ(x)|^{max{2,p}} + ‖(∇Φ)(x)‖^{max{2,p}} ) dx ]^{1/max{2,p}},    (108)

and assume that for all x ∈ [a, b]^d it holds that

inf_{δ∈(0,∞)} sup_{v∈[−δ,δ]^d} E[ |ξ(x + v)|^{1+δ} + ‖(∇ξ)(x + v)‖^{1+δ} ] < ∞.    (109)

Then

(i) it holds that

sup_{x∈[a,b]^d} | E[ξ(x)] − (1/n)( ∑_{i=1}^n ξi(x) ) |^p    (110)

is a random variable and

(ii) it holds that

( E[ sup_{x∈[a,b]^d} | E[ξ(x)] − (1/n)( ∑_{i=1}^n ξi(x) ) |^p ] )^{1/p}
≤ (4Kζ/√n) ( sup_{x∈[a,b]^d} [ ( E[ |ξ(x)|^{max{2,p}} ] )^{1/max{2,p}} + (b − a)( E[ ‖(∇ξ)(x)‖^{max{2,p}} ] )^{1/max{2,p}} ] ).    (111)

Proof of Lemma 2.16. Throughout this proof let 〈·, ·〉 : Rd × Rd → R be the d-dimensional Euclidean scalar product, let q ∈ [2,∞) satisfy that q = max{2, p}, let ρ : Rd → Rd satisfy for all x = (x1, . . . , xd) ∈ Rd that

ρ(x) = ( (b − a)x1 + a, (b − a)x2 + a, . . . , (b − a)xd + a ),    (112)

let e1, . . . , ed ∈ Rd satisfy that e1 = (1, 0, . . . , 0), . . . , ed = (0, . . . , 0, 1), let Y : [0, 1]^d × Ω → R be the random field which satisfies for all x ∈ [0, 1]^d that

Y(x) = E[ξ(ρ(x))] − (1/n)( ∑_{i=1}^n ξi(ρ(x)) ),    (113)

let Z : [0, 1]^d × Ω → Rd be the random field which satisfies for all x ∈ [0, 1]^d that

Z(x) = (b − a)[ E[(∇ξ)(ρ(x))] − (1/n)( ∑_{i=1}^n (∇ξi)(ρ(x)) ) ],    (114)

and let E : Ω → [0,∞) be the random variable given by

E = sup_{x∈[a,b]^d∩Q^d} | E[ξ(x)] − (1/n)( ∑_{i=1}^n ξi(x) ) |.    (115)


Note that (112) ensures that ρ([0, 1]^d) = [a, b]^d. Furthermore, note that (109) ensures that for all x ∈ [a, b]^d there exists δx ∈ (0,∞) such that

sup_{v∈[−δx,δx]^d} E[ ‖(∇ξ)(x + v)‖^{1+δx} ] < ∞.    (116)

Hölder's inequality therefore shows that for all x ∈ [a, b]^d there exists δx ∈ (0,∞) which satisfies that

sup_{v∈[−δx,δx]^d} E[ ‖(∇ξ)(x + v)‖ ] ≤ sup_{v∈[−δx,δx]^d} ( E[ ‖(∇ξ)(x + v)‖^{1+δx} ] )^{1/(1+δx)} < ∞.    (117)

Next note that the collection Sx = {y ∈ [a, b]^d : y − x ∈ (−δx, δx)^d}, x ∈ [a, b]^d, is an open cover of [a, b]^d. The fact that [a, b]^d is compact hence ensures that there exist N ∈ N and xk ∈ [a, b]^d, k ∈ {1, . . . , N}, such that the collection Sxk, k ∈ {1, . . . , N}, is a finite open cover of [a, b]^d. Combining this with (117) demonstrates that

sup_{x∈[a,b]^d} E[ ‖(∇ξ)(x)‖ ] ≤ max_{k∈{1,...,N}} sup_{v∈(−δxk, δxk)^d} E[ ‖(∇ξ)(xk + v)‖ ] < ∞.    (118)

Moreover, note that the fact that for all ω ∈ Ω the functions [a, b]^d ∋ x ↦ ξ(x, ω) ∈ R and [a, b]^d ∋ x ↦ (∇ξ)(x, ω) ∈ Rd are continuous ensures that [a, b]^d × Ω ∋ (x, ω) ↦ ξ(x, ω) ∈ R and [a, b]^d × Ω ∋ (x, ω) ↦ (∇ξ)(x, ω) ∈ Rd are Carathéodory functions. This implies that [a, b]^d × Ω ∋ (x, ω) ↦ ξ(x, ω) ∈ R is (B([a, b]^d) ⊗ F)/B(R)-measurable and [a, b]^d × Ω ∋ (x, ω) ↦ (∇ξ)(x, ω) ∈ Rd is (B([a, b]^d) ⊗ F)/B(Rd)-measurable (see, e.g., Aliprantis and Border [1, Lemma 4.51]). Next note that the fundamental theorem of calculus ensures that for all x, y ∈ Rd it holds that

ξ(x) − ξ(y) = ∫_0^1 〈(∇ξ)(y + t(x − y)), x − y〉 dt.    (119)

Hence, we obtain that for all x, y ∈ [a, b]^d it holds that

|ξ(x) − ξ(y)| ≤ ‖x − y‖ ∫_0^1 ‖(∇ξ)(y + t(x − y))‖ dt.    (120)

Combining this with Fubini's theorem shows that for all x, y ∈ [a, b]^d it holds that

| E[ξ(x)] − E[ξ(y)] | ≤ E[ |ξ(x) − ξ(y)| ] ≤ ‖x − y‖ E[ ∫_0^1 ‖(∇ξ)(y + t(x − y))‖ dt ]
= ‖x − y‖ ∫_0^1 E[ ‖(∇ξ)(y + t(x − y))‖ ] dt ≤ ‖x − y‖ sup_{v∈[a,b]^d} E[ ‖(∇ξ)(v)‖ ].    (121)

This and (118) prove that [a, b]^d ∋ x ↦ E[ξ(x)] ∈ R is a Lipschitz continuous function. Hence, we obtain for all ω ∈ Ω that [0, 1]^d ∋ x ↦ Y(x, ω) ∈ R is a continuous function. Combining this with (115) implies that

E = sup_{x∈[0,1]^d} |Y(x)|.    (122)

This establishes item (i). Next note that (112) implies that for all j ∈ {1, . . . , d}, x ∈ [0, 1]^d, h ∈ R it holds that

ρ(x + h ej) − ρ(x) = (b − a) h ej.    (123)

This and (119) show that for all j ∈ {1, . . . , d}, x ∈ [0, 1]^d, h ∈ R \ {0} it holds that

[ξ(ρ(x + h ej)) − ξ(ρ(x))]/h = (b − a) ∫_0^1 〈(∇ξ)( ρ(x) + t(b − a)h ej ), ej〉 dt.    (124)

Moreover, note that (109) implies that for all x ∈ [0, 1]^d there exists δx ∈ (0,∞) such that

sup_{v∈[−δx,δx]^d} E[ ‖(∇ξ)(ρ(x) + v)‖^{1+δx} ] < ∞.    (125)

This, Hölder's inequality, and Fubini's theorem show that for all j ∈ {1, . . . , d}, x ∈ [0, 1]^d there exists δx ∈ (0,∞) such that for all h ∈ {h′ ∈ R : |(b − a)h′| < δx} it holds that

E[ ‖ ∫_0^1 (∇ξ)( ρ(x) + t(b − a)h ej ) dt ‖^{1+δx} ] ≤ E[ ∫_0^1 ‖(∇ξ)( ρ(x) + t(b − a)h ej )‖^{1+δx} dt ]
= ∫_0^1 E[ ‖(∇ξ)( ρ(x) + t(b − a)h ej )‖^{1+δx} ] dt ≤ sup_{v∈[−δx,δx]^d} E[ ‖(∇ξ)( ρ(x) + v )‖^{1+δx} ] < ∞.    (126)


This and (124) show that for all j ∈ {1, . . . , d}, x ∈ [0, 1]^d there exists δx ∈ (0,∞) such that for all h ∈ {h′ ∈ R \ {0} : |(b − a)h′| < δx} it holds that

E[ | [ξ(ρ(x + h ej)) − ξ(ρ(x))]/h |^{1+δx} ] ≤ (b − a)^{1+δx} E[ ‖ ∫_0^1 (∇ξ)( ρ(x) + t(b − a)h ej ) dt ‖^{1+δx} ]
≤ (b − a)^{1+δx} sup_{v∈[−δx,δx]^d} E[ ‖(∇ξ)( ρ(x) + v )‖^{1+δx} ] < ∞.    (127)

This, the theorem of de la Vallée–Poussin (see, e.g., [45, Theorem 6.19]), and Vitali's convergence theorem (see, e.g., [45, Theorem 6.25]) show that for all x ∈ [0, 1]^d, j ∈ {1, . . . , d} there exists δx ∈ (0,∞) such that for all (hm)_{m∈N} ⊆ {h′ ∈ R \ {0} : |(b − a)h′| < δx} with lim_{m→∞} hm = 0 it holds that

lim_{m→∞} E[ [ξ(ρ(x + hm ej)) − ξ(ρ(x))]/hm ] = E[ lim_{m→∞} [ξ(ρ(x + hm ej)) − ξ(ρ(x))]/hm ].    (128)

Therefore, we obtain that for all x ∈ [0, 1]^d, j ∈ {1, . . . , d} there exists δx ∈ (0,∞) such that for all (hm)_{m∈N} ⊆ {h′ ∈ R \ {0} : |(b − a)h′| < δx} with lim_{m→∞} hm = 0 it holds that

lim_{m→∞} E[ [ξ(ρ(x + hm ej)) − ξ(ρ(x))]/hm ] = (b − a) E[ 〈(∇ξ)(ρ(x)), ej〉 ].    (129)

Furthermore, the theorem of de la Vallée–Poussin, Vitali's convergence theorem, and (125) prove that for all x ∈ [0, 1]^d, j ∈ {1, . . . , d} it holds that

lim sup_{Rd\{0}∋h→0} | E[ 〈(∇ξ)(ρ(x) + h), ej〉 ] − E[ 〈(∇ξ)(ρ(x)), ej〉 ] | = 0.    (130)

This and (129) imply that for all ω ∈ Ω, x ∈ (0, 1)^d it holds that ((0, 1)^d ∋ y ↦ Y(y, ω) ∈ R) ∈ C1((0, 1)^d, R) and (∇Y)(x, ω) = Z(x, ω). Combining this, (108), and (122) demonstrates that

E = sup_{x∈(0,1)^d} |Y(x)| ≤ ζ [ ∫_{(0,1)^d} ( |Y(x)|^q + ‖Z(x)‖^q ) dx ]^{1/q}.    (131)

Next observe that (108) ensures that ζ ∈ [0,∞). Hölder's inequality, (131), and Fubini's theorem hence show that

( E[ |E|^p ] )^{1/p} ≤ ( E[ |E|^q ] )^{1/q} ≤ ζ ( E[ ∫_{(0,1)^d} ( |Y(x)|^q + ‖Z(x)‖^q ) dx ] )^{1/q}
= ζ [ ∫_{(0,1)^d} E[ |Y(x)|^q + ‖Z(x)‖^q ] dx ]^{1/q} ≤ ζ [ sup_{x∈[0,1]^d} E[ |Y(x)|^q + ‖Z(x)‖^q ] ]^{1/q}.    (132)

Next note that (109) and Proposition 2.3 prove that for all x ∈ [0, 1]^d it holds that

( E[ |Y(x)|^q ] )^{1/q} ≤ (2K/√n) ( E[ |ξ(ρ(x)) − E[ξ(ρ(x))]|^q ] )^{1/q}    (133)

and

( E[ ‖Z(x)‖^q ] )^{1/q} ≤ (2K(b − a)/√n) ( E[ ‖(∇ξ)(ρ(x)) − E[(∇ξ)(ρ(x))]‖^q ] )^{1/q}.    (134)

This and (132) imply that

( E[ |E|^p ] )^{1/p} ≤ (2Kζ/√n) [ sup_{x∈[0,1]^d} ( E[ |ξ(ρ(x)) − E[ξ(ρ(x))]|^q ] + (b − a)^q E[ ‖(∇ξ)(ρ(x)) − E[(∇ξ)(ρ(x))]‖^q ] ) ]^{1/q}.    (135)

Hence, we obtain that

( E[ |E|^p ] )^{1/p} ≤ (2Kζ/√n) [ sup_{x∈[a,b]^d} ( E[ |ξ(x) − E[ξ(x)]|^q ] + (b − a)^q E[ ‖(∇ξ)(x) − E[(∇ξ)(x)]‖^q ] ) ]^{1/q}.    (136)

The fact that for all r, s ∈ [0,∞) it holds that (r + s)^{1/q} ≤ r^{1/q} + s^{1/q} and the triangle inequality therefore demonstrate that

( E[ |E|^p ] )^{1/p} ≤ (2Kζ/√n) ( sup_{x∈[a,b]^d} [ ( E[ |ξ(x) − E[ξ(x)]|^q ] )^{1/q} + (b − a)( E[ ‖(∇ξ)(x) − E[(∇ξ)(x)]‖^q ] )^{1/q} ] )
≤ (4Kζ/√n) ( sup_{x∈[a,b]^d} [ ( E[ |ξ(x)|^q ] )^{1/q} + (b − a)( E[ ‖(∇ξ)(x)‖^q ] )^{1/q} ] ).    (137)

This establishes item (ii). The proof of Lemma 2.16 is thus completed.
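The 1/√n Monte Carlo rate for the *uniform* error in item (ii) can be observed experimentally. The sketch below uses the toy i.i.d. random fields ξ_i(x) = cos(x + U_i) with U_i uniform on [0, 2π] (our illustrative choice, not from the paper), for which E[ξ(x)] = 0, and measures the supremum of the sample-mean error over a grid in [a, b].

```python
import math, random

# Monte Carlo illustration of Lemma 2.16: for the i.i.d. random fields
# xi_i(x) = cos(x + U_i), U_i ~ Unif[0, 2*pi], the mean E[xi(x)] is 0,
# and the uniform error of the sample mean over [a, b] is of size ~ 1/sqrt(n),
# consistent with the 4*K*zeta/sqrt(n) bound of item (ii).
random.seed(0)
a, b, n = 0.0, 1.0, 10_000
shifts = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]

grid = [a + (b - a) * k / 200 for k in range(201)]
sup_err = max(
    abs(sum(math.cos(x + u) for u in shifts) / n) for x in grid
)
print(sup_err)                 # roughly of size 1/sqrt(n), i.e. about 0.01
assert sup_err < 0.1
```

Here cos(x + U_i) = cos(x)cos(U_i) − sin(x)sin(U_i), so the supremum over x is controlled by the two sums Σcos(U_i) and Σsin(U_i), each of size O(√n), which is exactly the mechanism the lemma quantifies in L^p.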

3 Stochastic differential equations with affine coefficient functions

3.1 A priori estimates for Brownian motions

In this subsection we provide in Lemma 3.1 below essentially well-known a priori estimates for standard Brownian motions. Lemma 3.1 will be employed in our proof of Corollary 3.5 in Subsection 3.2 below. Our proof of Lemma 3.1 is a slight adaptation of the proof of Lemma 2.5 in Hutzenthaler et al. [39].

Lemma 3.1. Let d, m ∈ N, T ∈ [0,∞), p ∈ (0,∞), A ∈ R^{d×m}, let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, let (Ω,F,P) be a probability space, and let W : [0, T] × Ω → Rm be a standard Brownian motion. Then it holds for all t ∈ [0, T] that

( E[ ‖AWt‖^p ] )^{1/p} ≤ √( max{1, p − 1} Trace(A*A) t ).    (138)

Proof of Lemma 3.1. Throughout this proof for every n ∈ N let ‖·‖_{Rn} : Rn → [0,∞) be the standard norm on Rn, let (qr)_{r∈[0,∞)} ⊆ N0 satisfy for all r ∈ [0,∞) that qr = max(N0 ∩ [0, r/2]), let fr : Rm → R, r ∈ [0,∞), satisfy for all r ∈ [0,∞), x ∈ Rm that

fr(x) = ‖Ax‖^r_{Rd},    (139)

and let β^{(i)} : [0, T] × Ω → R, i ∈ {1, . . . , m}, be the stochastic processes which satisfy for all i ∈ {1, . . . , m}, t ∈ [0, T] that

Wt = ( β^{(1)}_t, . . . , β^{(m)}_t ).    (140)

Note that for all r ∈ [2,∞), x ∈ Rm it holds that

(∇fr)(x) = r ‖Ax‖^{r−2}_{Rd} A*Ax.    (141)


This implies that for all r ∈ [2,∞), x ∈ Rm it holds that

(Hess fr)(x) = r ‖Ax‖^{r−2}_{Rd} A*A + 1_{{x≠0}} r(r − 2) ‖Ax‖^{r−4}_{Rd} (A*Ax)(A*Ax)*.    (142)

The fact that for all B ∈ R^{m×d}, x ∈ Rd it holds that ‖Bx‖²_{Rm} ≤ Trace(B*B) ‖x‖²_{Rd} and Trace(B*B) = Trace(BB*) hence shows that for all r ∈ [2,∞), x ∈ Rm it holds that

Trace((Hess fr)(x)) = Trace( r ‖Ax‖^{r−2}_{Rd} A*A + 1_{{x≠0}} r(r − 2) ‖Ax‖^{r−4}_{Rd} (A*Ax)(A*Ax)* )
= r ‖Ax‖^{r−2}_{Rd} Trace(A*A) + 1_{{x≠0}} r(r − 2) ‖Ax‖^{r−4}_{Rd} ‖A*Ax‖²_{Rm}
≤ r ‖Ax‖^{r−2}_{Rd} Trace(A*A) + 1_{{x≠0}} r(r − 2) ‖Ax‖^{r−4}_{Rd} Trace(AA*) ‖Ax‖²_{Rd}
= r ‖Ax‖^{r−2}_{Rd} Trace(A*A) + r(r − 2) ‖Ax‖^{r−2}_{Rd} Trace(A*A)
= r(r − 1) Trace(A*A) f_{r−2}(x).    (143)

Moreover, note that the fact that W : [0, T] × Ω → Rm is a stochastic process with continuous sample paths ensures that W : [0, T] × Ω → Rm is a (B([0, T]) ⊗ F)/B(Rm)-measurable function. The fact that for all r ∈ [2,∞) it holds that fr ∈ C2(Rm,R) hence implies that for all r ∈ [2,∞), i ∈ {1, . . . , m} it holds that

[0, T] × Ω ∋ (t, ω) ↦ (∂fr/∂xi)(Wt(ω)) ∈ R    (144)

is a (B([0, T]) ⊗ F)/B(R)-measurable function. Combining this and (141) demonstrates that for all r ∈ [2,∞), i ∈ {1, . . . , m} it holds that

∫_0^T E[ |(∂fr/∂xi)(Wt)|² ] dt ≤ ∫_0^T E[ ‖(∇fr)(Wt)‖²_{Rm} ] dt
= ∫_0^T E[ r² ‖AWt‖^{2r−4}_{Rd} ‖A*AWt‖²_{Rm} ] dt
≤ ∫_0^T E[ r² ‖A*‖²_{L(Rd,Rm)} ‖AWt‖^{2r−2}_{Rd} ] dt
≤ ∫_0^T E[ r² Trace(A*A) ‖AWt‖^{2r−2}_{Rd} ] dt
≤ r² Trace(A*A) T ( sup_{t∈[0,T]} E[ ‖AWt‖^{2r−2}_{Rd} ] ).    (145)

Next note that the fact that for all r ∈ [2,∞) it holds that 2r − 2 ∈ [2,∞) ensures that for all r ∈ [2,∞) it holds that

sup_{t∈[0,T]} E[ ‖AWt‖^{2r−2}_{Rd} ] = ( sup_{t∈[0,T]} t^{r−1} ) E[ ‖AW1‖^{2r−2}_{Rd} ] < ∞.    (146)


Combining this with (145) demonstrates that for all r ∈ [2,∞), i ∈ {1, . . . , m} it holds that

∫_0^T E[ |(∂fr/∂xi)(Wt)|² ] dt < ∞.    (147)

This proves that for all r ∈ [2,∞), i ∈ {1, . . . , m}, t ∈ [0, T] it holds that

E[ ∫_0^t (∂fr/∂xi)(Ws) dβ^{(i)}_s ] = 0.    (148)

Itô's formula, Fubini's theorem, (139), and (143) hence show that for all r ∈ [2,∞), t ∈ [0, T] it holds that

E[ fr(Wt) ] = E[ fr(W0) + ∑_{i=1}^m ( ∫_0^t (∂fr/∂xi)(Ws) dβ^{(i)}_s + (1/2) ∫_0^t (∂²fr/∂xi²)(Ws) ds ) ]
= (1/2) ∫_0^t E[ Trace((Hess fr)(Ws)) ] ds ≤ ( r(r − 1) Trace(A*A)/2 ) ∫_0^t E[ f_{r−2}(Ws) ] ds.    (149)

This and (139) demonstrate that for all t ∈ [0, T] it holds that

E[ ‖AWt‖²_{Rd} ] ≤ ( 2(2 − 1)/2 ) Trace(A*A) ∫_0^t E[ f0(Ws) ] ds = Trace(A*A) t.    (150)

Hölder's inequality therefore proves that for all r ∈ [0, 2), t ∈ [0, T] it holds that

E[ fr(Wt) ] = E[ ‖AWt‖^r_{Rd} ] ≤ ( E[ ‖AWt‖²_{Rd} ] )^{r/2} ≤ ( Trace(A*A) t )^{r/2}.    (151)

Hence, we obtain that for all r ∈ (0, 2], t ∈ [0, T] it holds that

( E[ ‖AWt‖^r_{Rd} ] )^{1/r} ≤ √( Trace(A*A) t ).    (152)

Next note that (149), the fact that for all r ∈ (2,∞) it holds that r − 2qr ∈ [0, 2), and (151) imply that for all r ∈ (2,∞), s0 ∈ [0, T] it holds that

E[ ‖AW_{s0}‖^r_{Rd} ] ≤ ( [∏_{i=0}^{qr−1} (r − 2i)(r − 1 − 2i)] / 2^{qr} ) [Trace(A*A)]^{qr} ∫_0^{s0} · · · ∫_0^{s_{qr−1}} E[ f_{r−2qr}(W_{s_{qr}}) ] ds_{qr} · · · ds_1
≤ ( [∏_{i=0}^{qr−1} (r − 2i)(r − 1 − 2i)] / 2^{qr} ) [Trace(A*A)]^{qr + (r−2qr)/2} ∫_0^{s0} · · · ∫_0^{s_{qr−1}} (s_{qr})^{(r−2qr)/2} ds_{qr} · · · ds_1
= ( [∏_{i=0}^{qr−1} (r − 2i)(r − 1 − 2i)] 2^{qr} ) / ( 2^{qr} [∏_{i=0}^{qr−1} (r − 2i)] ) [Trace(A*A)]^{r/2} s_0^{r/2}
= [ ∏_{i=0}^{qr−1} (r − 1 − 2i) ] [Trace(A*A)]^{r/2} s_0^{r/2}.    (153)

The fact that for all r ∈ (2,∞) it holds that qr ≤ r/2 hence demonstrates that for all r ∈ (2,∞), t ∈ [0, T] it holds that

( E[ ‖AWt‖^r_{Rd} ] )^{1/r} ≤ [ ∏_{i=0}^{qr−1} (r − 1 − 2i) ]^{1/r} √( Trace(A*A) t )
≤ (r − 1)^{qr/r} √( Trace(A*A) t )
≤ (r − 1)^{r/(2r)} √( Trace(A*A) t )
= √( (r − 1) Trace(A*A) t ).    (154)

Combining this with (152) establishes (138). The proof of Lemma 3.1 is thus completed.
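Estimate (138) can be checked by direct simulation. The sketch below takes d = m = 2, p = 4, t = 1 with an arbitrarily chosen matrix A (ours, for illustration) and compares a seeded Monte Carlo estimate of (E[‖AW_t‖^p])^{1/p} with the right-hand side of (138).

```python
import math, random

# Monte Carlo check of Lemma 3.1 for d = m = 2, p = 4, t = 1:
#   (E[||A W_t||^p])^{1/p} <= sqrt(max{1, p-1} * Trace(A*A) * t),
# where W_t = sqrt(t) * Z with Z a standard 2d Gaussian vector.
random.seed(1)
A = [[1.0, 0.5], [0.0, 1.0]]
p, t, n = 4.0, 1.0, 200_000

acc = 0.0
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    y1 = math.sqrt(t) * (A[0][0] * z1 + A[0][1] * z2)
    y2 = math.sqrt(t) * (A[1][0] * z1 + A[1][1] * z2)
    acc += (y1 * y1 + y2 * y2) ** (p / 2)
lhs = (acc / n) ** (1 / p)                  # Monte Carlo estimate of the L^p norm

trace_AtA = sum(A[i][j] ** 2 for i in range(2) for j in range(2))  # Trace(A*A) = 2.25
rhs = math.sqrt(max(1.0, p - 1) * trace_AtA * t)
print(lhs, rhs)
assert lhs <= rhs
```

For this Gaussian example the left-hand side can also be computed in closed form from the covariance AA*, which places it comfortably below the bound; the lemma's content is that the same √t scaling holds for every p and every A.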

3.2 A priori estimates for solutions

In this subsection we present in Lemma 3.4 and Corollary 3.5 below essentially well-known a priori estimates for solutions of stochastic differential equations with at most linearly growing drift coefficient functions and constant diffusion coefficient functions. Corollary 3.5 is one of the main ingredients in our proof of Lemma 4.1 in Subsection 4.1 below and is a straightforward consequence of Lemma 3.1 above and Lemma 3.4 below. Our proof of Lemma 3.4 is a slight adaptation of the proof of Lemma 2.6 in Beck et al. [5]. In our formulation of the statements of Lemma 3.4 and Corollary 3.5 below we employ the elementary result in Lemma 3.3 below. In our proof of Lemma 3.3 we employ the elementary result in Lemma 3.2 below. Lemma 3.2 and Lemma 3.3 study measurability properties for time-integrals of suitable stochastic processes.


Lemma 3.2. Let T ∈ [0,∞), let (Ω,F,P) be a probability space, let Y : [0, T] × Ω → R be (B([0, T]) ⊗ F)/B(R)-measurable, and assume that for all ω ∈ Ω it holds that ∫_0^T |Yt(ω)| dt < ∞. Then the function Ω ∋ ω ↦ ∫_0^T Yt(ω) dt ∈ R is F/B(R)-measurable.

Proof. Throughout this proof let Y⁺ : [0, T] × Ω → R and Y⁻ : [0, T] × Ω → R be the functions which satisfy for all t ∈ [0, T], ω ∈ Ω that Y⁺_t(ω) = max{Yt(ω), 0} and Y⁻_t(ω) = −min{Yt(ω), 0}. Note that for all ω ∈ Ω it holds that

∫_0^T Yt(ω) dt = ∫_0^T Y⁺_t(ω) dt − ∫_0^T Y⁻_t(ω) dt.    (155)

Moreover, observe that Tonelli's theorem implies that Ω ∋ ω ↦ ∫_0^T Y⁺_t(ω) dt ∈ R is F/B(R)-measurable and Ω ∋ ω ↦ ∫_0^T Y⁻_t(ω) dt ∈ R is F/B(R)-measurable. This and (155) prove that Ω ∋ ω ↦ ∫_0^T Yt(ω) dt ∈ R is F/B(R)-measurable. This completes the proof of Lemma 3.2.

Lemma 3.3. Let d ∈ N, c, C, T ∈ [0,∞), let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, let µ : Rd → Rd be a B(Rd)/B(Rd)-measurable function which satisfies for all x ∈ Rd that ‖µ(x)‖ ≤ C + c‖x‖, let (Ω,F,P) be a probability space, and let X : [0, T] × Ω → Rd be a stochastic process with continuous sample paths. Then for all t ∈ [0, T] the function Ω ∋ ω ↦ ∫_0^t µ(Xs(ω)) ds ∈ Rd is F/B(Rd)-measurable.

Proof. The fact that X : [0, T] × Ω → Rd is a stochastic process with continuous sample paths and Aliprantis and Border [1, Lemma 4.51] ensure that for all t ∈ [0, T] the function [0, t] × Ω ∋ (s, ω) ↦ Xs(ω) ∈ Rd is (B([0, t]) ⊗ F)/B(Rd)-measurable. This and the hypothesis that µ : Rd → Rd is B(Rd)/B(Rd)-measurable imply that for all i ∈ {1, . . . , d}, t ∈ [0, T] it holds that the function [0, t] × Ω ∋ (s, ω) ↦ µi(Xs(ω)) ∈ R is (B([0, t]) ⊗ F)/B(R)-measurable. Moreover, note that the hypothesis that for all x ∈ Rd it holds that ‖µ(x)‖ ≤ C + c‖x‖ and the fact that for all ω ∈ Ω the function [0, T] ∋ s ↦ Xs(ω) ∈ Rd is continuous imply that for all t ∈ [0, T], ω ∈ Ω, i ∈ {1, . . . , d} it holds that ∫_0^t |µi(Xs(ω))| ds < ∞. Lemma 3.2 (with T = t, Y = µi(X) for t ∈ [0, T], i ∈ {1, . . . , d} in the notation of Lemma 3.2) hence proves that for all i ∈ {1, . . . , d}, t ∈ [0, T] the function Ω ∋ ω ↦ ∫_0^t µi(Xs(ω)) ds ∈ R is F/B(R)-measurable. This implies that for all t ∈ [0, T] the function Ω ∋ ω ↦ ∫_0^t µ(Xs(ω)) ds ∈ Rd is F/B(Rd)-measurable. This completes the proof of Lemma 3.3.

Lemma 3.4. Let d, m ∈ N, x ∈ Rd, p ∈ [1,∞), c, C, T ∈ [0,∞), A ∈ R^{d×m}, let ‖·‖ : Rd → [0,∞) be the standard norm on Rd, let (Ω,F,P) be a probability space, let W : [0, T] × Ω → Rm be a standard Brownian motion, let µ : Rd → Rd be a B(Rd)/B(Rd)-measurable function which satisfies for all y ∈ Rd that ‖µ(y)‖ ≤ C + c‖y‖, and let X : [0, T] × Ω → Rd be a stochastic process with continuous sample paths which satisfies for all t ∈ [0, T] that

P( Xt = x + ∫_0^t µ(Xs) ds + AWt ) = 1    (156)


(cf. Lemma 3.3). Then

( E[ ‖XT‖^p ] )^{1/p} ≤ ( ‖x‖ + CT + ( E[ ‖AWT‖^p ] )^{1/p} ) e^{cT}.    (157)

Proof of Lemma 3.4. Throughout this proof for every n ∈ N let ‖·‖_{Rn} : Rn → [0,∞) be the standard norm on Rn, let β^{(i)} : [0, T] × Ω → R, i ∈ {1, . . . , m}, be the stochastic processes which satisfy for all i ∈ {1, . . . , m}, t ∈ [0, T] that

Wt = ( β^{(1)}_t, . . . , β^{(m)}_t )    (158)

and let B ⊆ Ω be the set given by

B = ∩_{t∈[0,T]} { Xt = x + ∫_0^t µ(Xs) ds + AWt } = { ω ∈ Ω : ( ∀ t ∈ [0, T] : Xt(ω) = x + ∫_0^t µ(Xs(ω)) ds + AWt(ω) ) }.    (159)

Observe that the fact that X : [0, T] × Ω → Rd and W : [0, T] × Ω → Rm are stochastic processes with continuous sample paths demonstrates that

B = ∩_{t∈[0,T]∩Q} { Xt = x + ∫_0^t µ(Xs) ds + AWt } ∈ F.    (160)

Combining this and (156) proves that

P(B) = P( ∩_{t∈[0,T]∩Q} { Xt = x + ∫_0^t µ(Xs) ds + AWt } )
= 1 − P( Ω \ ∩_{t∈[0,T]∩Q} { Xt = x + ∫_0^t µ(Xs) ds + AWt } )
= 1 − P( ∪_{t∈[0,T]∩Q} { Xt ≠ x + ∫_0^t µ(Xs) ds + AWt } )
≥ 1 − ∑_{t∈[0,T]∩Q} P( Xt ≠ x + ∫_0^t µ(Xs) ds + AWt ) = 1.    (161)

Next note that the triangle inequality and the hypothesis that for all y ∈ R^d it holds that ‖µ(y)‖ ≤ C + c‖y‖ ensure that for all ω ∈ B, t ∈ [0,T] it holds that

‖X_t(ω)‖ ≤ ‖x‖ + ‖A W_t(ω)‖ + ∫_0^t ‖µ(X_s(ω))‖ ds
         ≤ ‖x‖ + ‖A W_t(ω)‖ + Ct + c ∫_0^t ‖X_s(ω)‖ ds
         ≤ ‖x‖ + [ sup_{s∈[0,T]} ‖A W_s(ω)‖ ] + CT + c ∫_0^t ‖X_s(ω)‖ ds.  (162)

Moreover, note that the assumption that X : [0,T] × Ω → R^d is a stochastic process with continuous sample paths assures that for all ω ∈ Ω it holds that

∫_0^T ‖X_s(ω)‖ ds < ∞.  (163)

Grohs et al. [28, Lemma 2.11] (with α = ‖x‖ + sup_{t∈[0,T]} ‖A W_t(ω)‖ + CT, β = c, f = ([0,T] ∋ t ↦ ‖X_t(ω)‖ ∈ [0,∞)) for ω ∈ B in the notation of Grohs et al. [28, Lemma 2.11]) and (162) hence prove that for all ω ∈ B, t ∈ [0,T] it holds that

‖X_t(ω)‖ ≤ ( ‖x‖ + [ sup_{s∈[0,T]} ‖A W_s(ω)‖ ] + CT ) e^{ct}.  (164)

Next note that

( E[ sup_{s∈[0,T]} ‖A W_s‖^p ] )^{1/p} ≤ ‖A‖_{L(R^m,R^d)} ( E[ sup_{s∈[0,T]} ‖W_s‖^p_{R^m} ] )^{1/p}.  (165)

Moreover, if p ∈ [1,2], then it holds that

sup_{s∈[0,T]} ‖W_s‖^p_{R^m} = sup_{s∈[0,T]} ( ∑_{i=1}^m |β^(i)_s|^2 )^{p/2} ≤ sup_{s∈[0,T]} ( 1 + ∑_{i=1}^m |β^(i)_s|^2 )^{p/2} ≤ 1 + ∑_{i=1}^m [ sup_{s∈[0,T]} |β^(i)_s|^2 ].  (166)

Hence, if p ∈ [1,2], then the Burkholder–Davis–Gundy inequality (see, e.g., Karatzas and Shreve [44, Theorem 3.28]) implies that

E[ sup_{s∈[0,T]} ‖W_s‖^p_{R^m} ] ≤ 1 + m E[ sup_{s∈[0,T]} |β^(1)_s|^2 ] < ∞.  (167)

Next note that if p ∈ (2,∞), then Hölder's inequality implies that

sup_{s∈[0,T]} ‖W_s‖^p_{R^m} = sup_{s∈[0,T]} ( ∑_{i=1}^m |β^(i)_s|^2 )^{p/2} ≤ m^{p/2−1} sup_{s∈[0,T]} ( ∑_{i=1}^m |β^(i)_s|^p ) ≤ m^{p/2−1} ∑_{i=1}^m [ sup_{s∈[0,T]} |β^(i)_s|^p ].  (168)
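The pathwise Hölder bound in (168) is deterministic and can be spot-checked numerically. The following sketch (our own illustration, not part of the proof; the path discretization and parameter values are arbitrary choices) verifies it on a simulated discretized Brownian path:

```python
import numpy as np

# Sanity check of the pathwise Hoelder bound (168): for p > 2 and a
# discretized m-dimensional Brownian path W,
#   sup_s ||W_s||^p  <=  m^(p/2 - 1) * sum_i sup_s |beta_i(s)|^p .
rng = np.random.default_rng(0)
m, steps, p = 5, 1000, 3.7

# Simulate m independent Brownian components on [0, 1] via cumulative sums.
increments = rng.normal(scale=np.sqrt(1.0 / steps), size=(steps, m))
W = np.cumsum(increments, axis=0)            # shape (steps, m)

lhs = np.max(np.linalg.norm(W, axis=1) ** p)                    # sup_s ||W_s||^p
rhs = m ** (p / 2 - 1) * np.sum(np.max(np.abs(W), axis=0) ** p) # RHS of (168)
assert lhs <= rhs
```

The inequality holds for every sample path (it is the power-mean inequality applied coordinatewise), so the assertion never fails regardless of the seed.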

Hence, if p ∈ (2,∞), then the Burkholder–Davis–Gundy inequality (see, e.g., Karatzas and Shreve [44, Theorem 3.28]) implies that

E[ sup_{s∈[0,T]} ‖W_s‖^p_{R^m} ] ≤ m^{p/2−1} m E[ sup_{s∈[0,T]} |β^(1)_s|^p ] < ∞.  (169)

This, (165), and (167) hence imply that

( E[ sup_{s∈[0,T]} ‖A W_s‖^p ] )^{1/p} < ∞.  (170)

Combining this, (164), and (161) demonstrates that

∫_0^T ( E[ ‖X_t‖^p ] )^{1/p} dt ≤ T [ sup_{t∈[0,T]} ( E[ ‖X_t‖^p ] )^{1/p} ]
  ≤ T | E[ | ‖x‖ + sup_{s∈[0,T]} ‖A W_s‖ + CT |^p e^{pcT} ] |^{1/p}
  ≤ T ( ‖x‖ + | E[ sup_{s∈[0,T]} ‖A W_s‖^p ] |^{1/p} + CT ) e^{cT} < ∞.  (171)

Next observe that the hypothesis that X : [0,T] × Ω → R^d is a stochastic process with continuous sample paths ensures that X : [0,T] × Ω → R^d is a (B([0,T]) ⊗ F)/B(R^d)-measurable function. Hence, we obtain that for all t ∈ [0,T] it holds that

( E[ | ∫_0^t ‖X_s‖ ds |^p ] )^{1/p} ≤ ∫_0^t ( E[ ‖X_s‖^p ] )^{1/p} ds  (172)

(cf., for example, Garling [24, Corollary 5.4.2] or Jentzen & Kloeden [42, Proposition 8 in Appendix A]). The triangle inequality, the fact that for all t ∈ [0,T] it holds that W_t has the same distribution as (√t/√T) W_T, (161), and (162) therefore show that for all t ∈ [0,T] it holds that

( E[ ‖X_t‖^p ] )^{1/p} ≤ ‖x‖ + ( E[ ‖A W_t‖^p ] )^{1/p} + Ct + c ∫_0^t ( E[ ‖X_s‖^p ] )^{1/p} ds
  = ‖x‖ + (√t/√T) ( E[ ‖A W_T‖^p ] )^{1/p} + Ct + c ∫_0^t ( E[ ‖X_s‖^p ] )^{1/p} ds
  ≤ ‖x‖ + ( E[ ‖A W_T‖^p ] )^{1/p} + CT + c ∫_0^t ( E[ ‖X_s‖^p ] )^{1/p} ds.  (173)

Combining Grohs et al. [28, Lemma 2.11] (with α = ‖x‖ + (E[‖A W_T‖^p])^{1/p} + CT, β = c, f = ([0,T] ∋ t ↦ (E[‖X_t‖^p])^{1/p} ∈ [0,∞)) in the notation of Grohs et al. [28, Lemma 2.11]) and (171) hence establishes that for all t ∈ [0,T] it holds that

( E[ ‖X_t‖^p ] )^{1/p} ≤ ( ‖x‖ + ( E[ ‖A W_T‖^p ] )^{1/p} + CT ) e^{ct}.  (174)

The proof of Lemma 3.4 is thus completed.

Corollary 3.5. Let d, m ∈ N, x ∈ R^d, p ∈ [1,∞), c, C, T ∈ [0,∞), A ∈ R^{d×m}, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let (Ω, F, P) be a probability space, let W : [0,T] × Ω → R^m be a standard Brownian motion, let µ : R^d → R^d be a B(R^d)/B(R^d)-measurable function which satisfies for all y ∈ R^d that ‖µ(y)‖ ≤ C + c‖y‖, and let X : [0,T] × Ω → R^d be a stochastic process with continuous sample paths which satisfies for all t ∈ [0,T] that

P( X_t = x + ∫_0^t µ(X_s) ds + A W_t ) = 1  (175)

(cf. Lemma 3.3). Then

( E[ ‖X_T‖^p ] )^{1/p} ≤ ( ‖x‖ + CT + √( max{1, p−1} Trace(A*A) T ) ) e^{cT}.  (176)

Proof of Corollary 3.5. Observe that Lemma 3.1 and Lemma 3.4 establish (176). The proof of Corollary 3.5 is thus completed.
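The moment bound (176) can be illustrated with a quick Monte Carlo experiment. The sketch below is our own (the drift µ(y) = −y, diffusion A = I, and all numerical parameters are arbitrary toy choices satisfying the hypotheses with C = 0 and c = 1); it simulates the SDE with an Euler–Maruyama scheme and checks the empirical moment against the right-hand side of (176):

```python
import numpy as np

# Monte Carlo sanity check of (176): with mu(y) = -y (so ||mu(y)|| <= C + c||y||
# for C = 0, c = 1) and A = I,
#   (E[||X_T||^p])^(1/p) <= (||x|| + C*T + sqrt(max{1,p-1}*Trace(A^T A)*T)) * e^(c*T).
rng = np.random.default_rng(1)
d = m = 2
p, c, C, T = 2.0, 1.0, 0.0, 1.0
x = np.ones(d)
A = np.eye(d, m)
paths, steps = 20_000, 200
dt = T / steps

X = np.tile(x, (paths, 1))
for _ in range(steps):
    dW = rng.normal(scale=np.sqrt(dt), size=(paths, m))
    X = X + (-X) * dt + dW @ A.T          # Euler-Maruyama step with mu(y) = -y

lhs = np.mean(np.linalg.norm(X, axis=1) ** p) ** (1 / p)
rhs = (np.linalg.norm(x) + C * T
       + np.sqrt(max(1.0, p - 1.0) * np.trace(A.T @ A) * T)) * np.exp(c * T)
assert lhs <= rhs
```

For this contractive drift the left-hand side is roughly 1, well below the bound of roughly 7.7, so the check passes with a wide margin; the bound of Corollary 3.5 is of course not sharp for dissipative drifts.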

3.3 A priori estimates for differences of solutions

In this subsection we provide in Lemma 3.6 below well-known a priori estimates for differences of solutions of stochastic differential equations with Lipschitz continuous drift coefficient functions and constant diffusion coefficient functions. Lemma 3.6 is one of the main ingredients in our proof of Lemma 4.1 in Subsection 4.1 below. Our proof of Lemma 3.6 is a slight adaptation of the proof of Lemma 2.6 in Beck et al. [5].

Lemma 3.6. Let d, m ∈ N, p ∈ [1,∞), l ∈ [0,∞), T ∈ [0,∞), A ∈ R^{d×m}, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let (Ω, F, P) be a probability space, let W : [0,T] × Ω → R^m be a standard Brownian motion, let µ : R^d → R^d be a B(R^d)/B(R^d)-measurable function which satisfies for all x, y ∈ R^d that ‖µ(x) − µ(y)‖ ≤ l‖x − y‖, and let X^x : [0,T] × Ω → R^d, x ∈ R^d, be stochastic processes with continuous sample paths which satisfy for all x ∈ R^d, t ∈ [0,T] that

P( X^x_t = x + ∫_0^t µ(X^x_s) ds + A W_t ) = 1.  (177)

Then it holds for all x, y ∈ R^d, t ∈ [0,T] that

( E[ ‖X^x_t − X^y_t‖^p ] )^{1/p} ≤ e^{lt} ‖x − y‖.  (178)

Proof of Lemma 3.6. Throughout this proof let B^x ⊆ Ω, x ∈ R^d, be the sets which satisfy for all x ∈ R^d that

B^x = ⋂_{t∈[0,T]} { X^x_t = x + ∫_0^t µ(X^x_s) ds + A W_t }
    = { ω ∈ Ω : ( ∀ t ∈ [0,T] : X^x_t(ω) = x + ∫_0^t µ(X^x_s(ω)) ds + A W_t(ω) ) }.  (179)

Observe that the fact that for all x ∈ R^d it holds that X^x : [0,T] × Ω → R^d and W : [0,T] × Ω → R^m are stochastic processes with continuous sample paths demonstrates that for all x ∈ R^d it holds that

B^x = ⋂_{t∈[0,T]∩Q} { X^x_t = x + ∫_0^t µ(X^x_s) ds + A W_t } ∈ F.  (180)

Combining this and (177) proves that for all x ∈ R^d it holds that

P(B^x) = P( ⋂_{t∈[0,T]∩Q} { X^x_t = x + ∫_0^t µ(X^x_s) ds + A W_t } )
       = 1 − P( Ω \ ⋂_{t∈[0,T]∩Q} { X^x_t = x + ∫_0^t µ(X^x_s) ds + A W_t } )
       = 1 − P( ⋃_{t∈[0,T]∩Q} { X^x_t ≠ x + ∫_0^t µ(X^x_s) ds + A W_t } )
       ≥ 1 − ∑_{t∈[0,T]∩Q} P( X^x_t ≠ x + ∫_0^t µ(X^x_s) ds + A W_t ) = 1.  (181)

Hence, we obtain that for all x, y ∈ R^d it holds that

P(B^x ∩ B^y) = 1 − P(Ω \ [B^x ∩ B^y]) = 1 − P( (B^x)^c ∪ (B^y)^c ) ≥ 1 − [ P((B^x)^c) + P((B^y)^c) ] = 1.  (182)

Next note that the triangle inequality and the assumption that for all x, y ∈ R^d it holds that ‖µ(x) − µ(y)‖ ≤ l‖x − y‖ ensure that for all x, y ∈ R^d, ω ∈ B^x ∩ B^y, t ∈ [0,T] it holds that

‖X^x_t(ω) − X^y_t(ω)‖ ≤ ‖x − y‖ + ∫_0^t ‖µ(X^x_s(ω)) − µ(X^y_s(ω))‖ ds ≤ ‖x − y‖ + l ∫_0^t ‖X^x_s(ω) − X^y_s(ω)‖ ds.  (183)

Moreover, note that the assumption that for all x ∈ R^d it holds that X^x : [0,T] × Ω → R^d is a stochastic process with continuous sample paths assures that for all x, y ∈ R^d, ω ∈ Ω it holds that

∫_0^T ‖X^x_s(ω) − X^y_s(ω)‖ ds < ∞.  (184)

Grohs et al. [28, Lemma 2.11] (with α = ‖x − y‖, β = l, f = ([0,T] ∋ t ↦ ‖X^x_t(ω) − X^y_t(ω)‖ ∈ [0,∞)) for x, y ∈ R^d, ω ∈ B^x ∩ B^y in the notation of Grohs et al. [28, Lemma 2.11]) and (183) hence prove that for all x, y ∈ R^d, ω ∈ B^x ∩ B^y, t ∈ [0,T] it holds that

‖X^x_t(ω) − X^y_t(ω)‖ ≤ ‖x − y‖ e^{lt}.  (185)

This and (182) prove that for all x, y ∈ R^d, t ∈ [0,T] it holds that

( E[ ‖X^x_t − X^y_t‖^p ] )^{1/p} ≤ e^{lt} ‖x − y‖.  (186)

The proof of Lemma 3.6 is thus completed.
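Because the diffusion coefficient is constant, the noise cancels in the difference of two solutions driven by the same Brownian path, which makes (178) easy to observe numerically. In the following sketch (our own illustration; the componentwise drift sin, which is 1-Lipschitz so that l = 1, and all numerical parameters are arbitrary choices) two coupled Euler–Maruyama paths are compared:

```python
import numpy as np

# Coupled-path illustration of the difference bound (178): with identical
# noise increments (additive noise), an Euler-Maruyama discretization with the
# 1-Lipschitz drift mu(y) = sin(y) (componentwise, so l = 1) satisfies
#   ||X^x_T - X^y_T|| <= e^(l*T) * ||x - y||   pathwise.
rng = np.random.default_rng(2)
d, T, steps, l = 3, 1.0, 500, 1.0
dt = T / steps
x, y = np.array([1.0, 0.0, -1.0]), np.array([0.5, 0.5, -0.5])
A = np.eye(d)

Xx, Xy = x.copy(), y.copy()
for _ in range(steps):
    dW = rng.normal(scale=np.sqrt(dt), size=d)
    shared = A @ dW                       # identical noise for both solutions
    Xx = Xx + np.sin(Xx) * dt + shared
    Xy = Xy + np.sin(Xy) * dt + shared

lhs = np.linalg.norm(Xx - Xy)
rhs = np.exp(l * T) * np.linalg.norm(x - y)
assert lhs <= rhs
```

The discrete scheme inherits the bound exactly: each step multiplies the difference norm by at most (1 + l·dt) ≤ e^{l·dt}, so the assertion holds for every realization of the noise.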

4 Error estimates

4.1 Quantitative error estimates

In this subsection we establish in Corollary 4.3 below a quantitative approximation result for viscosity solutions (cf., for example, Hairer et al. [31]) of Kolmogorov PDEs with constant coefficient functions. Our proof of Corollary 4.3 employs the quantitative approximation results in Lemma 4.1 and Corollary 4.2 below. Corollary 4.3 is one of the key ingredients which we use in our proofs of Proposition 4.4 and Proposition 4.6 below, respectively, in order to construct ANN approximations for viscosity solutions of Kolmogorov PDEs with constant coefficient functions.

Lemma 4.1. Let d, n ∈ N, ϕ ∈ C(R^d,R), c, l, a ∈ R, b ∈ (a,∞), ζ, ε, T ∈ (0,∞), v, 𝐯, w, 𝐰, z, 𝐳 ∈ [0,∞), p ∈ [1,∞), let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let ⟨·,·⟩ : R^d × R^d → R be the d-dimensional Euclidean scalar product, let K ∈ (0,∞) be the max{2,p}-Kahane–Khintchine constant (cf. Definition 2.1 and Lemma 2.2), assume for all Φ ∈ C^1((0,1)^d,R) that

sup_{x∈(0,1)^d} |Φ(x)| ≤ ζ [ ∫_{(0,1)^d} ( |Φ(x)|^{max{2,p}} + ‖(∇Φ)(x)‖^{max{2,p}} ) dx ]^{1/max{2,p}},  (187)

let φ ∈ C^1(R^d,R) satisfy for all x ∈ R^d that

|φ(x)| ≤ c d^z (1 + ‖x‖^𝐳),  ‖(∇φ)(x)‖ ≤ c d^w (1 + ‖x‖^𝐰),  (188)

and |ϕ(x) − φ(x)| ≤ ε d^v (1 + ‖x‖^𝐯),  (189)

let µ : R^d → R^d satisfy for all x, y ∈ R^d, λ ∈ R that µ(λx + y) + λµ(0) = λµ(x) + µ(y) and

‖µ(x) − µ(y)‖ ≤ l‖x − y‖,  (190)

let A = (A_{i,j})_{(i,j)∈{1,...,d}^2} ∈ R^{d×d} be a symmetric and positive semi-definite matrix, let u ∈ C([0,T] × R^d,R), assume for all x ∈ R^d that u(0,x) = ϕ(x), assume that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and assume that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ⟨µ(x), (∇_x u)(t,x)⟩ + (1/2) ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x)  (191)

for (t,x) ∈ (0,T) × R^d. Then there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d such that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^v ( 1 + e^{𝐯lT} [ T‖µ(0)‖ + √( max{1, 𝐯−1} T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐯 )
+ (4Kζ/√n) ( c d^z ( 1 + e^{𝐳lT} [ T‖µ(0)‖ + √( max{1, 𝐳 max{2,p} − 1} T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐳 )
+ (b−a) c d^{w+1/2} e^{lT} ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( max{1, 𝐰 max{2,p} + 𝐰 − 1} T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 ) ).  (192)

Proof of Lemma 4.1. Throughout this proof let e_1, ..., e_d ∈ R^d satisfy that e_1 = (1,0,...,0), ..., e_d = (0,...,0,1), let m_r : (0,∞) → [r,∞), r ∈ R, satisfy for all r ∈ R, x ∈ (0,∞) that m_r(x) = max{r, x}, let (Ω, F, P) be a probability space with a normal filtration (F_t)_{t∈[0,T]}, let W^i : [0,T] × Ω → R^d, i ∈ N, be independent standard (F_t)_{t∈[0,T]}-Brownian motions, and let X^{i,x} : [0,T] × Ω → R^d, i ∈ N, x ∈ R^d, be (F_t)_{t∈[0,T]}-adapted stochastic processes with continuous sample paths which satisfy that

(a) for all i ∈ N, x ∈ R^d, t ∈ [0,T] it holds P-a.s. that

X^{i,x}_t = x + ∫_0^t µ(X^{i,x}_s) ds + √A W^i_t  (193)

and

(b) for all i ∈ N, x, y ∈ R^d, λ ∈ R, t ∈ [0,T], ω ∈ Ω it holds that X^{i,λx+y}_t(ω) + λX^{i,0}_t(ω) = λX^{i,x}_t(ω) + X^{i,y}_t(ω)

(cf. Grohs et al. [28, Proposition 2.20]). Note that item (b) and Grohs et al. [28, Corollary 2.8] (with d = d, m = d, ϕ = (R^d ∋ x ↦ X^{i,x}_T(ω) ∈ R^d) for i ∈ N, ω ∈ Ω in the notation of Grohs et al. [28, Corollary 2.8]) ensure that for all i ∈ N, ω ∈ Ω there exist W_{i,ω} ∈ R^{d×d} and B_{i,ω} ∈ R^d which satisfy that for all x ∈ R^d it holds that

X^{i,x}_T(ω) = W_{i,ω} x + B_{i,ω}.  (194)

Combining this with the assumption that φ ∈ C^1(R^d,R) proves that for all ω ∈ Ω, i ∈ N it holds that

( R^d ∋ x ↦ φ(X^{i,x}_T(ω)) ∈ R ) ∈ C^1(R^d,R).  (195)

Next note that (188) and (189) ensure that ϕ : R^d → R is an at most polynomially growing function. This, (190), and the Feynman–Kac formula (cf., for example, Grohs et al. [28, Proposition 2.22] or Hairer et al. [31, Corollary 4.17]) imply that for all x ∈ R^d it holds that

u(T,x) = E[ ϕ(X^{1,x}_T) ].  (196)

This shows that

sup_{x∈[a,b]^d} | u(T,x) − E[φ(X^{1,x}_T)] | = sup_{x∈[a,b]^d} | E[ ϕ(X^{1,x}_T) − φ(X^{1,x}_T) ] |
≤ sup_{x∈[a,b]^d} E[ ( |ϕ(X^{1,x}_T) − φ(X^{1,x}_T)| / (d^v(1 + ‖X^{1,x}_T‖^𝐯)) ) d^v(1 + ‖X^{1,x}_T‖^𝐯) ]
≤ d^v [ sup_{y∈R^d} ( |ϕ(y) − φ(y)| / (d^v(1 + ‖y‖^𝐯)) ) ] [ sup_{x∈[a,b]^d} E[ 1 + ‖X^{1,x}_T‖^𝐯 ] ].  (197)

Jensen's inequality and (189) therefore show that

sup_{x∈[a,b]^d} | u(T,x) − E[φ(X^{1,x}_T)] |
≤ d^v [ sup_{y∈R^d} ( |ϕ(y) − φ(y)| / (d^v(1 + ‖y‖^𝐯)) ) ] [ 1 + sup_{x∈[a,b]^d} E[ ‖X^{1,x}_T‖^𝐯 ] ]
≤ ε d^v [ 1 + sup_{x∈[a,b]^d} | E[ ‖X^{1,x}_T‖^{m_1(𝐯)} ] |^{𝐯/m_1(𝐯)} ].  (198)

Item (a), Corollary 3.5 (with m = d, C = ‖µ(0)‖, c = l, W = W^1, A = √A, X = X^{1,x}, p = m_1(𝐯) for x ∈ [a,b]^d in the notation of Corollary 3.5), and the fact that m_1(m_1(𝐯) − 1) = m_1(𝐯 − 1) hence demonstrate that

sup_{x∈[a,b]^d} | u(T,x) − E[φ(X^{1,x}_T)] | ≤ ε d^v ( 1 + e^{𝐯lT} [ ‖µ(0)‖T + √( m_1(𝐯−1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐯 ).  (199)

Next note that

sup_{x∈[a,b]^d} | E[ |φ(X^{1,x}_T)|^{m_2(p)} ] |^{1/m_2(p)}
= sup_{x∈[a,b]^d} | E[ ( ( |φ(X^{1,x}_T)| / (d^z(1 + ‖X^{1,x}_T‖^𝐳)) ) d^z(1 + ‖X^{1,x}_T‖^𝐳) )^{m_2(p)} ] |^{1/m_2(p)}
≤ d^z [ sup_{y∈R^d} |φ(y)| / (d^z(1 + ‖y‖^𝐳)) ] [ sup_{x∈[a,b]^d} | E[ (1 + ‖X^{1,x}_T‖^𝐳)^{m_2(p)} ] |^{1/m_2(p)} ].  (200)

This, (188), the triangle inequality, and Hölder's inequality show that

sup_{x∈[a,b]^d} | E[ |φ(X^{1,x}_T)|^{m_2(p)} ] |^{1/m_2(p)}
≤ c d^z [ 1 + sup_{x∈[a,b]^d} | E[ ‖X^{1,x}_T‖^{𝐳 m_2(p)} ] |^{1/m_2(p)} ]
≤ c d^z [ 1 + sup_{x∈[a,b]^d} | E[ ‖X^{1,x}_T‖^{m_1(𝐳 m_2(p))} ] |^{𝐳/m_1(𝐳 m_2(p))} ].  (201)

Item (a), Corollary 3.5 (with m = d, C = ‖µ(0)‖, c = l, A = √A, W = W^1, X = X^{1,x}, p = m_1(𝐳 m_2(p)) for x ∈ [a,b]^d in the notation of Corollary 3.5), and the fact that m_1(m_1(𝐳 m_2(p)) − 1) = m_1(𝐳 m_2(p) − 1) hence demonstrate that

sup_{x∈[a,b]^d} | E[ |φ(X^{1,x}_T)|^{m_2(p)} ] |^{1/m_2(p)} ≤ c d^z ( 1 + e^{𝐳lT} [ T‖µ(0)‖ + √( m_1(𝐳 m_2(p) − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐳 ).  (202)

Next note that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
= sup_{x∈[a,b]^d} | E[ ‖ ((∂/∂x) X^{1,x}_T)^* (∇φ)(X^{1,x}_T) ‖^{m_2(p)} ] |^{1/m_2(p)}
≤ sup_{x∈[a,b]^d} | E[ ‖(∇φ)(X^{1,x}_T)‖^{m_2(p)} ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)}_{L(R^d,R^d)} ] |^{1/m_2(p)}.  (203)

Moreover, observe that for all x ∈ [a,b]^d it holds that

| E[ ‖(∇φ)(X^{1,x}_T)‖^{m_2(p)} ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)}_{L(R^d,R^d)} ] |^{1/m_2(p)}
= | E[ ( ‖(∇φ)(X^{1,x}_T)‖^{m_2(p)} / [ d^w(1 + ‖X^{1,x}_T‖^𝐰) ]^{m_2(p)} ) [ d^w(1 + ‖X^{1,x}_T‖^𝐰) ]^{m_2(p)} ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)}_{L(R^d,R^d)} ] |^{1/m_2(p)}
≤ d^w [ sup_{y∈R^d} ‖(∇φ)(y)‖ / (d^w(1 + ‖y‖^𝐰)) ] | E[ (1 + ‖X^{1,x}_T‖^𝐰)^{m_2(p)} ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)}_{L(R^d,R^d)} ] |^{1/m_2(p)}.  (204)

This, (203), and (188) prove that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
≤ c d^w sup_{x∈[a,b]^d} | E[ (1 + ‖X^{1,x}_T‖^𝐰)^{m_2(p)} ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)}_{L(R^d,R^d)} ] |^{1/m_2(p)}.  (205)

This, Hölder's inequality, and the triangle inequality show that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
≤ c d^w [ sup_{x∈[a,b]^d} ( | E[ ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)(m_2(p)+1)}_{L(R^d,R^d)} ] |^{1/(m_2(p)(m_2(p)+1))} · | E[ (1 + ‖X^{1,x}_T‖^𝐰)^{m_2(p)+1} ] |^{1/(m_2(p)+1)} ) ]
≤ c d^w [ sup_{x∈[a,b]^d} ( | E[ ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)(m_2(p)+1)}_{L(R^d,R^d)} ] |^{1/(m_2(p)(m_2(p)+1))} · ( 1 + | E[ ‖X^{1,x}_T‖^{𝐰(m_2(p)+1)} ] |^{1/(m_2(p)+1)} ) ) ].  (206)

Jensen's inequality therefore demonstrates that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
≤ c d^w [ sup_{x∈[a,b]^d} ( | E[ ‖(∂/∂x) X^{1,x}_T‖^{m_2(p)(m_2(p)+1)}_{L(R^d,R^d)} ] |^{1/(m_2(p)(m_2(p)+1))} · ( 1 + | E[ ‖X^{1,x}_T‖^{m_1(𝐰(m_2(p)+1))} ] |^{𝐰/m_1(𝐰(m_2(p)+1))} ) ) ].  (207)

Next note that (194) implies that for all ω ∈ Ω, x, y ∈ R^d it holds that

( (∂/∂x) X^{1,x}_T(ω) ) y = X^{1,y}_T(ω) − X^{1,0}_T(ω).  (208)

This and Holder’s inequality show that for all x ∈ Rd it holds that

∣E

[

(

∂∂xX

1,xT

)∥

m2(p)(m2(p)+1)

L(Rd,Rd)

]∣

1/(m2(p)(m2(p)+1))

E

(

d∑

i=1

(

∂∂xX

1,xT

)

ei∥

2

)(m2(p)(m2(p)+1))/2

1/(m2(p)(m2(p)+1))

≤∣

E

[

dm2(p)(m2(p)+1)−2

2

(

d∑

i=1

(

∂∂xX

1,xT

)

ei∥

m2(p)(m2(p)+1)

)]∣

1/(m2(p)(m2(p)+1))

= dm2(p)(m2(p)+1)−2

2m2(p)(m2(p)+1)

d∑

i=1

E

[

(

∂∂xX

1,xT

)

ei∥

m2(p)(m2(p)+1)]

1/(m2(p)(m2(p)+1))

≤ d1/2 max

i∈1,...,d

(

∣E

[

(

∂∂xX

1,xT

)

ei∥

m2(p)(m2(p)+1)]∣

1/(m2(p)(m2(p)+1)))

= d1/2 max

i∈1,...,d

(

∣E

[

∥X1,eiT −X1,0

T

m2(p)(m2(p)+1)]∣

1/(m2(p)(m2(p)+1)))

.

(209)

Combining this with (207) shows that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
≤ c d^{w+1/2} max_{i∈{1,...,d}} ( | E[ ‖X^{1,e_i}_T − X^{1,0}_T‖^{m_2(p)(m_2(p)+1)} ] |^{1/(m_2(p)(m_2(p)+1))} )
· [ 1 + sup_{x∈[a,b]^d} | E[ ‖X^{1,x}_T‖^{m_1(𝐰(m_2(p)+1))} ] |^{𝐰/m_1(𝐰(m_2(p)+1))} ].  (210)

Item (a), Corollary 3.5 (with m = d, C = ‖µ(0)‖, c = l, A = √A, W = W^1, X = X^{1,x}, p = m_1(𝐰(m_2(p)+1)) for x ∈ [a,b]^d in the notation of Corollary 3.5), Lemma 3.6 (with m = d, l = l, A = √A, W = W^1, X = X^1, x = e_i, y = 0, p = m_2(p)(m_2(p)+1) for i ∈ {1,...,d} in the notation of Lemma 3.6), and the fact that m_1(m_1(𝐰(m_2(p)+1)) − 1) = m_1(𝐰(m_2(p)+1) − 1) hence demonstrate that

sup_{x∈[a,b]^d} | E[ limsup_{R^d\{0}∋h→0} ( |φ(X^{1,x+h}_T) − φ(X^{1,x}_T)|^{m_2(p)} / ‖h‖^{m_2(p)} ) ] |^{1/m_2(p)}
≤ c d^{w+1/2} max_{i∈{1,...,d}} ( e^{lT} ‖e_i‖ ) ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( m_1(𝐰(m_2(p)+1) − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 )
= c d^{w+1/2} e^{lT} ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( m_1(𝐰 m_2(p) + 𝐰 − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 ).  (211)

Combining this, (202), and (195) with item (ii) in Lemma 2.16 (with n = n, ξ_i = ((R^d × Ω) ∋ (x,ω) ↦ φ(X^{i,x}_T(ω)) ∈ R) for i ∈ {1,...,n} in the notation of Lemma 2.16) implies that

| E[ sup_{x∈[a,b]^d} | E[φ(X^{1,x}_T)] − [ (1/n) ∑_{k=1}^n φ(X^{k,x}_T) ] |^p ] |^{1/p}
≤ (4Kζ/√n) ( c d^z ( 1 + e^{𝐳lT} [ T‖µ(0)‖ + √( m_1(𝐳 m_2(p) − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐳 )
+ (b−a) c d^{w+1/2} e^{lT} ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( m_1(𝐰 m_2(p) + 𝐰 − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 ) ).  (212)

Grohs et al. [28, Proposition 3.3] hence shows that there exists ω_n ∈ Ω which satisfies that

sup_{x∈[a,b]^d} | E[φ(X^{1,x}_T)] − [ (1/n) ∑_{k=1}^n φ(X^{k,x}_T(ω_n)) ] |
≤ (4Kζ/√n) ( c d^z ( 1 + e^{𝐳lT} [ T‖µ(0)‖ + √( m_1(𝐳 m_2(p) − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐳 )
+ (b−a) c d^{w+1/2} e^{lT} ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( m_1(𝐰 m_2(p) + 𝐰 − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 ) ).  (213)

Combining this, (194), (196), and (199) with the triangle inequality demonstrates that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_{k,ω_n} x + B_{k,ω_n}) ] |
= sup_{x∈[a,b]^d} | E[ϕ(X^{1,x}_T)] − [ (1/n) ∑_{k=1}^n φ(X^{k,x}_T(ω_n)) ] |
≤ sup_{x∈[a,b]^d} | E[ϕ(X^{1,x}_T)] − E[φ(X^{1,x}_T)] | + sup_{x∈[a,b]^d} | E[φ(X^{1,x}_T)] − [ (1/n) ∑_{k=1}^n φ(X^{k,x}_T(ω_n)) ] |
≤ ε d^v ( 1 + e^{𝐯lT} [ T‖µ(0)‖ + √( m_1(𝐯−1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐯 )
+ (4Kζ/√n) ( c d^z ( 1 + e^{𝐳lT} [ T‖µ(0)‖ + √( m_1(𝐳 m_2(p) − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐳 )
+ (b−a) c d^{w+1/2} e^{lT} ( 1 + e^{𝐰lT} [ T‖µ(0)‖ + √( m_1(𝐰 m_2(p) + 𝐰 − 1) T Trace(A) ) + sup_{x∈[a,b]^d} ‖x‖ ]^𝐰 ) ).  (214)

The proof of Lemma 4.1 is thus completed.
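The mechanism behind Lemma 4.1 can be observed numerically. In the sketch below (all concrete choices — the constant drift µ, the diffusion √A, the test function φ, the cube [−1,1]², and the sample size n — are our own toy choices, not taken from the paper), the flow map x ↦ X^x_T is affine, so each Monte Carlo sample contributes a shifted copy φ(x + B_k), and the empirical average approximates E[φ(X^x_T)] uniformly over the cube at the Monte Carlo rate:

```python
import numpy as np

# Illustration of the Lemma 4.1 mechanism: for the affine flow
# X^x_T = x + mu*T + sqrt(A)*W_T the Monte Carlo average
#   (1/n) * sum_k phi(x + mu*T + sqrt(A)*W_T(omega_k))
# is a sum of shifted copies of phi and approximates E[phi(X^x_T)]
# uniformly over the cube [-1, 1]^2.
rng = np.random.default_rng(5)
d, T, n = 2, 1.0, 4000
mu = np.array([0.3, -0.1])
sqrtA = 0.5 * np.eye(d)
phi = lambda y: np.cos(y).sum(axis=-1)          # smooth bounded test function

# The random affine shifts B_k = mu*T + sqrt(A)*W_T(omega_k).
shifts = mu * T + rng.normal(scale=np.sqrt(T), size=(n, d)) @ sqrtA.T

grid = np.stack(np.meshgrid(*[np.linspace(-1.0, 1.0, 21)] * d), axis=-1)
grid = grid.reshape(-1, d)                      # grid points of [-1, 1]^2

mc = np.mean(phi(grid[:, None, :] + shifts[None, :, :]), axis=1)
# Closed form: E[cos(u + Z)] = cos(u) * exp(-Var(Z)/2), Var = 0.25 * T here.
exact = np.exp(-0.125) * np.cos(grid + mu * T).sum(axis=1)
err = np.max(np.abs(mc - exact))
assert err < 0.2                                # uniform error ~ O(n^(-1/2))
```

With n = 4000 the observed uniform error is of order 10⁻², consistent with the 1/√n factor in (192); the derivative bound (188) is what lets the pointwise Monte Carlo error be upgraded to a uniform one in the proof.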

Corollary 4.2. Let d, n ∈ N, ϕ ∈ C(R^d,R), α, a ∈ R, β ∈ [0,∞), b ∈ (a,∞), ε, T, c ∈ (0,∞), v, 𝐯, w, 𝐰, z, 𝐳 ∈ [0,∞), µ ∈ R^d, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let φ ∈ C^1(R^d,R), let A = (A_{i,j})_{(i,j)∈{1,...,d}^2} ∈ R^{d×d} be a symmetric and positive semi-definite matrix, assume for all x ∈ R^d that

|φ(x)| ≤ c d^z (1 + ‖x‖^𝐳),  ‖(∇φ)(x)‖ ≤ c d^w (1 + ‖x‖^𝐰),  (215)

|ϕ(x) − φ(x)| ≤ ε d^v (1 + ‖x‖^𝐯),  √Trace(A) ≤ c d^β,  (216)

and ‖µ‖ ≤ c d^α, let u ∈ C([0,T] × R^d,R), assume for all x ∈ R^d that u(0,x) = ϕ(x), assume that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and assume that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x) + ∑_{i=1}^d µ_i (∂/∂x_i u)(t,x)  (217)

for (t,x) ∈ (0,T) × R^d. Then there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d such that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^{v+𝐯 max{α,β,1/2}} ( 1 + [ √2 max{1,T} max{1,√𝐯} (2c + max{|a|,|b|}) ]^𝐯 )
+ d^{1+max{z+𝐳 max{α,β+1}, w+1/2+𝐰 max{α,β+1}}} · (32√e c/√n) [ (1 + (b−a)) ( 1 + [ √6 max{1,T} max{1,√𝐳,√𝐰} (2c + max{|a|,|b|}) ]^{max{𝐳,𝐰}} ) ].  (218)

Proof of Corollary 4.2. Throughout this proof let K ∈ (0,∞) be the max{2,d²}-Kahane–Khintchine constant (cf. Definition 2.1 and Lemma 2.2). Observe that for all x ∈ [a,b]^d it holds that

‖x‖ = [ ∑_{i=1}^d |x_i|² ]^{1/2} ≤ [ ∑_{i=1}^d [max{|a|,|b|}]² ]^{1/2} = d^{1/2} max{|a|,|b|}.  (219)

This proves that

sup_{x∈[a,b]^d} ‖x‖ ≤ d^{1/2} max{|a|,|b|}.  (220)

Next note that Lemma 2.2 (with p = max{2,d²} in the notation of Lemma 2.2) ensures that

K ≤ √( max{1, max{2,d²} − 1} ) ≤ d.  (221)

Combining this, (220), Corollary 2.15, (216), and the hypothesis that ‖µ‖ ≤ c d^α with Lemma 4.1 (with l = 0, n = n, ζ = 8√e, p = d², µ = (R^d ∋ x ↦ µ ∈ R^d), A = 2A in the notation of Lemma 4.1) demonstrates that there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d which satisfy that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^v ( 1 + [ Tc d^α + √( 2 max{1, 𝐯−1} T ) c d^β + d^{1/2} max{|a|,|b|} ]^𝐯 )
+ (32√e d/√n) ( c d^z ( 1 + [ Tc d^α + √( 2 max{1, 𝐳 max{2,d²} − 1} T ) c d^β + d^{1/2} max{|a|,|b|} ]^𝐳 )
+ (b−a) c d^{w+1/2} ( 1 + [ Tc d^α + √( 2 max{1, 𝐰 max{2,d²} + 𝐰 − 1} T ) c d^β + d^{1/2} max{|a|,|b|} ]^𝐰 ) ).  (222)

Next note that

max{1, 𝐳 max{2,d²} − 1} ≤ max{1,𝐳} max{2,d²} ≤ max{1,𝐳}(d² + 1) ≤ 2 max{1,𝐳} d².  (223)

Moreover, note that

max{1, 𝐰 max{2,d²} + 𝐰 − 1} ≤ max{1,𝐰} max{2,d²} + max{1,𝐰} ≤ max{1,𝐰}(d² + 1) + max{1,𝐰} ≤ 2 max{1,𝐰} d² + max{1,𝐰} ≤ 3 max{1,𝐰} d².  (224)

Combining this, the fact that max{1, 𝐯−1} ≤ max{1,𝐯}, (222), and (223) demonstrates that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^{v+𝐯 max{α,β,1/2}} ( 1 + [ Tc + √( 2 max{1,𝐯} T ) c + max{|a|,|b|} ]^𝐯 )
+ d^{1+max{z+𝐳 max{α,β+1}, w+1/2+𝐰 max{α,β+1}}} · (32√e c/√n) ( ( 1 + [ Tc + √( 4 max{1,𝐳} T ) c + max{|a|,|b|} ]^𝐳 )
+ (b−a) ( 1 + [ Tc + √( 6 max{1,𝐰} T ) c + max{|a|,|b|} ]^𝐰 ) ).  (225)

Hence, we obtain that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^{v+𝐯 max{α,β,1/2}} ( 1 + [ √2 max{1,T} max{1,√𝐯} (2c + max{|a|,|b|}) ]^𝐯 )
+ d^{1+max{z+𝐳 max{α,β+1}, w+1/2+𝐰 max{α,β+1}}} · (32√e c/√n) [ (1 + (b−a)) ( 1 + [ √6 max{1,T} max{1,√𝐳,√𝐰} (2c + max{|a|,|b|}) ]^{max{𝐳,𝐰}} ) ].  (226)

The proof of Corollary 4.2 is thus completed.

Corollary 4.3. Let d, n ∈ N, ϕ ∈ C(R^d,R), α, a ∈ R, β ∈ [0,∞), b ∈ (a,∞), ε, T ∈ (0,∞), c ∈ [1/2,∞), v, 𝐯, w, 𝐰, z, 𝐳 ∈ [0,∞), µ ∈ R^d, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let A = (A_{i,j})_{(i,j)∈{1,...,d}^2} ∈ R^{d×d} be a symmetric and positive semi-definite matrix, let φ ∈ C^1(R^d,R), assume for all x ∈ R^d that

|φ(x)| ≤ c d^z (1 + ‖x‖^𝐳),  ‖(∇φ)(x)‖ ≤ c d^w (1 + ‖x‖^𝐰),  (227)

|ϕ(x) − φ(x)| ≤ ε d^v (1 + ‖x‖^𝐯),  √Trace(A) ≤ c d^β,  (228)

and ‖µ‖ ≤ c d^α, let u ∈ C([0,T] × R^d,R), assume for all x ∈ R^d that u(0,x) = ϕ(x), assume that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and assume that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x) + ∑_{i=1}^d µ_i (∂/∂x_i u)(t,x)  (229)

for (t,x) ∈ (0,T) × R^d. Then there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d such that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^{v+𝐯 max{α,β,1/2}} [ 5c + T + √𝐯 + |a| + |b| ]^{4𝐯+1}
+ d^{1+max{z+𝐳 max{α,β+1}, w+1/2+𝐰 max{α,β+1}}} · (1/√n) [ 7c + T + √𝐳 + √𝐰 + |a| + |b| ]^{5+4(𝐳+𝐰)}.  (230)

Proof of Corollary 4.3. Observe that Corollary 4.2 ensures that there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d which satisfy that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ(W_k x + B_k) ] |
≤ ε d^{v+𝐯 max{α,β,1/2}} ( 1 + [ √2 max{1,T} max{1,√𝐯} (2c + max{|a|,|b|}) ]^𝐯 )
+ d^{1+max{z+𝐳 max{α,β+1}, w+1/2+𝐰 max{α,β+1}}} (32√e c/√n) [ (1 + (b−a)) ( 1 + [ √6 max{1,T} max{1,√𝐳,√𝐰} (2c + max{|a|,|b|}) ]^{max{𝐳,𝐰}} ) ].  (231)

Next note that the assumption that c ∈ [1/2,∞) implies that

1 + [ √2 max{1,T} max{1,√𝐯} (2c + max{|a|,|b|}) ]^𝐯
≤ 1 + ( max{√2, T, √𝐯, 2c + max{|a|,|b|}} )^{4𝐯}
≤ 1 + [ 2√2 c + 2c + T + √𝐯 + |a| + |b| ]^{4𝐯}
≤ 2 [ 5c + T + √𝐯 + |a| + |b| ]^{4𝐯}
≤ [ 5c + T + √𝐯 + |a| + |b| ]^{4𝐯+1}.  (232)
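The chain (232) is elementary but easy to misread; a randomized spot-check makes the claim concrete. The sketch below is ours (the parameter ranges are arbitrary, subject only to c ≥ 1/2) and verifies the end-to-end inequality numerically:

```python
import numpy as np

# Randomized spot-check of (232): for c >= 1/2,
#   1 + (sqrt(2)*max{1,T}*max{1,sqrt(v)}*(2c + max{|a|,|b|}))**v
#     <= (5c + T + sqrt(v) + |a| + |b|)**(4v + 1).
rng = np.random.default_rng(6)
checks = []
for _ in range(1000):
    c = rng.uniform(0.5, 3.0)
    T = rng.uniform(0.0, 3.0)
    v = rng.uniform(0.0, 4.0)
    a, b = sorted(rng.uniform(-2.0, 2.0, size=2))
    lhs = 1 + (np.sqrt(2) * max(1, T) * max(1, np.sqrt(v))
               * (2 * c + max(abs(a), abs(b)))) ** v
    rhs = (5 * c + T + np.sqrt(v) + abs(a) + abs(b)) ** (4 * v + 1)
    checks.append(lhs <= rhs)
assert all(checks)
```

The check never fails because each of the four factors on the left is dominated by the bracket on the right and c ≥ 1/2 guarantees that the bracket is at least 2, which absorbs the additive 1 into one extra power.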

Moreover, note that the assumption that c ∈ [1/2,∞) shows that

32√e c [ (1 + (b−a)) ( 1 + [ √6 max{1,T} max{1,√𝐳,√𝐰} (2c + max{|a|,|b|}) ]^{max{𝐳,𝐰}} ) ]
≤ 53c (1 + |a| + |b|) [ 1 + ( 2√6 c + 2c + T + √𝐳 + √𝐰 + |a| + |b| )^{4 max{𝐳,𝐰}} ]
≤ 53c (1 + |a| + |b|) [ 1 + ( 7c + T + √𝐳 + √𝐰 + |a| + |b| )^{4 max{𝐳,𝐰}} ].  (233)

Hence, we obtain that

32√e c [ (1 + (b−a)) ( 1 + [ √6 max{1,T} max{1,√𝐳,√𝐰} (2c + max{|a|,|b|}) ]^{max{𝐳,𝐰}} ) ]
≤ [ 7c + T + √𝐳 + √𝐰 + |a| + |b| ]^{5+4(𝐳+𝐰)}.  (234)

Combining this with (231) and (232) establishes (230). The proof of Corollary 4.3 is thus completed.

4.2 Qualitative error estimates

In this subsection we provide in Proposition 4.4 below a qualitative approximation result for viscosity solutions (cf., for example, Hairer et al. [31]) of Kolmogorov PDEs with constant coefficient functions. Informally speaking, we can think of the approximations in Proposition 4.4 as linear combinations of realizations of ANNs with a suitable continuously differentiable activation function. Proposition 4.4 will be employed in our proof of Proposition 4.6 in Subsection 4.3 below.

Proposition 4.4. Let d ∈ N, ϕ ∈ C(R^d,R), α, a ∈ R, β ∈ [0,∞), b ∈ (a,∞), r, T ∈ (0,∞), c ∈ [1/2,∞), v, 𝐯, w, 𝐰, z, 𝐳 ∈ [0,∞), C = (1/2)[ 5c + T + √𝐯 + |a| + |b| ]^{−4𝐯−1}, 𝐂 = 4[ 7c + T + √𝐳 + √𝐰 + |a| + |b| ]^{10+8(𝐳+𝐰)}, p = v + 𝐯 max{α,β,1/2}, 𝐩 = 2 + max{2z + 2𝐳 max{α,β+1}, 2w + 1 + 2𝐰 max{α,β+1}}, µ ∈ R^d, let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let φ_ε ∈ C^1(R^d,R), ε ∈ (0,r], let A = (A_{i,j})_{(i,j)∈{1,...,d}^2} ∈ R^{d×d} be a symmetric and positive semi-definite matrix, and assume for all ε ∈ (0,r], x ∈ R^d that

|φ_ε(x)| ≤ c d^z (1 + ‖x‖^𝐳),  ‖(∇φ_ε)(x)‖ ≤ c d^w (1 + ‖x‖^𝐰),  (235)

|ϕ(x) − φ_ε(x)| ≤ ε c d^v (1 + ‖x‖^𝐯),  √Trace(A) ≤ c d^β,  (236)

and ‖µ‖ ≤ c d^α. Then

(i) there exists a unique u ∈ C([0,T] × R^d,R) which satisfies for all x ∈ R^d that u(0,x) = ϕ(x), which satisfies that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and which satisfies that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x) + ∑_{i=1}^d µ_i (∂/∂x_i u)(t,x)  (237)

for (t,x) ∈ (0,T) × R^d and

(ii) it holds for all ε ∈ (0,r], n ∈ N ∩ [ε^{−2} d^𝐩 𝐂, ∞) that there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d such that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ_{εc^{−1}d^{−p}C}(W_k x + B_k) ] | ≤ ε.  (238)

Proof of Proposition 4.4. First, note that (235) and (236) ensure that ϕ : R^d → R is an at most polynomially growing function. Grohs et al. [28, Corollary 2.23] (see also Hairer et al. [31, Corollary 4.17]) hence implies that there exists a unique continuous function u : [0,T] × R^d → R which satisfies for all x ∈ R^d that u(0,x) = ϕ(x), which satisfies that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and which satisfies that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x) + ∑_{i=1}^d µ_i (∂/∂x_i u)(t,x)  (239)

for (t,x) ∈ (0,T) × R^d. This establishes item (i). Next note that the hypothesis that c ∈ [1/2,∞) shows that

C/c = (1/(2c)) · 1/[ 5c + T + √𝐯 + |a| + |b| ]^{4𝐯+1} ≤ 1.  (240)

Hence, we obtain that for all ε ∈ (0,r] it holds that ε c^{−1} d^{−p} C ≤ ε. This and (236) prove that for all ε ∈ (0,r], x ∈ R^d it holds that

|ϕ(x) − φ_{εc^{−1}d^{−p}C}(x)| ≤ ε c^{−1} d^{−p} C · c d^v (1 + ‖x‖^𝐯) = ε d^{−p} C d^v (1 + ‖x‖^𝐯).  (241)

Corollary 4.3 (with ε = ε d^{−p} C, φ = φ_{εc^{−1}d^{−p}C} for ε ∈ (0,r] in the notation of Corollary 4.3) hence demonstrates that for all ε ∈ (0,r], n ∈ N ∩ [ε^{−2} d^𝐩 𝐂, ∞) there exist W_1, ..., W_n ∈ R^{d×d}, B_1, ..., B_n ∈ R^d such that

sup_{x∈[a,b]^d} | u(T,x) − [ (1/n) ∑_{k=1}^n φ_{εc^{−1}d^{−p}C}(W_k x + B_k) ] | ≤ (ε d^{−p} C d^p)/(2C) + (√𝐂 d^{𝐩/2})/(2√n) ≤ ε/2 + ε/2 = ε.  (242)

This establishes item (ii). The proof of Proposition 4.4 is thus completed.

4.3 Qualitative error estimates for artificial neural networks (ANNs)

In this subsection we prove in Corollary 4.7 below that ANNs with continuously differentiable activation functions can overcome the curse of dimensionality in the uniform approximation of viscosity solutions of Kolmogorov PDEs with constant coefficient functions. Corollary 4.7 is an immediate consequence of Proposition 4.6 below. Proposition 4.6, in turn, follows from Proposition 4.4 above and the well-known fact that linear combinations of realizations of ANNs are again realizations of ANNs. To formulate Corollary 4.7 we introduce in Setting 4.5 below a common framework from the scientific literature (cf., e.g., Grohs et al. [29, Section 2.1] and Petersen & Voigtlaender [52, Section 2]) to mathematically describe ANNs.

Setting 4.5. Let N be the set given by

N = ⋃_{L∈N∩[2,∞)} ⋃_{(l_0,...,l_L)∈((N^L)×{1})} ( ×_{k=1}^L ( R^{l_k×l_{k−1}} × R^{l_k} ) ),  (243)

let a ∈ C^1(R,R), let A_n : R^n → R^n, n ∈ N, satisfy for all n ∈ N, x = (x_1,...,x_n) ∈ R^n that A_n(x) = (a(x_1),...,a(x_n)), and let N, L, P, 𝐏 : N → N and R : N → ⋃_{d∈N} C(R^d,R) satisfy for all L ∈ N ∩ [2,∞), (l_0,...,l_L) ∈ ((N^L) × {1}), Φ = ((W_1,B_1),...,(W_L,B_L)) = (((W_1^{(i,j)})_{i∈{1,...,l_1}, j∈{1,...,l_0}}, (B_1^i)_{i∈{1,...,l_1}}), ..., ((W_L^{(i,j)})_{i∈{1,...,l_L}, j∈{1,...,l_{L−1}}}, (B_L^i)_{i∈{1,...,l_L}})) ∈ ( ×_{k=1}^L ( R^{l_k×l_{k−1}} × R^{l_k} ) ), x_0 ∈ R^{l_0}, ..., x_{L−1} ∈ R^{l_{L−1}} with ∀ k ∈ N ∩ (0,L) : x_k = A_{l_k}(W_k x_{k−1} + B_k) that N(Φ) = ∑_{k=0}^L l_k, L(Φ) = L + 1, P(Φ) = ∑_{k=1}^L l_k(l_{k−1} + 1), (RΦ) ∈ C(R^{l_0},R), (RΦ)(x_0) = W_L x_{L−1} + B_L, and

𝐏(Φ) = ∑_{k=1}^L [ ∑_{i=1}^{l_k} ( 1_{R\{0}}(B_k^i) + ∑_{j=1}^{l_{k−1}} 1_{R\{0}}(W_k^{(i,j)}) ) ].  (244)

Proposition 4.6. Assume Setting 4.5, let d ∈ N, µ ∈ R^d, ϕ ∈ C(R^d,R), α, a ∈ R, β ∈ [0,∞), b ∈ (a,∞), r, T ∈ (0,∞), c ∈ [1/2,∞), v, 𝐯, w, 𝐰, z, 𝐳 ∈ [0,∞), let

𝐂 = ( 4[ 7c + T + √𝐳 + √𝐰 + |a| + |b| ]^{10+8(𝐳+𝐰)} + 1 )(1 + r²),
𝐩 = 2 + max{2z + 2𝐳 max{α,β+1}, 2w + 1 + 2𝐰 max{α,β+1}},
C = (1/2)[ 5c + T + √𝐯 + |a| + |b| ]^{−4𝐯−1},
p = v + 𝐯 max{α,β,1/2},  (245)

let ‖·‖ : R^d → [0,∞) be the standard norm on R^d, let (φ_ε)_{ε∈(0,r]} ⊆ N, let A = (A_{i,j})_{(i,j)∈{1,...,d}^2} ∈ R^{d×d} be a symmetric and positive semi-definite matrix, and assume for all ε ∈ (0,r], x ∈ R^d that (Rφ_ε) ∈ C(R^d,R),

|(Rφ_ε)(x)| ≤ c d^z (1 + ‖x‖^𝐳),  ‖(∇(Rφ_ε))(x)‖ ≤ c d^w (1 + ‖x‖^𝐰),  (246)

|ϕ(x) − (Rφ_ε)(x)| ≤ ε c d^v (1 + ‖x‖^𝐯),  √Trace(A) ≤ c d^β,  (247)

and ‖µ‖ ≤ c d^α. Then

(i) there exists a unique u ∈ C([0,T] × R^d,R) which satisfies for all x ∈ R^d that u(0,x) = ϕ(x), which satisfies that inf_{γ∈(0,∞)} sup_{(t,x)∈[0,T]×R^d} ( |u(t,x)| / (1+‖x‖^γ) ) < ∞, and which satisfies that u|_{(0,T)×R^d} is a viscosity solution of

(∂/∂t u)(t,x) = ∑_{i,j=1}^d A_{i,j} (∂²/(∂x_i ∂x_j) u)(t,x) + ∑_{i=1}^d µ_i (∂/∂x_i u)(t,x)  (248)

for (t,x) ∈ (0,T) × R^d and

(ii) there exists (ψ_ε)_{ε∈(0,r]} ⊆ N such that for all ε ∈ (0,r] it holds that N(ψ_ε) ≤ 𝐂 d^𝐩 ε^{−2} N(φ_{εc^{−1}d^{−p}C}), L(ψ_ε) = L(φ_{εc^{−1}d^{−p}C}), P(ψ_ε) ≤ 𝐂² d^{2𝐩} ε^{−4} P(φ_{εc^{−1}d^{−p}C}), 𝐏(ψ_ε) ≤ 𝐂 d^𝐩 ε^{−2} P(φ_{εc^{−1}d^{−p}C}), (Rψ_ε) ∈ C(R^d,R), and

sup_{x∈[a,b]^d} | u(T,x) − (Rψ_ε)(x) | ≤ ε.  (249)

Proof of Proposition 4.6. Throughout this proof let $\varepsilon \in (0,r]$, let

\[
\tilde C = 4\bigl[7c+T+\sqrt{\bar z}+\sqrt{\bar w}+|a|+|b|\bigr]^{10+8(\bar z+\bar w)},
\tag{250}
\]

let $n = \min(\mathbb{N}\cap[\tilde C d^{\bar p}\varepsilon^{-2},\infty))$, let $\gamma_1,\dots,\gamma_n \in \mathbb{R}^{d\times d}$, $\delta_1,\dots,\delta_n \in \mathbb{R}^d$ satisfy

\[
\sup_{x\in[a,b]^d} \biggl| u(T,x) - \Bigl[\frac{1}{n}\sum_{k=1}^{n} (\mathcal{R}\phi_{\varepsilon c^{-1}d^{-p}C})(\gamma_k x+\delta_k)\Bigr] \biggr| \le \varepsilon
\tag{251}
\]

(cf. item (ii) in Proposition 4.4), let $L \in \mathbb{N}\cap[2,\infty)$, $(l_0,\dots,l_L) \in (\{d\}\times(\mathbb{N}^{L-1})\times\{1\})$, $((W_1,B_1),\dots,(W_L,B_L)) \in (\times_{k=1}^{L}(\mathbb{R}^{l_k\times l_{k-1}}\times\mathbb{R}^{l_k}))$ satisfy

\[
\phi_{\varepsilon c^{-1}d^{-p}C} = ((W_1,B_1),\dots,(W_L,B_L)),
\tag{252}
\]

and let

\[
\psi = ((\mathbf{W}_1 W_0,\,\mathbf{W}_1 B_0+\mathbf{B}_1),(\mathbf{W}_2,\mathbf{B}_2),\dots,(\mathbf{W}_{L-1},\mathbf{B}_{L-1}),(\mathbf{W}_L,B_L)) \in (\mathbb{R}^{n l_1\times l_0}\times\mathbb{R}^{n l_1})\times\bigl(\times_{k=2}^{L-1}(\mathbb{R}^{n l_k\times n l_{k-1}}\times\mathbb{R}^{n l_k})\bigr)\times(\mathbb{R}^{l_L\times n l_{L-1}}\times\mathbb{R}^{l_L})
\tag{253}
\]

satisfy for all $k \in \{1,\dots,L-1\}$ that

\[
W_0 = \begin{pmatrix}\gamma_1\\ \gamma_2\\ \vdots\\ \gamma_n\end{pmatrix}, \qquad \mathbf{W}_k = \operatorname{diag}(W_k,\dots,W_k), \qquad \mathbf{W}_L = \frac{1}{n}\begin{pmatrix}W_L & \dots & W_L\end{pmatrix},
\tag{254}
\]
\[
B_0 = \begin{pmatrix}\delta_1\\ \vdots\\ \delta_n\end{pmatrix}, \quad\text{and}\quad \mathbf{B}_k = \begin{pmatrix}B_k\\ \vdots\\ B_k\end{pmatrix}.
\tag{255}
\]
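The construction (253)–(255) can be checked numerically: the affine preprocessings $\gamma_k x+\delta_k$ are stacked into the first layer, $n$ copies of the hidden layers are placed block-diagonally, and the output layer averages. The following Python sketch (names ours; ReLU stands in for the abstract activation $a$ of Setting 4.5) verifies the resulting realization identity on random data.

```python
import numpy as np

relu = lambda x: np.maximum(x, 0.0)  # stand-in for the abstract activation a

def realize(phi, x):
    """Realization (R phi)(x) of phi = [(W_1, B_1), ..., (W_L, B_L)]."""
    *hidden, (W_L, B_L) = phi
    for W, B in hidden:
        x = relu(W @ x + B)
    return W_L @ x + B_L

def average_network(phi, gammas, deltas):
    """Build psi as in (253)-(255) so that
    (R psi)(x) = (1/n) * sum_k (R phi)(gamma_k @ x + delta_k)."""
    n = len(gammas)
    W1, B1 = phi[0]
    first = (np.vstack([W1 @ g for g in gammas]),                # bold-W_1 W_0
             np.concatenate([W1 @ dlt + B1 for dlt in deltas]))  # bold-W_1 B_0 + bold-B_1
    middle = [(np.kron(np.eye(n), W), np.tile(B, n))             # diag(W_k, ..., W_k)
              for W, B in phi[1:-1]]
    W_L, B_L = phi[-1]
    last = (np.hstack([W_L] * n) / n, B_L)                       # (1/n)(W_L ... W_L)
    return [first] + middle + [last]

rng = np.random.default_rng(0)
d, n = 2, 4
phi = [(rng.standard_normal((3, d)), rng.standard_normal(3)),
       (rng.standard_normal((1, 3)), rng.standard_normal(1))]
gammas = [rng.standard_normal((d, d)) for _ in range(n)]
deltas = [rng.standard_normal(d) for _ in range(n)]
psi, x = average_network(phi, gammas, deltas), rng.standard_normal(d)
direct = sum(realize(phi, g @ x + dlt) for g, dlt in zip(gammas, deltas)) / n
assert np.allclose(realize(psi, x), direct)  # the averaging identity holds
```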


Observe that item (i) in Proposition 4.4 (with $r = r$, $\phi_\varepsilon = (\mathcal{R}\phi_\varepsilon)$ for $\varepsilon \in (0,r]$ in the notation of Proposition 4.4) establishes item (i). Next note that the fact that $n \in [\tilde C d^{\bar p}\varepsilon^{-2},\tilde C d^{\bar p}\varepsilon^{-2}+1]$ and the fact that $r^2\varepsilon^{-2} \in [1,\infty)$ prove that

\[
n \le \tilde C d^{\bar p}\varepsilon^{-2}+1 \le (\tilde C+1) d^{\bar p}\max\{1,\varepsilon^{-2}\} \le (\tilde C+1)\max\{1,r^{-2}\}\,r^2 d^{\bar p}\varepsilon^{-2} \le \mathbf{C} d^{\bar p}\varepsilon^{-2}.
\tag{256}
\]

This and (253) show that

\[
\mathcal{N}(\psi) = l_0+\sum_{k=1}^{L-1} n l_k+l_L \le n\sum_{k=0}^{L} l_k = n\,\mathcal{N}(\phi_{\varepsilon c^{-1}d^{-p}C}) \le \mathbf{C} d^{\bar p}\varepsilon^{-2}\,\mathcal{N}(\phi_{\varepsilon c^{-1}d^{-p}C}).
\tag{257}
\]

Next note that (253) and (256) demonstrate that

\[
\begin{aligned}
\mathcal{P}(\psi) &= n l_1 l_0+n l_1+\sum_{k=2}^{L-1} n l_k(n l_{k-1}+1)+n l_L l_{L-1}+l_L\\
&\le n^2\Bigl[l_1(l_0+1)+\sum_{k=2}^{L-1} l_k(l_{k-1}+1)+l_L(l_{L-1}+1)\Bigr] = n^2\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-p}C}) \le \mathbf{C}^2 d^{2\bar p}\varepsilon^{-4}\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-p}C}).
\end{aligned}
\tag{258}
\]

Moreover, note that (253) and (256) ensure that

\[
\widehat{\mathcal{P}}(\psi) \le n l_1(l_0+1)+n\sum_{k=2}^{L} l_k(l_{k-1}+1) = n\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-p}C}) \le \mathbf{C} d^{\bar p}\varepsilon^{-2}\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-p}C}).
\tag{259}
\]

Furthermore, (253), (254), and (255) imply that for all $x \in \mathbb{R}^d$ it holds that

\[
(\mathcal{R}\psi)(x) = \frac{1}{n}\sum_{k=1}^{n} (\mathcal{R}\phi_{\varepsilon c^{-1}d^{-p}C})(\gamma_k x+\delta_k).
\tag{260}
\]

Combining this and (251) demonstrates that

\[
\sup_{x\in[a,b]^d} |u(T,x) - (\mathcal{R}\psi)(x)| \le \varepsilon.
\tag{261}
\]

Next observe that (252) and (253) show that $\mathcal{L}(\psi) = L+1 = \mathcal{L}(\phi_{\varepsilon c^{-1}d^{-p}C})$. Combining this with (257), (258), (259), and (261) establishes item (ii). The proof of Proposition 4.6 is thus completed.

Corollary 4.7. Assume Setting 4.5, let $\alpha, c, a \in \mathbb{R}$, $\beta \in [0,\infty)$, $b \in (a,\infty)$, $r, T \in (0,\infty)$, $p, q, v, \bar v, w, \bar w, z, \bar z \in [0,\infty)$, $\bar p = 2+\max\{2z+2\bar z\max\{\alpha,\beta+1\},\,2w+1+2\bar w\max\{\alpha,\beta+1\}\}$, $\hat p = v+\bar v\max\{\alpha,\beta,1/2\}$, for every $d \in \mathbb{N}$ let $\|\cdot\|_{\mathbb{R}^d} \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, let $(\phi_{\varepsilon,d})_{(\varepsilon,d)\in(0,r]\times\mathbb{N}} \subseteq \mathbf{N}$, let $\mu_d \in \mathbb{R}^d$, $d \in \mathbb{N}$, let $A_d = (A_d^{(i,j)})_{(i,j)\in\{1,\dots,d\}^2} \in \mathbb{R}^{d\times d}$, $d \in \mathbb{N}$, be symmetric and positive semi-definite matrices, let $\varphi_d \in C(\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, and assume for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ that $(\mathcal{R}\phi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$,

\[
|(\mathcal{R}\phi_{\varepsilon,d})(x)| \le c d^{z}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar z}\bigr), \qquad \|(\nabla(\mathcal{R}\phi_{\varepsilon,d}))(x)\|_{\mathbb{R}^d} \le c d^{w}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar w}\bigr),
\tag{262}
\]
\[
|\varphi_d(x)-(\mathcal{R}\phi_{\varepsilon,d})(x)| \le \varepsilon c d^{v}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar v}\bigr), \qquad \sqrt{\operatorname{Trace}(A_d)} \le c d^{\beta},
\tag{263}
\]
\[
\|\mu_d\|_{\mathbb{R}^d} \le c d^{\alpha}, \quad\text{and}\quad \mathcal{P}(\phi_{\varepsilon,d}) \le c d^{p}\varepsilon^{-q}.
\tag{264}
\]

Then

(i) there exist unique $u_d \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, which satisfy for all $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ that $u_d(0,x) = \varphi_d(x)$, which satisfy for all $d \in \mathbb{N}$ that $\inf_{\gamma\in(0,\infty)} \sup_{(t,x)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(t,x)|}{1+\|x\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and which satisfy that for all $d \in \mathbb{N}$ it holds that $u_d|_{(0,T)\times\mathbb{R}^d}$ is a viscosity solution of

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\tfrac{\partial}{\partial x}u_d)(t,x)\,\mu_d + \textstyle\sum_{i,j=1}^{d} A_d^{(i,j)}\,(\tfrac{\partial^2}{\partial x_i\partial x_j}u_d)(t,x)
\tag{265}
\]

for $(t,x) \in (0,T)\times\mathbb{R}^d$ and

(ii) there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,r]\times\mathbb{N}} \subseteq \mathbf{N}$, $C \in \mathbb{R}$ such that for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$ it holds that $\mathcal{N}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+\hat p q}\varepsilon^{-(q+2)}$, $\mathcal{P}(\psi_{\varepsilon,d}) \le C d^{2\bar p+p+\hat p q}\varepsilon^{-(q+4)}$, $\widehat{\mathcal{P}}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+\hat p q}\varepsilon^{-(q+2)}$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{266}
\]

Proof of Corollary 4.7. Throughout this proof let $\mathbf{C} = \bigl(4\bigl[7\max\{1/2,c\}+T+\sqrt{\bar z}+\sqrt{\bar w}+|a|+|b|\bigr]^{10+8(\bar z+\bar w)}+1\bigr)(1+r^2)$, $c = \max\{1/2,c\}$, $C = \tfrac12\bigl[5c+T+\sqrt{\bar v}+|a|+|b|\bigr]^{-4\bar v-1}$. Note that item (i) in Proposition 4.6 (with $c = \max\{1/2,c\}$, $\mu = \mu_d$, $\varphi = \varphi_d$, $(\phi_\varepsilon)_{\varepsilon\in(0,r]} = (\phi_{\varepsilon,d})_{\varepsilon\in(0,r]}$, $A = A_d$ for $d \in \mathbb{N}$ in the notation of Proposition 4.6) implies that for every $d \in \mathbb{N}$ there exists a unique continuous function $u_d \colon [0,T]\times\mathbb{R}^d \to \mathbb{R}$ which satisfies for all $x \in \mathbb{R}^d$ that $u_d(0,x) = \varphi_d(x)$, which satisfies that $\inf_{\gamma\in(0,\infty)} \sup_{(t,x)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(t,x)|}{1+\|x\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and which satisfies that $u_d|_{(0,T)\times\mathbb{R}^d}$ is a viscosity solution of

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\tfrac{\partial}{\partial x}u_d)(t,x)\,\mu_d + \textstyle\sum_{i,j=1}^{d} A_d^{(i,j)}\,(\tfrac{\partial^2}{\partial x_i\partial x_j}u_d)(t,x)
\tag{267}
\]

for $(t,x) \in (0,T)\times\mathbb{R}^d$. This establishes item (i). Next note that for all $L \in \mathbb{N}\cap[2,\infty)$, $(l_0,\dots,l_L) \in \mathbb{N}^{L}\times\{1\}$, $\Phi \in (\times_{k=1}^{L}(\mathbb{R}^{l_k\times l_{k-1}}\times\mathbb{R}^{l_k}))$ it holds that

\[
\mathcal{N}(\Phi) = \sum_{k=0}^{L} l_k \le l_1 l_0+\sum_{k=1}^{L} l_k+\sum_{k=2}^{L} l_k l_{k-1} = \mathcal{P}(\Phi).
\tag{268}
\]


Moreover, note that Proposition 4.6 (with $c = \max\{1/2,c\}$, $\mu = \mu_d$, $\varphi = \varphi_d$, $(\phi_\varepsilon)_{\varepsilon\in(0,r]} = (\phi_{\varepsilon,d})_{\varepsilon\in(0,r]}$, $A = A_d$ for $d \in \mathbb{N}$ in the notation of Proposition 4.6) assures that there exist $\psi_{\varepsilon,d} \in \mathbf{N}$, $d \in \mathbb{N}$, $\varepsilon \in (0,r]$, which satisfy that for every $d \in \mathbb{N}$, $\varepsilon \in (0,r]$ it holds that $\mathcal{N}(\psi_{\varepsilon,d}) \le \mathbf{C} d^{\bar p}\varepsilon^{-2}\,\mathcal{N}(\phi_{\varepsilon c^{-1}d^{-\hat p}C,d})$, $\mathcal{P}(\psi_{\varepsilon,d}) \le \mathbf{C}^2 d^{2\bar p}\varepsilon^{-4}\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-\hat p}C,d})$, $\widehat{\mathcal{P}}(\psi_{\varepsilon,d}) \le \mathbf{C} d^{\bar p}\varepsilon^{-2}\,\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-\hat p}C,d})$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{269}
\]

Next note that (264) implies that for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$ it holds that

\[
\mathcal{P}(\phi_{\varepsilon c^{-1}d^{-\hat p}C,d}) \le c d^{p}\bigl(\varepsilon c^{-1}d^{-\hat p}C\bigr)^{-q} = c^{q+1} d^{p+\hat p q} C^{-q}\varepsilon^{-q}.
\tag{270}
\]

Combining this, (268), and (269) hence demonstrates that for all $d \in \mathbb{N}$, $\varepsilon \in (0,r]$ it holds that $\mathcal{N}(\psi_{\varepsilon,d}) \le \mathbf{C} c^{q+1} C^{-q} d^{\bar p+p+\hat p q}\varepsilon^{-(q+2)}$, $\mathcal{P}(\psi_{\varepsilon,d}) \le \mathbf{C}^2 c^{q+1} C^{-q} d^{2\bar p+p+\hat p q}\varepsilon^{-(q+4)}$, and $\widehat{\mathcal{P}}(\psi_{\varepsilon,d}) \le \mathbf{C} c^{q+1} C^{-q} d^{\bar p+p+\hat p q}\varepsilon^{-(q+2)}$. This and (269) establish item (ii). The proof of Corollary 4.7 is thus completed.

5 Artificial neural network approximations for heat equations

5.1 Viscosity solutions for heat equations

In this subsection we establish in Lemma 5.3 below a well-known connection between viscosity solutions and classical solutions of heat equations with at most polynomially growing initial conditions. Lemma 5.3 will be employed in our proof of Theorem 5.4 below, the main result of this article. Lemma 5.3 is a simple consequence of Lemma 5.2 and the Feynman-Kac formula for viscosity solutions of Kolmogorov PDEs (cf., for example, Hairer et al. [31]). Lemma 5.2, in turn, is an elementary and well-known existence result for solutions of heat equations (cf., for example, Evans [21, Theorem 1 in Subsection 2.3.1]). For completeness we also provide in this subsection a detailed proof for Lemma 5.2. Our proof of Lemma 5.2 employs the elementary and well-known result in Lemma 5.1 below.

Lemma 5.1. Let $p \in [0,\infty)$, $d \in \mathbb{N}$, and let $\|\cdot\| \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$. Then it holds for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ that

\[
\int_{\mathbb{R}^d} \|y\|^{p}\, e^{-\frac{\|x-y\|^2}{4t}}\,dy < \infty.
\tag{271}
\]

Proof of Lemma 5.1. Throughout this proof let $S$ be the set given by

\[
S = \begin{cases} (-1,1) & \colon d = 1\\ (0,2\pi) & \colon d = 2\\ (0,2\pi)\times(0,\pi)^{d-2} & \colon d \in \{3,4,\dots\} \end{cases}
\tag{272}
\]

and for every $n \in \{2,3,\dots\}$ let $T_n \colon (0,\infty)\times(0,2\pi)\times(0,\pi)^{n-2} \to \mathbb{R}$ be the function which satisfies for all $n \in \{2,3,\dots\}$, $r \in (0,\infty)$, $\varphi \in (0,2\pi)$, $\vartheta_1,\dots,\vartheta_{n-2} \in (0,\pi)$ that if $n = 2$ then $T_2(r,\varphi) = r$ and if $n \ge 3$ then

\[
T_n(r,\varphi,\vartheta_1,\dots,\vartheta_{n-2}) = r^{n-1}\Bigl[\textstyle\prod_{i=1}^{n-2}[\sin(\vartheta_i)]^{i}\Bigr].
\tag{273}
\]

Observe that the integral transformation theorem with the diffeomorphism $(0,\infty) \ni r \mapsto \sqrt{r} \in (0,\infty)$ implies that

\[
\int_0^{\infty} r^{p+d-1} e^{-r^2}\,dr = \int_0^{\infty} r^{(p+d-1)/2} e^{-r}\,\tfrac12 r^{-1/2}\,dr = \frac12\int_0^{\infty} r^{(p+d)/2-1} e^{-r}\,dr.
\tag{274}
\]

Item (iv) in Lemma 2.4 (with $x = \frac{p+d}{2}$ in the notation of Lemma 2.4) hence shows that

\[
\int_0^{\infty} r^{p+d-1} e^{-r^2}\,dr \le \frac12\sqrt{\frac{4\pi}{p+d}}\,\Bigl[\frac{p+d}{2e}\Bigr]^{\frac{p+d}{2}} e^{\frac{1}{6(p+d)}} < \infty.
\tag{275}
\]

Next note that the integral transformation theorem with the diffeomorphism $\mathbb{R}^d \ni y \mapsto 2\sqrt{t}\,y \in \mathbb{R}^d$ for $t \in (0,\infty)$, the triangle inequality, and the fact that for all $a, b \in [0,\infty)$ it holds that $(a+b)^p \le \max\{1,2^{p-1}\}(a^p+b^p)$ ensure that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
\begin{aligned}
\int_{\mathbb{R}^d} \|y\|^{p} e^{-\frac{\|x-y\|^2}{4t}}\,dy &= \int_{\mathbb{R}^d} \|x-y\|^{p} e^{-\frac{\|y\|^2}{4t}}\,dy = \int_{\mathbb{R}^d} \|x-2\sqrt{t}\,y\|^{p} e^{-\|y\|^2} (2\sqrt{t})^{d}\,dy\\
&\le \max\{1,2^{p-1}\}(2\sqrt{t})^{d}\|x\|^{p}\int_{\mathbb{R}^d} e^{-\|y\|^2}\,dy + \max\{1,2^{p-1}\}(2\sqrt{t})^{p+d}\int_{\mathbb{R}^d} \|y\|^{p} e^{-\|y\|^2}\,dy.
\end{aligned}
\tag{276}
\]

To establish (271) we distinguish between the case $d = 1$ and the case $d \in \mathbb{N}\cap[2,\infty)$. First, we consider the case $d = 1$. Note that

\[
\int_{\mathbb{R}} |y|^{p} e^{-|y|^2}\,dy = 2\int_0^{\infty} y^{p} e^{-y^2}\,dy.
\tag{277}
\]

Combining this with (275) and (276) establishes (271) in the case $d = 1$. Next we consider the case $d \in \{2,3,\dots\}$. Note that (272), (273), item (iii) in Lemma 2.6, and Fubini's theorem ensure that

\[
\int_{\mathbb{R}^d} \|y\|^{p} e^{-\|y\|^2}\,dy = \int_0^{\infty}\!\!\int_{S} r^{p+d-1} e^{-r^2}\,T_d(1,\phi)\,d\phi\,dr = \int_{S}\int_0^{\infty} r^{p+d-1} e^{-r^2}\,dr\;T_d(1,\phi)\,d\phi \le 2\pi^{d-1}\int_0^{\infty} r^{p+d-1} e^{-r^2}\,dr.
\tag{278}
\]


Combining this with (275) and (276) establishes (271) in the case $d \in \{2,3,\dots\}$. The proof of Lemma 5.1 is thus completed.
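The Gamma-function identity (274) used above can be sanity-checked numerically; the short Python sketch below (function name ours) compares a midpoint-rule approximation of $\int_0^\infty r^{p+d-1}e^{-r^2}\,dr$ with $\frac12\Gamma(\frac{p+d}{2})$.

```python
import math

def radial_integral(p, d, n=200_000, R=15.0):
    """Midpoint-rule approximation of the integral of r^(p+d-1) * exp(-r^2)
    over (0, R); the tail beyond R = 15 is negligible for moderate p, d."""
    h = R / n
    return h * sum(((k + 0.5) * h) ** (p + d - 1) * math.exp(-((k + 0.5) * h) ** 2)
                   for k in range(n))

p, d = 2.5, 3
# By (274) the integral equals Gamma((p + d) / 2) / 2.
print(radial_integral(p, d), math.gamma((p + d) / 2) / 2)  # approximately equal
```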

Lemma 5.2. Let $d \in \mathbb{N}$, $\varphi \in C(\mathbb{R}^d,\mathbb{R})$, let $\|\cdot\| \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, assume that $\inf_{\gamma\in(0,\infty)} \sup_{x\in\mathbb{R}^d} \bigl(\tfrac{|\varphi(x)|}{1+\|x\|^{\gamma}}\bigr) < \infty$, and let $\Phi \colon (0,\infty)\times\mathbb{R}^d \to \mathbb{R}$ be the function which satisfies for all $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ that

\[
\Phi(t,x) = \int_{\mathbb{R}^d} \frac{1}{(4\pi t)^{d/2}}\, e^{-\frac{(x_1-y_1)^2+\dots+(x_d-y_d)^2}{4t}}\,\varphi(y)\,dy.
\tag{279}
\]

Then it holds for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ that $\Phi \in C^{1,2}((0,\infty)\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial t}\Phi)(t,x) = (\Delta_x\Phi)(t,x).
\tag{280}
\]

Proof of Lemma 5.2. Throughout this proof let $\rho \colon (0,\infty)\times\mathbb{R}^d \to \mathbb{R}$ be the function which satisfies for all $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ that

\[
\rho(t,x) = \frac{1}{(4\pi t)^{d/2}}\, e^{-\frac{x_1^2+\dots+x_d^2}{4t}}.
\tag{281}
\]

Observe that for all $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
(\tfrac{\partial}{\partial t}\rho)(t,x) = \Bigl[\frac{x_1^2+\dots+x_d^2}{4t^2} - \frac{d}{2t}\Bigr]\rho(t,x).
\tag{282}
\]

Next note that for all $i \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
(\tfrac{\partial}{\partial x_i}\rho)(t,x) = -\frac{x_i}{2t}\,\rho(t,x).
\tag{283}
\]

This implies that for all $i, j \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
(\tfrac{\partial^2}{\partial x_i\partial x_j}\rho)(t,x) = \begin{cases} \bigl[\frac{x_i^2}{4t^2}-\frac{1}{2t}\bigr]\rho(t,x) & \colon i = j\\[2pt] \frac{x_i x_j}{4t^2}\,\rho(t,x) & \colon i \ne j. \end{cases}
\tag{284}
\]

Hence, we obtain that for all $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
(\Delta_x\rho)(t,x) = \sum_{i=1}^{d} (\tfrac{\partial^2}{\partial x_i^2}\rho)(t,x) = \Bigl[\frac{x_1^2+\dots+x_d^2}{4t^2} - \frac{d}{2t}\Bigr]\rho(t,x).
\tag{285}
\]

Combining this with (282) demonstrates that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
(\tfrac{\partial}{\partial t}\rho)(t,x) - (\Delta_x\rho)(t,x) = 0.
\tag{286}
\]

Next note that the hypothesis that $\inf_{\gamma\in(0,\infty)} \sup_{x\in\mathbb{R}^d} \bigl(\tfrac{|\varphi(x)|}{1+\|x\|^{\gamma}}\bigr) < \infty$ ensures that there exist $\gamma \in (0,\infty)$, $C \in \mathbb{R}$ which satisfy that for all $x \in \mathbb{R}^d$ it holds that

\[
|\varphi(x)| \le C(1+\|x\|^{\gamma}).
\tag{287}
\]


This and Lemma 5.1 show that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
|\Phi(t,x)| \le \int_{\mathbb{R}^d} |\rho(t,x-y)\varphi(y)|\,dy \le C\int_{\mathbb{R}^d} \rho(t,x-y)(1+\|y\|^{\gamma})\,dy < \infty.
\tag{288}
\]

Next note that (282), (287), the triangle inequality, and Lemma 5.1 demonstrate that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
\int_{\mathbb{R}^d} \bigl|(\tfrac{\partial}{\partial t}\rho)(t,x-y)\varphi(y)\bigr|\,dy \le C\int_{\mathbb{R}^d} \Bigl[\frac{\|x-y\|^2}{4t^2}+\frac{d}{2t}\Bigr]\rho(t,x-y)(1+\|y\|^{\gamma})\,dy < \infty.
\tag{289}
\]

Combining this, (288), and (282) with Amann & Escher [2, Ch. X, Theorem 3.18] shows that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that $\Phi \in C^{1,0}((0,\infty)\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial t}\Phi)(t,x) = \int_{\mathbb{R}^d} (\tfrac{\partial}{\partial t}\rho)(t,x-y)\varphi(y)\,dy.
\tag{290}
\]

Next observe that (283), (287), and Lemma 5.1 ensure that for all $i \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
\int_{\mathbb{R}^d} \bigl|(\tfrac{\partial}{\partial x_i}\rho)(t,x-y)\varphi(y)\bigr|\,dy \le \frac{C}{2t}\int_{\mathbb{R}^d} \|x-y\|\,\rho(t,x-y)(1+\|y\|^{\gamma})\,dy < \infty.
\tag{291}
\]

Combining this, (288), and (283) with (290) and Amann & Escher [2, Ch. X, Theorem 3.18] shows that for all $i \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that $\Phi \in C^{1,1}((0,\infty)\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial x_i}\Phi)(t,x) = \int_{\mathbb{R}^d} (\tfrac{\partial}{\partial x_i}\rho)(t,x-y)\varphi(y)\,dy.
\tag{292}
\]

Next note that (284), (287), the fact that for all $a, b \in \mathbb{R}$ it holds that $ab \le a^2+b^2$, and Lemma 5.1 ensure that for all $i, j \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
\int_{\mathbb{R}^d} \bigl|(\tfrac{\partial^2}{\partial x_i\partial x_j}\rho)(t,x-y)\varphi(y)\bigr|\,dy \le C\int_{\mathbb{R}^d} \Bigl[\frac{\|x-y\|^2}{4t^2}+\frac{1}{2t}\Bigr]\rho(t,x-y)(1+\|y\|^{\gamma})\,dy < \infty.
\tag{293}
\]

Combining this, (291), and (284) with (292) and Amann & Escher [2, Ch. X, Theorem 3.18] shows that for all $i, j \in \{1,\dots,d\}$, $t \in (0,\infty)$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that $\Phi \in C^{1,2}((0,\infty)\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial^2}{\partial x_i\partial x_j}\Phi)(t,x) = \int_{\mathbb{R}^d} (\tfrac{\partial^2}{\partial x_i\partial x_j}\rho)(t,x-y)\varphi(y)\,dy.
\tag{294}
\]


Hence, we obtain that for all $t \in (0,\infty)$, $x \in \mathbb{R}^d$ it holds that

\[
(\Delta_x\Phi)(t,x) = \sum_{i=1}^{d} (\tfrac{\partial^2}{\partial x_i^2}\Phi)(t,x) = \int_{\mathbb{R}^d} \Bigl[\sum_{i=1}^{d} (\tfrac{\partial^2}{\partial x_i^2}\rho)(t,x-y)\Bigr]\varphi(y)\,dy = \int_{\mathbb{R}^d} (\Delta_x\rho)(t,x-y)\varphi(y)\,dy.
\tag{295}
\]

Combining this and (290) with (286) establishes (280). The proof of Lemma 5.2 is thus completed.

Lemma 5.3. Let $d \in \mathbb{N}$, $T \in (0,\infty)$, $\varphi \in C(\mathbb{R}^d,\mathbb{R})$, $u \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, let $\|\cdot\| \colon \mathbb{R}^d \to [0,\infty)$ be a norm, assume for all $x \in \mathbb{R}^d$ that $u(0,x) = \varphi(x)$, assume that $\inf_{\gamma\in(0,\infty)} \sup_{(t,x)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u(t,x)|}{1+\|x\|^{\gamma}}\bigr) < \infty$, and assume that $u|_{(0,T)\times\mathbb{R}^d}$ is a viscosity solution of

\[
(\tfrac{\partial}{\partial t}u)(t,x) = (\Delta_x u)(t,x)
\tag{296}
\]

for $(t,x) \in (0,T)\times\mathbb{R}^d$. Then it holds for all $t \in (0,T]$, $x \in \mathbb{R}^d$ that $u|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial t}u)(t,x) = (\Delta_x u)(t,x).
\tag{297}
\]

Proof of Lemma 5.3. Throughout this proof let $\|\cdot\|_2 \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space with a normal filtration $(\mathcal{F}_t)_{t\in[0,T]}$, and let $W \colon [0,T]\times\Omega \to \mathbb{R}^d$ be a standard $(\mathcal{F}_t)_{t\in[0,T]}$-Brownian motion. Observe that there exist $C \in \mathbb{R}$, $c \in (0,\infty)$ such that for all $x \in \mathbb{R}^d$ it holds that

\[
c\|x\|_2 \le \|x\| \le C\|x\|_2.
\tag{298}
\]

This and the fact that $\inf_{\gamma\in(0,\infty)} \sup_{x\in\mathbb{R}^d} \bigl(\tfrac{|\varphi(x)|}{1+\|x\|^{\gamma}}\bigr) < \infty$ show that

\[
\inf_{\gamma\in(0,\infty)}\sup_{x\in\mathbb{R}^d} \Bigl(\frac{|\varphi(x)|}{1+\|x\|_2^{\gamma}}\Bigr) < \infty.
\tag{299}
\]

Hence, we obtain that $\varphi \colon \mathbb{R}^d \to \mathbb{R}$ is an at most polynomially growing function. The Feynman-Kac formula (cf., for example, Grohs et al. [28, Proposition 2.22(iii)] and Hairer et al. [31, Corollary 4.17]) hence ensures that for all $t \in [0,T]$, $x \in \mathbb{R}^d$ it holds that

\[
u(t,x) = \mathbb{E}\bigl[\varphi\bigl(x+\sqrt{2}\,W_t\bigr)\bigr].
\tag{300}
\]

Next note that the fact that for all $t \in (0,T]$ it holds that $W_t$ is a $\mathcal{N}_{0,t\mathrm{I}_{\mathbb{R}^d}}$-distributed random variable implies that for all $t \in (0,T]$, $x \in \mathbb{R}^d$ it holds that


$x+\sqrt{2}\,W_t$ is a $\mathcal{N}_{x,2t\mathrm{I}_{\mathbb{R}^d}}$-distributed random variable. Combining this with (300) demonstrates that for all $t \in (0,T]$, $x \in \mathbb{R}^d$ it holds that

\[
u(t,x) = \int_{\mathbb{R}^d} \frac{1}{(4\pi t)^{d/2}}\, e^{-\frac{(x_1-y_1)^2+\dots+(x_d-y_d)^2}{4t}}\,\varphi(y)\,dy.
\tag{301}
\]

Lemma 5.2 hence proves that for all $t \in (0,T]$, $x \in \mathbb{R}^d$ it holds that $u|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial t}u)(t,x) = (\Delta_x u)(t,x).
\tag{302}
\]

The proof of Lemma 5.3 is thus completed.

5.2 Qualitative error estimates for heat equations

It is the subject of this subsection to state and prove Theorem 5.4 below, which is the main result of this work. Theorem 5.4 establishes that ANNs do not suffer from the curse of dimensionality in the uniform numerical approximation of heat equations. Corollary 5.5 below specializes Theorem 5.4 to the case in which the constants $c \in \mathbb{R}$, $p, q, v, \bar v, w, \bar w, z, \bar z \in [0,\infty)$, which are used to formulate the hypotheses in (303)–(304) below, all coincide.

Theorem 5.4. Assume Setting 4.5, let $c, a \in \mathbb{R}$, $b \in (a,\infty)$, $r, T \in (0,\infty)$, $p, q, v, \bar v, w, \bar w, z, \bar z \in [0,\infty)$, $\bar p = 2+\max\{2z+3\bar z,\,2w+3\bar w+1\}$, for every $d \in \mathbb{N}$ let $\|\cdot\|_{\mathbb{R}^d} \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, let $\varphi_d \in C(\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, let $(\phi_{\varepsilon,d})_{(\varepsilon,d)\in(0,r]\times\mathbb{N}} \subseteq \mathbf{N}$, and assume for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ that $(\mathcal{R}\phi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$,

\[
|(\mathcal{R}\phi_{\varepsilon,d})(x)| \le c d^{z}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar z}\bigr), \qquad \|(\nabla(\mathcal{R}\phi_{\varepsilon,d}))(x)\|_{\mathbb{R}^d} \le c d^{w}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar w}\bigr),
\tag{303}
\]
\[
|\varphi_d(x)-(\mathcal{R}\phi_{\varepsilon,d})(x)| \le \varepsilon c d^{v}\bigl(1+\|x\|_{\mathbb{R}^d}^{\bar v}\bigr), \quad\text{and}\quad \mathcal{P}(\phi_{\varepsilon,d}) \le c d^{p}\varepsilon^{-q}.
\tag{304}
\]

Then

(i) there exist unique $u_d \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, which satisfy for all $d \in \mathbb{N}$, $t \in (0,T]$, $x \in \mathbb{R}^d$ that $u_d|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$, $u_d(0,x) = \varphi_d(x)$, $\inf_{\gamma\in(0,\infty)} \sup_{(s,y)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(s,y)|}{1+\|y\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\Delta_x u_d)(t,x)
\tag{305}
\]

and

(ii) there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,r]\times\mathbb{N}} \subseteq \mathbf{N}$, $C \in \mathbb{R}$ such that for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$ it holds that $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, $\mathcal{N}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+2)}$, $\mathcal{P}(\psi_{\varepsilon,d}) \le C d^{2\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+4)}$, $\widehat{\mathcal{P}}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+2)}$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{306}
\]


Proof of Theorem 5.4. First, observe that for all $d \in \mathbb{N}$ it holds that

\[
\sqrt{\operatorname{Trace}(\mathrm{I}_{\mathbb{R}^d})} = \Bigl[\sum_{i=1}^{d} 1\Bigr]^{1/2} = d^{1/2} \le \max\{1,c\}\,d^{1/2}.
\tag{307}
\]

Corollary 4.7 (with $\alpha = 0$, $\beta = \frac12$, $c = \max\{1,c\}$, $\mu_d = 0$, $A_d = \mathrm{I}_{\mathbb{R}^d}$ for $d \in \mathbb{N}$ in the notation of Corollary 4.7) hence implies that there exist unique $u_d \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, which satisfy for all $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ that $u_d(0,x) = \varphi_d(x)$, which satisfy for all $d \in \mathbb{N}$ that $\inf_{\gamma\in(0,\infty)} \sup_{(t,x)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(t,x)|}{1+\|x\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and which satisfy that for all $d \in \mathbb{N}$ it holds that $u_d|_{(0,T)\times\mathbb{R}^d}$ is a viscosity solution of

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\Delta_x u_d)(t,x)
\tag{308}
\]

for $(t,x) \in (0,T)\times\mathbb{R}^d$ and there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,r]\times\mathbb{N}} \subseteq \mathbf{N}$, $C \in \mathbb{R}$ such that for all $\varepsilon \in (0,r]$, $d \in \mathbb{N}$ it holds that $\mathcal{N}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+2)}$, $\mathcal{P}(\psi_{\varepsilon,d}) \le C d^{2\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+4)}$, $\widehat{\mathcal{P}}(\psi_{\varepsilon,d}) \le C d^{\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+2)}$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{309}
\]

This proves item (ii). Next note that (303) and (304) ensure that for all $d \in \mathbb{N}$ it holds that $\varphi_d \colon \mathbb{R}^d \to \mathbb{R}$ is an at most polynomially growing function. Hence, we obtain that for all $d \in \mathbb{N}$ it holds that

\[
\inf_{\gamma\in(0,\infty)}\sup_{x\in\mathbb{R}^d} \Bigl(\frac{|\varphi_d(x)|}{1+\|x\|_{\mathbb{R}^d}^{\gamma}}\Bigr) < \infty.
\tag{310}
\]

Lemma 5.3 (with $T = T$, $\varphi = \varphi_d$, $u = u_d$ for $d \in \mathbb{N}$ in the notation of Lemma 5.3) hence shows that for all $d \in \mathbb{N}$, $t \in (0,T]$, $x \in \mathbb{R}^d$ it holds that $u_d|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$ and

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\Delta_x u_d)(t,x).
\tag{311}
\]

This, (308), and Hairer et al. [31, Remark 4.1] prove item (i). The proof of Theorem 5.4 is thus completed.

Corollary 5.5. Assume Setting 4.5, let $a \in \mathbb{R}$, $b \in (a,\infty)$, $c, T \in (0,\infty)$, for every $d \in \mathbb{N}$ let $\|\cdot\|_{\mathbb{R}^d} \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, let $\varphi_d \in C(\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, let $(\phi_{\varepsilon,d})_{(\varepsilon,d)\in(0,1]\times\mathbb{N}} \subseteq \mathbf{N}$, and assume for all $\varepsilon \in (0,1]$, $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ that $(\mathcal{R}\phi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$,

\[
\mathcal{P}(\phi_{\varepsilon,d}) \le c d^{c}\varepsilon^{-c}, \qquad |\varphi_d(x)-(\mathcal{R}\phi_{\varepsilon,d})(x)| \le \varepsilon c d^{c}\bigl(1+\|x\|_{\mathbb{R}^d}^{c}\bigr),
\tag{312}
\]

and

\[
|(\mathcal{R}\phi_{\varepsilon,d})(x)| + \|(\nabla(\mathcal{R}\phi_{\varepsilon,d}))(x)\|_{\mathbb{R}^d} \le c d^{c}\bigl(1+\|x\|_{\mathbb{R}^d}^{c}\bigr).
\tag{313}
\]

Then


(i) there exist unique $u_d \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, which satisfy for all $d \in \mathbb{N}$, $t \in (0,T]$, $x \in \mathbb{R}^d$ that $u_d|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$, $u_d(0,x) = \varphi_d(x)$, $\inf_{\gamma\in(0,\infty)} \sup_{(s,y)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(s,y)|}{1+\|y\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\Delta_x u_d)(t,x)
\tag{314}
\]

and

(ii) there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,1]\times\mathbb{N}} \subseteq \mathbf{N}$, $\kappa \in \mathbb{R}$ such that for all $\varepsilon \in (0,1]$, $d \in \mathbb{N}$ it holds that $\mathcal{P}(\psi_{\varepsilon,d}) \le \kappa d^{\kappa}\varepsilon^{-\kappa}$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{315}
\]

Proof of Corollary 5.5. First, observe that (312), (313), and item (i) in Theorem 5.4 (with $r = 1$, $c = c$, $p = c$, $q = c$, $v = c$, $\bar v = c$, $w = c$, $\bar w = c$, $z = c$, $\bar z = c$ in the notation of Theorem 5.4) establish item (i). Next note that (312), (313), and item (ii) in Theorem 5.4 (with $r = 1$, $c = c$, $p = c$, $q = c$, $v = c$, $\bar v = c$, $w = c$, $\bar w = c$, $z = c$, $\bar z = c$ in the notation of Theorem 5.4) ensure that there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,1]\times\mathbb{N}} \subseteq \mathbf{N}$, $C \in \mathbb{R}$ which satisfy that for all $\varepsilon \in (0,1]$, $d \in \mathbb{N}$ it holds that $\mathcal{P}(\psi_{\varepsilon,d}) \le C d^{\frac32 c^2+11c+6}\varepsilon^{-(c+4)}$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{316}
\]

Hence, we obtain that for all $\varepsilon \in (0,1]$, $d \in \mathbb{N}$ it holds that

\[
\mathcal{P}(\psi_{\varepsilon,d}) \le \max\{C,\tfrac32 c^2+11c+6\}\, d^{\max\{C,\frac32 c^2+11c+6\}}\,\varepsilon^{-\max\{C,\frac32 c^2+11c+6\}}.
\tag{317}
\]

Combining this with (316) establishes item (ii). The proof of Corollary 5.5 is thus completed.
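For orientation (a bookkeeping step not spelled out in the text), the exponent in (316) follows from the bound $\mathcal{P}(\psi_{\varepsilon,d}) \le C d^{2\bar p+p+q(v+\frac12\bar v)}\varepsilon^{-(q+4)}$ of Theorem 5.4 with all parameters set to $c$:

\[
\bar p = 2+\max\{2c+3c,\,2c+3c+1\} = 5c+3,
\qquad
2\bar p+p+q\bigl(v+\tfrac12\bar v\bigr) = 2(5c+3)+c+c\cdot\tfrac{3c}{2} = \tfrac32 c^2+11c+6.
\]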

5.3 ANN approximations for geometric Brownian motions

In this subsection we specialize Theorem 5.4 above in Corollary 5.6 below to an example in which the activation function is the softplus function ($\mathbb{R} \ni x \mapsto \ln(1+e^x) \in (0,\infty)$).

Corollary 5.6. Let $c, a \in \mathbb{R}$, $b \in (a,\infty)$, $p \in [0,\infty)$, $T \in (0,\infty)$, for every $d \in \mathbb{N}$ let $\|\cdot\|_{\mathbb{R}^d} \colon \mathbb{R}^d \to [0,\infty)$ be the standard norm on $\mathbb{R}^d$, let $\mathbf{N}$ be the set given by

\[
\mathbf{N} = \cup_{L\in\mathbb{N}\cap[2,\infty)}\cup_{l_0,\dots,l_L\in\mathbb{N}} \bigl(\times_{k=1}^{L}(\mathbb{R}^{l_k\times l_{k-1}}\times\mathbb{R}^{l_k})\bigr),
\tag{318}
\]

let $\mathcal{A}_d \colon \mathbb{R}^d \to \mathbb{R}^d$, $d \in \mathbb{N}$, satisfy for all $d \in \mathbb{N}$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ that $\mathcal{A}_d(x) = (\ln(1+e^{x_1}),\dots,\ln(1+e^{x_d}))$, let $\mathcal{P} \colon \mathbf{N} \to \mathbb{N}$ and $\mathcal{R} \colon \mathbf{N} \to \cup_{m,n\in\mathbb{N}} C(\mathbb{R}^m,\mathbb{R}^n)$ be the functions which satisfy that for all $L \in \mathbb{N}\cap[2,\infty)$, $l_0,\dots,l_L \in \mathbb{N}$, $\Phi = ((W_1,B_1),\dots,(W_L,B_L)) \in (\times_{k=1}^{L}(\mathbb{R}^{l_k\times l_{k-1}}\times\mathbb{R}^{l_k}))$, $x_0 \in \mathbb{R}^{l_0},\dots,x_{L-1} \in \mathbb{R}^{l_{L-1}}$ with $\forall\,k \in \mathbb{N}\cap(0,L)\colon x_k = \mathcal{A}_{l_k}(W_k x_{k-1}+B_k)$ it holds that $\mathcal{P}(\Phi) = \sum_{k=1}^{L} l_k(l_{k-1}+1)$, $(\mathcal{R}\Phi) \in C(\mathbb{R}^{l_0},\mathbb{R}^{l_L})$, and

\[
(\mathcal{R}\Phi)(x_0) = W_L x_{L-1}+B_L,
\tag{319}
\]

and let $(K_d)_{d\in\mathbb{N}} \subseteq \mathbb{R}$ satisfy for all $d \in \mathbb{N}$ that $|K_d| \le c d^{p}$. Then

(i) there exist unique $u_d \in C([0,T]\times\mathbb{R}^d,\mathbb{R})$, $d \in \mathbb{N}$, which satisfy for all $d \in \mathbb{N}$, $t \in (0,T]$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ that $u_d|_{(0,T]\times\mathbb{R}^d} \in C^{1,2}((0,T]\times\mathbb{R}^d,\mathbb{R})$, $u_d(0,x) = \ln(1+e^{x_1+\dots+x_d-K_d})+K_d$, $\inf_{\gamma\in(0,\infty)} \sup_{(s,y)\in[0,T]\times\mathbb{R}^d} \bigl(\tfrac{|u_d(s,y)|}{1+\|y\|_{\mathbb{R}^d}^{\gamma}}\bigr) < \infty$, and

\[
(\tfrac{\partial}{\partial t}u_d)(t,x) = (\Delta_x u_d)(t,x)
\tag{320}
\]

and

(ii) there exist $(\psi_{\varepsilon,d})_{(\varepsilon,d)\in(0,1]\times\mathbb{N}} \subseteq \mathbf{N}$, $\kappa \in \mathbb{R}$ such that for all $\varepsilon \in (0,1]$, $d \in \mathbb{N}$ it holds that $\mathcal{P}(\psi_{\varepsilon,d}) \le \kappa d^{11+4\max\{p,1/2\}}\varepsilon^{-4}$, $(\mathcal{R}\psi_{\varepsilon,d}) \in C(\mathbb{R}^d,\mathbb{R})$, and

\[
\sup_{x\in[a,b]^d} |u_d(T,x) - (\mathcal{R}\psi_{\varepsilon,d})(x)| \le \varepsilon.
\tag{321}
\]

Proof of Corollary 5.6. Throughout this proof let $\varphi_d \colon \mathbb{R}^d \to \mathbb{R}$, $d \in \mathbb{N}$, be the functions which satisfy for all $d \in \mathbb{N}$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ that

\[
\varphi_d(x) = \ln\bigl(1+e^{x_1+\dots+x_d-K_d}\bigr)+K_d
\tag{322}
\]

and let $(\phi_d)_{d\in\mathbb{N}} \subseteq \mathbf{N}$ satisfy for all $d \in \mathbb{N}$ that

\[
\phi_d = \bigl(((1,\dots,1),-K_d),\,(1,K_d)\bigr) \in (\mathbb{R}^{1\times d}\times\mathbb{R})\times(\mathbb{R}\times\mathbb{R}).
\tag{323}
\]

Observe that (323) assures that for all $d \in \mathbb{N}$ it holds that

\[
\mathcal{P}(\phi_d) = 1(d+1)+1(1+1) = d+3 \le 4d \le \max\{4,c\}\,d.
\tag{324}
\]

Next note that the fact that $(\mathbb{R} \ni x \mapsto \ln(1+e^x) \in \mathbb{R}) \in C^1(\mathbb{R},\mathbb{R})$, (319), and (323) imply that for all $d \in \mathbb{N}$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that $(\mathcal{R}\phi_d) \in C(\mathbb{R}^d,\mathbb{R})$ and

\[
(\mathcal{R}\phi_d)(x) = \ln\bigl(1+e^{x_1+\dots+x_d-K_d}\bigr)+K_d = \varphi_d(x).
\tag{325}
\]

Next note that for all $d \in \mathbb{N}$ it holds that $\ln(1+e^{-K_d}) \le \ln 2+|K_d|$ and for all $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ it holds that $\|(\nabla\varphi_d)(x)\|_{\mathbb{R}^d} \le d^{1/2}$. This and the hypothesis that for all $d \in \mathbb{N}$ it holds that $|K_d| \le c d^{p}$ hence demonstrate that for all $d \in \mathbb{N}$, $x = (x_1,\dots,x_d) \in \mathbb{R}^d$ it holds that

\[
\begin{aligned}
|(\mathcal{R}\phi_d)(x)| = |\varphi_d(x)| &\le \sup_{y\in\mathbb{R}^d}\bigl[\|(\nabla\varphi_d)(y)\|_{\mathbb{R}^d}\,\|x\|_{\mathbb{R}^d}\bigr]+|\varphi_d(0)|\\
&\le d^{1/2}\|x\|_{\mathbb{R}^d}+\ln 2+2|K_d| \le d^{1/2}\|x\|_{\mathbb{R}^d}+1+2c d^{p}\\
&\le \max\{1,2c\}\bigl(1+d^{p}+d^{1/2}\|x\|_{\mathbb{R}^d}\bigr) \le 2\max\{1,2c\}\,d^{\max\{p,1/2\}}\bigl(1+\|x\|_{\mathbb{R}^d}\bigr).
\end{aligned}
\tag{326}
\]
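The one-hidden-neuron network (323) and the identity (325) are easy to check numerically; the Python sketch below (names ours) implements the realization map (319) with the softplus activation and compares it against (322).

```python
import numpy as np

softplus = lambda x: np.log1p(np.exp(x))     # the activation of Corollary 5.6

def realize(phi, x):
    """Realization map (319): softplus is applied after every layer except the last."""
    for W, B in phi[:-1]:
        x = softplus(W @ x + B)
    W, B = phi[-1]
    return W @ x + B

d, K = 5, 2.0
# phi_d from (323): one hidden neuron computing softplus(x_1 + ... + x_d - K_d),
# followed by an output layer that adds K_d back.
phi = [(np.ones((1, d)), np.array([-K])), (np.array([[1.0]]), np.array([K]))]

x = np.linspace(-1.0, 1.0, d)
target = np.log1p(np.exp(x.sum() - K)) + K   # phi_d(x) from (322)
assert np.allclose(realize(phi, x), target)
print(float(realize(phi, x)[0]))             # ln(1 + e^{-2}) + 2
```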


Next note that for all $d \in \mathbb{N}$, $x \in \mathbb{R}^d$ it holds that

\[
\bigl\|\bigl(\nabla(\mathcal{R}\phi_d)\bigr)(x)\bigr\|_{\mathbb{R}^d} = \|(\nabla\varphi_d)(x)\|_{\mathbb{R}^d} \le d^{1/2} \le 2\max\{1,2c\}\,d^{1/2}\bigl(1+\|x\|_{\mathbb{R}^d}^{0}\bigr).
\tag{327}
\]

Combining this, (324), (325), (326), and the fact that $(\mathbb{R} \ni x \mapsto \ln(1+e^x) \in \mathbb{R}) \in C^1(\mathbb{R},\mathbb{R})$ with Theorem 5.4 (with $c = \max\{4,4c\}$, $r = 1$, $p = 1$, $q = 0$, $v = 0$, $\bar v = 0$, $w = 1/2$, $\bar w = 0$, $z = \max\{p,1/2\}$, $\bar z = 1$, $a = (\mathbb{R} \ni x \mapsto \ln(1+e^x) \in \mathbb{R})$ in the notation of Theorem 5.4) establishes items (i)–(ii). The proof of Corollary 5.6 is thus completed.

Acknowledgements

This project is based on the master thesis of DK written from March 2018 to September 2018 at ETH Zurich under the supervision of AJ.

References

[1] Aliprantis, C. D., and Border, K. C. Infinite dimensional analysis. Springer, Berlin, 2006.

[2] Amann, H., and Escher, J. Analysis. III. Birkhäuser, Basel, 2009.

[3] Artin, E. The gamma function. Holt, Rinehart and Winston, New York-Toronto-London, 1964.

[4] Beck, C., Becker, S., Cheridito, P., Jentzen, A., and Neufeld, A. Deep splitting method for parabolic PDEs. arXiv:1907.03452 (2019), 40 pages.

[5] Beck, C., Becker, S., Grohs, P., Jaafari, N., and Jentzen, A. Solving stochastic differential equations and Kolmogorov equations by means of deep learning. arXiv:1806.00421 (2018), 56 pages.

[6] Beck, C., E, W., and Jentzen, A. Machine Learning Approximation Algorithms for High-Dimensional Fully Nonlinear Partial Differential Equations and Second-order Backward Stochastic Differential Equations. J. Nonlinear Sci. 29, 4 (2019), 1563–1619.

[7] Becker, S., Cheridito, P., and Jentzen, A. Deep optimal stopping. Journal of Machine Learning Research 20, 74 (2019), 1–25.

[8] Becker, S., Cheridito, P., Jentzen, A., and Welti, T. Solving high-dimensional optimal stopping problems using deep learning. arXiv:1908.01602 (2019), 42 pages.

[9] Bellman, R. Dynamic programming. Princeton University Press, Princeton, NJ, 1957.


[10] Berg, J., and Nyström, K. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317 (2018), 28–41.

[11] Berner, J., Grohs, P., and Jentzen, A. Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. Revision requested from SIAM Journal on Mathematics of Data Science; arXiv:1809.03062 (2018), 35 pages.

[12] Buehler, H., Gonon, L., Teichmann, J., and Wood, B. Deep hedging. Quantitative Finance 19, 8 (2019), 1271–1291.

[13] Chan-Wai-Nam, Q., Mikael, J., and Warin, X. Machine learning for semi linear PDEs. J. Sci. Comput. 79, 3 (2019), 1667–1712.

[14] Chen, Y., and Wan, J. W. L. Deep neural network framework based on backward stochastic differential equations for pricing and hedging American options in high dimensions. arXiv:1909.11532 (2019), 35 pages.

[15] Cohn, D. L. Measure theory. Birkhäuser, New York, 2013.

[16] Cox, S., Hutzenthaler, M., Jentzen, A., van Neerven, J., and Welti, T. Convergence in Hölder norms with applications to Monte Carlo methods in infinite dimensions. Revision requested from IMA Journal of Numerical Analysis; arXiv:1605.00856 (2017), 48 pages.

[17] Dockhorn, T. A discussion on solving partial differential equations using neural networks. arXiv:1904.07200 (2019), 9 pages.

[18] E, W., Han, J., and Jentzen, A. Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics 5, 4 (2017), 349–380.

[19] E, W., and Yu, B. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Commun. Math. Stat. 6, 1 (2018), 1–12.

[20] Elbrächter, D., Grohs, P., Jentzen, A., and Schwab, C. DNN Expression Rate Analysis of High-dimensional PDEs: Application to Option Pricing. arXiv:1809.07669 (2018), 50 pages.

[21] Evans, L. C. Partial differential equations, second ed., vol. 19 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 2010.

[22] Farahmand, A.-m., Nabi, S., and Nikovski, D. Deep reinforcement learning for partial differential equation control. 2017 American Control Conference (ACC) (2017), 3120–3127.


[23] Fujii, M., Takahashi, A., and Takahashi, M. Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pacific Financial Markets (Mar 2019).

[24] Garling, D. J. H. Inequalities: a journey into linear analysis. Cambridge University Press, Cambridge, 2007.

[25] Gilbarg, D., and Trudinger, N. S. Elliptic partial differential equations of second order. Springer, Berlin, 2001. Reprint of the 1998 edition.

[26] Goudenège, L., Molent, A., and Zanette, A. Machine Learning for Pricing American Options in High Dimension. arXiv:1903.11275 (2019), 11 pages.

[27] Graves, A., Mohamed, A.-r., and Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, ICASSP (2013), pp. 6645–6649.

[28] Grohs, P., Hornung, F., Jentzen, A., and von Wurstemberger, P. A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations. To appear in Memoirs of the American Mathematical Society; arXiv:1809.02362 (2018), 124 pages.

[29] Grohs, P., Hornung, F., Jentzen, A., and Zimmermann, P. Space-time error estimates for deep neural network approximations for differential equations. arXiv:1908.03833 (2019), 86 pages.

[30] Grohs, P., Jentzen, A., and Salimova, D. Deep neural network approximations for Monte Carlo algorithms. arXiv:1908.10828 (2019), 45 pages.

[31] Hairer, M., Hutzenthaler, M., and Jentzen, A. Loss of regularity for Kolmogorov equations. Annals of Probability 43, 2 (2015), 468–527.

[32] Han, J., Jentzen, A., and E, W. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences 115, 34 (2018), 8505–8510.

[33] Han, J., and Long, J. Convergence of the Deep BSDE Method for Coupled FBSDEs. arXiv:1811.01165 (2018), 26 pages.

[34] Henry-Labordère, P. Deep Primal-Dual Algorithm for BSDEs: Applications of Machine Learning to CVA and IM. (November 15, 2017), 16 pages. Available at SSRN: https://ssrn.com/abstract=3071506.

[35] Hinton, G., Deng, L., Yu, D., Dahl, G., Mohamed, A.-r., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., and Kingsbury, B. Deep neural networks for acoustic modeling in speech recognition. Signal Processing Magazine (2012).


[36] Huang, G., Liu, Z., and Weinberger, K. Q. Densely connected convo-lutional networks. 2017 IEEE Conference on Computer Vision and PatternRecognition (CVPR) (2017), 2261–2269.

[37] Hure, C., Pham, H., and Warin, X. Some machine learning schemesfor high-dimensional nonlinear PDEs. arXiv:1902.01599 (2019), 33 pages.

[38] Hutzenthaler, M., Jentzen, A., Kruse, T., and Nguyen, T. A.

A proof that rectified deep neural networks overcome the curse of dimen-sionality in the numerical approximation of semilinear heat equations. Re-vision requested from SN Partial Differential Equations and Applications;arXiv:1901.10854 (2019), 29 pages.

[39] Hutzenthaler, M., Jentzen, A., and Wang, X. Exponential in-tegrability properties of numerical approximation processes for nonlinearstochastic differential equations. Mathematics of Computation 87, 311(2018), 1353–1413.

[40] Hytonen, T., van Neerven, J., Veraar, M., and Weis, L. Analysisin Banach spaces. Vol. II. Springer, 2017.

[41] Jacquier, A., and Oumgari, M. Deep PPDEs for rough local stochasticvolatility. arXiv:1906.02551 (2019), 21 pages.

[42] Jentzen, A., and Kloeden, P. E. Taylor approximations for stochasticpartial differential equations, vol. 83 of CBMS-NSF Regional ConferenceSeries in Applied Mathematics. Society for Industrial and Applied Mathe-matics (SIAM), Philadelphia, PA, 2011.

[43] Jentzen, A., Salimova, D., and Welti, T. A proof that deep artificialneural networks overcome the curse of dimensionality in the numerical ap-proximation of Kolmogorov partial differential equations with constant dif-fusion and nonlinear drift coefficients. arXiv:1809.07321 (2018), 48 pages.

[44] Karatzas, I., and Shreve, S. E. Brownian motion and stochastic cal-culus, second ed. Springer, New York, 1991.

[45] Klenke, A. Probability theory. Springer, 2014.

[46] Krizhevsky, A., Sutskever, I., and Hinton, G. E. Imagenet classi-fication with deep convolutional neural networks. In Advances in NeuralInformation Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou,and K. Q. Weinberger, Eds. Curran Associates, Inc., 2012, pp. 1097–1105.

[47] Kutyniok, G., Petersen, P., Raslan, M., and Schneider, R. A theoretical analysis of deep neural networks and parametric PDEs. arXiv:1904.00377 (2019), 43 pages.

[48] Long, Z., Lu, Y., Ma, X., and Dong, B. PDE-Net: Learning PDEs from Data. In Proceedings of the 35th International Conference on Machine Learning (2018), pp. 3208–3216.


[49] Lye, K. O., Mishra, S., and Ray, D. Deep learning observables in computational fluid dynamics. arXiv:1903.03040 (2019), 57 pages.

[50] Magill, M., Qureshi, F., and de Haan, H. W. Neural networks trained to solve differential equations learn general representations. In Advances in Neural Information Processing Systems (2018), pp. 4071–4081.

[51] Mizuguchi, M., Tanaka, K., Sekine, K., and Oishi, S. Estimation of Sobolev embedding constant on a domain dividable into bounded convex domains. Journal of Inequalities and Applications (2017).

[52] Petersen, P., and Voigtlaender, F. Optimal approximation of piecewise smooth functions using deep ReLU neural networks. arXiv:1709.05289 (2017), 54 pages.

[53] Pham, H., and Warin, X. Neural networks-based backward scheme for fully nonlinear PDEs. arXiv:1908.00412 (2019), 15 pages.

[54] Reisinger, C., and Zhang, Y. Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. arXiv:1903.06652 (2019), 34 pages.

[55] Rockafellar, R. T. Convex Analysis. Princeton University Press, 1970.

[56] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., and Hassabis, D. Mastering the game of Go with deep neural networks and tree search. Nature 529 (Jan. 2016), 484.

[57] Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T., and Hassabis, D. Mastering the game of Go without human knowledge. Nature 550 (Oct. 2017), 354.

[58] Simonyan, K., and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556 (2014).

[59] Sirignano, J., and Cont, R. Universal features of price formation in financial markets: perspectives from deep learning. Quantitative Finance 19, 9 (2019), 1449–1459.

[60] Sirignano, J., and Spiliopoulos, K. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375 (2018), 1339–1364.


[61] Wu, C., Karanasou, P., Gales, M. J., and Sim, K. C. Stimulated deep neural network for speech recognition. In Interspeech 2016 (2016), pp. 400–404.
