Polynomial Semimartingales and a Deep Learning Approach to Local Stochastic Volatility Calibration

Dissertation for the attainment of the doctoral degree

submitted by

Wahid Khosrawi-Sardroudi

at the Faculty of Mathematics and Physics of the Albert-Ludwigs-Universität Freiburg

April 2019


Dean: Prof. Dr. Gregor Herten

Referees: Prof. Dr. Thorsten Schmidt, Prof. Dr. Martin Keller-Ressel

Date of the oral examination: 29.05.2019


Preface

Financial markets have experienced a precipitous increase in complexity over the past decades, posing a significant challenge from a risk management point of view. This complexity motivates the application and development of sophisticated models based on the theory of stochastic processes and in particular stochastic calculus. In this regard, the contribution of this thesis is twofold: we extend the class of tractable stochastic processes in the form of polynomial processes and polynomial semimartingales, and we show how efficient calibration of local stochastic volatility models is possible by applying machine learning techniques.

In the first part - the main part - we extend the previously established class of polynomial processes to include processes beyond stochastic continuity. This extension is motivated by the fact that certain events in financial markets take place at a deterministic time point but without foreseeable outcome. Such events include, e.g., interest rate decisions of central banks or political elections/votes. Since the outcome has a significant impact on markets, it is desirable to consider stochastic processes that can reproduce such jumps at previously specified time points. Such an extension has already been introduced in the affine framework. We will show that similar modifications hold true in the polynomial case. In particular, we will show how, after this extension, the computation of mixed moments in a multivariate setting reduces to solving a measure ordinary differential equation, a significant reduction in complexity compared to the measure partial differential equation that arises in the context of Kolmogorov equations. A central role in the theory of time-homogeneous polynomial processes is played by the theory of one parameter matrix semigroups. Hence, we will develop a two parameter version of matrix semigroup theory under lower regularity than what exists in the literature. This accounts for the time-inhomogeneity of the stochastic processes we consider. While in the one parameter case full regularity already follows from very mild assumptions, we will see that this is no longer the case in the two parameter case.

In the second part of this thesis we investigate a more applied topic, namely


the exact calibration of local stochastic volatility models to financial data. We show how this computationally challenging problem can be efficiently solved by applying machine learning techniques in the form of deep neural networks. The use of these methods has surged dramatically in the literature. Since this surge was accompanied by the development of highly efficient machine learning libraries, we can exploit this and make use of sophisticated computational tools such as GPU-accelerated numerical computation. We will provide a short exposition of the underlying concepts and give numerical examples in the form of toy models. We will further show how this high dimensional problem can be made tractable by the application of an auxiliary machine learning method in the context of variance reduction for Monte-Carlo pricing methods.

I want to take this opportunity to express my great gratitude to several people who have enriched my life over the past years. Let me start with my supervisor Thorsten Schmidt. Thank you for taking a chance on me when giving me this position and thank you for sharing with me your experience over the past years. It would be a futile attempt to express in words just how much I learned during my time here in Freiburg. I also want to thank you for your friendship and advice whenever I needed it. Thank you for the many evenings we spent together with the whole AG Schmidt making food (especially the pizza), drinking coffee and frequenting the various bars here in Freiburg. Thank you also for all the opportunities you provided me that went far beyond anything I would have expected when I started my PhD. This in particular includes the fellow mathematicians I was able to meet thanks to your support, including those I like to call the "Austrians in my life". This includes in particular Josef Teichmann, who not only allowed me to learn from him as a mathematician, shared with me his passion and helped me take my first steps into the fantastic field of mathematical machine learning, but who also spent numerous hours talking with me about politics, a shared passion of ours. Thank you Josef, for your friendship, your guidance and for the doors you have opened for me. I hope you know how grateful I am for everything you have done for me. I further would like to thank Christa Cuchiero. I become sentimental thinking back to that legendary night on the rooftop bar in Kiel, where you, Thorsten, Josef and I spent a wonderful time talking and discussing maths, politics and everything interesting we could think of. Wonderful, how this became the starting point of the legendary FWZ-seminar. I hereby want to apologize to you and the others for making you try the "Mexikaner" shots. Being from Hamburg, I was sure you all would enjoy them as much as I do. Thank you also for inviting me to work together with you and Josef on our joint project and thank you for inviting me to Vienna where we made significant progress together. Next, I would like to thank Philipp Harms, another Austrian who enriched my life. Thank you for your friendship and help whenever I needed someone


to discuss mathematics. How happy I am that you came to join us in Freiburg. There is no question for me that our friendship will continue beyond my time in Freiburg. Let me finish the group of "FWZ-Austrians" by mentioning Irene Klein. Thank you for our discussions and for your work in co-organizing the FWZ-seminar. I apologize to you for my failed attempts to emulate the Austrian dialect. I promise I will not stop practicing in the years to come. I also want to thank Sandrine, for being the best "PhD-sister". Thank you for taking time to discuss with me, for the laughs we had and for the numerous invitations to your and Mo's apartment; you two were always wonderful hosts. Further, I want to thank all members of the mathematical stochastics group in Freiburg. The environment here was always positive and motivating and made it very easy to come to the office every day. This includes, in no particular order, Tolu, Monika, Lukas, Yves, Maria, Bene, Marcus, Felix, Elli, Ernst-August, Pascal, Johannes, Timo and Joni. Also my thanks to the professors Peter Pfaffelhuber, Angelika Rohde, Ludger Rüschendorf, Ernst Eberlein, Hans Rudolf Lerche and Stefan Tappe. I further would like to thank all the colleagues I have met outside of Freiburg. I want to thank Sam Cohen, Martin Keller-Ressel, Claudio Fontana, Kathrin Glau, Martin Larsson, Sergio Pulido, Damir Filipović and Johannes Muhle-Karbe for all the help and advice they have given me over the past years. Further thanks go to Sara, Lukas, Marvin, Eduardo, Matteo and all other colleagues I was privileged to meet.

Many thanks to all my friends here in Freiburg from outside the world of mathematics. In particular, many thanks to Philip and Nadja; you are both amazing and you both made my time in Freiburg so much better.

Finally, I want to thank my family. Thanks go to my sister Parisa and her husband Christian. Thank you Parisa for always being the first in our family to walk the path I would follow shortly after. Thank you for removing all the obstacles in my way. I hope you know how much I love you and how important you are to me. The greatest imaginable thanks to my parents Ali and Zohreh. What the two of you did for Parisa and me goes beyond what words could describe. I feel privileged knowing that the two of you love me and it is this very love that motivates me to continue my path. Both of you are my role models, and everything I am and everything I achieve in my life is thanks to the love the two of you gave me and the many sacrifices you made.

Freiburg, April 2019


Contents

I Polynomial Semimartingales

1 Introduction

2 Preliminaries
2.1 Basic notational conventions
2.2 Notation and conventions for multivariate polynomials
2.3 Function spaces

3 General theory
3.1 General probability
3.2 Processes of locally bounded variation

4 Evolution systems and measure integral equations
4.1 Matrix semigroups
4.2 Matrix evolution systems

4.2.1 Cauchy-Sincov-Equations
4.2.2 Matrix evolution systems revisited
4.2.3 Existence and uniqueness of extended generators

4.3 Ordinary measure integral equations
4.3.1 Existing literature
4.3.2 Existence and uniqueness for measure integral equations
4.3.3 A Gronwall type inequality for measure differential equations

5 Polynomials and matrix evolution systems
5.1 Preliminaries
5.2 Degree preserving properties of time-homogeneous polynomial processes
5.3 Degree preserving properties of matrix evolution systems and extended generators

6 Polynomial Semimartingales
6.1 Setting and the polynomial property
6.2 Regular polynomial processes and representations in finite dimensions
6.3 Extended generators for k-polynomial processes
6.4 Polynomial Processes as Semimartingales


6.5 The discrete time case

Appendices

A Existence of càdlàg modifications
A.1 Modifications and indistinguishability
A.2 Existence of càdlàg modifications

B Additional notes
B.1 Proof of a naive uniform integrability result

II A Deep Learning Approach to Local Stochastic Volatility Calibration

7 Preliminaries and deep learning
7.1 Introduction
7.2 Preliminaries on deep learning

7.2.1 Artificial neural networks
7.2.2 Stochastic gradient descent

7.3 Variance reduction for pricing and calibration
7.3.1 Black Scholes hedge
7.3.2 Neural networks hedge

8 Calibration of LSV models
8.1 Minimizing the calibration functional

8.1.1 Standard gradient descent
8.1.2 Standard gradient descent with control variates

8.2 Numerical Implementation
8.2.1 Toy-model specification
8.2.2 A first numerical experiment for the variance reduction
8.2.3 Numerical results for the calibration problem
8.2.4 Conclusion

9 Plots and Histograms


Part I

Polynomial Semimartingales


Chapter 1

Introduction

The first part of this thesis aims to generalize the class of finite dimensional polynomial processes to include processes beyond stochastic continuity. Stochastic processes are widely used in financial modeling for various applications. In the context of equity modeling, the seminal work of Black and Scholes [BS73] introduced a model, simple by today's standards, that allows for the valuation of European vanilla options. Other areas of finance have profited from the introduction of financial models based on stochastic calculus, see for example Schönbucher [Sch98] in the context of term structure modeling and the seminal Nobel Prize winning work by Merton [Mer74] in the context of credit risk management.

When dealing with financial markets, one is often confronted with two challenges. First, the model needs to "capture" empirical properties usually observed in financial data. Second, it is necessary to correctly model dependencies in a market in order to not undervalue risks. Regarding the latter, this motivates the simultaneous modeling of multiple prices where the movement of one price is not independent of the movement of the others. This in particular becomes a necessity when considering financial derivatives that depend on multiple underlying assets.

On the other hand, regarding the first challenge, such properties can be the geometry of the observed data, such as bounded state spaces of prices. An example where this is a needed feature would be the modeling of electricity prices. Such prices show a strong seasonality, motivating the consideration of time-inhomogeneous models. From a simplified point of view, a model defined by an Itô diffusion would need to allow for time-dependent parameters.

Probably the best known shortcoming of the Black-Scholes model is that it is incapable of reproducing the implied volatility smile observed in the market. In the past decades, stochastic volatility models have aimed to address the issue of "fitting the smile". This work was pioneered by Hull and White [HW87] and the most famous example would probably be the Heston model, see Heston [Hes93].


In recent years in particular, stochastic volatility models have gained attention since empirical data suggests that the volatility process is "rough", see in particular Gatheral et al. [Gat+14], motivating the extension to fractional stochastic volatility models. These models were originally considered to capture long memory effects that cannot be reproduced by a Markov model, see Comte and Renault [CR98]. Advances in this field were in particular achieved in the context of affine processes, see El Euch and Rosenbaum [EER19], Jaber et al. [Jab+17], Cuchiero and Teichmann [CT18] and Gatheral and Keller-Ressel [GKR18]. In this thesis, we will not deal with an extension of polynomial processes to a fractional setting.

From an applications point of view, one faces the problem that the model needs to be sufficiently tractable so that pricing and calibration are feasible, standing in direct contrast to the desirable features of financial models formulated above. A class of stochastic processes bridging this gap is given by affine processes. The study of such stochastic processes has a long history and we refer to the introduction in Cuchiero [Cuc11] for an overview.

An extension of this class of processes is given in the form of polynomial processes. A first systematic treatment of the multivariate case was given in Cuchiero et al. [Cuc+12], where a full characterization in terms of semimartingale characteristics is provided. In this reference, the authors introduce polynomial processes as time homogeneous Markov processes where they allow for a non-zero killing rate. An alternative but related approach is given in Filipović and Larsson [FL16] and Filipović and Larsson [FL17]. The approach taken there is one from a semimartingale point of view without requiring the considered processes to be Markov. We in particular want to refer to the introduction of the last reference, where the authors give examples of applications of polynomial processes in mathematical finance. Let us also mention the comprehensive study of multivariate polynomial processes in a jump diffusion setting and supported on the unit simplex in Cuchiero et al. [Cuc+18b], together with their applications in stochastic portfolio theory, see Cuchiero [Cuc18].

All these approaches have in common that the stochastic processes considered are stochastically continuous, which is essentially a consequence of the fact that time homogeneity implies a semigroup structure which in turn, already under very mild regularity assumptions, implies strong regularity of the semigroup, see Proposition 4.2 in this thesis. Regularity of the semigroup in turn is inherited by the time dependency of conditional moments, which by means of the Markov inequality then implies stochastic continuity. It is however a desirable feature of financial models to allow for jump-times that are known in advance, see e.g. the introduction of [KR+18] and the references therein.

A first extension to a time-inhomogeneous setting in the spirit of [Cuc+12] was provided by C. A. Hurtado [CAH17], however under strong regularity assumptions on the Markov semigroup that imply stochastic continuity. From an application point of view however, there are numerous examples of events such as interest rate decisions, credit default events or political elections, where the time point of the event is known, but not the


outcome. Since the "type" of outcome has a significant impact on markets, being able to model stochastic discontinuity falls under the class of desirable features of financial models.

A first extension in the affine framework, allowing for stochastic discontinuity while preserving tractability, was given in Keller-Ressel et al. [KR+18]. In this part of the thesis, we want to follow their lead and extend the class of polynomial processes to include examples with stochastic discontinuity while preserving tractability. Tractability in the context of polynomial processes means that one should be able to compute conditional mixed moments in arbitrary but finite dimension in an efficient way. While in the time-homogeneous setting this boils down to the computation of a matrix exponential, the time-inhomogeneous setting in [CAH17] requires solving a potentially high-dimensional ordinary differential equation. We will show that in our framework, the computation of mixed moments is achieved by solving a measure differential equation, where the time derivative is replaced by a Radon-Nikodým derivative. This is due to the fact that the Lebesgue measure is replaced with a measure that charges points, corresponding to deterministic jump-times. This is very similar to the results presented in [KR+18], where instead of Riccati equations, the authors show that the conditional characteristic function is given as the solution of a measure Riccati equation. Let us further note that our characterization results wrt semimartingale characteristics differ from the affine case. A central role in this characterization is played by the existence of an increasing càdlàg function A wrt which the differential characteristics are given. While in [KR+18, Theorem 3.2] the characteristics are given wrt the continuous part of A, so that predictable jumps are "taken care of" by the discontinuous part of the predictable compensator of the associated jump measure, we formulate our results without decomposing A into its continuous and pure jump parts.

Before giving a brief overview of our approach, let us briefly mention the work on polynomial processes in infinite dimensions, namely Cuchiero et al. [Cuc+18c] and Benth et al. [Ben+18].

Outline of this part of the thesis

We want to take the viewpoint expressed in the introduction of Filipović and Larsson [FL17] and opt to not assume a priori a Markov process, since from an application point of view it seems preferable to consider a semimartingale framework. However, we want to give an (almost) complete characterization of stochastic processes that have the fundamental property that conditional moments are expressed by a polynomial of a corresponding degree; we call this the polynomial property. Hence we will consider càdlàg processes on a fixed stochastic basis and show how the polynomial property is related to a Markovian type structure on the space of multivariate polynomials of at most a certain degree, allowing us to give results similar to those in [Cuc+12] without assuming an


underlying Markov process. We will repeatedly take advantage of the finite dimensionality of the space of polynomials of at most a given degree, allowing us to develop a theory under significantly fewer regularity assumptions. Due to time-inhomogeneity, we cannot rely on well established results on matrix semigroups. Hence, after fixing some notation and reviewing some basic results from probability in the subsequent two chapters, we will consider the time-inhomogeneous version, which we will refer to as matrix evolution systems, in Chapter 4. While evolution systems have been considered in both finite and infinite dimensional settings ([EN99] and [Böt14] to name just two), usually the assumptions are under strong regularity and rely on space-time homogenization. Since such an approach is not reasonable in our case, as we do not want to leave a finite dimensional setting, the presentation we give under the regularity assumptions we make is, to the best of our knowledge, not available in the literature. In the same chapter, we will introduce a concept of infinitesimal description of such matrix evolution systems that is suitable for our regularity assumptions. We will then show how such an infinitesimal description, named extended generator and motivated by extended generators of Markov processes, relates to measure differential equations (or rather integral equations) that will play a central role in our characterization results. As extended generators for matrix evolution systems and Markov processes share the same name, this could lead to confusion. In this thesis however, it will always be clear from context which of the two concepts is meant, avoiding any ambiguity in this regard.

In Chapter 5, we will relate certain invariance properties of matrix evolution systems to invariance properties of the extended generators. While a triviality in the regular case, we will see that these results are not as clear anymore in our situation. For that, we will in particular formalize the well known relationship between polynomials of bounded degree and Euclidean spaces.

In Chapter 6, we will finally introduce a class of processes we call (k-)polynomial processes in the spirit of [Cuc+12]. We will show how these processes relate to matrix evolution systems and how the extended generator of the stochastic process corresponds to the extended generator of a related matrix evolution system. We will see that these processes are special semimartingales and their characteristics satisfy certain properties. We will then introduce a class of semimartingales called polynomial semimartingales in the spirit of [FL17]. These are special semimartingales with characteristics that satisfy assumptions motivated by the results from the previously introduced class of polynomial processes, and we show that under reasonable assumptions these semimartingales indeed belong to the class of polynomial processes.

In this thesis, we do not allow for a positive killing rate, unlike [Cuc+12]. Hence, in particular regarding [Cuc+12, Proposition 2.12], our setting corresponds to a killing rate γ = 0 and by that the hitting time of the "coffin state" satisfies T∆ = ∞. On the other hand, the setting in [FL16] and [FL17] is entirely covered by our approach. We do not however provide results such as uniqueness for corresponding martingale problems, preservation properties under subordination, or boundary attainment of polynomial (jump-)diffusions.


Chapter 2

Preliminaries

Let us briefly fix some notation we will use in this thesis. We will in particular make conventions that will hold for the whole thesis unless explicitly stated otherwise. We will adopt most of the notation from Jacod and Shiryaev [JS13], which should be to the benefit of readers familiar with this reference.

2.1 Basic notational conventions

Throughout this thesis, we have

(i) N0 = N ∪ {0} with 0 ∉ N

(ii) R+ = [0,∞), and R>0 = (0,∞)

(iii) T = {(s, t) ∈ R² : 0 ≤ s ≤ t}

(iv) ⋃·_{i∈I} A_i is the disjoint union of the sets A_i for i ∈ I

(v) inf ∅ := ∞ and sup ∅ := 0

(vi) a ∧ b = min{a, b} and a ∨ b = max{a, b}, where a, b ∈ A for a totally ordered set (A, ≤)

(vii) x+ = x ∨ 0 and x− = x ∧ 0 for x ∈ R

(viii) Aᶜ = the complement of a set A

(ix) ⌊x⌋ = sup{n ∈ N0 : n ≤ x} for x ∈ R+

(x) tn ↓ t a sequence of real numbers converging to t ∈ R with tn > t for all n ≥ 1

(xi) tn ↑ t a sequence of real numbers converging to t ∈ R with tn < t for all n ≥ 1

(xii) lim_{ε↓0} = lim_{ε→0, ε>0}

(xiii) ≪ absolute continuity between measures

(xiv) a.s. = almost surely and a.e. = almost everywhere

(xv) 1_A indicator function of the set A

(xvi) × the Cartesian product

(xvii) ⊗ product σ-algebra

(xviii) ‖x‖_{R^d} any norm on R^d for x ∈ R^d

(xix) ‖x‖_p = (∑_{i=1}^{d} |x_i|^p)^{1/p} for x ∈ R^d and p ∈ [1,∞)

(xx) ⋁_{[a,b]}(f) the total variation of a function f on the interval [a, b]

(xxi) Pol_k the set of polynomials over R^d of degree at most k

(xxii) deg f = the degree of a polynomial f, hence the smallest k for which f ∈ Pol_k

(xxiii) M_n(R) = R^{n×n}, the set of square matrices over R in dimension n

(xxiv) det M, the determinant of a matrix M ∈ M_n(R)

Fix a dimension d ∈ N. Let I ⊂ R+. For a function u : I → R^d, define

u*_t := sup_{s∈I, s≤t} ‖u(s)‖_{R^d}.

For I = R>0, we extend a function u to R+ unless otherwise stated by setting u(0) = 0. If u has left limits, we fix

u_{t−} := lim_{ε↓0} u_{t−ε},   Δu_t := u_t − u_{t−},

where we make the convention u_{0−} = u_0, which implies Δu_0 = 0, compare the conventions made in [JS13, p. I.1.8]. We extend these conventions to stochastic processes X : R+ × Ω → R^d by making them for each fixed ω ∈ Ω, i.e. for each function t ↦ X(t, ω). We will interchangeably use the notation X_t = X(t) for both stochastic processes and deterministic functions.

For a matrix M ∈ R^{d×d} we denote by ‖M‖ = sup_{‖x‖_{R^d}=1} ‖Mx‖ the induced operator norm for ‖·‖_{R^d}. It will be clear from context which vector norm is meant, or a statement of the form ‖M‖ < ∞ will be independent of the concrete choice of the norm by means of equivalence of norms.
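These path-wise conventions are easy to mirror numerically. The following Python sketch is only an illustration (the step path, jump times and function names are arbitrary choices, not taken from the thesis); it evaluates u_t, the left limit u_{t−}, the jump Δu_t and the running supremum u*_t for a piecewise constant càdlàg path.

import numpy as np

# A toy càdlàg path: piecewise constant with jumps at t = 1 and t = 2,
# u(t) = 0 on [0,1), 2 on [1,2), 1 on [2,inf).
jump_times = np.array([1.0, 2.0])
jump_sizes = np.array([2.0, -1.0])

def u(t):
    """Right-continuous evaluation of the step path at time t >= 0."""
    return jump_sizes[jump_times <= t].sum()

def u_left(t, eps=1e-12):
    """Left limit u_{t-}; by convention u_{0-} = u_0."""
    return u(t - eps) if t > 0 else u(0.0)

def running_sup(t, grid_size=10001):
    """u*_t = sup_{s <= t} |u(s)|, approximated on a fine grid plus the jump times."""
    grid = np.concatenate([np.linspace(0.0, t, grid_size), jump_times[jump_times <= t]])
    return max(abs(u(s)) for s in grid)

t = 2.0
print("u(2) =", u(t), " u(2-) =", u_left(t), " Delta u_2 =", u(t) - u_left(t))
print("u*_2 =", running_sup(t))   # equals 2 here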


2.2 Notation and conventions for multivariate polynomials

Following Filipović and Larsson [FL16], we consider the set Pol_k of polynomials over R^d with real coefficients and of degree at most k, i.e.

Pol_k = { x ↦ ∑_{|l|=0}^{k} α_l x^l | α_l ∈ R, l = (l_1, . . . , l_d), x^l = x_1^{l_1} · · · x_d^{l_d} }.

Denote by N_k = N the dimension of Pol_k. Note that Pol_k is a linear space of finite dimension. Fix a basis β = {β_1, . . . , β_N} of Pol_k and define the vector valued function

H_β(x) := (β_1(x), . . . , β_N(x))^⊤,

such that, given f ∈ Pol_k, there is a unique coordinate representation f ∈ R^N such that f(x) = f^⊤ H_β(x) for all x ∈ R^d. It is clear that f depends on the concrete choice of β. In Chapter 5, we will introduce additional conventions regarding β that allow for a convenient notation when considering multiple bases β for several degrees l of Pol_l.

Throughout, we will use the following notation. For x ∈ R^d, define f_i(x) = x_i, i = 1, . . . , d, the projection on the i-th coordinate. Hence, f_i ∈ Pol_1. We will further use the multi-index notation. For k = (k_1, . . . , k_d) ∈ N_0^d, define the polynomial

f_k(x) = f_{(k_1,...,k_d)}(x) = ∏_{i=1}^{d} f_i(x)^{k_i}.

Hence, for |k| = ∑_{i=1}^{d} k_i, we have f_k ∈ Pol_{|k|}. When d = 1, we want to distinguish f_i from f_{(k)} by making the convention

f_i(x) = x while f_{(i)}(x) = x^i,

even though the left expression only makes sense for i = 1 while the one on the right for all i ∈ N_0. We denote by ≤ the partial order on N^d, i.e.

l ≤ k ⇔ l_i ≤ k_i ∀ i ∈ {1, . . . , d}.

Hence for a sequence (c_l)_{l∈N^d}, the sum ∑_{l≤k} c_l is taken over all multi-indices satisfying l ≤ k. This implies |l| < |k| for l ≠ k. Further, if j ≤ k are two elements of N_0, we denote with ∑_{|l|=j}^{k} c_l the sum over all multi-indices l satisfying j ≤ |l| ≤ k. Hence if k = |k|, the latter sum contains all multi-indices of degree at most k, not just those with l ≤ k, in contrast to the first notation for sums we introduced above. It will further be convenient to use the multi-index notation for binomial coefficients, also referred to as multi-binomial coefficients. These are defined by

\binom{k}{l} := ∏_{i=1}^{d} \binom{k_i}{l_i}.


Note that if l ≰ k, then \binom{k}{l} = 0. Hence, for k = |k| we get the equality

∑_{l≤k} \binom{k}{l} c_l = ∑_{|l|=0}^{k} \binom{k}{l} c_l.

This is in particular useful in the context of the multi-binomial theorem. Recall that for x, y ∈ R^d we have

f_k(x + y) = (x + y)^k = ∑_{l≤k} \binom{k}{l} x^{k−l} y^l = ∑_{|l|=0}^{k} \binom{k}{l} x^{k−l} y^l. (2.1)

Let us make the convention that a sum indexed by an empty set is zero. The following lemma will be useful when we characterize k-polynomial processes as semimartingales in Chapter 6. A proof can be found in [CAH17], but we present a shorter and more direct proof for the reader's convenience. We denote with e_i ∈ N_0^d the element with all entries zero except the i-th being one. In particular we have f_{e_i}(x) = f_i(x). Further, we denote with D_i and D_{ij} the partial derivatives of a function f : R^d → R, as is customary in the literature.

Lemma 2.1. Let x, y ∈ R^d and k ∈ N_0^d with k = |k|. We then have

(x + y)^k − x^k − ∑_{i=1}^{d} D_i f_k(x) f_i(y) = ∑_{|l|=2}^{k} \binom{k}{l} x^{k−l} y^l. (2.2)

Proof. Note that we have

{l ∈ N_0^d : |l| = 0} = {(0, . . . , 0)} and {l ∈ N_0^d : |l| = 1} = {e_i : i ∈ {1, . . . , d}}.

Since for k with |k| = 0 we get D_i f_k(x) = 0, the case |k| = 0 is clear. Consider now |k| = 1, hence k = e_i for some i. Then

(x + y)^{e_i} − x^{e_i} − ∑_{j=1}^{d} D_j f_{e_i}(x) f_j(y) = x_i + y_i − x_i − y_i = 0,

hence the case |k| = 1 is shown too. Consider now |k| ≥ 2. By the multi-binomial theorem (2.1) we have

(x + y)^k = ∑_{|l|=0}^{k} \binom{k}{l} x^{k−l} y^l.

We now have

∑_{|l|=0}^{0} \binom{k}{l} x^{k−l} y^l = x^k.

Further, we have

∑_{|l|=1}^{1} \binom{k}{l} x^{k−l} y^l = ∑_{i=1}^{d} \binom{k}{e_i} x^{k−e_i} y^{e_i} = ∑_{i=1}^{d} k_i x^{k−e_i} f_i(y).

On the other hand, we have

∑_{i=1}^{d} D_i f_k(x) f_i(y) = ∑_{i=1}^{d} k_i f_{k−e_i}(x) f_i(y) = ∑_{i=1}^{d} k_i x^{k−e_i} f_i(y),

which shows (2.2) as claimed.
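As an informal sanity check of identity (2.2) — the following snippet is an illustration only and not part of the thesis — the Python sketch below verifies the identity symbolically for d = 2 and one arbitrarily chosen multi-index k.

import itertools
import sympy as sp

d = 2
x = sp.symbols('x1 x2')
y = sp.symbols('y1 y2')
k = (2, 1)                                   # arbitrary multi-index with |k| = 3

def mono(z, l):                              # z^l = z_1^{l_1} * ... * z_d^{l_d}
    return sp.prod(z[i] ** l[i] for i in range(d))

def multi_binom(k, l):                       # product of componentwise binomial coefficients
    return sp.prod(sp.binomial(k[i], l[i]) for i in range(d))

f_k = mono(x, k)
lhs = (mono(tuple(x[i] + y[i] for i in range(d)), k)
       - f_k
       - sum(sp.diff(f_k, x[i]) * y[i] for i in range(d)))

# right-hand side of (2.2): all multi-indices l with 2 <= |l| <= |k| (and l <= k componentwise)
K = sum(k)
rhs = sum(multi_binom(k, l) * mono(x, tuple(k[i] - l[i] for i in range(d))) * mono(y, l)
          for l in itertools.product(range(K + 1), repeat=d)
          if 2 <= sum(l) <= K and all(l[i] <= k[i] for i in range(d)))

print(sp.expand(lhs - rhs) == 0)             # True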

By introducing an order on the set of multi-indices, we will be able to identify a multi-index with a basis function β_i. That will become convenient later. Consider the sets

Λ_k = {α ∈ N_0^d : |α| ≤ k}.

Then {f_α : α ∈ Λ_k} is a basis of Pol_k. We define the graded reverse-lexicographic order ≤_glex on Λ_k in the following way. Let ≤_lex be the lexicographic order on Λ_k. For α, γ ∈ Λ_k, we define the total order ≤_glex by:

• If |α| ≠ |γ|, then α ≤_glex γ if |α| < |γ|, else γ ≤_glex α.

• If |α| = |γ|, then α ≤_glex γ if α ≥_lex γ, else γ ≤_glex α.

Since α =_lex γ implies α = γ, this order is well defined. Under this order, we can enumerate the elements of Λ_k. Let {α_i : i = 1, . . . , |Λ_k|} be this enumeration. Then β_i = f_{α_i} defines a basis of Pol_k.

Example 2.2. Consider the case d = 2 and k = 2. Then

α1 = (0, 0), α2 = (1, 0), α3 = (0, 1), α4 = (2, 0), α5 = (1, 1), α6 = (0, 2).

From here on, we refer to the basis β_i = f_{α_i} as the canonical basis of Pol_k. Since the canonical basis of Pol_{k+1} is a basis extension of the canonical basis of Pol_k, we have actually defined a basis of Pol = ∪_{n∈N} Pol_n and refer to that one as the canonical basis of Pol.
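The enumeration is straightforward to generate programmatically. The sketch below is only an illustration (all function names are ad hoc, not from the thesis); it reproduces the ordering of Example 2.2 and evaluates a polynomial through its coordinate representation f^⊤ H_β(x).

import itertools

def grevlex_indices(d, k):
    """All multi-indices in Lambda_k = {a in N_0^d : |a| <= k}, in the graded
    reverse-lexicographic order used for the canonical basis."""
    indices = [a for a in itertools.product(range(k + 1), repeat=d) if sum(a) <= k]
    # grade by total degree; within a degree the lexicographically larger index comes first
    return sorted(indices, key=lambda a: (sum(a), tuple(-ai for ai in a)))

print(grevlex_indices(2, 2))
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]  -- matches Example 2.2

def H(x, d, k):
    """Canonical basis vector H_beta(x) = (x^{alpha_1}, ..., x^{alpha_N})."""
    values = []
    for a in grevlex_indices(d, k):
        m = 1.0
        for xi, ai in zip(x, a):
            m *= xi ** ai
        values.append(m)
    return values

# coordinate representation: f(x1, x2) = 3 + 2*x1*x2 has coefficient 3 on (0,0) and 2 on (1,1)
f_vec = [3.0, 0.0, 0.0, 0.0, 2.0, 0.0]
x = (1.5, -2.0)
print(sum(c * h for c, h in zip(f_vec, H(x, 2, 2))))   # 3 + 2*1.5*(-2) = -3.0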


2.3 Function spaces

We denote by C(X, Y) the set of continuous functions f : X → Y for two topological spaces (usually some form of R^d) X and Y. The set of Borel-measurable functions is denoted by E(X, Y). In case of Y = R, we simply write C(X) (resp. E(X)). The restriction to bounded functions is denoted by bC(X) (resp. bE(X)). With D(I, R^d), we denote the set of all functions that are indexed by I ⊂ R, take values in R^d and are càdlàg. For a set I equipped with a topology (usually R^d or some subset), we denote with B(I) the corresponding Borel σ-field.

Let (Ω, F, P) be a probability space and p ∈ [1,∞). We denote with L^p(Ω, F, P) the set of all F-measurable random variables X, taking values in R^d for some d ∈ N, such that E[‖X‖^p_{R^d}] < ∞, where the expectation is taken under the measure P.

Note that under this definition, two random variables X, Y might belong to the set L^p(Ω, F, P) while having different state spaces. For our purposes, this will not pose a problem since we only want to indicate integrability.


Chapter 3

General theory

In this chapter we want to prepare some of the general theory of stochastic processes for the reader's convenience. The first part will state some general results from probability theory as we will need them. We will not introduce all concepts we use; for the main part of the theory of semimartingales, we follow the school of Protter [Pro05] and Jacod and Shiryaev [JS13] and refer the reader to these references for any concept or notation we fail to introduce here.

3.1 General probability

Let us start with a fundamental definition.

Definition 3.1. We call a collection (Ω, F, F, P) a stochastic basis, if (Ω, F, P) is a probability space and F = (F_s)_{s≥0} is a filtration of sub-σ-algebras of F, satisfying the usual conditions:

(i) non-decreasing, i.e. for all s ≤ t, F_s ⊂ F_t,

(ii) right continuous, i.e. F_t = ∩_{ε>0} F_{t+ε},

(iii) complete, i.e. F_0 is P-complete and F_0 contains all P-null sets of F.

This definition slightly differs from Jacod and Shiryaev [JS13, Definition I.1.2 and I.1.3], as we always assume property (iii) above whenever we speak of a stochastic basis in this thesis.

In this thesis, all equalities between random variables are to be understood in an almost sure sense unless otherwise stated. Sometimes we want to emphasize this or want to emphasize under which measure (not always a probability measure) an equality is to be understood. Hence for a measure Q, we write

X = Y [Q],

whenever X and Y agree outside a Q-null set. Further, for a sequence of random variables (X_n) and a random variable X, we write X_n →^{L^p} X, if

lim_{n→∞} E[‖X_n − X‖^p_{R^d}] = 0, p ∈ [1,∞).

Consider the probability space (Ω, F, P). We will need the concept of a uniformly integrable family of random variables defined on a common probability space, compare Meyer [Mey66, Definition D17 on page 16]. Note that this concept exists in a more general measure theoretical framework, see for example Rudin [Rud87].

Definition 3.2. Let H be a family of R^d-valued random variables X ∈ L^1(Ω, F, P). H is called uniformly integrable, if for all ε > 0 there is a c ∈ R+ such that

E[‖X‖_{R^d} 1_{‖X‖_{R^d} ≥ c}] < ε   ∀ X ∈ H. (3.1)

The motivation for the concept of uniform integrability becomes clear with the next theorem, which is a generalization of the dominated convergence theorem. Again we refer to Meyer [Mey66, Theorem 21 on page 18] for the proof.

Theorem 3.3. Let (X_n)_{n≥1} be a sequence of R^d-valued random variables such that X_n ∈ L^1(Ω, F, P) for all n ∈ N, and assume that they converge a.s. to a random variable X. Then X ∈ L^1(Ω, F, P) and it holds that X_n →^{L^1} X if and only if the sequence (X_n)_{n≥1} is uniformly integrable.

We will now state a theorem due to La Vallée Poussin (1866 – 1962) that allows for a convenient characterization of uniform integrability. For the proof see Meyer [Mey66, Theorem T22].

Theorem 3.4. Let H be a set of R^d-valued random variables which is a subset of L^1(Ω, F, P). The following properties are equivalent:

(1) H is uniformly integrable.

(2) There exists a positive, convex and increasing function ϕ(t) on R+ such that

lim_{t→+∞} ϕ(t)/t = +∞, (3.2)

and

sup_{X∈H} E[ϕ(‖X‖_{R^d})] < ∞. (3.3)


Let (Ω, F, F, P) be a stochastic basis. We will need the following lemma when dealing with polynomial processes.

Lemma 3.5. Let X be an R-valued adapted càdlàg process and assume that for every t > 0, the family

{X_s : 0 ≤ s ≤ t} is uniformly integrable.

Then it holds that

(i) sup_{0≤s≤t} E[|X_s|] < ∞ and sup_{0<s≤t} E[|X_{s−}|] < ∞ for all t ≥ 0.

(ii) for every t_n ↓ t we have, for fixed s,

E[X_{t_n} | F_s] →^{L^1} E[X_t | F_s] (3.4)

and for t_n ↑ t

E[X_{t_n} | F_s] →^{L^1} E[X_{t−} | F_s]. (3.5)

Proof. Choose c > 0 (depending on t) large enough such that

sup_{0≤s≤t} E[|X_s| 1_{|X_s| ≥ c}] ≤ ε.

This is possible by the definition of uniform integrability. Hence we get

sup_{0≤s≤t} E[|X_s|] = sup_{0≤s≤t} E[|X_s| (1_{|X_s| ≥ c} + 1_{|X_s| < c})] ≤ ε + c.

Further, for u_n ↑ u with 0 < u ≤ t, we have by continuity of |·|, the càdlàg property of X and Fatou's lemma that

E[|X_{u−}|] = E[liminf_{n→∞} |X_{u_n}|] ≤ liminf_{n→∞} E[|X_{u_n}|] ≤ sup_{0≤s≤t} E[|X_s|] < ∞.

Hence, since the upper bound holds for all u, we have shown (i). Regarding (ii), note that (i) implies that the conditional expectations in (ii) are well defined. Again by the càdlàg property of X we have for t_n ↓ t and t_n ↑ t, respectively,

X_{t_n} → X_t and X_{t_n} → X_{t−} [P].

Together with the uniform integrability assumption, (ii) follows from Theorem 3.3 since

E[|E[X_{t_n} | F_s] − E[X_t | F_s]|] ≤ E[|X_{t_n} − X_t|],

which goes to zero by L^1 convergence. The same argument applies for t_n ↑ t.


We follow the notation in Jacod and Shiryaev [JS13] and denote with M the set of all uniformly integrable martingales. For a set of processes A, we denote with A_loc the localized class. To be precise, we call a sequence of stopping times (T_n)_n a localizing sequence, if T_n → ∞ a.s. for n → ∞. A process M is then an element of A_loc, if the stopped process M^{T_n} ∈ A for all n. In our notation we have M^{T_n}_t = M_{t∧T_n}. Hence, M_loc denotes the set of all local martingales. We further make the following standard convention, where an R^d-valued process is a (local) martingale (resp. semimartingale), if each component is a (local) martingale (resp. semimartingale).

The following lemma is a well known result and gives a sufficient condition for a local martingale to be a true martingale, see e.g. [CAH17]. It is stated for dimension d = 1, which implies the general case.

Lemma 3.6. Let M be an R-valued local martingale such that for all t ≥ 0,

E[M*_t] = E[sup_{0≤s≤t} |M_s|] < ∞. (3.6)

Then M is a true martingale.

Proof. First note that since for all t ≥ 0 we have |M_t| ≤ sup_{0≤s≤t} |M_s|, the condition E[|M_t|] < ∞ is satisfied for all t ≥ 0. Further, let (τ_n) be a localizing sequence of M. Hence, for each n, M^{τ_n} is a true martingale. For all stopping times τ ∈ [0, t] a.s. we have

|M_τ| ≤ sup_{0≤s≤t} |M_s|, (3.7)

which, due to u ∧ τ_n ∈ [0, t] for 0 ≤ u ≤ t, justifies the application of the dominated convergence theorem, yielding

E[M_t | F_s] = E[lim_{n→∞} M_{t∧τ_n} | F_s] = lim_{n→∞} E[M_{t∧τ_n} | F_s] = lim_{n→∞} M^{τ_n}_s = M_s. (3.8)

Hence, M is a true martingale.

3.2 Processes of locally bounded variation

This section intends to give a brief overview of processes (including deterministic functions) that are càdlàg and have finite variation on compacts. These processes play an important role in the theory of semimartingales and we do not aim for a complete overview but rather want to use this opportunity to introduce some additional notation. We refer the reader to [JS13] for a comprehensive overview that exhausts all the points we make in this section; in particular we refer the reader to this reference for the proofs of the statements we make in this section.


We denote with V⁺ the set of all R-valued processes A that are càdlàg, adapted and with A_0 = 0 such that for all paths, t ↦ A_t(ω) is nondecreasing. Denote by V = V⁺ − V⁺. In particular we get for each A ∈ V that for arbitrary finite T ≥ 0 and all ω ∈ Ω,

⋁_{[0,T]}(A)(ω) < ∞,

where ⋁_{[0,T]}(A)(ω) is the total variation of the path t ↦ A_t(ω) on the interval [0, T], compare [JS13, Proposition I.3.3]. We will sometimes refer to these processes as being of finite variation (FV).

Next we introduce the set of functions as above that satisfy an additional integrability condition. Define A⁺ to be the set of functions A ∈ V⁺ such that

E [A∞] <∞.

Similarly, define A to be the set of functions A ∈ V, such that

E[lim_{T→∞} ⋁_{[0,T]}(A)] < ∞.

Again by [JS13, Proposition I.3.3], we get A = A⁺ − A⁺. Overall we get the following inclusions:

A_loc = A⁺_loc − A⁺_loc,   A⁺ ⊂ A⁺_loc ⊂ V⁺,   A ⊂ A_loc ⊂ V.

It is easy to see that for deterministic A ∈ V one gets A ∈ A_loc. A more general version of this result is given by [JS13, Lemma I.3.10], which we state here for the reader's convenience.

Lemma 3.7. For a predictable process A ∈ V there exists a localizing sequence (τ_n) such that (⋁_{[0,·]}(A))^{τ_n} = ⋁_{[0, ·∧τ_n]}(A) ≤ n. In particular we get A ∈ A_loc.

Further, we say a function (process) f that takes values in R^{n_1×n_2} is in A, if f(·)_{i,j} ∈ A holds for all i, j. We do the same for V, V⁺ etc. Finally, let us define V⁺_Ω as the set of all deterministic, càdlàg and nondecreasing functions A with A_0 = 0. Hence, by identifying A as an element of V⁺ that is constant on Ω, we have V⁺_Ω ⊂ V⁺. In the same way, define V_Ω = V⁺_Ω − V⁺_Ω.

Since a process A ∈ V defines a signed measure for each ω ∈ Ω, we can define the integral

∫_{(0,t]} f(u) A(du) = (f • A)_t, (3.9)

for an appropriate function f. We always define such an integral in a Lebesgue-Stieltjes sense and not as a Riemann-Stieltjes integral. The same notation will be used for the stochastic integral in a semimartingale setting, and we will switch between both notations in (3.9) whenever we believe it to be preferable.
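For a path of A that is piecewise constant, the Lebesgue-Stieltjes integral in (3.9) reduces to a sum of f over the jumps of A in (0, t]. The following Python sketch is only an illustration (the jump times and sizes are arbitrary, not taken from the thesis).

import numpy as np

# A pure-jump nondecreasing path A in V+: A_0 = 0, with jumps Delta A at the given times.
jump_times = np.array([0.5, 1.0, 2.5])
jump_sizes = np.array([1.0, 0.5, 2.0])

def stieltjes_integral(f, t):
    """(f . A)_t = integral over (0, t] of f(u) A(du); for a pure-jump A this is
    the sum of f(u) * Delta A_u over jump times u in (0, t]."""
    mask = (jump_times > 0.0) & (jump_times <= t)
    return float(np.sum(f(jump_times[mask]) * jump_sizes[mask]))

f = lambda u: u ** 2
print(stieltjes_integral(f, 2.0))   # 0.25 * 1.0 + 1.0 * 0.5 = 0.75
print(stieltjes_integral(f, 3.0))   # 0.75 + 6.25 * 2.0 = 13.25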


Chapter 4

Matrix evolution systems and measure integral equations

This chapter is intended to give an introduction to the concept of matrix evolution systems (MESs) and their special case, in terms of homogeneity, of matrix semigroups (MSGs). We start by discussing the semigroup case, including the concept of infinitesimal generators. Certain regularity properties that are already implied under very mild assumptions in the semigroup case are lost when dealing with the general evolution system setting, making it unclear how the concept of infinitesimal generators can be extended beyond homogeneity. We therefore show how a generalizing notion of extended generators can be established that coincides with the classical definition in the homogeneous case and is applicable in situations where a classical generator does not even exist.

After having established first results for infinitesimal generators of MESs, we will continue by showing how such evolution systems are related to certain integral equations through their extended generators. This will equip us with the tools needed to study a class of stochastic processes we call polynomial processes and certain types of semimartingales we call polynomial semimartingales. In particular, we can then show that extended generators of these stochastic processes are tightly connected to the extended generators of evolution systems induced by these processes.

Before we start, let us first give a motivation for the study of evolution systems and, by that, of semigroups as a special case. This should only be regarded as an informal exposition; we will formalize these concepts in their respective sections.

Assume we have a deterministic dynamical system starting at some configuration x ∈ E, for some state vector space E, at time s ≥ 0, such that the state of the system at time t ≥ s is described by the function φ(s, x; t). If we denote that state at t with y, we have y = φ(s, x; t). Now imagine we consider the state of the system at t + u, u ≥ 0, again starting in x at time s; we get that the state z satisfies z = φ(s, x; t + u). Then we can


"split" the evolution of the system into two parts, the first being that we start at s inx end evolve until time t. The second that we start in y at time t and evolve until timet+ u, landing in z. Hence, this flow property leads to the following well established flowequations

φ(s, x; t + u) = φ(t, φ(s, x; t); t + u),   0 ≤ s ≤ t, u ≥ 0, x ∈ E.

When we consider evolution systems, we assume x ↦ φ(s, x; t) to be a linear map from E into itself for any 0 ≤ s ≤ t. Hence one assumes to have a family of linear operators τ indexed by (s, t) with s ≤ t and τ_{s,t} : E → E such that φ(s, x; t) = τ_{s,t} x. This yields the operator equation τ_{s,t+u} = τ_{t,t+u} ∘ τ_{s,t}, and considering that the system should remain unchanged when no time has passed, one has additionally φ(s, x; s) = x and by that τ_{s,s} = Id_E, the identity on E. When we say the system is homogeneous, we mean by that that the evolution of the system only depends on the length of time it evolved, not at what time this evolution started. Hence one would have φ(s, x; t) = φ(0, x; t − s). In the linear case, this will lead to the special case of semigroups, where one would define a one-parameter family of linear operators by T(t) := τ_{0,t}. The flow property then translates to T(s + t) = T(s)T(t) = T(t)T(s) and one would have T(0) = Id_E.
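As an informal numerical illustration of the flow equations (an illustration only, not taken from the thesis), consider the scalar linear equation y'(r) = a(r) y(r), whose solution map is φ(s, x; t) = x · exp(∫_s^t a(r) dr). The Python sketch below checks the flow property for an arbitrary choice of a.

import numpy as np
from scipy.integrate import quad

a = lambda r: np.sin(r) + 0.5               # arbitrary time-dependent coefficient

def phi(s, x, t):
    """Solution map of y'(r) = a(r) y(r): phi(s, x; t) = x * exp(int_s^t a(r) dr)."""
    integral, _ = quad(a, s, t)
    return x * np.exp(integral)

s, t, u, x = 0.3, 1.2, 0.7, 2.0
left = phi(s, x, t + u)                     # evolve directly from s to t + u
right = phi(t, phi(s, x, t), t + u)         # evolve from s to t, then from t to t + u
print(left, right, np.isclose(left, right)) # the two agree: the flow property holds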

Note that when studying polynomial processes or polynomial semimartingales, we make use of certain algebraic properties that allow us to reduce the discussion to finite dimensions, i.e. to matrix evolution systems. Hence one can assume E = R^n and, by fixing a basis of E, τ_{s,t} can be identified with a family of matrices M(s, t) ∈ R^{n×n}. Further, we shall deviate from the standard convention in the literature on evolution systems by requiring the flow property, or as we will refer to it from here on the evolution property, for the transposes of the matrices, as that will be more convenient in our following discussions; see the formal definition of matrix evolution systems in the respective section.

Let us finish this prelude by fixing some notation. Throughout we fix some dimension n ∈ N and we denote the ring of n × n matrices over R by M_n(R) = (R^{n×n}, +, ·), with standard componentwise addition "+" and matrix multiplication "·", where the multiplication "·" is usually omitted in the tradition of standard convention, i.e. unless stated otherwise, we always mean for two elements L, J ∈ M_n(R) the matrix multiplication when we write LJ. The identity matrix is denoted by Id_n. With GL_n ⊂ M_n(R) we denote the general linear group, i.e. all those matrices in M_n(R) that are invertible.

Further, define the index set T by

T = {(s, t) ∈ R² : 0 ≤ s ≤ t}.

4.1 Matrix semigroups

This section intends to introduce matrix semigroups (MSGs) and give a first set of results for these. These results, together with proofs and further references, can be found in


[EN99, Chapter 1]. We in particular want to name this reference for the general infinite dimensional case. The study of these MSGs and many more equations is the subject of the theory of functional equations. It is worth mentioning that it was the French mathematician A. Cauchy who motivated the study of the scalar case (see [EN99, page 2] for the reference; we reproduce the quote here in translation):

Determine the function φ(x) in such a way that it remains continuous between any two real limits of the variable x, and such that, for all real values of the variables x and y, one has

φ(x + y) = φ(x)φ(y).

(A. Cauchy, 1821)

Note that φ(0) = 1 is not required, and indeed this would already follow when excluding φ(x) ≡ 0. Analogously, in the matrix case one would not need to assume T(0) = Id_n, provided one assumes there exists a t ≥ 0 such that T(t) ∈ GL_n. This is due to the fact that one has, for any x ∈ E,

T(t)x = T(t)T(0)x,

which due to T(t) ∈ GL_n implies T(0)x = x for all x ∈ E, hence T(0) = Id_n. As we are primarily interested in evolution systems, where T(0) = Id_n is a priori satisfied, we shall include this property in our first definition.

Definition 4.1. We call T : R+ → M_n(R) a matrix semigroup (MSG) (in dimension n or on M_n(R)), if

• T (0) = Idn,

• T (s+ t) = T (s)T (t).

If T(t) ∈ GL_n for all t ≥ 0 and t ↦ T(t) ∈ C(R+, M_n(R)), we call T regular. In that case, T can always be extended to a group over R by setting T(−t) = T(t)⁻¹ for t ≥ 0.

Let us establish some first preliminary results regarding regularity of the map t ↦ T(t). The following proposition shows that T is already regular under mild assumptions.

Proposition 4.2. Let T be a MSG on M_n(R) and assume t ↦ T(t) is right continuous at t = 0. Then T is regular.

Proof. First note that by right continuity, there is an ε* > 0 such that T([0, ε*]) ⊂ GL_n, since GL_n is an open subset of M_n(R). But that actually implies that T(t) ∈ GL_n for all t ≥ 0. To see that, assume

t* = inf{t > 0 : T(t) ∉ GL_n} < ∞.


Note that it holds that t* ≥ ε*. Otherwise there exists a t ∈ (t*, ε*) s.t. T(t) ∉ GL_n, by the very definition of t*. As T(ε*) = T(t)T(ε* − t), this implies T(ε*) ∉ GL_n, which is a contradiction to the choice of ε*; hence we have t* ≥ ε*. Then, for 0 < ε ≤ ε*, we have T(t* − ε) ∈ GL_n and T(ε) ∈ GL_n. Hence, because GL_n is a group wrt the matrix multiplication, we get

T (t∗) = T (t∗ − ε)T (ε) ∈ GLn .

By the assumption t* < ∞, we can find arbitrarily small ε > 0 such that T(t* + ε) ∉ GL_n. But now we continue with

T(t*)T(ε) = T(t* + ε),

which implies T(t* + ε) ∈ GL_n for every ε > 0, which is a contradiction to the definition of t*. Hence we get {t ≥ 0 : T(t) ∉ GL_n} = ∅. Therefore, we can extend T to a group by setting T(−t) = T(t)⁻¹ for t ≥ 0. Let t > 0. We first show left continuity. Note that we have, for ε > 0 small,

T (t) = T (t− ε)T (ε).

By assumption, T(ε) → Id_n as ε ↓ 0. This yields T(t) = lim_{ε↓0} T(t − ε), which shows left continuity. Next we show right continuity, where we shall make use of the extension to a group. It holds that

T (t) = T (t)T (ε)T (−ε) = T (t+ ε)T (−ε) = T (t+ ε) (T (ε))−1 .

Since the function A ↦ A⁻¹ is continuous on GL_n, right continuity follows by the same arguments we made for left continuity. As t > 0 was arbitrary, this concludes the proof.

The next theorem is a very well known fact from MSG theory. It shows that regular MSGs are of a particular form. In particular, in combination with the previous result, it shows the following chain of implications (compare [EN99, Section 1.6]):

right continuity at t = 0 ⇒ T is regular ⇒ the map t ↦ T(t) is smooth.

As Engel and Nagel noted in [EN99, Section 1.6], this can be seen as a contribution to the second part of Hilbert's fifth problem. There, one can also find the following quote by David Hilbert, which we reproduce here in translation:

In general we are thus led to the wide and not uninteresting field of functional equations, which have so far been investigated mostly only under the assumption of differentiability of the functions involved. In particular, the functional equations treated by Abel with so much ingenuity, the difference equations, and other equations occurring in the literature show in themselves nothing that forces the requirement of differentiability of the functions involved. (...) In all these cases the question therefore arises to what extent the statements that we can make under the assumption of differentiable functions remain valid, under suitable modifications, without this assumption.

(D. Hilbert, 1900; see [Hil70])

We now state the theorem that yields smoothness under mild assumptions. We note that we do not state the most general form of the following theorem, as one can already establish strong results by just assuming that t ↦ T(t) is measurable, see Exercise 1.7 in [EN99]. As we are interested in using MSGs and MESs primarily in the study of càdlàg processes, it suffices for our purposes to assume more regularity.

Theorem 4.3. Let T be a regular MSG on M_n(R). Then there exists a unique matrix A ∈ M_n(R) such that we have

T (t) = exp (tA) , ∀t ≥ 0.

Conversely, any function T that is defined by the above matrix exponential for an arbitrary A ∈ M_n(R) is a regular MSG.

Proof. The converse statement is just a consequence of elementary properties of the matrix exponential; for the proof see Proposition 2.3 in [EN99]. The first direction is proved by Theorem 2.9 in [EN99]. To be complete, we should note that by that theorem one can only deduce the existence of a matrix A ∈ M_n(C). However, as M_n(R) is a closed subset of M_n(C), and since by Proposition 2.8 in [EN99] one has

A = lim_{ε↓0} (T(ε) − T(0)) / ε, (4.1)

it immediately follows that A ∈ M_n(R). Regarding uniqueness, assume there are A_1, A_2 ∈ M_n(R) with T(t) = exp(tA_1) = exp(tA_2) for all t ≥ 0. Hence (4.1) implies A_1 = A_2, which finishes the proof.

The matrix A in the theorem above plays an important role in the study of MSGs as it allows for an infinitesimal description of T. We want to discuss this a little bit further. First, let us make the following definition, compare Definition 2.4 in [EN99].

Definition 4.4. Given a regular MSG T with T(t) = exp(tA), we call A the (infinitesimal) generator of T. Conversely, we say T is the regular MSG generated by A.

To make clear what we mean by an infinitesimal description of T by A, let us state the following result we have already cited above.

Proposition 4.5. Given a MSG T(t) = exp(tA), it holds that T ∈ C^∞(R+, M_n(R)) and

d/dt T(t) = A T(t) = T(t) A, for t ≥ 0,
T(0) = Id_n.


Proof. This is just [EN99, Proposition 2.8]. Note that from that proposition we get d/dt T(t)|_{t=0} = A. Hence it holds that

A T(t) = (lim_{ε↓0} (T(ε) − Id_n)/ε) T(t) = lim_{ε↓0} (T(t + ε) − T(t))/ε = T(t) A,

which is a well known fact for MSGs.
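Numerically, the correspondence T(t) = exp(tA) and the difference quotient (4.1) are easy to illustrate. The following Python sketch is an illustration only (the generator A is an arbitrary choice); it verifies the semigroup property and recovers A by a finite-difference approximation of (4.1).

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])                 # arbitrary generator in M_2(R)

T = lambda t: expm(t * A)                    # the regular MSG generated by A

s, t = 0.4, 1.1
print(np.allclose(T(s + t), T(s) @ T(t)))    # semigroup property T(s+t) = T(s)T(t)
print(np.allclose(T(0.0), np.eye(2)))        # T(0) = Id_n

eps = 1e-6
A_approx = (T(eps) - T(0.0)) / eps           # difference quotient (4.1)
print(np.max(np.abs(A_approx - A)))          # small, of order eps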

In the next section we shall study the inhomogeneous case in the form of evolution systems. There, unlike above, the regularity assumptions we shall make do not imply the evolution system to be differentiable or even continuous. Nonetheless, we will be able to define a concept of infinitesimal description in the form of what we will call the extended generator of an evolution system. The idea is to look at

d/dt T(t) = T(t) A (4.2)

as the defining property of the extended generator, where the time derivative is replaced with a more general Radon-Nikodým derivative and the measure that replaces the Lebesgue measure is connected to the discontinuities of the evolution system. We will formalize this in the following section.

4.2 Matrix evolution systems

This section can be seen as the time inhomogeneous counterpart to the previous section. Instead of MSGs, we will consider matrix evolution systems. Evolution systems have been treated in the literature for a long time in an infinite dimensional setting, see in particular [EN99, Chapter 9], where such systems are introduced in the study of non-autonomous Cauchy problems, and [Böt14], where generators are defined by left and right limits corresponding to the semigroup case for the study of time inhomogeneous Markov processes. We also refer the reader to the many references given in those two sources. In both cases, strong continuity assumptions are made and a key tool in the analysis is the transformation to the homogeneous setting by means of space-time homogenization, either of the evolution system stemming from a partial differential equation to a semigroup, compare [EN99, Lemma 9.10], or of a time inhomogeneous Markov process to a homogeneous one, see [EN99, Section 3].

Our situation however is different, since we are interested in evolution systems as a tool to study a certain class of stochastic processes we call polynomial processes. First, we want to include cases beyond strongly continuous evolution systems. Second, the stochastic processes considered have a defining algebraic property that allows us to consider finite dimensional evolution systems. Hence, a space-time homogenization would force us to leave the finite dimensional setting. It is however this finite dimensionality that we are interested in, as it allows us to consider (measure) ordinary differential equations rather than (measure) partial differential equations for polynomials as initial conditions, see e.g. the step (iii) ⇒ (i) in the proof of [Cuc+12, Theorem 2.7], where this is done in the homogeneous case with strong time derivatives rather than Radon-Nikodým derivatives.

As mentioned already, we do not want to assume continuity for the evolution systems we will shortly introduce. Since we intend to study càdlàg processes, it seems more appropriate to consider evolution systems that only satisfy càdlàg properties, as introduced further down in Definition 4.10. The following subsection gives a short summary of results for the inhomogeneous version of Cauchy's functional equation. There the reader will see a full characterization in the spirit of Theorem 4.3. Unfortunately, these results require assumptions that we do not wish to impose a priori, and indeed we will see that they do not hold in all cases. Hence, the reader may skip the following subsection, as it is not needed for the discussions that follow it.

4.2.1 Cauchy-Sincov-Equations

This subsection only intends to present results on the inhomogeneous version of Cauchy's multiplicative functional equation, known as the Cauchy-Sincov functional equation. We want to mention that there are even further generalizations in the form of Cauchy-Pexider-Sincov equations, where the binary operation between two elements of the range of the respective functions (in the setting below this is the matrix multiplication) is substituted with a general function.

For a more detailed exposition, we refer the reader to the excellent book of Aczél [Acz66],where the theory of functional equations is treated to a greater extent with many morefunctional equations that go beyond the Cauchy types, see [Acz66, Section 2.1].

We consider the following functional equation. Let A be an arbitrary set and G a set on which there is a binary operation ∘ : G × G → G. Consider a function F : A × A → G. The Cauchy-Sincov functional equation is given by

F(x, y) ∘ F(y, z) = F(x, z), for all x, y, z ∈ A, (C-S FE)

and the goal is to characterize all those functions F that satisfy (C-S FE). In the context of matrix evolution systems, the above setting is different since A can be an arbitrary set. When dealing with evolution systems, we consider A = R+ and additionally require x ≤ y in F(x, y), or equivalently (x, y) ∈ T. On the other hand, recall that we have seen in the MSG case that T(t) ∈ GLn under very mild assumptions. Hence, if we assume G to be a group wrt ∘, take A = R+ and denote for g ∈ G the inverse element by g^{-1}, we can extend F to R × R by setting F(y, x) := F(x, y)^{-1} for x ≤ y and using F(−x, y) = F(−x, 0) ∘ F(0, y) = F(0, x)^{-1} ∘ F(0, y) for x, y ≥ 0. This extended F still satisfies (C-S FE), and one has

F(x, y) = F(x, x) ∘ F(x, y) = F(x, y) ∘ F(y, y),


yielding that F(x, x) = F(y, y) = e, where e is the neutral element of G. In particular, one has for s ≤ u ≤ t that

F(s, t) ∘ F(t, u) = F(s, t) ∘ F(u, t)^{-1} = F(s, u) ∘ F(u, t) ∘ F(u, t)^{-1} = F(s, u).

It is this very assumption that F takes values in a group G, allowing for an inverseoperation, that yields a full characterization of solutions to (C-S FE). The followingtheorem can be found in [Acz66, Theorem 2 in Section 8.1].

Theorem 4.6. Assume G is a group with group operation ∘ and A is an arbitrary set such that F(x, y) ∈ G for all x, y ∈ A. Then all solutions to

F(x, y) ∘ F(y, z) = F(x, z) for all x, y, z ∈ A

are of the form

F(x, y) = n(x)^{-1} ∘ n(y),

for some function n : A → G.
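For the matrix group G = GLn with ∘ the matrix product, the representation F(x, y) = n(x)^{-1} n(y) can be illustrated with a small numerical sketch. This is illustration only; the particular finite index set and the random choice of n below are arbitrary assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)

# an arbitrary function n: A -> GL_2, here with A = {0, 1, 2, 3, 4}
n = {a: np.eye(2) + 0.1 * rng.standard_normal((2, 2)) for a in range(5)}

def F(x, y):
    return np.linalg.inv(n[x]) @ n[y]

# F solves the Cauchy-Sincov equation F(x, y) F(y, z) = F(x, z)
for x in range(5):
    for y in range(5):
        for z in range(5):
            assert np.allclose(F(x, y) @ F(y, z), F(x, z))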

Note that the above theorem does not assume any regularity on F; only the group structure is important. Further down, we will see that it is desirable to have an evolution system that takes values in a group, namely GLn, as that simplifies the definition of the extended generator. The hope is therefore that, by imposing regularity assumptions wrt the domain of F, it is possible to show that F automatically takes values in a group, compare Proposition 4.2. However, we will show that there are examples where, despite our regularity assumptions (càdlàg properties of x ↦ F(x, y) and y ↦ F(x, y)), F does not always lie in a group.

Nonetheless, there is a result that guarantees that G is indeed a group. Unfortunately, the conditions are too strong for us, so we only state this theorem for completeness. For reference and proof, see [Acz66, Theorem 3 in Section 8.1].

Theorem 4.7. Let F(x, y) be defined for x, y ∈ A, with values lying in a set G such that there is a binary operation ∘ between two elements F(x, y), F(y, z) (i.e. the second argument of the left factor coincides with the first argument of the right factor), for which for any x, y, z ∈ A the functional equation

F(x, y) ∘ F(y, z) = F(x, z)

is satisfied, and such that we have

(a) for every fixed x ∈ A, range(F(x, ·)) = G,

(b) there exists an element a ∈ A such that range(F(·, a)) = G.

Then G is a group wrt the operation ∘.


4.2.2 Matrix evolution systems revisited

We will now begin with our detailed discussion of matrix evolution systems (MESs), and introduce the concept of extended generators, which will play a central role when considering polynomial processes in Chapter 6. Recall that

T = {(s, t) ∈ R² : 0 ≤ s ≤ t}.

We shall define MESs as functions of two parameters (such that the pair is in T), taking values in Mn(R), that are solutions to (C-S FE) together with the property that when both parameters coincide, we obtain the neutral element of Mn(R); compare with the remarks made right after (C-S FE). Let us make this formal.

Definition 4.8. We call a function P : T → Mn(R), (s, t) ↦ P(s, t), a matrix evolution system (in dimension n), if

(i) P(s, s) = Idn for all s ∈ [0, ∞),

(ii) for any 0 ≤ s ≤ u ≤ t, P satisfies

P(s, u)P(u, t) = P(s, t).

Note that we do not require P(s, t) to be defined for t < s, which would yield the existence of inverse elements, implying that P takes values in GLn. Further down, we will see that even under the additional regularity assumptions we will make, it is possible for P to become singular, see Example 4.24.

The following shows how MESs are a generalization of MSGs.

Definition 4.9. A matrix evolution system P is called homogeneous if for any 0 ≤ s ≤ t,

P(s, t) = P(0, t − s).

In other words, the study of P can be reduced to the study of the one parameter semigroup T defined via

T(t) = P(0, t).

The semigroup properties hold for obvious reasons. As we have seen in section 4.1, assuming that the map t ↦ T(t) is right continuous at t = 0 implies

T(t) = exp(tA),

for a unique A ∈ Mn(R) that we referred to as the generator. We now want to introduce the regularity assumptions we shall make for MESs. The goal is to define a reasonable concept of generator suited for our purposes when studying polynomial semimartingales.


Definition 4.10. We call a matrix evolution system P càdlàg in the first coordinate, if for all t ≥ 0, s ↦ P(s, t) is càdlàg on [0, t]. We call it càdlàg in the second coordinate, if for all s ≥ 0, t ↦ P(s, t) is càdlàg on [s, ∞).

We call P càdlàg, if it is càdlàg in the first and in the second coordinate.

We can immediately give the following relation between these properties.

Proposition 4.11. Let P be a matrix evolution system which is càdlàg in the second coordinate. Then for t > 0 fixed, the function s ↦ P(s, t) is right continuous on [0, t).

Proof. We need to show for any s ∈ [0, t) that

lim_{ε↓0} P(s + ε, t) = P(s, t)

holds. By the evolution system property we have for any ε > 0,

P(s, t) = P(s, s + ε)P(s + ε, t).

Since P(s, s) = Idn, and by the right continuity in the second variable, there is an ε̄ > 0 such that P(s, s + ε) ∈ GLn for all ε ≤ ε̄. Hence we get

lim_{ε↓0} P(s + ε, t) = lim_{ε↓0} P(s, s + ε)^{-1} P(s, t).

The claim now follows from the fact that (·)^{-1} is continuous on GLn and t ↦ P(s, t) is right continuous by assumption.

The following example shows that the previous proposition cannot be improved to yield existence of left limits.

Example 4.12. Consider the evolution system ρ in dimension n = 1 given by

ρ(s, t) = η(t)/η(s),

where η(x) = (1 − x)1_{x<1} + 1_{x≥1}. It is easy to see that this evolution system is càdlàg in the second coordinate, but fails to have a left limit in the first coordinate at s = 1 for t = 1, since ρ(1 − ε, 1) = 1/ε diverges as ε ↓ 0.
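A quick numerical illustration of the blow-up in Example 4.12 (a sketch only, with arbitrary values of ε):

# Example 4.12: rho(s, t) = eta(t)/eta(s) with eta(x) = 1 - x for x < 1 and 1 for x >= 1
def eta(x):
    return 1.0 - x if x < 1 else 1.0

def rho(s, t):
    return eta(t) / eta(s)

for eps in [1e-1, 1e-3, 1e-6]:
    print(eps, rho(1 - eps, 1.0))   # equals 1/eps, so no left limit at s = 1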

Finally, let us note that if an evolution system that is càdlàg in the second coordinate fails to have a left limit at a certain point, then this is equivalent to a slightly stronger statement, as illustrated by the following proposition, which is an immediate consequence of the evolution property.


Proposition 4.13. Let P be a matrix evolution system càdlàg in the second coordinate.Then the following two statements are equivalent:

(i) The limit P (s−, s) does not exist,

(ii) There is a δ > 0, such that the limit P (s−, t) does not exist for all t ∈ [s, s+ δ).

Proof. The direction (ii) ⇒ (i) is obviously true. For the other direction, note that for ε > 0,

P(s − ε, t) = P(s − ε, s)P(s, t).

By right continuity in the second coordinate and P(s, s) = Idn, there exists a δ > 0 such that P(s, t) ∈ GLn for all t ∈ [s, s + δ). Hence, if the limit P(s−, t) exists for some such t, then lim_{ε↓0} P(s − ε, s) = P(s−, t)P(s, t)^{-1} exists as well. This shows "¬(ii) ⇒ ¬(i)", which finishes the proof.

We now want to motivate the next definition regarding regularity of P by the following observations. Consider again a regular MSG T. As we have seen in section 4.1, we have

Ṫ(t) = d/dt T(t) = T(t)A. (4.3)

Let us collect a few facts about this result. First note that the assumptions we made give us smoothness of t ↦ T(t). Second, we get that T(t) ∈ GLn for all t ≥ 0, which actually allows us to extend T to a group indexed by all of R. Finally, note that we can write (4.3) in integral form as

T(t) = Idn + ∫_0^t T(u)A du. (4.4)

Further, note that the generator A can be represented by

A = T(t)^{-1} Ṫ(t),

which will become useful in a more general setting where (4.3) cannot be applied anymore.

Before we introduce the concept of extended generators for MESs, let us first extend thedefinition of (infinitesimal) generators to MESs. We shall define these under very strongregularity assumptions.

Definition 4.14. Let P be a MES such that for each s ≥ 0 the map t ↦ P(s, t) is right differentiable. Define the function f : T → Mn(R) by

f(s, t) := lim_{ε↓0} (P(s, t + ε) − P(s, t))/ε.

We call the function G : R+ → Mn(R) defined by G(s) := f(s, s) the (infinitesimal) generator of P.


Remark 4.15. Note that by the evolution property we have f(s, t) = P(s, t)G(t). If we assume G to be continuous, this implies that t ↦ P(s, t) is continuously right differentiable, meaning that t ↦ f(s, t) is continuous. Then we would have

P(s, t) − P(s, u) = ∫_u^t f(s, v) dv,

where the integral above is a Riemann integral. If we were to replace G continuous with bounded, we could extend this definition to the absolutely continuous setting (wrt the Lebesgue measure). This idea is contained in the next definition as a special case.
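For convenience, we record the one-line computation behind the identity f(s, t) = P(s, t)G(t) used in the remark above; it only uses the evolution property P(s, t + ε) = P(s, t)P(t, t + ε):

f(s, t) = lim_{ε↓0} (P(s, t + ε) − P(s, t))/ε = P(s, t) lim_{ε↓0} (P(t, t + ε) − Idn)/ε = P(s, t) f(t, t) = P(s, t)G(t).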

Let us revisit the semigroup case. The points we made regarding regularity can also be formulated in the following way. As T is smooth, we have T ∈ V. By defining γ((s, t]) := T(t) − T(s) one can therefore define a signed vector measure γ on R+. From (4.4) one then gets γ ≪ dt, where dt is the Lebesgue measure. This motivates the next notion of regularity.

Definition 4.16. We call a matrix evolution system P regular in the second coordinate, if it is càdlàg in the second coordinate and the function t ↦ P(s, t) has finite variation on [s, T] for all T ∈ (s, ∞). If P is even càdlàg in the sense of Definition 4.10, we call it regular.

The finite variation property for regular evolution systems implies that P(s, ·) can be associated with a signed (vector) Borel measure γ_s on (s, ∞), i.e. for 0 ≤ s ≤ u ≤ t,

γ_s((u, t]) = P(s, t) − P(s, u). (4.5)

For i, j ∈ {1, . . . , n}, we denote by γ_s^{i,j} the (i, j)-component of γ_s, i.e. γ_s^{i,j}((u, t]) = (P(s, t))_{i,j} − (P(s, u))_{i,j}.

In (4.3) the map t ↦ T(t) is differentiable, so the induced measure is absolutely continuous wrt the Lebesgue measure. Here we are interested in measures which are only absolutely continuous with respect to some fixed measure, hence the following definition.

Definition 4.17. Let µ be a σ-finite measure on R>0 and t > 0. The family of measures γ = (γ_s)_{s≥0} introduced in (4.5) is called dominated by µ until t, if for all i, j ∈ {1, . . . , n} we have

(γ_u)^{i,j} ≪ µ|_{(u,∞)}, for all 0 ≤ u ≤ t.

We write γ ≪_t µ if the family of measures γ is dominated by µ until t. If this is the case for all t > 0, we simply write γ ≪_∞ µ.

Remark 4.18. If one assumes P(s, t) ∈ GLn for all s ≤ t, then the above definition would simplify due to the following fact. Because

γ_{s′}((u, t]) = P(s′, t) − P(s′, u) = P(s′, s)(P(s, t) − P(s, u)) = P(s′, s)γ_s((u, t]),

for 0 ≤ s′ ≤ s ≤ u ≤ t, it would suffice to require γ_0 ≪ µ component wise, which would then imply γ ≪_∞ µ, since µ((u, t]) = 0 implies γ_0((u, t]) = 0, which is equivalent to γ_s((u, t]) = 0 since P(0, s) ∈ GLn.


Let A ∈ V⁺_Ω. Then A defines a measure on R>0 which we denote by dA. Note that with expressions of the form

∫ f(u) dA_u = ∫ f(u) A(du)

we always mean the Lebesgue-Stieltjes integral of a measurable function f against the measure dA, not the Riemann-Stieltjes integral of a sufficiently regular function f against the right-continuous function A.

Definition 4.19. We call a collection (P, γ, A) a proper matrix evolution system if P is a regular matrix evolution system, γ is the family of vector measures induced by P as above, and A ∈ V⁺_Ω is such that γ ≪_∞ dA.

We are now able to introduce the concept of extended generators for MESs which inlight of (4.4) can be seen as a generalization of generators for MSGs.

Definition 4.20. Let G : R>0 → Mn(R) be a Borel-measurable function. We call G the extended generator of the proper MES (P, γ, A), if for any s ≤ t we have

P(s, t) = Idn + ∫_{(s,t]} P(s, u−)G(u) A(du).

Due to the evolution system property it is clear that the above condition follows if it is satisfied for s = 0. We further say that G is the extended A-generator of P, if G is the extended generator of (P, γ, A) for the family of measures γ induced by P.

The definition above requires G to satisfy the dA-a.e. identity

dγ_s/dA (·) = P(s, ·−)G(·) (4.6)

on (s, ∞); hence any function G̃ that is dA-a.e. equal to G is also an extended generator. We will deal with existence and uniqueness of G further down.

In the following we will see that, unlike in the semigroup case, our càdlàg assumptions do not guarantee that P(s, t) ∈ GLn for all 0 ≤ s ≤ t. If we assume however that P(s, t) ∈ GLn for all t ≥ s, then we can compute G on (s, ∞) simply by

G(·) = P(s, ·−)^{-1} dγ_s/dA (·), (4.7)

for an arbitrary version of dγ_s/dA. Note that the assumption also implies P(s, t−) ∈ GLn, as we will see further down when we deal with existence and uniqueness of G, see Proposition 4.23 and Proposition 4.31 below.

Remark 4.21. Note that by Remark 4.15, extended generators are strict generalizations of infinitesimal generators. By that we mean that if G is an infinitesimal generator of P, then it is the extended generator of the proper MES (P, γ, A), where γ is the family of measures defined by P(s, ·) ∈ V_Ω for each s ≥ 0 and A_t = t, while the reverse is not true, as we will later see with Example 4.24.
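To make the defining equation of Definition 4.20 concrete, the following Python sketch treats the simplest purely discontinuous case: the "clock" A(t) = ⌊t⌋ charges the integers with unit mass, and the extended generator is taken constant. These choices are arbitrary assumptions for illustration; in this case the integral equation is solved by a left-to-right product over the jumps, and both the evolution property and the defining equation can be checked directly.

import numpy as np

# A purely discontinuous clock: A(t) = floor(t), so dA charges the integers with mass 1.
# For a constant generator G, the equation of Definition 4.20 is solved by the product
#   P(s, t) = prod_{k integer, s < k <= t} (Id + G * dA({k})).
G = np.array([[0.0, 0.5], [-0.3, -0.2]])         # arbitrary choice, for illustration only

def P(s, t):
    out = np.eye(2)
    k = int(np.floor(s)) + 1
    while k <= t:
        out = out @ (np.eye(2) + G)              # one factor per jump of A in (s, t]
        k += 1
    return out

# evolution property P(s, u) P(u, t) = P(s, t)
assert np.allclose(P(0.2, 1.7) @ P(1.7, 4.3), P(0.2, 4.3))

# defining integral equation of the extended generator, evaluated as a sum over the jumps
s, t = 0.2, 4.3
rhs = np.eye(2) + sum(P(s, k - 1e-9) @ G for k in range(int(np.floor(s)) + 1, int(np.floor(t)) + 1))
assert np.allclose(P(s, t), rhs)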


We will now collect a few facts about matrix evolution systems.

Lemma 4.22. Let P be a càdlàg MES and let 0 ≤ s ≤ u ≤ t. Then

(i) P(s, u−)P(u−, t) = P(s, t), P(s−, s)P(s, t) = P(s−, t), and if additionally u < t, then P(s, u)P(u, t−) = P(s, t−),

(ii) for s < u < t, the limit P(s−, t−) is well defined and P(s−, t−)P(t−, t) = P(s−, t) as well as P(s, u−)P(u−, t−) = P(s, t−).

If there is a family of measures γ and an A ∈ V⁺_Ω such that (P, γ, A) is a proper MES, then

(iii) on (u, ∞) we have dγ_s/dA (·)|_{(u,∞)} = P(s, u) dγ_u/dA (·), where the equality holds in a dA-a.e. sense.

Define the matrix valued function Γ_s(·) with domain (s, ∞) by

Γ_s(·) := P(s−, s) dγ_s/dA (·). (4.8)

Then,

(iv) the function t ↦ P(s−, t) is FV over [s, T] for any T > s, with γ_{s−} ≪ dA|_{(s,∞)} and with density dγ_{s−}/dA = Γ_s. Further it holds that

dγ_s/dA|_{(u,∞)} = P(s, u−) dγ_{u−}/dA and dγ_{s−}/dA = P(s−, s) dγ_s/dA (4.9)

for u > s, where the equalities are to be understood in a dA-a.e. sense.

Proof. Let (ε_n)_n and (δ_n)_n be two positive sequences converging to zero.

(i) Since the two limits P(s, u−) and P(u−, t) exist, and we have

P(s, t) = P(s, u − ε_n)P(u − ε_n, t) for all n, (4.10)

we get

P(s, t) = lim_n (P(s, u − ε_n)P(u − ε_n, t)) = P(s, u−)P(u−, t). (4.11)

The other statements follow similarly. For instance,

P(s−, s)P(s, t) = lim_{n→∞} (P(s − ε_n, s))P(s, t) = lim_{n→∞} P(s − ε_n, t) = P(s−, t).

(ii) We need to show that the two iterated limits satisfy

lim_n lim_m P(s − δ_m, t − ε_n) = lim_m lim_n P(s − δ_m, t − ε_n) (4.12)


and define P(s−, t−). Indeed, we have for n, m large enough

lim_n lim_m P(s − δ_m, t − ε_n) = lim_n (lim_m P(s − δ_m, s)) P(s, t − ε_n) = P(s−, s)P(s, t−). (4.13)

Likewise we have

lim_m lim_n P(s − δ_m, t − ε_n) = lim_m P(s − δ_m, s) lim_n P(s, t − ε_n) = P(s−, s)P(s, t−). (4.14)

Using this, we get

P(s−, t−)P(t−, t) = P(s−, s)P(s, t−)P(t−, t) (i)= P(s−, s)P(s, t) (i)= P(s−, t).

Finally, using the previous results, we have

P(s, u−)P(u−, t−) = P(s, u−)P(u−, u)P(u, t−) = P(s, t−).

(iii) An application of the evolution property yields

∫_{(u,t]} P(s, u) dγ_u/dA (v) dA_v = P(s, u)(P(u, t) − P(u, u)) = P(s, t) − P(s, u), (4.15)

hence P(s, u) dγ_u/dA is a version of dγ_s/dA restricted to (u, ∞).

(iv) We first show that t ↦ P(s−, t) is locally FV. This is clear, since we have by (i)

P(s−, t) = P(s−, s)P(s, t), (4.16)

where P(s−, s) is constant in t and P(s, t) is locally FV by assumption. The claim γ_{s−} ≪ dA|_{(s,∞)} follows from the fact that we have

γ_{s−}((u, t]) = P(s−, t) − P(s−, u) = P(s−, s)γ_s((u, t]),

for s ≤ u ≤ t. Hence, since γ_s ≪ dA|_{(s,∞)} by assumption, we get the claim. Finally, we have

∫_{(u,t]} P(s−, s) dγ_s/dA (v) dA_v = P(s−, s)(P(s, t) − P(s, u)) = P(s−, t) − P(s−, u).

This shows, similarly to (iii), that P(s−, s) dγ_s/dA is a version of dγ_{s−}/dA.


4.2.3 Existence and uniqueness of extended generators

We now want to discuss existence and uniqueness of the extended generator for a proper MES (P, γ, A). As noted before, if one assumes for fixed s ≥ 0 that P(s, t) ∈ GLn for all t ≥ s, then G exists on (s, ∞), since for an arbitrary version of dγ_s/dA, (4.7) yields a version of G on (s, ∞). As a first step, let us prove the following proposition, which is a direct consequence of the group structure of GLn.

Proposition 4.23. Let P be a càdlàg MES. If for given s ≥ 0 there exists t∗ > s such that P(s, t) ∈ GLn for all t ∈ (s, t∗), then also P(s, t−) ∈ GLn for all t ∈ (s, t∗), and for s < u ≤ t < t∗ we have P(u−, t) ∈ GLn.

Proof. Note that we have by Lemma 4.22 (i) that

P(s, t) = P(s, u−)P(u−, t).

Since P(s, t) ∈ GLn, so are P(s, u−) and P(u−, t) (just apply the determinant to both sides to see that this must hold). Hence the proof is complete.

The following example shows that MESs can leave GLn even if they are càdlàg and hence"start" in GLn.

Example 4.24. Consider the sets M_i ⊂ R² defined for i ∈ N via

M_i := {(s, t) ∈ R² : i − 1 ≤ s ≤ t < i}.

Define M := ⋃_{i∈N} M_i. The set M is shown in Figure 4.1. Define φ(s, t) := 1_M(s, t). By construction of M, φ is a càdlàg MES on Mn(R) for n = 1. (The author of this thesis would like to thank Dr. Lukas Steinberger for providing this example.)

However, for any fixed s, we have 0 = φ(s, t) ∉ GL₁ = R \ {0} if t > s + 1. Likewise, for any càdlàg MES P with n arbitrary, P̃(s, t) := φ(s, t)P(s, t) is again a càdlàg MES for which P̃(s, t) ∈ GLn is violated.
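The defining properties of Example 4.24 are simple enough to check mechanically. The following Python sketch is an illustration only (the random grid is an arbitrary choice); it verifies the evolution property of φ(s, t) = 1_M(s, t), which for s ≤ t equals 1 exactly when s and t lie in the same interval [i − 1, i), and shows that φ leaves GL₁.

import numpy as np

# Example 4.24 in dimension n = 1: phi(s, t) = 1 iff s and t lie in the same interval [i-1, i)
def phi(s, t):
    assert s <= t
    return 1.0 if np.floor(s) == np.floor(t) else 0.0

# evolution property phi(s, u) phi(u, t) = phi(s, t), checked on random ordered triples
rng = np.random.default_rng(1)
for _ in range(1000):
    s, u, t = np.sort(rng.uniform(0, 5, size=3))
    assert phi(s, u) * phi(u, t) == phi(s, t)

# phi(s, s) = 1, but phi(s, t) = 0 (not invertible in GL_1) as soon as t >= floor(s) + 1
print(phi(2.3, 2.9), phi(2.3, 3.1))   # 1.0 0.0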

The above example shows that it is not clear whether for a proper MES (P, γ, A) an extended generator G exists on all of R>0. However, by the càdlàg property we know that for any s > 0 there is an ε_s > 0 such that P(s, t) ∈ GLn for s ≤ t ≤ s + ε_s. Further down, we present sufficient conditions based on this observation that imply the existence of extended generators, see Proposition 4.31 and Theorem 4.35. Regarding "leaving" GLn, let us prove the following result.

Proposition 4.25. Let P be a càdlàg MES. Fix some s ≥ 0 and define t∗ by

t∗ := inf{t > s : P(s, t) ∉ GLn}.

Then the following points hold true:


Figure 4.1: Graphical visualization of the set M. The red lines indicate that the corresponding line segment does not belong to M, e.g. {(s, 1) : 0 ≤ s < 1} ∩ M = ∅.

(i) P(s, t∗) ∉ GLn,

(ii) P(s, u), P(s, u−), P(u−, u) ∈ GLn for all u ∈ (s, t∗),

(iii) P(u, t∗), P(u−, t∗) ∉ GLn for all u ∈ (s, t∗) and P(t∗−, t∗) ∉ GLn,

(iv) P(s, t∗ + ε) ∉ GLn for all ε ≥ 0.

Proof. (i) Note that by the càdlàg property there is an ε > 0 such that P(t∗, u) ∈ GLn for all u ∈ [t∗, t∗ + ε]. Further, by the definition of t∗ there is a u ∈ [t∗, t∗ + ε] with P(s, u) ∉ GLn, and by the evolution property

P(s, u) = P(s, t∗)P(t∗, u).

Hence P(s, t∗) cannot be invertible, as claimed.

(ii) The first statement, that is P(s, u) ∈ GLn, is obvious by the very definition of t∗. For the next two, note that we can apply Lemma 4.22(i), since it only requires P to be càdlàg. Hence we have

P(s, u) = P(s, u−)P(u−, u).

As P(s, u) ∈ GLn, so are P(s, u−) and P(u−, u), which follows from the fact that we are dealing with square matrices and

det P(s, u) = det P(s, u−) det P(u−, u).


(iii) By (ii), we have P(s, u) ∈ GLn, hence

P(u, t∗) = P(s, u)^{-1} P(s, t∗).

Since P(s, t∗) is singular, the first claim follows. For the second, note that

P(u−, u)P(u, t∗) = P(u−, t∗),

which implies the second statement. For the final statement, note that we have

P(t∗−, t∗) = lim_{ε↓0} P(t∗ − ε, t∗).

For each ε > 0 small enough, we have seen that P(t∗ − ε, t∗) ∉ GLn; hence, by the closedness of the set of singular matrices in Mn(R), we conclude P(t∗−, t∗) ∉ GLn. Finally, (iv) follows from (i) and the evolution property, since P(s, t∗ + ε) = P(s, t∗)P(t∗, t∗ + ε) and the first factor is singular.

We will see that when dealing with existence of the extended generator, the set

{t > 0 : P(t−, t) ∉ GLn}

will play an important role. We will not be able to show existence of the extended generator for every proper MES (P, γ, A). We will however be able to do so under an additional assumption that we formulate as a condition. To motivate this condition, let us make the following observation.

Assume there exists an interval (s, t) such that for any u ∈ (s, t) it holds that P(v, u) ∉ GLn for all v < u. Assume further that P is càdlàg. Then, since Mn(R) \ GLn is closed, we get P(u−, u) ∉ GLn for all u ∈ (s, t). However, by the càdlàg assumption, there exists an ε > 0 such that u + ε ∈ (s, t) and P(u, u + ε) ∈ GLn. But since s < u < u + ε < t, the initial assumption would require P(u, u + ε) ∉ GLn, hence a contradiction. Based on the observation above, where there was a continuum of u ∈ (s, t) such that P(u−, u) ∉ GLn, we formulate the following two conditions. Note that the first condition implies the second, see Remark 4.27.

Condition 4.26. Consider a proper MES (P, γ, A).

(i) The set

{t > 0 : P(t−, t) ∉ GLn}

is at most countable.

(ii) For dA denoting the measure induced by A, we have

dA({t > 0 : P(t−, t) ∉ GLn, ∆A_t = 0}) = 0.


Remark 4.27. Note that in the condition above, (i) implies (ii). Indeed, assume (i) holds. Then, because A is càdlàg, we have ∆A_t = 0 ⇒ dA({t}) = 0. Hence, (i) implies that the set in (ii) can be written as an at most countable union of dA-null sets. Further down, we formulate Theorem 4.35 under (ii). However, we believe that condition (i) always holds for a càdlàg MES P. So far this is only a conjecture, since we cannot prove it at this point.

In the sequel, let (P, γ,A) be a fixed proper MES. Let us briefly present a result regardinglimits of the form P (t−, t) as they appear in Condition 4.26 in the case where one assumesinvertibility.

Proposition 4.28. Assume P(s, t) ∈ GLn for some s < t and ∆A_t = 0. Then P(t−, t) = Idn.

Proof. By Proposition 4.25(ii) we have P(t−, t) ∈ GLn. Further, note that we have

P(s, t) = P(s, t−)P(t−, t) = P(s, t−) + (dγ_s/dA)(t) ∆A_t = P(s, t−),

since ∆A_t = 0. Moreover, P(s, t−) = P(s, t)P(t−, t)^{-1} ∈ GLn, so multiplying the identity P(s, t−) = P(s, t−)P(t−, t) from the left with P(s, t−)^{-1} yields P(t−, t) = Idn, showing the claim.

Before presenting our most general existence result, let us first prove a special case in the form of Proposition 4.31. Note that the conditions we impose in that proposition imply Condition 4.26 (i) and (ii). This proposition will illustrate the basic idea we follow to show existence. The general case is then treated in Theorem 4.35 below.

Let us make a few definitions in order to prepare Proposition 4.31. Given a proper MES (P, γ, A), set t_0 = 0 and define for n ∈ N

t_n := inf{t > t_{n−1} : P(t_{n−1}, t) ∉ GLn}, (4.17)

if t_{n−1} < ∞, and t_n = ∞ otherwise. Note that if t_{n−1}, t_n < ∞, then by the càdlàg property t_n − t_{n−1} > 0.

The following can be seen as a corollary to Proposition 4.25. It shows that the definitionof the sequence (tn)n is in some sense universal.

Corollary 4.29. Let s ∈ [t_{n−1}, t_n). Then for

t∗ = inf{t > s : P(s, t) ∉ GLn},

we have t∗ = t_n and P(s, t∗) ∉ GLn.


Proof. Since P(t_{n−1}, s) ∈ GLn, we have P(s, u) = P(t_{n−1}, s)^{-1}P(t_{n−1}, u) ∈ GLn for any u < t_n. Hence t∗ = t_n, since P(s, t_n) = P(t_{n−1}, s)^{-1}P(t_{n−1}, t_n) ∉ GLn by Proposition 4.25(i).

Remark 4.30. We have seen that for all s ∈ [t_{n−1}, t_n) it holds that P(s, t_n) ∉ GLn and also P(t_n−, t_n) ∉ GLn, which implies Condition 4.26 if lim_n t_n = ∞. We cannot, however, say anything regarding singularity of P(s, t_n−). We can nonetheless make the following statement about the kernel. Consider t_{n−1} ≤ s ≤ s′ < t_n. Then

ker P(s, t_n−) = ker P(s′, t_n−),

which follows from P(s, s′) ∈ GLn.

Proposition 4.31. Consider a proper MES (P, γ, A) and define the sequence (t_n)_{n∈N₀} as in (4.17). If

∑_{n∈N} 1_{(t_{n−1}<∞)}(t_n − t_{n−1}) = lim_{n→∞} t_n = ∞,

then an extended generator G exists for (P, γ, A) on all of R>0. Further, G is dA-a.e. unique outside a countable set.

Proof. Define N ∈ N ∪ {∞} by

N = inf{n ∈ N : t_n = ∞}.

Note that the condition lim_{n→∞} t_n = ∞ implies the disjoint decomposition

R>0 = ⋃_{n=1}^{N} (t_{n−1}, t_n) ∪ ⋃_{n=1}^{N−1} {t_n}. (4.18)

By construction, for any n ≤ N we have P(t_{n−1}, t) ∈ GLn for all t ∈ (t_{n−1}, t_n), and hence by Proposition 4.23 also P(t_{n−1}, t−) ∈ GLn. Hence we can define

G(·)|_{(t_{n−1},t_n)} = P(t_{n−1}, ·−)^{-1} dγ_{t_{n−1}}/dA (·).

We are left to define G on ⋃_{n=1}^{N−1} {t_n} such that G satisfies the definition of the extended generator. Take an arbitrary t_n. For t_{n−1} ≤ u < t_n, G needs to satisfy

P(t_{n−1}, t_n) − P(t_{n−1}, u) = ∫_{(u,t_n]} P(t_{n−1}, s−)G(s) A(ds)
= ∫_{(u,t_n)} P(t_{n−1}, s−)G(s) A(ds) + P(t_{n−1}, t_n−)G(t_n)∆A_{t_n}. (4.19)

If ∆A_{t_n} = 0, we can just set G(t_n) = 0 and the extended generator property is satisfied. Now assume A charges t_n, i.e. ∆A_{t_n} > 0. Using the evolution system property of P, we get that (4.19) is equivalent to

P(t_{n−1}, t_n) − P(t_{n−1}, u) = P(t_{n−1}, t_n−) − P(t_{n−1}, u) + P(t_{n−1}, t_n−)G(t_n)∆A_{t_n}. (4.20)


Cancelling P(t_{n−1}, u) gives equivalently

P(t_{n−1}, t_n) = P(t_{n−1}, t_n−) + P(t_{n−1}, t_n−)G(t_n)∆A_{t_n}. (4.21)

An application of the evolution property yields

P(t_{n−1}, t_n−)P(t_n−, t_n) = P(t_{n−1}, t_n−)(Idn + G(t_n)∆A_{t_n}).

Hence, it is sufficient for G to satisfy

P(t_n−, t_n) − Idn = G(t_n)∆A_{t_n}.

Since by assumption ∆A_{t_n} > 0, (4.20) is satisfied if we set

G(t_n) = (P(t_n−, t_n) − Idn)/∆A_{t_n}, (4.22)

which is always possible. Hence we have defined G on all of R>0 such that the defining conditions of the extended generator are satisfied. This shows existence. As we have P(t_{n−1}, u−) ∈ GLn for u < t_n, we get that G is unique on the intervals (t_{n−1}, t_n) outside dA-zero sets. This is a consequence of Lemma 4.22 (iii), which for s ≤ u and

G(·) := P(s, ·−)^{-1} dγ_s/dA (·) and G̃(·) := P(u, ·−)^{-1} dγ_u/dA (·),

implies for l > u,

G(l) = P(s, l−)^{-1} dγ_s/dA (l) = (P(s, u)P(u, l−))^{-1} P(s, u) dγ_u/dA (l) = G̃(l) [dA],

since dγ_s/dA is dA-a.e. unique. Hence, non-uniqueness is only possible on ⋃_{n=1}^{N−1} {t_n}, which is a countable set, as claimed.

Remark 4.32. Let us briefly discuss the condition lim_{n→∞} t_n = ∞ in Proposition 4.31. The following example shows that this assumption does not always hold. In Example 4.24 we have lim_{n→∞} t_n = ∞; we now modify that example so that lim_{n→∞} t_n < ∞, by defining the set M as the disjoint union

M = ⋃_{i∈N} M̃_i ∪ ⋃_{i∈N} M̂_i,

where we are left to define the sets M̃_i and M̂_i. First, define the sequence (j_i)_{i∈N₀} by j_0 := 0 and

j_i := j_{i−1} + 1/2^{i−1}.

Hence, lim_{n→∞} j_n = ∑_{i∈N} 2^{−(i−1)} = 2 < ∞. Next, for i ∈ N we define

M̃_i := {(s, t) ∈ R² : j_{i−1} ≤ s ≤ t < j_i}.


Further, we define for i ∈ N,

M̂_i := {(s, t) ∈ R² : 2 + (i − 1) ≤ s ≤ t < 2 + i}.

By construction, φ(s, t) = 1_M(s, t) is a càdlàg MES over Mn(R) for n = 1. However, the sequence (t_n) in (4.17) satisfies lim_{n→∞} t_n = 2 < ∞. On the other hand, φ can be embedded in a proper MES (P, γ, A) with

φ(s, t) − φ(s, u) = ∫_{(u,t]} ρ(s, v) A(dv),

where ρ(s, v) = 2^{i−1}(φ(s, v) − 1) for 0 ≤ s ≤ v < 2 with v ∈ [j_{i−1}, j_i), and A(t) = j_i for t ∈ [j_{i−1}, j_i). Further, on [2, ∞), ρ is given by ρ(s, v) = φ(s, v) − 1 and A is given by A(t) = 2 + i for t ∈ [2 + (i − 1), 2 + i). Hence γ is given by dγ_s/dA (·) = ρ(s, ·). Indeed, while Proposition 4.31 cannot be applied to this example, we will see by Theorem 4.35 that this proper MES too has an extended generator.

An interesting consequence in the case of continuity is the following proposition, showing that under stronger continuity assumptions P never leaves GLn, so that in particular an extended generator always exists.

Proposition 4.33. Consider a càdlàg MES P and assume that s ↦ P(s, t) is even continuous. Then P(s, t) ∈ GLn for all (s, t) ∈ T.

Proof. Since GLn ⊂ Mn(R) is an open subset and s ↦ P(s, t) is continuous on [0, t] for arbitrary t > 0, there exists an ε_t > 0 such that P(t − ε_t, t) ∈ GLn, where we use P(t, t) = Idn ∈ GLn. Let s ≥ 0 be arbitrary. Define

t∗ := inf{t > s : P(s, t) ∉ GLn},

and assume t∗ < ∞. By Proposition 4.25 we have P(s, t∗) ∉ GLn. Let ε_{t∗} be as above. Then we have

P(s, t∗) = P(s, t∗ − ε_{t∗})P(t∗ − ε_{t∗}, t∗).

The right hand side is a product of two matrices that, by our assumptions, are non-singular. However, that implies P(s, t∗) ∈ GLn, which is a contradiction to t∗ < ∞. This yields P(s, t) ∈ GLn, as desired.

Remark 4.34. Let us apply Proposition 4.31 to Example 4.24. Recall that we have defined the MES φ by

φ(s, t) = 1_M(s, t), 0 ≤ s ≤ t.

Let n ∈ N be such that n − 1 ≤ s < n. Then for t ≥ s we have φ(s, t) = 1_{t<n}. We get that t_n = n, hence the condition of Proposition 4.31 is satisfied. Indeed, it is easy to see that γ ≪_∞ dA for

A(t) := n − 1, for n − 1 ≤ t < n,

i.e. A(t) = ⌊t⌋. By setting G(t) = −1_N(t) it is easy to check that G is an extended A-generator for φ and indeed satisfies (4.22).
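The claim that G(t) = −1_N(t) is an extended A-generator for φ with A(t) = ⌊t⌋ can also be checked mechanically. The following Python sketch (an illustration under the above choices, not part of the formal argument) evaluates the defining equation of Definition 4.20 as a sum over the jumps of A:

import numpy as np

# Remark 4.34: A(t) = floor(t) (unit jumps at the integers) and G(k) = -1 for integer k.
# Check the defining equation of the extended generator for phi from Example 4.24:
#   phi(s, t) = 1 + sum_{k integer in (s, t]} phi(s, k-) * G(k) * Delta A_k.
def phi(s, t):
    return 1.0 if np.floor(s) == np.floor(t) else 0.0

def rhs(s, t):
    total = 1.0
    for k in range(int(np.floor(s)) + 1, int(np.floor(t)) + 1):
        total += phi(s, k - 1e-9) * (-1.0) * 1.0   # G(k) = -1, Delta A_k = 1
    return total

rng = np.random.default_rng(2)
for _ in range(1000):
    s, t = np.sort(rng.uniform(0, 5, size=2))
    assert rhs(s, t) == phi(s, t)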


Let us now treat the more general case where the set in Condition 4.26(i) can have accumulation points. We will see that this is just a modification of the special case presented above. Recall that the first part of the condition implies the second, so we prove the following theorem under the second condition.

Theorem 4.35. Given a proper MES (P, γ,A), assume Condition 4.26(ii) holds. Then,there exists an extended generator G that is unique dA-a.e. outside a countable subset ofR+.

Proof. For q ∈ Q+ = R+ ∩ Q, let ε_q = inf{ε > 0 : P(q, q + ε) ∉ GLn}, which is greater than zero since P is càdlàg. By Proposition 4.25 (ii), this implies P(q, v−) ∈ GLn for all v ∈ (q, q + ε_q). Define the intervals I_q = (q, q + ε_q). We can therefore define G(·)|_{I_q} = P(q, ·−)^{-1} dγ_q/dA (·). Hence, G is defined on the set

I = ⋃_{q∈Q+} I_q,

and by the same arguments as at the end of the proof of Proposition 4.31, G is well defined, since for q, p ∈ Q+ with I_q ∩ I_p ≠ ∅ the two definitions of G agree up to a dA-zero set. Since P(q, ·−) ∈ GLn on I_q, G is further unique up to a dA-zero set, due to the dA-a.e. uniqueness of the Radon-Nikodým derivative.

We are left to deal with I^c = R+ \ I. Let s ∈ I^c and let q_l ↑ s and q′_l ↓ s be two strictly monotonic sequences in Q+. Then P(q_l, q′_j) ∉ GLn for all l, j ∈ N, because otherwise s ∈ I_{q_l} ⊂ I. Since this holds for all l, j and since Mn(R) \ GLn is closed, we get P(s−, q′_j) ∉ GLn for all j ≥ 1. By the same argument, we can take the limit once again to get P(s−, s) ∉ GLn. Note that we have the disjoint decomposition

{t > 0 : P(t−, t) ∉ GLn} = {t > 0 : P(t−, t) ∉ GLn, ∆A_t = 0} ∪ {t > 0 : P(t−, t) ∉ GLn, ∆A_t > 0}.

By assumption, the set where ∆A_t = 0 has measure zero wrt dA. Therefore we can define G arbitrarily on that set, hence we just set G to be the zero matrix there. Since the set on the right is at most countable, as A is càdlàg and hence has at most countably many jumps, we have already shown the uniqueness claim. We are left to argue that G can be defined on the set on the right. But we have already shown in the proof of Proposition 4.31 that G can be defined on that set by

G(t) = (P(t−, t) − Idn)/∆A_t, for all t ∈ {t > 0 : P(t−, t) ∉ GLn, ∆A_t > 0}, (4.23)

finishing the proof.

Remark 4.36. We will later see that Condition 4.26 (ii) plays an important role whenstudying MESs with certain invariance properties motivated by the study of polynomialsemimartingales.


4.3 Ordinary measure integral equations

In this section we want to discuss the link between the evolution systems previously introduced, their extended generators, and a certain class of linear measure integral equations or, as they are commonly referred to, measure differential equations. We will use both terminologies interchangeably, but since the integral formulation is equivalent to the weak Radon-Nikodým derivative formulation, this should not lead to any confusion.

For the rest of this section, we denote with ‖·‖Rn any fixed norm on Rn. Whenever anoperator norm occurs, it is always the one corresponding to the fixed norm ‖·‖Rn . Whenthere is no confusion, we will also just write ‖·‖ if it is clear that this is a norm on Rn.

4.3.1 Existing literature

The classical case, that is, where the derivatives are classical derivatives, is among the most common topics in mathematics. In particular, the existence and uniqueness result of Picard-Lindelöf is treated at an introductory level when studying ordinary differential equations, see e.g. [Arn92] or [Wal13] and the references therein.

The continuous version of our problem is given in the context of systems of linear differential equations of the form

dy(t)/dt = V(t, y(t)), y(s) = x ∈ Rn,

for a vector field V : [0, ∞) × Rn → Rn satisfying Lipschitz and continuity assumptions guaranteeing existence and uniqueness of solutions. It is well understood that when V is linear in y, a fundamental solution M(s, t) ∈ Mn(R), (s, t) ∈ T, to these equations exists such that the flow of the ODE above, denoted by φ(s, x; t), satisfies φ(s, x; t) = M(s, t)x, corresponding to the linearity of the equation. Indeed, by studying the so-called Wronski determinant of these solutions, one can not only establish existence and uniqueness results for the equation but also show that solutions to different initial values do not intersect. Since, by assumption, everything is sufficiently regular, one can make a time reversal transformation to derive an ODE with initial condition F(t), such that the solution "after" time t − s is at x. This shows that M(s, t) is invertible for any s ≤ t.

In our setup, the assumptions made above do not hold anymore. Further, we do not assume absolute continuity wrt the Lebesgue measure. The problem we shall deal with is of the following form. Let A be a deterministic and increasing càdlàg function, i.e. A ∈ V⁺_Ω. Given a function V : (s, ∞) × Rn → Rn, s ≥ 0, we consider measure integral or measure differential equations (MDE) of the form

F(t) = F(s) + ∫_{(s,t]} V(u, F(u−)) dA_u,   F(s) = x, (MDE:A, V)


with initial value x ∈ Rn at time s. Note that if the above equation admits existence and uniqueness of solutions for any x ∈ Rn, this defines the associated flow φ(s, x; t) := F_s(t), where F_s is the unique solution to (MDE:A, V) with initial data F_s(s) = x. It will sometimes be convenient to make the dependency of the solution F on s and x explicit, which we shall do by writing F_{s,x}(t). When showing existence and uniqueness of a global solution, it is enough to restrict to a finite time horizon T, provided that T can be chosen arbitrarily large. Hence we shall fix T > s and consider the time interval [s, T] instead of R+. We are interested in the case where V is Lipschitz, which we define later. Note that we need V to be defined on (s, T] only, since for t ≤ s the integral above is over the empty set. Since we do not require V to be continuous in the first variable, the Lipschitz property in the second variable, namely that there exists an L < ∞ such that

‖V(t, x) − V(t, y)‖ ≤ L ‖x − y‖, for all t ∈ (s, T], (4.24)

will not be enough for us to establish existence and uniqueness. Hence we will need an additional integrability assumption in the first variable. We will make this precise later, see Definition 4.43.

Let us briefly discuss the pertinent literature. In [Sha72] the authors consider measuredifferential equations of type (MDE:A, V ), however they only show existence and unique-ness on some interval, hence they only provide a result for a local solution. The same istrue for [DS72].

The setting in [Gil07] is slightly different from ours, as it would require A to be left continuous. Further, [Gil07, Theorem 19.7.1] poses continuity assumptions on the first variable of V, which are not satisfied in our setting. Hence we will prove a suitable existence and uniqueness result for (MDE:A, V).

Another approach would be to consider (MDE:A, V) as a stochastic differential equation. This is possible since A is a càdlàg process of locally finite variation and hence a semimartingale under the measure P, where P is the Dirac measure on D(R+, R) charging A. Uniqueness would then be a consequence of Protter [Pro05, Theorem 7 in Chapter V]. For that theorem, however, it is necessary to make slightly stronger regularity assumptions on V than we would like.

We believe it is preferable to provide a result tailored to our setting, allowing for a moredirect treatment of the issues at hand. In particular we believe this to be an advantagefor the reader, hence we will do so in the following section.

We will also discuss Gronwall type inequalities for measure differential equations, as we will need them in our discussions regarding regularity of solutions wrt initial data and in the next chapter when studying polynomial processes. These kinds of inequalities have been studied among others in [AP97], [DS80], [EK90], [Hor96] and [Gil07]. However, the assumptions made there (such as only isolated points of discontinuity for A) are not satisfied in our case. Nonetheless, a version of a Gronwall inequality almost suitable for us has essentially been given in Ethier and Kurtz [EK09] by means of Theorem 5.1 in the appendix, and we will present a slight modification of that proof that covers our situation.


4.3.2 Existence and uniqueness for measure integral equations

We will now state our result on existence and uniqueness for (MDE:A, V), which is a Picard-Lindelöf type theorem. The proof follows the idea of the classical version, meaning we want to apply Banach's fixed point theorem to a contraction mapping Φ, where a fixed point of Φ is a solution of (MDE:A, V). Since in general the Lipschitz constant L of V is not smaller than one, it is not possible to just use the norm of uniform convergence on the space of all càdlàg functions. In fact, one needs to define an equivalent norm such that Φ becomes a contraction. To the best of our knowledge, such a theorem has not been proven in the literature.

Let us note that, assuming V is bounded, the right-continuity of A implies that any function satisfying (MDE:A, V) is right continuous. Further, in order for the expression F(u−) to even make sense, one needs the existence of left limits for F. Hence it is reasonable to assume that any solution is a càdlàg function.

We begin by making a few short remarks regarding our solution space. As a first step,let us make precise what we mean by a solution to (MDE:A, V ).

Definition 4.37. We call F ∈ D([s, T], Rn) a solution of (MDE:A, V), if F satisfies (MDE:A, V) for all t ∈ [s, T]. We say the solution is unique on [s, T], if F is the only element in D([s, T], Rn), wrt the norm

‖κ‖_∞ := sup_{u∈[s,T]} ‖κ(u)‖_{Rn},

that satisfies (MDE:A, V).

Let us recall some basic facts about the space of càdlàg functions. First, note that it is a Banach space with respect to the topology of uniform convergence induced by the norm ‖·‖_∞, see e.g. [JS13, Chapter VI]. Unlike the space of continuous functions, however, it fails to be separable wrt this norm. To see this, just consider the family of càdlàg functions κ_s defined on (0, T] via κ_s(t) := 1_{(0,s)}(t), indexed by s ∈ (0, T). Then, for s′ ≠ s, we have ‖κ_s − κ_{s′}‖_∞ = 1.

Non-separability can often prove to be a problem in probability theory, which is why in many cases the topology of uniform convergence is replaced with the Skorochod topology; again see e.g. [JS13, Chapter VI] for more details and results. In our situation, however, we do not need separability; completeness is sufficient for Banach's fixed point theorem.

As mentioned before, a key tool in proving existence and uniqueness is the definition of a suitable norm such that the Picard mapping is a contraction (we will define shortly what we mean by that). In the classical setting, for κ ∈ C([s, T], Rn) one defines the norm

‖κ‖_{ρ,(s,T]} := sup_{u∈(s,T]} ρ(u) ‖κ(u)‖_{Rn} (4.25)


for ρ(u) = exp(−L̃(u − s)), where L̃ > L, with L being the Lipschitz constant of V. Usually, one finds L̃ = 2L in the literature. Notice that ρ^{-1} solves

ρ^{-1}(t) = ρ^{-1}(s) + L̃ ∫_s^t ρ^{-1}(u) du, (4.26)

where the integral is a Riemann integral. In fact, ρ^{-1} is the only solution to (4.26), and the exponential function could even be defined as the unique solution to (4.26) for L̃ = 1. The general case corresponding to our setting now coincides precisely with a widely studied object in semimartingale theory, a generalization of the exponential function (where the Riemann integral is replaced with a stochastic integral), referred to as the stochastic exponential. We only need the version where the semimartingale is a process of locally finite variation (actually, increasing is enough), reducing the stochastic integral to the Lebesgue-Stieltjes integral. Hence we can solve pathwise, including in a deterministic setting, and the term stochastic should not lead to confusion.

Definition 4.38 (Stochastic exponential). Consider A ∈ V⁺_Ω. We call a càdlàg function κ ∈ D([s, T], R) the stochastic exponential of A, if

κ(t) = 1 + ∫_{(s,t]} κ(u−) A(du), (4.27)

for all t ∈ [s, T].

The next result on stochastic exponentials can be found in [JS13, Theorem 4.61 in I.4], together with the general version for semimartingales.

Theorem 4.39. Consider A ∈ V⁺_Ω. Then there is a unique càdlàg solution to (4.27), denoted by E(A), with E(A) ∈ V⁺_Ω. Further, it holds that

E(A)_t = exp( A_t − A_s − ∑_{s<u≤t} ∆A_u ) ∏_{s<u≤t} (1 + ∆A_u) ≥ 1. (4.28)
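The closed form (4.28) is easy to confront with the defining equation (4.27) numerically. The following Python sketch is only an illustration; the concrete choice A(t) = t + 0.5⌊t⌋ (drift of rate 1 plus jumps of size 0.5 at the integers) and the grid size are arbitrary assumptions.

import numpy as np

# Stochastic exponential of A(t) = t + 0.5*floor(t).  Compare the closed form (4.28)
# (with s = 0) against a left-point Euler scheme for (4.27); the two agree up to the
# discretisation error of the drift part.
def A(t):
    return t + 0.5 * np.floor(t)

T = 3.7
n_jumps = int(np.floor(T))                       # jumps in (0, T] sit at 1, ..., floor(T)
closed_form = np.exp(A(T) - A(0.0) - 0.5 * n_jumps) * 1.5 ** n_jumps

grid = np.linspace(0.0, T, 200_001)
euler = np.prod(1.0 + np.diff(A(grid)))          # kappa(u) = kappa(u-) * (1 + dA) on each cell
print(closed_form, euler)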

We can now state the following corollary, which will be central for our main result in this section.

Corollary 4.40. Let L, ρ^{-1}(s) > 0. There exists exactly one càdlàg solution ρ^{-1} to the equation

ρ^{-1}(t) = ρ^{-1}(s) + L ∫_{(s,t]} ρ^{-1}(u−) A(du), (4.29)

where ρ^{-1}(t) ≥ ρ^{-1}(s) for all t ∈ [s, T], and we have for each such t,

( ∫_{(s,t]} ρ^{-1}(u−) A(du) ) / ρ^{-1}(t) ≤ 1/L.

Further, ρ^{-1} is non-decreasing, which implies ρ^{-1}(t−) ≤ ρ^{-1}(t).


Proof. The first statement follows by considering ρ̃^{-1} := ρ^{-1}/ρ^{-1}(s) as well as the transformation Ã := LA, to which we apply Theorem 4.39. The second follows by basic manipulations of (4.29). Monotonicity is a direct consequence of A being non-decreasing.

Remark 4.41. As the notation ρ^{-1} suggests, we are interested in the reciprocal, which we denote by ρ. Note that by the above corollary, ρ^{-1}(t) ≥ ρ^{-1}(s) > 0 for all t ∈ [s, T], which implies that ρ is càdlàg on [s, T]. Hence for any T > s fixed, we have

0 < ρ(T) = ρ_− := inf_{u∈[s,T]} ρ(u) ≤ sup_{u∈[s,T]} ρ(u) =: ρ_+ = ρ(s) < ∞.

Now, if we define the norm ‖·‖_{ρ,[s,T]} on D([s, T], Rn) by (4.25), where ρ^{-1} is the solution to (4.29) for given ρ^{-1}(s), L > 0, we can conclude that

ρ_− ‖·‖_{∞,[s,T]} ≤ ‖·‖_{ρ,[s,T]} ≤ ρ_+ ‖·‖_{∞,[s,T]},

for ‖·‖_{∞,[s,T]} being the norm of uniform convergence on [s, T], which then implies that ‖·‖_{∞,[s,T]} and ‖·‖_{ρ,[s,T]} are two equivalent norms on D([s, T], Rn).

Remark 4.42. Recall that for a Banach space X with two equivalent norms ‖·‖_1, ‖·‖_2 and a mapping T : X → X such that T is a contraction wrt ‖·‖_1 with unique fixed point x∗ ∈ X, i.e. for any x, y ∈ X one has ‖T(x) − T(y)‖_1 < ‖x − y‖_1 and T(x∗) = x∗, the statement of Banach's fixed point theorem holds true for ‖·‖_2 too, i.e. for any x ∈ X we have

lim_{l→∞} T^l(x) = x∗ wrt ‖·‖_2.

Let us briefly make precise what we meant in (4.24) by Lipschitz and by integrability in the first variable. Recall that for A ∈ V⁺_Ω we denote with dA the measure induced by A.

Definition 4.43. We call a measurable function V : (s, T] × Rn → Rn A-Lipschitz on (s, T] for an A ∈ V⁺_Ω, if there exist an L > 0 and a set N ⊂ (s, T] with dA(N) = 0 such that for any t ∈ N^c we have

‖V(t, x) − V(t, y)‖_{Rn} ≤ L ‖x − y‖_{Rn}.

If additionally there exists an x∗ ∈ Rn such that the function t ↦ ‖V(t, x∗)‖ is A-essentially bounded on (s, T], i.e. it holds that

∫_{(s,T]} ‖V(u, x∗)‖_{Rn} A(du) ≤ ess sup_{u∈(s,T]} ‖V(u, x∗)‖_{Rn} (A(T) − A(s)) < ∞,

we call V A-Lipschitz and integrable on (s, T]. We call a measurable function V : R>0 × Rn → Rn locally A-Lipschitz and integrable, if for all T > 0 there exists a constant L_T > 0 such that V is A-Lipschitz and integrable on (0, T] with Lipschitz constant L_T. If there is no ambiguity wrt the function A, we call V (locally) Lipschitz and integrable.


Remark 4.44. (i) The condition that a point x∗ exists such that V(·, x∗) is integrable is always satisfied when V(t, ·) is a linear map, since then one can simply choose x∗ = 0.

(ii) Note that if V is locally A-Lipschitz and integrable, then it is A-Lipschitz and integrable on all intervals (s, T].

The following definition will play a central role when showing existence and uniqueness of(MDE:A, V ) and is widely referred to as Picard map since it defines the Picard iterationby means of composition.

Definition 4.45. Let A, V be as in (MDE:A, V) and x ∈ Rn. Define the map T_{A,V,x} : D([s, T], Rn) → D([s, T], Rn) by

T_{A,V,x}(κ)_t = x + ∫_{(s,t]} V(u, κ(u−)) A(du), t ∈ (s, T].

We call T_{A,V,x} the Picard map of (MDE:A, V) (or just Picard map if there is no ambiguity).

Remark 4.46. Note that this map is not a priori well defined since the integral does notneed to exist for arbitrary V while the Picard map must map càdlàg functions again tocàdlàg functions. We give a sufficient condition with the next lemma.

Lemma 4.47. If V is Lipschitz and integrable with constant L > 0, then the Picard mapis well defined.

Proof. We calculate for arbitrary t ∈ (s, T] and κ ∈ D([s, T], Rn),

∫_{(s,t]} ‖V(u, κ(u−))‖_{Rn} A(du) = ∫_{(s,t]} ‖V(u, κ(u−)) − V(u, x∗) + V(u, x∗)‖_{Rn} A(du)
≤ ∫_{(s,t]} ‖V(u, κ(u−)) − V(u, x∗)‖_{Rn} A(du) + ∫_{(s,t]} ‖V(u, x∗)‖_{Rn} A(du)
≤ L ∫_{(s,t]} ‖κ(u−) − x∗‖_{Rn} A(du) + ∫_{(s,t]} ‖V(u, x∗)‖_{Rn} A(du).

Since κ(u) − x∗ is càdlàg, it is bounded on [s, T]. Hence κ(u−) − x∗ is bounded on (s, T], which shows the existence of the first integral, as dA((s, T]) < ∞. The second is finite by the integrability assumption on V. We are left to show that T_{A,V,x}(κ) is indeed càdlàg. Regarding right continuity, we need to show

lim_{ε↓0} ∫_{(t,t+ε]} ‖V(u, κ(u−))‖_{Rn} A(du) = 0.

But this already follows from the previous calculations, using the fact that A is càdlàg and hence A(t + ε) − A(t) → 0 as ε ↓ 0. The existence of left limits follows from

lim_{ε↓0} ∫_{(s,t−ε]} V(u, κ(u−)) A(du) = ∫_{(s,t)} V(u, κ(u−)) A(du),

making use of (s, t) ⊂ (s, t] for the existence of the rhs.


We can now state our main existence and uniqueness result for measure integral equations.

Theorem 4.48 (Picard-Lindelöf). Let V : (s, T] × Rn → Rn be A-Lipschitz and integrable on (s, T] for an A ∈ V⁺_Ω. Then there exists a unique solution to (MDE:A, V) for any initial value x ∈ Rn.

Proof. Let T_{A,V,x} be the Picard map of (MDE:A, V). By Lemma 4.47, the map is well defined. Hence, a function κ ∈ D([s, T], Rn) is a solution if and only if T_{A,V,x}(κ) = κ, in other words a fixed point of T_{A,V,x}. If we show that there exists a unique fixed point, we are done. By Banach's fixed point theorem and Remark 4.42, it is sufficient to show that T_{A,V,x} is a contraction under a norm that is equivalent to the norm of uniform convergence. We pick the norm ‖·‖_{ρ,[s,T]}, where ρ^{-1} is defined via (4.29) for some initial value ρ^{-1}(s) > 0 and with L̃ = L(1 + ε) in place of L, for ε > 0 and L the Lipschitz constant of V. We compute for two arbitrary κ, ν ∈ D([s, T], Rn),

‖T_{A,V,x}(κ) − T_{A,V,x}(ν)‖_{ρ,[s,T]} = sup_{u∈[s,T]} ρ(u) ‖(T_{A,V,x}(κ) − T_{A,V,x}(ν))(u)‖_{Rn}
= sup_{u∈[s,T]} ρ(u) ‖ ∫_{(s,u]} (V(ξ, κ(ξ−)) − V(ξ, ν(ξ−))) A(dξ) ‖_{Rn}
≤ sup_{u∈[s,T]} ρ(u) ∫_{(s,u]} ‖V(ξ, κ(ξ−)) − V(ξ, ν(ξ−))‖_{Rn} A(dξ)
≤ L sup_{u∈[s,T]} ρ(u) ∫_{(s,u]} ρ^{-1}(ξ−) ρ(ξ−) ‖κ(ξ−) − ν(ξ−)‖_{Rn} A(dξ)
≤ L ‖κ − ν‖_{ρ,[s,T]} sup_{u∈[s,T]} ρ(u) ∫_{(s,u]} ρ^{-1}(ξ−) A(dξ)
= L ‖κ − ν‖_{ρ,[s,T]} sup_{u∈[s,T]} ( ∫_{(s,u]} ρ^{-1}(ξ−) A(dξ) ) / ρ^{-1}(u)
≤ (1/(1 + ε)) ‖κ − ν‖_{ρ,[s,T]}, (4.30)

where we used Corollary 4.40 (with L̃ in place of L) in the last step. Hence, by equivalence of the two norms, we have shown existence and uniqueness for (MDE:A, V).
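The fixed-point argument above lends itself to a direct numerical illustration. The following Python sketch implements a discrete version of the Picard map T_{A,V,x} on a grid, for the arbitrary choices A(t) = t + ⌊t⌋ (drift plus unit jumps at the integers), V(t, x) = sin(x) and x = 0.3; it is only meant to visualize the convergence of the iterates, not to replace the proof.

import numpy as np

# Picard iteration for (MDE:A, V) on a grid, with A(t) = t + floor(t) and V(t, x) = sin(x).
T, N, x0 = 2.5, 5_000, 0.3
grid = np.linspace(0.0, T, N + 1)
dA = np.diff(grid + np.floor(grid))              # increments of A over the grid cells

def picard_map(F):
    # discrete version of T_{A,V,x0}: (T F)(t) = x0 + int_(0,t] V(u, F(u-)) A(du)
    return np.concatenate(([x0], x0 + np.cumsum(np.sin(F[:-1]) * dA)))

F = np.full(N + 1, x0)                           # start from the constant function x0
for _ in range(50):
    F_prev, F = F, picard_map(F)
print(np.max(np.abs(F - F_prev)), F[-1])         # sup-distance of the last two iterates, F(T)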

We can also state the next obvious corollary to the previous theorem.

Corollary 4.49. Let V be locally A-Lipschitz and integrable. Then there exists a uniquesolution of (MDE:A, V ) on the interval (s, T ] for all s < T with initial value x ∈ Rn.

From here on we always consider V to be defined on R+ × Rn. The next theorem shows that the solution flows of (MDE:A, V) satisfy certain càdlàg properties. Recall that the flow φ : T × Rn → Rn of (MDE:A, V) is defined, in case (MDE:A, V) has a unique solution for all x ∈ Rn and s ∈ R+ on all intervals [s, T], in the following way. For (s, t) ∈ T and x ∈ Rn, set

φ(s, x; t) := F_{s,x}(t),

where F_{s,x} is the unique solution with initial value x at time s. In particular, it is well known that these flows satisfy the evolution property we have already seen when we dealt with evolution systems.

Theorem 4.50. Consider (MDE:A, V) with locally Lipschitz and integrable V and denote by φ the corresponding flow. Then for any x ∈ Rn and 0 ≤ s ≤ t, the functions

s ↦ φ(s, x; t), t ↦ φ(s, x; t)

are càdlàg on [0, t] (resp. [s, ∞)).

Proof. We have already shown that solutions to (MDE:A, V) are càdlàg, which yields that t ↦ φ(s, x; t) is càdlàg on [s, T] for all T > s, hence on [s, ∞). The more challenging part is the càdlàg property in s. Let us first show right continuity, so we can assume s < t. We will apply the Gronwall inequality, see Theorem 4.55, to the function F_{s+ε,x}(t), where ε ≥ 0 is such that s + ε < t. Let K∗ := ess sup_{t∈(0,T]} ‖V(t, x∗)‖_{Rn}. By our existence and uniqueness result we have

‖F_{s+ε,x}(t)‖_{Rn} ≤ ‖x‖_{Rn} + ∫_{(s+ε,t]} ‖V(ξ, F_{s+ε,x}(ξ−))‖_{Rn} A(dξ)
= ‖x‖_{Rn} + ∫_{(s+ε,t]} ‖V(ξ, F_{s+ε,x}(ξ−)) − V(ξ, x∗) + V(ξ, x∗)‖_{Rn} A(dξ)
≤ ‖x‖_{Rn} + L ∫_{(s+ε,t]} ‖F_{s+ε,x}(ξ−) − x∗‖_{Rn} A(dξ) + K∗ A(T)
≤ [‖x‖_{Rn} + A(T)(K∗ + L ‖x∗‖_{Rn})] + L ∫_{(s+ε,t]} ‖F_{s+ε,x}(ξ−)‖_{Rn} A(dξ).

Define K′ := ‖x‖_{Rn} + A(T)(K∗ + L ‖x∗‖_{Rn}). Then, by the Gronwall inequality, we have

‖F_{s+ε,x}(t)‖_{Rn} ≤ K′ exp{L(A(T) − A(s + ε))} ≤ K′ exp{L A(T)} =: K_F. (4.31)

Hence, we get a bound K_F that is uniform in ε and t. We compute

‖F_{s+ε,x}(t) − F_{s,x}(t)‖_{Rn} ≤ ∫_{(s,s+ε]} ‖V(ξ, F_{s,x}(ξ−))‖_{Rn} A(dξ)
+ ∫_{(s+ε,t]} ‖V(ξ, F_{s+ε,x}(ξ−)) − V(ξ, F_{s,x}(ξ−))‖_{Rn} A(dξ). (4.32)


We first show that the first integral on the rhs tends to zero as ε ↓ 0. We have

∫_{(s,s+ε]} ‖V(ξ, F_{s,x}(ξ−))‖_{Rn} A(dξ)
= ∫_{(s,s+ε]} ‖V(ξ, F_{s,x}(ξ−)) − V(ξ, x∗) + V(ξ, x∗)‖_{Rn} A(dξ)
≤ ∫_{(s,s+ε]} ‖V(ξ, F_{s,x}(ξ−)) − V(ξ, x∗)‖_{Rn} A(dξ) + ∫_{(s,s+ε]} ‖V(ξ, x∗)‖_{Rn} A(dξ)
≤ L ∫_{(s,s+ε]} ‖F_{s,x}(ξ−) − x∗‖_{Rn} A(dξ) + K∗(A(s + ε) − A(s))
≤ (L(K_F + ‖x∗‖_{Rn}) + K∗)(A(s + ε) − A(s)) =: ρ(ε).

Since A is càdlàg, this tends to zero as ε ↓ 0. Now consider again (4.32). We get

‖F_{s+ε,x}(t) − F_{s,x}(t)‖_{Rn} ≤ ρ(ε) + L ∫_{(s+ε,t]} ‖F_{s+ε,x}(ξ−) − F_{s,x}(ξ−)‖_{Rn} A(dξ). (4.33)

We can therefore apply the Gronwall inequality again and get

‖F_{s+ε,x}(t) − F_{s,x}(t)‖_{Rn} ≤ ρ(ε) exp{L A(T)}.

As we have already shown that ρ(ε) tends to zero as ε ↓ 0, we have shown right continuity.

We will now show that left limits exist, so assume 0 < s ≤ t. We shall do so again by applying the Gronwall inequality in an appropriate way. For the existence of left limits, we need to show that the limit

lim_{ε↓0} F_{s−ε,x}(t)

exists for all x and t ∈ [s, T]. It suffices to show that for an arbitrary positive sequence (ε_n) converging to zero, with s_n := s − ε_n, the sequence F_{s_n,x}(t) is a Cauchy sequence. Let m, n ∈ N be large enough such that s_n, s_m > 0. Further, let s_{n∧m} := s_n ∧ s_m and s_{n∨m} := s_n ∨ s_m. We compute

‖F_{s_n,x}(t) − F_{s_m,x}(t)‖_{Rn} ≤ ∫_{(s_{n∧m}, s_{n∨m}]} ‖V(ξ, F_{s_{n∧m},x}(ξ−)) − V(ξ, x∗) + V(ξ, x∗)‖_{Rn} A(dξ)
+ ∫_{(s_{n∨m}, t]} ‖V(ξ, F_{s_{n∧m},x}(ξ−)) − V(ξ, F_{s_{n∨m},x}(ξ−))‖_{Rn} A(dξ)
≤ L ∫_{(s_{n∧m}, s_{n∨m}]} ‖F_{s_{n∧m},x}(ξ−) − x∗‖_{Rn} A(dξ) + K∗ (A(s_{n∨m}) − A(s_{n∧m}))
+ L ∫_{(s_{n∨m}, t]} ‖F_{s_{n∧m},x}(ξ−) − F_{s_{n∨m},x}(ξ−)‖_{Rn} A(dξ). (4.34)

Note that we have

L ∫_{(s_{n∧m}, s_{n∨m}]} ‖F_{s_{n∧m},x}(ξ−) − x∗‖_{Rn} A(dξ) + K∗ (A(s_{n∨m}) − A(s_{n∧m}))
≤ (L K_F + L ‖x∗‖_{Rn} + K∗)(A(s_{n∨m}) − A(s_{n∧m})) =: K(n, m),

Page 61: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS 51

where we use (4.31) to get

maxξ∈(sn,sm]

‖Fsn,x(ξ−)‖Rn ≤ maxξ∈[sn,sm]

‖Fsn,x(ξ)‖Rn ≤ KF .

Since we have assumed t ≥ s > sm, again the Gronwall inequality yields

‖Fsn,x(t)− Fsm,x(t)‖Rn ≤ K(n,m) exp LA(T ) .

Hence, since A is càdlàg , we get that suph∈NK(n, n + h) goes to zero as n goes toinfinity, proving that indeed we have a Cauchy sequence. By completeness, the limitexists, finishing the proof.

Linear measure differential equations and MESs

We will now show how our previous discussions relate to MESs. For that, we considerthe case where V is linear, that is the exists a measurable function G : R>0 → Mn(R),such that V (t, x) = G(t)>x. Fix a function A ∈ V+

Ωand denote the induced measure on

R>0 with dA. Then,‖G‖dA,t := ess sup

dA(‖G(·)‖∞ 1·≤t),

where the norm is the operator norm wrt some fixed norm on ‖·‖Rn on Rn. By finitedimensionality, ‖·‖dA,t differs for a different choice of ‖·‖Rn only by a multiplicativeconstant since all norms in finite dimensions are equivalent. The next Lemma just statesthe obvious.

Lemma 4.51. Assume ‖G‖dA,t < ∞ for all t ≥ 0. Then, V is locally A-Lipschitz andintegrable.

Proof. Let us note that for the operator norm ‖·‖∞ we have∥∥G>∥∥∞ ≤ C ‖G‖∞ for all

matrices G ∈ Mn(R) where the constant C > 0 is independent of G. Just consider thematrix norm of the maximum of all entries of a matrix. Then under this norm, a matrixhas the same norm as it’s transposed and the rest follows from equivalence of matrixnorms due to finite dimensionality.2 Fix some arbitrary T > 0. Then ‖G‖dA,T < ∞implies the existence of a set N ⊂ (0, T ] with dA(N) = 0 s.t.

‖V (t, x)− V (t, y)‖Rn =∥∥∥G(t)>(x− y)

∥∥∥Rn≤ C ‖G(t)‖∞ ‖x− y‖Rn

≤ C ‖G‖dA,T ‖x− y‖Rn ∀t ∈ N c,

hence V is A-Lipschitz on (0, T ]. Regarding the integrability condition, we choose x∗ = 0since G(t)>x∗ = 0 for all t ≥ 0. Hence, V is A-Lipschitz and integrable on (0, T ]. SinceT was arbitrary, this finishes the proof.

2In case ‖·‖Rn is the p = 2 norm, it is easy to see that for the corresponding operator norm we have‖G‖∞ =

∥∥G>∥∥∞.

Page 62: Polynomial Semimartingales and a Deep Learning Approach to

52 CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS

Remark 4.52. Note that the assumption in the lemma above imply existence and unique-ness of (MDE:A, V ). The important implication of V being defined by a matrix-functionG is that solutions to (MDE:A, V ) obey the superposition principle. That is under theassumption in Lemma 4.51, we can denote the unique solutions of (MDE:A, V ) withFs.x(t) and we have for any x, y ∈ Rn and (s, t) ∈ T the equality

Fs,x(t) + Fs,y(t) = Fs,x+y(t).

Linearity now implies that there exists a matrix function (s, t) 7→ M(s, t), called thefundamental solution, such that

Fs,x(t) = M(s, t)x ∀x ∈ Rn and (s, t) ∈ T.

Proposition 4.53. Consider the assumptions in Lemma 4.51. Then, the fundamentalsolution for V (t, x) = G(t)>x of (MDE:A, V ) denoted by M defines a MES P throughP (s, t) = M(s, t)> and G is the extended generator of the proper MES (P, γ,A).

Proof. We have already argued the existence ofM in Remark 4.52. By Theorem 4.50, wehave that for any x ∈ Rn, the function t 7→M(s, t)x is càdlàg on [s,∞) and s 7→M(s, t)xis càdlàg on [0, t]. This is equivalent to (s, t) 7→M(s, t) being càdlàg in the sense of MESssince we are dealing with matrices. Further, since Fs,x(s) = x for all s ≥ 0, we haveM(s, t) = Idn Note that the evolution property of solution flows implies for s ≤ u ≤ t,

M(u, t)M(s, u) = M(s, t)⇒ P (s, u)P (u, t) = P (s, t).

Hence we have that P is a càdlàg MES. Next, note that we have

P (s, t)>x = M(s, t)x = Fs,x(t) = x+

∫(s,t]

G(u)>Fs,x(u−)A(du)

= x+

∫(s,t]

G(u)>M(s, u−)xA(du)

= (Idn +

∫(s,t]

G(u)>M(s, u−))xA(du)

= (Idn +

∫(s,t]

G(u)>P (s, u−)>A(du))x

= (Idn +

∫(s,t]

P (s, u−)G(u)A(du))>x.

Since this holds for all x ∈ Rn, we get the equality

P (s, t) = Idn +

∫(s,t]

P (s, u−)G(u)A(du).

This implies that for all s ≥ 0, the function t 7→ P (s, t) is of finite variation on compacts,defining a signed vector measure γs such that P (s, u−)G(u) is a version of the Radon-Nikodým derivative wrt dA of γs. Since this holds for all s ≥ 0, we get γ

∞ dA. Hence,

(P, γ,A) is a proper MES with extended generator G as claimed.

Page 63: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS 53

Remark 4.54. When discussing MESs, we saw that an important property such an MESP with is whether it becomes singular. Assume G satisfies the condition of Lemma 4.51.Let M = P> be the fundamental solution of (MDE:A, V ) for g. Since P (s, t) ∈ GLn ⇔M(s, t) ∈ GLn, we have a clear interpretation for the time point t∗ defined by

t∗ = inf t > s : P (s, t) /∈ GLn .

By Proposition 4.25 (i), P (s, t∗) /∈ GLn which implies M(s, t∗) /∈ GLn. Hence, for0 /∈ y ∈ kerM(s, t∗) we have

Fs,y(t∗) = 0 or Fs,x+y(t

∗) = Fs,x(t∗).

Hence, t∗ is the first time solutions to different initial values at starting time s merge,something that can not happen in the classical, fully regular case.

4.3.3 A Gronwall type inequality for measure differential equations

Here we just state a Gronwall type inequality applicable to our setup. As mentionedbefore, this inequality has been established in [EK09] in a slightly different way. Theproof however, is also valid for our situation and we shall only proof this result here forthe readers convenience.

The setup is as follows. As before, let A ∈ V+

Ωand denote the corresponding measure

with dA. Then, dA defines a Radon measure on (R>0,B(R>0)). Note that we havedA((0, t]) = A(t)−A(0) = A(t).

Theorem 4.55. Let u : [s, T ]→ R+ for some T ∈ R>0 ∪ ∞ and s ≥ 0. Assume u tobe càdlàg . Assume further, that there exists a non-negative function u0(t) ≥ 0 such thatu satisfies the following inequality for any T ≥ t ≥ s,

0 ≤ u(t) ≤ u0(t) +

∫(s,t]

u(ξ−)A(dξ).

Then for all t ≤ T ,

u(t) ≤ u0(t) exp (A(t)−A(s)) ≤ u0(t) exp (A(t)) (4.35)

Let us first proof the following lemma about the measure of the n-dimensional simplexwrt the product measure of dA, denoted by dA⊗n. We define the n-dimensional simplexover (s, t] to be

Ins,t := (s1, . . . , sn) ∈ (s, t]n : s < s1 < s2 < · · · < sn = t .

Page 64: Polynomial Semimartingales and a Deep Learning Approach to

54 CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS

The measure of Ins,t wrt dA⊗n is then given by

dA⊗n(Ins,t) =

∫(s,t]

∫(s,ξ1)

· · ·∫

(s,ξn−1)A(dξ)A(dξn−1) · · ·A(dξ1),

in particular it holds

dA⊗1(I1s,t) = dA(I1

s,t) =

∫(s,t]

A(dξ) = A(t)−A(s).

The following is a well established lemma for the measure of the simplex and we onlyprovide a proof for the reader’s convenience.

Lemma 4.56. For 0 ≤ s < t and n ∈ N it holds

dA⊗n(Ins,t) ≤(dA⊗1(I1

s,t))n

n!=

(A(t)−A(s))n

n!

Proof. Let Sn be the set of permutations on 1, . . . , n. Recall that |Sn| = n! and foreach σ ∈ Sn, define the sets

In,σs,t =

(s1, . . . , sn) ∈ (s, t]n : s < sσ(1) < sσ(2) < · · · < sσ(n) = t.

By symmetry we have for two arbitrary σ, σ′ ∈ Sn,

dA⊗n(In,σs,t ) = dA⊗n(In,σ′

s,t ).

Further, for all two σ 6= σ′, we have

In,σs,t ∩ In,σ′

s,t = ∅.

Using In,σs,t ⊂ (s, t]n, and dA⊗n((s, t]) = (A(t)−A(s))n, we get∑σ∈Sn

dA⊗n(In,σs,t ) ≤ dA⊗n((s, t]) = (A(t)−A(s))n,

which finally yields

dA⊗n(Ins,t) ≤(A(t)−A(s))n

n!.

We can now prove Theorem 4.55.

Page 65: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS 55

Proof of Theorem 4.55. Let us start by noting that because u is càdlàg , we have

u(t−) ≤ u0(t) +

∫(s,t)

u(ξ−)A(dξ).

Hence we can iterate the assumed inequality to get

u(t) ≤ u0(t) + u0(t)n∑k=1

dA⊗k(Iks,t) +

∫(s,t]

u(ξ−)A(dξ)(dA⊗n(Ins,t)).

Note that because u is càdlàg and so is A, the integral is finite and does not depend onn. An application of Lemma 4.56 now yields

u(t) ≤ u0(t)(1 +

n∑k=1

(A(t)−A(s))n

n!) +Rn (4.36)

withRn =

∫(s,t]

u(ξ−)A(dξ)(A(t)−A(s))n

n!.

Hence, limn→∞Rn → 0. Taking n→∞ in (4.36) yields the desired inequality (4.35).

Remark 4.57. Let us stress that in our version of the Gronwall inequality, it is crucialthat the integrand is the left limit of the function u, i.e. it appears u(ξ−) rather thenu(ξ). Otherwise, we would need to redefine the simplex to fit our needs to be

Ins,t := (s1, . . . , sn) ∈ (s, t]n : s < s1 ≤ s2 ≤ · · · ≤ sn = t .

In this case, we would however have

In,σs,t ∩ In,σ′

s,t 6= ∅,

which is why a more combinatorial discussion is necessary to bound the measure of thesimplex in dimension n by a summable sequence wrt n. While possible to derive sufficientconditions for such a sequence to exist in the form of ∆At < 1 for all t, the general caseis unclear.

Page 66: Polynomial Semimartingales and a Deep Learning Approach to

56 CHAPTER 4. MES AND MEASURE INTEGRAL EQUATIONS

Page 67: Polynomial Semimartingales and a Deep Learning Approach to

Chapter 5

Polynomials and matrix evolutionsystems

In the previous chapter we have seen how proper MESs (P, γ,A) relate to measure differ-ential equations through their extended generators. These systems will play an importantrole in the next chapter, where we will introduce polynomial processes and show thatthese are indeed special semimartingales, motivating the introduction of so called polyno-mial semimartingales, provided that a MES related to the process satisfies assumptionsof the kind we have encountered in the previous chapter. As the name suggests, mul-tivariate polynomials and mappings between spaces of such polynomials will play animportant role. An important observation the authors made in Cuchiero et al. [Cuc+12]is that when considering polynomials of at most a certain degree, finite dimensionality ofthese spaces implies that linear mappings such as conditional expectation operators canbe represented by matrices when fixing a basis of Polk. Further, the extended generatorof a process X, a mapping between function spaces, can be shown to map polynomialsto polynomials of at most the same degree and hence permit a representing matrix.

We will start by introducing some notation and preliminaries, relating polynomials withcorresponding Euclidean spaces, MSGs and MESs. Following that, we discuss the homo-geneous case as considered in [Cuc+12]. This will serve as a motivation for us to extendthe observations we make in that section to the two-parameter case in form of MESs.This extension will be presented in the last section of this chapter.

5.1 Preliminaries

As a first step, let us introduce some notation that will add clarity to the subsequentdiscussion. Throughout the rest of this chapter, let k ∈ N0 and d ∈ N. Recall that we

57

Page 68: Polynomial Semimartingales and a Deep Learning Approach to

58 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

always consider polynomials over the whole Rd, i.e. Polk = Polk(Rd) with

Polk =

x 7→k∑|l|=0

αlxl | αl ∈ R,l = (l1, . . . , ld), x

l = xl11 . . . xldd

,

where we use the multi index notation, i.e the sum is taken over all multi indices l =(l1, . . . , ld) with |l| =

∑i li ≤ k. We will usually omit the dependency wrt d since this

will not lead to any ambiguity. This includes the dimensions of the respective polynomialspaces. Hence, define Nk := dimPolk. We denote the vector space of all polynomialswith arbitrary large but finite degree with Pol =

⋃n∈N Poln.

Denote by βl = (βl1, . . . , βlNl

) a fixed basis of Poll. To keep the following discussionssimple, we make the following convention that is without loss of generality.

Convention 5.1. For the set of bases βl : l ∈ N0, it holds

(i) βi ⊂ βi+1 for 0 ≤ i ≤ i+ 1 ≤ k, and

(ii) if p ∈ βi ∩ βi+1, then p = βil = βi+1m implies m = l. Hence, βi+1 is obtained by

extending the basis βi.

Therefore we can define a basis of Pol, denoted with β = βj : j ∈ N, by βj = βij , wherei is such that j ≤ Ni. From here on we fix β and all βl.

Remark 5.2. Note that the canonical basis of Pol and Polk for k ∈ N0 introduced in thepreliminaries satisfy this condition.

To show that these conventions are without loss of generality, it will be sufficient to showthat matrix evolution systems are stable under basis transformations.

Proposition 5.3. Let T ∈ GLn.

(i) Let P be a MES. Then TPT−1 is a MES.

(ii) Let (P, γ,A) be a proper MES with generator G. Then (P , γ, A) is a proper MESwith generator G, where

P = TPT−1, γs = TγsT−1, G = TGT−1.

Proof. (i) Since P (s, s) = IdNk, we get P (s, s) = IdNk

. Further, we have for s ≤ u ≤ t,

P (s, u)P (u, t) = TP (s, u)T−1TP (u, t)T−1 = TP (s, t)T−1 = P (s, t).

Page 69: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 59

(ii) We have already shown that P is a MES. Further, since T and T−1 are constantmatrices, P is again càdlàg and of finite variation on compacts. Further, we com-pute

P (s, t)− P (s, u) = T (

∫(u,t]

γsdA

(ξ)A(dξ))T−1 =

∫(u,t]

TγsdA

(ξ)T−1A(dξ).

Hence, if γs is the signed vector measure induced by P , then γ∞ dA and dγs

dA =

T dγsdA T

−1, hence γs = TγsT−1. This shows that (P , γ, A) is a proper MES. We

further have

IdNk+

∫(s,t]

P (s, u−)G(u)A(du)

= T

∫(s,t]

P (s, u−)G(u)A(du)T−1 = TP (s, t)T−1 = P (s, t),

hence G is the extended generator of (P , γ, A).

When dealing with mappings from some polynomial space to another, we are interestedin properties such as continuity. Hence we will define a norm on Pol which induces atopology. We can then equip all subspaces Poln for n ∈ N with the trace topology. It iswell known that Pol can be identified with c00, the space of all zero sequences with allbut finitely many elements non-equal to zero. Then for p ∈ Pol, x ∈ Rd, there exists ac = (c1, . . .) ∈ c00 with

p(x) =∑i∈N

ciβi(x).

We define ‖·‖Pol : Pol→ R+ for a polynomial p with coefficients c ∈ c00 via

‖p‖Pol =∑i∈N|ci|.

Recall that the sum of the absolute values defines a norm ‖·‖c00 on c00, so the relationshipp↔ c defines an isometric isomorphism between Pol and c00 wrt. their respective norms.We denote this mapping with ι : Pol → c00, ι(p) = c. While Pol and equivalentlyc00 are not complete under these norms, their finite dimensional subspaces are as theycan be identified with Euclidean spaces equipped with the ‖·‖1 norm. For us, the mostinteresting subspaces of c00 are those, that correspond to Poll for some l ∈ N. Weformalize this by introducing two maps.

First, for i ∈ N, define the map Ti : c00 → Ri by (Tic)j = cj for j ≤ i. Second, for l ∈ N0,define prl : Pol→ Poll the following way. Let p ∈ Pol with ι(p) = c. Then

prl(p) =

Nl∑i=1

ciβi ∈ Poll.

Page 70: Polynomial Semimartingales and a Deep Learning Approach to

60 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

Pol c00

Poll RNl

prlι

TNl

ι−1

ιl

ι−1l

(a)

Pol c00

Poll RNl

ι

πl

ιl

λNl

(b)

Poll Poll

RNl RNl

τ

ιlι−1l

P

(c)

Figure 5.1: (a): Commutative diagram depicting TNl as the representing matrix of prland defining the maps ιl and ι−1

l . (b): Commutative diagram showing that πl and λNldefine the same maps under ι. (c): Commutative diagram defining the representingmatrix P .

Hence, the mappings TNl and prl define the same mappings under ι. This is shown by thecommutative diagram in Figure 5.1 (a) where the maps ιl and ι−1

l are defined by the restof the diagram, indicated by the dashed arrows. This definition is possible since prl andTNl are both surjective. Further, let πl be the inclusion map from Poll → Pol and λNlthe one from RNl → c00, i.e. in particular we have prl πl = idPoll and TNl λNl = idRNl .We then have ι πl = λNl ιl, compare Figure 5.1 (b).

We equip each Polk with the subspace norm, inducing the trace topology wrt. Pol. Sincefor v ∈ RNl , we have ‖v‖1 = ‖λNl(v)‖c00 , this shows that indeed each Polk is a completenormed space. This should be no surprise since one could identify each Polk directly witha corresponding Euclidean space. We will however be dealing with a situation where weare confronted with several subspaces of Pol simultaneously, therefore this seems to bethe most natural approach.

The following proposition states the obvious.

Proposition 5.4. The map ιl is an isometric isomorphism between (Poll, ‖·‖Pol) and(RNl , ‖·‖1) and ι−1

l is it’s inverse.

Proof. Consider Figure 5.1. Diagram (a) implies ιlprl = TNl ι and ι−1l TNl = prlι−1.

Using that ι is an isomorphism, we have ι−1l TNl ι = prl which implies prl = ι−1

l ιlprl.Since the range of prl is Poll, we get ι−1 ι = idPoll . The same way, we get ιl prl ι−1 =TNl , hence ιl ι

−1l TNl = TNl . Again we argue that since TNl has range RNl , we get

ιl ι−1l = idRNl . Let p ∈ Poll. Using that ι is isometric, we compute

‖p‖Pol = ‖πlp‖Pol = ‖(ι πl)p‖c00 = ‖(λNl ιl)p‖c00 = ‖ιlp‖1 ,

where we made use of (ι πl) = (λNl ιl) and in the last step the fact that under ourchoice of norms, the inclusion map λNl preserves the norm.

Let us now consider the finite dimensional spaces Polk and RNk and for l ≤ k, considerPoll and RNl . We have that Poll is a subspace of Polk, so an embedding is clear and givenby πl,k := prkπl. Further, the projection prk,l : Polk → Poll is defined by prk,l = prlπk.

Page 71: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 61

Further, RNl is only isometrically isomorph to Nl-dimensional subspaces of RNl and thereare several equally reasonable embeddings possible. However, in light of the conventionwe made for β, we choose to fix the embedding induced by λNl and TNk , hence we definethe embedding of RNl into RNk denoted by λNl,Nk : RNl → RNk and the projectionsTNk,Nl : RNk → RNl by

λNl,Nk := TNk λNl , TNk,Nl := TNl λNk .

As a next step, let us recall the following definitions. The function Hβk : Rd → RNk wasdefined as

x 7→ Hβk(x) = (β1(x), . . . , βNk(x))> .

Hence, for p ∈ Polk, we have

p(x) = ιk(p)>Hβk(x) ∀x ∈ Rd.

We will sometimes indicate p = ιk(p) through bold letters, if it is clear which k is meant,i.e. ιk(p) = p. Let τ : Polk → Polk. Define the map P : RNk → RNk via Figure 5.1 (c),i.e. P = ιl τ ι−1

l . Since ιl is an isomorphism, we get that P is linear if and only if τis linear. In that case, P can be identified wrt it’s representing matrix and in the sequel,we will always do so.

Let l ≤ k. For v ∈ RNl and w ∈ RNk , we write v Pol= w, if λNl(v) = λNk(w). This implies

w ∈ im(λNl,Nk) with λNl,Nkv = w. Further, we then have ∃!p ∈ Poll, such that

p(x) = v>Hβl(x) = w>Hβk(x), ∀x ∈ Rd.

Note that the last two implications are equivalent characterizations of v Pol= w. Let us

now define the projection of mappings τ : Polk → Polk to subspaces Poll for l ≤ k.

Definition 5.5. Let l ≤ k and consider τ : Polk → Polk. We define the map τ l : Poll →Poll, the projection to Poll by

τ l = prk,l τ πl,k,

compare the diagram in Figure 5.2 (a).

We can now define a key property of linear maps on Polk that will play a central role inour following discussions.

Definition 5.6. We call a mapping τ : Polk → Polk degree preserving on Polk (orjust degree preserving if it is clear which k is meant), if for all l ≤ k, with projections τ l,we have

πk τ πl,k = πl τ l,

or equivalentlyτ πl,k = πl,k τ l.

Page 72: Polynomial Semimartingales and a Deep Learning Approach to

62 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

Poll Poll

Polk Polk

πl,k

τ l

τ

prk,l

(a)

Poll RNl RNk Polk

Poll RNl c00 RNk Polk

τ l

ι−1l

P l

λNl,Nk

P

ι−1k

τ

ιl λNl λNkιk

(b)

Figure 5.2: (a): Commutative diagram defining the projection τ l of a map τ for l ≤ k.(b): Diagram showing the relation of degree preserving properties between maps andtheir respective projections.

This property can be formulated as τ(Poll) ⊂ Poll for all l ≤ k, or for all p ∈ Poll, notingthat the inclusion map satisfies πl,kp = p,

τ lp = τp.

It is clear that for degree preserving mappings τ , this property is inherited by the rep-resenting matrices of τ , denoted by P and the one fore τ l denoted by P l, see the nextproposition. We call P l the projection of P on Poll. By definition of the projection τ l,P l is given by for each

P l = TNk,NlPλNl,Nk .

Proposition 5.7. Consider τ with projection τ l to Poll and P, P l the respective repre-senting matrices. Then for any v ∈ RNl and w ∈ RNk with v Pol

= w we have

PwPol= P lv,

if and only if τ is degree preserving on Polk.

Proof. By definition of the representing matrices, P and P l are defined such that theleft and the right subdiagrams in Figure 5.2 (b) commute. If now the whole diagramcommutes, then this implies that both, τ preserves the degree and Pw Pol

= P lv, compare inparticular with . We show either condition implies that that diagram commutes. Assumenow τ is degree preserving, hence πk τ πl,k = πl πl,k τ l. Since πi = ι−1 λNi ιi fori ∈ l, k, this yields

ι−1 λNk ιk τ πl,k = ι−1 λNl ιl τl ⇒ λNk ιk τ πl,k = λNl ιl τ

l,

since ι is isomorph. Further, since πl,k = ι−1k λNl,Nk ιl, this yields using ιl ι

−1l = idNl ,

λNk ιk τ ι−1k λNl,Nk = λNl ιl τl ι

−1l ,

Page 73: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 63

implying that the diagram commutes. The rest is more direct. Assume Pw Pol= P lv for

all w = λNl,Nkv. Then this implies

λNl Pl = λNk P λNl,Nk ,

hence the diagram commutes. This finishes the proof.

In the sequel, we write P = ιk(τ) to indicate that P is the representing matrix of thelinear map τ : Polk → Polk. By linearity we then have for all p ∈ Polk,

(τp)(x) = ιk(τp)>Hβk(x) = (P ιk(p))

>Hβk(x) = (P p)>Hβk(x),

where Pp is the matrix-vector product. The following definition will simplify notationin the future.

Definition 5.8. For an index set I ∈ R+,R>0,T, and a map τ : I×Polk → Polk with(t, p) 7→ τtp, we call τ pointwise linear on Polk if each τt is a linear map on Polk fort ∈ I. For P : I → MNk(R), we write ιk(τ) = P if for all t ∈ I, ιk(τt) = P (t). Finally,we call τ degree preserving on Polk, if it is pointwise degree preserving on Polk.

Remark 5.9. Note that by bijectivity of ιk, one can conversely define an operator τ :I × Polk → Polk through a matrix function P : I → MNk(R) by P = ιk(τ).

5.2 Degree preserving properties of time-homogeneouspolynomial processes

Let us continue our discussions in the context of the results presented in Cuchiero et al.[Cuc+12], in particular we consider the index set I = R+. The authors consider a Markovprocess (X, (Px)x∈E) with state space E such that there exists a semigroup τ = (τt)t∈R+

on Polk (which by definition of semigroups is pointwise linear) such that for f ∈ Polk,

Ex [f(Xt)] = τtf(x),

where the expectation is taken under the measure Px and t 7→ τtf(x) is right continuousat t = 0 for all polynomials f ∈ Polk and all x ∈ E. Unlike our case, they considerpolynomials over a closed subset of Rd rather then Rd itself, namely the state space E.For our following discussions however, we assume polynomials over all of Rd.

Setting P = ιk(τ), the continuity assumption implies that P is a regular MSG, c.f.Proposition 4.2. By Proposition 4.2, P has an infinitesimal generator G with P (t) =exp(tG). For the readers convenience, let us prove these points.

Proposition 5.10. Given a pointwise linear τ indexed by R+, let P = ιk(τ). If τsatisfies the right continuity assumption at t = 0 made above, then P is a regular MSG.

Page 74: Polynomial Semimartingales and a Deep Learning Approach to

64 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

Proof. As mentioned, we only need to show limt↓0 P (t) = P (0) = IdNk . Since τ is asemigroup, τ0 = idPolk , hence P (0) = IdNk . For arbitrary polynomial p ∈ Polk, we haveby the semigroup property of τ that τs+tp = τsτtp, hence P (s+t)p = P (s)P (t)p. Sincep was arbitrary, we get P (s)P (t) = P (s+ t). Further,

limt↓0

τtf(x) = f(x)

by assumption. This implies

limt↓0

(P (t)f)>Hβk(x)− f>Hβk(x) = f>(

limt↓0

(P (t)− IdNk)

)>Hβk(x) = 0,

for all f ∈ RNk , x ∈ Rd, which implies the claim. Otherwise, there is a vector g 6= 0such that g>Hβk(x) = 0 for all x which is only possible for the zero polynomial that weexcluded with g 6= 0.

The next lemma shows that projections inherit the finite variation on compacts property.We consider the case I = R+, but the proof is valid for I = T if one fixes the firstcoordinate s of (s, t) ∈ T.

Lemma 5.11. Assume P = ιk(τ). The following two points are equivalent.

(i) The function t 7→ P (t) is of finite variation on compacts in each component,

(ii) for all l ≤ k, the projections P l = ιl(τl) satisfy t 7→ P l(t) is of finite variation on

compacts in each component.

Proof. As a first step, let us show that the finite variation property of P is equivalent tot 7→ τtf(x) is of finite variation for all f ∈ Polk and x ∈ Rd fixed. Note that we have

t 7→ τtp(x) = (P (t)p)>Hβk(x) =

Nk∑i=1

αi(t)βi(x) =

Nk∑i=1

τtβi(x),

for some functions αi. Hence if P satisfies the finite variation property, than so mustt 7→ τtf(x) since in the second part, only P depends on t. For the converse, assumet 7→ τtp(x) satisfies the finite variation property for all p and x while P does not. Butthat implies that there exists a vector p ∈ RNk such that at least for one i, αi is notof finite variation on compacts if x is fixed. That implies t 7→ τtβi(x) is not of finitevariation on compacts which is a contradiction since βi ∈ Polk.

Considering (ii)⇒ (i), note that this is clear since we can choose l = k, hence P k = P .We now show (i)⇒ (ii). We have by the first arguments we made that kt 7→ τf (x) is offinite variation on compacts for all f ∈ Polk, x ∈ Rd. Further, we have

τ ltf = prk,l τt πl,k.

Since prk,l and the inclusion πl,k do not depend on t, this implies t 7→ τ ltf(x) is offinite variation on compacts for all f ∈ Poll, x ∈ Rd. This in return implies by our firstarguments (ii).

Page 75: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 65

Note that a matrix G ∈ MNk(R) defines an operator G : Polk → Polk through (note wedo not differentiate between a linear map on RNk and the representing matrix)

Gf = ι−1k G ιk, Gf(x) = (Gf)>Hβk(x).

The authors in [Cuc+12] show that the generator G defines a map G, that coincides withthe extended generator of the process X, a concept we will present in the next chapter.A key observation is that if τt is degree preserving on Polk for all t ≥ 0, an assumptionthat is part of the definition of polynomial processes, compare [Cuc+12, Remark 2.2 (ii)],then so is G. In light of Proposition 5.7, this is equivalent to the preserving property ofthe representing matrices. For illustration, we want to rephrase the condition w Pol

= v forw ∈ RNk , v ∈ RNl . Denote by ek = ek1, . . . , ekNk the canonical standard basis of RNk .Define El = spanei : i ≤ Nl. We then have

El = w ∈ RNk : ∃v ∈ RNl s.t. w Pol= v.

The property that G is degree preserving than is equivalent to Gw ∈ El for all w ∈ Eland all l ≤ k, hence we make the following definition.

Definition 5.12. We call a matrix G ∈ MNk(R) degree preserving on Polk, if for alll ≤ k, Gw ∈ El for all w ∈ El.

Hence we have the following corollary to Proposition 5.7.

Corollary 5.13. Consider a linear mapping G : Polk → Polk with G = ιk(G). Then Gis degree preserving on Polk if and only if G is degree preserving on Polk.

In the classical homogeneous case where G can be expressed as the limit

G = limt↓0

P (t)− IdNkt

,

it follows directly that G(El) ⊂ (El) and by that G is preserving since P (t)(El) ⊂ (El)for all t and IdNk(El) = El combined with the fact that the finite dimensional subspacesEl are closed. This extends to the inhomogeneous case in which the infinitesimal gener-ator can be expressed as a limit and it is the approach that is implicitly taken in C. A.Hurtado [CAH17]. This approach will not be available to us when dealing with polyno-mial semimartingales. There is however another approach to showing that G inherits thepreserving property and it is the one the authors took in [Cuc+12].

Under the assumption that P is degree preserving, each projection P l defines a MSG.Since the projections τ l are right continuous if τ is right continuous, Proposition 5.10implies that each P l is regular with infinitesimal generator Gl. It is then possible toshow that the infinitesimal generator G of P can be represented by a sum of maps,defined by each Gl implying that G is indeed degree preserving. For us the situationpresented in the next section is different as it is not clear that a projected MES has anextended generator if the original one does. We will however be able to show that underour most general assumption that implies the existence of an extended generator (c.f.Theorem 4.35), there exists a version of the generator that preserves the degree.

Page 76: Polynomial Semimartingales and a Deep Learning Approach to

66 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

5.3 Degree preserving properties of matrix evolution sys-tems and extended generators

Let us now turn to the inhomogeneous setting as it is relates to our setup, hence we havethe index set I = T. Consider τ : T× Polk → Polk. We call τ càdlàg , if s 7→ τs,tf(x) iscàdlàg on [0, t] and t 7→ τs,tf(x) is càdlàg on [t,∞) for all f ∈ Polk and x ∈ Rd. DefineP = ιk(τ). We call τ an evolution system on Polk if P is a MES in dimension Nk. It isclear that τ is càdlàg if and only if P is càdlàg , since

ιk(τs,tf) = P (s, t)ιk(f), (5.1)

noting that the components of the left hand site are the coefficients of τs,tf wrt. thebasis βk. The next definition extends the degree preserving property to matrix functionssuch as MESs and extended generators. Denote with I ∈ R>0,T an index set.

Definition 5.14. We call a function T : I → MNk(R) degree preserving on Polk, iffor any i ∈ I, T (i) is degree preserving. Hence, a mapping τ : I × Polk → Polk withιk(τ) = T is degree preserving, if and only if T is degree preserving.

Let us continue our discussions with the following lemma.

Lemma 5.15. Let l ≤ k and consider a degree preserving operator τ : Polk → Polk withprojections τ l : Poll → Poll. Then τ is bijective, if and only if τ l is bijective for all l ≤ k.

Proof. The one direction, that is the if and only if part, is clear since τk = τ . Assumenow τ is bijective. Let p ∈ Poll. Then the degree preserving property implies τlp = τp,hence (τ l)−1 := prk,l τ−1 πl,k satisfies (τ l)−1τ l = τ l(τ l)−1 = idPoll .

The next is an important implication as we will need it later.

Corollary 5.16. Let l ≤ k and assume τ : T × Polk → Polk has representing matricesP = ιk(τ). Let τ l : T×Poll → Poll be the pointwise defined projections with representingmatrices P l = ιl(τ

l). If τ is degree preserving, then

t : P (t−, t) /∈ GLNk =⋃l≤k

t : P l(t−, t) /∈ GLNl

.

We now show that under the assumption that τ is degree preserving, the projectionsdefine evolution systems on the respective subspaces.

Proposition 5.17. Consider an evolution system on Polk with P = ιk(τ). If τ is degreepreserving on Polk, then we have for the projections τ l with l ≤ k that they are anevolution system on Poll and ιl(τ

l) = P l is a MES in dimension Nl. Further, if Pbelongs to a proper MES (P, γ,A), then there exist families of signed vector measuresγl ∈ MNl(R) for each l such that (P l, γl, A) is a proper MES.

Page 77: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 67

Proof. Fix s ≤ u ≤ t. If τ is degree preserving, then τ ls,tp = τs,tp for all p ∈ Poll.Since τs,s = idPolk , this implies τ ls,t = idPoll . This implies P l(s, s) = IdNl . Further, weget prk,l τ lu,tp = τ lu,tp. Hence, τ ls,u τ lu,tp = τs,u τu,tp = τs,tp = τ ls,tp which impliesP l(s, u)P l(u, t) = P l(s, t). Therefore, P l is a MES in dimension Nl and by that τ l anevolution system on Poll.

Assume now (P, γ,A) is a proper evolution system. By Lemma 5.11, this implies that forall l ≤ k, the functions t 7→ P (s, t) are of finite variation on compacts for all fixed s ≤ t.Since P is a càdlàg MES, so is τ by Equation (5.1). Since τ l = prk,l τ πl,k, and neitherprk,l nore πl,k depend on s or t, τ l is càdlàg which again by (5.1) implies P l is càdlàg .Here we stress the point that ιl is an isomorphism and hence v : ∃f ∈ Poll s.t. ιl(f) =v = RNl .

Let now f ∈ Polk and p ∈ Poll such that p = prk,lf . Note that prk,l is surjective. ByProposition 5.7, we have

Pιk(πl,k prk,lf)Pol= P lιl(prk,lf). (5.2)

Recall that TNk,Nl is the representing matrix of prk,l and λNl,Nk is the embedding of RNlinto RNk , i.e. we have for all f ∈ Polk the equality ιk(πl,k prk,lf) = λNl,NkTNk,Nlιk(f).Hence we get

PλNl,NkTNk,Nlιk(f)Pol= P lιl(prk,lf).

This in return yields

TNk,NlPλNl,NkTNk,Nlιk(f) = P lιl(prk,lf).

By surjectivity of prk,l, we conclude

TNk,NlPλNl,Nk = P l,

as one would expect from (5.2). To keep the notation simply, let us define F = TNk,Nland F = λNl,Nk . Since (P, γ,A) is a proper MES, we have

P l(s, t)− P l(s, u) = FP (s, t)F − FP (s, u)F

= F

∫(u,t]

dγsdA

(u)A(du)F =

∫(u,t]

FdγsdA

(u)FA(du).

Hence, the family γl defined by P l through the already established finite variation prop-erty has densities dγls

dA = F dγsdA F which implies γl

∞ dA implying that (P l, γl, A) is a

proper MES as claimed. Since l ≤ k was arbitrary, this finishes the proof.

In the following, we call (P l, γl, A) the projection of (P, γ,A) on Poll. We can nowformulate our main result that gives sufficient conditions such that all projections of(P, γ,A) on Poll have an extended generator, allowing the construction of a versionof the extended generator G of (P, γ,A) that preserves the degree on Polk. The nextTheorem can therefore be seen as an improvement to Theorem 4.35.

Page 78: Polynomial Semimartingales and a Deep Learning Approach to

68 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

Theorem 5.18. Given a proper MES (P, γ,A), assume Condition 4.26 (ii) holds (or(i) which implies (ii)). Assume P preserves the degree of Polk. Then there exists anextended generator G that is unique dA-a.e. outside a countable subset of R+ and thatpreserves the degree on Polk.

Proof. For k = 0, the statement is trivial since then there are no true subspaces of lowerpolynomial degree, hence in the sequel, let us assume k > 0. Since P preserves thedegree, so does τ by Corollary 5.13. By Corollary 5.16, Condition 4.26 implies that wehave for any l ≤ k, that

t : P l(t−, t) /∈ GLNl ,∆At = 0

has dA measure zero for the proper MESs (P l, γl, A) that exist by Proposition 5.17.Hence, Theorem 4.35 implies the existence of an extended generator Gl for each l ≤ k.Define the operator Gl : R>0 × Poll → Poll, by G = ιl(G), c.f. Definition 5.8, hence

(u, p) 7→ Glup(·) =(Gl(u)ιl(p)

)>Hβl(·).

Note that we use two notations, one where Gl is a map that takes arguments fromR>0 × Polk and once where Gl is a family of operators Glu defined pointwise for each uby all Gl(u) for each l ≤ k. This should not lead to any confusion but will allow us tokeep the notation simple while defining the same operators.

Let p ∈ Polk. Define now for l ≤ k, κl : Polk → Poll with κ0p = prk,0 p and for 0 < l ≤ k,define

κl p = prk,l p− πl−1,l prk,l−1 p.

Hence for any p ∈ Poll, we have κl πl,k p = 0 for l > l. Next, define G# : R>0 ×Polk →Polk by

(u, p) 7→ G#u p =

k∑l=0

πl,k Glu κl p.

By construction, G# preserves the degree on Polk pointwise for all u. Further, G# ispointwise linear since it is the sum of pointwise linear maps on Polk. Hence, for each uit defines a matrix G#(u). In other words, we have G# = ιk(G#) and by Corollary 5.13,G# is degree preserving. We now show that G# is a version of the extended generatorof (P, γ,A). Let τ be defined by ιk(τ) = P and denote with τ l and P l the respectiveprojections. Since we know that P, P l are càdlàg , Equation (5.1) implies that theexpression τs,u−f and τ ls,u−g are well defined for all f ∈ Polk and g ∈ Poll. Further, notethat since P preserves the degree by assumption, hence Corollary 5.13 implies that sodoes τ .

Note that we want to show

P (s, t) = IdNk+

∫(s,t]

P (s, u−)G#(u)A(du),

Page 79: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 69

which is equivalent to

P (s, t)w = w +

∫(s,t]

P (s, u−)G#(u)A(du)w

= w +

∫(s,t]

P (s, u−)G#(u)wA(du), ∀w ∈ RNk .(5.3)

Since we have equipped the finite dimensional space Polk with a norm under which itis complete, the integral is well defined in a Bochner sense. Further, since ιk preservesthe metric between Polk and RNk , we get for all pointwise linear (and measurable) c :R+ × Polk → Polk, with ιk(ct) = C(t),∫

γcξfA(dξ) = ι−1

k ∫γC(ξ)ιk(f)A(dξ), γ ∈ B(R+).

Hence, (5.3) is equivalent to showing

τs,tf = f +

∫(s,t]

τs,u− G#u fA(du), ∀f ∈ Polk.

As a first step, note that since τ is degree preserving, and càdlàg , the operator τs,u− :Polk → Polk, given by

τs,u−f = limε↓0

τs,u−εf

is also degree preserving. This is due to the fact that for arbitrary sequence un ↑ u, thedegree preserving property of τ implies τs,unf ∈ Poll for all f ∈ Poll. Since Poll ⊂ Polkis a closed subset, this implies that the limit is in Poll too. We continue by computing

f +

∫(s,t]

τs,u− G#u fA(du) = f +

k∑l=0

∫(s,t]

τs,u− πl,k Glu κl fA(du). (5.4)

Since Glu κlf ∈ Poll , the degree preserving property of τ implies τs,u− πl,k = πl,k τ ls,u−above yielding that (5.4) equals

f +k∑l=0

πl,k ∫

(s,t]τ ls,u− Glu κl fA(du)

= f +k∑l=0

πl,k ι−1l

(∫(s,t]

P l(s, u−)Gl(u)A(du)ιl(κlf)

)

= f +k∑l=0

πl,k ι−1l

(P l(s, t)ιl(κlf)− IdNl

ιl(κlf))

= f +k∑l=0

πl,k ι−1l

(P l(s, t)ιl(κlf)

)−

k∑l=0

πl,k ι−1l (IdNl

ιl(κlf))

= f +

k∑l=0

πl,k ι−1l

(P l(s, t)ιl(κlf)

)−

k∑l=0

πl,k κlf

(5.5)

Page 80: Polynomial Semimartingales and a Deep Learning Approach to

70 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

Noting that for all f ∈ Polk, the definition of the κl implies∑k

l=1 πl,k κlf = f , we getthat (5.5) equals

k∑l=0

πl,k ι−1l

(P l(s, t)ιl(κlf)

)=

k∑l=0

πl,k τ ls,t κlf,

where we use that τ ls,t is represented by P l(s, t) through ιl. Using once again that τ isdegree preserving, we get

k∑l=0

πl,k τ ls,t κlf =

k∑l=0

τs,t πl,k κlf = τs,t k∑l=0

πl,k κlf = τs,tf.

Because τ is represented by P , this finally yields

P (s, t) = IdNk+

∫(s,t]

P (s, u−)G#(u)A(du),

hence G# is a version of the extended generator of (P, γ,A). As this generator is uniquedA-a.e. outside a countable set by Theorem 4.35, we get G#(u) = Gk(u) dA-a.e. outsidea countable set. This finishes the proof.

Let us now consider the reverse direction, i.e. if G : R>0 → MNk(R) is degree preservingon Polk, is then the MES defined through the fundamental solution of (MDE:A, V ) alsodegree preserving? Let us make the convention that for a matrix function G indexed bysome set i, AG is defined pointwise by A(i)G(i), i ∈ I, where in case A is a constantmatrix, it is interpreted as a constant function over the index set of G. We have seenbefore that if a MES P was degree preserving, then so were it’s projections P l, whichare representing matrices of the projections τ l wrt ιl. Further, recall that we have

τ l = prk,l τ πl,k,

which in terms of the representations is equivalent to

P l = TNk,NlPλNl,Nk .

By definition, τ preserves the degree on Polk, if for all l ≤ k,

τ πl,k = πl,k τ l.

Since prk,l is surjective on Poll, this gives the equivalent characterization

τ πl,k prk,l = πl,k τ l prk,l,

hence by definition of τ l, for all l ≤ k

τ πl,k prk,l = πl,k prk,l τ πl,k prk,l.

Page 81: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS 71

By Corollary 5.13, we know that τ is degree preserving if and only if P is degree pre-serving. Hence, P is degree preserving if and only if

PλNl,NkTNk,Nl = λNl,NkTNk,NlPλNl,NkTNk,Nl . (5.6)

Alternatively, P is degree preserving by definition, provided for all l ≤ k,

Pw ∈ El, ∀w ∈ El.

Let Ll ∈ MNk(R) be the diagonal matrix with elements (Ll)i,j = 1i=j1i≤Nl. Giventhat El is the subspace of RNk that contains precisely those elements that are identifiedunder ιk with polynomials of at most degree l, a closer look shows λNl,NkTNk,Nl = Ll andsince Llw = w for all w ∈ E, we get P is degree preserving, if and only if for all l ≤ k,

LlPLlw = PLlw, ∀w ∈ RNk ⇔ LlPLl = PLl

which is the same as (5.6). Denote further with Lcl the diagonal matrix with Lcl =IdNk

−Ll. Note that by surjectivity of Ll on El and imLl = El, we have for a matrixM ∈ MNk(R),

LlM = 0⇔ f>M = 0, ∀f ∈ El. (5.7)

Let us now proof the following theorem.

Theorem 5.19. Let G : R>0 → MNk(R) and A ∈ V+

Ω. Consider (MDE:A, V ) for

V (t, x) = G(t)>x and assume ‖G‖dA,t < ∞ for all t ≥ 0. Then there exists a properMES (P, γ,A) for which G is an extended generator. If G preserves the degree on Polk,then so does P .

Proof. The first statement, that is the existence of a proper MES (P, γ,A) is the state-ment of Proposition 4.53. Let M be the fundamental solution, i.e. M = P> and recallthat we have uniqueness of solutions to (MDE:A, V ). We first show

f>M(s, t)Ll = f>LlM(s, t), ∀f ∈ El. (5.8)

Let w ∈ RNk be arbitrary. Then, since M is the fundamental solution, we have

LlM(s, t)w = Llw +

∫(s,t]

LlG(u)>M(s, u−)wA(du).

Since Ll is a diagonal matrix we have Ll = L>l . Further, since by assumption G is degreepreserving, we have LlGLl = GLl which implies LlG(u)> = LlG(u)>Ll, yielding

LlM(s, t)w = Llw + Ll

∫(s,t]

G(u)>LlM(s, u−)wA(du).

Page 82: Polynomial Semimartingales and a Deep Learning Approach to

72 CHAPTER 5. POLYNOMIALS AND MATRIX EVOLUTION SYSTEMS

In the same way, M(s, t)Llw is the unique solution with initial value Llw, hence we have

M(s, t)Llw = Llw +

∫(s,t]

G(u)>M(s, u−)LlwA(du).

Define g(s, t) = LlM(s, t) − M(s, t)Ll, hence since M(s, s) = IdNk, g(s, s) = 0. We

compute for arbitrary w ∈ RNk , using LlG(u)>Ll = LlG(u)> by the degree preservingassumption,

g(s, t)w =

∫(s,t]

LlG(u)>M(s, u−)wA(du)−∫

(s,t]G(u)>M(s, u−)LlwA(du)

= Ll

∫(s,t]

G(u)>LlM(s, u−)wA(du)−∫

(s,t]G(u)>M(s, u−)LlwA(du)

=

∫(s,t]

G(u)>g(s, u−)wA(du)− Lcl∫

(s,t]G(u)>LlM(s, u−)wA(du)

Since we have LlLcl = 0 and using again LlG(u)>Ll = LlG(u)>, this gives us

Llg(s, t)w = Ll

∫(s,t]

G(u)>g(s, u−)wA(du)

=

∫(s,t]

(G(u)Ll)>Llg(s, u−)wA(du).

Since ‖Ll‖ <∞, we have ‖GLl‖dA,t ≤ ‖G‖dA,t ‖Ll‖ <∞ for all t ≥ 0. This implies thatLlg(s, t)w is the unique solution to (MDE:A, V ) for V (t, x) = (G(u)Ll)

>x with initialvalue 0, hence Llg(s, t)w ≡ 0 for all w ∈ RNk , implying Llg(s, t) = 0. This impliesf>g(s, t) = 0 for all f ∈ El, which in turn implies (5.8). Hence we get

LlP (s, t)f = P (s, t)f, ∀f ∈ El,

implying P preserves the degree on Polk since l ≤ k was arbitrary.

Page 83: Polynomial Semimartingales and a Deep Learning Approach to

Chapter 6

Polynomial Semimartingales

In this chapter we will introduce and discuss the core subject of this part of this thesis,polynomial processes and polynomial semimartingales. As discussed in the intro-duction, it is our goal to combine the two approaches taken in [FL16] by Filipović andLarsson and [Cuc+12] by Cuchiero et al. In particular, we want to extend the class ofpolynomial processes, see Chapter 1, to include those processes that map polynomialsof a certain degree to polynomials of at most the same degree while allowing for timeinhomogeneity and stochastic discontinuity.

While Cuchiero et al. started with a Markov process and hence with a semigroup thatallows for a straight forward definition of the polynomial property, Filipović and Larssonstart by considering diffusion processes with coefficients satisfying a polynomial conditionwhich is equivalent to assuming an invariance property of the extended generator of thediffusion. This then is shown to imply a polynomial property in the spirit of [Cuc+12]under natural integrability assumption. Hence the starting point of the discussions isa semimartingale setting. One key observations the authors make in [FL16] is thatprocesses as they introduce them do not necessarily need to be Markov process, or to beprecise they do not assume that it is always possible to embed their class of processes ina Markov process.

Our approach relates to the two above in the following way. We start our discussions byconsidering a càdlàg process and define the polynomial property in a natural way, henceunder a fixed stochastic basis similar to [FL16]. This however will lead to technicalchallenges that would not exist if one would start with a semigroup (Pt)t≥0 such that forany x ∈ E with E being the state space and f ∈ Polk one has

Ptf(x) := Ex [f(Xt)] , ∀x ∈ E,

which defines a function on all of E. In our situation however, we do not want to apriory fix a state space E since we neither consider a Markov-process, nor do we want to

73

Page 84: Polynomial Semimartingales and a Deep Learning Approach to

74 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

deal with time varying supports. Hence we consider processes taking values in all of Rdinstead of a subset E. In that case however, we face the issue that there might be twog, g′ ∈ Polk with g 6= g′ but "agree" on the support of the conditioning value, meaningone has the a.s. identity

E [f(Xt) | Xs] = g(Xs) = g′(Xs).

In the setting of [Cuc+12], this can not happen since polynomials are considered overE, hence one would have the identification g = g′. The goal however is to construct aMarkov-semigroup-like structure defined on polynomials that allows for similar discus-sions as was done in [Cuc+12]. It turns out that the concept needed for these discussionsare the matrix evolution systems introduced in Chapter 4. As we will see, the regularityassumptions we made for MESs there are well suited to study our class of polynomialprocesses allowing for stochastic discontinuity.

We will also see that our processes are semimartingales on our fixed stochastic basis simi-lar to [FL16] and unlike [Cuc+12], where semimartingality is defined for each probabilitymeasure Px, x ∈ E and hence with Llaw(X0) = X0#P = δx. Hence, we will introduceour class of processes as polynomial processes, but in light of the semimartingality resultrefer to them finally as polynomial semimartingales.

Recall that with ‖·‖Rd we denote any norm on Rd which is reasonable as by equivalenceof all norms in Rd, the statements in this chapter will be independent of the concretechoice unless otherwise stated. We will sometimes pick a concrete norm since it willsimplify notation without losing us generality. For that, recall that the p-norms on Rmfor p ≥ 1 are defined by

‖x‖pp =m∑i=1

|xi|p.

Also recall, that by the conventions we made in the preliminaries, we define the poly-nomials fi, i ∈ 1, . . . , d on Rd as the projections to the i-th coordinate, i.e. forx = (x1, . . . , xd)

> ∈ Rd we have fi(x) = xi.

We will further use the notation introduced in the previous chapter. Further, Con-vention 5.1 holds to simplify notation. Note that this is w.l.o.g. since the followingdiscussions hold for arbitrary choice of a basis of the spaces Polk, k ∈ N unless explicitlyindicated otherwise.

In the sequel, whenever we consider a stochastic basis (Ω,F , P ), we assume that theusual conditions hold regardless of whether we explicitly say so or not. Further, let usfix the notation F = (Ft)t≥0.

6.1 Setting and the polynomial property

Consider the stochastic basis (Ω,F ,F, P ) satisfying the usual conditions and let X bean Rd-valued adapted càdlàg process on this basis. Recall that with Polk we always

Page 85: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 75

mean Polk(Rd). Recall that for each k ∈ N0, we defined Nk := dimPolk. If there is noambiguity, we will write N for Nk, hence omit the degree k.

We start with a lemma regarding integrability assumptions we will make later on.

Lemma 6.1. Given a random variable Y ∈ Rd and a positive integer k ∈ N, the followingintegrability conditions are equivalent:

(i) E [|f(Y )|] <∞ for all f ∈ Polk

(ii) E[‖Y ‖kRd

]<∞ for all choices of ‖·‖Rd .

Proof. (i) ⇒ (ii): First note that in (ii), by the equivalence of all p-norms on Rd, itis always sufficient to to show the integrability condition for p = 1. Recall that fromJensen’s inequality for sums, we have for ai ∈ R, i = 1, . . . , l, l ∈ N,(

l∑i=1

ai

)k≤

(l

l∑i=1

1

lai

)k= lk

l∑i=1

1

l(ai)

k = lk−1l∑

i=1

(ai)k. (6.1)

Denote by fki (x) = xki . Hence fki ∈ Polk for all i = 1, . . . , d. We have

E[‖X‖k1

]= E

( d∑i=1

|Xi|

)k (6.1)≤ dk−1

d∑i=1

E[|Xi|k

]. (6.2)

Since |Xi|k =∣∣fki (X)

∣∣, we get (ii) for p = 1 and hence for all p. We now show the otherdirection.

(ii)⇒ (i): First note that we have |Xi| ≤ ‖X‖1. Further, for any f ∈ Polk, there existsa constant C depending on f such that

|f(X)| ≤ C(1 +d∑i=1

|Xi|k) ≤ C(1 + d · ‖X‖k1). (6.3)

Hence (i) follows from (ii) for p = 1 and we are done.

A direct consequence of the previous lemma is the following lemma.

Lemma 6.2. Let X be a stochastic process and k ∈ N a positive integer. Then thefollowing two integrability conditions are equivalent:

(i) for all f ∈ Polk and t > 0 we have

sups≤tE [|f(Xs)|] <∞

Page 86: Polynomial Semimartingales and a Deep Learning Approach to

76 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

(ii) for all p ≥ 1 we have

sups≤tE[‖Xs‖kp

]<∞

Proof. (i)⇒ (ii): We have seen from the proof of Lemma 6.1 that

sups≤tE[‖Xs‖k1

]≤ dk−1

d∑i=1

sups≤tE[∣∣∣fki (Xs)

∣∣∣] .Since by assumption each summand is finite and we sum over finitely many values, thefirst direction is proven.

(ii)⇒ (i): Again from the proof of Lemma 6.1, we have

sups≤tE [|f(Xs)|] ≤ sup

s≤tE[C(1 + d ‖Xs‖k1

]= C + d sup

s≤tE[‖Xs‖k1

],

which finishes the proof.

As motivation for Definition 6.4 further down, let us state the following well known resultunder the assumption of square integrability, see [JP12, Theorem 23.3].

Theorem 6.3. Let (Ω,F , P ) be a probability space with sub σ-algebra A ⊂ F . LetY ∈ L2 (Ω,F , P ) with the usual P -a.s. identification. If for some random variable Z wehave σ(Z) = A, then there exists a Borel measurable function f such that

E [Y | A] = f(Z) [P ].

The idea is now the following. Given a càdlàg process X, set Y = ϕ(Xt) and Z = Xs

for s ≤ t, where ϕ : Rd → R and require that if ϕ belongs to a certain class of functions,then so does f . In the context of polynomial processes, this class of functions shall bethe space of multivariate polynomials up to a given degree.

Definition 6.4. We say the process X has the k-polynomial property, if for all l ∈0, . . . , k, f ∈ Poll we have E

[‖Xt‖k1

]<∞ ∀ t ≥ 0 and for any 0 ≤ s ≤ t there exists a

qfs,t ∈ Poll such that we have

E [f(Xt) | Fs] = qfs,t(Xs) [P ]. (6.4)

Finally, if X is has the k-polynomial property for all k ∈ N, we say it has the polynomialproperty.

Page 87: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 77

Remark 6.5. (i) Since F = (Ft)t≥0 is a filtration, we have due to Equation (6.4) andthe tower property,

E [f(Xt) | Xs] = E [E [f(Xt) | Fs] | Xs] = E[qfs,t(Xs) | Xs

]= qfs,t(Xs). (6.5)

Hence, a process with the k-polynomial property satisfies the Markov property on thefinite dimensional space Polk rather on the space of bounded measurable functions,compare e.g. [EK09, Chapter 4.1].

(ii) Note that in Definition 6.4, property (6.4) holds for all degrees l ≤ k, hence thepolynomial qfs,t has at most the same degree as f . This is similar to Cuchieroet al. [Cuc+12] where the same condition is required and the reason given there,c.f. Remark 2.2 (ii). For us too, this degree preserving property is needed forTheorem 6.44 and Theorem 6.57 that we too consider the most important from anapplication point of view. The reader will notice that this property corresponds todegree preserving properties of maps τ : T × Polk → Polk and indeed we will seethat this precisely why we make the definition in this way.

Note that in Definition 6.4, we do not require qfs,t to be unique. This has several conse-quences which we want to discuss now.

Remark 6.6. First, note that we consider polynomials over the whole space Rd, so whiletwo polynomials f, f ′ might satisfy f(Xs) = f ′(Xs) a.s., they do not need to be the same.Just consider the following example where Z is standard normal and X := (Z,Z). Henced = 2. However, the support of X is a submanifold of R2, and for f(x, y) = x andf ′(x, y) = y we have f(X) = f ′(X) a.s. while f 6= f ′. This implies that in general, themap f 7→ qfs,t above is not well defined as a map from Poll to Poll.

Let τ : T×Polk → Polk with (s, t, f) 7→ τs,tf such that for any (s, t) ∈ T and l = 0, . . . , k,τs,t(Poll) ⊂ Poll and

E [f(Xt) | Fs] = (τs,tf)(Xs) [P ]. (6.6)

Note that by Remark 6.6, there might be more than one such function τ . Further, atleast one such function always exists by defining it pointwise for any (s, t) ∈ T throughthe axiom of choice. A construction without invoking the axiom of choice by finitedimensionality of Polk is actually possible by the next remark.

Remark 6.7. One might think that ∀(s, t) ∈ T, the map τs,t is linear (or τ is pointwiselinear) due to the linearity of the conditional expectation. However, by Remark 6.6, τs,tis not unique. This is the reason why a priory we do not have linearity. In Example 6.11below, we provide an example in which τs,t for fixed (s, t) ∈ T is not linear. However, byfixing an arbitrary basis β of Polk, one can always construct a pointwise linear τ from τin the following way. For f ∈ Polk, let ~α ∈ RNk s.t.

f(x) =

Nk∑i=1

αiβi(x) ∀x ∈ Rd. (6.7)

Page 88: Polynomial Semimartingales and a Deep Learning Approach to

78 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Then, we define τs,t : Polk → Polk by

τs,tf =

Nk∑i=1

αiτs,t(βi). (6.8)

Then, the linearity of the conditional expectation operator implies

τs,tf(Xs) = τs,tf(Xs) [P ].

Further, τ is linear by construction, which finishes the overall construction. Note thatthe construction of τ in (6.8) only required τ to be defined on a finite set, namely thebasis. Hence, τ can be constructed without an explicit invoking of the axiom of choice.

Remark 6.8. Let us note that the tower property of the conditional expectation wouldindicate that a map τ would satisfy an evolution property, i.e.

τs,u τu,tf = τs,tf. (6.9)

Again, this is in general not true, even if τ is pointwise linear. We provide a counterex-ample with Example 6.11. Even the condition τs,s = id would not need to be satisfied inlight of Remark 6.6.

Remark 6.9. As one expects, the above points regarding linearity and the evolutionproperty are rendered mute if one assumes τ to be unique. In that case, linearity andthe evolution property follow by Proposition 6.13. Unlike the case for linearity however,we can not always assume existence of a τ satisfying Equation (6.9) in the general case.Let us make the following observation. If we were to consider a discrete time model, i.e

(n,m) ∈ N20 : n ≤ m

rather then T, we can construct a τ that is pointwise linear and

satisfies Equation (6.9) by first constructing a pointwise linear version, again denoted byτ and then defining for each n ∈ N0

τn,n := idPolk and τn,n+1 := τn,n+1.

Finally, we extend τ to the whole parameter set by defining

τn,n+m := τn,n+1 . . . τn+m−1,n+m.

Then, by construction, τ satisfies Equation (6.9) and is pointwise linear. This con-struction would not work in the continuous parameter case. There however, providedan infinitesimal description of the evolution “τs,t → τs,t+ε” is available in form of whatwe will introduce as an extended generator, it is possible to show existence of such a τ ,compare Theorem 6.41.

The following proposition is just a summary of the previous remarks regarding existenceof a linear map τ .

Page 89: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 79

Proposition 6.10. Let X have the k-polynomial property. Then there exists a mapτ : T × Polk → Polk such that for each (s, t) ∈ T, Equation (6.6) is satisfied and τs,t islinear on Polk.

Let us now give the example we promised in the remarks above regarding nonlinearityand violation of the evolution property.

Example 6.11. Let X be an R-valued Markov process with Xt ∈ −1, 1∀t ≥ 0 definedby P (X0 = −1) = h = 1 − P (X0 = 1) for some h ∈ [0, 1] and Xt = Xbtc such that weonly need to specify Xn for n ∈ N and τn,m for n,m ∈ N with n ≤ m. For that, we define

P (Xn+1 = −1 | Xn) =

p ∈ [0, 1], if Xn = −1

p ∈ [0, 1], if Xn = 1.

For n ∈ N0, we define τn,nf(0) = f(0) which satisfies Equation (6.6). Further, we have

E[f(1)(Xn+1) | Xn

]=

1− 2p, if Xn = −1

1− 2p, if Xn = 1.

Hence τ must satisfy

τn−1,nf(1)(−1) = 1− 2p

τn−1,nf(1)(1) = 1− 2p,(6.10)

and it is clear that there is always exactly one polynomial g1 ∈ Pol1 that satisfies (6.10)so we define τn−1,nf(1) = g1. Next, note that

E[f(2)(Xn+1) | Xn

]=

1, if Xn = −1

1, if Xn = 1.

Hence τ must satisfy

τn−1,nf(2)(−1) = 1

τn−1,nf(2)(1) = 1,(6.11)

and there are infinitely many polynomials g2 ∈ Pol2 such that (6.11) is satisfied. Letus set p = 1

2 and p = 34 . Then g1(x) = −x

4 −14 and one possible choice for g2 is

g2(x) = f(2)(x) = x2. Next, consider the polynomial f = f(2) − f(1). Again, we have

E [f(Xn+1) | Xn] =

2p = 1, if Xn = −1

2p = 32 , if Xn = 1.

Hence τ must satisfy

τn−1,nf(−1) = 1

τn−1,nf(1) =3

2.

(6.12)

Page 90: Polynomial Semimartingales and a Deep Learning Approach to

80 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Hence, for g(x) := 54 + x

4 , we can set τn−1,nf = g and (6.12) is satisfied. Another choicecould have been g2 − g1, compare Equation (6.8), but our choice is valid as it maps to apolynomial that satisfies Equation (6.4). But since g 6= g2 − g1, we see that τ is not apointwise linear map as linearity would require g = g2 − g1.

Example 6.12. Next we show that the evolution property can be violated. So far, wehave defined τ only for the indices (n − 1, n). Let us change our example above toτn−1,nf = τn−1,nf(2) − τn−1,nf(1). In fact, since (f(0), f(1), f(2)) forms a basis of Pol2,let us extend τn−1,n to all of Pol2 as we did in Equations (6.7) and (6.8). Hence, nowfor each n, τn,n and τn−1,n are linear maps. Note that we only need to extend τ to theparameters n < m with m− n > 1 in order to have τ fully specified. This could be doneas proposed in Remark 6.9 and by construction, the evolution property would be satisfied.This construction however is not the only choice possible as we show in the following.Consider m = n+ 2. We have τn,n+1f(2) = f(2). This fixed point property combined withthe tower property yields

E[f(2)(Xn+2) | Xn

]= E

[τn+1,n+2f(2)(Xn+1) | Xn

]= E

[f(2)(Xn+1) | Xn

]= E [1 | Xn] = 1 [P ],

hence we can set τn,n+2f(2) = f(0) together with τn,n+2f(i) = τn,n+1 τn+1,n+2f(i) fori = 0, 1 and extend to all of Polk as we did in Equations (6.7) and (6.8). But then

f(0) = τn,n+2f(2) 6= τn,n+1 τn+1,n+2f(2) = f(2),

so the evolution property (6.9) does not hold, which finishes our example.

In the example above, we have used the non-uniqueness in Equation (6.4) to constructnonlinearity and a violation of the evolution property. The following proposition showsthat in case of uniqueness, both properties would be implied.

Proposition 6.13. Let X have the k-polynomial property and assume for all 0 ≤ t andf ∈ Polk, the polynomial qfs,t ∈ Polk for which Equation (6.4) is satisfied is unique.Then, the mapping τ is uniquely defined, pointwise linear and satisfies the evolutionproperty together with τs,s = idPolk for all s ≥ 0. Further, for all (s, t) ∈ T we haveτs,t(Poll) ⊂ Poll for all l ∈ 0, . . . , k and τs,t is bijective for all (s, t) ∈ T.

Proof. Uniqueness of τ is implied by the uniqueness of qfs,t. Note that by the definitionof the k-polynomial property we can deduce deg(qfs,t) ≤ deg(f). Further, uniquenessimplies that for t ≥ 0 and p, q ∈ Polk, p(Xt) = q(Xt)P -a.s. implies p = q. Hencelinearity follows from linearity of conditional expectation. The evolution property nowcomes from the tower property, again using the implication p = q if p(Xt) = q(Xt)P -a.s.,yielding

τs,u τu,tf(Xs) = E [τu,tf(Xu) | Fs]= E [E [f(Xt) | Fu] | Fs] = E [f(Xt) | Fs] = τs,tf(Xs) [P ],

Page 91: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 81

and hence τs,uτu,t = τs,t. The property that τ is pointwise bijective is just a rephrasing ofexistence and uniqueness of qfs,t for all f ∈ Polk. Finally, note that τs,sf(Xs) = f(Xs) P -a.s. which implies τs,s = idPolk .

In the previous chapter we have called mappings τ evolution systems if the correspondingrepresenting matrices form a MES. It is clear that this definition can be formulated with-out invoking representing matrices. Hence we introduce the next equivalent definition.Additionally, we define what it means if a mapping τ belongs to a process X.

Definition 6.14. We call a mapping τ , that is pointwise linear, satisfies the evolutionproperty with τs,s = idPolk ∀s ≥ 0 an evolution system on Polk. If in addition, it mapsfor fixed (s, t) ∈ T a polynomial f ∈ Polk to a polynomial of at most the same degree (seeDefinition 6.4) such that τs,tf satisfies (6.6), we call it an evolution system of X onPolk.

Remark 6.15. Note that by the definition above, if τ is an evolution system of a k-polynomial process X on Polk, then it is automatically degree preserving on Polk.

From Proposition 6.13 we know that a sufficient condition for τ to be linear and satisfythe evolution property is uniqueness of the polynomials qfs,t in Equation (6.4). The nextcondition will be sufficient for such a uniqueness.

Condition 6.16. We say that X satisfies the full support condition, if there is aD ⊂ Rd such that it holds

D ⊂⋂s≥0

supp(Xs) (6.13)

with D 6= ∅.

As noted in Remark 6.5 (i), k-polynomial processes satisfy the Markov-property on Polkinstead of bE(Rd) as one often requires (compare e.g. Blumenthal and Getoor [BG07,Definition 1.1 and Theorem 1.3 (iii)]). Indeed it does not follow that equation (6.4)implies that X can be embedded in a Markov process. Therefore we can not assumethe existence of an infinite dimensional evolution system on bE(Rd), compare Böttcher[Böt14, Section 2], as is usually the starting point when studying Markov processes.

The next lemma shows that if Condition 6.16 holds, thenX has a unique evolution systemon Polk for all k ∈ N0. In particular, by the definition of the polynomial property, theevolution system will be degree preserving.

Lemma 6.17. Let X satisfy the polynomial property and assume that the full supportcondition is satisfied. Then there is a mapping τ : T× Pol→ Pol such that τ |Polk is theunique evolution system for X on Polk for all k ∈ N.

Page 92: Polynomial Semimartingales and a Deep Learning Approach to

82 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Proof. By Proposition 6.13, it is sufficient to show that for all all s ≥ 0, the implication

p(Xs) = q(Xs) [P ] for p, q ∈ Pol⇒ p = q,

holds. Hence assume p(Xs) = q(Xs) for fixed s ≥ 0. Note that it suffices to show thatp = q on a dense subset of an open ball in Rd since that uniquely determines a polynomialon Rd.

Since p(Xs) = q(Xs), there exists a set J ⊂ Ω, such that P (J ) = 1 and

p(Xs(ω)) = q(Xs(ω)) for all ω ∈ J . (6.14)

Set I = Xs(J ) so that we have P (Xs ∈ I) = 1. Hence, since we have

D ⊂ supp(Xs) =⋂C

C : C ⊂ Rd closed , P (Xs ∈ C) = 1

, (6.15)

we get that D ⊂ I. Note that (6.14) is equivalent to

p(x) = q(x) ∀x ∈ I. (6.16)

By the full support condition, there exists an non-empty open ball B ⊂ Rd with B ⊂D ⊂ I. As I is dense in I, we have that I ∩B is dense in B. Combined with (6.16), thisfinishes the proof for uniqueness of the polynomial in (6.4). Hence, by Proposition 6.13for each k ∈ N there is a unique τk which is an evolution system on Polk for X and forall (s, t) ∈ T, m ∈ N0,

τk+ms,t |Polk = τks,t,

defining a mapping τ : T×Pol→ Pol with the desired properties which finishes the proofof this Lemma.

Remark 6.18. The full support condition at s = 0 might appear rather restrictive. Justthink of the prime example of a process with the polynomial property in d = 1, namelyBrownian motion (c.f. C. A. Hurtado [CAH17] or Filipović and Larsson [FL16]).Indeed, if one considers that Brownian motion B starting at some x ∈ R, then the fullsupport condition is violated in s = 0. This however is an a priory restriction in oursetup. Instead, one should think of Brownian motion as a process on our stochastic basiswith initial value B0, where B0 is a F0 measurable random variable. Then, it is very wellpossible to have B0 be distributed by an initial distribution that satisfies the full supportcondition since F0 does not need to be the trivial σ-algebra. The situation where B startsin x is then recovered by conditioning on B0 = x which is a measurable set since byour initial assumptions, the filtration is complete.

In the following, for a process X with the k-polynomial property, we will indicate that amapping τ is an evolution system on Polk for X by writing

XPolk−→ τ.

Page 93: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 83

Definition 6.19. We call a pair (X, τ) k-polynomial process for an adapted càdlàgprocess X and a mapping τ : T × Polk → Polk, if X has the k-polynomial property andX

Polk−→ τ .

In the sequel we will not assume uniqueness of τ as it is sufficient for our purposes to justhave one degree preserving evolution system available. However, the example furtherdown shows that even under perfect regularity conditions for X such as continuity andboundedness, it is then possible that the càdlàg property of X is not inherited by τ .Recall that in the last chapter we equipped Pol with the topology induced by the norm‖·‖Pol and the subspaces Polk are equipped with the subspace norm. In particular we getthat the topology induced on Polk coincides with the trace topology. Recall the followingdefinition.

Definition 6.20. We call a mapping τ : T × Polk → Polk càdlàg , if for any f ∈ Polkand x ∈ Rd,

(i) the function t 7→ τs,tf(x) is càdlàg on [s,∞) and

(ii) the function s 7→ τs,tf(x) is càdlàg on [0, t].

This definition is compatible with our topology on Polk as demonstrated by the nextproposition.

Proposition 6.21. Consider a mapping τ : R+ × Polk → Polk and let tn be a sequencein R+ converging to t. Then

limn→∞

‖τtnf − τtf‖Pol = 0 (6.17)

if and only if for all x ∈ Rd,

limn→∞

|τtnf(x)− τtf(x)| = 0. (6.18)

Proof. Note that since τtn , τt map polynomials to polynomials of at most degree k, wehave

τtnf(x)− τtf(x) = ((τtn − τt)f)(x) =

Nk∑i=1

αn,iβi(x),

where for each i, (αn,i)n is a sequence in R. Note that by the definition of ‖·‖Pol, (6.17)is equivalent to αn,i being a zero sequence in n for each n. Hence, if (6.17) holds, then(6.18) follows.

For the other direction, note that since we are dealing with polynomials, we get that theleft hand side converges uniformly to zero on all compact sets, including B1(0). But thatimplies that (τtn − τf )f converges to the zero polynomial with respect to the supremum-norm on B1(0), hence all coefficients αn,i for all i must be zero sequences in n, implying(6.17).

Page 94: Polynomial Semimartingales and a Deep Learning Approach to

84 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Let us now give an example, showing that τ can loose the càdlàg property of X.

Example 6.22. Consider the deterministic process defined below and the basis β =(β0, β1, β2) of Pol1(R2), also defined below. We shall use the notation z = (x, y) ∈ R2.

Xt =

(0t

), β0(x, y) = 1, β1(x, y) = x, β2(x, y) = y.

Since X is deterministic, we have for any f ∈ Pol1,

E [f(Xt) | Fs] = f(Xt).

Since we require the evolution system τ of X on Pol1 to be pointwise linear, it is enoughto specify τs,tβi for i = 0, 1, 2. The polynomial property implies

1 = β0(Xt) = c00(s, t)β0(Xs)

0 = β1(Xt) = c10(s, t)β0(Xs) + c11(s, t)β1(Xs) + c12(s, t)β2(Xs)

t = β2(Xt) = c20(s, t)β0(Xs) + c21(s, t)β1(Xs) + c22(s, t)β2(Xs),

where the functions cij : T→ R must be chosen so that τ satisfies the evolution propertywith τs,s = idPol1. Using β1(Xt) ≡ 0 and β2(Xt) = t for all t ≥ 0, we see that thespecification c00 = c11 = c22 = 1, c10 = c12 = c21 = 0 and c20(s, t) = t − s satisfiesthe above equations and defines τ to be an evolution system on Pol1 as one sees byobserving τs,tβi = βi for i = 0, 1 which yields the evolution property for the first two basispolynomials. For the third, consider for 0 ≤ s ≤ u ≤ t

τu,tβ2 = (t− u)β0 + β2 ⇒ τs,u τu,tβ2 = (t− u)τs,uβ0 + τs,uβ2 = t− s,

which shows τ is indeed an evolution system of X on Pol1. However, a closer look atthe necessary conditions on the functions cij above shows that due to β1(Xt) = 0 for allt ≥ 0, one can actually change the definition of c11 by setting c11(s, t) = ρ(s, t), where ρis an arbitrary MES in dimension d = 1. Hence,

τs,tβ1(x, y) = ρ(s, t)β1(x, y) = ρ(s, t)x.

Then τ is still an evolution system of X on Pol1. Consider now Example 4.12 and defineρ as in that example. As we have already seen, this defines an evolution system which isnot càdlàg in s = 1 for t = 1. Hence, τ is not a càdlàg evolution system.

Motivated by the example above, we will make regularity assumptions in the next sectionthat allow us to consider MESs that satisfy regularity properties suitable for our followingdiscussions.

Page 95: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 85

6.2 Regular polynomial processes and representations in fi-nite dimensions

The goal of this section is to make use of the fact that linear maps from Polk 7→ Polkcan be represented as matrix-vector operations as we have seen in the previous chapter.We will see that the study of an evolution system on Polk of a k-polynomial process(X, τ) can be reduced to the study of matrix evolution systems (MESs) as introduced inChapter 4.

Recall that for all polynomials f ∈ Polk, we have

f(x) = ιk(f)>Hβk(x) = f>Hβk(x), ∀x ∈ Rd.

Note that we indicate f = ιk(f) by using bold letters. In the sequel we could use any basisof Polk, but that would not gain us any generality, hence we keep our notation simpleby keeping the conventions and notations we made in the previous chapter regarding thebases βl, l ∈ N0, unless otherwise specified.

Consider a k-polynomial process (X, τ). A function P : T → MN (R) is the uniquerepresentation of τ wrt the vector space isomorphism ιk, denoted by an abuse of notationas ιk(τ) = P , if for all (s, t) ∈ T and f ∈ Polk we have

(τs,tf)(x) = (ιk(τs,tf))>Hβk(x) = (P (s, t)f)>Hβk(x), ∀x ∈ Rd.

We have seen in the previous chapter that since τ is an evolution system on Polk, Pis a MES on MN (R). Note that τ has the additional property τs,t(Poll) ⊂ Poll forl ∈ 0, . . . , k, which is inherited by P in the sense that it each P (s, t) has upper blocktriangular form since the basis βk under the conventions of the previous chapter satisfies

deg(βi) < deg(βj)⇒ i < j,

hence τ is degree preserving on Polk in the terminology introduced in the previous chapterand equivalently so is P . In that case the blocks correspond to the indices belonging tothose basis functions with the same degree, hence there are k + 1 blocks. Equivalentlyto P = ιk(τ) we will sometimes write τ = ι−1

k (P ) to indicate that τ is defined by P andnot the other way around.

Definition 6.23. We call (X,P ) k-polynomial process, if P is a MES on MN (R)and for τ = ι−1

k (P ) we have that (X, τ) is a k-polynomial process.

Since we have fixed the basis of Polk for all k, it is always clear which isomorphism ιkis meant. We will now define regularity properties for k-polynomial processes throughregularity of P as a MES.

Page 96: Polynomial Semimartingales and a Deep Learning Approach to

86 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Definition 6.24. We call a k-polynomial process (X, τ) (resp. (X,P )) càdlàg , if P isa càdlàg MES (which is equivalent to saying τ is càdlàg ). If P is even regular, we callthe first and the latter regular. Further, we call (X,P, γ,A) a proper k-polynomialprocess, if (X,P ) is a k-polynomial process such that (P, γ,A) is a proper MES onMNk(R).

Remark 6.25. Recall that for a proper MES (P, γ,A), by definition we have that P iscàdlàg . Hence, if (X,P, γ,A) is a proper k-polynomial process (wrt. a fixed basis β),then (X,P ) is in particular a càdlàg k-polynomial process and by that also (X, τ) forτ = ι−1

k (P ).

We will later see that under these regularity conditions and additional assumptions, anotion of extended generator for X similar to that for Markov processes can be repre-sented on Polk precisely by the generator of a proper MES coming from a k-polynomialprocess X. This will be the topic of the next section.

6.3 Extended generators for k-polynomial processes

In this section we want to extend the concept of infinitesimal description by means ofextended generators of a stochastic process to our setting. As noted in the introduction,the extended generator plays a central role in e.g. Cuchiero et al. [Cuc+12] and Filipovićet al. [Fil+17]. So far, the study of finite dimensional polynomial processes has beenunder regularity assumptions that imply stochastic continuity as a consequence of theMarkov inequality.

In the existing literature for polynomial processes, extended generators have been definedas mappings G (or as a collection of mappings (Gs)0≤s, c.f. C. A. Hurtado [CAH17])that map functions f coming from a certain domain to functions Gsf such that

f(Xt)− f(Xs)−∫ t

s(Guf)(Xu)du ∈Mloc.

Notice that the integral is wrt the Lebesgue measure, which requires that f(X) is quasileft continuous. This implies certain properties of f(X), compare Jacod and Shiryaev[JS13, Propositions 2.26 and 2.9]. Hence, the only polynomials that lie in the domain ofthe extended generator in case of stochastic discontinuity, or ∆Xt 6= 0, are the constantpolynomials. We however want to extend the class of polynomial processes to includeprocesses with deterministic jump-times in the spirit of Keller-Ressel et al. [KR+18]where this was done for the affine case. This requires a more general notion of anextended generator, we will introduce it further down and ask the reader to indulge us,will make it necessary for us to be able to deal with expressions of the form

E [f(Xt−) | Fs] rather then E [f(Xt) | Fs] ,

Page 97: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 87

where f is a polynomial with deg(f) ≤ k and (X, τ) a k-polynomial process. In case ofquasi-left continuity, i.e. ∆Xt = 0, both expressions coincide a.s., but when this is notthe case, we can so far only compute expressions as on the right side using τ . assumethat (X, τ) is a càdlàg k-polynomial process. Then, if we assume that we can apply thedominated convergence theorem to the limit below, we get for s < t using continuity ofpolynomials that

τs,t−f(Xs) = limε↓0

τs,t−εf(Xs) = limε↓0E [f(Xt−ε) | Fs] = E [f(Xt−) | Fs] [P ], (6.19)

where the limit exists by our càdlàg assumption and the right hand side implicitly bythe dominated convergence theorem. This is precisely what we need. Alternatively,the generalized dominated convergence theorem (see Theorem 3.3) could be used toobtain the same result using the following well known fact about convergence of randomvariables.

Proposition 6.26. Let (Xn) be a sequence of random variables such that Xn → Y a.s.

and XnL1→ Y ′. Then Y ′ = Y [P ].

Therefore, the property we consider is uniform integrability for the sets

Hft = f(Xs) : 0 ≤ s ≤ t ,

for all t ≥ 0 and f ∈ Polk. This motivates the following definition.

Definition 6.27. We call a càdlàg k-polynomial process (X, τ) uniformly integrable,if for each t ≥ 0 and f 3 Polk, the set Hft is uniformly integrable. Similarly, we callproper k-polynomial process (X,P, γ,A) uniformly integrable, if (X, τ) is uniformlyintegrable for τ = ι−1

β (P ).

Remark 6.28. Note that for k-polynomial processes, we have

E[∥∥Hβk(Xt)

∥∥RNk

]<∞ ∀t ≥ 0, (6.20)

since by definition E [|βi(Xt)|] <∞ for all i ≤ Nk.

The usual conditions of our stochastic basis guarantee that

Fs =⋂ε>0

Fs+ε.

Unlike intersections however, we do not have that unions of σ-fields are again σ-fields.Hence we define the left limit of our filtration the following way,

Fs− := σ(⋃ε>0

Fs−ε). (6.21)

Page 98: Polynomial Semimartingales and a Deep Learning Approach to

88 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Lemma 6.29. Let (X, τ) be a càdlàg k-polynomial process. Then the following holds forany f ∈ Polk:

(i) For s < t and (X, τ) uniformly integrable,

E [f(Xt−)|Fs] = τs,t−f(Xs) [P ]. (6.22)

(ii) For 0 < s ≤ t,E [f(Xt)|Fs−] = τs−,tf(Xs−) [P ]. (6.23)

(iii) For s < t and (X, τ) uniformly integrable, τs−,t−(Xs−) exists and

E [f(Xt−)|Fs−] = τs−,t−f(Xs−) [P ]. (6.24)

Proof. (i) Note that the limit τs,t−f exists by the càdlàg property for all f ∈ Polk,hence the limit below exists

limε↓0

τs,t−εf(Xs) = τs,t−f(Xs) [P ].

Next, note that by assumption we have that HfT is uniformly integrable for some

T > t. Since f is continuous and X a càdlàg stochastic process, we have f(Xt−ε)[P ]→

f(Xt−) as ε ↓ 0. Hence, by Theorem 3.3, we get f(Xt−ε)L1→ f(Xt−) as ε ↓ 0. By

Lemma 3.5, we have E [|f(Xt−)|] < ∞, hence E [f(Xt−) | Fs] is well defined. ByJensen’s inequality and the tower property we get

limε↓0E [ |E [f(Xt−ε) | Fs]− E [f(Xt−) | Fs] | ]

= limε↓0E [ |E [f(Xt−ε)− f(Xt−) | Fs] | ]

≤ limε↓0E [ | f(Xt−ε)− f(Xt−) | ] = 0,

(6.25)

which showsE [f(Xt−ε) | Fs]

L1−→ E [f(Xt−) | Fs] as ε ↓ 0.

Putting these points together, by page 87 we conclude

τs,t−f(Xs) = E [f(Xt−) | Fs] [P ].

(ii) Let (sn) be an increasing sequence with sn ↑ s. Define

Yn := E [f(Xt) | Fsn ] .

Then, (Yn) forms a martingale wrt. the filtration (Fsn)n≥1 and since f(Xt) ∈L1(P,Ω,F) by the k-polynomial property, Lévy’s upwards convergence theorem

Page 99: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 89

(c.f. Kallenberg [Kal06, Theorem 7.23]) yields limn→∞ Yn = E [f(Xt) | Fs−] =:Y∞ [P ]. Further, we have

Yn = τsn,tf(Xsn).

Fix an arbitrary basis β of Polk and with P = ι−1β (τ). Then a.s. we have

Yn = (P (sn, t)f)>Hβ(Xsn).

By our càdlàg assumptions, P (sn, t) → P (s−, t) and Hβ(Xsn) → Hβ(Xs−) [P ].Hence,

Y∞ = (P (s−, t)f)>Hβ(Xs−),

and since ιβ(τs−,tf) = P (s−, t)f for all f ∈ Polk, we have

τs−,tf(Xs−) = Y∞ = E [f(Xt) | Fs−] .

(iii) We need to show for sn ↑ s and tm ↑ t, that

limn→∞

limm→∞

τsn,tm = limm→∞

limn→∞

τsn,tm .

Fix a basis β of Polk. Then for P = ιβ(τ), this is equivalent to the limits beingequal for P (sn, tm) instead of τsn,tm . Since P is a càdlàg MES, this follows fromLemma 4.22. We now get by (ii) that

limm→∞

limn→∞

E [f(Xtm) | Fsn ] = limm→∞

E [f(Xtm) | Fs−] ,

and the rhs limit exist by the same arguments as in (i). Combining this with theprevious result, we get just as we did in (ii),

E [f(Xt−) | Fs−] = τs−,t−f(Xs−) [P ]. (6.26)

Finally, we have

limn→∞

limm→∞

E [f(Xtm) | Fsn ] = limn→∞

E [f(Xt−) | Fsn ] ,

and by the same Lévy upward convergence theorem argument as in (ii), we get thatthis limit exists and is a.s. equal to the rhs of (6.26), which finishes the proof.

The next definition will play the central role in our discussions of polynomial processes.In particular, it will allow us to show a tight connection between the extended generatorof a stochastic process that we will introduce further down and the extended generatorof a MES in the context of k-polynomial processes.

Recall the following definition from Chapter 4. Let µ be a measure on R>0 and G :R>0 → Mn(R) for some n ∈ N. We define the running essential supremum of G wrt µ as

‖G‖µ,t := ess supµ

(‖G(·)‖∞ 1·≤t),

where the norm is the operator norm wrt some norm on Rn. Note that a condition of theform ‖G‖µ,t <∞ is independent of the concrete choice of norm that defines the operatornorm.

Page 100: Polynomial Semimartingales and a Deep Learning Approach to

90 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Definition 6.30 (bounded k-polynomial process). We call a collection (X,P,A,G) abounded k-polynomial process, if there exists a family of measures γ such that(X,P, γ,A) is a proper k-polynomial process and G is an extended generator of the MESP such that for all t ≥ 0 we have ‖G‖dA,t <∞, where dA is the measure induced by theincreasing càdlàg function A.

We now introduce the concept of extended generators for stochastic processes as donein Çinlar et al. [Çin+80]. The original introduction in this generality can be found inKunita [Kun69, Definition 3.1].

Definition 6.31. Fix a stochastic basis (Ω,F ,F, P ). Let X be an adapted càdlàg processon this basis and let A ∈ V+. Denote by D the set of all f ∈ E(Rd) for which there existsa function g ∈ E(R>0 × Rd), such that

Mft = f(Xt)− f(X0)−

∫(0,t]

g(u,Xu−)dAu ∈Mloc. (6.27)

We call a mapping G that maps f 7→ g a pre-version of the extended A-generator ofthe process X with domain DX,A.

By definition, the domain is independent of the choice of a pre-version of the extendedA-generator.

Note that in our definition we speak of pre-versions of an extended generator. Let usdiscuss this. First, we only talk about a version of an extended generator, since for twofunctions g, g′ such that g(u, ·) = g′(u, ·) outside a dA-zero set we have that if f 7→ gsatisfies (6.27), then so does f 7→ g′. Hence we can at best expect uniqueness outsidedA-zero sets. Further, we call it a pre-version, since just as in the discussion of the mapsτs,t, we can not in general expect linearity of a map G since it is well possible that twofunctions g 6= g′ exist and yet the two processes u 7→ g(u,Xu−) and u 7→ g′(u,Xu−) areequal up to evanescence. As we did with τs,t however, we can always construct a linearmodification on Polk. This is demonstrated in the next proposition.

Proposition 6.32. Assume the setting in Definition 6.31. Assume for each f ∈ Polkthere exists a function g ∈ E(R>0×Rd) such that (6.27) holds. Then, there exists a linearmap G : Polk → B(R>0 × Rd) such that G is the restriction on Polk of a pre-version ofthe extended A-generator of X and Polk ⊂ DX,A.

Proof. The statement Polk ⊂ DX,A is part of the assumption. We write N = Nk. Fixthe basis βk = β1, . . . , βN of Polk. By assumption, for each i = 1, . . . , N , there existsa gi ∈ B(R>0 × Rd) such that (6.27) holds. Hence, define G(βi) := gi. We now extendthe definition of G to all of Polk. To that extend, let f ∈ Polk be arbitrary with

f =

N∑i=1

ciβi.

Page 101: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 91

Then we define

G(f) =

N∑i=1

ciG(βi).

By construction, G is a linear mapping. It is further a pre-version of the extended A-generator of X restricted to Polk due to linearity of the Lebesgue integral and the factthat local martingales are defined as the localization of uniformly integrable martingaleswhich by Protter [Pro05, Chapter I Theorem 18] yields that the finite sum of localmartingales is again a local martingale.

Remark 6.33. Note that if Mft is a local martingale, it follows that

Mfs,t = f(Xt)− f(Xs)−

∫(s,t]

(Gf)(u,Xu−)dAu (6.28)

is a local martingale with respect to the filtration Fs = (Ft)t≥s, since Mfs,t = Mf

t −Mfs

and the sum of two local martingales is again a local martingale.

Remark 6.34. Let us stress that by our definition, the extended A-generator does notneed to be unique, even modulo agreeing outside a dA-zero set. This is because we chooseas function domains Rd regardless of the support of the underlying process. If how-ever, one would make the identification of processes that are indistinguishable, one wouldget uniqueness modulo agreeing outside dA-zero sets since local martingales are specialsemimartingales. Hence they have a unique (up to indistinguishability) canonical decom-position into local martingale part and finite variation part. This viewpoint is taken e.g.in Çinlar et al. [Çin+80], where the extended A-generator is a mapping between processesinstead of deterministic functions. They also consider the case where A is itself a randomprocess in the context of extended random generators. For our discussion however, wewill only consider the case A ∈ V+

Ω. We finish this remark by noting that the reason we

take our approach is that it will allow for a simplified discussion. Otherwise, we wouldhave to deal with situations where we identify two polynomials at a time point t but notat a different time point t′ since the processes we consider can have a time dependentsupport.

Remark 6.35. Note that DX,A is a linear space since the finite linear combination oflocal martingales is a local martingale. If one allows for the axiom of choice (and theequivalent lemma of Zorn), one can construct a linear pre-version of the extended A-generator the same way we did for the finite dimensional case in Proposition 6.32. Indeed,by Zorn’s lemma, there exists a vector space basis of DX,A. By the axiom of choice, wecan define a mapping on the basis as we did in the finite dimensional setting. Then, oneextends canonically to the linear hull, noting that an arbitrary element of DX,A has afinite representation as a linear combination, extending the local martingale property tothe extended map. The existence of a linear pre-version leads to the following definition.

Definition 6.36. We call a pre-version of the extended A-generator G a version of theextended A-generator (or in short a version of GX,A), if it is a linear map between

Page 102: Polynomial Semimartingales and a Deep Learning Approach to

92 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

domain and range. If it is linear as a restriction on a subspace S, we say G|S is a versionof the extended A-generator on S or in short G|S is a version of GX,A on S. If G is onlydefined on S, then we say G is a version of GX,A on S.

Recall the notation from the previous chapter, where for a matrix function G : R>0 →MNk(R), we define an operator G : R>0 × Polk → Polk pointwise through

(u, f) 7→ Guf(·) = (G(u)ιk(f))>Hβk(·).

This was denoted by ιk(G) = G. Since we will be using matrix functions such as G todefine versions GX,A on Polk, it will be convenient to interpret a mapping G defined byG as a mapping G : Polk → E(R>0 × Rd), by

Gf(u, x) = Guf(x), u ∈ R>0, x ∈ Rd.

In particular, we get that for each u ∈ R>0 fixed, we have

Gf(u, ·) = Guf(·) = (G(u)ιk(f))>Hβk(·) ∈ Polk.

We define the function space Polk(R>0×Rd) as the set of all functions g : R>0×Rd → Rsuch that for fixed u ∈ R>0, g(u, ·) ∈ Polk and for all x ∈ Rd, u 7→ g(u, x) is Borelmeasurable. In the sequel, we write G ιk→ G to indicate that G is defined by G as amapping G : Polk → Polk(R>0×Rd). An important observation is that by the definitionof G, if G is degree preserving on Polk, then for any l ≤ k, u ∈ R>0 we have Gf(u, ·) ∈ Pollwhenever f ∈ Poll.

Note that for all functions G : Polk → Polk(R>0×Rd), which are linear for the pointwisesum on Polk and Polk(R>0×Rd), there exists a measurable function G : R>0 → MNk(R)

such that G defines G as above, i.e. G ιk→ G. To see that, note that by linearity we onlyneed to look at the basis polynomials of Polk. Hence, there exist functions aij : R>0 → Rfor i, j ∈ 1, . . . , Nk with

(Gβi)(u, x) =

Nk∑j=1

aij(u)βj(x), u ∈ R>0, x ∈ Rd.

Note that ιk(βi) ∈ RNk is the vector with the i-th coordinate being one and the rest zero.Hence the elements ifG(u) are given by aij(u) = (G(u))ij . The measurability claim is nowobvious since all βi are linear independent continuous and hence measurable functionsand the product Borel σ-algebra is the Borel σ algebra wrt the product topology ofR>0 × Rd. Hence, G is Borel measurable if and only if G is Borel measurable.

The next lemma shows that we get L1 boundedness for the setsHft of a stochastic processif the extended A-generator has a certain polynomial structure. The idea of the proofis to use the Gronwall inequality in the spirit of the proof of Cuchiero et al. [Cuc+12,Theorem 2.10].

Page 103: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 93

Lemma 6.37. Let k ≥ 0 be an even number, βk = β1, . . . , βNk our fixed basis of Polkand A ∈ V+

Ω. Let X be a càdlàg adapted process such that Polk ⊂ DX,A. Assume there

exists a measurable function

G : R>0 → MNk(R) with ‖G‖dA,t <∞ ∀t ≥ 0,

such that the map G : Polk → Polk(R>0 × Rd) defined by G ιk→ G, i.e.

f 7→ Gf(u, x) = (G(u)f)>Hβk(x), (u, x) ∈ R>0 × Rd,

is a version of GX,A on Polk. Then for any f ∈ Polk and T > 0,

(i) there exists a constant KfT , depending on f and T such that

ess supu∈[0,T ], dA

|Gf(u, x)| ≤ KfTF (x),

for a polynomial F ∈ Polk that does not depend on the choice f and T .

(ii) If E[‖X0‖kRd

]<∞, the sets HfT are L1-bounded, i.e. for all T ≥ 0,

sup0≤t≤T

E [|f(Xt)|] <∞ ∀f ∈ Polk.

(iii) For all f ∈ Polk, the process g defined via gt =∫

(0,t] Gf(u,Xu−)A(du) is càdlàg,adapted, predictable and belongs to V.

Proof. Note that the case k = 0 is trivial since then all polynomial in Polk are constant.We set N = Nk. Recall fi(x) = xi. Hence, we have fki ∈ Polk. Let us first show (i). Asin [Cuc+12], define the polynomial F ∈ Polk by

0 ≤ F (x) := 1 +d∑i=1

fi(x)k.

For a polynomial f ∈ Polk we define the function |f | by

f(x) =N∑i=1

ciβi(x)⇒ |f |(x) :=N∑i=1

|ci| |βi(x)| .

Note that for any f ∈ Polk, there is a K <∞ such that

|f(x)| ≤ |f |(x) ≤ KF (x). (6.29)

Define the polynomial λ ∈ Polk by ιk(λ) = (1, . . . , 1)> ∈ RN . Define

c∞ = maxi∈1,...,N

|ci| , hence c∞ = ‖ιβ(f)‖∞ = ‖f‖∞ .

Page 104: Polynomial Semimartingales and a Deep Learning Approach to

94 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Further, let Kλ be the constant in (6.29) for the polynomial λ. By equivalence of normson RN , we can pick any norm, so we pick the norm ‖·‖1 and the associated operatornorm. It then holds

‖f‖1∥∥Hβkk(x)

∥∥1

=

(N∑i=1

|ci|

) N∑j=1

|βi(x)|

=

N∑i,j=1

|ci| |βi(x)|

N∑i,j=1

c∞ |βi(x)|

= Nc∞|λ|(x) ≤ N ‖f‖∞KλF (x).

Noting that ‖G‖dA,t is the running essential supremum wrt the measure dA, we cancompute for dA-almost any u ∈ (0, t],

ess supu∈[0,T ], dA

| Gf(u, x) | = ess supu∈[0,T ], dA

∣∣∣(G(u)f)>Hβk(x)∣∣∣ ≤ ‖G‖dA,T ‖f‖1 ∥∥Hβk(x)

∥∥1

≤ ‖G‖dA,T N ‖f‖∞KλF (x).

Hence, we have shown (i) for KfT = ‖G‖dA,T N ‖f‖∞Kλ. Regarding (ii), note that

we can choose f = F so that c∞ = ‖ιβ(F )‖∞ = ‖F‖∞. Recall that by assumption,Polk ⊂ DX,A. Since F ∈ Polk, we get that MF is a local martingale with MF

0 = 0. LetTn be a localizing sequence of MF . Define the stopping times

Tn = t ≥ 0 : F (Xt) > n .

Since X is càdlàg and F continuous, we get F (X(t∧Tn)−) ≤ n. Because (MF )Tn is auniformly integrable martingale, by Doob’s optional stopping theorem we have that forTn := Tn ∧ Tn, (MF )Tn is a uniformly integrable martingale, implying Tn is a localizingsequence of MF . Let T > t. Then, since E

[MFt∧Tn

]= E

[MF

0

]= 0, we get

E [F (Xt∧Tn)] = E [F (X0)] + E

[∫(0,t∧Tn]

GF (u,Xu−)A(du)

]

≤ E [F (X0)] + E

[∫(0,t∧Tn]

| GF (u,Xu−)|A(du)

]

≤ E [F (X0)] + ‖G‖dA,tNc∞KλE

[∫(0,t∧Tn]

F (Xu−)A(du)

]

≤ E [F (X0)] +KFT E

[∫(0,t]

F (X(u∧Tn)−)A(du)

],

where in the last step we use that F ≥ 0 and KFT is the constant from (i). Since

dA((0, t]) < ∞ and Tn = Tn ∧ Tn, we have for all u ∈ (0, t] that 0 ≤ F (X(u∧Tn)−) ≤ n,which justifies an application of the Fubini theorem, yielding

E [F (Xt∧Tn)] ≤ E [F (X0)] +KFT

∫(0,t]

E[F (X(u∧Tn)−)

]A(du).

Page 105: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 95

By assumption, E[‖X0‖kRd

]< ∞ which by Lemma 6.1 yields E [F (X0)] < ∞. We can

therefore apply Gronwall’s inequality (Theorem 4.55) which gives us for any 0 ≤ t ≤ T ,

E [F (Xt∧Tn)] ≤ E [F (X0)] exp(KFT A(t)

).

Let now g ∈ Polk be arbitrary and denote by Kg the constant in Equation (6.29) for g.By the lemma of Fatou we now have

E [|g(Xt)|] ≤ Kg E [F (Xt)] = Kg E[

limn→∞

F (Xt∧Tn)]

≤ Kg lim infn→∞

E [F (Xt∧Tn)]

≤ Kg E [F (X0)] exp(KFT A(T )

)<∞.

(6.30)

Since the upper bound does not depend on t and g was arbitrary, this yields

sup0≤t≤T

E [|f(Xt)|] <∞ for all f ∈ Polk, (6.31)

as claimed. Let us now show (iii). Define the process g(u) = (Gf)(u,Xu−). By (i), thereis a constant Kt such that we have

ess supu∈[0,t],dA

|Gf(u, x)| ≤ KtF (x).

Hence we have

(|g| • A)t =

∫(0,t]|g(u)|A(du) =

∫(0,t]|Gf(u,Xu−)|A(du)

≤ Kt

∫(0,t]

F (Xu−)A(du) ≤ Kt

(supu∈[0,t]

F (Xu)

)A(t).

Since F is continuous and X is a càdlàg process, we get that the supremum above isfinite for almost all ω ∈ Ω. Since A is càdlàg , A(t) is finite for all t ≥ 0. Hence we haveshown (|g| • A)t(ω) <∞ for all t ≥ 0 for all ω outside a null set. By Proposition I.3.5 in[JS13] to positive and negative part of g separately we now get

g+ • A ∈ V+ and g− • A ∈ V+, (6.32)

hence g • A ∈ V and by that càdlàg , where we have that g+ and g− are adapted becauseX is adapted.

We are left to show that g • A is predictable. Because g(u) = (Gfi)(u,Xu−) where(Gfi)(u, ·) is deterministic and hence predictable, and Xu− is càglad and therefore pre-dictable, we get that g is a predictable process. Similarly, positive and negative parts ofg are also predictable since (Gf)(u, ·)± are deterministic. This shows that again by theproposition cited above, that

(g • A)· =

∫(0,·]Gf(u,Xu−)dAu (6.33)

is a predictable finite variation process. By definition, this process starts at zero whichimplies special semimartingality.

Page 106: Polynomial Semimartingales and a Deep Learning Approach to

96 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

We will in the sequel require to be able to integrate over paths of a process given byconditional expectations, i.e. for t ≥ s,

t 7→ E[Hβk(Xt) | Fs

].

By linearity this is equivalent to being able to integrate over paths t 7→ E [f(Xt) | Fs]for all f ∈ Polk. Since conditional expectations are defined up to a null set and the un-countable union of null sets can have any measure between zero and one (it can even benon-measurable), the process above is a priory not well defined.1 If however, two modifi-cations of the process above exist and they are both càd, then they are indistinguishable,c.f. [Pro05, Theorem I.2]. Hence, if we can show existence of a càd modification of theprocess, then we can always pick such a version and integration over paths are well de-fined. In case of càdlàg k-polynomial processes, the situation is by assumption alreadyeasy since a closer look reveals that the existence of a càdlàg modification is part of thedefinition of the process. This leads to the following lemma.

Lemma 6.38. Let (X, τ) be a càdlàg k-polynomial process. Then, there exists a unique(up to indistinguishability) càdlàg modification of the process

[s,∞) 3 t 7→ E[Hβk(Xt) | Fs

].

Proof. Let f ∈ β be any basis polynomial of Polk. Hence, for all t ≥ s, we have

E [f(Xt) | Fs] = τs,tf(Xs) [P ].

Hence, τs,·f(Xs) has càdlàg paths on [s,∞) for all ω ∈ Ω since τ is càdlàg and determin-istic. Hence it is a modification of E [f(X·) | Fs], which is càdlàg and therefore uniqueup to indistinguishability, compare Theorem A.3 in the appendix.

We can now establish a link between the matrix-evolution-system induced by a regulark-polynomial process and the extended generator of the process. We will see that theextended generator restricted to Polk is in fact given by the extended generator of thematrix-evolution-system. That will allow us to relate the mapping τ of a k-polynomialprocess (X, τ) to the extended generator of the process X.

The next theorem is the first part of our version of [Cuc+12, Theorem 2.7].

Theorem 6.39. Let (X,P,A,G) be a bounded and uniformly integrable k-polynomialprocess for k ≥ 0. Then Polk ⊂ DX,A and G defined by G ιk→ G is a version of GX ,A onPolk. Further, for all f ∈ Polk, the process Mf defined by

Mft = f(Xt)− f(X0)−

∫(0,t]Gf(u,Xu−)A(du)

is a true martingale. If k > 0, then X is a special semimartingale. If G preserves thedegree on Polk, then G(Poll) ⊂ Poll for all l ≤ k.

1Compare the remarks made in the very beginning of Protter [Pro05, Chapter I.1]

Page 107: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 97

Proof. For k = 0, all f ∈ Polk are constant polynomials and the statement is obviouslytrue. Hence, assume k ≥ 1. We first show for arbitrary f ∈ Polk

f(Xt)− f(X0)−∫

(0,t]Gf(u,Xu−)A(du) (6.34)

is a true martingale, which implies Polk ⊂ DX,A and G is a version of GX,A on Polk. Notethat since (X,P,A,G) is uniformly integrable, by definition the sets Hft are uniformlyintegrable for all f ∈ Polk and t ≥ 0 and ‖G‖dA,t < ∞ for all t ≥ 0 holds by theboundedness assumption. By Lemma 6.38, there exists a unique càdlàg modification of[s,∞) 3 t 7→ E [Hβ(Xt) | Fs], hence we shall always pick this modification. Further,by uniform integrability of the sets Hβi for all i ∈ 1, . . . , N, Lemma 6.29 (i) impliesthat the left limit in each point of this modification is given by E [Hβ(Xu−) | Fs], i.e inparticular we have for all f ∈ Polk,

E [f(Xt−) | Fs] = (P (s, t−)f)>Hβk(Xs).

We start with integrability. We compute∫(s,t]

E [ |Gf(u,Xu−)| ]A(du) =

∫(s,t]

E[ ∣∣∣(G(u)f)>Hβ(Xu−)

∣∣∣ ]A(du)

≤ ‖G‖dA,t ‖f‖1∫

(s,t]E[‖Hβ(Xu−)‖1

]A(du)

= ‖G‖dA,t ‖f‖1∫

(s,t]E

[N∑i=1

|βi(Xu−)|

]A(du)

= ‖G‖dA,t ‖f‖1N∑i=1

∫(s,t]

E [ |βi(Xu−)| ]A(du)

≤ ‖G‖dA,t ‖f‖1A(t)

N∑i=1

sup0<u≤t

E [ |βi(Xu−)| ] .

(6.35)

Since βi ∈ Polk, we have that Hβit is uniformly integrable. Hence, by Lemma 3.5, thesum is finite. By Fubini-Tonelli, we get∫

(s,t]E [ |Gf(u,Xu−)| ]A(du) = E

[ ∫(s,t]|Gf(u,Xu−)|A(du)

]<∞.

Now, since f ∈ Polk, we get again by Lemma 3.5 that E [|f(Xt)|] and E [|f(X0)|] arefinite. Hence,

E[|Mf

t |]≤ E [|f(Xt)|] + E [|f(X0)|] + E

[ ∫(s,t]|Gf(u,Xu−)|A(du)

]<∞.

Page 108: Polynomial Semimartingales and a Deep Learning Approach to

98 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Next we show the martingale property. Let 0 ≤ s ≤ t. Then,

E[Mft | Fs

]= E

[f(Xt)− f(X0)−

∫(0,t]

(G(u)f)>Hβ(Xu−)A(du)|Fs

]=

= E [f(Xt)|Fs]− f(X0)− E

[∫(0,t]

(G(u)f)>Hβ(Xu−)A(du)|Fs

].

(6.36)

Recalling, that the càdlàg MES P represents τ wrt. the basis β, and an application ofthe Fubini-Tonelli theorem for conditional expectations (c.f. Schilling [Sch17, Theorem27.17]) that we already justified above yields

E[Mft | Fs

]−Mf

s = (P (s, t)f)>Hβ(Xs)− (P (s, s)f)>Hβ(Xs)

−∫

(s,t]E[(G(u)f)>Hβ(Xu−)|Fs

]A(du),

(6.37)

where P is a MES and hence P (s, s) = IdN . By linearity we have that (6.37) is equal to

f>(P (s, t)>Hβ(Xs)− P (s, s)>Hβ(Xs)

−∫

(s,t]G(u)>E [Hβ(Xu−)|Fs]A(du)

).

(6.38)

Recalling, that for the i-th element we have [Hβ(x)]i = βi(x), and hence [ιβ(βi)]j =1i=j, we have

E [βi(Xu−) | Fs] = (Pi)>Hβ(Xs),

where Pi ∈ RN is the i-th column of the matrix P (s, u−). This then gives us

E [Hβ(Xu−) | Fs] = P (s, u−)>Hβ(Xs).

Using this, we get that Equation (6.38) is equal to

f>

(P (s, t)>Hβ(Xs)− P (s, s)>Hβ(Xs)−

∫(s,t]

G(u)>P (s, u−)>Hβ(Xs)A(du)

)

= f>

(P (s, t)> − P (s, s)> −

∫(s,t]

G(u)>P (s, u−)>A(du)

)Hβ(Xs)

= f>

(P (s, t)> − P (s, s)> −

∫(s,t]

P (s, u−)G(u)A(du)

)>Hβ(Xs) = 0,

(6.39)

where we use that G is an extended generator of the MES P and hence the bracketvanishes. Hence, t 7→Mf

t is a true martingale with respect to the filtration F. Note that

Page 109: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 99

this also implies that for all s ≥ 0, we have that t 7→ Mfs+t −M

fs is a martingale wrt.

the filtration Fs = (Ft+s≥s)t≥0. Indeed we have already shown that

E[∣∣∣Mf

s+t −Mfs

∣∣∣] <∞by applying the triangle inequality. We are left to show that X is a special semimartin-gale.

To see that X is indeed a special semimartingale, define fi(x) = xi, the projection to thei-th coordinate. Since X is a special semimartingale on Rd if and only if each componentis a special semimartingale, we need to show that for all i ∈ 1, . . . , d we have thatfi(X) is special. Since fi ∈ Polk we have fi ∈ DX,A. This implies that fi(X) has thedecomposition

fi(X0) +

(fi(X)− fi(X0)−

∫(0,·]

(Gfi)(u,Xu−)A(du)

)+

∫(0,·]

(Gfi)(u,Xu−)A(du).

(6.40)

and we are left to show (for the semimartingale property) that the integral on the rightdefines a process belonging to V. That means we need to show that it is a càdlàgprocess starting from zero which is adapted and of finite variation. If we then showthat the process is also predictable, we get that fi(X) is a special semimartingale for alli = 1, . . . , d. But that is precisely the statement of Lemma 6.37 (iii) for f = fi. Finally,if now G is degree preserving on Polk, then G(Poll) ⊂ Poll for all l ≤ k by the way G wasdefined via G.

Remark 6.40. In the proof above, we can see that the uniform integrability assumptionfor the sets Hβit for all t ≥ 0 would imply by Equation (6.35), that

E [Var ((g • A)t)] = E [(|g| • A)t] <∞,

implying (g • A) ∈ Aloc. The special semimartingale property would then alternativelyfollow from Jacod and Shiryaev [JS13, Proposition I.4.23].

We will next give the reverse statement of the previous theorem, finishing our versionof [Cuc+12, Theorem 2.7]. Note that in [Cuc+12], the authors prove an intermediaryresult. Since they consider a homogeneous setting, they only need to assume continuityof the semigroup to deduce that the representing MSG has an infinitesimal generator,c.f. Theorem 4.3. In our setting, for the one direction, we needed to a priory assumethe existence of an extended generator since we do not have a general existence resultunder our regularity assumptions on τ , even though the existence follows already undervery mild assumptions, c.f. Theorem 4.35. In particular, since by definition the MES Pbelonging to a càdlàg k-polynomial process (X,P ) preserves the degree, Theorem 5.18yields that this generator can be chosen to be degree preserving. The situation

Page 110: Polynomial Semimartingales and a Deep Learning Approach to

100 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Theorem 6.41. Consider a càdlàg process X on our stochastic basis and let k ∈ N0.Let A ∈ V+

Ω. Assume

(i) Polk ⊂ DX,A and there exists an on Polk degree preserving measurable functionG : R>0 → MNk(R), such that G defined by G ιk→ G is a version of GX,A on Polkwith ‖G‖dA,t <∞ for all t ≥ 0,

(ii) for any f ∈ Polk, the sets Hft are uniformly integrable for all t ≥ 0 and the processMf defined by

Mft = f(Xt)− f(X0)−

∫(0,t]Gf(u,Xu−)A(du)

is a true martingale.

Then, there exists a regular MES P such that G is the extended A-generator of P , and(X,P,A,G) is a bounded and uniformly integrable k-polynomial process. In particular, ifk > 0, then X is a special semimartingale.

Proof. Let us first argue that there exists a càdlàg modification of the process

[s,∞) 3 t 7→ E[Hβk(Xt) | Fs

]=: Fs(t).

By assumption (ii), for all i ≤ Nk, the process [s,∞) 3 t 7→ βi(Xt) has the decomposition

βi(Xt) = βi(X0) +Mft + Zt,

where Mft is a true martingale and Zt = (g • A)t for g(u) = Gβi(u,Xu−). Since by

assumption, all sets Hβit are uniformly integrable, we get

E [Var ((g • A)t)] = E [(|g| • A)t] <∞

for all t, implying Z ∈ Aloc, compare the computations in Equation (6.35) that are validhere, including the application of the theorem of Fubini-Tonelli. Hence, the process

[s,∞) 3 t 7→ E [βi(Xt) | Fs]

has a càdlàg modification. Hence, (Fs(t))i has a càdlàg modification for all i ≤ Nk, yield-ing the existence of a càdlàg modification of Fs(t). For the rest of this proof, we alwayspick such a càdlàg modification without further mentioning it. Let f ∈ Polk be arbitrary.Hence, Hft is uniformly integrable for all t ≥ 0 by assumption and by Lemma 3.5, weget E [|f(Xt)|] < ∞ for all t ≥ 0. We next show the polynomial property. Since wepicked the càdlàg version of Fs(t), we have that Fs(t−) exists a.s. and by the uniformintegrability assumption, Lemma 3.5 yields limε↓0 Fs(t − ε) converges as a L1-limit toE[Hβk(Xt−) | Fs

]. With Proposition 6.26, this gives us Fs(t−) = E

[Hβk(Xt−) | Fs

].

Page 111: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 101

Note that we have

E [f(Xt) | Fs] = E[f>Hβk(Xt) | Fs

]= f>Fs(t).

The true martingale assumption yields

E [f(Xt) | Fs] = f(Xs) + E

[∫(s,t]Gf(u,Xu−)A(du)

].

Again the same computations in Equation (6.35), combined with the uniform integrabilityassumption justifies an application of the Fubini-Tonelli theorem for the conditionalexpectation, yielding

f>Fs(t) = E [f(Xt) | Fs] = f(Xs) +

∫(s,t]

E [Gf(u,Xu−) | Fs]A(du)

= f>Hβk(Xs) +

∫(s,t]

E[(G(u)f)>Hβk(Xu−) | Fs

]A(du)

= f>Fs(s) +

∫(s,t]

(G(u)f)>Fs(u−)A(du)

= f>

(Fs(s) +

∫(s,t]

G(u)>Fs(u−)A(du)

),

where we used Fs(s) = Hβk(Xs). All these equalities are to be understood in an a.s.way. Since the above has to hold for all f ∈ RNk , we get that Fs(t) solves (MDE:A, V ) forV (u, x) = G(u)>x. By Proposition 4.53, solutions exist for all initial values w ∈ RNk anddefine a fundamental solution M(s, t) for s ≤ t. Hence we have Fs(t) = M(s, t)Hβk(Xs).Define P = M>. Then by the same theorem, we know that P is a regular MES suchthat (P, γ,A) is a proper MES with extended generator G. Since G is degree preservingon Polk by assumption, Theorem 5.19 yields that P preserves the degree on Polk. Hencewe get

E [f(Xt) | Fs] = (P (s, t)f)>Hβk(Xs),

with ι−1k (P (s, t)f) ∈ Poll for f ∈ Poll. Since this holds for all l ≤ k, and since ‖G‖dA,t <

∞ for all t ≥ 0 and Hft uniformly integrable for all f ∈ Polk and t ≥ 0, was part of theassumption, we get that (X,P,A,G) is a bounded and uniformly integrable k-polynomialprocess. If k > 0, Theorem 6.39 yields that X is a special semimartingale, finishing theproof.

Remark 6.42. In the theorem above, assumption (ii) requiresMft to be a true martingale

and the sets Hft need to be uniformly integrable for all f ∈ Polk. In light of Lemma 6.37,in the case where k is an even number, one could “sacrifice” one degree to achieve uniformintegrability of all Hft for f ∈ Polk−1 by means of Theorem 3.4. This is shown forillustration in the appendix, see Proposition B.1. However, a stronger result is possibleby adapting the ideas presented in Cuchiero et al. [Cuc+12, Theorem 2.10]. We will see

Page 112: Polynomial Semimartingales and a Deep Learning Approach to

102 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

that the assumption (ii) above is implied by the first, in case k is an even number. Justas in [Cuc+12], it will be convenient to formulate the following condition.

Condition 6.43. Let (Ω,F ,F, P ) be our stochastic basis and X a càdlàg and adaptedprocess. Let A ∈ V+

Ω. Then Polk ⊂ DX,A and there exists a measurable function G :

R>0 → MNk(R) that is degree preserving on Polk such that G defined by G ιk→ G is aversion of GX,A on Polk and ‖G‖dA,t <∞ for all t ≥ 0.

The goal is now to proof the next theorem that is our version of [Cuc+12, Theorem 2.10].

Theorem 6.44. Let k ≥ 0 be an even number and X a càdlàg and adapted process withE[‖X0‖kRd

]< ∞. Then there exists a proper MES (P, γ,A) in dimension Nk with ex-

tended generator G that preserves the degree on Polk, such that (X,P,A,G) is a boundedand uniformly integrable k-polynomial process if and only if X satisfies Condition 6.43for k and A ∈ V+

Ω.

The condition k 6= 1 is necessary as demonstrated in [Cuc+12, Remark 2.11 (ii)]. Therea counterexample for k = 1 is provided. The case k = 0 is a trivial statement. As anintermediary result, let us first proof the following lemma.

Lemma 6.45. Let k ≥ 2 be an even number and assume X satisfies Condition 6.43 for kand A ∈ V+

Ω. Assume further E

[‖X0‖kRd

]<∞. Then f(X)∗t ∈ L1 for all f ∈ Polk, t ≥ 0

and Mft is a true martingale for all f ∈ Polk.

Proof. We follow Cuchiero et al. [Cuc+12] and show E[(Mf )∗t

]<∞ for all t ≥ 0. Then

by Lemma 3.6 we get thatMf is a true martingale as claimed. In the same way, we showfor F ∈ Polk as defined in Lemma 6.37, that E [F (X)∗t ] <∞. This will then imply thatE [f(X)∗t ] < ∞ for all f ∈ Polk, implying that the sets Hft are all uniformly integrable.Note that all conditions of Lemma 6.37 are satisfied. As in that lemma, define F ∈ Polkwith

1 ≤ F (x) = 1 +d∑i=1

fi(x)k.

Further, recall the definition of a function |f |(x) for a polynomial f ∈ Polk, compareEquation (6.29). We first consider the case f ∈ Poll for l < k. Let p := k

l > 1. Then forall f ∈ Poll, there exists a g ∈ Polk such that

|f(x)|p ≤ |f |(x)p = |g|(x),

and by (6.29) we get|f(x)|p ≤ CF (x),

Page 113: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 103

for some constant C > 0 depending on f . In the same way, we compute

ess supu∈[0,T ], dA

|Gf(u, x)|p = ess supu∈[0,T ], dA

∣∣∣(G(u)f)>Hβk(x)∣∣∣p .

Since G is degree preserving, we have recalling the definition of the matrix Ll from theprevious chapter,

ess supu∈[0,T ], dA

∣∣∣(G(u)f)>Hβk(x)∣∣∣p = ess sup

u∈[0,T ], dA

∣∣∣(G(u)f)>LlHβk(x)∣∣∣p

= ‖G‖pdA,T ‖f‖p1N

p−1l

Nl∑i=1

|βi(x)|p,

where in the last step we used Jensen’s inequality for sums and that

(LlHβk(x))i = βi(x)1i≤Nl.

Since for i ≤ Nl, βi ∈ Poll, we again get that there exists a constant C > 0, dependingon f such that

ess supu∈[0,T ], dA

|Gf(u, x)|p ≤ CF (x). (6.41)

Recall from the proof in Lemma 6.37, that we can choose a localizing sequence (Tn)n ofMf such that 0 ≤ F (X(t∧Tn)−) ≤ n. We compute using the previous estimates

∣∣∣Mft∧Tn

∣∣∣p =

∣∣∣∣∣f(Xt∧Tn)− f(X0)−∫

(0,t∧Tn]Gf(u,Xu−)A(du)

∣∣∣∣∣p

≤ C

(F (Xt∧Tn) + F (X0) +

∫(0,t∧Tn]

F (Xu−)A(du)

)

≤ C

(F (Xt∧Tn) + F (X0) +

∫(0,t]

F (X(u∧Tn)−)A(du)

)

For some constant C > 0 and t ≤ T . In the last step we used that F > 0. From theproof of Lemma 6.37, compare Equation (6.30), we have with g = F that there exists aconstant K > 0 such that

supt≤T

E[F (X(t∧Tn)−)

]≤ sup

t≤TE [F (Xt)] ≤ KE [F (X0)] exp [KA(T )] . (6.42)

By assumption, E[‖X0‖kRd

]<∞, hence Lemma 6.1 implies E [F (X0)] <∞. We further

have ∫(0,t]

E[|F (X(u∧Tn)−)|

]A(du) =

∫(0,t]

E[F (X(u∧Tn)−)

]A(du)

≤ KE [F (X0)] exp [KA(T )]A(T ) <∞.(6.43)

Page 114: Polynomial Semimartingales and a Deep Learning Approach to

104 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Note that in the last step we used (6.42) in combination with the standard estimate forthe dA integral. This justifies an application of the theorem of Fubini-Tonelli, overallyielding

E[∣∣∣Mf

t∧Tn

∣∣∣p] ≤ C, (6.44)

where the constant C > 0 does depend on f but not on n. We continue with Doob’smaximal Lp-inequality for p = k

l > 1, yielding for all n, that there exists a constantK <∞ independent of n such that

E

[supt≤T

∣∣∣Mft∧Tn

∣∣∣] ≤ KE [∣∣∣MfT∧Tn

∣∣∣p] ≤ KC,where C is from (6.44). The lhs is increasing in n, hence by monotone convergence weget (Mf )∗T ∈ Lp for all f ∈ Polk−1 and T ≥ 0. Since p > 1, this then implies the desired

E

[supt≤T

∣∣∣Mft

∣∣∣] <∞.Let us now consider the case l = k. Let q = k/2, which is an integer by assumption.Let f ∈ Polq be defined by f(x) = fi(x)q for i ∈ 1, . . . , d. We compute, using Jensen’sinequality for sums and for integrals, where with C < ∞ we denote an appropriateconstant, possibly changing from line to line,

f(Xt)2 =

(Mft + f(X0) +

∫(0,t]Gf(u,Xu−)A(du)

)2

≤ C

((Mf

t )2 + f(X0)2 +

∫(0,t]|Gf(u,Xu−)|2A(du)

)

≤ C

((Mf

t )2 + f(X0)2 +

∫(0,t]

F (Xu−)A(du)

),

where for the last step we used (6.41). Since F ≥ 0, the integral is increasing in t, yielding

supt≤T

f(Xt)2 ≤ C

(supt≤T

(Mft )2 + f(X0)2 +

∫(0,T ]

F (Xu−)A(du)

).

Since f2 ∈ Polk, the assumption E[‖X0‖kRd

]< ∞ implies E

[f(X0)2

]< ∞. We have

already shown above that from (6.43) and Fubini-Tonelli we get

E

[∫(0,T ]

F (Xu−)A(du)

]<∞.

Page 115: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 105

Since f ∈ Polq, we have p = k/q = 2 and by the previous result, we get (Mf )∗t ∈ L2,hence we have

E

[supt≤T

f(Xt)2

]<∞.

Noting that F (x) = 1 +∑d

i=1 fi(x)k, and f(x)2 = fi(x)k, summing over i ∈ 1, . . . , dyields

E [F (X)∗T ] = E

[supt≤T

F (Xt)

]<∞. (6.45)

From |f(x)| ≤ CF (x) for some constant C we conclude

E [f(X)∗T ] = E

[supt≤T|f(Xt)|

]≤ CE [F (X)∗T ] <∞,

hence f(X)∗t ∈ L1 for all f ∈ Polk, t ≥ 0 as claimed. For f ∈ Polk arbitrary, we have byLemma 6.37 (i) and (ii) that

supt≤T

∫(0,t]|Gf(u,Xu−)|A(du) ≤

∫(0,T ]|Gf(u,Xu−)|A(du)

≤ KfT A(T ) sup

t≤TF (Xu−),

and we have already shown with (6.45) that the rhs is integrable. Hence we get from

supt≤T|Mf

t | ≤ supt≤T

F (Xt) + F (X0) + supt≤T

∫(0,t]|Gf(u,Xu−)|A(du),

that (Mf )∗T ∈ L1 for all f ∈ Polk and T ≥ 0 as claimed. By Lemma 3.6, we get Mf is atrue martingale finishing the proof.

Proof of Theorem 6.44. The one direction, that is (X,P,A,G) is a bounded and uni-formly integrable k-polynomial process with G preserving the degree on Polk, then Con-dition 6.43 holds is the statement of Theorem 6.39 for all k ∈ N0, not just for evenones. For the reverse direction, the case k = 0 is a trivial statement since all càdlàgadapted processes have the 0-polynomial property with τs,t ≡ 1 and Hft for f ∈ Pol0contains only one real number, hence is uniformly integrable. For the case k ≥ 2, wehave by Lemma 6.45 that f(X)∗t ∈ L1 for all f ∈ Polk. Further, X is càdlàg and sincepolynomials are continuous, we get that for almost all ω ∈ Ω,

sups≤t|f(Xs(ω))| <∞.

Fix HfT . The goal is to show uniform integrability. We have for n ∈ N0,

supt≤T

E[f(Xt)1|f(Xt)|>n+1

]≤ E

[supt≤T

f(Xt)1|f(Xt)|>n

]≤ E [f(X)∗T ] <∞.

Page 116: Polynomial Semimartingales and a Deep Learning Approach to

106 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

By the dominated convergence theorem with dominating r.v. f(X)∗T , we have for allf ≤ T ,

0 = E[

limn→∞

f(X)∗T1f(X)∗T>n

]= lim

n→∞E[f(X)∗T1|f(X)∗T |>n

]≥ lim

n→∞E[f(Xt)1|f(Xt)|>n

].

Since the upper bound goes to zero and is independent of t ≤ T , we get that HfT is indeeduniformly integrable. Further, by Lemma 6.45 we have that Mf is a true martingale forall f ∈ Polk. Hence, with the rest of Condition 6.43, we get that the assumptions ofTheorem 6.41 hold, which finishes the proof.

Remark 6.46. Note that when defining the k-polynomial property, we follow the au-thors Cuchiero et al. [Cuc+12] and require P to preserve the degree. We have seen thatthis makes it reasonable to assume that the extended generator also preserves the degree,compare in particular Theorem 5.18 and Theorem 5.19. This property was crucial inLemma 6.45 and indeed, by means of [Cuc+12, Remark 2.11 (iii)], there is a counterex-ample where a process X has an extended generator whose representing matrix G, whileexisting, does not preserve the degree and despite all other assumptions being satisfied,the process X fails to have the polynomial property.

6.4 Polynomial Processes as Semimartingales

We have seen that k-polynomial processes for k ≥ 1, under mild regularity assumptionsare special semimartingales. In this section, we will deal with the characterization of k-polynomial processes in the context of semimartingality, i.e. we want to characterize theirsemimartingale characteristics. As we do not a priory assume a k-polynomial process tobe Markov, the results we give are weaker then those in [Cuc+12], compare Remark 6.50.

The results we present in this section are all based on the ideas presented in Cuchieroet al. [Cuc+12] and the proofs we present are all straight forward modifications of theproofs therein. Since we consider a time inhomogeneous setting, we will need to introducethe following function space. Recall that a function g ∈ Polk(R>0 × Rd) is of the form

Nk∑i=1

ai(u)βi(x),

where each function ai : R>0 → R is measurable. Hence, u 7→ ιk(g(u, ·)) is a measurablefunction. We further have

‖ιk(g(u, ·))‖1 =

Nk∑i=1

|ai(u)|.

Page 117: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 107

Let µ be a locally finite measure on R>0. We define

Polµk(R>0 × Rd) :=

g ∈ Polk(R>0 × Rd) : ess sup

u≤t,µ‖ιk(g(u, ·))‖1 <∞, ∀t ≥ 0

.

Let us motivate the introduction of Polµk(R>0 × Rd) with the next lemma.

Lemma 6.47. Let A ∈ V+

Ωand G : R>0 → MNk(R) for k ∈ N0. Assume ‖G‖dA,t < ∞

for all t ≥ 0 and that G preserves the degree on Polk. Define G ιk← G. Then for anyl ∈ N0 with l ≤ k,

(i) if f ∈ Poll, thenGf ∈ PoldAl (R>0 × Rd).

(ii) Further, for any f ∈ PoldAl (R>0 × Rd), there exists a constant K > 0 such that

ess supu≤t,dA

|f(u, x)| ≤ K ess supu≤t

‖ιl(f(u, ·))‖1 (1 + ‖x‖lRd) <∞,

where K depends on f and the concrete choice of the norm ‖·‖Rd but not on t .

Proof. Let us start with (i). Note that since G preserves the degree on Polk, we haveGf ∈ Poll(R>0 × Rd). Further, we compute

ess supu≤t,dA

‖Gf(u, ·)‖1 = ess supu≤t,dA

‖G(u)f‖ = ess supu≤t,dA

‖G(u)‖ ‖f‖1

≤ C ‖G‖dA,t ‖f‖1 <∞,

where C is a finite constant coming from the equivalence of norms in RNk . Hence (i) isshown and we next turn to (ii). We compute

ess supu≤t,dA

|f(u, x)| = ess supu≤t,dA

|ιl(f(u, ·))>Hβl(x)| ≤ C ess supu≤t

‖ιl(f(u, ·))‖1∥∥Hβk(x)

∥∥1.

By assumption, the essential supremum on the right is finite. Further, we have

∥∥Hβk(x)∥∥

1=

Nl∑i=1

|βi(x)|.

Since each summand has degree of at most l, there is a constant Ci > 0 such that|βi(x)| ≤ Ci(1 + ‖x‖l1). Hence the claim follows by summing over finitely many sums forthe ‖·‖1 norm. The general case then follows by equivalence of norms.

Page 118: Polynomial Semimartingales and a Deep Learning Approach to

108 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Note that any f ∈ Polk1 is an element of PoldAk1 (R>0 × Rd), since it is constant in u. Inparticular, that implies for any g ∈ PoldAk2 (R>0 × Rd), that fg ∈ PoldAk1+k2

(R>0 × Rd),where the product is the pointwise product.

Throughout the rest of this section, we denote with µX the random measure associatedwith the jumps of X, see Jacod and Shiryaev [JS13, Proposition II.1.16], given by

µX(ω; dt, dξ) =∑s∈R+

1∆Xs(ω) 6=0εs,∆Xs(ω)(dt, dξ),

where ε is the Dirac-distribution on R+ × Rd. Note that the sum is well defined sinceX(ω) is càdlàg for all ω ∈ Ω and hence for only countably many s we have ∆Xs(ω) 6= 0.

In the sequel, for a function f : Rd → R with f ∈ C2(Rd) and i, j ∈ 1, . . . , d, we denotewith Di and Dij the partial derivatives wrt the coordinates i and j, i.e.

Dif(x) =∂

∂xif(x), Dijf(x) =

∂2

∂xi∂xjf(x).

From Lemma 6.37 (iii), we know that a càdlàg and adapted process X for which Con-dition 6.43 holds for a given A ∈ V+

Ωand k ≥ 0 is a special semimartingale (provided

k > 0). In the next proposition we show that the characteristics wrt the “truncation func-tion” h(x) = x satisfy certain polynomial properties. Combined with Proposition 6.54,this can be seen as our version of [Cuc+12, Proposition 2.12]. All the statements weshall make regarding characteristics are to be understood up to evanescence. Further, weassume that our stochastic basis satisfies the usual conditions, hence an arbitrary versionof the characteristics is predictable, c.f. Jacod and Shiryaev [JS13, Remark II.2.8].

Set Ω = Ω×R+×Rd. We denote the predictable σ-algebra with P and set P = P⊗B(Rd),compare [JS13, p. II.1.4] where we have E = Rd and E = B(Rd). We call a randommeasure µ predictable, if for all P-measurable functions W , W ∗ µ is P-measurable (c.f.[JS13, Definition II.1.6]).

Proposition 6.48. Let X be a càdlàg and adapted process, A ∈ V+

Ωand k ∈ N with

k ≥ 2. Assume that Condition 6.43 holds. Then, X is a special semimartingale withsemimartingale characteristics (B,C, ν) wrt the “truncation” function h(x) = x, suchthat for i, j ∈ 1, . . . , d, the characteristics can be written as

Bt,i =

∫(0,t]

bi(u,Xu−)A(du), bi ∈ PoldA1 (R>0 × Rd), (6.46)

Ct,ij +

∫(0,t]

∫Rdfi(ξ)fj(ξ)ν(du, dξ) =

∫(0,t]

aij(u,Xu−)A(du),

aij ∈ PoldA2 (R>0 × Rd).(6.47)

Page 119: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 109

Further, the characteristics C and ν can be written as

Ct,ij =

∫(0,t]

cu,ijA(du), ν(ω; dt, dξ) = K(ω, t; dξ)A(dt),

for some predictable processes c·,ij and P-measurable transition kernel K from (Ω×R+,P)to (Rd,B(Rd)). Finally, if k ≥ 3, then for arbitrary multi-index l with 3 ≤ |l| ≤ k, thereexists a polynomial g ∈ PoldA|l| (R>0 × Rd), such that,∫

RdξlK(ω, u; dξ) = g(u,Xu−) [dA⊗ P ].

Remark 6.49. (i) Note that by the definition of the characteristic C, we have thatt 7→ Ct is continuous. Hence we get for the process c that if ∆At > 0, then ct,ij = 0for all i, j ∈ 1, . . . , d.

(ii) Let us make the remark that our characterization of the characteristics is differentfrom the affine case (c.f. [KR+18, Theorem 3.2]). There, the authors consider thedecomposition of A into continuous part and pure jump part. One can thereforeask for conditions under which a semimartingale with characteristics of the formabove is an affine semimartingale in the sense of [KR+18]. This however would gobeyond the scope of this thesis and we leave this question for future work.

Proof of Proposition 6.48. Since k ≥ 2, we have fi ∈ Polk for all i ∈ 1, . . . , d. Hence,Lemma 6.37 (iii) yields that X is a special semimartingale. Let Xt,i = Mt,i +Bt,i be thecanonical decomposition of fi(X) and l any multi-index with |l| ≤ k. Note that we haveMt,i = Mfi

t . By Itô’s formula we have

fl(Xt) = fl(X0) +

∫(0,t]

d∑i=1

Difl(Xu−)dXu,i +1

2

∫(0,t]

d∑i,j=1

Dijfl(Xu−)dCu,ij

+∑s≤t

[fl(Xs)− fl(Xs−)−

d∑i=1

Difl(Xs−)∆Xi,s

].

(6.48)

Next, note that since by convention we have X0− = X0 and by that ∆X0− = 0, we canwrite (we suppress the ω-dependency of µX)

∑s≤t

[fl(Xs)− fl(Xs−)−

d∑i=1

Difl(Xs−)∆Xi,s

]

=

∫(0,t]

∫Rd

(fl(Xu− + ξ)− fl(Xu−)−

d∑i=1

Difl(Xu−)ξi

)µX(du, dξ).

Page 120: Polynomial Semimartingales and a Deep Learning Approach to

110 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Recall that ξi = fi(ξ) and that fl(x) = xl. Then Lemma 2.1 yields

∑s≤t

[fl(Xs)− fl(Xs−)−

d∑i=1

Difl(Xs−)∆Xi,s

]

=

∫(0,t]

∫RdW (u, ξ)µX(du, dξ),

(6.49)

where

W (u, ξ) :=

|l|∑|m|=2

(l

m

)fl−m(Xu−)ξm.

Using this, combined with the canonical decomposition of X, (6.48) becomes

fl(Xt) = fl(X0) +

∫(0,t]

d∑i=1

Difl(Xu−)dMfiu +

∫(0,t]

d∑i=1

Difl(Xu−)dBu,i

+1

2

∫(0,t]

d∑i,j=1

Dijfl(Xu−)dCu,ij +

∫(0,t]

∫RdW (u, ξ)µX(du, dξ).

(6.50)

Since polynomials are smooth and Xt,i is càglàd, Mfi ∈Mloc yields

Difl(X−) • Mfi ∈Mloc.

By [JS13, Proposition I.3.5] we have (Difl(X−) • Bi) ∈ V and (Dijfl(X−) • Cij) ∈ Vand by the same proposition, both are predictable. By [JS13, Lemma I.3.10], both belongto Aloc. From (6.49), we have∫

(0,t]

∫RdW (u, ξ)µX(du, dξ) ∈ V.

Hence, from [JS13, Proposition I.4.23 (iii)] it follows

W ∗ µX =

∫(0,t]

∫RdW (u, ξ)µX(du, dξ) ∈ Aloc,

since fl(X) is a special semimartingale. Hence |W |∗µX ∈ A+loc, and by [JS13, Proposition

II.1.28] we have W ∗ µX −W ∗ ν = W ∗ (µX − ν), implying W ∗ µX −W ∗ ν is a localmartingale (compare [JS13, Definition II.1.27]). Since fl(X) is special, it’s canonicaldecomposition is unique. Therefore, by combining the definition of Mfl with (6.50), weget

Mfl =

d∑i=1

Difl(X−) • Mfi +W ∗ (µX − ν)

= fl(Xt)− fl(X0)−d∑i=1

Difl(X−) • Bi −1

2

d∑i,j=1

Dijfl(X−) • Cij −W ∗ ν.

Page 121: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 111

This, again by definition of Mfl , yields∫(0,t]Gfl(u,Xu−)A(du) =

d∑i=1

(Difl(X−) • Bi)t

+1

2

d∑i,j=1

(Dijfl(X−) • Cij)t +W ∗ νt.

(6.51)

Let l = ei, i.e. in particular |l| = 1 and fl(x) = fi(x). Then Djfl(x) = 1j=i andDijfl(x) = 0. Therefore (6.51) implies∫

(0,t]Gfi(u,Xu−)A(du) = Bi,t, i ∈ 1, . . . , d.

Hence we can set bi := Gfi, and since G is degree preserving on Polk for k ≥ 2 and‖G‖dA,t < ∞ for all t ≥ 0, we have by Lemma 6.47 that bi ∈ PoldA1 (R>0 × Rd). Next,consider the polynomials of degree two, i.e. fij(x) = fei+ej (x) for i, j ∈ 1, . . . , d. Inthis case, we have Gfij ∈ Pol2. Further, since Difij = fj ∈ Pol1, we get Difijbi ∈ PoldA2 .Since |ei + ej | = 2, we have

W (u, ξ) =d∑

i,j=1

fij(ξ) =d∑

i,j=1

ξiξj .

Hence, (6.51) yields

Ct,ij +W ∗ νt = Ct,ij +

∫(0,t]

∫Rdξiξjν(du, dξ) =

∫(0,t]

aij(u,Xu−)A(du),

for some aij ∈ PoldA2 (R>0 × Rd). Hence we have shown (6.46) and (6.47). We continueby defining

A′t(ω) :=

∫(0,t]

∫Rd‖ξ‖22 ν(ω; du, dξ).

Note that we have∑d

i=1 ξiξi = ‖ξ‖22, hence

d∑i=1

Ct,ii(ω) +A′t(ω) =d∑i=1

∫(0,t]

aii(u,Xu−)A(du). (6.52)

By the definition of the characteristic C, the diagonal elements are non-negative increas-ing and predictable processes. Again by the definition of the characteristic ν, we get forthe P measurable function V (ω, t, ξ) = ‖ξ‖22 that A′ = V ∗ ν is predictable, hence A′

is a predictable, non-negative and increasing process. By (6.52), we have for almost allω ∈ Ω,

d

(d∑i=1

C·,ii(ω) +A′(ω)

) dA.

Page 122: Polynomial Semimartingales and a Deep Learning Approach to

112 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Then, [JS13, Proposition I.3.13] implies that there exists a non-negative and predictableprocess H1 such that

d∑i=1

Ct,ii +A′t = (H1 • A)t.

By non-negativity and the increasing property, we then have for a.a. ω ∈ Ω and alli ∈ 1, . . . , d,

dC·,ii(ω) dA and dA′(ω) dA.

Hence, again by the same proposition, there are non-negative and predictable processesc·,ii, H

2 such thatCt,ii = (c·,ii • A)t, A

′t = (H2 • A)t.

Define V (ω, t, ξ) := V (ω, t, ξ) + 10(ξ). Then V is a strictly positive, P -measurablefunction on Ω. Further, since µX(ω;R+ × 0) = 0, by construction of the compensatorν we can always assume ν(ω;R+ × 0) = 0. That implies A′ = V ∗ ν = V ∗ ν. Henceby [JS13, Theorem II.1.8], there exists a transition kernel K ′(ω, t; dξ) from (Ω,R+,P) to(Rd,B(Rd)) with

ν(ω; du, dξ) = K ′(ω, u; dξ)dA′u.

Combining this with the existence of H2 above, we get

ν(ω; du, dξ) = H2u(ω)K ′(ω, u; dξ)dAu.

Defining K(ω, u; dξ) := H2u(ω)K ′(ω, u; dξ), we get that K is again a P-measurable tran-

sition kernel with the desired property

ν(ω; du, dξ) = K(ω, u; dξ)dAu,

for almost all ω. This implies that (6.52) can be written as

Ct,ij =

∫(0,t]

(aij(u,Xu−)−

∫RdξiξjK(u; dξ)

)A(du) [P ].

This implies for i 6= j, we also have dC·,ij dA and since aij are deterministic andXu− is càglàd, we get aij(u,Xu−) is predictable. Further K is P measurable, implyingthat

∫Rd ξiξjK(ω, ·; dξ) is predictable. Hence we get that the densities c·,ij with Ct,ij =

(c·,ij • A)t are predictable processes. Let us continue. A closer look shows that we havethe identity

1

2

d∑i,j=1

Dijflξiξj =2∑

|m|=2

(l

m

)fl−mξ

m,

where the important observation is that all multi-indices m with |m| = 2 are of the form

Page 123: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 113

m = ei + ej . Hence we can write (6.51) as

∫(0,t]Gfl(u,Xu−)A(du) =

∫(0,t]

d∑i=1

Difl(Xu−)bi(u,Xu−)A(du)

+

∫(0,t]

1

2

d∑i,j=1

Dijfl(Xu−)

(ct,ij +

∫RdξiξjK(u; dξ)

)A(du)

+

∫(0,t]

∫Rd

|l|∑|m|=3

(l

m

)fl−m(Xu−)ξm

K(u; dξ)A(du)

=

∫(0,t]

d∑i=1

Difl(Xu−)bi(u,Xu−)A(du) +

∫(0,t]

1

2

d∑i,j=1

Dijfl(Xu−)aij(u,Xu−)A(du)

+

∫(0,t]

∫Rd

|l|∑|m|=3

(l

m

)fl−m(Xu−)ξm

K(u; dξ)A(du).

(6.53)

Consider the case where l is such that |l| = 3. Then the summands in the last termabove are unequal zero only for m = l. Hence we have

|l|∑|m|=3

(l

m

)fl−m(Xu−)ξm = ξl.

As all other terms belong to PoldAl for |l| = l, we get that there exists a g ∈ PoldAl (R>0×Rd) such that

∫(0,t]

∫RdξlK(u; dξ)A(du) =

∫(0,t]

g(u,Xu−)A(du) [P ],

and since this holds for all t ≥ 0, we get∫Rd ξ

lK(u; dξ) = g(u,Xu−) outside dA⊗P zerosets for all l with l = 3. We extend this result to all l with |l| ≤ k by induction. Assumefor all m with |m| ≤ l − 1 there exists a polynomial gm ∈ PoldA|m|(R>0 × Rd) such that

∫RdξmK(u; dξ) = gm(u,Xu−) [dA⊗ P ].

Note that we can assume fl−m ∈ Pol|l|−|m| since for all m l the binomial coefficients

Page 124: Polynomial Semimartingales and a Deep Learning Approach to

114 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

are zero. Hence for m ≤ l we get fl−mgm ∈ PoldAl (R>0 × Rd). Further, we have∫(0,t]

∫Rd

|l|∑|m|=3

(l

m

)fl−m(Xu−)ξm

K(u; dξ)A(du)

=

∫(0,t]

|l|∑|m|=3

(l

m

)fl−m(Xu−)

(∫RdξmK(u; dξ)

)A(du)

=

∫(0,t]

l−1∑|m|=3

(l

m

)fl−m(Xu−)

(∫RdξmK(u; dξ)

)A(du)

+

∫(0,t]

l∑|m|=l−1

(l

m

)fl−m(Xu−)

(∫RdξmK(u; dξ)

)A(du)

=

∫(0,t]

l−1∑|m|=3

(l

m

)fl−m(Xu−)gm(u,Xu−)A(du)

+

∫(0,t]

l∑|m|=l−1

(l

m

)fl−m(Xu−)

(∫RdξmK(u; dξ)

)A(du)

=

∫(0,t]

l−1∑|m|=3

(l

m

)fl−m(Xu−)gm(u,Xu−)A(du) +

∫(0,t]

∫RdξlK(u; dξ)A(du).

Hence by the same arguments as before, we get that there exists a gl ∈ PoldAl (R>0×Rd)such that ∫

RdξlK(u; dξ) = gl(u,Xu−) [dA⊗ P ],

for all l with |l| ≤ k, which finishes the proof.

Remark 6.50. In Cuchiero et al. [Cuc+12], the authors show in their version a strongerresult regarding the processes c·,ij and the random measure K(u; dξ) above. A result inour situation analogous to theirs (they proved a time homogeneous version) would be thatone can pick autonomous versions of c·,ij and K(u; dξ), that is one can pick version ofthe form

ct,ij(ω) = c(t,Xu−(ω)) and K(ω, u; dξ) = K(u,Xu−(ω); dξ).

The authors took advantage of the Markov structure they assumed so that they could applyresults from Çinlar et al. [Çin+80]. This is not possible in our situation, explaining themotivation for the next definition and the subsequent proposition that can be seen as thesecond part of our version of [Cuc+12, Proposition 2.12].

Let us first recall the following proposition from Jacod and Shiryaev [JS13, PropositionII.2.9], where we omit the proof. We restrict ourself to the case of special semimartingales.

Page 125: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 115

Proposition 6.51. Let X be a special semimartingale. Then one can find a version of thecharacteristics (B,C, ν), called “good versions”, wrt the “truncation” function h(x) = xwhich are of the form

Bi = bi • ACij = cij • A

ν(ω; dt, dξ) = K(ω, t; dξ)dAt,

where A ∈ A+loc is predictable, for all i, j ∈ 1, . . . , d, the R-valued processes bi and cij

are predictable and c = (cij)di,j=1 takes values in the set of all symmetric non-negative

(positive definite) matrices and K(ω, t; dξ) is a transition kernel from (Ω × R+,P) into(Rd,B(Rd)).

Note that K is indexed with time t ∈ R+. Since ∆X0 = 0 by convention, we can alwaysassume K(0, x; dξ) = 0dξ. Further, by construction, for the canonical decompositionX = X0 + M + V one has M0 = 0 and V0 = 0. Hence, it is sufficient to define thecharacteristics on R>0 rather then R+ as we will do from here on. This can also be seenwhen considering the A integrals above. Since A is càdlàg , b and c do not need to bedefined at time t = 0.

Definition 6.52. Let X be a special semimartingale. We call X a k-polynomial semi-martingale for k ≥ 2, if one can choose a representation in Proposition 6.51, such thatfor i, j ∈ 1, . . . , d,

(i) A ∈ A+loc can be chosen as an element of V+

Ω⊂ A+

loc,

(ii) bt,i = bi(t,Xt−) for some bi ∈ PoldA1 (R>0 × Rd),

(iii) ct,ij = cij(t,Xt−) for some measurable function cij,

(iv) K(ω, t; dξ) = K(t,Xt−, dξ), where K is some transition kernel from

(R+ × Rd,B(R+)⊗ B(Rd)) into (Rd,B(Rd)),

(v) the functions cij and K satisfy for all i, j ∈ 1, . . . , d and x ∈ Rd,

cij(u, x) +

∫RdξiξjK(u, x; dξ) = aij(u, x),

where all aij ∈ PoldA2 (R>0 × Rd),

(vi) if k ≥ 3, then for all multi-indices l with 3 ≤ |l| ≤ k, it holds∫RdξlK(u, x; dξ) ∈ PoldA|l| (R>0 × Rd).

Page 126: Polynomial Semimartingales and a Deep Learning Approach to

116 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

If X is a k-polynomial semimartingale for all k ≥ 2, then we simply say X is a polynomialsemimartingale.

Remark 6.53. (i) Just as in Remark 6.49, note that since we require the above func-tions to define characteristics, we implicitly get that c vanishes for those timeswhere ∆At > 0. Again, this is different from the characterization results for affinesemimartingales in [KR+18].

(ii) Note that we implicitly assume that all the expressions in the definition above arewell defined. In particular, we have

∫Rd ‖ξ‖

kK(u, x; dξ) <∞ for all (u, x) ∈ R>0×Rd.

In the sequel, whenever we talk about a polynomial semimartingale, we assume thecharacteristics are given in the form above.

Proposition 6.54. Let X be a k-polynomial semimartingale, hence k ≥ 2. Then for A ∈V+

Ωfrom the definition of polynomial semimartingales and k, X satisfies Condition 6.43

for said A and k.

Proof. Note that fl : |l| ≤ k forms a basis of Polk. Hence for any f ∈ Poll with l ≤ k,there are coefficients αm indexed by multi-indices m with |m| ≤ k, such that

f(x) =k∑

|m|=0

αmfm(x), αm = (0, . . . , 0) ∀m with l < |m| ≤ k.

Consider now (6.53). It follows that a version of GX,A on Polk, denoted by G, is given by

Gf(u, x) =

k∑|m|=0

αmGfm(u, x) =

l∑|m|=0

αmGfm(u, x),

implying Gf ∈ PoldAl . Hence G is a linear map on Polk that preserves the degree. Hencethere exists a matrix function G such that G ιk← G where G is degree preserving on Polkby Corollary 5.13. To be precise, each matrix G(u) is defined the following way. Thei-th column with i ≤ Nk is given by the coefficients of the polynomial Gβi(u, ·) wrt thebasis βk.

We are left to show ‖G‖dA,t <∞ for all t ≥ 0. Since this property is preserved under basistransformations as these are matrices that are constant in u, we choose to work under thecanonical basis of Polk introduced in the preliminaries. In particular, we have βi = flifor all βi ∈ βk where li is the i-th element in Λk under the graded reverse-lexicographicorder. Note that this basis satisfies Convention 5.1.

Page 127: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 117

Define the operators Gi : Polk → PoldAk for i = 1, 2, 3 by their action on the basispolynomials βλ for λ ∈ 1, . . . , Nk via

G1βλ(u, x) =d∑i=1

Diβλ(x)bi(u, x),

G2βλ(u, x) =1

2

d∑i,j=1

Dijβλ(x)aij(u, x),

G3βλ(u, x) =

∫Rd

k∑|m|=3

(lλm

)flλ−m(Xu−)ξm

K(u; dξ)

=

∫Rd

|lλ|∑|m|=3

(lλm

)flλ−m(Xu−)ξm

K(u; dξ).

Note that just as before, the sum is well defined since the multi-binomial coefficients arezero if m ≤ lλ does not hold. Hence we have for all Polk the identity Gf =

∑3i=1 Gif

and for Giιk→ Gi we have G =

∑3i=1Gi for all u ∈ R>0.

Consider G1 and define for 1 ∈ 1, . . . , d the operators Tif(u, x) = Dif(x)bi(u, x) suchthat we have G1 =

∑di=1 Ti. Since Di reduces the degree of a non-constant polynomial

by one, we have

Diβλ(x) =

Nk∑z=1

αλzβz(x) =

Nk−1∑z=1

αλzβz(x),

for some finite constants αλz . Hence, since bi ∈ PoldA1 (R>0 × Rd), there are measurablefunctions αλz (u), such that

Diβλ(x)bi(u, x) =

Nk∑z=1

αλz (u)βz(x),

and since ‖ιk(bi(u, ·))‖1 is essentially locally bounded, we get

ess supu≤t,dA

|αλz (u)| <∞, for all t ≥ 0 and λ, z ≤ Nk.

Hence the representing matrix of Tiιk← Ti is given by these coefficients, that is

(Ti(u))z,λ = αλz (u).

Denote with ‖·‖Σ the matrix norm given by

‖M‖Σ :=

Nk∑λ,z=1

|Mz,λ| for M = (Mz,λ)Nkz,λ=1.

Page 128: Polynomial Semimartingales and a Deep Learning Approach to

118 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

We get

ess supu≤t,dA

‖Ti(u)‖Σ ≤Nk∑

z,λ=1

ess supu≤t,dA

|αλz (u)| <∞.

Since all norms in finite dimension are equivalent, this implies ‖Ti‖dA,t <∞ for all t ≥ 0.This implies ‖G1‖dA,t < ∞ for all t ≥ 0 by means of the triangle inequality applied toG1(u) =

∑di=1 Ti(u).

By the same arguments we get ‖Gi‖dA,t <∞ for all t ≥ 0 where i ∈ 2, 3. Hence, againby the triangle inequality we get ‖G‖dA,t <∞ for all t ≥ 0, which finishes the proof.

Remark 6.55. If X is a polynomial semimartingale, the form of the characteristicsmight suggest that X has the Markov property. However, since we do not assume thecorresponding semimartingale problem to be well posed, even in the pure diffusion case,this does not need to hold. This observation is analogous to the one made by Filipovićand Larsson [FL16] and Filipović and Larsson [FL17] in the respective introductions.

We now want to address the question of conditions under which a k-polynomial semi-martingale X can be embedded in bounded and uniformly integrable k-polynomial pro-cess (X,P,A,G). By means of Theorem 6.44, if X is a polynomial semimartingale withE[‖X0‖kRd

]<∞, then this is always possible for all k ≥ 2 since by definition, P and G

preserve the degree and X has the l-polynomial property for all l ≤ k. Hence if X is a(2k)-polynomial semimartingale, then it can be embedded in a bounded and uniformlyintegrable 2k-polynomial process (X,P,A,G). Since P and G can be projected to RNk ,one can then embed X into the bounded and uniformly integrable k-polynomial process(X, P , A, G) with projections P , G that are MESs and corresponding extended generatorsdue to the degree preserving properties, see Proposition 5.17, Theorem 5.19 combinedwith the uniqueness result Proposition 4.53.

Let us continue with the following lemma that is our version of [Cuc+12, Lemma 2.17].Again the proof we present is a modification of the one presented there.

Lemma 6.56. Let X be a special semimartingale where for k ≥ 2, the characteristics(B,C, ν) wrt the “truncation” function h(x) = x satisfy properties (i) − (v) in Defini-tion 6.52 for A ∈ V+

Ω. Then there exists a constant K that depends on t such that

E[sups≤t‖Xs‖kRd

]≤ K

(1 + E

[‖X0‖kRd

]+

∫(0,t]

E[∫Rd‖ξ‖kRd K(u,Xu−; dξ)

]A(du)

+

∫(0,t]

E[‖Xu−‖kRd

]A(du)

).

(6.54)

If additionally, we have E[‖X0‖kRd

]<∞ and either∫

(0,t]E[∫Rd‖ξ‖kRd K(u,Xu−; dξ)

]A(du) <∞, ∀t ≥ 0, (6.55)

Page 129: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 119

or if for all t ≥ 0, there exists some finite constant K such that for all t ≤ T ,∫Rd‖ξ‖kRd K(t, x; dξ) ≤ C (1 + ‖x‖kRd), x ∈ Rd, (6.56)

then for all t ≥ 0,

E[sups≤t‖Xs‖kRd

]<∞.

Proof. As a first step we show that E[sup(0,t] |fi(Xs)|k

]= E

[sup(0,t] |Xs,i|k

]satisfies

the inequality. Let Xi = X0,i + M + F be the canonical decomposition of the specialsemimartingale Xi. Hence we have M = Mfi and F = Bi which gives us

Ft =

∫(0,t]

bi(u,Xu−)A(du).

By [JS13, Theorem I.4.18] we can decomposeM into continuous and purely discontinuouspart M = M0 + M c + Md where in our case M0 = M c

0 = Md0 = 0. Since a continuous

local martingale has no jumps, hence bounded jumps, M c0 = 0 yields that M c is a locally

square integrable martingale and we denote with 〈·, ·〉 the predictable covariation. Hencewe have Ct,ii = 〈M c〉t = 〈M c,M c〉t. By [JS13, Theorem I.4.52], the increasing andnon-negative process Z defined by

Zt :=∑s≤t

(∆Xs,i)2 =

∫(0,t]

∫Rdξ2i µ

X(ds, dξ),

gives us[M ] = [M,M ] = C + Z,

where [·, ·] denotes the quadratic covariation of two semimartingales. Next, define thestopping times

TXj := inft ≥ 0 : |Xt| ≥ jTZj := inft ≥ 0 : |Zt| ≥ jTj := TXj ∧ TZj .

For the rest of the proof, recall that we denote with C a finite constant (that is indepen-dent of t) that can change from estimate to estimate. For notational convenience, let usdefine the function

ρ1(t) := ess supu≤t,dA

‖ι1(bi(u, ·))‖1 ,

which is finite for all t ≥ 0 since bi ∈ PoldA1 (R>0 × Rd). A standard estimate yields

sups≤t

∣∣∣XTji,s

∣∣∣k ≤ C|X0,i|k + sup

s≤t

∣∣∣MTjs

∣∣∣k + sups≤t

∣∣∣∣∣∫

(0,s∧Tj ]bi(u,Xu−)A(du)

∣∣∣∣∣k . (6.57)

Page 130: Polynomial Semimartingales and a Deep Learning Approach to

120 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Note that we have |X0,i|k ≤ ‖X0‖k1 ≤ C ‖X0‖kRd . Further, we have bi ∈ PoldA1 (R>0 ×Rd)by assumption. With Lemma 6.47 (ii), we estimate

sups≤t

∣∣∣∣∣∫

(0,s∧Tj ]bi(u,Xu−)A(du)

∣∣∣∣∣k

≤ sups≤t

∫(0,s∧Tj ]

|bi(u,Xu−)|k A(du)

≤ sups≤t

∫(0,s∧Tj ]

∣∣∣Cρ1(t)(1 + ‖Xu−‖Rd)∣∣∣k A(du)

≤ C(ρ1(t) +A(t))(

1 +

∫(0,t]

∥∥∥X(u∧Tj)−

∥∥∥kRdA(du)

).

(6.58)

Let us now consider sups≤t |Ms∧Tj |k. An application of the Burkholder-Davis-Gundyinequality yields

E[sups≤t|Ms∧Tj |k

]≤ CE

[[M ]

k2t∧Tj

]≤ CE

[Ck2t∧Tj + Z

k2t∧Tj

], (6.59)

where in the last step we used Jensen’s inequality for sums. Recall that by our assump-tions, we have

Ct,ii +

∫(0,t]

ξ2iK(u,Xu−; dξ)A(du) =

∫(0,t]

aii(u,Xu−)A(du),

for aii ∈ PoldA2 (R>0 × Rd). Define

ρ2(t) := ess supu≤t,dA

‖ι2(aii(u, ·))‖1 ,

which is finite for all t ≥ 0 since aii ∈ PoldA2 (R>0 × Rd). Since ξ2i ≥ 0, this yields again

with Lemma 6.47 (ii),

E[Ck2t∧Tj

]≤ C(ρ2(t) +A(t))

(1 + E

[∫(0,t∧Tj ]

‖Xu−‖kRd A(du)

])

≤ C(ρ2(t) +A(t))

(1 +

∫(0,t]

E[∥∥∥X(u∧Tj)−

∥∥∥kRd

]A(du)

).

In the last step we use Fubini-Tonelli due to non-negativity of the integrand. We now

consider Zk2t∧Tj . We continue following the ideas in [Cuc+12] where the authors base their

arguments on ideas presented in Jacod et al. [Jac+05]. Note that we have

∆Zt =∑s≤t

(∆Xi,s)2 −

∑s<t

(∆Xi,s)2 = (∆Xi,t)

2.

Since Z is increasing and piecewise constant, we get that for any p ≥ 1, we have that Zp

is increasing and piecewise constant with Zp0 = 0. This implies

Zk2t =

∑s≤t

∆(Zk2s ) =

∑s≤t

Zk2s − Z

k2s− =

∑s≤t

(Zs− + ∆Zs)k2 − Z

k2s−.

Page 131: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 121

Hence we get

Zk2t∧Tj =

∫(0,t∧Tj ]

∫Rd

[(Zs− + ξ2

i

) k2 − Z

k2s−

]µX(ds, dξ).

Noting that ν is the predictable compensator of µX and the integrand above is P mea-surable, we have by [JS13, Theorem II.1.8]

E[Zk2t∧Tj

]= E

[∫(0,t∧Tj ]

∫Rd

[(Zs− + ξ2

i

) k2 − Z

k2s−

]ν(ds, dξ)

]. (6.60)

The following fundamental inequalities will play an important role in the sequel. Theywere proposed in [Jac+05] and we shall use them in the same way they were used in[Cuc+12]. For x, z ≥ 0, ε > 0 and k

2 = p ≥ 1,

(z + x)p − zp ≤ 2p−1(zp−1x+ xp), (6.61)

zp−1x ≤ εzp +xp

εp−1. (6.62)

We apply (6.61) to (6.60), yielding

E[Zk2t∧Tj

]≤ E

[∫(0,t∧Tj ]

∫Rd

2k2−1

[Zk2−1

s− ξ2i + (ξ2

i )k2

]ν(ds, dξ)

]

= E

[∫(0,t∧Tj ]

∫Rd

2k2−1

[Zk2−1

s− ξ2i + |ξi|k

]ν(ds, dξ)

].

(6.63)

By assumption, property (iii) in Definition 6.52 holds for ν. Further, since Ct,ii isincreasing and non-negative, we have cii(u,Xu−) ≥ 0 outside a dA ⊗ P -zero set. Thisimplies

∫Rd ξ

2iK(u,Xu−; dξ) ≤ aii(u,Xu−) outside a dA⊗P -zero set. Hence we compute

for the first part on the rhs above,

E

[∫(0,t∧Tj ]

∫Rd

2k2−1Z

k2−1

s− ξ2i ν(ds, dξ)

]

= E

[∫(0,t∧Tj ]

∫Rd

2k2−1Z

k2−1

s− ξ2iK(s,Xs−)A(ds)

]

≤ E

[∫(0,t∧Tj ]

2k2−1Z

k2−1

s− aii(s,Xs−)A(ds)

]

≤ C(1 +A(t))E

∫(0,t∧Tj ]

εZk2s− +

1 + ‖Xs−‖2 k2

Rd

εk2−1

A(ds)

.In the last step, we used |Zs| = Zs, Lemma 6.47 (ii) for aii ∈ PoldA2 (R>0 × Rd) and(6.62). Since Z is non-decreasing, we can estimate using ZTj− ≤ j,∫

(0,t∧Tj ]Zk2s−A(ds) ≤ A(t)(j

k2 ∧ Z

k2t∧Tj ).

Page 132: Polynomial Semimartingales and a Deep Learning Approach to

122 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Applying these estimates to (6.63), we get

E[Zk2t∧Tj

]≤ C(1 +A(t))εE

[∫(0,t∧Tj ]

Zk2s−A(ds)

]

+ C(1 +A(t))E

[∫(0,t∧Tj ]

1 + ‖Xs−‖kRdεk2−1

A(ds)

]

+ E

[∫(0,t∧Tj ]

∫Rd

2k2−1|ξi|kν(ds, dξ)

]

≤ C(1 +A(t) +A(t)2)εE[jk2 ∧ Z

k2t∧Tj

]+ C(1 +A(t) +A(t)2)E

[∫(0,t∧Tj ]

1 + ‖Xs−‖kRdεk2−1

A(ds)

]

+ CE

[∫(0,t∧Tj ]

∫Rd|ξi|kK(s,Xs−; dξ)A(ds)

].

(6.64)

Since Zk2t∧Tj ≥ (j

k2 ∧ Z

k2t∧Tj ), we have 1

2Zk2t∧Tj ≤ Z

k2t∧Tj −

12(j

k2 ∧ Z

k2t∧Tj ). Hence, choosing

ε = (2C(1 +A(t) +A(t)2))−1 and using |ξi| ≤ ‖ξ‖1, we infer from (6.64)

1

2E[Zk2t∧Tj

]≤ E

[Zk2t∧Tj

]− 1

2E[jk2 ∧ Z

k2t∧Tj

]≤ C(1 +A(t) +A(t)2)

(E

[∫(0,t]

1 +∥∥∥X(s∧Tj)−

∥∥∥kRdA(ds)

]

+ E

[∫(0,t∧Tj ]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

]),

where C contains all constants coming from the equivalence between ‖·‖Rd and ‖·‖1.Define

ρ3(t) = (1 +A(t) +A(t)2 + ρ1(t) + ρ2(t)).

Applying the above, together with (6.58) and (6.59), to (6.57) and changing order ofintegration due to non-negativity, yields

E[sups≤t

∣∣∣XTji,s

∣∣∣k] ≤ Cρ3(t)(

1 + E[‖X0‖kRd

]+ E

[Ck2t∧Tj

]+ E

[Zk2t∧Tj

]+

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

)≤ Cρ3(t)

(1 + E

[‖X0‖kRd

]+

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

+ E

[∫(0,t∧Tj ]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

]).

(6.65)

Page 133: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 123

Note that if either term on the rhs of (6.54) is infinite, then the claim is always true.Hence, we can assume they are all finite. Since X and Z are càdlàg , they can not explodein finite time. That implies that for almost all ω ∈ Ω we have Tj(ω) → ∞ as j goes toinfinity. Hence, by monotone convergence we get

limj→∞

E[sups≤t|XTj

i,s |k

]= E

[sups≤t|Xi,s|k

].

Since j 7→∥∥∥X(s∧Tj)−

∥∥∥kRd

is monotone, again by monotone convergence we have

limj→∞

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds) =

∫(0,t]

E[‖Xs−‖kRd

]A(ds),

where we use that ‖·‖Rd is continuous, s∧Tj is a monotone increasing sequence convergingto s and Xs− is left continuous. Finally, we have monotonicity of

j 7→∫

(0,t∧Tj ]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds).

Hence we have

limj→∞

E

[∫(0,t∧Tj ]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

]

= E

[∫(0,t]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

]

=

∫(0,t]

E[∫Rd‖ξ‖kRd K(s,Xs−; dξ)

]A(ds),

where for the last step we use non-negativity of the integrand. By summing over theindex i ∈ 1, . . . , d, we now get with equivalence of norms and Jensen’s inequality forsums,

E[sups≤t

∥∥∥XTjs

∥∥∥kRd

]≤ C

d∑i=1

E[sups≤t

∣∣∣XTji,s

∣∣∣k]≤ Cρ3(t)

(1 + E

[‖X0‖kRd

]+

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

+ E

[∫(0,t∧Tj ]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

]).

(6.66)

Hence, (6.54) follows by applying monotone convergence to (6.66).

Page 134: Polynomial Semimartingales and a Deep Learning Approach to

124 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

We continue with showing the integrability of the running supremum. Assume we haveE[‖X0‖kRd

]< ∞ and consider (6.66). We can bound the last term and apply Fubini-

Tonelli to get

E[sups≤t

∥∥∥XTjs

∥∥∥kRd

]≤ Cρ3(t)

(1 + E

[‖X0‖kRd

]+

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

+ E

[∫(0,t]

∫Rd‖ξ‖kRd K(s,Xs−; dξ)A(ds)

])= Cρ3(t)

(1 + E

[‖X0‖kRd

]+

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

+

∫(0,t]

E[∫Rd‖ξ‖kRd K(s,Xs−; dξ)

]A(ds)

).

(6.67)

Assume (6.55) holds. Then the function ρ4, defined by

ρ4(t) =

∫(0,t]

E[∫Rd‖ξ‖kRd K(u,Xu−; dξ)

]A(du),

is finite for all t ≥ 0. Note that ρj , j = 3, 4 are non-decreasing. Hence we get the estimatefor all 0 ≤ t ≤ T ,

E[sups≤t

∥∥∥XTjs

∥∥∥kRd

]≤ Cρ3(t)

(1 + E

[‖X0‖kRd

]+ ρ4(t)

)+ Cρ3(t)

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds)

≤ K +K

∫(0,t]

E[supu≤s

∥∥∥X(u∧Tj)−

∥∥∥kRd

]A(ds),

where K = Cρ3(T )(1 + E[‖X0‖kRd

]+ ρ4(T )). Since

∥∥∥X(s∧Tj)−

∥∥∥ ≤ j for all s ≥ 0, wehave due to A(t) < ∞ that the rhs is finite. By monotone convergence we get that therunning supremum is càdlàg since X is càdlàg . Hence, by Gronwall’s inequality, we getfor all 0 ≤ t ≤ T ,

E[sups≤t

∥∥∥XTjs

∥∥∥kRd

]≤ K exp (KA(T )) <∞. (6.68)

The upper bound is independent for all j and hence with monotone convergence we getfor all t ≤ T ,

E[sups≤t‖Xs‖kRd

]≤ K exp (KA(T )) <∞. (6.69)

Since T was arbitrary, the claim is shown. We are left with assuming (6.56) instead of

Page 135: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 125

(6.55). Let C be the constant from (6.56) for T . In that case, (6.66) becomes

E[sups≤t

∥∥∥XTjs

∥∥∥kRd

]≤ Cρ3(T )

(1 + E

[‖X0‖kRd

] )+ Cρ3(T )

∫(0,t]

E[∥∥∥X(s∧Tj)−

∥∥∥kRd

]A(ds) + CE

[∫(0,t∧Tj ]

(1 + ‖Xs−‖kRd)A(ds)

]

≤ Cρ3(T )(

1 + E[‖X0‖kRd

] )+ Cρ3(T )

∫(0,t]

E[supu≤s

∥∥∥XTju

∥∥∥kRd

]A(ds)

+ CE

[∫(0,t]

(1 +∥∥∥X(s∧Tj)−

∥∥∥kRd

)A(ds)

]≤ K +K

∫(0,t]

E[supu≤s

∥∥∥XTju

∥∥∥kRd

]A(ds),

where K = (Cρ3(T ) + C)(1 + E[‖X0‖kRd

]) By the exact same arguments as before, the

rhs is finite and since C is only depending on T and not on t, Gronwall’s inequality yields(6.68). Again by monotone convergence we get (6.69), which finishes the proof since Twas arbitrary.

This lemma now allows us to state the main theorem from an application point of view.It shows that polynomial semimartingales can be embedded in a polynomial process ifthe jump measure satisfies a certain integrability condition. In particular condition (6.56)is useful, as it is a statement about a deterministic kernel.

We say a k-polynomial semimartingale can be embedded in a k-polynomial process, ifthere are matrix functions P,G and a A ∈ V+

Ωsuch that (X,P,A,G) is a bounded and

uniformly integrable k-polynomial process.

Theorem 6.57. Let X be a k-polynomial semimartingale and assume E[‖X0‖kRd

]<∞.

If either (6.55) or (6.56) hold, then X can be embedded in a k-polynomial process.

Proof. Let A ∈ V+

Ωbe the function that exists by the definition of k-polynomial semi-

martingales. By Proposition 6.54, we get that Condition 6.43 holds for A and k wherek ≥ 2 by the definition of k-polynomial semimartingales. By Lemma 6.56, we get thatthe running supremum is integrable. Hence, just as in the proofs of Theorem 6.44 andLemma 6.45, we get that the sets Hft are uniformly integrable for all f ∈ Polk and t ≥ 0and that for all f ∈ Polk, Mf is a true martingale. The rest of the claim now followsfrom Theorem 6.41.

Remark 6.58. By [JS13, Proposition II.2.9] we have that for a k-polynomial semi-martingale X with function A ∈ V+

Ωfrom the definition of k-polynomial semimartingales,

that the points of discontinuity of A correspond to the points where X fails to be quasi-leftcontinuous. In particular this implies stochastic discontinuity

Page 136: Polynomial Semimartingales and a Deep Learning Approach to

126 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Let us finish this section with an example of a polynomial process X, that is a polynomialsemimartingale, where X is not stochastically continuous.

Example 6.59. Let Z,W be two independent Brownian motions wrt their respectivenatural filtrations on the probability space (Ω,F , P ), both starting in zero. Let X0 be aon [0, 1] uniformly distributed random variable on this probability space independent of Zand W . For each t ≥ 0, we define the σ-algebras

Ft = σ(X0, (Z)s∈[0,t]) if t < 1 and Ft = σ(F1, (W )s∈[1,t)) for t ≥ 1.

Then F = (Ft)t≥0 forms a right-continuous filtration. After completion, we get that(Ω,F ,F, P ) forms a stochastic basis.

Define the process X with initial value X0 via

Xt := X0 + 1t<1Zt + 1t≥1Wt.

Noting that X is adapted and for s < 1 ≤ t, we have Fs is independent of Xt, we getthat X is a polynomial process since Z and W are Brownian motions and hence satisfythe polynomial property in Definition 6.4. Further, we have P (∆X1 6= 0) = 1, hence Xfails to be stochastically continuous at t = 1.

Next, note that 1t<1,1t≥1, Z1t<1 and W1t≥1 are semimartingales on our sto-chastic basis, which implies that X is a semimartingale. Since the indicator functionsare of finite variation and predictable, we compute with the integration by parts formula([He+92, Corollary 9.34]),

1·<1Z = 12·<1Z = (1·<1 • Z1·<1) +

∫(0,·]

Zu1u<1d(1u<1)

= (1·<1 • Z) +

∫(0,·]

Zud(1u<1),

and similarly

1·≥1W = (1·≥1 • W ) +

∫(0,·]

Wud(1u≥1).

Hence we get that X has a semimartingale decomposition X = X0 + M + F for localmartingale part

M = (1·<1 • Z) + (1·≥1 • W )

and finite variation part

F =

∫(0,·]

Zud(1u<1) +

∫(0,·]

Wud(1u≥1) = (W1 − Z1)1·≥1.

In particular, this implies F ∈ Aloc, which by [JS13, Proposition I.4.23 (ii)] yields thatX is a special semimartingale. Let us compute it’s canonical decomposition. We defineF 1 and F 2 by

F = W11·≥1 +(−Z11·≥1

)=: F 1 + F 2.

Page 137: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 127

Then we get that F 2 is predictable, since Z is predictable. Note that this is not true forF 1 since Ws is independent of Fs for s < 1. A direct computation shows that

E[F 1t | Fs

]=

0 = F 1

s , for s ≤ t < 1

E [Wt | Fs]1t≥1 = E [Wt]1t≥1 = 0 = F 1s , for s < 1 ≤ t

E [Wt | Fs]1t≥1 = Ws1t≥1 = Ws1s≥1 = F 1s , for 1 ≤ s ≤ t.

Hence we have that F 1 is a martingale and since it is piecewise constant, it is purelydiscontinuous. This gives us the canonical decomposition

X = X0 +M c +Md + V,

where M c = M,Md = F 1 and V = F 2. Note that M c = M is continuous. Hence weget using the independence of the two Brownian motions and the fact that the Lebesguemeasure is without atoms,

C := 〈M,M〉 =

∫(0,·]

(1u<1 + 1u≥1)du =

∫(0,·]

(1u<1 + 1u>1)du.

Define the increasing function A(t) := t + 1t≥1. Defining ct := 1u<1 + 1u>1, wethen get using c1 = 0,

C =

∫(0,·]

cuA(du).

In particular we have for c(u, x) := cu that c ∈ PoldA0 (R>0 × Rd) ⊂ PoldA2 (R>0 × Rd).Next, note that Z is continuous. Therefore we have X1− = Z1− = Z1. Define b ∈PoldA1 (R>0 × Rd) as

b(u, x) = −x11(u).

Hence we get

F 2t = −Z11t≥1 =

∫(0,t]

b(u,Xu−)A(du).

We continue withµX(dt, dξ) = ε1,∆X1(dt, dξ).

Since we have ∆Xt = 11(W1 − Z1), we get the decomposition

µX(dt, dξ) =(ε1,W1 − ε1,Z1

)(dt, dξ).

Hence, as Z is predictable and W is independent of F1−, see (6.21), we get that thepredictable compensator ν of µX is given by

ν(dt, dξ) = −ε1,Z1(dt, dξ),

compare [JS13, Theorem II.1.8], and note that the function W in that theorem (notthe Brownian motion in our example) has to be P measurable. Define K(u, x; dξ) :=−11(u)εxdξ. Then, since X1− = Z1, we get

ν(dt, dξ) = K(u,Xu−; dξ)A(du).

Page 138: Polynomial Semimartingales and a Deep Learning Approach to

128 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

In particular this gives us∫RξkK(u, x; dξ) = −xk11(u) ∈ PoldAk (R>0 × Rd).

Since this implies for j ∈ 0, 1, 2,

a(u, x) = c(u, x) +

∫RξjK(u, x; dξ) ∈ PoldA2 (R>0 × Rd),

we get that X is a polynomial semimartingale with characteristics (B,C, ν) for B = F 2.In particular, since (6.56) holds, Theorem 6.57 yields that X has the polynomial property,an observation we made at the beginning of this example since this can be seen fromelementary computations.

6.5 The discrete time case

In this section we want to briefly discuss the setting for polynomial processes in discretetime. It turns out that the discrete time setting can be seen as a special case of polynomialprocesses. For the rest of this section, We fix a filtered probability space (Ω,F ,F, P ),where the filtration F is in discrete time. Hence we have F = (Fn)n∈N0 and Fn ⊂ Fn+1

for all n ∈ N0. Whenever we say a process X is in discrete time, we mean that it isindexed by N0. Hence X = (Xn)n∈N0 .

Definition 6.60. Let X be a process in discrete time. Define the piecewise constantprocess Y , indexed by R+, via

Yt := Xn, for t ∈ [n, n+ 1).

We call the process Y the natural embedding of X in continuous time. In the sameway, for the filtered probability space (Ω,F ,F, P ) in descrete time, define the filtrationF = (Ft)t∈R+ via

Ft := Fn, for t ∈ [n, n+ 1).

We call the filtered probability space (Ω,F , F, P ) the natural embedding of the discretetime filtered probability space (Ω,F ,F, P ) in continuous time.

By construction, a discrete time process X is adapted wrt a discrete time filtered prob-ability space, if and only if it’s natural embedding in continuous time is adapted wrtthe natural embedding of the filtered probability space. Further, the natural embeddingof the filtration is right-continuous by construction. Hence, if the discrete time filteredprobability space is complete, then the natural embedding forms a stochastic basis.2

To make the following discussions simpler, fix the natural embeddings (Ω,F , F, P ) andY of (Ω,F ,F, P ) and X.

2Recall that in this thesis we defined a stochastic basis to always satisfy the usual conditions, compareDefinition 3.1.

Page 139: Polynomial Semimartingales and a Deep Learning Approach to

CHAPTER 6. POLYNOMIAL SEMIMARTINGALES 129

Definition 6.61. We say that an adapted process X has the k-polynomial property, if Yhas the k-polynomial property wrt (Ω,F , F, P ).

The next proposition shows that in the discrete time case, all embeddings have a verynice structure, allowing for a complete discussion of X by considering the respectiveembeddings.

Proposition 6.62. Let X be a discrete time and adapted process. If X has the k-polynomial property, then there are functions A ∈ V+

Ω, G : R>0 → MNk(R), P : T →

MNk(R) such that (Y, P,A,G) is a bounded and uniformly integrable k-polynomial process.

Proof. We start by constructing the map τ : T×Polk → Polk such that (Y, τ) is a càdlàgk-polynomial process. By the properties of the conditional expectation, it is alwayspossible to set τs,t = idPolk for all s ≥ 0 and t ∈ [s, bsc + 1), since Ybtc = Yt. ByProposition 6.10 there exists a linear mapping τs,bsc+1 on Polk such that Equation (6.4)holds for qfs,bsc+1 = τs,bsc+1f . Hence we have defined τ for indices s, t with n ≤ s ≤ t ≤n + 1 for all n ∈ N0. Further, since τ is piecewise constant, we have that the evolutionproperty is satisfied. We follow Remark 6.9 and extend τ to all of T. For that, lett > bsc+ 1. Then there is a unique l ∈ N such that bsc+ l < t ≤ bsc+ l + 1. Define

τs,t := τs,bsc+1 τbsc+1,bsc+2 · · · τbsc+l,t.

Hence, by definition τ is piecewise constant and càdlàg . Further, again by definition,τ satisfies the evolution property. We get that τ is an evolution system for X on Polk.Define the representing MES P = ιk(τ). Then (X,P ) is a càdlàg k-polynomial process.Next, define A ∈ V+

Ωvia

A(t) := btc.

Since P is piecewise constant, it is of finite variation and defines a family of measures γsuch that γ

∞ dA. And we get that (P, γ,A) is a proper MES in dimension Nk. Further,

note that the sequence tn in Proposition 4.31 is given by tn = n. As dA(R+ \ N0) = 0,we get that there exists an extended generator G of the proper matrix evolution system.Further, we can always set G to be zero on R+ \ N0. Hence, G is only non zero in N0

or equivalently ∆At 6= 0. As there are only finitely many such values on all compactintervals and G is given by Equation (4.22), we get that ‖G‖dA,t <∞ for all t ≥ 0. Next,let f ∈ Polk be arbitrary. Note that by definition of the embedding Y , we have for allt ≥ 0,

Hft = f(Xn) : 0 ≤ n ≤ btc .

Since each element of Hft is integrable by the polynomial property and we are dealingwith a finite set, we get that the supremum over this set is integrable, yielding uniformintegrability of Hft . Overall, this yields that (Y, P,A,G) is a bounded and uniformlyintegrable k-polynomial process, finishing the proof.

Page 140: Polynomial Semimartingales and a Deep Learning Approach to

130 CHAPTER 6. POLYNOMIAL SEMIMARTINGALES

Page 141: Polynomial Semimartingales and a Deep Learning Approach to

Appendix A

Existence of càdlàg modifications

A.1 Modifications and indistinguishability

We shortly present results on càdlàg modifications of stochastic processes as we will beneeding them in the following. We fix a stochastic basis (Ω,F ,F, P ), in particular thismeans by our definition from the preliminaries, that property (iii) in Definition 3.1 holds.

Definition A.1. Two stochastic processes X and Y are called versions of each other(or modification), if for all t ∈ I we have

Xt = Yt [P ]. (A.1)

We say they are indistinguishable or equal up to evanescence if they are equal, simul-taneously for all t outside an evanescent set, i.e

X 6= Y = (ω, t) : Xt(ω) 6= Yt(ω)

is an evanescent set. This means they have almost surely the same sample paths.

Remark A.2. Note that if X and Y are versions of each other, this does not imply thatthey are indistinguishable. However, if X and Y are càdlàg processes, then it is a wellknown result that they are indistinguishable as the càdlàg version of a process is uniqueup to evanescence.

Theorem A.3. Let X and Y be two stochastic processes such that X is a modificationof Y . If X and Y are right continuous, then X and Y are indistinguishable.

Proof. See Protter [Pro05, Theorem I.2]. We will give a proof of the case where theprocess is a two parameter random field.

131

Page 142: Polynomial Semimartingales and a Deep Learning Approach to

132 APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS

Remark A.4. The crucial observation made above is that if one poses sufficient regu-larity on the paths of a process, such that the process is fully determined by a countablesubset of R+, then one can construct a set of measure zero outside which two modificationsagree. This concept plays an important role in different areas in the theory of stochasticprocesses, e.g. in the study of Gaussian processes or more generally Gaussian randomfields where it is known as separability, compare e.g. [Adl85; Bil74]. For us, it is suffi-cient to assume càdlàg regularity which makes statements regarding indistinguishabilityeasy to prove.

We will later need the following definition regarding the càdlàg property of a two pa-rameter random field. We define this property in the spirit of how we did for matrixevolution systems. Recall that we define the set T by

T =

(s, t) ∈ R2 : 0 ≤ s ≤ t.

Definition A.5. Consider a two parameter random field X : T×Ω⇒ Rd on a probabilityspace (Ω,F , P ). We call X càdlàg (right continuous), if there exists a measurable setN ⊂ Ω with P (N) = 1, s.t. for all ω ∈ N ,

(i) and all fixed s ≥ 0, the functions

t 7→ X(s,t)(ω),

are càdlàg (right continuous) in t on [s,∞),

(ii) and for all t > 0,s 7→ X(s,t)(ω),

are càdlàg (right continuous) in s on [0, t].

For completeness, we shall define the concept of modification and indistinguishability fortwo parameter processes. We shall use the notation (s, t) = t ∈ T, then Xt = X(s,t).

Definition A.6. Given two random fields X and Y indexed by T, we say X and Y areversions (or modifications) of each other, if for any t ∈ T,

Xt = Yt [P ].

If,P (ω ∈ Ω : Xt(ω) = Yt(ω) ∀t ∈ T) = 1,

we say they are indistinguishable.

Page 143: Polynomial Semimartingales and a Deep Learning Approach to

APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS 133

The next proposition is a two-parameter random field version of Theorem A.3. The proofis essentially the same as for the one-parameter case but we will give it for the readersconvenience.

Proposition A.7. Given a complete probability space (Ω,F , P ) and two T indexed ran-dom fields X and Y , assume they are both right continuous. If X is a version of Y , thenthey are indistinguishable.

Proof. Note that by the completeness assumption, all sets of probability zero are measur-able. We follow the idea in Protter [Pro05, Theorem I.2] to the letter. Let M1 be the setof probability zero that X is right-continuous for all ω ∈M c

1 and M2 the correspondingone for Y . Then, M := M1 ∪M2 is again a set of probability zero and for each ω ∈M c,X(ω) and Y (ω) are right continuous. Define TQ = T ∩Q. Then, because X is a versionof Y , we have for any q ∈ TQ, that there exists a set Nq ⊂ Ω with P (Nq) = 0 such that

Xq(ω) = Yq(ω) ∀ω ∈ N cq .

Define N =⋃q∈TQ Nq. Then P (N) = 0. Finally, define now K = N ∪M . Once again,

P (K) = 0. Further, for ω ∈ Kc, it holds that X(ω) and Y (ω) are right continuous andfor all q ∈ TQ,

Xq(ω) = Yq(ω).

Since P (K) = 1, if we can show Xt(ω) = Yt(ω) for all ω ∈ Kc and t ∈ T, we are done.Let now (s, t) = t ∈ T. Then there are decreasing sequences (sn) ↓ s and (tm) ↓ t with(sn), (tm) ⊂ TQ. If s is rational, we can simply choose sn = s and the same goes for t.If s is rational, but t is not, we get by right continuity Xt(ω) = Yt(ω) by just taking thelimit in (sn). The same is true if t is rational but s is not. Assume now that both arenot rational. Then for any m ∈ N, there exists a lm such that slm+n ≤ tm for all n ≥ 0.Hence we can compute the limit for any m ∈ N, ω ∈ K,

Xs,tm(ω) = limn→∞

X(sn,tm)(ω) = limn→∞

Y(sn,tm)(ω) = Y(s,tm)(ω).

This in return now implies

Xs,t(ω) = limm→∞

X(s,tm)(ω) = limm→∞

Y(s,tm)(ω) = Y(s,t)(ω).

Hence we have shown for all ω ∈ K, that

Xt(ω) = Yt(ω) ∀t ∈ T.

Since P (K) = 1, X is indistinguishable from Y as claimed.

Page 144: Polynomial Semimartingales and a Deep Learning Approach to

134 APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS

A.2 Existence of càdlàg modifications

We will now deal with the situation where a process is defined via conditional expectationsfor each time point. The problem is then that the conditional expectation is only definedup to a null set. As the index set is in general uncountable, we face the issue that fora given measure P , the union of uncountable P -null sets is not necessarily a P -null setagain. A similar problem arises in filtering theory where one considers a process X thatis not adapted to the filtration F. One then considers a process Y that is defined by

Yt = E [Xt|Ft] .

The problem is now that this definition does not specify the whole process outside aP -null set. In particular, for a stopping time τ , the random variable Yτ is not welldefined almost surely. The same holds true for functionals of the whole process suchas the quadratic variation or integrals alongside a path. In filtering theory this issueis addresses by considering the optional projection of X, which guarantees uniquenessof the chosen version. The interested reader might want to consider Bain and Crisan[BC08].

Our setting is different however, e.g. we are considering a process that is adapted. Tobe concrete, given a stochastic basis (Ω,F ,F, P ), and an adapted process X, we definefor each s ≥ 0 the process Y s : [s,∞)× Ω→ Rd pointwise via

Y st = E [Xt | Fs] .

As mentioned above, since there are uncountable many t ≥ s, this process is not welldefined. If however, one can show that there exists a càdlàg modification, then we haveseen that the above definition restricted to the càdlàg modification is well defined up toindistinguishability.

One way to treat the general setting would be by means of Kolmogorow’s continuitycriterion (actually the càdlàg version which is a straight forward modification). In ourcase however, we can not assume Hölder regularity type bounds on the expectation mak-ing this approach unavailable to us. Fortunately, it will be sufficient for our purposes toassume significantly more on the process X, namely semimartingality with a semimartin-gale decomposition where the local martingale is actually a true martingale, allowing usto show existence of a càdlàg modification with elementary tools for each s ≥ 0 fixed.In the context of polynomial processes, this will actually then imply the existence of aunique (up to indistinguishability) modification of the random field

T 3 (s, t) 7→ Y st = E [Xt | Fs] ,

motivating the discussions in the previous section.

The goal of this section is to prove the following theorem.

Page 145: Polynomial Semimartingales and a Deep Learning Approach to

APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS 135

Theorem A.8. Given a stochastic basis (Ω,F ,F, P ), consider a semimartingale X onthis basis with decomposition X = X0 +M +A for a local martingale M and and processof finite variation A ∈ V with E [Var (A)t] < ∞ for all t ≥ 0. Assume M is a truemartingale. Then, for all fixed s ≥ 0, the process

t 7→ E [Xt | Fs] , t ≥ s,

has a càdlàg modification that is of finite variation.

Remark A.9. Let us note that it is possible to proof a more general result of the theoremabove where the càdlàg process only needs to be uniformly integrable. This can be doneby means of disintegrating the conditional expectation and considering the conditionaldistribution on appropriate path spaces, see Kallenberg [Kal06, Theorem 6.3]. Since forour purposes, the assumptions made above are sufficient, we opt to take the more directroute and prove the above theorem.

Let us recall the following fundamental property of the conditional expectation. For theproof, we refer e.g. to Kallenberg [Kal06, Theorem 6.1].

Theorem A.10. Given a probability space (Ω,F , P ), let A ⊂ F be a sub-σ-algebra.Then, for all random variables ξ ∈ L1(P ) with ξ ≥ 0 a.s. we have

E [ξ | A] ≥ 0 [P ].

We will now prove an intermediary lemma to Theorem A.8.

Lemma A.11. Given a probability space (Ω,F , P ), let A be a càdlàg and nondecreasingR+-valued stochastic process with A0 = 0 and E [At] < ∞ for all t ≥ 0. Then, for anysub-σ-algebra A ⊂ F , there exists a càdlàg modification of the process

t 7→ E [At | A] , (A.2)

that is nondecreasing with probability one.

Proof. Clearly, it is sufficient to show existence for arbitrary finite time horizon T > 0.Indeed, Since for T ′ > T , a càdlàg modification restricted to [0, T ] is indistinguishablefrom the càdlàg modification with time horizon T . Hence, one can “glue” these modifi-cations to a unique modification on the whole real half-line R+. Fix T > 0. Let Y beany version of (A.2). We will show in a first step that there exists a set ∆ ∈ A withP (∆c) = 0 such that the process Yt(ω)1∆(ω) is a modification on a countable densesubset of [0, T ] such that Y 1∆ is nondecreasing on that dense subset. Note that since∆c is a zero-set, we get that Y 1∆ is indistinguishable from Y , hence still a modificationof (A.2) for any t ≥ 0.

Page 146: Polynomial Semimartingales and a Deep Learning Approach to

136 APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS

For n ∈ N, denote by Dn the dyadic subset of [0, T ] defined by

Dn =

0,T

2n,2T

2n, . . . ,

(2n − 1)T

2n, T

.

Recall that |Dn| = 2n + 1 and Dn ⊂ Dn+1. Further, the set D :=⋃n∈NDn is dense in

[0, T ]. Further, for any p, q ∈ D, there exists a finite n such that p, q ∈ Dm for all m ≥ n.

Note that for any t ≥ 0, we have Yt ∈ A. Since by convention, we equip R+ with thecorresponding Borel-σ-algebra, we get for any s, t,

ω ∈ Ω : Yt(ω)− Ys(ω) ≥ 0 ∈ A.

By linearity of the conditional expectation, we have that Yt − Ys is a version of

E [At −As | A] .

Further, since At −As a.s. by assumption, we get that

P (ω ∈ Ω : Yt(ω)− Ys(ω) ≥ 0) = 1.

Consider now for each n ∈ N the set

D≤n =

(m,n) ∈ D2n : m ≤ n

.

Note that we have |D≤n | <∞ for all n ∈ N. For each (m, l) ∈ D≤n , define the sets

∆m,l := ω ∈ Ω : Yl(ω)− Ym(ω) ≥ 0 .

As noted before we have ∆m,l ∈ A and P (∆m,l) = 1 or equivalently P (∆cm,l) = 0. Next,

define

∆n :=⋂

(m,l)∈D≤n

∆m,l =

⋃(m,l)∈D≤n

∆cm,l

c

.

Hence, ∆n ∈ A for all n ∈ N and P (∆cn) = 0. Finally, define ∆ :=

⋂n≥1 ∆n. Again, we

have ∆ ∈ A and P (∆c) = 0. By construction, we have for all ω ∈ ∆,

Yp(ω)− Yq(ω) ≥ 0 ∀p, q ∈ D, q ≤ p.

Hence, ∆ is the desired zero set on which Y 1∆ is nondecreasing on D. We will now definethe desired càdlàg and nondecreasing modification of (A.2), denoted by Y on [0, T ]. Forq ∈ D, define Yq(ω) := Yq(ω)1∆(ω). Let now r ∈ [0, T ] \ D. Note that r < T . Hence,there is a sequence (rn) ⊂ D with rn ↓ r and we define

Yr(ω) = limn→∞

Yrn(ω). (A.3)

Since Y is nondecreasing on D, the above limit exists as a monotone sequence boundedbelow by Y0(ω). Further, the limit is independent by the choice of the sequence by mono-tonicity. We have that Yr(ω) is right-continuous for all ω ∈ Ω by construction (outside ∆

Page 147: Polynomial Semimartingales and a Deep Learning Approach to

APPENDIX A. EXISTENCE OF CÀDLÀG MODIFICATIONS 137

it is constant zero). Further, left limits exist for all ω ∈ Ω, again by monotonicity, with upper bound ȲT(ω); hence Ȳ is càdlàg. Note that Ȳ is nondecreasing for all ω ∈ Ω. Further, since Yt ∈ 𝒜 for all t ≥ 0 and 1∆ ∈ 𝒜, we have Ȳt ∈ 𝒜 for all t ≥ 0. By construction, Ȳ is a version of (A.2) for each t ∈ D and we are left to show that for r ∈ [0, T] \ D, Ȳr is also a version of (A.2).

Recall that by assumption A is nondecreasing and càdlàg with probability one and E[AT] < ∞. For the sequence (rn) above, we therefore get by the dominated convergence theorem for the conditional expectation that

E[Ar | 𝒜] = E[ lim_{n→∞} A_{rn} | 𝒜 ] = lim_{n→∞} E[A_{rn} | 𝒜]   [P].

Combining this with (A.3) yields Ȳr = E[Ar | 𝒜] a.s., which finishes the proof.

We can now give the proof of Theorem A.8.

Proof of Theorem A.8. Let s ≥ 0 be fixed. Note that by assumption, M is a true martingale on our stochastic basis. Hence we can compute for any t ≥ s,

E[Xt | Fs] = X0 + E[Mt | Fs] + E[At | Fs] = X0 + Ms + E[At | Fs].

Hence, we only need to show that the conditional expectation of the finite variation process A has a càdlàg modification. We have A = A^i − A^d for increasing càdlàg processes A^i, A^d, both starting in zero. Further, E[A^i_t] + E[A^d_t] ≤ E[Var(A)_t] < ∞ for all t ≥ 0. Denote by Y^i the càdlàg and increasing modification of t ↦ E[A^i_t | Fs] and in the same way by Y^d the one for A^d. These modifications exist by Lemma A.11. By linearity, Y^i − Y^d is now a càdlàg modification of E[A· | Fs] that is of finite variation and we are done.

It is clear that in Theorem A.8, such a modification exists on all of R+, not just on [s, ∞), if one drops the finite variation property of the modification. Hence the following corollary.

Corollary A.12. Consider the situation in Theorem A.8. Then there exists a càdlàg modification of

t ↦ E[Xt | Fs], t ≥ 0.

Proof. For s = 0, this is just the theorem itself. Hence consider s > 0. For t ∈ [0, s), since X is adapted and càdlàg, the version is given by X itself. For t ∈ [s, ∞), take the modification from Theorem A.8. The combined process is again càdlàg, remains a version of the conditional expectation for each t ≥ 0 and we are done.


Appendix B

Additional notes

B.1 Proof of a naive uniform integrability result

We now give the previously mentioned naive uniform integrability result. Since we provided a strictly stronger result, it is included for illustration only. We also want to note that the following proof can be significantly shortened at Equation (B.1) by applying Jensen's inequality for sums; since this is just for illustration, we opt to give the most elementary proof we have.

Proposition B.1. Let k ≥ 2 be an even number, β a basis of Pol_k and A ∈ V^+_Ω. Let X be a càdlàg adapted process such that Pol_k ⊂ D_X and E[‖X0‖^k_{R^d}] < ∞. Assume there exists a measurable function

G : R_{>0} → M_N(R) with ‖G‖_{dA,t} < ∞ for all t ≥ 0,

such that the map 𝒢 : Pol_k → Pol_k(R_{>0} × R^d), defined by

(𝒢f)(u)(x) = 𝒢f(u, x) = (G(u)f)⊤ H_β(x), for all x ∈ R^d,

is a version of 𝒢^{X,A} on Pol_k. Then for any f ∈ Pol_{k−1} and T ≥ 0, the set H^f_T is uniformly integrable.

Proof. The idea is to apply Theorem 3.4. Let ε > 0 be such that (1 + ε)(k − 1) = k. Define the function ϕ(x) = x^{1+ε}. Note that ϕ is convex on R+ and

lim_{t→+∞} ϕ(t)/t = +∞.

By Lemma 6.37, we have (6.31). By Lemma 6.2, this is equivalent to

sup_{0≤t≤T} E[‖Xt‖_1^k] < ∞.


Let f ∈ Pol_{k−1} be arbitrary. Then there exists a constant C > 0 such that

|f(x)| ≤ C(1 + ‖x‖_1^{k−1}) ⟺ |f(x)| − C ≤ C ‖x‖_1^{k−1}.   (B.1)

This in turn implies

| |f(x)| − C | 1_{|f(x)|≥C} ≤ C ‖x‖_1^{k−1}.

Using this we get

ϕ( | |f(x)| − C | 1_{|f(x)|≥C} ) ≤ C^{1+ε} ‖x‖_1^k.

Therefore we can conclude

sup_{0≤t≤T} E[ ϕ( | |f(Xt)| − C | 1_{|f(Xt)|≥C} ) ] ≤ C^{1+ε} sup_{0≤t≤T} E[‖Xt‖_1^k] < ∞.

By Theorem 3.4, we have that the set

{ | |f(Xt)| − C | 1_{|f(Xt)|≥C} : 0 ≤ t ≤ T }

is uniformly integrable. From the definition of uniform integrability, this means that for any ε > 0, there is an L > 0 such that for all t ∈ [0, T] simultaneously we have

ε > E[ | |f(Xt)| − C | 1_{|f(Xt)|≥C} 1_{| |f(Xt)|−C| 1_{|f(Xt)|≥C} ≥ L} ]
  = E[ | |f(Xt)| − C | 1_{|f(Xt)|−C≥0} 1_{| |f(Xt)|−C| 1_{|f(Xt)|−C≥0} ≥ L} ]
  ≥ E[ | |f(Xt)| − C | 1_{| |f(Xt)|−C| 1_{|f(Xt)|−C≥0} ≥ L} ],   (B.2)

where we use in the last step

1_{|f(Xt)|−C≥0} = 0 ⟹ 1_{| |f(Xt)|−C| 1_{|f(Xt)|−C≥0} ≥ L} = 0.

Using

| |f(Xt)| − C | ≥ L + 2C ⟹ |f(Xt)| − C ≥ 0 and | |f(Xt)| − C | ≥ L,

we get

ε > E[ | |f(Xt)| − C | 1_{| |f(Xt)|−C| 1_{|f(Xt)|−C≥0} ≥ L} ] ≥ E[ | |f(Xt)| − C | 1_{| |f(Xt)|−C| ≥ L+2C} ].   (B.3)

Since this holds for all t simultaneously, we have shown that the set

{ | |f(Xt)| − C | : 0 ≤ t ≤ T }

is uniformly integrable. This, once again, gives us that

{ f(Xt) : 0 ≤ t ≤ T } = H^f_T

is uniformly integrable. Since f ∈ Pol_{k−1} was arbitrary and this holds for all T ≥ 0, this finishes the proof.


Part II

A Deep Learning Approach to Local Stochastic Volatility Calibration


Chapter 7

Preliminaries and deep learning

This part of the thesis is based on joint work with Christa Cuchiero¹ and Josef Teichmann², of which we only give a partial presentation together with a first set of numerical examples that serve as a proof of concept.

We start with a short exposition of the problem we address in this work and continue with a brief summary of the most basic concepts from the area of machine learning pertinent to our discussion. In Section 7.3, we introduce a variance reduction technique that makes our approach feasible in practice. Following that, we present in Chapter 8 the central problem formulation we are addressing, namely the calibration of local stochastic volatility models to market data. In particular, we specify numerical experiments regarding the variance reduction technique we propose and a toy-model example of calibration. In the interest of readability, we collect all plots regarding these experiments in Chapter 9.

For the rest of this thesis, we assume a zero risk-free interest rate, i.e. r = 0, to keep notation simple. An extension to the general case is however straightforward.

7.1 Introduction

A central task that has to be performed each day in banks and financial institutions is to calibrate stochastic models to market and historical data. So far the model choice has been driven not only by the capacity to capture empirically observed market features well, but also, to a large extent, by computational tractability. This is now undergoing a big change, since neural network approaches offer a completely new perspective on model calibration,

¹Vienna University of Economics and Business, Welthandelsplatz 1, 1020 Vienna
²ETH Zurich, Rämistrasse 101, 8092 Zurich


see e.g. Hernandez [Her17], Bayer and Stemper [BS18], Horvath et al. [Hor+19] and [Cuc+18a].

This part of the thesis focuses on the calibration of local stochastic volatility (LSV) models, which is still an intricate task, in particular in multivariate situations. LSV models, going back to [Lip02; Ren+07], combine classical stochastic volatility with local volatility to achieve both a good fit to time series data and, in principle, a perfect calibration to the implied volatility smiles and skews. In these models, the discounted price process (St)t≥0 of an asset satisfies

dSt = St L(t, St) αt dWt,   (7.1)

where (αt)t≥0 is some stochastic process taking values in R, L(t, s) the so-called leverage function depending on time and the current value of the asset, and W a one-dimensional Brownian motion.

For notational simplicity we consider here the one-dimensional case, but the setup easily translates to a multivariate situation with several assets and a general multivariate variance process.

The function L is the crucial part in this model. It allows in principle to perfectly calibrate the implied volatility surface seen on the market. In order to achieve this goal, it has to satisfy

L²(t, s) = σ²_Dup(t, s) / E[α²_t | St = s],   (7.2)

where σ_Dup denotes Dupire's local volatility function (see [Dup96]). Note that (7.2) is an implicit equation for L, as L is needed for the computation of E[α²_t | St = s]. This in turn means that the SDE for the price process (St)t≥0 is actually a McKean-Vlasov SDE, since the law of St enters the equation. Existence and uniqueness results for this equation are not at all obvious, since the coefficients do not satisfy any kind of standard conditions like, for instance, Lipschitz continuity in the Wasserstein space. Existence of a short-time solution of the associated non-linear Fokker-Planck equation for the density of (St)t≥0 was shown in Abergel and Tachet [AT10] under certain regularity assumptions on the initial distribution. As stated in Guyon and Henry-Labordere [GHL11] (see also Jourdain and Zhou [JZ16], where existence of a simplified version of an LSV model is proved), a very challenging and still open problem is to derive the set of stochastic volatility parameters for which LSV models exist uniquely for a given market smile.

Despite these existence issues, LSV models have attracted a lot of attention from the calibration and implementation point of view, due to their appealing feature of a potentially perfect smile calibration and their econometric properties. We refer to Guyon and Henry-Labordere [GHL11], Guyon and Henry-Labordère [GHL13], and Cozma et al. [Coz+17] for Monte Carlo methods, to Ren et al. [Ren+07] and Tian et al. [Tia+15] for PDE methods based on non-linear Fokker-Planck equations and to Saporito et al. [Sap+17] for inverse problem techniques for PDEs.


7.2 Preliminaries on deep learning

We will now briefly introduce two core concepts in deep learning, namely artificial neural networks and stochastic gradient descent, where the latter is a widely used method for optimization problems involving the former. In standard machine learning terminology, the optimization problem is usually referred to as "training", and in the sequel we use both terminologies interchangeably. Let us also mention that in all of our following discussions, we only consider neural networks of feed forward type (defined below) and we do not make any structural assumptions regarding the topology of the network; hence we consider densely connected layers instead of convolutional ones.

7.2.1 Artificial neural networks

We start with the definition of feed forward neural networks. These are functions of a particular structure, indexed by parameters, such that this class of functions can serve as an approximation class for an unknown function under certain assumptions, compare Theorem 7.2 below. In particular, it will be clear from the definition below that derivatives of these functions can be expressed efficiently and iteratively with a small set of functions (see e.g. Hecht-Nielsen [HN92]), which is a desirable feature from an optimization point of view.

Definition 7.1. Let L, N0, N1, . . . , NL ∈ N, σ : R → R and, for any ℓ ∈ {1, . . . , L}, let wℓ : R^{Nℓ−1} → R^{Nℓ}, x ↦ Aℓ x + bℓ be an affine function with Aℓ ∈ R^{Nℓ×Nℓ−1} and bℓ ∈ R^{Nℓ}, and additionally bL = 0. A function R^{N0} → R^{NL} defined as

F(x) = wL ∘ FL−1 ∘ · · · ∘ F1(x), with Fℓ = σ ∘ wℓ for ℓ ∈ {1, . . . , L − 1},

is called a feed forward neural network. Here the activation function σ is applied componentwise. L denotes the number of layers, N1, . . . , NL−1 denote the dimensions of the hidden layers, and N0 and NL the dimensions of the input and output layers.
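To make the composition structure concrete, the following is a minimal numpy sketch of evaluating such a feed forward network; the toy dimensions, the tanh activation and the random weights are illustrative choices and not the architecture used later in the experiments.

import numpy as np

def feed_forward(x, weights, biases, sigma=np.tanh):
    """Evaluate F(x) = w_L o F_{L-1} o ... o F_1(x) with F_l = sigma o w_l."""
    a = x
    for A, b in zip(weights[:-1], biases[:-1]):
        a = sigma(A @ a + b)            # hidden layers: affine map followed by activation
    A_L, b_L = weights[-1], biases[-1]  # output layer: affine only, with b_L = 0
    return A_L @ a + b_L

# toy architecture N0 = 2, N1 = N2 = 8, N3 = 1 (illustrative only)
rng = np.random.default_rng(0)
dims = [2, 8, 8, 1]
weights = [rng.normal(scale=0.05, size=(m, n)) for n, m in zip(dims[:-1], dims[1:])]
biases = [np.zeros(m) for m in dims[1:]]   # in particular b_L = 0 as in Definition 7.1
print(feed_forward(np.array([0.5, -1.0]), weights, biases))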

The following version of the so-called universal approximation theorem is due to K. Hornik [Hor91]. For its formulation we denote the set of all feed forward neural networks with activation function σ, input dimension N0 and output dimension NL by NN^σ_{∞,N0,NL}.

Theorem 7.2 (Hornik (1991)). Suppose σ is bounded and non-constant. Then the following statements hold:

(i) For any finite measure µ on (R^{N0}, B(R^{N0})) and 1 ≤ p < ∞, the set NN^σ_{∞,N0,1} is dense in L^p(R^{N0}, B(R^{N0}), µ).


(ii) If in addition σ ∈ C(R, R), then NN^σ_{∞,N0,1} is dense in C(R^{N0}, R) for the topology of uniform convergence on compact sets.

Since each component of an R^{NL}-valued neural network is an R-valued neural network, this result easily generalizes to NN^σ_{∞,N0,NL} with NL > 1.

We denote by NN_{N0,NL} the set of all neural networks in NN^σ_{∞,N0,NL} with a fixed architecture, i.e. a fixed number of layers L, fixed dimensions Nℓ for each hidden layer ℓ ∈ {1, . . . , L − 1} and a fixed activation function σ. This set can be described by

NN_{N0,NL} = {F(·|θ) | F feed forward neural network and θ ∈ Θ},

with parameter space Θ ⊂ R^q for some q ∈ N and θ ∈ Θ corresponding to the entries of the matrices Aℓ and the vectors bℓ for ℓ ∈ {1, . . . , L}.

7.2.2 Stochastic gradient descent

In light of Theorem 7.2, it is clear that neural networks can serve as function approximators, and the goal is to find the "correct" parameters. Usually, the situation is such that the unknown function is expressed as an expectation. Probably the most prevalent training method for such a setup is stochastic gradient descent, and we briefly review the most basic facts about this optimization/training method.

The structural properties of neural networks allow one to solve minimization problems of the type

min_{θ∈Θ} f(θ) with f(θ) = E[Q(θ)]   (7.3)

for some stochastic objective function³ Q : Ω × Θ → R, (ω, θ) ↦ Q(θ)(ω), depending on parameters θ in some space Θ, very efficiently via stochastic gradient descent and backpropagation.

The classical method for solving generic optimization problems for some differentiable objective function f (not necessarily of the expected value form as in (7.3)) is to apply a gradient descent algorithm: starting with an initial guess θ^(0), one iteratively defines

θ^(k+1) = θ^(k) − ηk ∇f^(k)(θ^(k))   (7.4)

for some learning rate ηk and f^(k) = f. Under suitable assumptions, θ^(k) converges for k → ∞ to a local minimum of the function f.

³We shall often omit the dependence on ω.


One of the key insights of deep learning is that stochastic gradient descent methods, going back to stochastic approximation algorithms proposed by Robbins and Monro [RM51], are much more efficient. To apply this, it is crucial that the objective function f is linear in the sampling probabilities. In other words, f needs to be of the expected value form as in (7.3). In the simplest form of stochastic gradient descent, under the assumption that

∇f(θ) = E[∇Q(θ)],

the true gradient of f is approximated by the gradient at a single sample Q(θ)(ω), which reduces the computational cost considerably. In the updating step (7.4) for the parameters θ, ∇f is then replaced by ∇Q(θ)(ω), hence

θ^(k+1) = θ^(k) − ηk ∇Q^(k)(θ^(k))(ω),   (7.5)

with Q^(k) = Q. The algorithm passes through all samples ω of the so-called training data set and performs the update for each element, several times, until an approximate minimum is reached.

A compromise between computing the true gradient of f and the gradient at a single Q(θ)(ω) is to compute the gradient of a subsample of size Nbatch, called (mini-)batch, so that Q^(k) used in the update (7.5) is now given by

Q^(k)(θ) = (1/Nbatch) ∑_{n=1}^{Nbatch} Q(θ)(ω_{n+kNbatch}), k ∈ {0, 1, . . . , ⌊N/Nbatch⌋ − 1},   (7.6)

where N is the size of the whole training data set. Any other unbiased estimator of ∇f(θ) can of course also be applied in (7.5).

In typical applications of machine learning, the data set available is limited, i.e. N < ∞. In our situation, we apply machine learning in a simulated environment, allowing us to generate data at will. This corresponds to N = ∞, meaning that we can generate completely new data in each step k.
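As a simple illustration of the update (7.5) with mini-batches as in (7.6) in this simulated-data setting, the following sketch minimizes f(θ) = E[(θ − Z)²] for Z ~ N(1, 1); the objective, the step-size schedule and the sample sizes are purely illustrative and not part of the thesis' implementation.

import numpy as np

rng = np.random.default_rng(0)

def grad_Q_batch(theta, n):
    """Mini-batch gradient of Q(theta)(omega) = (theta - Z(omega))^2 with Z ~ N(1,1);
    the target f(theta) = E[Q(theta)] is minimised at theta = 1."""
    z = rng.normal(loc=1.0, scale=1.0, size=n)    # fresh samples each step (N = infinity)
    return np.mean(2.0 * (theta - z))

def sgd(theta0, n_steps=2000, batch_size=100, eta=0.05):
    theta = float(theta0)
    for k in range(n_steps):
        eta_k = eta / np.sqrt(1.0 + k)            # illustrative step-size schedule
        theta -= eta_k * grad_Q_batch(theta, batch_size)   # update (7.5) with batch gradient (7.6)
    return theta

print(sgd(theta0=5.0))   # converges to roughly 1.0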

7.3 Variance reduction for pricing and calibration

This section is dedicated to introducing a generic variance reduction technique for Monte Carlo pricing and calibration that uses hedges as control variates. This method will be crucial in our LSV calibration presented in Chapter 8: our numerical experiments showed that without such a variance reduction, a sufficiently accurate calibration is not possible in a reasonable amount of time.

Consider a financial market in discounted terms with r traded instruments (Zt)t≥0 following an R^r-valued stochastic process on some filtered probability space (Ω, (Ft)t≥0, F, Q).


Here, Q is a risk neutral measure and (Ft)t≥0 is supposed to be right continuous. In particular, (Zt)t≥0 is a σ-martingale with càdlàg paths.

Let C be an FT-measurable random variable describing the payoff of some European option at maturity T > 0. Then the usual Monte Carlo estimator for the price of this option is given by

π̂ = (1/N) ∑_{n=1}^{N} C(ωn)   (7.7)

for N ∈ N. This estimator can easily be modified by adding a stochastic integral with respect to Z. Indeed, consider a locally bounded predictable strategy (hs)s∈[0,T] and some constant c. Define the following estimator

π̃ = (1/N) ∑_{n=1}^{N} ( C(ωn) − c (h • Z)T(ωn) ).   (7.8)

Then, for any (ht)t∈[0,T] and c, this estimator is still an unbiased estimator for the price of the option with payoff C, since the expected value of the stochastic integral vanishes, at least when Z is a true martingale. Denoting by

I = (1/N) ∑_{n=1}^{N} (h • Z)T(ωn),

the variance of π̃ is given by

Var(π̃) = Var(π̂) + c² Var(I) − 2c Cov(π̂, I).

This becomes minimal by choosing

c = Cov(π̂, I) / Var(I).

With this choice, we have

Var(π̃) = (1 − Corr²(π̂, I)) Var(π̂).

In particular, in the case of a perfect (pathwise) hedge given by (ht)t∈[0,T], this correlation is equal to 1 and the estimator π̃ has zero variance, since in this case

Var(π̂) = Var(I) = Cov(π̂, I).

Therefore it is crucial to find a good approximate hedge such that Corr²(π̂, I) becomes large. This is the subject of Sections 7.3.1 and 7.3.2 below.
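A generic numerical illustration of this control variate construction, with synthetic data standing in for the payoff and the hedge integral (all distributions and sample sizes below are illustrative placeholders):

import numpy as np

rng = np.random.default_rng(0)
N = 100_000
Z_T = np.exp(rng.normal(-0.02, 0.2, size=N))      # stand-in terminal asset value with mean 1
C = np.maximum(Z_T - 1.0, 0.0)                    # call payoff
I = Z_T - 1.0                                     # crude hedge integral with zero mean

c = np.cov(C, I)[0, 1] / np.var(I, ddof=1)        # optimal coefficient c = Cov/Var
plain = C.mean()                                  # estimator (7.7)
controlled = (C - c * I).mean()                   # estimator (7.8)
print(plain, controlled)
print(1.0 - np.corrcoef(C, I)[0, 1] ** 2)         # variance ratio Var(controlled)/Var(plain)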


7.3.1 Black-Scholes hedge

In many cases of local stochastic volatility models of the form (7.1) and options depending only on the terminal value of the price process, a Delta hedge based on the Black-Scholes model works well. Indeed, let C = g(ST) and let π^g_BS(t, x, σ) be the price at time t of this claim in the Black-Scholes model. Here, x stands for the price variable and σ for the Black-Scholes volatility. Then choosing as hedging instrument only the price S itself and as approximate hedging strategy

ht = ∂x π^g_BS(t, St, L(t, St)αt)   (7.9)

usually already yields a high variance reduction. In fact it is even sufficient to consider αt alone to achieve satisfying results, i.e. one has

ht = ∂x π^g_BS(t, St, αt).   (7.10)
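For a call payoff g(x) = (x − K)^+ and zero interest rate, the Delta in (7.10) has the familiar closed form ∂x π^g_BS(t, x, σ) = Φ(d1). A small sketch, where the strike, maturity and the use of scipy for the normal cdf are illustrative choices:

import numpy as np
from scipy.stats import norm

def bs_delta_call(t, S, sigma, K, T):
    """Black-Scholes Delta of a call with zero rate, evaluated at spot S and volatility sigma."""
    tau = np.maximum(T - t, 1e-12)
    d1 = (np.log(S / K) + 0.5 * sigma ** 2 * tau) / (sigma * np.sqrt(tau))
    return norm.cdf(d1)

print(bs_delta_call(0.0, 1.0, 0.2, K=1.0, T=1.0))

# along a simulated path (S_{t_k}, alpha_{t_k}) the discretised hedge integral is
# sum_k bs_delta_call(t_k, S_{t_k}, alpha_{t_k}, K, T) * (S_{t_{k+1}} - S_{t_k}),
# which is subtracted from the payoff as in (7.8).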

7.3.2 Neural networks hedge

Alternatively, in particular when the space of hedging instruments becomes higher dimensional, one can learn the hedging strategy by parameterizing it via neural networks. Indeed, let the payoff again be a function of the terminal values of the hedging instruments, i.e. C = g(ZT). Then in Markovian models it makes sense to specify the hedging strategy via a function h : R+ × R^r → R^r,

ht = h(t, Zt),

which in turn will correspond to a neural network (t, z) ↦ h(t, z|δ) ∈ NN_{r+1,r} with weights denoted by δ in some parameter space ∆. Following the approach in Buehler et al. [Bue+18], an optimal hedge for the claim C can be computed through

inf_{δ∈∆} ρ( −C + (h(·, Z·|δ) • Z)T )

for some risk measure ρ. If the risk measure is chosen of the form

ρ(C) = inf_{w∈R} { w + E[ℓ(C − w)] }

for some continuous, non-decreasing and convex loss function ℓ : R → R, we can apply stochastic gradient descent, because we fall into the realm of problem (7.3). Indeed, the objective function f(δ, w) that should be minimized with respect to δ and w is given by

f(δ, w) = E[ w + ℓ( −C + (h(·, Z·|δ) • Z)T − w ) ].

The optimal hedge h(·|δ∗) for an optimizer δ∗ can then be plugged into (7.8).


Our situation is different from [Bue+18], as we do not want to hedge against a risk expressed by a risk measure, but directly aim to reduce variance. Hence the corresponding optimization problem for the hedge simplifies to

inf_{δ∈∆} E[ ( C − πmkt − (h(·, Z·|δ) • Z)T )² ],

where πmkt = E[C] is the market price of claim C.
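As an illustration of this variance-minimisation objective, the following sketch fits a very simple parametric hedge (linear in the features (1, t, z) rather than a full neural network, purely to keep the example short): since the hedge is then linear in δ, the empirical objective is a least-squares problem that can be solved in closed form. All model choices (toy Black-Scholes instrument, feature map, sample sizes) are placeholders and not the thesis' setup.

import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, K, sigma = 20_000, 50, 1.0, 0.2
dt = 1.0 / n_steps

# simulate a toy martingale Z (Black-Scholes with zero rate) as hedging instrument
dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
Z = np.concatenate([np.ones((n_paths, 1)),
                    np.exp(np.cumsum(sigma * dW - 0.5 * sigma**2 * dt, axis=1))], axis=1)

C = np.maximum(Z[:, -1] - K, 0.0)           # payoff of the claim
pi_mkt = C.mean()                            # stand-in for the market price

# hedge h(t, z | delta) = delta_0 + delta_1 * t + delta_2 * z  (toy feature map)
t_grid = np.arange(n_steps) * dt
features = np.stack([np.ones((n_paths, n_steps)),
                     np.broadcast_to(t_grid, (n_paths, n_steps)),
                     Z[:, :-1]], axis=-1)    # shape (paths, steps, 3)
dZ = np.diff(Z, axis=1)                      # increments Z_{t_{k+1}} - Z_{t_k}

# discretised (h(.,Z|delta) • Z)_T is linear in delta, so minimise by least squares
design = np.einsum('psk,ps->pk', features, dZ)       # integral of each feature against dZ
delta, *_ = np.linalg.lstsq(design, C - pi_mkt, rcond=None)
residual = C - pi_mkt - design @ delta
print(np.var(C), np.var(residual))           # variance before / after the learned hedge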

In the context of calibration, we will face the situation where the terminal value C = g(ZT) is a function of ω ∈ Ω and θ ∈ Θ, where θ are those parameters that specify the model. Hence, in general we deal with two different sets of parameters, where δ only serves auxiliary purposes for the calibration problem.


Chapter 8

Calibration of LSV models

We will now give a precise formulation of the calibration problem under consideration and of the method we apply.

Consider an LSV model as in (7.1), defined on some filtered probability space (Ω, (Ft)t≥0, F, Q), where Q is a risk neutral measure. We assume the stochastic process α to be fixed. This can for instance be achieved by first calibrating the pure stochastic volatility model with L ≡ 1 (e.g. SABR) and by fixing the corresponding parameters.

Our main goal is to determine the leverage function L in perfect accordance with market data. Due to the universal approximation properties outlined in Section 7.2 (Theorem 7.2), we choose to parametrize L via neural networks. More precisely, let 0 = T0 < T1 < · · · < Tn = T denote the maturities of the available European call options to which we aim to calibrate the LSV model. We then specify the leverage function L(t, s) via a family of neural networks, i.e.

L(t, s) = 1 + Fi(s), t ∈ [Ti−1, Ti), i ∈ {1, . . . , n},   (8.1)

where Fi ∈ NN_{1,1}. We denote the parameters of Fi by θi and the corresponding parameter space by Θi. For each maturity Ti, we assume to have Ji options with strikes Kij, j ∈ {1, . . . , Ji}. The calibration functional for the i-th maturity is then of the form

argmin_{θi∈Θi} ∑_{j=1}^{Ji} wij u( π^mod_ij(θi) − π^mkt_ij ), i ∈ {1, . . . , n},   (8.2)

where π^mod_ij(θi) (π^mkt_ij respectively) denotes the model (market resp.) price of an option with maturity Ti and strike Kij, u : R → R+ is some (positive, nonlinear, convex) function (e.g. square or absolute value) measuring the distance between market and model prices, and wij are some weights, e.g. of vega type, which allow us to accurately match implied volatility data, our actual goal, rather than pure prices.
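A minimal sketch of the piecewise-in-maturity parametrization (8.1), reusing a small numpy network for each Fi; the architecture, the random initialization and the maturities below are illustrative and not the specification of Section 8.2.

import numpy as np

def make_network(dims, seed):
    """Random weights for a small feed forward network R -> R (illustrative initialization)."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(scale=0.05, size=(m, n)) for n, m in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(m) for m in dims[1:]]
    return weights, biases

def evaluate(net, s):
    weights, biases = net
    a = np.atleast_1d(s)
    for A, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(A @ a + b)
    return (weights[-1] @ a + biases[-1])[0]

maturities = [0.15, 0.25, 0.5, 1.0]                    # T_1 < ... < T_n (illustrative)
networks = [make_network([1, 10, 10, 1], seed=i) for i in range(len(maturities))]

def leverage(t, s):
    """L(t, s) = 1 + F_i(s) for t in [T_{i-1}, T_i), as in (8.1)."""
    i = np.searchsorted(maturities, t, side='right')   # index of the maturity bucket
    i = min(i, len(maturities) - 1)                    # clamp at the last maturity
    return 1.0 + evaluate(networks[i], s)

print(leverage(0.2, 1.05))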


We solve the minimization problems (8.2) iteratively: we start with maturity T1 and fix θ1. This then enters the computation of π^mod_2j(θ2) and thus (8.2) for maturity T2, etc. To simplify the notation, we shall in the sequel drop the index i, so that for a generic maturity T > 0, (8.2) becomes

argmin_{θ∈Θ} ∑_{j=1}^{J} wj u( π^mod_j(θ) − π^mkt_j ).

Since the model prices are given by

π^mod_j(θ) = E[(ST(θ) − Kj)^+],   (8.3)

we have π^mod_j(θ) − π^mkt_j = E[Qj(θ)], where

Qj(θ)(ω) := (ST(θ)(ω) − Kj)^+ − π^mkt_j.   (8.4)

Note that ST depends on θ via (8.1). The calibration task then amounts to finding a minimum of

f(θ) := ∑_{j=1}^{J} wj u( E[Qj(θ)] ).   (8.5)

If u is a general non-linear function, this is clearly not of the expected value form of problem (7.3). In the following section we illustrate two possibilities of dealing with this non-linearity and with the fact that stochastic gradient descent as outlined in Section 7.2.2 is not directly applicable.

8.1 Minimizing the calibration functional

The goal of this section is to specify two methods for minimizing (8.5). It is possible to consider linearized versions of (8.5) such that classical stochastic gradient descent with potentially small batch size is applicable. In this thesis, however, we only consider two approaches that both amount to using classical gradient descent.

8.1.1 Standard gradient descent

The most obvious choice (which however does not work in practice) is to use a standard Monte Carlo estimator for E[Qj(θ)], so that (8.5) is estimated by

f̂(θ) = ∑_{j=1}^{J} wj u( (1/m) ∑_{l=1}^{m} Qj(θ)(ωl) ),   (8.6)


for i.i.d. samples ω1, . . . , ωm ∈ Ω. Since the Monte Carlo error decreases as 1/√m, the number of simulations m has to be chosen large (≈ 10^8) in order to approximate the true model prices in (8.3) well. Note that implied volatility, to which we actually aim to calibrate, is even more sensitive. As it is not obvious how to apply stochastic gradient descent due to the non-linearity of u, it seems necessary at first sight to compute the gradient of the whole function f̂(θ). As m ≈ 10^8, this is however computationally very expensive and does not allow us to find a minimum in the usually high dimensional parameter space Θ in a reasonable amount of time.

8.1.2 Standard gradient descent with control variates

One possible remedy is to apply hedging control variates as introduced in Section 7.3 as a variance reduction technique. This allows us to reduce the number of samples m in the Monte Carlo estimator drastically, so that usual (non-stochastic) gradient descent is enough to achieve accurate calibration results.

Assume that we have r hedging instruments (including the price process S), denoted by (Zt)t≥0, which are σ-martingales under Q and take values in R^r. Consider strategies hj : [0, T] × R^r → R^r and some constant c. Define

Xj(θ)(ω) := (ST(θ)(ω) − Kj)^+ − c (hj(·, Z·(θ)(ω)) • Z·(θ)(ω))T − π^mkt_j,   (8.7)

where (hj • Z)T denotes (a discretized version of) the stochastic integral with respect to Z. The calibration functionals (8.5) and (8.6) can then simply be redefined by replacing Qj(θ)(ω) by Xj(θ)(ω).
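Schematically, and with all inputs as precomputed arrays (the simulated terminal prices, discretized hedge integrals, market prices and weights below are placeholders), the estimated calibration functional with control variates takes the following form:

import numpy as np

def calibration_functional(S_T, hedge_integrals, strikes, market_prices, weights,
                           c=1.0, u=np.abs):
    """Estimate sum_j w_j * u( mean_l X_j(theta)(omega_l) ) with X_j as in (8.7).

    S_T             : array (m,) of simulated terminal prices S_T(theta)(omega_l)
    hedge_integrals : array (m, J) of discretized (h_j • Z)_T(omega_l)
    strikes, market_prices, weights : arrays (J,)
    """
    payoffs = np.maximum(S_T[:, None] - strikes[None, :], 0.0)        # (m, J)
    X = payoffs - c * hedge_integrals - market_prices[None, :]         # X_j(theta)(omega_l)
    return np.sum(weights * u(X.mean(axis=0)))

# toy call with random placeholders, only to show the shapes involved
rng = np.random.default_rng(0)
m, J = 1000, 3
print(calibration_functional(S_T=rng.lognormal(size=m),
                             hedge_integrals=rng.normal(scale=0.01, size=(m, J)),
                             strikes=np.array([0.9, 1.0, 1.1]),
                             market_prices=np.array([0.15, 0.08, 0.04]),
                             weights=np.ones(J)))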

Analogously to Section 7.3.2, we can parametrize the hedging strategies via neural networks and find the optimal weights δ by computing

argmin_{δ∈∆} (1/N) ∑_{n=1}^{N} ℓ( −Xj(θ, δ)(ωn) )

for i.i.d. samples ω1, . . . , ωN ∈ Ω and some loss function ℓ, when θ is fixed. Here,

Xj(θ, δ)(ω) = (ST(θ)(ω) − Kj)^+ − (hj(·, Z·(θ)(ω)|δ) • Z·(θ)(ω))T − π^mkt_j.

This means iterating the two optimization procedures, one for θ and the other one for δ. Clearly the Black-Scholes hedge approach of Section 7.3.1 works as well, in this case without additional optimization with respect to the hedging strategies.

8.2 Numerical Implementation

In this section, we present numerical examples in the context of a toy model which we now specify. This toy-model will play the role of the "real world", i.e. we consider


data generated by this model, and the task is then to calibrate to these prices. Let us first summarize the computational framework we work in. We implement our approach via tensorflow, taking advantage of GPU-accelerated computing. All computations are performed on a desktop PC with an AMD Ryzen 2700X CPU, 16 GB of RAM and one GeForce GTX 1070 GPU. The underlying software libraries are tensorflow 1.12 with CUDA 10. We use the python 3.6 API for tensorflow and, for the implied volatility computations, we rely on the python py_vollib library.¹

8.2.1 Toy-model specification

We consider a SABR type model with a one-dimensional price process and a one-dimensional variance process α, for which the dynamics are given by

dSt = St L^d(t, St) αt dWt,
dαt = ν αt dBt,
dWt dBt = ρ dt,

for two Brownian motions W, B with correlation ρ ∈ [−1, 1]. Since we will compute model prices with a Monte Carlo method via an Euler discretization of the model, it will be preferable to work in log-price coordinates for S. In particular, we can then parametrize L^d with X := log S rather than S. Denoting this parametrization again by L^d, where d stands for "data", we therefore have L^d(t, X) instead of L^d(t, S) and the model dynamics read

dXt = αt L^d(t, Xt) dWt − (1/2) αt² L^d(t, Xt)² dt,
dαt = ν αt dBt,
dWt dBt = ρ dt.

Note that α is a geometric Brownian motion; in particular, the closed form solution for α is available and given by

αt = α0 exp( −(ν²/2) t + ν Bt ).

We specify the leverage function L^d in our toy-model by

L^d(t, x) = 1 + 0.3 sin(13x + 20t) cos(1.1t).   (8.8)

For the other parameters, we choose the ones given in Figure 8.1(a). This toy-model of ours will be used to generate data, i.e. option prices. Hence we separate these parameters from our model parameters θ and δ, as we keep them fixed and do not optimize (train) over them.

¹See http://vollib.org/


(a)  ν = 0.2,  ρ = −0.7,  X0 = 1,  α0 = 0.2
(b)  T1 = 0.15,  T2 = 0.25,  T3 = 0.5,  T4 = 1.0
(c)  K1 = 0.1,  K2 = 0.2,  K3 = 0.4,  K4 = 0.5

Figure 8.1: (a) Parameters for the toy-model. (b) Maturities used to generate data for calibration. (c) Parameters that define the strikes of the call options to which we calibrate.

We generate data using this model, consisting of European call prices for n = 4 different maturities Ti, i ∈ {1, 2, 3, 4}, given in Figure 8.1(b). For each maturity Ti, we consider Ji = 20 strikes Ki,j, j ∈ {1, . . . , Ji}, given by

Ki,j = exp(−Ki) + (j − 1) ∆Ki, with ∆Ki = (exp(Ki) − exp(−Ki)) / 19.

The values for Ki are given in Figure 8.1(c).
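For concreteness, the strike grids above can be computed as follows (a direct transcription of the formula, using the Ki values of Figure 8.1(c)):

import numpy as np

K_params = [0.1, 0.2, 0.4, 0.5]           # K_i from Figure 8.1(c)

def strike_grid(K_i, J=20):
    """K_{i,j} = exp(-K_i) + (j - 1) * dK_i, j = 1, ..., J, equally spaced on [exp(-K_i), exp(K_i)]."""
    dK = (np.exp(K_i) - np.exp(-K_i)) / (J - 1)
    return np.exp(-K_i) + np.arange(J) * dK

strikes = {i + 1: strike_grid(K_i) for i, K_i in enumerate(K_params)}
print(strikes[1][[0, -1]])                # first and last strike for maturity T_1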

We compute prices in our toy-model using an explicit Euler discretization. For the time grid, we discretize the interval [0, T4] = [0, 1] with step size 1/100, hence we consider the time grid T = {t0 = 0, t1, . . . , t100 = 1}. Since α is known in closed form, we simulate the Brownian motion B on T and compute α accordingly. For the 4 × 20 prices that we consider as "data" in our deep-learning approach, we simulate Nd = 10^9 pairs of correlated Brownian paths on T and compute prices via (7.7). Note that we use the same set of paths for all maturities and strikes. We do not employ the variance reduction method proposed above for the computation of these prices. We implement the Monte Carlo scheme via tensorflow, allowing us to use GPU-accelerated computations without having to write native CUDA² code.
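A small numpy sketch of this simulation step, using the toy leverage function (8.8) and the parameters of Figure 8.1(a), but far fewer paths than the 10^9 used for the actual data generation and without the tensorflow/GPU implementation:

import numpy as np

nu, rho, X0, alpha0 = 0.2, -0.7, 1.0, 0.2      # parameters as listed in Figure 8.1(a)
n_steps, n_paths, T = 100, 100_000, 1.0
dt = T / n_steps

def L_data(t, x):
    """Toy-model leverage function (8.8)."""
    return 1.0 + 0.3 * np.sin(13.0 * x + 20.0 * t) * np.cos(1.1 * t)

rng = np.random.default_rng(0)
dW = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
dB = rho * dW + np.sqrt(1.0 - rho**2) * rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))

X = np.full(n_paths, X0)                       # initial value of X from Figure 8.1(a)
B = np.zeros(n_paths)
for k in range(n_steps):
    t = k * dt
    alpha = alpha0 * np.exp(-0.5 * nu**2 * t + nu * B)   # closed-form variance process
    sig = alpha * L_data(t, X)
    X += sig * dW[:, k] - 0.5 * sig**2 * dt              # Euler step for the log-price
    B += dB[:, k]

# plain Monte Carlo price (7.7) of a call with an illustrative strike K
K = 1.0
print(np.mean(np.maximum(np.exp(X) - K, 0.0)))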

8.2.2 A first numerical experiment for the variance reduction

We briefly present a numerical experiment to demonstrate the effect of variance reduction via hedges. Consider the toy model specified above. It is sufficient to demonstrate the variance reduction for one slice, hence we consider one maturity T1 = 1 and three strikes K1,1 = exp(−0.3), K1,2 = 1 and K1,3 = exp(0.3). We generate "true" prices via 10^9 simulations with the Monte Carlo scheme specified above. We compare the variance reduction for each strike by means of the following experiment. Recall that all plots we refer to are to be found in the next chapter.

Experiment 8.1. Consider the traded assets α and exp(X), i.e. Z = (exp(X), α)⊤. Hence, as we consider only one maturity, we can specify all hedges via

h(t, z | δ) =
[ hX(t, z | δ) ]   [ hX,1(t, z | δ)  hX,2(t, z | δ)  hX,3(t, z | δ) ]
[ hα(t, z | δ) ] = [ hα,1(t, z | δ)  hα,2(t, z | δ)  hα,3(t, z | δ) ],

²For more information regarding the CUDA framework we refer to the framework provider's website https://developer.nvidia.com/cuda-zone.


where each column

h(t, z | δ)i = (hX,i(t, z | δ), hα,i(t, z | δ))⊤

is the hedge for the contingent claim with strike K1,i. We model h with two neural networks, representing hX and hα respectively. Hence we have hX, hα ∈ NN_{3,3}. We use the same network architecture for both neural networks,

• each having 4 hidden layers with dimensions N1 = N2 = N3 = N4 = 64,

• and a single activation function for all nodes, σ(x) = tanh(x).

All weights are initialized by randomly drawing independent truncated-normal distributed values with zero mean and standard deviation 0.05 for the matrices Aℓ and setting the vectors bℓ = 0 as initial values. For comparison, we consider four different scenarios, given by

(i) no variance reduction is used, which can be achieved by fixing the final layer weights of the two neural networks to be zero, hence no training,

(ii) only the Black-Scholes hedge is applied, i.e. hα ≡ 0 and hX is given by Equation (7.10), hence no training is involved,

(iii) for hX, we use the Black-Scholes hedge as in (ii) and hα is given by a neural network that needs to be trained,

(iv) both functions hX and hα are given by neural networks of the previously specified architecture, such that both sets of parameters need to be trained.

For each contingent claim and each scenario (i) to (iv), we simulate 10^5 trajectories and compute a normalized histogram of hedge errors, in case of (iii) and (iv) after training has finished. In those cases, we train the weights by minimizing

inf_{δ∈∆} ∑_{i=1}^{3} wi E[ ( (exp(XT) − K1,i)^+ − π1,i − (h(·, Z· | δ)i • Z)T )² ],

where π1,i is the price of the contingent claim with strike K1,i. The weights wi are of vega type, given by wi = min(700, 1/vi), where vi is the Black-Scholes vega for the corresponding strike K1,i, maturity T1 = 1 and implied volatility α0. We bound these weights to ensure that no contingent claim is overweighted (compare Cont and Ben Hamida [CBH04]). We approximate the expectation in each training/optimization step with the standard Monte Carlo estimator, compare (7.7), with mini-batch size Nbatch = 100. Further, we estimate in each training step the mean (which needs to be close to zero since we subtract pre-computed prices) and standard deviation by simulating 10^5 trajectories. We make 100 training steps overall, using stochastic gradient descent with adaptive step size in the form of the Adam optimizer as implemented in tensorflow, see Kingma and Ba [KB14].
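The vega-type weights can be computed in closed form (with r = 0): the Black-Scholes vega of a call is S φ(d1) √T. A small sketch, where the spot value, the cap of 700 and the use of scipy for the normal density are illustrative and only mirror the description above:

import numpy as np
from scipy.stats import norm

def bs_vega(S0, K, sigma, T):
    """Black-Scholes vega of a call with zero interest rate: S0 * phi(d1) * sqrt(T)."""
    d1 = (np.log(S0 / K) + 0.5 * sigma**2 * T) / (sigma * np.sqrt(T))
    return S0 * norm.pdf(d1) * np.sqrt(T)

S0, sigma, T = 1.0, 0.2, 1.0                          # illustrative spot, vol alpha_0, maturity T_1
strikes = np.exp(np.array([-0.3, 0.0, 0.3]))          # K_{1,1}, K_{1,2}, K_{1,3}
weights = np.minimum(700.0, 1.0 / bs_vega(S0, strikes, sigma, T))
print(weights)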


                 K1,1           K1,2           K1,3
scenario (i)     1.88 × 10^−1   1.08 × 10^−1   2.74 × 10^−2
scenario (ii)    7.79 × 10^−3   1.05 × 10^−2   8.03 × 10^−3
scenario (iii)   6.43 × 10^−3   8.71 × 10^−3   6.71 × 10^−3
scenario (iv)    2.40 × 10^−2   5.73 × 10^−2   2.56 × 10^−2

Figure 8.2: Estimated standard deviations per strike and scenario.

We show the results of this experiment in Figure 9.1 for strike K1,1, Figure 9.2 for strike K1,2 and Figure 9.3 for strike K1,3. For scenarios (iii) and (iv), we plot the estimated mean and standard deviation as a function of the training steps in Figure 9.4, where (a) and (b) correspond to scenario (iii), and (c) and (d) to scenario (iv). Note that the value for training step zero corresponds to not using any variance reduction, not to the estimated values with respect to the hedge generated by randomly drawing the weights. The estimated values for the standard deviation for each scenario and each strike are given in Figure 8.2.

Let us make the following observations:

• In each case, an improvement in terms of variance reduction takes place.

• Based on the presented histograms and the estimated standard deviations, scenario (iii) performs best in terms of variance reduction.

• Scenario (ii), where only the Black-Scholes hedge is used and no training is taking place, is only slightly worse than scenario (iii). This suggests that the main part of the variance reduction is due to the Black-Scholes hedge.

• Scenario (iv) performs significantly worse than (ii) and (iii), suggesting that it is so far not possible to beat the Black-Scholes hedge.

8.2.3 Numerical results for the calibration problem

We now turn to the calibration problem and present numerical results. Let us start with the specifications of the neural networks that define the parametrization in (8.1) and of the one for the variance reduction hedge h. Based on the results of the previous section, we choose to employ the pure Black-Scholes hedge, i.e. scenario (ii) in Experiment 8.1, since this allows us to reduce the dimension of the parameter space over which we optimize. Hence, there is no training with respect to h. We are left to specify the network architecture of the neural networks for the leverage function of our parametrized model. As we only consider finitely many prices on finitely many slices, we do not expect to recover L^d. The goal is to calibrate implied volatility, hence we will measure the performance of our approach accordingly.


                 training    impl. vola comp.
Experiment 8.2   00:27:45    00:44:56
Experiment 8.3   00:36:29    00:44:48

Figure 8.3: Computational times for Experiment 8.2 and Experiment 8.3, given in hours, minutes and seconds.

Note that our model leverage function L in (8.1) takes arguments t and x = log(s). Since we consider 4 different maturities, we need to specify n = 4 neural networks Fi ∈ NN_{1,1}, where each Fi has 4 hidden layers with architecture defined by

• N1 = N2 = N3 = N4 = 10,

• σ(x) = tanh(x).

We use the same weight initialization strategy as in Experiment 8.1. Again, we use stochastic gradient based optimization in the form of the Adam method, where we compute each optimization step with a mini-batch size Nbatch that needs to be specified. For the loss function u, we choose u(x) = |x|. We start by training F1 using the data from the first slice; after completion, we freeze the weights of F1 and continue with the next slice, and so on. For each slice, we make Ntrain training steps. After termination, we use the trained weights that define L to compute model prices using a sample size of 10^9 simulations. We use the same vega type weights as in Experiment 8.1 to calibrate implied volatility rather than prices.

We consider the following two concrete numerical experiments.

Experiment 8.2. For each training step, set Nbatch = 500 and make Ntrain = 15000 iterations. The plots consist of the implied volatility of data and model, the error of the implied volatility and a plot of x ↦ L²(Ti−1, x), where T0 = 0 and x ∈ [−Ki, Ki].

Results for Experiment 8.2 are given in Figures 9.5 and 9.6, where each column corresponds to one maturity. As we compute model prices by simulating 10^9 pairs of correlated Brownian motions for the Euler discretization, the main computational time is due to the model implied volatility computation. Hence we present the computational time accordingly, separating pure training time from implied volatility computations. These results are given in Figure 8.3.

It is clear that in practice, one would apply the variance reduction technique also for the model price computation to speed up the overall process. In particular, this would allow for a stopping criterion in the training steps to avoid unnecessarily long training.

With the next experiment, we show that there can be a situation where a very high sensitivity between the implied volatility and the corresponding leverage function exists. The increase in accuracy needed for examining this is achieved by increasing Nbatch.


Experiment 8.3. For each training step, set Nbatch = 3000 and make Ntrain = 15000 iterations. For each slice, plot the intermediary training results after 7500 training steps. The plots consist of the implied volatility of data and model, the error of the implied volatility and a plot of x ↦ L²(Ti−1, x), where T0 = 0 and x ∈ [−Ki, Ki].

We plot the intermediary and final results with respect to training steps per slice in Figures 9.7 to 9.10. Each figure shows the results for one of the four maturities at the respective half time Ntrain/2 = 7500 (left column) and at the final training step Ntrain = 15000 (right column).

We can see from both experiments that an essentially exact calibration is achieved up to a small error. However, by comparing the trained leverage functions at the final maturity in Figure 9.6 (bottom right) and Figure 9.10 (last row), one sees that a small improvement in the implied volatility fit can lead to significant changes in the trained leverage function.

8.2.4 Conclusion

We have demonstrated how a parametrization by means of neural networks can be used to calibrate local stochastic volatility models to implied volatility data. We make the following remarks:

(i) The method we presented does not require any form of interpolation of the implied volatility surface, since we do not calibrate via the Dupire formula. As the calibration result should not depend on the choice of interpolation, this is a desirable feature of our method.

(ii) It is possible to "plug in" any stochastic variance process, such as rough volatility processes, as long as an efficient simulation of trajectories is possible.

(iii) The multivariate extension is straightforward.

(iv) The calibration results are highly accurate, which makes the presented method of interest for this feature alone.

(v) The method can be significantly accelerated by applying distributed computation in a multi-GPU setting.

The presented method is, at least conceptually, also able to deal with non-vanilla type European options, since all computations are done by means of Monte Carlo simulations. We are further able to consider options on the variance process, e.g. VIX options, for the calibration. This will be investigated in a separate work, as it goes beyond the scope of this thesis.


Chapter 9

Plots and Histograms

(a) Scenario (i): Histogram without any variance reduction.
(b) Scenario (ii): Variance reduction using only the Black-Scholes ∆-hedge.
(c) Scenario (iii): Variance reduction using the Black-Scholes ∆-hedge summed together with a trained neural network.
(d) Scenario (iv): Variance reduction where the trading strategy is modeled only with a trained neural network.

Figure 9.1: Normalized Monte Carlo histograms for Experiment 8.1 with strike K1,1, where we compute pricing errors applying different methods for variance reduction via hedging (scenarios (i) to (iv)). Note that the histograms have different scales as a result of the variance reduction. Normalization here means that the areas of the blue bars sum up to one. For estimated standard deviations see Figure 8.2.


(a) Scenario (i): Histogram without any variance reduction.
(b) Scenario (ii): Variance reduction using only the Black-Scholes ∆-hedge.
(c) Scenario (iii): Variance reduction using the Black-Scholes ∆-hedge summed together with a trained neural network.
(d) Scenario (iv): Variance reduction where the trading strategy is modeled only with a trained neural network.

Figure 9.2: Normalized Monte Carlo histograms for Experiment 8.1 with strike K1,2, where we compute pricing errors applying different methods for variance reduction via hedging (scenarios (i) to (iv)). Note that the histograms have different scales as a result of the variance reduction. Normalization here means that the areas of the blue bars sum up to one. For estimated standard deviations see Figure 8.2.


Figure 9.3: Normalized Monte Carlo histograms for Experiment 8.1 with strike K1,3, where we compute pricing errors applying different methods for variance reduction via hedging (scenarios (i) to (iv)). Note that the histograms have different scales as a result of the variance reduction. Normalization here means that the areas of the blue bars sum up to one. For estimated standard deviations see Figure 8.2.


(a) Running mean for scenario (iii). (b) Running standard deviation for scenario (iii).
(c) Running mean for scenario (iv). (d) Running standard deviation for scenario (iv).

Figure 9.4: Estimated running mean (figures (a) and (c)) and standard deviation (figures (b) and (d)) for scenarios (iii) and (iv) of Experiment 8.1 against training steps. Training step zero corresponds to no variance reduction at all, not to the variance reduction due to randomly drawn weights of the untrained neural networks.


Figure 9.5: Plots for Experiment 8.2 for maturity T = 0.15 in the first column and T = 0.25 in the second column, after 15000 training steps with Nbatch = 500. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K.


Figure 9.6: Plots for Experiment 8.2 for maturity T = 0.5 in the first column and T = 1.0 in the second column, after 15000 training steps with Nbatch = 500. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K. Mind the scale on the bottom right: the plot shows that the trained leverage function for the last maturity is basically constant.


Figure 9.7: Plots for Experiment 8.3 for maturity T = 0.15 after 7500 (left column) and 15000 (right column) training steps with Nbatch = 3000. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K.


Figure 9.8: Plots for Experiment 8.3 for maturity T = 0.25 after 7500 (left column) and 15000 (right column) training steps with Nbatch = 3000. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K.


Figure 9.9: Plots for Experiment 8.3 for maturity T = 0.5 after 7500 (left column) and 15000 (right column) training steps with Nbatch = 3000. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K.


Figure 9.10: Plots for Experiment 8.3 for maturity T = 1.0 after 7500 (left column) and 15000 (right column) training steps with Nbatch = 3000. Note that implied volatility and leverage function are given in log-strike coordinates x = log(K/S0) for strike K. Also, note the difference between the trained leverage functions in the last row, despite the fact that the implied volatility error in the second row is already small after 7500 training steps.


Bibliography

[AP97] William F. Ames and B. G. Pachpatte. Inequalities for differential and integral equations. Vol. 197. Elsevier, 1997.

[AT10] Frédéric Abergel and Rémi Tachet. "A nonlinear partial integro-differential equation from mathematical finance". In: Discrete and Continuous Dynamical Systems - Series A 27.3 (2010), pp. 907–917.

[Acz66] János Aczél. Lectures on functional equations and their applications. Vol. 19. Academic Press, 1966.

[Adl85] Robert J. Adler. "Random fields". In: Encyclopedia of Statistical Sciences (1985).

[Arn92] Vladimir I. Arnold. Ordinary Differential Equations. Springer Textbook. Springer Berlin Heidelberg, 1992. isbn: 9783540548133.

[BC08] Alan Bain and Dan Crisan. Fundamentals of stochastic filtering. Vol. 60. Springer Science & Business Media, 2008.

[BG07] Robert McCallum Blumenthal and Ronald Kay Getoor. Markov processes and potential theory. Courier Corporation, 2007.

[BS18] Christian Bayer and Benjamin Stemper. "Deep calibration of rough stochastic volatility models". In: arXiv preprint arXiv:1810.03399 (2018).

[BS73] Fischer Black and Myron Scholes. "The pricing of options and corporate liabilities". In: Journal of Political Economy 81.3 (1973), pp. 637–654.

[Ben+18] Fred Espen Benth, Nils Detering, and Paul Kruhner. "Multilinear processes in Banach space". In: arXiv preprint (2018).

[Bil74] Patrick Billingsley. "A note on separable stochastic processes". In: The Annals of Probability 2.3 (1974), pp. 476–479.

[Bue+18] Hans Buehler, Lukas Gonon, Josef Teichmann, and Ben Wood. "Deep hedging". In: (2018).

[Böt14] Björn Böttcher. "Feller evolution systems: Generators and approximation". In: Stochastics and Dynamics 14.03 (2014), p. 1350025.


[CAH17] Maria Fernanda del C. A. Hurtado. "Time-inhomogeneous polynomial processes in electricity spot price models". PhD thesis. Albert-Ludwigs-Universität Freiburg, June 2017.

[CBH04] Rama Cont and Sana Ben Hamida. "Recovering volatility from option prices by evolutionary optimization". In: (2004).

[CR98] Fabienne Comte and Eric Renault. "Long memory in continuous-time stochastic volatility models". In: Mathematical Finance 8.4 (1998), pp. 291–323.

[CT18] Christa Cuchiero and Josef Teichmann. "Generalized Feller processes and Markovian lifts of stochastic Volterra processes: the affine case". In: arXiv preprint (2018).

[Coz+17] Andrei Cozma, Matthieu Mariapragassam, and Christoph Reisinger. "Calibration of a Hybrid Local-Stochastic Volatility Stochastic Rates Model with a Control Variate Particle Method". In: Preprint, available at arXiv (2017).

[Cuc+12] Christa Cuchiero, Martin Keller-Ressel, and Josef Teichmann. "Polynomial processes and their applications to mathematical finance". In: Finance and Stochastics 16.4 (2012), pp. 711–740.

[Cuc+18a] Christa Cuchiero, A. Marr, M. Mavuso, A. Mitoulis, S. Singh, and J. Teichmann. "Calibration of mixture interest rate models with neural networks". In: Working paper (2018).

[Cuc+18b] Christa Cuchiero, Martin Larsson, Sara Svaluto-Ferro, et al. "Polynomial jump-diffusions on the unit simplex". In: The Annals of Applied Probability 28.4 (2018), pp. 2451–2500.

[Cuc+18c] Christa Cuchiero, Martin Larsson, and Sara Svaluto-Ferro. "Probability measure-valued polynomial diffusions". In: arXiv preprint (2018).

[Cuc11] Christa Cuchiero. "Affine and polynomial processes". PhD thesis. ETH Zurich, 2011.

[Cuc18] Christa Cuchiero. "Polynomial processes in stochastic portfolio theory". In: Stochastic Processes and their Applications (2018).

[DS72] Purna Candra Das and Rishi Ram Sharma. "Existence and stability of measure differential equations". In: Czechoslovak Mathematical Journal 22.1 (1972), pp. 145–158.

[DS80] P. C. Das and R. R. Sharma. "Some Stieltjes integral inequalities". In: Journal of Mathematical Analysis and Applications 73.2 (1980), pp. 423–433.

[Dup96] Bruno Dupire. "A unified theory of volatility". In: Derivatives Pricing: The Classic Collection (1996), pp. 185–196.

[EER19] Omar El Euch and Mathieu Rosenbaum. "The characteristic function of rough Heston models". In: Mathematical Finance 29.1 (2019), pp. 3–38.


[EK09] Stewart N. Ethier and Thomas G. Kurtz. Markov processes: characterization and convergence. Vol. 282. John Wiley & Sons, 2009.

[EK90] Lynn Erbe and QingKai Kong. "Stieltjes integral inequalities of Gronwall type and applications". In: Annali di Matematica Pura ed Applicata 157.1 (1990), pp. 77–97.

[EN99] Klaus-Jochen Engel and Rainer Nagel. One-parameter semigroups for linear evolution equations. Vol. 194. Springer Science & Business Media, 1999.

[FL16] Damir Filipović and Martin Larsson. "Polynomial diffusions and applications in finance". In: Finance and Stochastics 20.4 (2016), pp. 931–972.

[FL17] Damir Filipović and Martin Larsson. "Polynomial jump-diffusion models". In: Swiss Finance Institute Research Paper 17-60 (2017).

[Fil+17] Damir Filipović, Martin Larsson, and Sergio Pulido. "Markov cubature rules for polynomial processes". In: arXiv preprint (2017).

[GHL11] Julien Guyon and Pierre Henry-Labordere. "The smile calibration problem solved". In: Preprint, https://ssrn.com/abstract=1885032 (2011).

[GHL13] Julien Guyon and Pierre Henry-Labordère. Nonlinear option pricing. CRC Press, 2013.

[GKR18] Jim Gatheral and Martin Keller-Ressel. "Affine forward variance models". In: arXiv preprint (2018).

[Gat+14] Jim Gatheral, Thibault Jaisson, and Mathieu Rosenbaum. "Volatility is rough". In: arXiv preprint (2014).

[Gil07] Michael Gil. Difference equations in normed spaces: stability and oscillations. Vol. 206. Elsevier, 2007.

[HN92] Robert Hecht-Nielsen. "Theory of the backpropagation neural network". In: Neural networks for perception. Elsevier, 1992, pp. 65–93.

[HW87] John Hull and Alan White. "The pricing of options on assets with stochastic volatilities". In: The Journal of Finance 42.2 (1987), pp. 281–300.

[He+92] Sheng-wu He, Jia-gang Wang, and Jia-an Yan. Semimartingale theory and stochastic calculus. Routledge, 1992.

[Her17] Andres Hernandez. "Model Calibration with Neural Networks". In: Risk (2017).

[Hes93] Steven L. Heston. "A closed-form solution for options with stochastic volatility with applications to bond and currency options". In: The Review of Financial Studies 6.2 (1993), pp. 327–343.

[Hil70] David Hilbert. David Hilbert: Gesammelte Abhandlungen. Springer, 1970.


[Hor+19] Blanka Horvath, Aitor Muguruza, and Mehdi Tomas. "Deep Learning Volatility". In: Available at SSRN 3322085 (2019).

[Hor91] Kurt Hornik. "Approximation capabilities of multilayer feedforward networks". In: Neural Networks 4.2 (1991), pp. 251–257.

[Hor96] Laszlo Horvath. "Gronwall–Bellman type integral inequalities in measure spaces". In: Journal of Mathematical Analysis and Applications 202.1 (1996), pp. 183–193.

[JP12] Jean Jacod and Philip Protter. Probability essentials. Springer Science & Business Media, 2012.

[JS13] Jean Jacod and Albert N. Shiryaev. Limit theorems for stochastic processes. Vol. 288. Springer Science & Business Media, 2013.

[JZ16] Benjamin Jourdain and Alexandre Zhou. "Existence of a calibrated regime switching local volatility model and new fake Brownian motions". In: Preprint, available at arXiv (2016).

[Jab+17] Eduardo Abi Jaber, Martin Larsson, and Sergio Pulido. "Affine Volterra processes". In: arXiv preprint (2017).

[Jac+05] Jean Jacod, Thomas G. Kurtz, Sylvie Méléard, and Philip Protter. "The approximate Euler method for Lévy driven stochastic differential equations". In: Annales de l'IHP Probabilités et statistiques. Vol. 41. 3. 2005, pp. 523–558.

[KB14] Diederik P. Kingma and Jimmy Ba. "Adam: A method for stochastic optimization". In: arXiv preprint arXiv:1412.6980 (2014).

[KR+18] Martin Keller-Ressel, Thorsten Schmidt, and Robert Wardenga. "Affine processes beyond stochastic continuity". In: arXiv preprint (2018).

[Kal06] Olav Kallenberg. Foundations of modern probability. Springer Science & Business Media, 2006.

[Kun69] Hiroshi Kunita. "Absolute continuity of Markov processes and generators". In: Nagoya Mathematical Journal 36 (1969), pp. 1–26.

[Lip02] A. Lipton. "The vol smile problem". In: Risk Magazine 15 (2002), pp. 61–65.

[Mer74] Robert C. Merton. "On the pricing of corporate debt: The risk structure of interest rates". In: The Journal of Finance 29.2 (1974), pp. 449–470.

[Mey66] Paul André Meyer. Probability and potentials. Vol. 1318. Blaisdell Pub. Co., 1966.

[Pro05] Philip Protter. Stochastic integration and differential equations. Springer, 2005.

[RM51] Herbert Robbins and Sutton Monro. "A stochastic approximation method". In: The Annals of Mathematical Statistics (1951), pp. 400–407.


[Ren+07] Yong Ren, Dilip Madan, and M. Qian Qian. "Calibrating and pricing with embedded local volatility models". In: Risk Magazine 20.9 (2007), p. 138.

[Rud87] Walter Rudin. Real and complex analysis. Tata McGraw-Hill Education, 1987.

[Sap+17] Yuri F. Saporito, Xu Yang, and Jorge P. Zubelli. "The Calibration of Stochastic-Local Volatility Models: An Inverse Problem Perspective". In: Preprint, available at arXiv:1711.03023 (2017).

[Sch17] René L. Schilling. Measures, integrals and martingales. 2nd edition. Cambridge University Press, 2017.

[Sch98] Philipp J. Schönbucher. "Term structure modelling of defaultable bonds". In: Review of Derivatives Research 2.2-3 (1998), pp. 161–192.

[Sha72] R. R. Sharma. "An abstract measure differential equation". In: Proceedings of the American Mathematical Society 32.2 (1972), pp. 503–510.

[Tia+15] Yu Tian, Zili Zhu, Geoffrey Lee, Fima Klebaner, and Kais Hamza. "Calibrating and pricing with a stochastic-local volatility model". In: Journal of Derivatives 22.3 (2015), p. 21.

[Wal13] Wolfgang Walter. Gewöhnliche Differentialgleichungen: eine Einführung. Springer-Verlag, 2013.

[Çin+80] Erhan Çinlar, Jean Jacod, Philip Protter, and M. J. Sharpe. "Semimartingales and Markov processes". In: Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 54.2 (1980), pp. 161–219.