Coherent Asset Allocation and Diversification in the Presence of
Stress Events
Alexander Denev
Christ Church
University of Oxford
A thesis submitted in partial fulfillment of the MSc in
Mathematical Finance
April 18, 2011
Acknowledgements
I would like to thank Dr. Riccardo Rebonato not only for his precious
advice but also for giving me the chance to work on such an intriguing topic.
Working on it has been real fun and a pleasure for me. I would also like
to thank all my friends and my mother for their continuous support and
encouragement in finishing this thesis.
Abstract
The backward-looking approach of Markowitz theory, and even of some
of its improved versions, does not reflect the necessity to consider information
present in the current world about events that might create market
dislocations. Information about such events might be absent from any
historical dataset, but not taking it into consideration could seriously
understate the risk of many asset classes. This necessity is underlined in
numerous publications by scholars, practitioners and regulators, especially
after the subprime debacle of 2007-2009. This sentiment is already incorporated
in the strategies of investors who protect their portfolios with insurance trades
in case an extreme event, or a series of such events, materializes. However,
this strategy is realized ex post, i.e. after running mathematical models
to calculate optimal asset allocations, as an afterthought. A mathematical
tool that satisfies both the need for diversification and the need for
forward-looking information about stress events, in an integrated fashion,
is still missing. This thesis tries to fill this gap by introducing a new framework
for asset allocation in the presence of stress events carried out in a co-
herent way and not ad hoc, as an afterthought. We propose a procedure
that blends standard statistical techniques such as correlations, copulae
etc. with a subjective Bayesian approach based on causality assumptions
to model stress events. The approach is highly transparent, auditable and
easy for non-specialists to understand.
Because the practical application of the procedure is of extreme importance,
we also introduce some original tools that ease its implementation
and improve its robustness and calculation speed. More specifically, we present:
i) a maximum entropy approach for the degree of confidence in the model;
ii) a fast numerical technique which avoids continuous repetition of Monte
Carlo simulations; iii) a linear programming technique which helps the
estimation of probabilities; and iv) the concept of a stressed efficient frontier.
Finally, we suggest a link with the Black-Litterman model which becomes
in this context a particular case of our approach.
Contents
1 Introduction 1
1.1 Outline of the Problem and Dataset . . . . . . . . . . . . . . . . . . . 6
2 Statistics 8
2.1 Identification of Normal Data . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Some Remarks About the MVE, MCD and Related Methods . 14
2.2 Constructing the Marginal Distribution of the ‘Normal’ Risk Factors . 24
2.3 Constructing a Joint Distribution of ‘Normal’ Risk Factors . . . . . . 27
3 Causality 36
3.1 Treatment of Exceptional Events . . . . . . . . . . . . . . . . . . . . 36
3.2 Bayesian nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Constructing a Bayesian Net . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Building the Marginal and Conditional Probability Tables . . 41
3.3.2 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.3 From the Conditional Probability Tables to the Joint Probabil-
ities of Rare Events . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Fusion 46
4.1 Mapping from the ‘Normal’ Joint Distribution to the ‘Normal’ Return
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Mapping from the Exceptional Joint Distribution to the Exceptional
Return Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Splicing the ‘Normal’ and ‘Exceptional’ Portion of the P&L Distribution 47
4.4 Maximization of the Utility . . . . . . . . . . . . . . . . . . . . . . . 48
4.4.1 Numerical optimization . . . . . . . . . . . . . . . . . . . . . . 50
4.5 Maximum Entropy for the choice of k . . . . . . . . . . . . . . . . . . 51
5 A Worked-Out Example 56
5.1 Construction of the Bayesian Net . . . . . . . . . . . . . . . . . . . . 56
5.2 Discussion of the Results . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Maximum entropy allocations . . . . . . . . . . . . . . . . . . . . . . 65
5.4 A fast algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.1 Gaining intuition . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4.2 Generalizing the results . . . . . . . . . . . . . . . . . . . . . 74
6 Conclusions 78
6.1 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A Total moments 81
A.0.1 First moment . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.0.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.0.3 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.0.4 Utility maximization . . . . . . . . . . . . . . . . . . . . . . . 83
B Fusion with the Black-Litterman approach 85
B.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
B.2 Investor views and implied drifts . . . . . . . . . . . . . . . . . . . . 87
B.2.1 Implied drifts in the case of knowledge contained in the market 88
B.2.2 Implied drifts in the case of specific knowledge . . . . . . . . . 90
B.2.3 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
C Efficient frontier 92
D Influence plots 98
E Truncated distributions 100
F The Johnson distribution 102
G Truncation algorithm applied to a subset 104
Bibliography 104
List of Figures
2.1 Robust Mahalanobis distance vs date of observation. The major crises
are clearly visible in terms of the concentration of large values of the
robust distance: the Russian crisis, the dot-com crisis and the recent
subprime debacle. The red line corresponds to χ²_4(0.975) = 3.34. The
parameter h was set to [0.75N] (this is also the value adopted in [63]).
The two peaks on the rightmost part of the figure correspond to the
dates immediately after Lehman’s default: 15 and 17 September 2008.
The highest peak in the middle is the reopening of the stock exchange
after 9/11. The highest peak on the left is the 31 August 1998, a few
days after Russia defaulted on its debt. . . . . . . . . . . . . . . . . . 11
2.2 Volume of the ellipsoid as a function of the number of points removed 15
2.3 Variation for Fig 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Minimum covariance determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Variation for Fig 2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Correlation matrix determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7 Variation for Fig. 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Correlations as a function of the number of points removed. The sym-
bol ρ signifies correlation and the subscripts C, B, M and E denote
Credit, Bond, Mortgage and Equity, respectively, as described in the
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Variation for Fig. 2.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Variation in the eigenvalues of the correlation matrix as a function of
the number of points removed. The eigenvalues are ordered from the
smallest λ1 to the largest λ4. . . . . . . . . . . . . . . . . . . . . . . . 20
2.11 Body (circles) and outliers (stars) after the removal of 160 points. . . 21
2.12 Robust Mahalanobis distance vs date of observation of the ’body’. The
red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to
[0.75N ]. For this graph the outliers were not removed but replaced
by the average of the dataset in order to have historical comparability
with Figure 2.1. The extreme peaks of Figure 2.1 are no longer present. 21
2.13 Influence plot of the standardized returns of Bond and Credit. The in-
fluence contour lines show approximately by how much the correlation
coefficient varies if we remove a single point sitting on that line. The red
points represent the outliers eliminated by the truncation algorithm.
Some outliers may appear as not influential on the two dimensional
graph but we must remember that the truncation was carried out in
a 4-dimensional space. In such a higher-dimensional space they are
influential. The method used here to draw the plot is the one presented
in Appendix D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.14 Influence plot of the standardized returns of Bond and Equity. See
Fig 2.13 for an explanation. . . . . . . . . . . . . . . . . . . . . . . . 23
2.15 The fit to the S&P daily returns obtained using a Gaussian and a
t-Student distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.16 The Q-Q plot for the two fits in Figure 2.15. . . . . . . . . . . . . . . 26
2.17 Scatterplot between Bond-Mortgage for the five different subsets. In
the lower-right corner the scatterplot on the entire dataset is shown. The
correlations are respectively 88.1%, 87.1%, 93.1%, 91.3%, 81.6% and
86.5%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.18 Scatterplot between Bond-Equity for the five different subsets. In the
lower-right corner the scatterplot on the entire dataset is shown. The cor-
relations are respectively −13.1%, −43.2%, −2.9%, −29.6%, −43.5%
and −22.2%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.19 Correlation between Bond and Equity for the entire dataset. The cor-
relation is calculated within a sliding window of 250 datapoints. . . . 34
2.20 Gaussian copula. Cumulative distributions of the four distances used
to assess the GOF of the copula for the last subset. . . . . . . . . . . 34
2.21 t copula. Cumulative distributions of the four distances used to assess
the GOF of the copula for the last subset. . . . . . . . . . . . . . . . 35
2.22 Scatterplot of the generated random numbers with the fitted copula
(and mapped inversely from [0, 1] to the real axis with the help of the
fitted marginals) for Bond and Mortgage. . . . . . . . . . . . . . . . . 35
3.1 Example of a Bayesian net . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1 The Bayesian net used for the example in this Section . . . . . . . . . 57
5.2 The asset allocation to the four sub-portfolios as a function of the total
probability mass of being in a normal state obtained with a logarithmic
utility function. The allocation on the far right is the Markowitz-like
allocation that would apply in an exclusively normal world. . . . . . . 60
5.3 Same as Fig 5.2 for a power utility function β = 0.8. . . . . . . . . . . 61
5.4 Same as Fig 5.2 for a power utility function β = 1.2. . . . . . . . . . . 61
5.5 Allocations with a two-moment simplified optimization approach for a
logarithmic utility function. . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Allocations as a function of the degree of confidence - truncated Gaussian 65
5.7 Allocations as a function of the degree of confidence - Johnson . . . . 69
5.8 Logarithmic utility function for two sets of returns as the allocations
vary. The two sets are: plane with squared markers - r = (1%,−4%,−5%);
plane without markers - r = (−1%, 2%, 3%). . . . . . . . . . . . . . . 73
5.9 Allocations as a function of the normalization factor in the case: Monte
Carlo for each k - thin line, numerically inferred - thick line . . . . . . 77
C.1 Stressed efficient frontier: k = 0 - blue, k = 0.1 - red, k = 0.15 - green,
k = 0.2 - yellow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
C.2 Standard deviation of the minimum variance portfolio as a function of k 96
C.3 Concavity of the efficient frontier as a function of k . . . . . . . . . . 97
C.4 Maximum entropy efficient frontier, frontier at k = 0 and frontier ob-
tained with the covariance matrix and drifts before truncation of the
outliers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
F.1 Johnson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
G.1 Volume of the ellipsoid as a function of the number of points removed 104
G.2 Variation for Fig G.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
G.3 Minimum covariance determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
G.4 Variation for Fig G.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
G.5 Correlation matrix determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
G.6 Variation for Fig. G.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
G.7 Correlations as a function of the number of points removed. The sym-
bol ρ signifies correlation and the subscripts C, B, M and E denote
Credit, Bond, Mortgage and Equity, respectively, as described in the
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
G.8 Variation for Fig. G.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 109
G.9 Variation in the eigenvalues of the correlation matrix as a function of
the number of points removed. The eigenvalues are ordered from the
smallest λ1 to the largest λ4. . . . . . . . . . . . . . . . . . . . . . . . 110
Chapter 1
Introduction
In the late 50s Markowitz laid down the foundations of the modern asset allocation
approach [39][40] through a new theory which since then has become one of the main
pillars of investment decisions. Although it has several limitations, the main idea
of diversification, which is at its heart, is still deeply rooted in investors' mindsets
and dictates allocation strategies. The idea, which was not new at the time but had
been known since the times of the Venice merchants, is sometimes reworded simply
as "do not put all the eggs in the same basket". As simple as that, but cast in the
powerful language of mathematics, it gave a quantitative dimension to concepts like
risk and return and their mutual dependence and trade-off. Such notions become
more intuitive by making use of powerful representations such as the efficient frontier,
which gives a straightforward bird's-eye grasp of how these quantities are related. One
of the appealing features of the theory, and also a major drawback, is its mathematical
simplicity. By making simplistic assumptions, closed-form solutions are derived which
lend themselves to immediate understanding and ease of calculation. But even if the main
Markowitz idea did not lose its momentum, the theory in its original form has hardly
been used and has undergone several criticisms and improvements over the decades.
Let's start with some stylized facts of asset return series. In the first place, especially
for daily data, it is widely known that they are not well described by normal distributions, as
is sometimes assumed. Events of more than 6σ populate the tails of the distributions
far more frequently than expected when postulating normality1. Not yet clear in
the 1950s, this fact crystallized as an increasing quantity of high-frequency data
became available2.
1 On longer time scales, however, financial time series exhibit aggregational gaussianity, i.e. the normal approximation becomes increasingly better. See, e.g., [10] Chapter 1, Figures 1.3a to 1.3f, and the discussion in the text.
2 Mandelbrot in the 60s even advanced the hypothesis that financial return series could have infinite variance [38].
Second, volatility exhibits clustering behaviour: periods of high and low volatility
seem to alternate, in contrast with the homoscedastic assumptions of Markowitz.
Third, correlations are assumed constant in the model, while overwhelming
evidence has been given that they are prone to increase during turbulent times3.
Without listing all the stylized statistical facts of return series4, we endorse the
idea that any improvement of portfolio models should, in general, take into account
the dichotomy between turbulent and non-turbulent periods, across which statistical
properties show different patterns. In other words, we start from the fact that there are
periods in which the frequentist statistical description of financial series derived from
non-distressed conditions breaks down and the listed stylized facts become more and
more visible: correlations change dramatically, volatility increases, tail events happen
frequently, drifts change. Our view is that such regimes must be taken into account
for a robust asset allocation. This sentiment is well exemplified by the following
quote by Mohamed A. El-Erian, PIMCO's CEO [16]:
‘...[Recent] developments will serve to further highlight the danger of
[...] being overly committed to an historical policy portfolio whose rigid
backward-looking characterization no longer corresponds to the realities
of today and tomorrow....’ ‘...[W]ith its conventional (or, to be more pre-
cise, reduced-form) analytical foundation now subject to some motion, it
will become even more difficult to rely just on a traditional portfolio di-
versification as both necessary and sufficient to deliver high returns and
mitigate risks. Diversification will remain necessary, but a lot more atten-
tion will be devoted to the appropriate specification of tail scenarios and
tail hedges...’
This view is gaining currency not only among practitioners and many economists, but
even among regulators, who had been wedded until recently to a strictly frequentist
view of risk. See, for instance, the views recently expressed by the Basel Committee:
’...The financial crisis has shown that estimating ex ante the probabil-
ities of stress events is problematic. The statistical relationships used to
3 As Longin and Solnik show empirically in a seminal paper [36] by using extreme value theory, the null hypothesis of multivariate normality for the negative tail of the joint return distribution for the examined equity indices can be rejected, but not for the positive tail, i.e. markets become more correlated when they are bearish. For an empirical summary of how correlations changed during the global financial crisis see [46]. For an empirical study of the joint extremes of foreign exchange series see [25].
4A detailed summary is given in [11].
derive the probability tend to break down in stressed conditions. In this
respect, the crisis has underscored the importance of giving appropriate
weight to expert judgment in defining relevant scenarios with a forward
looking perspective...’ - The Basel Committee [47]
Another sentence is quite to the point [60]:
’...A systemic crisis is by definition rare event, and statistical models
drawing on data from ”normal” periods are inappropriate; for example,
it is well known that correlations between the returns of financial assets
increase considerably during crisis periods...’
One of the paths followed to overcome the understatement of risk by normality
assumptions is the use of heavy-tailed marginal distributions and non-Gaussian copulas.
The t-Student distribution, for example, with its power-law decaying tails, could be
used to accommodate arbitrarily large returns. Several studies indeed show that it
provides better fits for financial series, especially at high frequencies5.
Another line of research tries to model the heavy tails of the distributions
statistically through Extreme Value Theory (EVT). The approach consists of fitting
a statistical distribution of their own to the extreme values over a given threshold,
while modelling the center with another suitable one (Normal, empirical, etc.). While
in both these approaches risk seems no longer understated, the backward-looking
stance, criticized also in the above quotations, could raise some objections to their
validity.
Let's consider, for example, some of the major crises that have characterized the
last two decades: the Mexican and Russian crises, the Asian crisis, the dot-com bubble
and the global financial turmoil of 2007-2009. The cause of each of them was different,
as were the underlying transmission mechanisms, the markets affected by contagion
and the asset classes involved. Generally speaking, it is very hard to find, by
scrutinizing history, crises ignited by the same initial series of events and spreading
through the same asset classes and countries. Even if theories do exist about the
cyclicality and repeatability of crises6, the observed pattern of appearance and its
time regularity over the last century was very different. Minsky's model7, for example,
tries to give a general explanation and anatomy of a typical crisis and the underlying
5 See again [10] Chapter 10.
6 For instance, the Kitchin inventory cycle of 39 months, or the Kuznets cycle of 20 years linked to housing construction.
7 See [30] for an introduction.
economic mechanism, linked to the expansion of the credit supply, was confirmed on
many occasions8. However, since the objective of an asset allocation theory is not to
find a universal economic generating mechanism (like Minsky's theory) but to choose
an optimal allocation to asset classes, the level of detail needed for this task is more
granular: it is at the asset-class level and below. Thus "details" become important
when an investor is faced with the decision of where to allocate wealth, particularly in
regard to which asset class could be hit and its co-dependence with the other classes.
All these details could be different for each situation of market distress. For instance,
looking at home prices in the US from after the Second World War up to 2007 nothing
could suggest that this particular asset class would be vulnerable to such a crash
as that of 2007-20099. A hypothetical investor at the beginning of 2007 could have been
easily fooled by trying to make a forecast based on a time series that for the previous
60 years had shown only small deviations from an otherwise positive trend.
A so-called school of "econophysics"10 has analyzed the signatures of extreme values
([35], [72]). According to this school, the most extreme events form a statistical
class of their own, separated from the normal-times paradigm. Assuming a normal
distribution for the data would mean postulating a less than "once in the life of the
universe" probability of occurrence for events that appear much more frequently, a
fact confirmed abundantly by empirical findings. Thus, according to them, these
extremes need not be discarded as outliers but would need a model of their own to
be described11. This is also the view adopted in this thesis.
Econophysicists defend a theory according to which specific signatures in the assets'
behaviour could anticipate such extreme movements ([71] [70]). The log-precursors, as
they are called, are peculiar oscillations which are observable just before the crash and
are the result of altered serial co-dependence and amplification of daily price returns.
The underpinning mechanism is the result of the interplay between rational agents and
noisy traders. Although the approach gives nice fits for some of the historical crashes
8 "Each crisis also has its unique individual features - the nature of the shock, the object of speculation, the form of credit expansion, the ingenuity of the swindlers [...]; Details proliferate; structure abides" Kindleberger [30], in response to a criticism of the Minsky model that every crisis is unique and so not subject to generalizations.
9 Shiller [69] shows that inflation-adjusted U.S. home prices increased 0.4% per year from 1890 to 2004 and 0.7% per year from 1940 to 2004.
10 For an interesting comparison, in a neopositivist framework, of how classical economics and econophysics differ in their approach to modelling economic phenomena, see [66].
11 Indeed, Rebonato et al. [58] [57] analyze the drawdown properties of interest rates and suggest that exceptional events do belong to a different statistical class, as Sornette [71] suggests. However, they also point out that several, and not one, signatures exist, each corresponding to a different way for the market to be distressed. These studies suggest that during exceptional periods markets do behave differently, but not always in the same way.
and making some successful predictions, the universal validity of the theory is still in
question (see [17] for some objections raised to the log-precursors theory). In fact,
even close to the critical time tc12 there is still implicit randomness in the event
happening because, had it been certain, it could have been forecasted and avoided (or
hedged)13. So, as a consequence, a bubble, instead of bursting, could land smoothly.
Such provision for a random outcome makes the theory even less testable. The log-
periodic model led to two false alarms, in 1997 and 1999 respectively, in the US stock
market.
Absent a convincing proof of the econophysicists' claim, we, together with many
economists, take a rather sceptical view of the ability to provide a frequentist statistical
description of future economic events14. The approach presented here indeed
takes such an agnostic stance. With the econophysicists we also believe that extreme
events belong to a class of their own. But we do not assume that market behaviour in
situations of distress displays a universal signature. Nor do we believe that Extreme
Value Theory, with its backward-looking approach, can always take into account the
complexity and joint behaviour of assets under new exceptional circumstances. Instead
we attempt to specify the distressed co-dependencies and stand-alone properties
of the returns distribution on a case-by-case basis by making use of our (imperfect
but useful) understanding of how the world works. As we shall see, the exceptional
co-dependencies will be a direct output of the specific causal links between the drivers
of exceptional returns that the asset manager will have identified.
Of course, the idea of complementing statistical approaches with stress testing and
scenario analysis is not new. Asset managers have been aware for a long time of the
exceptional and unpredictable market behaviour in conditions of distress and have
traditionally tried to defend their portfolios by adding, after the optimization, vari-
ous types of insurance trades (e.g., out-of-the-money puts). However, these defensive
positions have typically been added to optimized portfolios as an afterthought, and
12 The critical time tc is the most probable time of occurrence of the crash.
13 In fact this is the only way of being compatible with the rational expectations hypothesis on which the model is based. For a discussion of the compatibility of the coexistence of bubbles and rational expectations see [2].
14 For instance, Rochet [60] points out, 'financial risks are not exogenous, but arise from the behavior of economic agents. Consequently, the statistical distribution of [...] financial returns is not stationary but depends critically on the economic and regulatory environment, as well as the individual incentives confronting the many economic agents who participate in different aspects of financial intermediation'. See also Davidson [50], page 18 and passim, for a discussion of the limits of the ergodic theorem, or the body of work that goes under the rubric of Imperfect Knowledge Economics for a more general criticism of the predictability of economic change (see, e.g., Frydman et al. [20] [19]).
outside a coherent probabilistic and utility-maximization framework. Such ad hoc
treatment may seem quite arbitrary and lacking the desired quantitative integration
with the statistical modelling part. The purpose of this thesis is to overcome this
drawback and to give a recipe for how this integration can be done ex ante in a coherent
way. It will make use of standard statistical techniques such as distributions, copulae,
correlations etc. together with a Bayesian approach based on causality assumptions.
In other words, we claim that 'most of the time' markets behave regularly, i.e., display
regularities (both in the marginal distributions and in the co-dependence structure)
that can be uncovered using frequentist statistical techniques. As for tail events, sub-
jective probabilities (whose assignment obviously also takes into account frequentist
information when appropriate) and assumptions about the causal structure of tail
events then provide the required complement to the frequentist information. Given
these two inputs, the approach proposed in this thesis blends them in a new kind
of portfolio optimization framework. Its subjective inputs are transparent, auditable
and challengeable by non-specialists. In this sense the approach proposed here trans-
lates and expands to an asset management context many of the ideas introduced for
stress testing in [56], [55], [54].
The Black-Litterman model (Black et al. [1]) was the first model of this kind with
a forward-looking perspective. It gives the investor the possibility to express views on
how the market will behave and to amalgamate these opinions with statistical data in a
very elegant way. We will highlight later its drawbacks and how this approach can
be integrated into the framework we propose here.
1.1 Outline of the Problem and Dataset
In this thesis we will consider the problem of how to choose, given a fixed investment
amount (the budget constraint), the optimal composition of a top portfolio, Π (i.e.,
the weights, w, to be given to several sub-portfolios). For simplicity of exposition
we assume that each sub-portfolio is invested in a different asset class and that it
can be associated with a single dominant risk factor. This assumption makes the
exposition clearer, but can be easily relaxed. So, in our example one sub-portfolio
could be a long-only US equity portfolio (and the risk factor would be an appropriate
US equity index); another sub-portfolio could be a portfolio of US investment grade
corporate bonds (and the risk factor would be an investment grade bond index, or,
say, a AA spread) etc. Given the stylized nature of the example, mutatis mutandis
we can therefore speak interchangeably of a portfolio, of a risk factor or of an asset
class.
Apart from giving a general recipe, we will illustrate the approach by analyzing
step by step a real world example of the proposed procedure.
The dataset we will use consists of four asset classes, Government Bonds, Investment-
Grade Credit Bonds, Equities and High-Grade RMBS Securities (called asset class
Bond, Credit, Equity and Mortgage, respectively, in the following). More precisely,
the following indices were used15:
• for Bond the BarCap US Treasury Index;
• for Credit the BarCap US Credit Index;
• for Equity the S&P 500;
• for Mortgage the BarCap US MBS Index.
The data set consists of 3360 x 4 daily returns16, covering the period February 1997
to June 2010.
This thesis is organized as follows. In Chapter 2 a technique to obtain ”clean” (in
a sense we will explain later) quantities like correlations and volatilities is presented.
In Chapter 3 the Bayesian nets technology, which is the foundation of the approach, is
introduced. Chapter 4 fuses the ”cleaned” quantities with the output of the Bayesian
net and shows how to perform the optimization. A real world example is presented
and discussed in Chapter 5. Finally, some of the technical details and proofs which
complement the body of the thesis are put in Appendices.
All the coding and figures were done in MATLAB (version 7.10.0.499 R2010a).
15 Bloomberg tickers: Bond - LUATTRUU, Credit - LUCRTRUU, Equity - SPX, Mortgage - LUMSTRUU.
16 36 non-contiguous missing values in the dataset were linearly interpolated.
Chapter 2
Statistics
2.1 Identification of Normal Data
In general, we assume that, for each sub-portfolio, a distribution of (arithmetic)
returns is available. The time-t_k arithmetic return for sub-portfolio i, r_i^k, is defined
as

r_i^k = \frac{x_i^{k+1}}{p_i^k} - 1

where p_i^k denotes the price level of sub-portfolio i at time t_k, and x_i^{k+1} denotes the value
at time t_{k+1} of the same sub-portfolio plus the (reinvested) dividends accrued from
time t_k to time t_{k+1}.
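As a minimal illustration of this definition (a sketch in MATLAB, the language used throughout the thesis; the matrix name P is ours and is assumed to hold total-return index levels, so that dividends are already reinvested):

```matlab
% P: T x p matrix of total-return index levels (dividends reinvested),
%    one column per sub-portfolio / asset class.
R = P(2:end, :) ./ P(1:end-1, :) - 1;   % arithmetic returns r_i^k = x_i^(k+1)/p_i^k - 1
```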
Since we believe that exceptional returns and co-dependencies cannot be modelled
statistically but need a forward-looking Bayesian approach, the first step is to
isolate them from 'normal' returns1. In general, the distribution of 'normal' returns
could display fatter tails than the Gaussian one. This is due to 'mundane' and 'regular'
phenomena such as, say, stochastic volatility or small or medium-size jumps, which
can give rise to fat tails without necessarily implying that the market is in a state of
distress. Because of this, in what follows, we will leave open the possibility of modelling
such 'normal' returns with distributions other than the Gaussian.
Several techniques can be employed to identify such ‘body’ of ’normal’ returns of
a distribution. The simplest one is to truncate all the data points over a threshold
(e.g., a given number of standard deviations) in the empirical margins and define as
‘body’ the rest of the data. This procedure is very simple but rather arbitrary. More
sophisticated and ‘objective’ methods have been developed which aim at eliminating
1 In this thesis the adjective 'normal' is often used, referred to data, to mean 'non-exceptional'. To avoid confusion, the word 'normal' is never used in the statistical sense, for which the word 'Gaussian' is invariably used. To further avoid confusion, the adjective 'normal' is often placed among quotes (as in 'normal').
outliers in the data. The earliest papers on the topic (see, e.g., [24], [18]) focused on
outliers of the Gaussian distribution. More recent studies broaden the idea to fat-
tailed distributions such as t-Student or other power law decaying distributions (see
[49] and [67]). The central statement is again that, although heavy-tailed distributions
are used by definition to accommodate ‘large’ returns, there are still values which are
so extreme that they cannot be explained by the hypothesized distribution. The
outcome of these studies is a statistically formalized test procedure.
The focus of these studies is the unidimensional case. This could hide some inter-
esting dynamics in the joint behaviour of asset classes. A useful multi-dimensional
technique is based on the volume of the minimum volume ellipsoid (MVE) that con-
tains the data2 and the minimum covariance determinant (MCD). Both MVE and
MCD belong to the class of the so called high-breakdown estimators, highly resistant
to outliers.
Let's consider a set of N data points (asset returns in our case) r = (r_1, r_2, ..., r_N),
each defined in p dimensions. For each of the points in the dataset we can define the
Mahalanobis distance:
MD(r_j) = \sqrt{(r_j - \bar{r})^T S^{-1} (r_j - \bar{r})}     (2.1)
where T denotes transposition, S is the sample covariance matrix and \bar{r} is the sample
mean of the p-dimensional vectors r. If the data come from a multivariate normal
distribution, the Mahalanobis distances follow a \chi^2_p distribution. It would be intuitive to
define the body of data as the locus of points such that:
L \equiv \{ r : MD(r) \le \sqrt{\chi^2_p(\alpha)} \}
where α represents a predetermined confidence level3. This approach, however, has
an intrinsic problem. The covariance matrix S and the mean \bar{r} in (2.1) are both
estimated from the sample. Therefore the presence of outliers might heavily inflate
these estimates, and if we make use of (2.1) we could miss some potential outliers,
which would appear as 'normal' points. This outcome is called the masking effect. The
MVE and MCD are estimators that allow us to overcome this issue. Let's first define the
MVE [62]:
2 See [61] [62] [63] for an introduction. A detailed description can also be found in [44].
3 For the use of the Mahalanobis distance to discern periods of financial turbulence see also [32], an article published during the writing of this thesis. However, the use of the 'pure' Mahalanobis distances has a severe drawback, described in what follows, which we will overcome with the help of the MVE and MCD.
Definition 1: The MVE location estimator t and scatter estimator C minimize
the determinant of C subject to the condition
\#\{ i : (r_i - t)^T C^{-1} (r_i - t) \le c^2 \} = h
where the minimization is over all t ∈ Rp and C ∈ PDS(p), the class of positive
definite symmetric matrices of size p
In this definition h is a number lying between [N/2] + 1 and N, where [·] denotes the
integer part. The symbol # stands for the number of points. Therefore, the MVE is an
estimator of the center and the scatter of the h points that are the most concentrated
in a given dataset. The constant c can be chosen according to different criteria. One
such criterion leads to choosing c so that C is a consistent estimator of the covariance
matrix for data coming from a multivariate normal distribution, i.e. c = \sqrt{\chi^2_p(\alpha)} where
α = h/N. We define the MCD as follows:
Definition 2: The MCD location estimator t is the average of h points and the
scatter estimator is the minimum-determinant covariance matrix C, subject to the
following condition:

\#\{ i : (r_i - t)^T C^{-1} (r_i - t) \le c^2 \} = h
Once these estimators are found, they can be used in (2.1) to calculate the robust
Mahalanobis distances and to label as outliers the observations whose robust distance
exceeds a given threshold. However, given h points in p-dimensional space, finding
the MVE or MCD means sweeping through

\binom{N}{h} = \frac{N!}{(N-h)!\, h!}

possible combinations, i.e. a computational complexity of O\left(\binom{N}{h}\right). Even for small datasets
this might be infeasible. Fortunately, methods exist that allow us to overcome this
computational issue. A very fast method called FAST-MCD, which finds the minimum
covariance determinant, can be found in [63]4. As an illustration, we show in Fig. 2.1
the robust Mahalanobis distances calculated with the FAST-MCD approach as a
function of the observation date in the dataset. As already explained, a possible
criterion to identify the body of the data is to cull all the points above a certain
threshold, which for 4 assets might be χ²_4(0.975) = 3.34.
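As a sketch of this thresholding step (shown here with the classical sample mean and covariance for brevity; in the robust version these would be replaced by the FAST-MCD location and scatter estimates, e.g. those returned by the robustcov function available in later MATLAB releases):

```matlab
% R: T x p matrix of daily returns (here p = 4).
[T, p] = size(R);
mu = mean(R);                           % location estimate
S  = cov(R);                            % scatter estimate
D  = bsxfun(@minus, R, mu);             % centred observations
MD = sqrt(sum((D / S) .* D, 2));        % Mahalanobis distances, Eq. (2.1)
thr      = sqrt(chi2inv(0.975, p));     % approx. 3.34 for p = 4
outliers = find(MD > thr);              % candidate outliers (subject to masking)
```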
Another method based on the MVE/MCD philosophy is proposed in [53] and [44].
We will describe it in detail because it will be the main method we will use to decide
4 In the same paper the author recommends the use of the MCD over the MVE for reasons of statistical efficiency and computation speed. However, we find in what follows that the MVE can give some additional insight and can be used as a complement.
Figure 2.1: Robust Mahalanobis distance vs date of observation. The major crises are clearly visible in terms of the concentration of large values of the robust distance: the Russian crisis, the dot-com crisis and the recent subprime debacle. The red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to [0.75N] (this is also the value adopted in [63]). The two peaks on the rightmost part of the figure correspond to the dates immediately after Lehman's default: 15 and 17 September 2008. The highest peak in the middle is the reopening of the stock exchange after 9/11. The highest peak on the left is 31 August 1998, a few days after Russia defaulted on its debt.
which points to ban as outliers. The procedure consists of three separate routines
combined together.
Routine A - Calculation of the smallest ellipsoid that contains h obser-
vations5
Step 1: Given a set of h observations define the initial weights as follows:
w_n = \frac{1}{h}, \qquad n = 1, ..., h
Step 2: Compute the location parameter t and scatter C according to

t = \frac{1}{\sum_{n=1}^{h} w_n} \sum_{n=1}^{h} w_n r_n

C = \sum_{n=1}^{h} w_n (r_n - t)(r_n - t)^T
Step 3: Calculate the squared Mahalanobis distance:
Ma_n^2 = (r_n - t)^T C^{-1} (r_n - t), \qquad n = 1, ..., h
Step 4: For every n for which Ma_n^2 > 1, change the respective weight to:

w_n \leftarrow w_n \, Ma_n^2
Step 5: If convergence is reached, stop and define the volume of the ellipsoid as

V = \gamma_p \sqrt{|C|}

where \gamma_p = \pi^{p/2} / \Gamma(p/2 + 1) is the volume of the unit sphere in p dimensions;
otherwise go to Step 2.
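A direct MATLAB transcription of Routine A might look as follows (a sketch; the tolerance used in the convergence test of Step 5 is our own choice, as it is not specified above):

```matlab
function [t, C, V] = routineA(r)
% Routine A: location t, scatter C and ellipsoid volume V for the
% h observations in r (an h x p matrix), via iterative reweighting.
[h, p] = size(r);
w = ones(h, 1) / h;                               % Step 1: initial weights
for iter = 1:500
    t   = (w' * r) / sum(w);                      % Step 2: location
    D   = bsxfun(@minus, r, t);
    C   = D' * bsxfun(@times, w, D);              % Step 2: scatter
    Ma2 = sum((D / C) .* D, 2);                   % Step 3: squared Mahalanobis distances
    wNew      = w;
    idx       = Ma2 > 1;
    wNew(idx) = w(idx) .* Ma2(idx);               % Step 4: reweight the distant points
    if max(abs(wNew - w)) < 1e-10, break; end     % Step 5: convergence reached
    w = wNew;
end
gammaP = pi^(p/2) / gamma(p/2 + 1);               % volume of the unit sphere in p dimensions
V = gammaP * sqrt(det(C));                        % Step 5: volume of the ellipsoid
end
```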
Routine B - Removal of the farthest outlier6
Step 1: Define the N × p matrix

U = \begin{pmatrix} r_1 - t \\ \vdots \\ r_N - t \end{pmatrix}
5 The method in this Routine was originally introduced in [75].
6 The method in this Routine was originally introduced in [53].
where t is the sample mean of the data.
Step 2: Denoting by λ_n the n-th element of the diagonal of the information matrix,

\lambda_n = \left( U (U^T U)^{-1} U^T \right)_{nn},
remove the observation with the highest value of λ_n. This is by definition the observation
that changes the determinant of the sample covariance matrix the most.
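Routine B admits an equally short sketch (the function name and the choice of returning an index, rather than deleting the row in place, are our own):

```matlab
function worst = routineB(r)
% Routine B: index of the farthest outlier, i.e. the observation with
% the largest diagonal element lambda_n of the information matrix.
t      = mean(r);                              % Step 1: sample mean
U      = bsxfun(@minus, r, t);                 % N x p matrix of centred rows
lambda = sum((U / (U' * U)) .* U, 2);          % Step 2: diagonal of U (U'U)^(-1) U'
[~, worst] = max(lambda);                      % observation to be removed
end
```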
Routine C - Truncation
Step 1: Start with the entire dataset.
Step 2: Calculate the sample covariance matrix C.
Step 3: Use Routine A to find the location and scatter parameters of the minimum
volume ellipsoid ε.
Step 4: Find the farthest outlier with Routine B and remove it from the data.
Step 5: If the number of data points left equals h = [0.5N ] stop the algorithm.
Otherwise go to Step 2.
The main idea is that, by plotting the volume of ε and the determinant of C as a
function of the points left in the dataset, the delimiting line between outliers and body
is where these values start to show abrupt changes. Other important quantities can
also be monitored during the truncation and their stabilization used to infer a separation
line between the body and the outliers' cloud.
An example of this procedure is shown in Figures 2.2 to 2.10. Figure 2.2 shows
the volume of the ellipsoid as a function of the number of observations removed from the
dataset. Figures 2.4 and 2.6 display the determinant of the covariance and correlation
matrices, respectively, also as a function of the number of removed observations. Figures
2.3, 2.5 and 2.7 then show the variations in the same quantities. We define the variation of a
quantity q as:
\delta q(i) = \frac{q(i+1) - q(i)}{q(i)}
where q(i) is the value of the quantity estimated on the dataset if i observations
are removed.
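Putting the pieces together, Routine C and the monitoring of the variations δq can be sketched as follows (a sketch using the routineA and routineB functions defined above, each assumed to live in its own file):

```matlab
% Routine C (sketch): remove observations one at a time down to h = [0.5N],
% recording the ellipsoid volume and the covariance determinant at each step.
data = R;                                    % full data set, N x p
N    = size(data, 1);
nCut = N - floor(0.5 * N);                   % number of points to peel off
vol  = zeros(nCut, 1);
detC = zeros(nCut, 1);
for i = 1:nCut
    [~, ~, vol(i)] = routineA(data);         % Step 3: minimum-volume ellipsoid
    detC(i)        = det(cov(data));         % Step 2: monitored covariance determinant
    worst          = routineB(data);         % Step 4: farthest outlier
    data(worst, :) = [];                     % remove it
end
deltaVol  = diff(vol)  ./ vol(1:end-1);      % variations delta_q(i), as defined above
deltaDetC = diff(detC) ./ detC(1:end-1);
```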
The FAST-MCD method can be used in the same fashion. Once the robust
distances are determined for the whole dataset, one can start by removing the point
with the largest distance, then the second largest and so on and monitor some key
quantities, such as the covariance matrix determinant, until they stabilize.
It is apparent that excluding 100 to 200 points (3% to 5% of the whole data
set) renders the covariance and correlation structure much more stable7. Figures 2.8
and 2.9 show that this is indeed the case by displaying the individual elements of
the correlation matrix among the Bond, Credit, Mortgage and Equity time series as
a function of the number of points removed. Figure 2.10 displays the stabilization
of the four eigenvalues of the correlation matrix. As we shall see, as long as these
stabilization requirements are met, the exact identification of which data points belong
to the body and the tail of the distribution is in itself less important. For the concrete
example examined here we will define the first 160 truncated points as outliers and
the rest as the ’body’. Figure 2.11 shows visually this separation for the Equity,
Bond and Credit returns. In Figure 2.12 the robust Mahalanobis distances after the
removal of the outliers are shown. Finally, influence plots can be seen in Figures
2.13 and 2.148.
2.1.1 Some Remarks About the MVE, MCD and Related Methods
It is important to make a clear distinction between the methods of the previous
section. The Minimum Volume Ellipsoid of h points out of N is defined by the h
points that are the most concentrated, i.e. those with the minimum-volume
containing ellipsoid. So, if we are given a set of 100 points and h = 50, we have to
play with all the different combinations of 50 points out of those 100 until we find
those that have the containing ellipsoid with the minimum possible volume. That
ellipsoid is the Minimum Volume Ellipsoid (MVE) for that parameter h. The original
authors (Rousseeuw et al. [61], [62], [63]) first fix h, let's say h = [0.75N], and then
find the most concentrated set of points for that value of h. Finding those points
means finding C and t in Definition 1. With this metric, obtained for the h most
concentrated points (for h = [0.75N ]), we can calculate the Mahalanobis distance for
7 The same procedure but with the FAST-MCD method was tested and it points to the same conclusions. However, we will not show the plots related to that approach because visually they are extremely close to those presented here.
8 We note in closing this section that Bouchaud and Potters [3] find that, by excluding the lowest eigenvalues from the orthogonalization of an empirical correlation matrix obtained from S&P 500 returns, the resulting efficient frontier becomes much more stable as a function of the temporal subsection of data used (e.g., first half versus second half). They also find that the purged correlation matrices become much more stable over time. They point out that, even if an investor availed herself of the knowledge of the realized returns, investing according to a Markowitz prescription using all the eigenvectors of the correlation matrix would result in much riskier ex post portfolios than if the investment had been made using a small number of stable eigenvalues. Bouchaud and Potters tend to explain this by positing that the lowest eigenvalues are simply picking up noise.
Figure 2.2: Volume of the ellipsoid as a function of the number of points removed
Figure 2.3: Variation for Fig 2.2
Figure 2.4: Minimum covariance determinant as a function of the number of pointsremoved
Figure 2.5: Variation for Fig 2.4
Figure 2.6: Correlation matrix determinant as a function of the number of pointsremoved
Figure 2.7: Variation for Fig. 2.6
Figure 2.8: Correlations as a function of the number of points removed. The symbol ρsignifies correlation and the subscripts C, B, M and E denote Credit, Bond, Mortgageand Equity, respectively, as described in the Introduction.
Figure 2.9: Variation for Fig. 2.8
Figure 2.10: Variation in the eigenvalues of the correlation matrix as a function ofthe number of points removed. The eigenvalues are ordered from the smallest λ1 tothe largest λ4.
Figure 2.11: Body (circles) and outliers (stars) after the removal of 160 points.
Figure 2.12: Robust Mahalanobis distance vs date of observation of the 'body'. The red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to [0.75N]. For this graph the outliers were not removed but replaced by the average of the dataset in order to have historical comparability with Figure 2.1. The extreme peaks of Figure 2.1 are no longer present.
Figure 2.13: Influence plot of the standardized returns of Bond and Credit. The influence contour lines show approximately by how much the correlation coefficient varies if we remove a single point sitting on that line. The red points represent the outliers eliminated by the truncation algorithm. Some outliers may appear as not influential on the two-dimensional graph, but we must remember that the truncation was carried out in a 4-dimensional space. In such a higher-dimensional space they are influential. The method used here to draw the plot is the one presented in Appendix D.
Figure 2.14: Influence plot of the standardized returns of Bond and Equity. See Fig. 2.13 for an explanation.
all the points in the dataset. The reasoning behind the MCD is the same, but we
use Definition 2 instead. The numerical search for the MCD is fast thanks to the
FAST-MCD method. The search for the MVE is slower but, nevertheless, there are
some numerical techniques which render it faster than the brute force search. Once
the Mahalanobis distances are calculated in this way for all the points either with
the MVE metric or with the MCD metric they are called robust distances. They can
then be ordered and a procedure can be employed to eliminate one by one the most
distant points and to monitor variation in some key quantities.
What Meucci [44] does is to calculate the MVE for all the N datapoints, then
to calculate it for N − 1 datapoints, then for N − 2 and so on. The calculation is
therefore carried out not just on the h = [0.75N] most concentrated points. This
raises the question of how the MVE can be found for N − 1, N − 2, etc. datapoints. A
method which, the author states, might sometimes be suboptimal, is the one presented
in Routines A, B and C above. We calculate with Routine A the MVE for N points. With
Routine B we eliminate the farthest outlier and thus get N − 1 points. The N − 1
points thus obtained are those that are contained by the MVE. In other words, if,
instead of removing the farthest outlier, we removed another point at random, we
would still have N − 1 points, but the ellipsoid that contains these points will have a
bigger volume than the one calculated on the N − 1 points obtained by removing the
farthest outlier, as defined by Routine B. By definition it would not be the minimum-
volume ellipsoid. To rephrase, if we want to find the MVE of N − 1 points out of N ,
we have two possibilities:
1. Play with all the possible N − 1 points out of N until we get to the N − 1
points with the smallest volume of the containing ellipsoid.
2. Start from N points, remove the farthest outlier, calculate the ellipsoid volume
for the remaining ones and we can be reasonably confident that this is the ellipsoid
with the minimum volume, i.e. the MVE.
Procedure 2 is, of course, faster and simpler than procedure 1 (which, if imple-
mented naively, is combinatorial), but it is not guaranteed to yield the truly minimum-
volume ellipsoid. We use the safe procedure 2 in our study, but we have checked that,
for the data we have been using, the two approaches give qualitatively very similar
results9.
2.2 Constructing the Marginal Distribution of the
‘Normal’ Risk Factors
Once the ‘normal’ portion of the data has been identified as suggested in the previous
section, a parametric distribution can be fitted to the empirical distribution of ‘nor-
mal’ returns obtained for each individual risk factor. Depending on the risk factor
and the sampling interval, a Gaussian distribution may or may not be adequate. For
daily returns, even the body of most time series is not well described by a Gaussian
distribution. We find that a Student-t distribution does an acceptable job for our
dataset.
As an example, Figures 2.15 and 2.16 present the unidimensional fit in the form
of a histogram and a Q-Q plot for the asset class Equity using the Gaussian and the
t-Student distributions10. Let’s define the empirical cumulative distribution by:
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{X_i \le x}
9 We highlight in closing that, for the application of the MVE/MCD approach we described, instead of inspecting figures and deciding visually on a cut-off which separates the body from the outliers, in a practical implementation an automatic criterion (or criteria) to terminate the truncation can be used. For example, whenever the variation of the correlation matrix determinant falls below a certain threshold T, e.g. T = f(1/N), the truncation routine could be stopped.
10 The maximum likelihood principle was used to estimate the parameters.
Figure 2.15: The fit to the S&P daily returns obtained using a Gaussian and a t-Student distribution.
where I_{X_i \le x} is the indicator function, equal to 1 if X_i \le x and equal to 0 otherwise,
and the KS distance as:
D_n = \sup_x |F_n(x) - F^*(x)|
where F ∗(x) is the hypothesized distribution with parameters estimated from the
sample. Since the parameters are not known but estimated, the KS test cannot be applied
directly, at least by using the commonly tabulated critical points. We can resort
to Monte Carlo simulation11 in order to find the distribution of D and hence the
confidence levels.
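A sketch of this parametric bootstrap for one asset class (assuming a MATLAB release whose Statistics Toolbox provides fitdist, cdf and random for the t location-scale family; for brevity the empirical CDF is evaluated only at i/n, ignoring the (i−1)/n term of the exact KS statistic):

```matlab
% x: vector of 'normal' daily returns for one asset class.
pd     = fitdist(x(:), 'tlocationscale');        % MLE fit: location, scale, d.o.f.
ksDist = @(y, pdY) max(abs((1:numel(y))'/numel(y) - cdf(pdY, sort(y(:)))));
D0     = ksDist(x, pd);                          % KS distance on the real data
nSim   = 10000;
D      = zeros(nSim, 1);
for s = 1:nSim
    xs   = random(pd, numel(x), 1);              % simulate under H0
    pdS  = fitdist(xs, 'tlocationscale');        % re-estimate the parameters
    D(s) = ksDist(xs, pdS);                      % KS distance on the simulated data
end
pValue = mean(D >= D0);                          % cf. the p-values in Table 2.1
```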
The results are reported in Tab. 2.1 for 10,000 simulations. The better fit of the
t-Student distribution is evident.
11 For the KS test in the case of unknown mean and variance for the normal distribution, see [34]. The method to infer a distribution for the distance D will be described in detail in Section 2.3 in the context of copula fitting.
Figure 2.16: The Q-Q plot for the two fits in Figure 2.15.
Asset class    P-value (t-Student)    P-value (Gaussian)
Bond           0.5251                 0.0086
Credit         0.5914                 0.0091
Equity         0.3627                 1.4496e-005
Mortgage       0.9622                 2.1527e-006

Table 2.1: KS test for the t-Student and Normal fits. P-value is the probability of error if H0 is rejected. H0 is the null hypothesis that the empirical distribution comes from the tested one.
Asset class    Mean       Standard Deviation    Degrees of Freedom
Bond           0.00027    0.00248               9.2
Credit         0.00031    0.00260               9.8
Equity         0.00031    0.00888               4.9
Mortgage       0.00025    0.00136               4.6

Table 2.2: Parameters of the fitted t-Student distributions
2.3 Constructing a Joint Distribution of ‘Normal’
Risk Factors
The procedure described in the previous section produces univariate marginal distri-
butions for each risk factor. These marginals display fatter tails than Gaussians, but
nonetheless refer to non-extreme market conditions. The next step is to conjoin them
using a copula. We are going to choose the copula that provides the best fit of the
data12.
In general, if for each risk factor the ’normal’ portion of the data can be satis-
factorily described by a Gaussian distribution and they can be joined by a Gaussian
copula, we shall see that important computational savings can be achieved (because
a simple closed-form expression can be obtained in some cases by linking the distri-
bution of portfolio returns and the weights of the individual sub-portfolios). If this
is not the case, the procedure described in the following can still be applied, but the
final numerical search becomes somewhat more burdensome. Conceptually, however,
nothing changes.
Broadly speaking, there are three methods to estimate the parameters of a copula
from data [8]:
1. Exact Likelihood Method: a one-step method that estimates jointly the pa-
rameters of the marginals and those of the copula. Statistically, it is the most
efficient method, but it is computationally very onerous.
2. Inference from Marginals Method: a two-step procedure whereby the marginals
are fitted first, and then the copula. This is the method that has been used in
this study (see the sketch after this list).
3. Canonical Maximum Likelihood Method: it estimates directly the parameters
of the copula without fitting the marginals, as the empirical marginals are used
instead.
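A minimal sketch of the second (Inference from Marginals) method for our dataset, using the copulafit function mentioned below (the variable names are ours; Rbody denotes the matrix of 'normal' returns after truncation):

```matlab
% Rbody: T x 4 matrix of 'normal' returns (Bond, Credit, Equity, Mortgage).
[T, p] = size(Rbody);
U = zeros(T, p);
for j = 1:p
    pdj     = fitdist(Rbody(:, j), 'tlocationscale');  % step 1: marginal fit (Section 2.2)
    U(:, j) = cdf(pdj, Rbody(:, j));                    % probability transform to [0,1]
end
rhoG        = copulafit('Gaussian', U);                 % step 2: Gaussian copula parameters
[rhoT, nuT] = copulafit('t', U);                        % step 2: t copula (correlation, d.o.f.)
```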
Two types of copulas were fitted: t-Student and Gaussian. A given copula could
show a higher likelihood13 but the likelihood in itself is not a measure of how good a fit
is, nor does it give p-values on the basis of which we could reject a hypothesis. We will
present some shortcut methods that allow a fast computation of the goodness-of-fit.
12 The MATLAB function copulafit was used to find the parameters of the copulae. This function by default uses maximum likelihood estimation.
13 Already Mashal et al. [41] showed that the t copula is a superior fit for equity returns on the basis of a likelihood ratio test.
Goodness-of-fit (GOF) measures for copulas are well described in [21]. The
computational complexity of these methods is much higher than that of the methods used
to assess the GOF of one-dimensional distributions. The starting point is the construction
of the empirical copula:
C_n(u) = \frac{1}{n} \sum_{i=1}^{n} 1(U_{i1} \le u_1, ..., U_{id} \le u_d)     (2.2)

u = (u_1, u_2, ..., u_d) \in [0, 1]^d
The second step is to use a kind of Kolmogorov-Smirnov or Anderson-Darling test for
the distance between this copula and a hypothesized copula. However, the empirical
copula may become very hard to estimate especially in the case studied in this thesis
consisting of 4 assets with 3200 data points14 each. Apart from computational demands,
the storage of information is another issue to solve15.
Malevergne et al. [37] propose another method, much faster, to tackle this problem
for a Gaussian copula. They show that under the hypothesis H0 that the dependence
between random variables can be described by a Gaussian copula, the variable:
z^2 = \sum_{i,j=1}^{p} \Phi^{-1}(F_i(x_i)) \, (\rho^{-1})_{ij} \, \Phi^{-1}(F_j(x_j))     (2.3)
follows a χ² distribution with p degrees of freedom. Φ^{-1}(·) is the inverse cumulative
standardized normal distribution, the F_i are the cumulative marginal distribution
functions and p is the number of assets. The matrix ρ is defined as:
\rho_{ij} = \mathrm{Cov}\left[ \Phi^{-1}(F_i(x_i)), \Phi^{-1}(F_j(x_j)) \right]
They propose the following four distance measures:
KS: \quad d_1 = \max_z |F_{z^2}(z^2) - F_{\chi^2}(z^2)|

Average KS: \quad d_2 = \int |F_{z^2}(z^2) - F_{\chi^2}(z^2)| \, dF_{\chi^2}(z^2)     (2.4)

Anderson-Darling: \quad d_3 = \max_z \frac{|F_{z^2}(z^2) - F_{\chi^2}(z^2)|}{\sqrt{F_{\chi^2}(z^2)\,[1 - F_{\chi^2}(z^2)]}}

Average Anderson-Darling: \quad d_4 = \int \frac{|F_{z^2}(z^2) - F_{\chi^2}(z^2)|}{\sqrt{F_{\chi^2}(z^2)\,[1 - F_{\chi^2}(z^2)]}} \, dF_{\chi^2}(z^2)
14 This is the number of data points left after truncation (removal of 160 observations).
15 Some methods exist regarding how to optimize the calculation time and how to store the information more efficiently by using some properties of the empirical copula. A solution is proposed in [74], which comes with an accompanying MATLAB program pwlCopula freely available on the Internet.
28
The KS distances are more sensitive to deviations in the bulk of the distributions; the AD statistics, on the contrary, are more sensitive to the tails. A departure from the statistics in common use is the presence of moduli in the numerator instead of squares. The advantage is that such distances are less sensitive to outliers; the disadvantage is that the standard statistical tests (the ω-test and the Ω-test) cannot be used. This disadvantage is, however, immaterial here, since the covariance matrix is not known but estimated from data, so the exact parameters needed in the derivation of such statistics are in any case not available. A bootstrap method can be used instead [15]. The following recipe, taken from [37], is also followed in this thesis:
1.) Given the original time series, generate the Gaussian variables y = Φ−1(Fi(xi)).
2.) Estimate the covariance matrix ρ of the Gaussian variables y, which allows the computation of the variable z2 in Eq. (2.3); then measure the distance (according to one or more of the distance measures proposed above) of its estimated distribution from the χ2-distribution.
3.) Given this covariance matrix ρ, generate numerically a time series of N (3200 in this case) Gaussian random vectors with the same covariance matrix ρ.
4.) For the synthetically generated time series of Gaussian vectors, estimate its sample covariance matrix $\tilde{\rho}$.
5.) To each of the N vectors of the synthetic Gaussian time series, associate the corresponding realization of the random variable z2, called $\tilde{z}^2$.
6.) Construct the empirical distribution of the variable $\tilde{z}^2$ and measure the distance between this distribution and the χ2-distribution.
7.) Repeating steps 3 to 6 10,000 times, obtain an accurate estimate of the cumulative distribution of distances between the distribution of the synthetic Gaussian variables and the theoretical χ2-distribution.
8.) The distance obtained at step 2 for the true variables can then be transformed into a significance level by using the synthetically determined distribution of distances between the distribution of the synthetic Gaussian variables and the theoretical χ2-distribution.
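A minimal MATLAB sketch of this recipe is given below, using the KS distance d1, a reduced number of bootstrap repetitions and placeholder pseudo-observations U (in practice U would come from the fitted marginals); it illustrates the procedure rather than reproducing the exact implementation used for the results.

    % Bootstrap GOF test for the Gaussian copula (steps 1-8, sketch).
    U    = copularnd('Gaussian', 0.5*eye(4) + 0.5, 3200);   % placeholder F_i(x_i) values in (0,1)
    p    = size(U, 2);
    y    = norminv(U);                          % step 1: map to Gaussian variables
    rho  = cov(y);                              % step 2: covariance of the Gaussians
    z2   = sum((y / rho) .* y, 2);              % z^2 of Eq. (2.3), one value per observation
    d1   = @(z) max(abs(tiedrank(z)/numel(z) - chi2cdf(z, p)));
    dObs = d1(z2);                              % observed KS distance to the chi^2_p law

    nBoot = 1000;                               % 10,000 in the thesis
    dSyn  = zeros(nBoot, 1);
    for b = 1:nBoot                             % steps 3-7: synthetic Gaussian samples
        ys      = mvnrnd(zeros(1, p), rho, size(y, 1));
        rhoS    = cov(ys);                      % step 4: re-estimated covariance
        z2s     = sum((ys / rhoS) .* ys, 2);    % step 5: synthetic z^2
        dSyn(b) = d1(z2s);                      % step 6: synthetic distance
    end
    pValue = mean(dSyn >= dObs);                % step 8: significance level of the observed distance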
The procedure can be extended also to a t-Student copula as done in [31]. Kole
et al. show that if H0 is the hypothesis that the dependence structure comes from a
t-Student copula with correlation matrix ρ and ν degrees of freedom then the variable:
\[ z^2 = \frac{1}{p}\sum_{i,j=1}^{p} \Psi^{-1}(F_i(x_i),\nu)\,(\rho^{-1})_{ij}\,\Psi^{-1}(F_j(x_j),\nu) \]
where Ψ−1(Fi(xi), ν) is the inverse of the cumulative standard Student’s t distribution
with degrees of freedom parameter ν, is distributed according to an F-distribution
with degrees of freedom p and ν. The same steps as in the case of Gaussian copula
with obvious modifications can be applied to this variable and a distribution of the
four GOF distances estimated.
We show the results in the "Entire dataset" section of Table 2.3. We see that, although the t-Student copula shows a better fit, witnessed by the higher p-values,16 neither of the copulae is a good absolute fit, as is clear from 3 of the 4 p-values, at least if we choose a threshold value of 1% or higher. To understand what "went wrong", let us split the last 10 years of the dataset into 5 equal parts and run the fit (univariate and copula) separately on each subset, after applying the truncation methods of Section 2.1.17 The results of the fit are shown in the five upper sections of Table 2.3.18 The t copula now also shows a much better absolute fit for each subset. The cumulative distributions of the 4 distances for the two copulas are shown in Figs. 2.20-2.21 for the last subset, and the corresponding p-values and distances in Table 2.4.
We present in Figs. 2.17 and 2.18 scatterplots between pairs of assets for the five subsets. Apart from being visually different, they also show different correlations over time, due to the changing underlying structure. To confirm this further, we present in Fig. 2.19 the correlation coefficient between Bond and Equity computed over a sliding window of 250 datapoints. This instability leads the GOF tests to reject both elliptical copulae even with a 1% threshold.19 It would be natural at this point to choose the last 2 years of data for the next steps, since the copula description provides better results in terms of fit and the correlation structure is more stable. Nevertheless, in
16 This is also confirmed by the log-likelihood of the t copula, which in this case is 6990, while that of the Gaussian copula is 6620.
17 We tested and ascertained that the truncation procedure of Section 2.1 has the same qualitative implications for each subset. In particular, the same behaviour is observed for the MVE/MCD/Correlation Matrix/Correlations/Eigenvalues and their variations. After a series of turbulent changes, a stabilization is observed, taken as a sign of the separation between body and outliers. On average 30 outliers were removed from each subset after application of the procedure. See Appendix G for figures related to the last subset.
18 We report the results for the copula fit only. Nevertheless, we also tested that the univariate t-Student distribution consistently provides a better fit for the asset classes' marginals.
19 The effect of changing correlations on copulae is examined in [59]. By introducing the concept of dynamic copula, Remillard et al. show that for the correlation between the Can/US exchange rate and oil prices (NYMEX oil futures) the Gaussian copula is a good fit (high p-value) if the sample is split into subsamples. This is not the case if the entire sample is taken into account. They attribute this to the change of dependence in time. Another line of research is that of Embrechts et al. [13], who try to model the change of the dependence structure by time-varying the parameters of the chosen copula. We will not pursue such a route here. Finally, in [4], a test is offered to check against the hypothesis that the copula is changing over time.
Subset            Copula     d1         d2         d3         d4
1st subset        Gaussian   0.018      0.012      0.032      0.002
                  t          0.104      0.244      0.087      0.392
2nd subset        Gaussian   0.046      0.053      0.044      0.006
                  t          0.344      0.642      0.082      0.708
3rd subset        Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          0.098      0.456      0.156      0.462
4th subset        Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          0.042      0.514      0.032      0.692
5th subset        Gaussian   0.084      0.138      0.052      0.128
                  t          0.252      0.273      0.062      0.328
Entire dataset    Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          < 10^-3    0.002      0.007      0.197

Table 2.3: P-values (probability of error if H0 is rejected) for each of the four GOF distances d1-d4, for the entire dataset and the five subsets described in the text. H0 is the null hypothesis that the empirical multivariate distribution comes from the tested copula.
what follows we will use the entire dataset, because of the richness of interesting effects it will exhibit later and which we would like to show. This choice does not compromise the philosophy or the mathematics we will examine; however, for practical purposes we should bear in mind the limits shown in this Section.20
From now onwards we will implicitly assume that the t copula is used to model the 'normal' part.21 Once the marginals have been conjoined using the copula of choice, the modelling of the 'normal' portion of the data is complete. In Fig. 2.22 the scatterplot of data points generated through the fitted t copula is shown.
20 It is worth stressing again that the t copula provides a consistently better fit than the Gaussian copula over the entire dataset as well as in each subset.
21 The degrees of freedom coming from the fit are ν = 5.72.
Distance   Gaussian copula   P-value   t copula   P-value
d1         0.067             0.084     0.049      0.252
d2         0.305             0.138     0.081      0.273
d3         0.247             0.052     0.205      0.062
d4         1.166             0.128     0.311      0.328
Table 2.4: Distances and p-values representing the probability of error if H0 is rejected, for the last subset. H0 is the null hypothesis that the empirical multivariate distribution comes from the tested copula.
Figure 2.17: Scatterplot between Bond and Mortgage for the five different subsets. In the lower-right corner the scatterplot for the entire dataset is shown. The correlations are respectively 88.1%, 87.1%, 93.1%, 91.3%, 81.6% and 86.5%.

Figure 2.18: Scatterplot between Bond and Equity for the five different subsets. In the lower-right corner the scatterplot for the entire dataset is shown. The correlations are respectively −13.1%, −43.2%, −2.9%, −29.6%, −43.5% and −22.2%.

Figure 2.19: Correlation between Bond and Equity for the entire dataset. The correlation is calculated within a sliding window of 250 datapoints.

Figure 2.20: Gaussian copula. Cumulative distributions of the four distances used to assess the GOF of the copula for the last subset.

Figure 2.21: t copula. Cumulative distributions of the four distances used to assess the GOF of the copula for the last subset.

Figure 2.22: Scatterplot of the random numbers generated with the fitted copula (and mapped inversely from [0, 1] to the real axis with the help of the fitted marginals) for Bond and Mortgage.
Chapter 3
Causality
The previous Chapter presented an original combination of methods and ideas to deal with the isolation and modelling of the 'normal'-times returns. The truly innovative ideas of this thesis, however, will be presented in this and the subsequent Chapters.
3.1 Treatment of Exceptional Events
The next step of our approach is the specification of those tail events that, given the
particular macroeconomic and financial conditions of the moment, may be expected
to have a large effect on a given portfolio. These significant events could be identified
on the basis of macro analysis, or starting from the vulnerabilities of a portfolio,
as suggested in [56]1. The analysis should be limited to a handful of ‘extreme but
plausible’2 occurrences that have the potential to inflict very serious damage (or,
indeed, to generate exceptional gains) in a given portfolio.
In the approach we propose these extreme events are modelled as Boolean vari-
ables, that is, they can assume ‘true’ or ‘false’ logical values. For instance, at the
time of this writing, the possibility of the break-up of the European Monetary Union
and default of some European countries could be such events. These relationships
between the variables will be modelled through Bayesian Nets (BNs). The next section introduces this technology at a general level,3 while a concrete example will be examined in Chapter 5.
1 See the discussion in Chapter 13 in particular.
2 This is the expression used by the Bank for International Settlements [47] in a stress-testing context.
3 A list of all the properties and supporting theorems of BNs will not be given in this thesis. A deeper and more exhaustive introduction can be found in [78] and [52]. The next section is based on these two sources.
3.2 Bayesian nets
The task of determining a joint probability distribution P(E1, E2, ..., En) of n random variables is a complex one, even under the assumption that the random variables live in a dichotomous, discrete state space. A table with 2^n rows could be used to specify the different combinations of the values of the Es and the associated joint probabilities. The number of equations needed to fully determine this table is 2^n − 1, given the fundamental property that all the probabilities must sum to 1. The problem becomes complex, both from a computational and from a conceptual point of view, as the number of dimensions grows, and with it the number of relationships between the variables that must be specified.
Fortunately, sometimes there are links which greatly simplify the problem; this happens when each variable depends on a small set of other variables. Such information allows the decomposition of a large problem into small pieces, each involving only a subset of variables. In this context a graphical representation helps particularly well, as intuition fades when the number of dimensions grows. A natural graphical representation is given by Directed Acyclic Graphs (DAGs).4 DAGs consist of a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start from a vertex Ei and follow a sequence of edges that leads back to Ei. In the context of probability they became known as Bayesian networks, mainly because of (see [52]):
(1) the subjective nature of input information
(2) the reliance on Bayes conditioning to update information
(3) the distinction between causal and evidential way of reasoning.
An example of a Bayesian network is presented in Fig. 3.1. A formal definition
follows.
A Bayesian network B on a set of variables E consists of two components (see
[78]):
1.) A DAG G = (E,A) where E and A are respectively the sets of vertices and
directed edges in the graph. A family terminology is used when describing a net. For
instance, in Figure 3.1 E3 has two parents (ParE3): E1 and E2 i.e. a set of nodes
from which there is an arrow pointing to E3. E1 on the other hand has no parents
4 Undirected graphs are usually used to represent symmetrical spatial relationships. Directed graphs are convenient for causal and temporal relationships.
and two children: E4 ∈ (ChiE1) and E3 ∈ (ChiE1) i.e. a set of nodes reached by an
arrow from E1.
2.) A probability specification S. For every variable Ei ∈ E a probability ta-
ble is specified containing the probability of that variable conditional on its parents
p(Ei|ParEi) (only marginal probability for nodes without parents).
A BN graph and its probability specification are linked by an assumption called
Markov Condition. According to this assumption, each variable, given the knowledge
of its parents, is probabilistically independent of all the variables which are not its
own descendants.
We will describe a graphical criterion that allows to read the conditional indepen-
dencies directly from a DAG5 (see [52]):
Definition 1: A path p is said to be d-separated (or blocked) by a set of nodes Z
if and only if
1.) p contains a chain i→ m→ j or a fork i← m→ j such that the middle node
m is in Z, or
2.) p contains an inverted fork (or collider) i→ m← j such that the middle node
is not in Z and such that no descendant of m is in Z
A set Z is said to d-separate the set of nodes X from the set of nodes Y if and
only if Z blocks every path from a node in X to a node in Y .
The following theorem combines d-separation and conditional independence (see
[52]):
Theorem 1: If sets X and Y are d-separated by Z in a DAG G, then X is independent of Y conditional on Z in every distribution compatible with G. Conversely, if X and Y are not d-separated by Z in a DAG G, then X and Y are dependent conditional on Z in at least one distribution compatible with G.
D-separation is important because it determines all and only the independencies implied by the BN. For example, in Fig. 3.1, E1 d-separates E4 from E3 because:
1.) The path E4 ← E1 → E3 contains a fork with middle node E1, which is in Z.6
2.) It is the only path from E4 to E3.
The described properties of BNs allow one to reduce the number of probabilities to be supplied to obtain the joint probability P (for example, we see that, thanks to d-separation, P(E4|E1, E2) = P(E4|E1)).
5 This criterion becomes particularly useful in complex and large graphs.
6 It is actually the only node in Z.
By exploiting the chain rule of probability calculus:
\[ P(E_1, E_2, \dots, E_n) = \prod_{i=1}^{n} P(E_i \mid E_{i-1}, E_{i-2}, \dots, E_1) \]
we can deduce a master formula for the joint probability decomposition of a Bayesian net by combining it with the Markov condition:
\[ P(E_1, E_2, \dots, E_n) = \prod_{i=1}^{n} P(E_i \mid Par_{E_i}) = \prod_{i=1}^{n} P(E_i \mid Y^i_1, Y^i_2, \dots) \qquad (3.1) \]
where the Y^i_j are the parents of node i.
From now on we will adopt the following notation for joint probability tables:
E1 E2 E3 E4 Joint Probability
0 0 0 0 p0
0 0 0 1 p1
0 0 1 0 p2
0 1 0 0 p3
1 0 0 0 p4
0 0 1 1 p5
0 1 0 1 p6
0 1 1 0 p7
1 0 0 1 p8
1 0 1 0 p9
1 1 0 0 p10
0 1 1 1 p11
1 0 1 1 p12
1 1 0 1 p13
1 1 1 0 p14
1 1 1 1 p15
(3.2)
We will assume that the i-th column is associated with the event Ei. We will
denote with Ei = 1 the i-th event happening, Ei = 0 otherwise. In the last column
the joint probability of the particular combination of events to the left occurring or
not is shown. p0 will be always linked to none of the events occurring.
3.3 Constructing a Bayesian Net
In what follows our attention will be restricted to S2 networks, i.e. networks with at most two parents per node. This may seem reductive, but it brings considerable computational convenience and is at the same time easier to manage conceptually and to explain to senior management.

Figure 3.1: Example of a Bayesian net
The simple example of Bayesian net in Fig. 3.1 shows a situation where variable E1
exerts a causal influence on variables E4 and E3, variable E2 affects variable E3, and
variables E1 and E2 do not directly influence each other. Each node will be associated
to an event that could hit the asset classes in the portfolio under consideration.
The specification of how the stress events interact i.e. the causal structure (the
topology) of a Bayesian net is one of the most important (and most delicate) steps in
the procedure proposed here. This is where the asset manager makes use of her under-
standing of the world today. This information, imperfect as it may be, is invaluable:
we know, say, that a fall in equities will cause equity implied volatilities to rise, and
not the other way around. Yet, when we use associative measure (such as correlations
and copulae) we discard this information. We do so at a great informational loss.
Since, as discussed below, the inputs to the probability tables at each node are
arrived at using a mixture of frequentist, implied and subjective probabilities, this
imposes a ‘cognitive’ limit on the complexity of the underlying net7. This is, admit-
tedly, a limitation of the approach. However, assigning a simplified picture of ‘how
the world works’ is certainly better than providing no such information at all – at
least as long as the inputs are transparent and auditable and sensitivity analysis is
easy to carry out.
The next section shows how the marginal probabilities and conditional probability
tables can be filled in. It must be stressed at this stage, however, that assigning con-
ditional probabilities can often be simpler than specifying the marginal probabilities
for the same events: it may be very difficult to assign a probability for, say, a 1987-like
market crash or for an overnight doubling of equity implied volatilities. However, one
can safely venture that the conditional probability of the increase in equity implied
7 It must be stressed that, for any reasonable application, the limit of the proposed approach is not at all computational, as the whole procedure is not CPU-time-intensive.
volatility given the equity market crash should be well above 50%. For the purposes
of our approach, this degree of precision is more than adequate.
3.3.1 Building the Marginal and Conditional Probability Tables
The first step is to assign marginal probabilities to the nodes i.e. to the events that
could affect a portfolio. To carry out this task, market-implied information (such as
spreads of credit default swaps) can be profitably used. When this information is
not available (or deemed not reliable8) the subjective input of the asset manager is
in general required. Frequentist information is, of course, always available as a fall-
back, complement or ‘sanity check’. As far as the actual assignments are concerned,
spurious precision should be avoided. For the marginal probabilities only an order-of-magnitude assessment is required (probably nothing more precise than once-a-year, once-every-few-years, once-a-business-cycle, once-a-century statements can be made).
For conditional probabilities rarely do market-implied quantities provide useful in-
formation, and greater reliance must therefore be placed on subjective input. So, once
the marginal probabilities have been (approximately) elicited, the singly-conditioned probabilities can be assigned to 'buckets' determined by asking the asset manager whether the occurrence of event E1 decreases or increases the probability of event E2 by a little or a lot, makes no difference, or makes the occurrence of E2 almost certain.
It is well known that elicitation of conditional probabilities is a delicate task, and
that many cognitive biases make it harder than it seems. Rebonato [56] discusses
at length elicitation techniques that can make the task easier and less error-prone.
Given these cognitive difficulties in assigning conditional probabilities, it is not un-
common that the subjectively-assigned conditional probabilities may end up being
incoherent. This means that there exists no set of (non-negative, smaller-than-one)
joint probabilities from which the conditional probabilities could have been obtained.
A list of sanity checks derived from probability theory can be applied on them in
order to ensure their consistency with the probability axioms. A list of conditions is
the following [45]:
\[ P(E_i|E_j) \le 1 \qquad (3.3) \]
\[ P(E_i|E_j) \le \frac{P(E_i)}{P(E_j)} \qquad (3.4) \]
\[ P(E_i|E_j) \ge \frac{P(E_i) + P(E_j) - 1}{P(E_j)} \qquad (3.5) \]
\[ P(E_i|E_j) \ge 0 \qquad (3.6) \]
and the triplet conditions:
\[ 1 \ge \frac{P(E_j|E_i)\,P(E_i|E_k)\,P(E_k|E_j)}{P(E_k|E_i)\,P(E_j|E_k)\,P(E_i|E_j)} \ge 0 \qquad \text{for } i \ne j \ne k \qquad (3.7) \]
8 For instance, because the CDS spreads might be impacted by a systemic distress that pushes market prices away from fundamental values. Another reason might be a conflict of interest between those who provide the data on the one hand and its final users on the other (see [60]).
Provided that the probabilities in the BN have been chosen in a coherent way, these relationships will be satisfied. Otherwise, the portfolio manager must revise and update them.
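As an illustration, the pairwise checks (3.3)-(3.6) take only a few lines of code; the marginals and the singly-conditioned matrix below are purely hypothetical, and the triplet condition (3.7) can be verified analogously.

    % Pairwise coherence checks (3.3)-(3.6) on a matrix C(i,j) = P(Ei|Ej) (sketch).
    P = [0.04 0.07 0.07];                       % illustrative marginal probabilities
    C = [1.00 0.23 0.29;                        % illustrative singly-conditioned probabilities
         0.40 1.00 0.10;
         0.50 0.10 1.00];
    ok = true;
    for i = 1:numel(P)
        for j = 1:numel(P)
            if i == j, continue; end
            ok = ok && C(i,j) >= 0 && C(i,j) <= 1;          % (3.3) and (3.6)
            ok = ok && C(i,j) <= P(i)/P(j);                 % (3.4)
            ok = ok && C(i,j) >= (P(i) + P(j) - 1)/P(j);    % (3.5)
        end
    end
    fprintf('Pairwise checks satisfied: %d\n', ok);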
These sanity checks are not always sufficient to remove all the logical inconsistencies embedded in a conditional probability matrix. Other techniques, such as Linear Programming, exist to cleanse an incoherent conditional probability matrix and return the closest (in some sense) coherent conditional probability table. See, e.g., [33], [45] and [22].
Even if the asset manager is unable to provide values for the probabilities at all nodes, some useful bounds for them (as well as for the joint probabilities) can still be obtained given the constraints imposed by the probabilities assigned to the other nodes in the BN. This again can be done with the help of Linear Programming. A brief summary of this technique, together with two useful original extensions to [56] introduced in this thesis, is illustrated in the next subsection.
3.3.2 Linear Programming
Linear Programming is a technique that maximizes a given linear function f(p_0, p_1, \dots, p_{2^n-1}) (the objective function) subject to linear constraints. The constraints that we encounter in the construction of the Bayesian net can be expressed in the form:
\[ \vec{E}_i \cdot \vec{p} = p(i) \qquad i = 1 \dots n \]
\[ (\vec{E}_i * \vec{E}_j) \cdot \vec{p} = p(i,j) \qquad i = 1 \dots n-1, \; j > i \]
\[ \dots \]
\[ (\vec{E}_1 * \vec{E}_2 \dots \vec{E}_n) \cdot \vec{p} = p(1, 2 \dots n) \qquad (3.8) \]
\[ \sum_{i=0}^{2^n-1} p_i = 1 \]
\[ p_0, p_1, p_2, \dots, p_{2^n-1} > 0 \]
where the p_i, the components of the vector \vec{p}, denote the different realizations of the joint probability, i.e. the last column of (3.2). The symbol * stands for component-by-component multiplication of the event vectors \vec{E}_i corresponding to the columns in (3.2). The symbol \cdot stands for the usual scalar product between vectors. p(i) is the probability of the i-th event happening, p(i, j)9 that of the i-th and the j-th happening together, and so on. Of course, only the constraints for which the r.h.s. of the above equations is known will be used.
Using a software package such as MATLAB, lower and upper bounds for the joint probabilities can be found by minimizing/maximizing an objective function equal to one component of the vector \vec{p} at a time, given the constraints. One can start with constraints on the marginal probabilities only, i.e. the first line of (3.8). If singly-conditioned probabilities are added as constraints, the resulting bounds are tighter. In fact, by introducing more constraints, the allowed region of variability is further restricted by additional hyperplanes, thus decreasing the range of possible values for the objective function.
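The following sketch illustrates the idea with MATLAB's linprog for a three-event net whose marginal probabilities are purely hypothetical: the 'nothing happens' entry p0 is bounded from below and from above given only the marginal constraints.

    % Bounds on one joint probability via Linear Programming (sketch).
    n  = 3;
    E  = dec2bin(0:2^n-1) - '0';               % 8 x 3 matrix: rows are event configurations
    pm = [0.04 0.07 0.07];                     % hypothetical marginal probabilities

    Aeq = [E'; ones(1, 2^n)];                  % constraints Ei.p = p(i) and sum(p) = 1
    beq = [pm'; 1];
    lb  = zeros(2^n, 1);  ub = ones(2^n, 1);

    f     = zeros(2^n, 1);  f(1) = 1;          % objective: the 'nothing happens' entry p0
    pLow  = linprog( f, [], [], Aeq, beq, lb, ub);    % minimise p0
    pHigh = linprog(-f, [], [], Aeq, beq, lb, ub);    % maximise p0
    bounds = [f' * pLow, f' * pHigh];          % attainable range for p0

Adding singly-conditioned probabilities as further rows of Aeq tightens the interval, exactly as described above.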
There are additional constraints that could be implemented which will restrict
further the ranges for the joint probability. In this light, in what follows we propose
2 extensions to [54].
By definition of a Bayesian net there are relations of the type P(E4|E1, E3) = P(E4|E1), which put additional restrictions. These relations state the independence of a node from other nodes (which are not its descendants) given the knowledge of its parent(s). Let's call relations of the type P(E4|E1, E3) = P(E4|E1), P(E3|E1, E4) = P(E3|E1), etc. 1-level independence, relations like P(E4|E1, E2, E3) = P(E4|E1) 2-level independence, and so on for bigger nets. In general, an n-level independence constraint means that a node is independent of n other nodes (non-descendants) given the knowledge of its parent(s). This further restrains the boundaries for the joint probability. It is easy to introduce such constraints because they are linear. For example:
\[ P(E_4|E_1, E_3) = \frac{p_{12} + p_{15}}{p_9 + p_{12} + p_{14} + p_{15}} = P(E_4|E_1) \]
from which the linearity of the constraint is obvious.10
9 p(i, j) can be expressed as p(i, j) = p(i|j) p(j) = p(j|i) p(i).
10 The quantity P(E4|E1) is assumed known in order to use the constraint.
Let us now step back and ask the question: how can bounds be obtained not only for joint probabilities but also for other quantities, such as conditional probabilities? This can often be the case when the asset manager feels bold enough to guess some of the numbers in the matrix of singly-conditioned probabilities and wants guidance on how to choose the remaining ones. Let us suppose she does not feel confident about the value of P(E3|E1) and wants to have some idea about its possible range. The objective function to minimize/maximize is given by:
\[ P(E_3|E_1) = \frac{(\vec{E}_1 * \vec{E}_3) \cdot \vec{p}}{\vec{E}_1 \cdot \vec{p}} = \frac{p_9 + p_{12} + p_{14} + p_{15}}{p_4 + p_8 + p_9 + p_{10} + p_{12} + p_{13} + p_{14} + p_{15}} \]
which is visibly not linear. A trick to linearize such an expression can be found in [7]. A new scalar variable s is introduced:
\[ s = \frac{1}{\vec{E}_1 \cdot \vec{p}} \qquad (3.9) \]
which is the inverse of the denominator of the expression for P(E3|E1). By defining the new variables:
\[ \vec{x} = s\,\vec{p} \qquad (3.10) \]
the objective function becomes:
\[ P(E_3|E_1) = (\vec{E}_1 * \vec{E}_3) \cdot \vec{x} \qquad (3.11) \]
which is now linear in the variables. The constraints become:
\[ \vec{E}_i \cdot \vec{x} = s\,p(i) \qquad i = 1 \dots n \]
\[ (\vec{E}_i * \vec{E}_j) \cdot \vec{x} = s\,p(i,j) \qquad i = 1 \dots n-1, \; j > i \]
\[ \dots \]
\[ (\vec{E}_1 * \vec{E}_2 \dots \vec{E}_n) \cdot \vec{x} = s\,p(1, 2 \dots n) \]
\[ \sum_{i=0}^{2^n-1} x_i = s \]
\[ s, x_1, x_2, \dots, x_{2^n-1} > 0 \]
which are easily solvable with Linear Programming techniques. The approach extends directly to doubly-conditioned probabilities. A sketch of the linearized problem is given below; the technique introduced in this Section will be exploited in Chapter 5 in the context of a real-world example.
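A sketch of the linearized problem for an upper bound on P(E3|E1) follows (hypothetical marginals; the normalization constraint E1 · x = 1, which is the linear counterpart of (3.9), is added explicitly).

    % Upper bound on P(E3|E1) via the linearised LP of subsection 3.3.2 (sketch).
    n  = 3;
    E  = dec2bin(0:2^n-1) - '0';               % rows = event configurations, columns = E1..E3
    pm = [0.04 0.07 0.07];                     % hypothetical marginal probabilities
    e1 = E(:, 1);  e3 = E(:, 3);

    f   = [-(e1 .* e3); 0];                    % maximise (E1*E3).x over the vector [x; s]
    Aeq = [E', -pm';                           % Ei.x - s p(i) = 0
           ones(1, 2^n), -1;                   % sum(x) - s = 0
           e1', 0];                            % E1.x = 1, i.e. s = 1/(E1.p)
    beq = [zeros(n + 1, 1); 1];
    lb  = zeros(2^n + 1, 1);

    xs        = linprog(f, [], [], Aeq, beq, lb, []);
    upperCond = (e1 .* e3)' * xs(1:2^n);       % upper bound on P(E3|E1); flip the sign of f for the lower bound

With only marginal constraints the bound is of course very wide; it narrows as singly- and doubly-conditioned constraints are added.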
3.3.3 From the Conditional Probability Tables to the Joint Probabilities of Rare Events
After completing the previous steps the asset manager will have at her disposal a
Bayesian net which describes the assumed causal relationships among the rare events
and the associated conditional probability tables. In building this net, she has only
been asked to provide ‘easy’ (i.e., cognitively resonant and/or market-implied) prob-
abilities.
Once the topology of the net has been chosen, and supposed that all the condi-
tional probability tables have been filled in, the construction of the joint probability
is a mechanical task that can always be accomplished by (3.1).
In summary, the Bayesian-net technology provides the required bridge between the
‘easy’ probabilities the asset manager can feel confident to assign and the ‘difficult’
probabilities that are required. For the example in Fig. 3.1, for instance, only four
marginal, three singly-conditioned and one doubly-conditioned probability have to be
assigned in order to specify fully the 16 joint probabilities.
Chapter 4
Fusion
4.1 Mapping from the ‘Normal’ Joint Distribution
to the ‘Normal’ Return Distribution
At this point we have established the joint distribution for the risk factors, both in
the ‘normal’ and in the exceptional regime. For the ’normal’ part we have already
at our disposal a joint distribution f(r1, r2, ...) in the space of returns. As for the
exceptional part, our task will now be to link the derived probabilities to a set of
returns that the extreme events of the BN could cause, if they materialize.
4.2 Mapping from the Exceptional Joint Distribution to the Exceptional Return Distribution
Various return distributions can be assumed for the asset classes in case the identified stress events occur. For the sake of simplicity, we will assume that each joint event gives rise to a set of extreme returns, to which a multidimensional Dirac-δ distribution is associated, with 'area' equal to its joint probability. Each δ-function is 'centered' around the losses (negative returns) incurred by the asset classes given that the joint event happens. If we have a joint probability table with 16 joint events (one of which is the event in which none of E1, ..., En occurs, to which no δ is associated, see next section), we will also have 15 δ-functions.1 The mathematical form of the exceptional joint return distribution can be expressed as follows:
\[ F_{exceptional}(r_1, r_2, \dots, r_n) = \frac{1}{1 - p_0} \sum_{i=1}^{2^n - 1} p_i\, \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n) \]
1 The simplifying picture of using δ-distributions for the losses can easily be relaxed in favour of more complicated loss distributions, such as a Gaussian with a mean µ and uncertainty parameter σ, without changing the approach conceptually.
The sum on the right runs over all the exceptional events, each weighted by its joint probability p_i. The symbol L^i_j is the loss associated with the j-th asset class in case the exceptional event with joint probability p_i happens. There might be some configurations of events in the joint table for which the stress event related to a given asset class does not occur (E_j = 0). In this case, we will assume the associated loss to be simply L^i_j = 0.
4.3 Splicing the 'Normal' and 'Exceptional' Portions of the P&L Distribution

At this point we can blend together the 'normal' and stressed return distributions as:
\[ F(r_1, r_2, \dots, r_n) = (1-k)\, f(r_1, r_2, \dots) + \frac{k}{1 - p_0} \sum_{i=1}^{2^n - 1} p_i\, \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n) \qquad (4.1) \]
where f(r_1, r_2, \dots) is the return distribution coming from the chosen copula and marginals, and k is a normalization factor. A crucial question is how to choose this factor.
One of the joint probabilities obtained from the Bayesian net, p_0, is the probability that none of the n exceptional events E1, E2, ..., En that the asset manager has identified will occur. If the asset manager is sufficiently confident in the Bayesian-net part of the exercise, the quantity k = 1 − p_0 directly provides the required 'normalization' between the 'normal' (continuous) and exceptional (Dirac-δ) contributions. The area of the discrete portion of the distribution (that describes the exceptional events) is then simply equal to 1 − p_0. All the combinations of elementary extreme events would then have exactly the probabilities implied by the Bayesian net.
Alternatively, the asset manager can take a less pure but more pragmatic approach and treat the quantity k as a free parameter of the procedure. If this pragmatic choice is made, the normal portion of the distribution is given mass 1 − k and the masses of the Dirac-δ distributions are uniformly rescaled so that their total mass is k. As shown in the following, this allows the determination of the sensitivity of the outputs to an input that is an important part of the procedure.
As a third alternative, k can be looked at as a distribution which can be determined
via the Maximum Entropy Principle. We will describe how this can be achieved in
Section 4.5.
4.4 Maximization of the Utility
By this stage of the procedure we have obtained a composite distribution of returns
which contains information both about the ‘normal’ market relationships and about
the specific links among the asset classes that we have posited to prevail if the par-
ticular market dislocations we have identified were to materialize. This information,
of course, is not present in any data base – yet any attempt at diversification that
did not take into account, albeit in an approximate manner, this possibility would be
both misguided and dangerous.
The next step of the procedure is the choice of a utility function. The pros and
cons of various choices have been discussed in literally thousands of papers. For the
application at hand, it is useful to choose a utility function that allows a simple and
continuous ‘tuning’ of the degree of risk aversion.
In the application described below, we will make use of a power utility function,2 parametrized by the coefficient β:
\[ U_{power}(c) = \frac{1}{1-\beta}\left(c^{1-\beta} - 1\right) \qquad (4.2) \]
so as to give degrees of risk aversion greater and smaller than that of the logarithmic function (obtained in the limit as β → 1):
\[ U_{log}(c) = \lim_{\beta \to 1} \frac{1}{1-\beta}\left(c^{1-\beta} - 1\right) = \ln(c) \]
2 Concretely, we try to offer a solution to the problem of static, one-period portfolio optimization under budget constraints. As is well known (see, e.g., [64]), if the returns are independent and identically distributed (iid) through time and the investor has constant relative risk aversion, then the one-period (myopic) solution iteratively solves the dynamic problem. Given the emphasis placed in the present approach on the state-specific nature of the exceptional returns, the iid hypothesis is clearly not applicable. However, Merton [42] showed that, if the logarithmic utility function is used, then the iid assumption can be relaxed, and therefore the myopic solution also offers the optimal solution. In our study we shall use both the logarithmic and the power utility function. When the former (logarithmic) choice is applicable, we shall obtain a solution that is also valid in the dynamic setting. For the power-law utility function we shall simply deal with a static, one-period problem.
3 We adopted in this thesis absolute wealth as the objective. Other options exist, such as wealth relative to a reference portfolio, or changes in wealth as in Prospect Theory.
The elements of the vector of weights, w, are varied under the budget constraint
\[ w^T \mathbf{1}_n = 1 \]
until the expected utility of the objective function (the end-of-period final wealth3) is maximized, i.e. the final aim is to maximize over the weights the following expression at the investment horizon:
\[ E[U] = (1-k)\int U(1 + wr)\, f(r)\,dr + k \sum_{i>0} \int \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.3) \]
\[ w^* = \mathrm{argmax}_{w \in \Theta}\, E[U] \]
where Θ is the set of constraints, such as the budget constraint, short-selling limits, maximum allocations to each asset class, etc. The symbol r is the vector of returns and wr is shorthand for the scalar product between the weight vector and the return vector, i.e. \sum_i w_i r_i.
We note in passing that the integration in (4.3) can be done easily in the case of a Gaussian distribution for f and an exponential utility function:
\[ U(1 + wr) = -\exp\left[-\frac{1 + wr}{\xi}\right] \]
In this case the integral is nothing but the characteristic function of the k-dimensional normal distribution. In fact:
\[ \int U(1 + wr) f(r)\,dr = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \int -e^{-\frac{1+wr}{\xi}} \exp\left(-\frac{1}{2}(r-\mu)'\Sigma^{-1}(r-\mu)\right) dr = -\exp\left[-\frac{1 + w\mu}{\xi} + \frac{1}{2}\frac{w'\Sigma w}{\xi^2}\right] \qquad (4.4) \]
where µ is the vector of expected values of r. The integral involving the δ-functions is straightforward and involves a sum of exponentials:
\[ -\sum_{i>0} \frac{p_i}{1-p_0}\, e^{-\frac{1 + wL^i}{\xi}} \]
j=1wjLij. Although not analytically
solvable, the optimization can be easily carried out numerically with a standard soft-
ware package4 and it can be very fast. In this thesis, however, we will not use
exponential utility functions for the reasons explained above.
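For illustration, the closed form (4.4) can be checked against a brute-force Monte Carlo estimate for arbitrary parameter values; the numbers below are placeholders and this check is not part of the thesis' calculations.

    % Numerical check of the closed form (4.4) for a Gaussian f and exponential utility (sketch).
    mu    = [0.02; 0.02];                                        % placeholder expected returns
    Sigma = [0.03^2, 0.5*0.03*0.08; 0.5*0.03*0.08, 0.08^2];      % placeholder covariance matrix
    w     = [0.6; 0.4];  xi = 0.5;

    analytic = -exp(-(1 + w'*mu)/xi + 0.5*(w'*Sigma*w)/xi^2);    % right-hand side of (4.4)

    r  = mvnrnd(mu', Sigma, 1e6);                                % Monte Carlo over the 'normal' part
    mc = mean(-exp(-(1 + r*w)/xi));                              % should agree closely with 'analytic'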
When the marginal distributions of sub-portfolio returns cannot be satisfactorily
modelled by a Gaussian distribution or when the portfolio contains strongly non-
linear products, then the optimization must be carried out by Monte Carlo simulation.
This is, of course, more computationally demanding, but, given the relatively small
number of sub-portfolios, does not pose a prohibitive burden on the whole procedure.
Moreover, resampling techniques can be used, and they are known to give rise to
4 Again the MATLAB function fmincon is quite suitable for this problem.
reduced sensitivity to the estimated parameters. This is a well-known problem with
the Markowitz optimization technique, as the results can be highly variable even for
small changes in input parameters5. This was one of the drivers that motivated Black
et al. [1] in their approach. Unlike the Black-Litterman model, where the sensitivity
is diminished by smoothing the inputs, in the resampling procedure the same result
is reached by averaging the outputs of several scenarios6.
The weights w∗ provide a coherent allocation to the various sub-portfolios, where
the adjective ‘coherent’ stresses that the allocation has been arrived at by taking into
account in a consistent manner the investor’s preferences over the outcomes associated
both with normal and exceptional market conditions. ’Protection trades’ are not
attached as an incoherent afterthought to an optimization carried out assuming a
stable investing universe.
The sensitivity of the outputs to the subjective inputs can be readily explored, as
is shown in the worked-out example in the following Chapter.
4.4.1 Numerical optimization
This subsection will give an overview of the numerical routine that will be followed
to find the optimal allocation. At this point we have found the univariate marginals
and the best fit copula. They were fitted on daily returns data but the investment
horizon in the future might be different from one day. We thus need to simulate the
returns over a longer period of time7. We will apply the following steps:
Step 1 : Simulate N random numbers with the MATLAB function copularnd with
the fitted copula parameters.
Step 2 : Map them through the inverse cumulative univariate marginals to returns
for each asset class.
Step 3 : Go to Step 1 for each day until the end of the investment horizon T .
Step 4 : By multiplying the returns ri = (1 + ri1)(1 + ri2)...(1 + riT )− 1, where rij is
the return of the asset class i for the j-th day, at the end of the process we will have
N points of the joint return distribution at the investment horizon.
5 See, e.g., [5]. As for the relative importance of the parameters, Ziemba et al. [9] show that estimation errors in the expected returns are, as a rule of thumb, about 10 times more important than errors in variances and covariances.
6 The approach of this thesis will be linked to the BL model in Appendix B.
7 If the investment horizon is sufficiently long, e.g. 1 year, we could exploit the aggregational Gaussianity of asset returns (see Introduction, footnote 1).
Step 5: Feed into the optimization function fmincon8 a function, described in the next two substeps, that searches for the maximum value of the utility in the space of the weights w, given the constraints on them (\sum_{i=1}^{p} w_i = 1 and w_i > 0).
Step 5a: Multiply each of the N simulated p-dimensional points by the weights w of the asset classes.9 Feed the result into the utility function U(1 + \sum_{i=1}^{p} w_i r_i), where 1 + \sum_{i=1}^{p} w_i r_i is an N x 1 dimensional vector. Take the mean value to obtain the expected value of the utility function coming from the normal part of the problem.
Step 5b: Splice the mean value by giving it weight 1− k with k times the utility
coming from the stress distribution. Since the stress returns are fixed, there is no
need for a Monte Carlo simulation for this part.
For the practical example in the next Chapter we will use N = 5, 000, 000.
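A compact sketch of Steps 1-5 for the logarithmic utility is given below. All inputs (copula parameters, marginal quantile function, stress losses, joint probabilities and the reduced N) are illustrative placeholders rather than the fitted values used in the thesis.

    % Monte Carlo simulation and utility maximization, Steps 1-5 (sketch).
    p = 4;  T = 63;  N = 1e5;                      % ~3-month horizon; reduced N for the sketch
    rhoT  = 0.3*ones(p) + 0.7*eye(p);  nuT = 6;    % placeholder t-copula parameters
    qMarg = @(u) tinv(u, 5) * 0.01;                % placeholder inverse marginal (same for all assets)

    R = ones(N, p);
    for t = 1:T                                    % Steps 1-3: daily draws compounded to the horizon
        U = copularnd('t', rhoT, nuT, N);
        R = R .* (1 + qMarg(U));
    end
    R = R - 1;                                     % Step 4: N joint returns at the horizon

    k = 0.1619;                                    % mass of the exceptional part (k = 1 - p0)
    L = -[0.05 0 0 0; 0 0.20 0 0; 0 0 0.10 0; 0 0 0 0.05];   % placeholder stress-loss scenarios
    q = [0.25 0.25 0.25 0.25]';                    % placeholder renormalised joint probabilities

    EU = @(w) -((1-k)*mean(log(1 + R*w)) + k*(q'*log(1 + L*w)));  % minus the spliced expected utility
    w0 = ones(p, 1)/p;                             % Step 5: maximise over the weights
    w  = fmincon(EU, w0, [], [], ones(1, p), 1, zeros(p, 1), ones(p, 1));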
4.5 Maximum Entropy for the choice of k
As stated in Section 4.3, the parameter k can be treated as a free parameter and
be given an arbitrary value by the asset manager according to his beliefs (or given
the value 1 − p0). Such precision might appear as spurious. It is appealing to ’blur’
its value by introducing the concept of degree of confidence that the investor has in
his inference through the Bayesian net (which could include also estimation errors in
the marginal/conditional probabilities). As an analogy, in the Black-Litterman (BL)
model, the views that the investor expresses are not precise but are distributions whose variance Ω2 reflects the degree of confidence in the expressed opinions. If she is very confident in her forecast, the variance Ω2 → 0 and the views are delta functions centered around the forecasted value. Vice versa, if the confidence is low, the distributions are much more dispersed.
What we will do in this section is to represent k as a distribution g(k) and not as
a fixed parameter. Such distribution should be the least prejudicial (least committal)
distribution compatible with the maximum prior knowledge we have about k and can
be chosen via the Principle of Maximum Entropy (PME)10. Our aim is to propose the
maximum unpresumptive distribution, the one that does not incorporate through its
moments and domain more information than we have, given a set of constraints. The
8 The interior-point algorithm was used. The termination tolerance on the function value was left at its default setting: 10−6.
9 An initial set of guess weights must be given.
10 For an introduction see [27].
objective is to maximize the entropy:
\[ H(k) = -\int g(k)\,\log g(k)\,dk \]
over its domain by assuming the following constraints for g(k):
1.) It is defined in the interval [0, 1]
2.) Its mode is at the point k = 1− p0
3.) Our degree of confidence is given by a variance Ω2
The second point reflects our wish to have a distribution peaked at the value implied by the Bayesian net, i.e. we want k = 1 − p0 to be the most probable value.
This problem is already solved in [14]. Dowson et al. show that in the closed
interval [0, 1] the most unpresumptive distribution, if we know the mean and the
variance, is:
1.) The truncated exponential if m2(E[k]) = m2te(E[k])
2.) The truncated Gaussian if m2(E[k]) < m2te(E[k])
3.) The U-shape distribution if m2(E[k]) > m2te(E[k])
where m2te(E[k]) is the second moment of the truncated exponential distribution
as a function of its first moment (the mean E[k])11. For example, if we assume certain
values for the mean and the variance, we can calculate the second moment and, if
this is smaller than the one coming from the truncated exponential with the same
mean, the maximum entropy distribution is the truncated Gaussian distribution.
In order to satisfy all the assumptions, the most natural solution for the problem
at hand is the second option i.e. the truncated Gaussian distribution12.
Let us see how the optimal weights can be calculated in this case. In the case of fixed k, we recall that the optimal values of w are found as follows:
\[ E[U] = (1-k)\int U(1 + wr)\, f(r)\,dr + k \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.5) \]
\[ w^* = \mathrm{argmax}_{w \in \Theta}\, E[U] \]
11 For practical purposes, for small values of the mode such as those in a stress-testing context, we show that the three conditions can be approximated as 1.) E[k] = Ω, 2.) Ω < E[k], 3.) Ω > E[k]. See Appendix E for the proof and a general introduction to the truncated exponential and Gaussian distributions.
12 Because of the constraint on the mode we cannot choose the truncated exponential or the U-shape distributions.
Suppose that k is not fixed but follows a distribution g(k).13 In this case, the optimization is modified as follows:
\[ E_f[U] = \int g(k)\,E[U]\,dk = \int (1-k)\,g(k) \int U(1 + wr)\, f(r)\,dr\,dk + \int k\,g(k) \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr\,dk \qquad (4.6) \]
\[ w^*_f = \mathrm{argmax}_{w \in \Theta}\, E_f[U] \]
Carrying out the integration in (4.6) with respect to k is straightforward:
\[ E_f[U] = (1-\mu)\int U(1 + wr)\, f(r)\,dr + \mu \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.7) \]
where µ is the mean value of g(k):
\[ \mu = \int k\,g(k)\,dk \]
Now equation (4.7) resembles (4.5) with k replaced by µ. The conclusion is that the optimal values for (4.6) are those of (4.5) with the replacement k = µ, i.e. the solution of (4.6) is given by the optimal allocation of (4.5) evaluated at one particular point: the mean. However, the mean and the mode do not always coincide for the truncated Gaussian, as we will explain below. This particularly simple solution only comes about because of the very special (linear) dependence of the distribution (4.3) on the parameter k. If the uncertainty had been in a parameter that affects the distribution non-linearly (say, the mean or the variance), this simple result would not apply.
The choice of a truncated Gaussian may be good if we consider a small uncertainty around the mean. For a bigger uncertainty the distribution acquires features which might not seem very desirable. First of all, the mean starts to be displaced from the mode, a phenomenon that is negligible for small values of Ω, when the distribution can be considered almost symmetric. Moreover, if, for example, the mode is 0.1619 (footnote 14) and Ω = 0.1, the truncated Gaussian takes a value greater than 0 at k = 0 (footnote 15). Thus the probability of the system being in a small interval close to k = 0 may not be negligible.
13 The reasoning that follows applies to any kind of distribution.
14 This is the value that we will obtain for our practical example in Chapter 5, but any other value is just as good for this example.
15 If Ω = 0.05 the value at k = 0 is very close to 0, so this choice of Ω is a 'good' choice.
Another drawback is the fact that we cannot reach higher values of k (say k = 0.9, see footnote 16) without inflating the variance substantially. However, if we increase the variance too much, we saw that the maximum entropy solution becomes the U-shaped distribution, which is not a desired property for the problem at hand.
We search for a remedy to this by looking at some other features which go beyond
the constraints on the first two moments and have not been exploited yet. We assume
the following for g(k):
1.) It is defined in the interval [0, b] where b ≤ 1 i.e. we relax the assumption to
have a coverage of the full interval [0, 1].
2.) Its mode is at the point k = 1− p0
3.) Our degree of confidence is given by a variance Ω2
4.) g(0) = g(b) = 0 i.e. we require the distribution function to vanish at the
extrema.
This kind of problem seems rare and disregarded in previous literature and, to
the knowledge of the author, only Goodman [23] has dealt with it in the context
of Geophysics. Goodman proves that the Maximum Entropy distribution in a closed
interval having prescribed the first two moments and which additionally satisfies point
4.) is the modified log-normal distribution (Johnson SB distribution):
\[ g(k) = \frac{b}{\sqrt{2\pi}\,\sigma\,k\,(b-k)} \exp\left[-\frac{1}{2}\left(\frac{\log\left(\frac{k}{b-k}\right) - \mu}{\sigma}\right)^2\right], \qquad 0 \le k \le b \qquad (4.8) \]
For a description of this distribution see Appendix F. The Johnson distribution, like the normal and lognormal distributions, has only two independent moments, and all other moments can be expressed through them.17 The shape of the Johnson distribution can be seen in Fig. F.1. By taking the first derivative of (4.8) and equating it to 0 at the modal value we can obtain a useful relation between the parameters18 (see Appendix F), which for b = 1 reads:
\[ \sigma^2 = \frac{-\mu + \log\left(\frac{m}{1-m}\right)}{2m - 1} \]
16 The value k = 0.9 is just for this example. If 1 − p0 = 0.01, for example, values of k = 0.3 may be unreachable.
17 It is important to underline that, both for the truncated Gaussian and for the Johnson SB distribution, given two parameters of the set (mode, mean, variance) the third is uniquely determined, i.e. knowing 2 of the 3 parameters is enough to know the third without inputting extra information on, say, higher moments.
18 The parameters µ and σ do not coincide with the mean and variance. See [28] for a detailed analytical derivation of the moments of the Johnson SB distribution, which, however, do not have a simple form.
where m is the mode fixed by the assumptions of the problem. Having one free
parameter left, we can find it numerically by availing ourselves of the knowledge of
the variance Ω2 which is a given input of the problem together with the mode.
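A minimal numerical sketch of this calibration, with an illustrative mode and Ω and a simple trapezoidal quadrature for the moments of (4.8) with b = 1, is the following.

    % Calibration of the Johnson SB parameters to a given mode and variance (sketch).
    m = 0.1619;  Omega = 0.05;                                   % illustrative mode and confidence
    sig  = @(mu) sqrt((log(m/(1-m)) - mu) / (2*m - 1));          % relation obtained at the mode
    gpdf = @(k, mu) exp(-0.5*((log(k./(1-k)) - mu)/sig(mu)).^2) ...
                    ./ (sqrt(2*pi) * sig(mu) .* k .* (1-k));     % Eq. (4.8) with b = 1

    kk   = linspace(1e-6, 1 - 1e-6, 20001);                      % quadrature grid on (0, 1)
    varJ = @(mu) trapz(kk, kk.^2 .* gpdf(kk, mu)) - trapz(kk, kk .* gpdf(kk, mu))^2;

    muLo    = log(m/(1-m)) + 0.01;                               % sigma^2 > 0 requires mu above this (m < 1/2)
    muStar  = fzero(@(mu) varJ(mu) - Omega^2, [muLo, 0]);        % pin the free parameter with the variance
    sigStar = sig(muStar);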
The drawback of this kind of distribution is that it is not as immediate to grasp as the symmetric Gaussian distribution. The variance is easily pictured as the width of a bell curve, but this is more difficult for an asymmetric distribution such as the Johnson. However, we can always use reasoning based on the Chebyshev inequality: e.g. if the mean is 0.2 and Ω = 0.1, we are sure that the probability of exceeding 0.2 + 3 · 0.1 = 0.5 is at most approximately 11% (one ninth).
In the next Chapter we will give a real-world example, treating k both as a free parameter and as a distribution, and will show the allocations obtained with the two methods.
Chapter 5
A Worked-Out Example
5.1 Construction of the Bayesian Net
In order to illustrate the strengths and weaknesses of the procedure, we present in
this section a simple worked-out example in some detail. The stylized problem we
examine in this section is that of the allocation among the asset classes presented in
the Introduction.
The analysis to identify the body of the distribution was carried out as described in
Chapter 2. The cut-off point was determined using a combination of the ellipsoid and
the minimum covariance determinant methods. 160 data points were excluded by the
algorithm, corresponding to approximately 5% of the full data set. The marginals
of the truncated data set were modelled using a t-Student distribution and were
conjoined through a t-Student copula.
We assume an investment horizon of 3 months1 and in the Monte Carlo simulation
we will simulate all the daily returns of the ’normal’ part to the end of this interval.
The traditional inputs to a Markowitz optimization (i.e., the correlation matrix and the stand-alone expected returns and volatilities) are given below.2

Correlation   Bond     Credit   Equity   Mortgage
Bond          1        0.96     −0.22    0.87
Credit        0.96     1        −0.16    0.87
Equity        −0.22    −0.16    1        −0.11
Mortgage      0.87     0.87     −0.11    1

1 For an investment horizon of such length one could also resort to the aggregational Gaussianity of asset returns (see Introduction, footnote 1) to save computation time. It depends on the level of precision/time trade-off one is willing to accept.
2 In real-life applications, the expected returns are likely to be assigned using a subjective Black-Litterman approach. This extension has not been incorporated in the present analysis for the sake of brevity, but it is conceptually straightforward. The use of shrinkage estimators such as Stein's estimator [73] is another alternative.
Figure 5.1: The Bayesian net used for the example in this Section
Expected Return
Bond        1.81%
Credit      2.07%
Equity      2.05%
Mortgage    1.67%

Volatility
Bond        2.27%
Credit      2.36%
Equity      9.20%
Mortgage    1.44%
The Bayesian net in Figure 5.1 (repeated here with new labelling for ease of
reference) was assumed, with the following interpretation for the four events:
• Event Bond: Large sell-off in the Government Bond market due to unexpected
change in monetary stance, with 1994-like increase by 75 bp of the Fed target
rate.
• Event Mortgage: Large widening of Mortgage spreads due to the forced selling
of inventories prompted by new capital rules.
• Event Credit: Large widening of Credit spreads.
• Event Equity: Sell-off in the Equity market.
In this example it is assumed that both the sell-off in Government Bonds and
the widening of mortgage spreads would increase the probability of investment-grade
spreads widening significantly. It is also assumed that the sell-off in Government
Bonds and the widening in mortgage spreads do not directly affect each other (given
the specific reason for the mortgage weakening). Finally the sell-off in Government
Bonds is assumed to have a strong negative effect on the equity market. In constructing this example it was assumed that the events were identified on the basis of the vulnerabilities of each portfolio. This being the case, each stress event is simply associated with a single portfolio. This simplifies the presentation, and the example could be straightforwardly generalized.
Given the known composition of each subportfolio, x, the stress loss, L(x), that
would occur, if the associated stress event occurred with certainty, can be estimated.
These subportfolio losses are given in the vector below:

Stress Losses
L(Bond)        −5%
L(Credit)      −20%
L(Equity)      −10%
L(Mortgage)    −5%
For clarity, each entry gives the loss if the stress event associated with each asset class
materialized, and the whole portfolio had been invested in that asset class.
The marginal probabilities of occurrence, P (x), of the four stress events were
chosen as follows:

Marginal probability
P(Bond)        4%
P(Credit)      7%
P(Equity)      7%
P(Mortgage)    3%
Given the causal structure embedded in the Bayesian net above, only three singly-
conditioned probabilities, P (x|y), are required in order to specify the full joint prob-
ability matrix. For this example these were assumed to be as follows:
Conditional probability
P(Credit|Bond)         40%
P(Credit|Mortgage)     25%
P(Equity|Bond)         50%
The doubly-conditioned probability, P(Credit|Bond, Mortgage), was assumed to be 50%.3 The set of probabilities chosen in this way satisfies the sanity checks (3.3)-(3.7). The conditional probabilities were also 'cleaned' through the Linear Programming techniques in [56]. The resulting joint probabilities are shown in the table below.
3 As the subjective conditional probabilities are obtained by elicitation, it is sometimes too difficult to venture a guess for P(Credit|Bond, Mortgage). The linear programming techniques mentioned in the previous chapter and in subsection 3.3.2 were used as a guide to infer lower and upper bounds for this and the other probabilities.
Bond   Credit   Equity   Mortgage   Joint Probability
0      0        0        0          0.8381
0      0        0        1          0.0208
0      0        1        0          0.0460
0      1        0        0          0.0446
1      0        0        0          0.0117
0      0        1        1          0.0011
0      1        0        1          0.0065
0      1        1        0          0.0025
1      0        0        1          0.0003
1      0        1        0          0.0117
1      1        0        0          0.0077
0      1        1        1          0.0004
1      0        1        1          0.0003
1      1        0        1          0.0003
1      1        1        0          0.0077
1      1        1        1          0.0003
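For transparency, the table can be reproduced mechanically from (3.1). The sketch below assumes, consistently with the net of Fig. 5.1, that the Bond and Mortgage events are independent, and backs out the unstated conditional-probability-table entries from the assigned marginal and singly-conditioned probabilities; this is one consistent reconstruction used purely for illustration.

    % Joint probabilities of the worked example from the Bayesian net (sketch).
    pB = 0.04;  pM = 0.03;  pC = 0.07;  pE = 0.07;               % assigned marginals
    pC_B = 0.40;  pC_M = 0.25;  pE_B = 0.50;  pC_BM = 0.50;      % assigned conditionals

    pE_notB  = (pE - pE_B*pB) / (1 - pB);                        % backed out to match the Equity marginal
    pC_BnotM = (pC_B - pC_BM*pM) / (1 - pM);                     % P(Credit | Bond, no Mortgage)
    pC_notBM = (pC_M - pC_BM*pB) / (1 - pB);                     % P(Credit | no Bond, Mortgage)
    pC_none  = (pC - pC_BM*pB*pM - pC_BnotM*pB*(1-pM) - pC_notBM*(1-pB)*pM) / ((1-pB)*(1-pM));

    cptC  = [pC_none, pC_notBM; pC_BnotM, pC_BM];                % rows: Bond = 0/1, columns: Mortgage = 0/1
    joint = zeros(2, 2, 2, 2);                                   % indices: (Bond, Credit, Equity, Mortgage)
    for B = 0:1, for C = 0:1, for E = 0:1, for M = 0:1
        pb = pB^B * (1-pB)^(1-B);
        pm = pM^M * (1-pM)^(1-M);
        pc = cptC(B+1, M+1)^C * (1 - cptC(B+1, M+1))^(1-C);
        pe = (B*pE_B + (1-B)*pE_notB)^E * (1 - (B*pE_B + (1-B)*pE_notB))^(1-E);
        joint(B+1, C+1, E+1, M+1) = pb*pm*pc*pe;                 % factorization (3.1)
    end, end, end, end
    p0 = joint(1, 1, 1, 1);                                      % 0.8381: no stress event occurs

Enumerating the 16 entries of joint reproduces the table above (for instance p0 ≈ 0.8381).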
Note that, given the assumed causal dependence among the four events, the most
unlikely event is not the simultaneous occurrence of all the four stress events. For
instance, the joint events:
• only Bond and Mortgage happen,
• the Equity, Mortgage and Bond events happen with no widening of credit
spreads
• Credit, Mortgage and Bond happen without a sell-off in Equity
have the same probability of occurrence as all events happen simultaneously. A
glance at the posited causal structure confirms that this makes intuitive sense.
The asset manager can either believe that the probability of no stress event pro-
vided by the Bayesian net procedure is fundamentally correct (in which case the mass
of the normal distribution would be 0.8381); or, as suggested above, can treat the
normalization factor as a free parameter or even assume it is a distribution and use
the Maximum Entropy Principle (see Chapter 4).
Given this information, and for any vector of weights w, the optimal portfolio
weights can be calculated by maximizing (4.3). A logarithmic and a power utility
function (with exponents β of 1.2 and 0.8 to straddle in terms of risk aversion the
logarithmic utility) were chosen for the optimization. The results are shown below for
three different degrees of risk aversion (see Figs. 5.2 to 5.4).

Figure 5.2: The asset allocation to the four sub-portfolios as a function of the total probability mass of being in a normal state, obtained with a logarithmic utility function. The allocation on the far right is the Markowitz-like allocation that would apply in an exclusively normal world.

The asset allocation is shown as a function of the mass in the normal body of the distribution (this quantity
is 1 − k in the captions below). Therefore the asset allocation that can be read
for k = 0 is a Markowitz-like allocation: the allocation that would be chosen if the
exceptional events never materialized and correlations and variances were therefore
homoscedastic. The smooth curves that evolve to the left of these four points show
how these allocations change as the probability of the world entering a distressed state
increases. The results are discussed in detail in the following section. In Appendix C
the efficient frontier for different values of k is derived and compared.
5.2 Discussion of the Results
The first observation is that the qualitative features of the results display a reasonably
mild dependence on the degree of risk aversion. Given the well-known difficulties in
estimating this quantity in a reliable manner, this is a nice robustness feature of the
procedure.
Figure 5.3: Same as Fig 5.2 for a power utility function β = 0.8.
Figure 5.4: Same as Fig 5.2 for a power utility function β = 1.2.
Figure 5.5: Allocations with a two-moment simplified optimization approach for a logarithmic utility function.
We will introduce a useful concept at this point; it will be used to shed light on the discussion that follows. The distribution (4.1) has its own moments, which are calculated up to the second order in Appendix A. We will call them total moments, precisely because they are moments of the spliced distribution. We can use an optimization engine assuming a Gaussian distribution of returns with mean and variance given by the total moments and infer the optimal allocations. In what follows we can thus compare against this intuitive benchmark, a simple approximation that relies on the first two moments. Of course, we do not expect the results to coincide with those coming from the exact calculation engine based on the t copula and the Monte Carlo simulation of the full distribution, but we can nevertheless gain useful insight. The total moments are functions of the quantity 1 − k, and for each value of it we will have a different allocation. In Fig. 5.5 we can see the allocations obtained with this approach.4 In Tables 5.1 and 5.2 the moments used are shown.
One can then profitably begin the analysis from the central case of the logarithmic
utility function (see Figure 5.2). In the Markowitz-like case (no exceptional events)
4 The approximation based on the total moments seems to hold quite well, at least on visual inspection of the figures. The circumstances under which the approximation holds are described in Appendix A.
the allocations are divided between two of the four asset classes: Credit and Equity. In
this benchmark case, the highest allocation is for the Credit subportfolio (92%). This
is not surprising, given the high return expected from this asset class. The Credit
subportfolio, however, has been assumed to be the most vulnerable to the associated
stress event (L(Credit) = 20%). Therefore, as the probability mass in the normal
state decreases, the allocation to Credit quickly decreases. Indeed, for all degrees
of risk aversion, the allocation to the Credit rapidly goes to zero as soon as the
probability of the world not remaining in a normal state over the investment horizon
is non-negligible. This result is both intuitive and, to some extent, obvious. It is worth stressing, however, the value of such an obvious recommendation: in some situations the expected returns from some assets may appear very attractive (compared to when their risk is assessed by looking at the historical record) exactly because the market is pricing in the vulnerability to events that have not yet materialized. In the run-up to the 2007 crisis, for instance, exotic instruments such as Constant Proportion Debt Obligations (CPDOs) commanded an AAA rating, yet a yield of 200 basis points above AAA-rated government debt. Similarly, the yields on peripheral European government debt in the spring and summer of 2010 may have appeared inexplicably attractive if gauged on their historical record alone. A frequentist-based asset allocation technique would suggest very high allocations to these inexplicably cheap assets.
The non-monotonic behaviour of the allocation to the Equity and to the Bond
subportfolios is less intuitively obvious and deserves some discussion. The allocation
to Equity is low close to the normal state but as the probability of stress events
increases, i.e., moving towards the left in the three graphs, its allocation begins to
increase. This is easy to understand, because the stress loss associated with the
Equity subportfolio is lower than that for Credit (L(Equity) = 10% vs L(Credit)
= 20%), but their marginal probability of occurrence is the same: 7%. In order
to understand better the behaviour of the allocations, we have to look at the event
correlation implied by the Bayesian net assumed above. The event correlation matrix
is defined as5:
ρij =P (Ei, Ej)− P (Ei)P (Ej)√
[P (Ei)(1− P (Ei)P (Ej)(1− P (Ej))]
5For a detailed discussion of the event correlation matrix, see [56] and [76]. The event correlationmatrix clearly has nothing to do with the correlation matrix among the returns of the assets whichis given by the Total correlation matrix worked out in Appendix A and shown in Tables 5.3 - 5.6for some values of the normalization factor.
and in our specific example:

Event Correlation  Bond   Credit  Equity  Mortgage
Bond               1      0.26    0.34    0
Credit             0.26   1       0.09    0.12
Equity             0.34   0.09    1       0
Mortgage           0      0.12    0       1
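As an illustration, the event correlation matrix above can be reproduced directly from the joint probability table of the previous section. The sketch below hard-codes that table as a Python dictionary; the function names are illustrative only.

```python
import numpy as np

# Joint probabilities from the Bayesian net, indexed by the binary outcome
# (Bond, Credit, Equity, Mortgage); values taken from the table above.
joint = {
    (0,0,0,0): 0.8381, (0,0,0,1): 0.0208, (0,0,1,0): 0.0460, (0,1,0,0): 0.0446,
    (1,0,0,0): 0.0117, (0,0,1,1): 0.0011, (0,1,0,1): 0.0065, (0,1,1,0): 0.0025,
    (1,0,0,1): 0.0003, (1,0,1,0): 0.0117, (1,1,0,0): 0.0077, (0,1,1,1): 0.0004,
    (1,0,1,1): 0.0003, (1,1,0,1): 0.0003, (1,1,1,0): 0.0077, (1,1,1,1): 0.0003,
}

def event_correlation(joint, n=4):
    """Correlation of the binary stress-event indicators implied by the joint table."""
    p = np.zeros(n)            # marginal P(E_i)
    pij = np.zeros((n, n))     # pairwise P(E_i, E_j)
    for outcome, prob in joint.items():
        for i in range(n):
            if outcome[i]:
                p[i] += prob
                for j in range(n):
                    if outcome[j]:
                        pij[i, j] += prob
    denom = np.sqrt(np.outer(p * (1 - p), p * (1 - p)))
    return (pij - np.outer(p, p)) / denom

print(np.round(event_correlation(joint), 2))
```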
From the table we can see that the correlation of occurrence of the loss associated
with Equity and Credit is low: 0.09. A simple calculation of the total expected
return (i.e., the expected return including stress events, see Table 5.1) then shows
that this quantity decreases less rapidly as we move towards the left in the graphs
above for Equity than for Credit, but the two assets continue to retain a negative
total correlation (See Tables 5.3 - 5.6). Therefore, as long as the probability of being
in the stressed state is low, the diversification benefit for allocating between Equity
and Credit arising from the negative correlation in the ”normal” correlation matrix
is not completely lost. However, as we move further to the left (i.e, as the probability
of entering the excited state increases) the associated losses both to Equity and to
Credit start to be non-negligible compared to the other two assets and their total
expected returns becomes comparable to them. This behaviour is also confirmed by
the allocation engine with the total moments in Figure 5.5.
Let’s now examine closely the allocation to the Bond asset class. To understand
how it changes with respect to the allocation to Credit, we have to consider again
the total expected return. Note first that the no stress allocation to Bond is 0 for all
degrees of risk aversion. This is because of its low expected return compared to Credit
with which it is highly correlated (96%). The allocation to Bond completely replaces the allocation to Credit once k approaches 4%, since its total expected return becomes comparable to that of Credit (due to a lower loss, L(Bond) = 5%, and a lower marginal probability of occurrence, P(Bond) = 4%). Indeed, a calculation of the total expected
return for Bond and Credit shows that, for k = 4%, they become very similar (1.69%
versus 1.64%), but the total standard deviation for Bond is significantly lower (2.34%
versus 3.70%). Furthermore, Bond is more negatively correlated with Equity than
Credit (-18% versus -6% for that level of the normalization factor k) with which it
can diversify better. For these reasons Bond starts replacing Credit as a partner of
Equity.
The Bond subportfolio replaces Credit, but its allocation weight does not keep increasing: it starts to be slowly shifted to Mortgage. This happens because the stress marginal probability of occurrence of Bond is higher than that of Mortgage (P(Mortgage) = 3% vs P(Bond) = 4%), while the two carry the same stress loss: L(Mortgage) = L(Bond) = 5%. The total returns of the two assets become comparable around 1 − k = 75%. However, Mortgage starts rising before that point because it consistently enjoys a lower volatility than Bond.

Figure 5.6: Allocations as a function of the degree of confidence - truncated Gaussian
5.3 Maximum entropy allocations
The allocations obtained by choosing for g(k) a truncated Gaussian with mode =
0.1619 are plotted in Fig. 5.6 for different values of the confidence parameter Ω.
Since, in the example we are examining, the allocations are almost constant around the mode, there is no large variation in the optimal weights, even for high values of Ω. The same allocation sensitivity study was carried out for a Johnson distribution with b = 1; the results are shown in Fig. 5.7. We see that, as the variance is increased, more weight is given to the Mortgage class than in the case of the truncated Gaussian. This is understandable, since the Johnson distribution is more skewed to the right than the truncated Gaussian and its mean is by definition displaced from the mode, thus capturing larger values of the factor k, where the allocation to Mortgage dominates.
1 − k (%)  Bond     Credit    Equity    Mortgage
100        0.0181   0.0207    0.0205    0.0167
99         0.0178   0.0196    0.0199    0.0164
98         0.0175   0.0186    0.0192    0.0162
97         0.0172   0.0175    0.0186    0.0159
96         0.0169   0.0164    0.0180    0.0157
95         0.0166   0.0153    0.0173    0.0154
94         0.0163   0.0143    0.0167    0.0151
93         0.0160   0.0132    0.0160    0.0149
92         0.0157   0.0121    0.0154    0.0146
91         0.0154   0.0111    0.0148    0.0144
90         0.0151   0.0100    0.0141    0.0141
89         0.0148   0.0089    0.0135    0.0138
88         0.0144   0.0078    0.0129    0.0136
87         0.0141   0.0068    0.0122    0.0133
86         0.0138   0.0057    0.0116    0.0131
85         0.0135   0.0046    0.0109    0.0128
84         0.0132   0.0036    0.0103    0.0125
83         0.0129   0.0025    0.0097    0.0123
82         0.0126   0.0014    0.0090    0.0120
81         0.0123   0.0003    0.0084    0.0118
80         0.0120   -0.0007   0.0078    0.0115
79         0.0117   -0.0018   0.0071    0.0112
78         0.0114   -0.0029   0.0065    0.0110
77         0.0111   -0.0039   0.0058    0.0107
76         0.0108   -0.0050   0.0052    0.0105
75         0.0105   -0.0061   0.0046    0.0102
74         0.0102   -0.0072   0.0039    0.0099
73         0.0099   -0.0082   0.0033    0.0097
72         0.0096   -0.0093   0.0027    0.0094
71         0.0093   -0.0104   0.0020    0.0092
70         0.0090   -0.0115   0.0014    0.0089

Table 5.1: Total mean. See Appendix A for details of the calculation.
1 − k (%)  Bond     Credit    Equity    Mortgage
100        0.0227   0.0236    0.0920    0.0144
99         0.0229   0.0276    0.0919    0.0147
98         0.0231   0.0311    0.0918    0.0150
97         0.0233   0.0342    0.0917    0.0152
96         0.0234   0.0370    0.0915    0.0155
95         0.0236   0.0396    0.0914    0.0157
94         0.0238   0.0420    0.0913    0.0160
93         0.0239   0.0442    0.0911    0.0162
92         0.0241   0.0463    0.0910    0.0164
91         0.0242   0.0483    0.0909    0.0167
90         0.0244   0.0502    0.0907    0.0169
89         0.0245   0.0520    0.0906    0.0171
88         0.0246   0.0537    0.0904    0.0173
87         0.0248   0.0553    0.0902    0.0175
86         0.0249   0.0569    0.0901    0.0177
85         0.0250   0.0584    0.0899    0.0179
84         0.0251   0.0599    0.0897    0.0180
83         0.0253   0.0613    0.0895    0.0182
82         0.0254   0.0626    0.0893    0.0184
81         0.0255   0.0639    0.0891    0.0185
80         0.0256   0.0652    0.0889    0.0187
79         0.0257   0.0664    0.0887    0.0188
78         0.0258   0.0676    0.0885    0.0190
77         0.0258   0.0687    0.0883    0.0191
76         0.0259   0.0698    0.0881    0.0193
75         0.0260   0.0709    0.0879    0.0194
74         0.0261   0.0719    0.0877    0.0195
73         0.0262   0.0729    0.0874    0.0197
72         0.0262   0.0739    0.0872    0.0198
71         0.0263   0.0749    0.0869    0.0199
70         0.0264   0.0758    0.0867    0.0200

Table 5.2: Total volatility. See Appendix A for details of the calculation.
k = 95%    Bond    Credit   Equity   Mortgage
Bond       1       0.68     -0.16    0.81
Credit     0.68    1        -0.04    0.63
Equity     -0.16   -0.04    1        -0.05
Mortgage   0.81    0.63     -0.05    1

Table 5.3: Total correlation matrix for k = 0.95. See Appendix A for details of the calculation.
k = 90%    Bond    Credit   Equity   Mortgage
Bond       1       0.61     -0.11    0.77
Credit     0.61    1        0.01     0.57
Equity     -0.11   0.01     1        -0.01
Mortgage   0.77    0.57     -0.01    1

Table 5.4: Total correlation matrix for k = 0.90. See Appendix A for details of the calculation.
k = 85%    Bond    Credit   Equity   Mortgage
Bond       1       0.58     -0.06    0.73
Credit     0.58    1        0.04     0.53
Equity     -0.06   0.04     1        0.02
Mortgage   0.73    0.53     0.02     1

Table 5.5: Total correlation matrix for k = 0.85. See Appendix A for details of the calculation.
k = 80%    Bond    Credit   Equity   Mortgage
Bond       1       0.55     -0.02    0.70
Credit     0.55    1        0.06     0.50
Equity     -0.02   0.06     1        0.05
Mortgage   0.70    0.50     0.05     1

Table 5.6: Total correlation matrix for k = 0.80. See Appendix A for details of the calculation.
Figure 5.7: Allocations as a function of the degree of confidence - Johnson
5.4 A fast algorithm
In this Section we present a fast and efficient algorithm to infer allocations for different values of k without resorting every time to a new Monte Carlo simulation. This algorithm is another original contribution of this thesis and, for practical purposes, it can yield invaluable savings in computation time without much affecting the allocations. Finding the optimal weights amounts to maximizing the expected value of the utility of the final wealth at the investment horizon which, as we saw, means maximizing:
$$E[U] = (1-k)\int U(1+wr)\,f(r)\,dr + k\sum_{i>0}\int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,U(1+wr)\,dr \qquad (5.1)$$
over w, given the constraints. Here $\delta(r - L^i)$ is a shorthand notation for $\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n)$.
If the integration cannot be carried out easily, as in the example leading to (4.4),
and/or it is not solvable numerically, a Monte Carlo simulation becomes necessary.
Moreover, finding the optimal weights for several values of k and creating graphs such as those shown in Figure 5.6 requires a series of Monte Carlo simulations, one for each value of the parameter k, according to the granularity we choose. Calculating the allocation for different values of k might also be necessary in real-world applications, not only for presentation purposes. This is true especially if we endorse the Maximum Entropy suggestion, where we would like to see the allocation over an interval of values of the confidence parameter Ω6. A Monte Carlo simulation for every chosen parameter might be quite a burdensome procedure7. In what follows we present another solution to this problem.
Let’s first consider the problem of optimization in normal times. In this case, the
problem reduces to optimizing:
$$\bar E[U] = \int U(1+wr)\,f(r)\,dr \qquad (5.2)$$
subject to the constraint $\sum_i w_i - 1 = 0$ and, if we want to rule out short selling, also to $w_i > 0$. $\bar E$ is the normal-times expectation operator.
Let’s start by considering the problem without short selling constraints. Lagrange
multipliers can be used to find a stationary point. Defining the Lagrange function as
$$\Lambda = \bar E[U] - \lambda\Big(\sum_i w_i - 1\Big)$$
the following equations must be satisfied (necessary conditions):
$$\frac{\partial \Lambda}{\partial w_j} = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = 0$$
or
$$\int \frac{\partial U(1+\bar w r)}{\partial w_j}\,f(r)\,dr - \lambda = 0, \qquad \sum_i w_i - 1 = 0,$$
the last equation being simply the constraint. From these we can derive the very useful relation at the stationary point $\bar w$:
$$\int \frac{\partial U(1+\bar w r)}{\partial w_j}\,f(r)\,dr = \int \frac{\partial U(1+\bar w r)}{\partial w_l}\,f(r)\,dr \quad \forall j, l \qquad (5.3)$$
6 Following [44] we might also want to explore the behaviour of the allocations not only for a single k but also for a broader set K of values of k (k ∈ K), called the stress set. We might then estimate an opportunity cost of choosing among the different values of k in the stress set.
7 The total moments approximation can be good for saving computation time and can give allocations very close to those of the exact problem, but this might not always be the case, especially in markets that deviate strongly from Gaussianity.
Suppose that we already have the solution $\bar w$ of this problem and want to calculate the shifted weights $\bar w + \delta w$ that optimize the full problem (5.1). We can expand the utility function around $\bar w$:
$$U(\bar w + \delta w) = U(\bar w) + J(\bar w)\,\delta w + \tfrac{1}{2}\,\delta w^{T} H(\bar w)\,\delta w \qquad (5.4)$$
where J is the Jacobian matrix, which for the scalar-valued utility function is a vector (the gradient), and H is the square matrix of second-order partial derivatives of the utility function with respect to w (the Hessian matrix). Taking the derivative of this equation with respect to the generic weight $w_j$ we obtain:
$$\frac{\partial U}{\partial w_j} = J_j(\bar w) + \sum_i \delta w_i\, H_{ij}(\bar w)$$
Again, Lagrange multipliers can be used to find the maximum of (5.1). Defining the Lagrange function as
$$\Lambda = E[U] - \lambda\Big(\sum_i w_i - 1\Big)$$
the following equations must be satisfied at the stationary point:
$$\frac{\partial \Lambda}{\partial w_j} = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = 0$$
or
$$(1-k)\int \frac{\partial U(1+wr)}{\partial w_j}\,f(r)\,dr + k \sum_{i>0} \int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,\frac{\partial U(1+wr)}{\partial w_j}\,dr - \lambda = 0$$
$$\sum_i w_i - 1 = 0$$
Substituting the expansion of the utility function around $\bar w$ into the above equations, the first condition can be restated as:
$$\int J_j(\bar w)\Big[(1-k) f(r) + k \sum_{i>0}\delta(r - L^i)\frac{p_i}{1-p_0}\Big]\,dr + \sum_l \int H_{lj}(\bar w)\,\delta w_l \Big[(1-k) f(r) + k \sum_{i>0}\delta(r - L^i)\frac{p_i}{1-p_0}\Big]\,dr - \lambda = 0$$
Combined with the second condition, which can be rewritten as
$$-\sum_i \delta w_i = 0,$$
it becomes just a linear system of equations in the shifts δw, which can be solved with traditional linear-algebra methods. Defining the total bordered Hessian matrix as
$$H_T = \begin{pmatrix} \int H_{11}(\bar w)h(r)dr & \int H_{12}(\bar w)h(r)dr & \cdots & -1 \\ \int H_{21}(\bar w)h(r)dr & \int H_{22}(\bar w)h(r)dr & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & 0 \end{pmatrix}$$
where $h(r) = (1-k)f(r) + k\sum_i \delta(r - L^i)\,\frac{p_i}{1-p_0}$, we can express the solution as:
$$\delta w_j = \frac{\det H_j}{\det H_T} \qquad (5.5)$$
where $H_j$ is the total bordered Hessian matrix in which the j-th column is replaced by the vector of coefficients, given in this case by the Jacobian:
$$H_j = \begin{pmatrix} \int H_{11}(\bar w)h(r)dr & \cdots & -\int J_1(\bar w)h(r)dr & \cdots & -1 \\ \int H_{21}(\bar w)h(r)dr & \cdots & -\int J_2(\bar w)h(r)dr & \cdots & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & 0 & \cdots & 0 \end{pmatrix}$$
It is very important to remember that these matrices are calculated at the "old" equilibrium point $\bar w$. Note that for k = 0 the column containing the Jacobians has identical elements because of (5.3); since a matrix with two proportional columns has zero determinant, the shift in weights $\delta w_j$ is then trivially zero.
5.4.1 Gaining intuition
In order to gain intuition about why (5.4) is a good approximation, we plot in Fig. 5.8 the logarithmic utility for three asset classes, i.e. the quantity $\log(1 + w_1 r_1 + w_2 r_2 + (1 - w_1 - w_2) r_3)$, for two sets of returns $r_i$, as $w_1$ and $w_2$ vary. The allocation to the third asset is given by $w_3 = 1 - w_1 - w_2$, and no short-selling constraint $w_3 > 0$ is imposed; the results with such a constraint would simply be a subset of Fig. 5.8. In both cases the utility is almost a plane, so that even drastic changes in the allocations can be captured by a second-order expansion.

Figure 5.8: Logarithmic utility function for two sets of returns as the allocations vary. The two sets are: plane with square markers - r = (1%, −4%, −5%); plane without markers - r = (−1%, 2%, 3%).
Later in this subsection we will evaluate the quality of the approximation in terms
of opportunity cost.
We can gain some additional insight from the two-dimensional case. The solution of (5.5) for $\delta w_1$ reduces to:
$$\delta w_1 = \frac{E[J_2] - E[J_1]}{2E[H_{12}] - E[H_{11}] - E[H_{22}]}$$
where the expectation is taken with respect to h(r). If k = 0, the numerator of this expression is zero by virtue of (5.3). More explicitly, in the case of the power utility function (4.2):
$$\delta w_1 = \frac{-k\sum_i (L^i_1 - L^i_2)\,\frac{p_i}{1-p_0}\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}}{\beta\Big[(1-k)\int (r_1 - r_2)^2 (1 + w_1 r_1 + w_2 r_2)^{-\beta-1} f(r)\,dr + k\sum_i (L^i_1 - L^i_2)^2\,\frac{p_i}{1-p_0}\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta-1}\Big]}$$
For k close to 0 we can neglect the second term in the denominator with respect to the first:
$$\delta w_1 = -\,\frac{k\sum_i (L^i_1 - L^i_2)\,p_i\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}/(1-p_0)}{\beta(1-k)\int (r_1 - r_2)^2 (1 + w_1 r_1 + w_2 r_2)^{-\beta-1} f(r)\,dr} \qquad (5.6)$$
The conclusion for small k is then straightforward: the shift is proportional to the factor k and to the probability-weighted difference in stressed returns across the scenarios. Moreover, from the additional weight $(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}$ it is clear that the lower the values of $L^i_1$ and $L^i_2$, the greater the shift. $\delta w$ is also inversely proportional to the quantity $1 - p_0$. Finally, the problem is also sensitive to the shape of the utility function through the parameter β.
It is important to underline that we would have reached the same solution for small k had we started with an alternative method, such as regular perturbation theory, and considered the second term of (5.1) as a small perturbation. We would then have expanded the weights in a power series around the solution of the unperturbed problem, i.e. $w = \bar w + k\,w^{(1)} + k^2\,w^{(2)} + \ldots$, and equated terms of the same order after inserting this into (5.1). Neglecting second- and higher-order terms, we would have obtained exactly (5.6).
5.4.2 Generalizing the results
What was discussed in the previous section applies to a simple case. What we need, however, in most cases is a solution for more than two assets and for values of k which are not necessarily small. The multidimensional case involving three or more assets is more complicated, and the solution does not have the straightforward form of (5.6) but must be kept in the form of the matrices $H_T$ and $H_j$.

In general we must deal with stress returns $L^i$ weighted by integrals of the form $\int r_i r_j \big(1 + \sum_l r_l w_l\big)^{-\beta-1} f(r)\,dr$. Although the problem is not solvable analytically, a solution is directly computable thanks to equation (5.5), and it is very fast compared to a repeated Monte Carlo simulation for each k. In fact, what is required is to carry out a simulation only once, for k = 0. During this simulation the elements of the matrices $H_T$ and $H_j$ related to the normal part can be calculated and stored with almost no additional time and memory. Once stored, they can be reused to infer the shifts δw for any value of k.
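A minimal sketch of this recipe is given below, assuming a power utility and writing (5.5) as a bordered linear system solved directly for each k. The normal-part expectations of the gradient and Hessian are computed (and would be stored) during the single k = 0 Monte Carlo run; all function and variable names are illustrative, not part of the thesis.

```python
import numpy as np

def jac_hess_power(w, r, beta=1.2):
    """Per-draw gradient and Hessian in w of U(1+w'r) = ((1+w'r)^(1-beta)-1)/(1-beta)."""
    x = 1.0 + r @ w
    J = (x ** -beta)[:, None] * r
    H = -beta * (x ** (-beta - 1.0))[:, None, None] * r[:, :, None] * r[:, None, :]
    return J, H

def normal_moments(w_bar, normal_draws, beta=1.2):
    """Computed once, during the k = 0 Monte Carlo run, and then stored."""
    J, H = jac_hess_power(w_bar, normal_draws, beta)
    return J.mean(axis=0), H.mean(axis=0)

def weight_shift(w_bar, EJ_n, EH_n, stress_losses, stress_probs, k, beta=1.2):
    """Shift delta_w from the bordered linear system behind eq. (5.5).
    stress_probs are the renormalised probabilities p_i / (1 - p0)."""
    Js, Hs = jac_hess_power(w_bar, stress_losses, beta)
    EJ = (1 - k) * EJ_n + k * stress_probs @ Js                       # E_h[J]
    EH = (1 - k) * EH_n + k * np.einsum('i,ijk->jk', stress_probs, Hs)  # E_h[H]
    n = len(w_bar)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = EH
    A[:n, n] = A[n, :n] = -1.0      # border enforcing sum(delta_w) = 0
    b = np.concatenate([-EJ, [0.0]])
    return np.linalg.solve(A, b)[:n]
```

Calling `weight_shift` over a grid of k values reuses the stored normal-part moments and avoids any further simulation.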
If we introduce short-selling constraints in the form $w_i > 0$, the equality (5.3) is in general no longer valid. However, we can still use the expansion of U around $\bar w$ and maximize the following expression over the weights for each k:
$$\int \big(U(\bar w) + J(\bar w)\,\delta w\big)\Big[(1-k)f(r) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,dr + \frac{1}{2}\int \delta w^{T} H(\bar w)\,\delta w\,\Big[(1-k)f(r) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,dr$$
For the numerical search one can resort to techniques which handle inequality constraints, such as Kuhn-Tucker optimization. The function fmincon in MATLAB, which accepts inequality constraints, can also be used, and the above expression is easily optimized over the weights.
As a practical example, consider the graphs presented earlier in this chapter. One such graph requires approximately 1 hour to be calculated8. With the above expansion the required time is that of the first simulation for k = 0, i.e. around 2 minutes, plus 5 additional seconds to infer the rest of the graph. Fig. 5.9 compares the Monte Carlo simulation with the above approximation which, as visible from the figure, holds quite well for all values of k. In order to get an idea of the "cost" of this approximation we can search numerically for each k, in a spirit similar to [29], for a constant c that would make the investor indifferent between the two optimizations, i.e.:
$$E[U(1 + w^*_{opt} r + c)] = E[U(1 + w_{opt} r)]$$
where $w^*_{opt}$ are the optimal weights obtained with the approximated utility function, $w_{opt}$ those obtained as a solution to the full problem, and c is the opportunity cost. In Table 5.7 we can see the value of c for several values of the normalization factor9.
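As a sketch, the indifference constant c can be found with a one-dimensional root search. The snippet below assumes a logarithmic utility and reuses the hypothetical inputs (normal_draws, stress_losses, stress_probs) of the sketch given after Section 5.1; the bracketing interval is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.optimize import brentq

def opportunity_cost(w_approx, w_full, normal_draws, stress_losses, stress_probs, k):
    """Constant c with E[U(1 + w_approx'r + c)] = E[U(1 + w_full'r)] under log utility."""
    def eu(w, c=0.0):
        normal = np.log(1.0 + normal_draws @ w + c).mean()
        stress = stress_probs @ np.log(1.0 + stress_losses @ w + c)
        return (1 - k) * normal + k * stress
    target = eu(w_full)
    return brentq(lambda c: eu(w_approx, c) - target, -0.05, 0.05)
```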
We note in closing that, unlike the total-moments approach, the method of this Section relies on a different expansion, i.e. an expansion in the weights rather than in the returns (to which the total-moments approach can be compared, see Appendix A). Moreover, it does not assume any particular form (e.g. Gaussian) for the full distribution of returns but leaves it as it is. Which of the two approximations should be used, and which performs better, must be examined in the context of the problem to be solved. For example, if there are no short-selling constraints, the search algorithm can push the solution to values of w far from the equilibrium value $\bar w$, something of limited effect when the weights belong to [0, 1]; in that case the weights expansion might not hold as well as the total-moments expansion. If, on the other hand, the weights are bounded in [0, 1] and the full return distribution has large third and higher moments, so that the Gaussian approximation is poor, the weights expansion might be the better choice.

8 The machine used for the simulation has an AMD Phenom Triple-Core 2.1 GHz processor.
9 It might appear strange that for k = 0 the opportunity cost is different from 0. This is due to the limitations of the simulation used to infer the weights, e.g. a termination tolerance level of 10^-6 (see Section 4.4.1).

1 − k (%)   c × 10³
100         0.0182
99          0.0001
98          -0.0102
97          -0.0089
96          -0.0107
95          -0.0214
94          -0.1193
93          -0.0037
92          -0.0038
91          -0.0046
90          -0.1066
89          -0.1019
88          -0.1302
87          0.0001
86          0.0720
85          -0.0288
84          -0.0284
83          0.0240
82          0.0227
81          0.1006
80          -0.0007
79          -0.1004
78          -0.0051
77          -0.0994
76          0.0004
75          0.0003

Table 5.7: Opportunity cost c (multiplied by 10³) for the logarithmic utility as a function of 1 − k.
Figure 5.9: Allocations as a function of the normalization factor: Monte Carlo for each k - thin line, numerically inferred - thick line.
Chapter 6
Conclusions
A simple method has been presented to carry out a coherent programme of asset
allocation based on utility maximization in the presence of stress events with subjec-
tively assigned probabilities. The method relies on the ability of the asset manager to
specify in an approximate way the causal links, if any, among the stress events that
can most affect a given portfolio. The approach has several advantages. First of all,
it is very transparent. Its crucial ingredients (i.e., the subjective probabilities, the assumed causal links, the fraction of the probability mass in the distressed state, etc.) are clearly visible, and affect the result in an intuitively understandable manner. This helps the asset manager's intuition, as the discussion in the previous sections shows. The approach also lends itself to thorough and thoughtful sensitivity analysis, as literally every input can be understood, changed by hand and the results inspected. Thanks to its intuitive appeal, the approach therefore allows scrutiny by non-specialists, trustees and other senior investment officers. We should stress again that, even though we gave an example only for exceptional loss events, the procedure is just as valid and applicable to exceptional gains for a given asset class, i.e. we can speak of profits and losses and not only of losses.
Several add-ons give additional power to the whole approach and ease its practical
implementation.
First, the Linear Programming technique of subsection 3.3.2, extended here to include non-linear fractional constraints, gives guidance on the choice of conditional probabilities in the Bayesian net. As the structure of the net becomes more elaborate, this tool might become essential in order to arrive quickly at reliable probability values for the nodes.
Second, thanks to the numerical recipe presented in Section 5.4, no additional
computational burden is posed when calculating the optimal weights for several val-
ues of the normalization factor compared to the Monte Carlo run in the case of no
Second, thanks to the numerical recipe presented in Section 5.4, calculating the optimal weights for several values of the normalization factor poses no additional computational burden compared to the single Monte Carlo run needed for the case of no stress events. This is especially (though not exclusively) helpful when the Maximum Entropy approach of Section 4.5 is used, which might require the calculation of allocations over a broad spectrum of values of the normalization factor in order to infer allocations for different values of the confidence Ω.
Third, the concept of Total Moments allows an easy comparison with a simple allocation based on Gaussian returns, a very intuitive benchmark widely known among asset managers.

Fourth, another well-known concept, the Efficient Frontier, was introduced and extended in this thesis. The extension allows the asset manager to obtain a global picture of the modified risk/return trade-off in the presence of potential stress events by simple visual inspection.
Finally, the Black-Litterman model was integrated into the framework and made "clean" by feeding it a covariance matrix obtained with the methods of Chapter 2. In fact, the BL world can be seen as a particular case of the forward-looking world with stress events that we propose here.
6.1 Future research
Some improvements of the approach described here could go in the direction of intro-
ducing uncertainty in the estimated probabilities for each node in a BN i.e. assuming
for them a distribution instead of fixed values. This uncertainty was dealt with in
this thesis through the maximum entropy principle but on a high level, through the
parameter k. A more granular approach on a single node level could be examined.
The same uncertainty could also be incorporated in the stress losses (gains), which were assumed here to be distributed as a Dirac δ; a more complicated distribution with an uncertainty parameter σ could be assumed instead, e.g. a Gaussian or a uniform distribution.

Moreover, the BN can be complemented by additional nodes associated with events that do not give rise to losses by themselves. For example, imagine the event "Political elections in country X". By itself it does not imply a loss in any particular asset class, but it could cause a series of losses (or gains) if it is connected to events that do represent losses. For instance, this node could be a parent node with arrows pointing to other events associated with losses, e.g. "Crash of the stock market", which in itself represents a loss if we hold stocks in our portfolio.
This thesis opens the way to some additional applications that can be further
explored. One possible direction might be the application of the described methods
to credit portfolio models. In fact, some of the commercial products in this area,
e.g. CreditMetrics, use as an input a correlation matrix inferred from equity prices to
calculate the portfolio loss distribution on a given horizon. Tail events are deduced
via a backward looking simulation of the joint comovement of the underlying factors
such as industry or country indices. The cleansing of historical tail events and splicing
with forward-looking tail events based on causality assumptions, as proposed in this
thesis, could be explored. This direction inevitably suggests that the much discussed
CDO models can be also improved with the same methodology.
Appendix A
Total moments
Let us calculate the first and second moments of the returns under the complete distribution, i.e. the 'normal' part plus the exceptional one:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\,f(r_1, r_2, \ldots) + \frac{k}{1-p_0}\sum_{i>0}^{2^n - 1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n) \qquad (A.1)$$
The first part on the r.h.s. represents the normal part, and f is the outcome of the copula-fitting process. We will not be concerned with the form of f in this Appendix, so let us simply call it Normal and rewrite the expression for $F(r_1, r_2, \ldots, r_n)$ as:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\cdot\text{Normal} + \frac{k}{1-p_0}\sum_{i>0}^{2^n - 1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n) \qquad (A.2)$$
A.0.1 First moment
Let us calculate the expected value of a given return $r_j$:
$$E[r_j] = (1-k)\,\bar E[r_j] + \frac{k}{1-p_0}\int_{-\infty}^{\infty}\sum_{i>0}^{2^n-1} p_i\, r_j\,\delta(r_1 - L^i_1, \ldots, r_n - L^i_n)\,dr$$
Here $\bar E$ denotes the expectation in the normal state of the world. The sum in the second term has non-zero contributions only from the scenarios in which event j occurs, which yields immediately:
$$E[r_j] = (1-k)\,\bar E[r_j] + \frac{k}{1-p_0}\,P(E_j)\,L_j \qquad (A.3)$$
where $P(E_j)$ is the marginal probability of event j happening. The loss $L_j$ is negative, so the second term is also negative and the expected value decreases as we move away from normal states.
A.0.2 Variance
To work out the variance we also need:
$$E[r_j^2] = (1-k)\,\bar E[r_j^2] + \frac{k}{1-p_0}\,P(E_j)\,L_j^2 \qquad (A.4)$$
derived with the same logic as for the first moment. By definition the variance is:
$$Var(r_j) = E[r_j^2] - E[r_j]^2$$
and, using the above derivations:
$$Var(r_j) = (1-k)\,\bar E[r_j^2] + \frac{k}{1-p_0}P(E_j)L_j^2 - (1-k)^2\bar E[r_j]^2 - \Big(\frac{k}{1-p_0}\Big)^2 P(E_j)^2 L_j^2 - 2(1-k)\frac{k}{1-p_0}\bar E[r_j]\,P(E_j)\,L_j$$
This can be regrouped as:
$$Var(r_j) = (1-k)\,\overline{Var}[r_j] + \frac{k}{1-p_0}P(E_j)L_j^2 + k(1-k)\bar E[r_j]^2 - \Big(\frac{k}{1-p_0}\Big)^2 P(E_j)^2 L_j^2 - 2(1-k)\frac{k}{1-p_0}\bar E[r_j]\,P(E_j)\,L_j$$
$$= (1-k)\,\overline{Var}[r_j] + \frac{k}{1-p_0}\,Var_s[r_j]\,L_j^2 + k(1-k)\Big(\bar E[r_j]^2 - \frac{2}{1-p_0}\,\bar E[r_j]\,P(E_j)\,L_j\Big) + \frac{k}{1-p_0}P(E_j)^2 L_j^2\Big(1 - \frac{k}{1-p_0}\Big)$$
where $\overline{Var}$ is the variance of the normal part of the problem and
$$Var_s[r_j] = P(E_j) - P(E_j)^2.$$
The expression has the desired property of reducing to the normal variance when k = 0. Notice, however, that it does not reduce to $Var_s[r_j]\,L_j^2$ when the world is completely in a non-normal state (k = 1); it does so only in the case $p_0 = 0$.
A.0.3 Covariance
By definition of covariance:
$$cov[r_i, r_l] = E[r_i r_l] - E[r_i]\,E[r_l]$$
We use:
$$E[r_i r_l] = (1-k)\,\bar E[r_i r_l] + \frac{k}{1-p_0}\,P(E_i E_l)\,L_i L_l$$
which gives:
$$cov(r_i, r_l) = (1-k)\,\bar E[r_i r_l] + \frac{k}{1-p_0}P(E_i E_l)L_i L_l - \Big((1-k)\bar E[r_i] + \frac{k}{1-p_0}P(E_i)L_i\Big)\Big((1-k)\bar E[r_l] + \frac{k}{1-p_0}P(E_l)L_l\Big)$$
which, by expanding and regrouping, can be cast as:
$$cov(r_i, r_l) = (1-k)\,\overline{cov}[r_i, r_l] + \frac{k}{1-p_0}\,cov_s[r_i, r_l]\,L_i L_l + k(1-k)\Big(\bar E[r_i]\bar E[r_l] - \frac{1}{1-p_0}\bar E[r_i]P(E_l)L_l - \frac{1}{1-p_0}\bar E[r_l]P(E_i)L_i\Big) + \frac{k}{1-p_0}P(E_i)P(E_l)L_i L_l\Big(1 - \frac{k}{1-p_0}\Big)$$
where:
$$cov_s[r_i, r_l] = P(E_i E_l) - P(E_i)\,P(E_l)$$
The same reasoning as for the variance also applies here. If the world is in a completely normal state the covariance reduces to the normal covariance, while for k = 1 it does not reduce to $cov_s[r_i, r_l]\,L_i L_l$. The term multiplying k(1 − k) always adds a positive amount apart from the extreme points k = 0 and k = 1, where it is zero. We therefore cannot deduce for the variance/covariance the same strictly non-increasing behaviour as for the mean: the variance/covariance may well attain its maximum not at the extrema but at some other point in the open interval (0, 1).

The correlation is the expression for the covariance divided by the square root of the product of the variances of the two returns. Its form becomes cumbersome and we will not write it down explicitly, but from the previous derivations we can see that it has the same desirable property of reducing to the normal correlation at k = 0.
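A compact sketch of these formulas is given below. The marginal and pairwise event probabilities would come from the Bayesian-net joint table; the normal-times correlation and the pairwise probabilities used in the example are placeholders, not the thesis values, while the means, volatilities and marginals echo the numbers quoted in Chapter 5.

```python
import numpy as np

def total_moments(mu_n, cov_n, p, pij, p0, losses, k):
    """Total mean and covariance of the spliced distribution.
    mu_n, cov_n : normal-times mean vector and covariance matrix.
    p, pij, p0  : marginal P(E_j), pairwise P(E_i,E_j) (diagonal = marginals)
                  and no-event probability from the Bayesian net.
    losses      : per-asset stress losses L_j (negative numbers)."""
    L = np.asarray(losses)
    c = k / (1.0 - p0)
    mu_T = (1 - k) * mu_n + c * p * L                                  # eq. (A.3)
    E_rr = (1 - k) * (cov_n + np.outer(mu_n, mu_n)) + c * pij * np.outer(L, L)
    cov_T = E_rr - np.outer(mu_T, mu_T)                                # covariance above
    return mu_T, cov_T

# Illustrative inputs: normal-times moments as in Tables 5.1-5.2 at 1-k = 100%,
# marginals 4%, 7%, 7%, 3%; normal correlation and pairwise probabilities are placeholders.
mu_n  = np.array([0.0181, 0.0207, 0.0205, 0.0167])
vol_n = np.array([0.0227, 0.0236, 0.0920, 0.0144])
cov_n = np.diag(vol_n ** 2)
p     = np.array([0.04, 0.07, 0.07, 0.03])
pij   = np.diag(p)
mu_T, cov_T = total_moments(mu_n, cov_n, p, pij, 0.8381,
                            [-0.05, -0.20, -0.10, -0.05], k=0.05)
```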
A.0.4 Utility maximization
The idea is now to replace the utility maximization over the distribution (4.1) with a utility maximization over a Gaussian distribution with the moments calculated in this Appendix, i.e. instead of maximizing:
$$E[U] = (1-k)\int U(1+wr)\,f(r;\bar E[r],\Sigma,\nu)\,dr + k\sum_{i>0}\int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,U(1+wr)\,dr \qquad (A.5)$$
we maximize:
$$E[U] = \int U(1+wr)\,f(r; r_T, \Sigma_T)\,dr \qquad (A.6)$$
over the weights and subject to the constraints of the problem. In (A.6) f is the Gaussian density, the subscript T stands for Total, and ν is the number of degrees of freedom of the fitted copula; the distribution parameters have been made explicit in both cases. To go one step further, let us write the form of the utility function we chose:
$$E[U] = \int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,\frac{1}{1-\beta}\Big((1+wr)^{1-\beta} - 1\Big)\,dr$$
Let us expand the utility function around $r = E[r] = r_T$ up to second order:
$$E[U] = \int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,F(r, w, r_T, \beta)\,dr$$
where
$$F(r, w, r_T, \beta) = \frac{1}{1-\beta}\Big((1+wr_T)^{1-\beta} - 1\Big) + \frac{w(r - r_T)}{(1+wr_T)^{\beta}} - \frac{\beta}{2}\,\frac{\sum_i\sum_k w_i w_k (r_k - r_{kT})(r_i - r_{iT})}{(1+wr_T)^{\beta+1}} + o\big((r-r_T)^3\big)$$
Denoting by $F_0$, $F_1$, $F_2$ the terms of the expansion, we have by construction:
$$\int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\sum_{l=0}^{2}F_l\,dr = \int f(r; r_T,\Sigma_T)\sum_{l=0}^{2}F_l\,dr$$
The term involving $(r - r_T)^3$ is zero in the case of the Gaussian approximation, while it is generally different from zero for the full distribution. Discrepancies of this kind will in general also be present for the higher moments, and these give rise to the different allocations produced by the two methods. As long as the higher-order terms can be neglected, the allocations will be similar. For example, if the copula/marginal part of the full problem can be considered Gaussian and as long as the value of k (and hence the probability of distress) is small enough, this approximation can give excellent results1.

1 For the example in Chapter 5, given the time horizon considered, the 'normal' part of the problem is well represented by a Gaussian copula and marginals (see Footnote 1 of that Chapter), and thus this approximation holds quite well. Finally, we note that in the paper by Jondeau et al. [29] an approach which also relies on expansions is described. However, they rely on a more complicated expansion (up to the fourth moment) of the final wealth around its expected value, without assuming any particular form for the returns distribution.
Appendix B
Fusion with the Black-Litterman approach
The approach proposed in this thesis can be integrated easily with the Black-Litterman
model [1]1.
B.1 Formulation

For the BL model in Bayesian form see e.g. [65]. Without going into deep detail, the model can be cast as (up to a proportionality constant)2:
$$h(\mu|r) \propto f(r|\mu)\,g(\mu) \qquad (B.1)$$
where g(µ) is the probability density function of the investor's beliefs about the asset returns (the prior), f(r|µ) is the likelihood, i.e. the probability density of the data returns conditional on the investor's view, and h(µ|r) is the posterior, reflecting the investor's updated view obtained from the amalgamation of the data and his prior views3. g(µ) is assumed to be distributed as N(q, Ω), where q represents the investor's view on the vector of returns4 and Ω is the confidence in these views. f(r|µ) is assumed to be N(µ, τΣ), where Σ is the covariance matrix of returns and τ is a scaling factor often set to 1. Under these assumptions the posterior can be cast as a normal distribution with a modified drift and variance of the drift $(\mu_{BL}, \sigma_{BL})$ given by:
$$\mu_{BL} = \big[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\big]^{-1}\big[(\tau\Sigma)^{-1} r + P'\Omega^{-1} q\big] \qquad (B.2)$$
$$\sigma_{BL} = \big[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\big]^{-1}$$

1 In [43] an extension of the BL model to non-normal markets is proposed. However, the improvement is mainly in the direction of considering fat-tailed copulae and is therefore different from the approach proposed here. For the strengths and weaknesses of the different approaches to the BL model see [77].
2 The material in this and the following sections will also rely on the results of Appendix A.
3 Returns r can be implied through equilibrium considerations, e.g. through the market capitalization of the asset classes. In what follows, we will leave open both possibilities: equilibrium and data-driven returns.
4 It may represent not only absolute returns on an asset but also differences in returns between different assets. It is represented as Pµ, with P an l × p matrix, l being the number of views and p the number of assets.
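A minimal sketch of (B.2) follows; the view, its confidence and the covariance matrix are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def black_litterman(Sigma, r, P, q, Omega, tau=1.0):
    """Posterior drift and drift covariance of eq. (B.2)."""
    iTS = np.linalg.inv(tau * Sigma)
    iOm = np.linalg.inv(Omega)
    post_cov = np.linalg.inv(iTS + P.T @ iOm @ P)        # sigma_BL
    post_mu = post_cov @ (iTS @ r + P.T @ iOm @ q)       # mu_BL
    return post_mu, post_cov

# Hypothetical example: one relative view, 'Credit outperforms Bond by 1%'.
Sigma = np.diag([0.0227, 0.0236, 0.0920, 0.0144]) ** 2
r = np.array([0.0181, 0.0207, 0.0205, 0.0167])           # data/equilibrium returns
P = np.array([[-1.0, 1.0, 0.0, 0.0]])                    # view-picking matrix (l x p)
q = np.array([0.01])
Omega = np.array([[0.0005 ** 2]])                        # confidence in the view
mu_BL, sigma_BL = black_litterman(Sigma, r, P, q, Omega, tau=1.0)
```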
In our case the probability distribution function of the portfolio returns has the form:
$$F(r) = (1-k)\,f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1} p_i\,\delta(r - L^i) \qquad (B.3)$$
with mean5:
$$r_T = (1-k)\,\bar r + \frac{k}{1-p_0}\sum_{i>0}^{2^n-1} p_i\,L^i$$
and standard deviation of the estimate of the mean:
$$\sigma_T = (1-k)\,\sigma_{\bar r} \propto 1/\sqrt{N}$$
i.e. we assume that we are absolutely confident only in the estimates of the means of the stressed part, the $L^i$ 6.
We can express our approach in a Bayesian form as a mixture of priors $g_i(\mu)$ multiplying the respective likelihoods $f_i(r|\mu)$, one for each of the events in the joint probability table and one for the 'normal'-times part:
$$h(\mu|r) = \sum_{i=0}^{2^n-1} h_i(\mu|r) \propto \sum_{i=0}^{2^n-1} f_i(r|\mu)\,g_i(\mu)$$
where the 'normal' part corresponds to i = 0 and we set $g_0(\mu)$ equal to:
$$g_0(\mu) = (1-k)\,N(q,\Omega)$$
where N(q, Ω) is a Gaussian distribution with mean given by our view on the average return (or on differences in returns, as explained in footnote 4) and confidence in the view given by Ω. For $g_i(\mu)$ we assume:
$$g_i(\mu) = \frac{k}{1-p_0}\,p_i\,\delta(\mu - L^i) \quad \text{for } i > 0$$
5 We remind the reader again that $\bar r$ is a short-hand notation for the vector of expected returns.
6 It is important to stress that this is the uncertainty in the estimate of the stressed mean, not the variance of the stressed distribution around the mean, which in our case is a Dirac δ but could be any other distribution.
What about the likelihood functions $f_i(r|\mu)$? Let us assume that the market in 'normal' times is well represented by a Gaussian. If we draw one single observation from this distribution, its likelihood will be:
$$f_0(r_j|\mu) \propto \exp\Big(-\frac{(r_j-\mu)'\Sigma^{-1}(r_j-\mu)}{2}\Big)$$
Extending the reasoning to N observations, we reach the widely known result for the sample mean $\bar r$:
$$f_0(\bar r|\mu) \propto \exp\Big(-\frac{(\bar r-\mu)'\Sigma^{-1}(\bar r-\mu)}{2/N}\Big) \qquad (B.4)$$
where $\bar r$ is the sample average of the N observations, i.e. $\bar r = \frac{1}{N}\sum_{j=1}^{N} r_j$. We will take (B.4) to be the likelihood of the 'normal' part. For the stressed parts we set $f_i(r|\mu) = 1$, i.e. we have no past market history with which to blend the hypothesized parameters $L^i$. The total corrected drift will then be:
$$\mu_{corr} = (1-k)\,\mu_{BL} + k\sum_{i>0}\frac{p_i}{1-p_0}\,L^i$$
with $\mu_{BL}$ defined as in (B.2) but with the replacement $\tau = 1/N$.
B.2 Investor views and implied drifts
The ’normal’ part, as we saw, has a likelihood which contains the historical average
r but it can be treated in a more sophisticated way if we interpret it in a BL light i.e.
we can use for it a likelihood which incorporates the market implied drifts as in the
equilibrium BL model7. Additionally, we can also imply drifts for the stressed part
and express separate opinions on them. We will show in the next subsections two
ways of getting the implied drifts from the market capitalization weights. However,
it is important to underline since now that having views both in the ’normal’ part
and the part which comes out of the Bayesian net is feasible although it might appear
at first sight as misleading. But having certain subjective views on the ’normal’
part and at the same time the market not being in distress is not a contradiction.
For example, in ’normal’ times the investor can have the view ’This year equities of
the PIIGS8 countries will underperform German equities by 3%’ which might seem
natural in the light of the recent developments. This view is not a stress event. On
the other side, the event ’Default of country X belonging to PIIGS’ is something
7The BL model relies on the assumptions of the CAPM to set the equilibrium returns.8Acronym used to label jointly Portugal, Italy, Ireland, Greece and Spain.
87
that could have a systemic impact and trigger extreme moves on all the assets and
as such is incorporated separately in the Bayesian net which is mutually exclusive
with the ’normal’ part by construction. On the implied drifts of such extreme moves
the investor could express different opinions without ’interfering’ with the views in
’normal’ times.
In summary, subjective views that do not disrupt markets could be incorporated
in the body of the distribution a la Black-Litterman while on the drifts of the extreme
scenarios, which are accommodated in the BN part of the exercise, the investor could
express separate opinions. In this way the BL model becomes a particular case of a
world with stress events i.e. with Bayesian net.
Moreover, we should also remember that the ’body’ of the ’normal’ part was
obtained by truncating extreme points from the data and we saw also that throwing
away just a few datapoints can change substantially the covariance matrix which is
central in the BL calculations. By cleaning the data we also prepare cleaner inputs to
the BL formulae for the ’normal’ part, something which was not done in the original
model.
In what follows we will show two ways of deriving the market implied drifts. The
first one makes the assumption that the information about the stress scenarios is
already incorporated in the market equilibrium. The other point of view is that the
scenarios express a really idiosyncratic opinion of the asset manager and as such are
not incorporated in the market. A real world situation will be a middle way between
the two extrema. How to extend this approach to hybrid situations can be object of
further research.
B.2.1 Implied drifts in the case of knowledge contained in the market

Let us derive the reverse optimization starting from the maximization of a quadratic utility9:
$$U = \Psi - \frac{\delta}{2}\,\Psi^2$$
where Ψ is the final wealth. The total density has the usual form:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\,f(r_1, r_2, \ldots) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n)$$

9 The methodology of this section holds also for a generic form of the quadratic utility function U = AΨ² + BΨ + C. For directions on how to set the market risk-aversion parameter δ, see [77], [65]. Finally, the derivation we present does not include the budget constraint, as in the majority of papers on the BL model. An extension that includes such a constraint can be found in [26].
We have to maximize over the weights the following expression (with short-hand notation for the returns):
$$E[U] = \int\Big[(1-k)f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1}p_i\,\delta(r-L^i)\Big]\Big[(1+wr) - \frac{\delta}{2}(1+wr)^2\Big]\,dr$$
Taking the derivative with respect to $w_j$:
$$\frac{dE[U]}{dw_j} = \int\Big[(1-k)f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1}p_i\,\delta(r-L^i)\Big]\big[r_j - \delta(1+wr)\,r_j\big]\,dr = (1-\delta)\,\mu_{Tj} + \sum_l w_l\,E[r_j r_l] = (1-\delta)\,\mu_{Tj} + \sum_l w_l\Big((1-k)\,\bar E[r_j r_l] + \frac{k}{1-p_0}P(E_j E_l)\,L_j L_l\Big)$$
where $\bar E$ is the normal-times expectation operator and $\mu_{Tj}$ is the total drift. The implied total drift of asset j is then obtained by setting the above expression equal to 0:
$$\mu_{Tj} = -\frac{1}{1-\delta}\sum_l w_l\Big((1-k)\,\bar E[r_j r_l] + \frac{k}{1-p_0}P(E_j E_l)\,L_j L_l\Big)$$
where the weights w represent the relative market capitalizations of the asset classes involved in the problem and are known parameters. This expression can now easily be split as:
$$\mu_{Tj} = \pi^{normal}_j + \mu^{stressed}_j$$
with:
$$\pi^{normal}_j = -\frac{1}{1-\delta}\sum_l w_l\,(1-k)\,\bar E[r_j r_l]$$
and
$$\mu^{stressed}_j = -\frac{1}{1-\delta}\sum_l w_l\,\frac{k}{1-p_0}\,P(E_j E_l)\,L_j L_l$$
The stressed drift thus obtained can be equated to (A.3):
$$\mu^{stressed}_j = \frac{k}{1-p_0}\,P(E_j)\,L^{implied}_j$$
$L^{implied}_j$ can easily be derived by inversion; we will call the corresponding implied stressed drift $\pi^{stressed}_j$.
We note that the BL implied drifts are usually derived from the utility $U = wr - \lambda\,w^{T}\Sigma w$. However, such a starting point would not have enabled us to perform the split between stressed and 'normal' components, because of the presence of mixed 'stressed-normal' cross products in the covariance matrix10.

10 The differences in the mean-variance approach between these two types of utility function are illustrated with a practical example by Sharpe [68].
B.2.2 Implied drifts in the case of specific knowledge

In this case, the implied drifts are given as if no information about the stress events were contained in the market:
$$\mu_{Tj} = \pi^{normal}_j = -\frac{1}{1-\delta}\sum_l w_l\,\bar E[r_j r_l]$$
B.2.3 Summing up

If we adopt the first approach for the implied drifts, we can split the likelihoods as follows:
$$f_0(r|\mu) \propto \exp\Big(-\frac{(\pi^{normal}-\mu^{normal})'\Sigma^{-1}(\pi^{normal}-\mu^{normal})}{2\tau}\Big)$$
where τ is a constant11. For the stressed part12:
$$f_i(r|\mu) \propto \exp\Big(-\frac{(\pi^{stressed}-\mu^{stressed})'\Sigma^{-1}(\pi^{stressed}-\mu^{stressed})}{2\tau}\Big)$$
If we adopt the second approach instead, we can simply write:
$$f_0(r|\mu) \propto \exp\Big(-\frac{(\pi^{normal}-\mu^{normal})'\Sigma^{-1}(\pi^{normal}-\mu^{normal})}{2\tau}\Big)$$
and:
$$f_i(r|\mu) = 1$$
But what about the priors g(µ)? In the first case, we can again use the distribution N(q, Ω) for $g_0(\mu)$, but a Dirac δ for $g_i(\mu)$ means that the implied stressed drifts will be completely shrunk to their prior. Unless we are absolutely certain about our parameter estimates, we can relax this restrictive assumption: in order to 'smooth' the effect of the priors we can use for them Gaussian distributions $N(q_i, \Omega_i)$ and amalgamate them with the respective likelihoods.

In the second case, we simply apply the implied drifts only to the 'normal' part and stick to flat priors for the stressed part13.

The total corrected drift will then be:
$$\mu_{corr} = (1-k)\,\mu^{0}_{BL} + k\sum_{i>0}\frac{p_i}{1-p_0}\,\mu^{i}_{BL}$$
with each of the $\mu^{i}_{BL}$ defined as in (B.2) but with the replacement of the respective parameters (e.g. π, Ω, q, etc.).

11 The constant τ is often set to 1 in the BL model. See [77] for an extensive discussion of the different ways of setting this parameter.
12 We make the assumption that the uncertainty in the implied drifts $\sigma^2\tau$ is the same for the stressed and 'normal' parts. In general, two different constants $\tau_1$ and $\tau_2$ can be introduced and the sensitivity of the method analyzed when varying the ratio $m = \tau_1/\tau_2$.
13 Again, we have the option of expressing the priors as $N(q_i, \Omega_i)$ if we are not absolutely certain about the parameters.
Appendix C
Efficient frontier
In a mean-variance approach we search for the minimum variance given an expected portfolio return µ* (or, vice versa, the maximum return given a variance σ*). This approach is applicable whenever the expected value of the utility function is a function only of the first two moments of the objective function1 (see [44], Chapter 6). This happens in only two cases: i) the market is elliptical; ii) the utility function is quadratic. The markets we treat are clearly not elliptical after the splicing of the stress distribution, nor did we choose a quadratic utility function, for the reasons given in Section 4.4. However, if we relax this last assumption we can adopt the mean-variance approach. Relaxing the assumption is not a very strong requirement as long as a second-order Taylor expansion of the utility function is a good approximation.
In order to carry out the constrained optimization we first write the Lagrangian2:
$$L(w, \lambda_1, \lambda_2) = \sigma^* - \lambda_2(w\mu - \mu^*) - \lambda_1(w\vec 1 - 1) = w'Cw - \lambda_2(w\mu - \mu^*) - \lambda_1(w\vec 1 - 1) \qquad (C.1)$$
where w is the vector of allocations, µ the vector with components $\mu_j = 1 + r_j$ for asset class j, and C is the covariance matrix of returns. The two constraints are on the expected portfolio return µ* and on the weights summing to 1. In this problem we are searching for the minimum variance σ*, given the constraints. Taking the derivative with respect to w:
$$\frac{\partial L}{\partial w} = 2Cw - \lambda_1\vec 1 - \lambda_2\mu = 0 \qquad (C.2)$$
and rearranging:
$$w = C^{-1}\frac{\lambda_1}{2}\vec 1 + C^{-1}\frac{\lambda_2}{2}\mu \qquad (C.3)$$
Multiplying on the left by µ:
$$\mu w = \mu^* = \frac{\lambda_1}{2}\,\mu' C^{-1}\vec 1 + \frac{\lambda_2}{2}\,\mu' C^{-1}\mu$$
and then by $\vec 1$:
$$\vec 1' w = 1 = \frac{\lambda_1}{2}\,\vec 1' C^{-1}\vec 1 + \frac{\lambda_2}{2}\,\vec 1' C^{-1}\mu$$
we can solve for the two Lagrange multipliers:
$$\lambda_1 = \frac{2(K_3 - \mu^* K_2)}{K_4}, \qquad \lambda_2 = \frac{2(\mu^* K_1 - K_2)}{K_4}$$
where:
$$K_1 = \vec 1' C^{-1}\vec 1; \quad K_2 = \vec 1' C^{-1}\mu; \quad K_3 = \mu' C^{-1}\mu; \quad K_4 = K_1 K_3 - K_2^2$$

1 The final wealth in our case.
2 The derivation of the efficient-frontier equation, which we adapt here to our problem, can be found in many books and papers, e.g. [44].
Finally, let us multiply (C.2) on the left by w':
$$0 = 2w'Cw - \lambda_1 w'\vec 1 - \lambda_2 w'\mu = 2\sigma^* - \lambda_1 - \lambda_2\mu^*$$
and, rearranging:
$$\sigma^* = (K_1\mu^{*2} - 2K_2\mu^* + K_3)/K_4$$
This is the equation of the efficient frontier. By a change of variables:
$$\sigma^* = a_1\mu^{*2} + a_2\mu^* + a_3 \qquad (C.4)$$
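A small sketch of these frontier coefficients follows; the total moments fed in are illustrative (a diagonal matrix stands in for the full total covariance).

```python
import numpy as np

def frontier_coefficients(C, mu):
    """Coefficients of the frontier sigma* = a1 mu*^2 + a2 mu* + a3 (eq. C.4)."""
    ones = np.ones(len(mu))
    Ci = np.linalg.inv(C)
    K1, K2, K3 = ones @ Ci @ ones, ones @ Ci @ mu, mu @ Ci @ mu
    K4 = K1 * K3 - K2 ** 2
    return K1 / K4, -2 * K2 / K4, K3 / K4

# Hypothetical total moments for one value of k (mu_j = 1 + r_j as in the text)
mu = 1.0 + np.array([0.0166, 0.0153, 0.0173, 0.0154])
C = np.diag([0.0236, 0.0396, 0.0914, 0.0157]) ** 2      # placeholder covariance
a1, a2, a3 = frontier_coefficients(C, mu)
mu_grid = np.linspace(mu.min(), mu.max(), 5)
sigma_star = a1 * mu_grid ** 2 + a2 * mu_grid + a3       # frontier variance
```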
A plot of the efficient frontier as a function of the normalization factor k is depicted in Fig. C.1 for the example in Chapter 5. We can see a non-monotonic behaviour. As k grows, the convexity of the frontier grows, becoming maximal in the figure for k = 10% (among the curves represented there); it then decreases for higher values of k. This means that, for a fixed return of, say, 2.5%, the maximum variance is not attained at the highest values of k. In order to better understand this behaviour we must study, for each k, the minimum (the µ* with the smallest variance) and the concavity, given by the second derivative.
Taking the first derivative with respect to µ* in order to find the minimum:
$$\frac{\partial\sigma^*}{\partial\mu^*} = 2a_1\mu^*_{min} + a_2 = 0 \quad\Rightarrow\quad \mu^*_{min} = -\frac{a_2}{2a_1}$$
with variance at this point obtained by plugging this expression into (C.4):
$$\sigma^*_{min} = -\frac{a_2^2}{4a_1} + a_3$$
and for the concavity:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = 2a_1 \qquad (C.5)$$
We can then study how the minimum and the concavity change as k changes.
In Fig. C.2 the standard deviation of the minimum variance portfolio is represented as a function of k. We can see that it grows as k grows. This is in line with the intuition that a global increase of volatility, such as the one observed during stressed markets, shifts the efficient frontier to the right. Such a global increase in volatility during stressed markets (i.e. for k > 0) is confirmed by the growing value of the total variance for all the asset classes as a function of k (see Table 5.2)3.
The concavity is shown in Fig. C.3. We see that the efficient frontier with the largest second derivative is the one for the market around k = 0.05. Intuitively, the bigger the curvature of the efficient frontier, the smaller the diversification potential4: in a frontier with bigger curvature, more extra risk must be taken for a given increase in return. In line with this logic, the market for k = 0.05 offers fewer diversification benefits. We can gain additional intuition by examining the simple case of 2 assets only. Expressing Eq. (C.5) as a summation:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = \frac{2\sum_{ij} C^{-1}_{ij}}{\sum_{ijml} C^{-1}_{ml}\,C^{-1}_{ij}\,(\mu_i\mu_j - \mu_l\mu_j)} \qquad (C.6)$$
this reduces, in the two-dimensional case, to:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = 2\,\frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{(\mu_1 - \mu_2)^2} \qquad (C.7)$$
Recalling the definitions of the total moments, we see that the denominator is the square of a linear function of k. The minimum of this function occurs for equal drifts $\mu_1 = \mu_2$, and the value $k_{min}$ can be found by taking the first derivative and equating it to 0. We thus obtain:
$$k_{min} = -\,\frac{\bar E[\mu_1] - \bar E[\mu_2]}{\dfrac{P(E_1)L_1 - P(E_2)L_2}{1-p_0} - \big(\bar E[\mu_1] - \bar E[\mu_2]\big)}$$

3 With the exception of Equity.
4 The curvature as a measure of diversification potential is used, for example, in [6] to study the Asian markets before, during and after the crisis of 1997.
which, for a portfolio composed of, say, Bond and Equity only, gives kmin = 0.07. The numerator of (C.7) is also quadratic in k but always ≥ 0. For the case of Bond and Equity, the numerator is monotonically decreasing in k over the interval k ∈ (0, 1). Thus the expression (C.7) diverges around k = 0.07 and gives rise to a degenerate solution of infinite concavity.
Suppose that the parameter k that appears in σ(k) and µ(k) follows a distribution g(k). We assume that g(k) is the maximum entropy distribution chosen according to the criteria of Section 4.5. We can then integrate the minimum variance σ*(k, µ*) with respect to k, weighted by the probability of that k, g(k). We then define the Maximum Entropy efficient frontier as the frontier given by the value of σ resulting from this integration over k, which we denote $\bar\sigma^*$. The Maximum Entropy efficient frontier is depicted in Fig. C.4 for a truncated Gaussian with Ω = 0.01.
Figure C.1: Stressed efficient frontier: k = 0 - blue, k = 0.1 - red, k = 0.15 - green, k = 0.2 - yellow

Figure C.2: Standard deviation of the minimum variance portfolio as a function of k

Figure C.3: Concavity of the efficient frontier as a function of k

Figure C.4: Maximum entropy efficient frontier, frontier at k = 0 and frontier obtained with the covariance matrix and drifts before truncation of the outliers.
Appendix D
Influence plots
This Appendix follows [12] and describes a procedure for calculating the influence of a single outlier on correlations. A suitable tool for this is the sample influence function, defined as follows:
$$I^-(y_i;\theta) = (n-1)\,(\theta - \theta_{-i}) \qquad (i = 1,\ldots,n) \qquad (D.1)$$
where the y are the observations, n is their number, θ is an estimator and $\theta_{-i}$ is the same estimator with the i-th observation removed. In our case we focus on the estimator of the correlation between two variables. Given two variables with correlation ρ, the influence function for their bivariate distribution (which can be of any type) with finite second moments is shown to be:
$$I(y_1; y_2;\rho) = -\tfrac{1}{2}\,\rho\,(\tilde y_1^2 + \tilde y_2^2) + \tilde y_1\tilde y_2 \qquad (D.2)$$
where $\tilde y_1$ and $\tilde y_2$ are standardized forms of $y_1$ and $y_2$. If we now denote by $z_1$ and $z_2$ the standardized sum and difference of $y_1$ and $y_2$, and further define $u_1 = (z_1 + z_2)/\sqrt 2$ and $u_2 = (z_1 - z_2)/\sqrt 2$, then (D.2) can be written as:
$$I(y_1; y_2;\rho) = (1-\rho^2)\,u_1 u_2 \qquad (D.3)$$
Using the sample analogue of (D.3) together with (D.1), we arrive at the following approximation:
$$I(y_{i1}; y_{i2}; r) = (n-1)(r - r_{-i}) \approx (1-r^2)\,u_{i1}u_{i2} \qquad (D.4)$$
where r is the sample correlation coefficient and $u_{i1}$, $u_{i2}$ are the sample analogues of $u_1$, $u_2$, namely:
$$u_{i1} = \tfrac{1}{2}\sqrt n\,\Big[(d_{i1} + d_{i2})/\sqrt{1+r} + (d_{i1} - d_{i2})/\sqrt{1-r}\Big]$$
$$u_{i2} = \tfrac{1}{2}\sqrt n\,\Big[(d_{i1} + d_{i2})/\sqrt{1+r} - (d_{i1} - d_{i2})/\sqrt{1-r}\Big]$$
with $d_{ij} = (y_{ij} - m_j)/\sqrt{a_{jj}}$, $m_j = (\sum_i y_{ij})/n$ and $a_{jj} = \sum_i (y_{ij} - m_j)^2$. The suggestion is then to use the approximation (D.4) as a function of two variables and superimpose selected contours of this function, i.e. hyperbolae, directly on the scatter plot. As an alternative, one can also use (D.2) together with the l.h.s. of (D.4) without transforming to the variables $u_1$, $u_2$. This is exactly what is done in Figs. 2.13 and 2.14.
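A short sketch comparing the approximation (D.4) with the exact leave-one-out influence on simulated data is given below; the data and variable names are illustrative only.

```python
import numpy as np

def correlation_influence(y1, y2):
    """Approximate influence of each observation on r via (D.4): (1-r^2) u_i1 u_i2."""
    n = len(y1)
    r = np.corrcoef(y1, y2)[0, 1]
    d1 = (y1 - y1.mean()) / np.sqrt(((y1 - y1.mean()) ** 2).sum())
    d2 = (y2 - y2.mean()) / np.sqrt(((y2 - y2.mean()) ** 2).sum())
    u1 = 0.5 * np.sqrt(n) * ((d1 + d2) / np.sqrt(1 + r) + (d1 - d2) / np.sqrt(1 - r))
    u2 = 0.5 * np.sqrt(n) * ((d1 + d2) / np.sqrt(1 + r) - (d1 - d2) / np.sqrt(1 - r))
    return (1 - r ** 2) * u1 * u2

def exact_influence(y1, y2):
    """Exact (n-1)(r - r_{-i}), recomputing r with each point left out."""
    n = len(y1)
    r = np.corrcoef(y1, y2)[0, 1]
    r_minus = np.array([np.corrcoef(np.delete(y1, i), np.delete(y2, i))[0, 1]
                        for i in range(n)])
    return (n - 1) * (r - r_minus)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.5 * x + rng.standard_normal(500)
print(np.corrcoef(correlation_influence(x, y), exact_influence(x, y))[0, 1])
```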
Appendix E
Truncated distributions
In this Appendix we report without proof the first two moments of the truncated
exponential and truncated Gaussian distributions. The proofs are articulated in [48],
Chapter ”Truncated Distributions”.
The truncated exponential distribution has the following density:
$$f(y) = \frac{e^{-y/\mu}/\mu}{1 - e^{-b/\mu}} \qquad (E.1)$$
where $y \in (0, b]$. If we express b as a multiple of µ (b = lµ), then the moments are:
$$E[Y] = \mu\,\frac{1 - (l+1)e^{-l}}{1 - e^{-l}}$$
and:
$$E[Y^2] = 2\mu^2\,\frac{1 - (l^2 + 2l + 2)\,e^{-l}/2}{1 - e^{-l}}$$
If l is large enough:
$$E[Y] \approx \mu, \qquad E[Y^2] \approx 2\mu^2$$
The variance is then:
$$Var[Y] \approx 2\mu^2 - \mu^2 = \mu^2$$
and the condition for the truncated Gaussian becomes:
$$m_2(E[Y]) = E[Y]^2 + \Omega^2 < m_2^{te}(E[Y]) = 2E[Y]^2$$
or
$$\Omega < E[Y]$$
and vice versa for the U-shaped distribution.
The truncated Gaussian density has the following form:
$$f(y) = \frac{\frac{1}{\sqrt{2\pi}\,\sigma}\exp\big(-\frac{(y-mo)^2}{2\sigma^2}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\;I_{[a,b]}(y) \qquad (E.2)$$
where Φ is the standard normal cdf, φ is the standard normal probability density function, mo is the mode and I is the indicator function:
$$I_{[a,b]}(y) = 1 \quad \text{if } a \le y \le b$$
The first moment is:
$$E[Y] = mo + \frac{\phi\big(\frac{a-mo}{\sigma}\big) - \phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\,\sigma$$
and the variance:
$$Var[Y] = \sigma^2\Bigg[1 + \frac{\frac{a-mo}{\sigma}\,\phi\big(\frac{a-mo}{\sigma}\big) - \frac{b-mo}{\sigma}\,\phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)} - \Bigg(\frac{\phi\big(\frac{a-mo}{\sigma}\big) - \phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\Bigg)^2\Bigg]$$
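These two moments can be evaluated directly, for example as follows (scipy's standard normal cdf and pdf are used; the numerical values in the example are illustrative).

```python
import numpy as np
from scipy.stats import norm

def truncated_gaussian_moments(mo, sigma, a=0.0, b=1.0):
    """Mean and variance of a Gaussian with mode `mo` and scale `sigma`
    truncated to [a, b] (the formulas reported above)."""
    alpha, beta = (a - mo) / sigma, (b - mo) / sigma
    Z = norm.cdf(beta) - norm.cdf(alpha)
    mean = mo + (norm.pdf(alpha) - norm.pdf(beta)) / Z * sigma
    var = sigma ** 2 * (1 + (alpha * norm.pdf(alpha) - beta * norm.pdf(beta)) / Z
                        - ((norm.pdf(alpha) - norm.pdf(beta)) / Z) ** 2)
    return mean, var

# Example: g(k) truncated to [0, 1] with mode 0.1619 (the value used in Section 5.3)
m, v = truncated_gaussian_moments(mo=0.1619, sigma=0.05)
print(m, np.sqrt(v))
```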
Appendix F
The Johnson distribution
The Johnson distribution was first introduced in [28]. Johnson noticed that a bounded random variable k can be turned into a normal one by the transformation:
$$z = \gamma + \delta\,\log\frac{k - \xi}{\xi + \lambda - k} = h(k) \qquad (\xi < k < \xi + \lambda) \qquad (F.1)$$
where z is a standardized normal variable (i.e. N(0, 1)) and ξ, λ, γ and δ are parameters with the following domains:
$$\xi < k < \xi + \lambda, \quad -\infty < \gamma < \infty, \quad \lambda > 0, \quad \xi \ge 0, \quad \delta > 0.$$
The Johnson SB distribution, which is the distribution of k, can be deduced from1:
$$f(k) = N(h(k))\,\big|dh(k)/dk\big| \qquad (F.2)$$
or, more explicitly:
$$f(k) = \frac{\delta}{\sqrt{2\pi}}\,\frac{\lambda}{(k-\xi)(\xi+\lambda-k)}\,\exp\Big\{-\frac{1}{2}\Big[\gamma + \delta\log\Big(\frac{k-\xi}{\xi+\lambda-k}\Big)\Big]^2\Big\} \qquad (F.3)$$
The parameters have the following interpretation: ξ is a location factor, λ is a scale factor, γ and δ determine the shape. In our case we want the variable k to vary within [0, b], so we can set ξ = 0 and λ = b. Substituting δ = 1/σ and γ = −µ/σ, we recover the expression (4.8):
$$g(k) = \frac{b}{\sqrt{2\pi}\,\sigma\,k(b-k)}\,\exp\Big[-\frac{1}{2}\Big(\frac{\log\big(\frac{k}{b-k}\big) - \mu}{\sigma}\Big)^2\Big] \qquad 0 \le k \le b \qquad (F.4)$$
If we expand (F.4) we get:
$$g(k) = \frac{b\,e^{-\frac{\mu^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}\;k^{-\frac{1}{2\sigma^2}\log(k) - (-\frac{\mu}{\sigma^2}+1)}\;(b-k)^{-\frac{1}{2\sigma^2}\log(b-k) + (-\frac{\mu}{\sigma^2}-1)}\;e^{\log(k)\log(b-k)/\sigma^2}$$
so that:
$$\lim_{k\to 0} k^{-n}\,g(k) = \lim_{k\to b}(b-k)^{-n}\,g(k) = 0$$
for any value of n. The distribution curve of k therefore has 'high contact' at either end of its range of variation. Taking the first derivative of g(k) with respect to k and equating it to 0:
$$2k - b = \Big(\log\frac{k}{b-k} - \mu\Big)\frac{b}{\sigma^2}$$
and calling m the value of k at the mode, we obtain the following useful expression for σ with b = 1, which is used in Section 4.5:
$$\sigma^2 = \frac{-\mu + \log\big(\frac{m}{1-m}\big)}{2m - 1}$$

1 See also [51].
The shape of the Johnson distribution for different values of its variance and m =
0.1619 is shown in Fig.F.1
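The density (F.4) and the mode-matching expression for σ can be sketched as follows; the value of µ used in the example is a hypothetical choice, not a thesis parameter.

```python
import numpy as np

def johnson_sb_pdf(k, mu, sigma, b=1.0):
    """Johnson SB density g(k) of eq. (F.4) on (0, b)."""
    k = np.asarray(k, dtype=float)
    z = (np.log(k / (b - k)) - mu) / sigma
    return b / (np.sqrt(2 * np.pi) * sigma * k * (b - k)) * np.exp(-0.5 * z ** 2)

def sigma_from_mode(m, mu):
    """Scale parameter implied by a chosen mode m (the b = 1 expression above)."""
    return np.sqrt((-mu + np.log(m / (1 - m))) / (2 * m - 1))

# Example: pin the mode at 0.1619 (as in Section 5.3) for a hypothetical mu
mu = -1.0
sigma = sigma_from_mode(0.1619, mu)
grid = np.linspace(1e-4, 1 - 1e-4, 5)
print(sigma, johnson_sb_pdf(grid, mu, sigma))
```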
Appendix G

Truncation algorithm applied to a subset

In this Appendix we apply the truncation algorithm of Chapter 2 to the last 500 datapoints of the dataset described in the Introduction. In the process of truncation we observe the same qualitative behaviour of the monitored quantities as for the entire dataset. In particular, after a series of significant variations, the monitored quantities stabilize.
Figure G.1: Volume of the ellipsoid as a function of the number of points removed

Figure G.2: Variation for Fig. G.1

Figure G.3: Minimum covariance determinant as a function of the number of points removed

Figure G.4: Variation for Fig. G.3

Figure G.5: Correlation matrix determinant as a function of the number of points removed

Figure G.6: Variation for Fig. G.5

Figure G.7: Correlations as a function of the number of points removed. The symbol ρ signifies correlation and the subscripts C, B, M and E denote Credit, Bond, Mortgage and Equity, respectively, as described in the Introduction.

Figure G.8: Variation for Fig. G.7

Figure G.9: Variation in the eigenvalues of the correlation matrix as a function of the number of points removed. The eigenvalues are ordered from the smallest λ1 to the largest λ4.
References
[1] F. Black and R. Litterman. Global portfolio optimization. Financial Analysts
Journal, September:28–43, 1992.
[2] O.J. Blanchard and W.M. Watson. Bubbles, rational expectations and specula-
tive markets. In Wachtel P., editor, Crisis in Economic and Financial Structure:
Bubbles, Bursts and Shocks. Lexington Books, 1982.
[3] J-P. Bouchaud and M. Potters. Theory of Financial Risks: From Statistical Physics to Risk Management. Cambridge University Press, 2000.
[4] F. Busetti and A. Harvey. When is a Copula Constant? A Test for Changing
Relationships. Journal of Financial Econometrics, 2010.
[5] S. Ceria and R.A. Stubbs. Incorporating estimation errors into portfolio selection: Robust efficient frontiers. Technical report, Axioma Inc., 2004.
[6] R. Chakrabarti and R. Roll. East Asia and Europe during the 1997 Asian collapse: A clinical study of a financial crisis. Journal of Financial Markets, 5(1):1–30, 2002.
[7] A. Charnes and W.W. Cooper. Programming with linear fractional functionals.
Naval Research Logistics Quarterly, 9(3-4):181–186, 1962.
[8] U. Cherubini and E. Luciano. Copula methods in finance. John Wiley & Sons,
2004.
[9] V.K. Chopra and W.T. Ziemba. The effect of errors in means, variances, and co-
variances on optimal portfolio choice. Journal of Portfolio Management, 19(2):6–
11, 1993.
[10] G. Connor, L. R. Goldberg, and R. A. Korajczyk. Portfolio Risk Analysis.
Princeton University Press, 2010.
[11] R. Cont. Empirical properties of asset returns: stylized facts and statistical
issues. Quantitative Finance, 1(5):223–236, 2001.
[12] S.J. Devlin, R. Gnanadesikan, and J.R. Kettenring. Robust estimation and
outlier detection with correlation coefficients. Biometrika, 62(3):531–545, 1975.
[13] A. Dias and P. Embrechts. Dynamic copula models for multivariate high-
frequency data in finance. 2004.
[14] D.C. Dowson and A. Wragg. Maximum-entropy distributions having prescribed first and second moments. IEEE Transactions on Information Theory, 19(5):688–693, 1973.
[15] B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence
intervals and other measures of statistical accuracy. Statistical Science, 1(1):54–
75, 1986.
[16] M. A. El-Erian. Looking ahead. Journal of Portfolio Management, 36(2):4, 2010.
[17] J. A. Feigenbaum. More on a statistical analysis of log-periodic precursors to
financial crashes. Quantitative Finance, 1(5):527–532, 2001.
[18] T.S. Ferguson. On the rejection of outliers. In Proceedings of the 4th Berkeley
Symposium on Mathematical Statistics and Probability, pages 253–287, 1961.
[19] R. Frydman and M. D. Goldberg. Imperfect Knowledge Economics: Exchange
Rates and Risk. Princeton University Press, 2007.
[20] R. Frydman and M. D. Goldberg. Financial markets and the State: Long swings,
risk and the scope of regulation. Capitalism and Society, 4(2):Article 2, 2009.
[21] C. Genest, B. Remillard, and D. Beaudoin. Goodness-of-fit tests for copulas: a
review and power study. Insurance: Mathematics and Economics, 44(2):199–213,
2009.
[22] A. Gilio. Goodness-of-fit tests for copulas: a review and power study. Lecture
Notes in Computer Science, 945:200–209, 1995.
[23] J. Goodman. A comment to the maximum entropy principle. Risk Analysis,
7(2):269–272, 1987.
[24] F.E. Grubbs. Procedures for detecting outlying observations in samples. Tech-
nometrics, 11(1):1–21, 1969.
[25] H.A. Hauksson, M. Dacorogna, T. Domenig, U. Muller, and G. Samorodnitsky.
Multivariate extremes, aggregation and risk estimation. Quantitative Finance,
1(1):79–95, 2001.
[26] U. Herold. Computing implied returns in a meaningful way. Journal of asset
management, 6(1):53–64, 2005.
[27] E.T. Jaynes. Information theory and statistical mechanics. The Physical Review, 106(4):620–630, 1957.
[28] N.L. Johnson. Systems of frequency curves generated by methods of translation.
Biometrika, 36(1/2):149–176, 1949.
[29] E. Jondeau and M. Rockinger. Optimal portfolio allocation under higher mo-
ments. European Financial Management, 12(1):29–55, 2006.
[30] C.P. Kindleberger and R. Aliber. Manias, Panics, and Crashes: A History of Financial Crises. Wiley Investment Classics, 2005.
[31] E. Kole, K. Koedijk, and M. Verbeek. Selecting copulas for risk management. Journal of Banking & Finance, 31(8):2405–2423, 2007.
[32] M. Kritzman and Y. Li. Skulls, financial turbulence and risk management. Fi-
nancial Analysts Journal, 66(5):30–41, 2010.
[33] J. Kwiatkowski and R. Rebonato. A coherent aggregation framework for stress
testing and scenario analysis. accepted for publication in Applied Mathematical
Finance, 2010.
[34] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and
variance unknown. Journal of the American Statistical Association, 62(318):399–
402, 1967.
[35] F. Lillo and R.N. Mantegna. Symmetry alteration of ensemble return distribu-
tion in crash and rally days of financial markets. European Physical Journal B,
15(4):603–606, 2000.
[36] F. Longin and B. Solnik. Is the correlation in international equity returns constant: 1960–1990? Journal of International Money and Finance, 14(1):3–26, 1995.
[37] Y. Malevergne and D. Sornette. Testing the Gaussian copula hypothesis for financial assets dependences. Finance 0111003, EconWPA, November 2001.
[38] B. Mandelbrot. The variation of certain speculative prices. The Journal of
Business, 36(4):394–420, 1963.
[39] H. Markowitz. Portfolio selection: efficient diversification of investments. Black-
well, 1959,1991.
[40] H. Markowitz. Mean-Variance Analysis in Portfolio Choice and Capital Markets.
Blackwell, 1987.
[41] R. Mashal and A. Zeevi. Beyond correlation: Extreme co-movements between financial assets, 2002.
[42] R. Merton. Optimum Consumption and Portfolio Rules in a Continuous Time
Model. Journal of Economic Theory, 3(4):373–413, 1971.
[43] A. Meucci. Beyond Black-Litterman: views on non-normal markets. RISK, pages
87–92, February 2006.
[44] A. Meucci. Risk and Asset Allocation. Springer, 2007.
[45] H. Moskowitz and R.K. Sarin. Improving the consistency of conditional prob-
ability assessment for forecasting and decision making. Management Science,
29(6):735–749, 1983.
[46] D. Nowakowski and N. Gurushina. The history of risks: Correlations, volatilities and z-scores. Technical report, Roubini Global Economics, 2010.
[47] Bank for International Settlements. Principles for sound stress testing practices and supervision. 2009.
[48] D.J. Olive. Applied Robust Statistics. 2008.
[49] J. Olmo. Extreme value theory filtering techniques for outlier detection. Tech-
nical report, City University Economics Discussion Papers No 09/09, 2009.
[50] Davidson P. Risk and uncertainty. In Robert Skidelsky and Christian Westerlind
Wingstrom, editors, The Economic Crisis and the State of Economics. Palgrave
Macmillan, 2010.
[51] B.R. Parresol. Recovering parameters of Johnson's SB distribution. Technical report, United States Department of Agriculture, Research Paper SRS-31, 2003.
[52] J. Pearl. Causality: models, reasoning, and inference. Cambridge University
Press, 2009.
[53] W. Poston, E.J. Wegman, C.E. Priebe, and J.L. Solka. A deterministic method
for robust estimation of multivariate location and shape. Journal of Computa-
tional and Graphical Statistics, 6(3):300–313, 1997.
[54] R. Rebonato. A Bayesian approach to coherent stress testing. In Rethinking Risk
Measurement. Incisive Media, 2010.
[55] R. Rebonato. A Bayesian approach to stress testing and scenario analysis. Jour-
nal of Investment Management, 8(3):1–13, 2010.
[56] R. Rebonato. Coherent Stress Testing – A Bayesian Approach to Financial
Stress. John Wiley, 2010.
[57] R. Rebonato and J. Chen. Evidence for state transition and altered serial co-dependence in US interest rates. Quantitative Finance, 9(3):259–278, 2009.
[58] R. Rebonato and V. Gaspari. Analysis of drawdowns and drawups in the US
interest-rate market. Quantitative Finance, 6(4):297–326, 2006.
[59] B. Remillard, N.A. Papageorgiou, and F. Soustra. Dynamic Copulas. SSRN
eLibrary, 2010.
[60] J-C. Rochet. The future of banking regulation. In J-C Rochet M Dewatripont
and J Tirole, editors, Balancing the Banks - Global Lessons from the Financial
Crisis. Princeton University Press, 2010.
[61] P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outlier Detection. John
Wiley, 1987.
[62] P.J. Rousseeuw and S. Van Aelst. Minimum volume ellipsoid. Wiley Interdisci-
plinary Reviews: Computational Statistics, 1(1):71–82, 2010.
[63] P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
[64] P. Samuelson. Lifetime portfolio selection by dynamic stochastic programming.
The review of economics and statistics, 51(3):239–246, 1969.
[65] S. Satchell and A. Scowcroft. A demystification of the Black - Litterman model:
Managing quantitative and traditional portfolio construction. Journal of Asset
Management, 1(2):138–150, 2000.
[66] C. Schinkus. Is econophysics a new discipline? the neopositivist argument.
Physica A, 389(18):3814–3821, 2010.
[67] C. Schluter and M. Trede. Identifying multiple outliers in heavy-tailed distributions with an application to market crashes. Journal of Empirical Finance, 15(4):700–713, 2008.
[68] W.F. Sharpe. Expected utility asset allocation. Financial Analysts Journal, 63(5):18–30, 2007.
[69] R.J. Shiller. Irrational Exuberance. Princeton University Press, 2005.
[70] D. Sornette. Significance of log-periodic precursors to financial crashes. Quanti-
tative Finance, 1(4):452–471, 2001.
[71] D. Sornette. Critical Phenomena in Natural Sciences. Springer Verlag, 2004.
[72] D. Sornette. Dragon-kings, black swans and the prediction of crises. International
Journal of Terraspace Science and Engineering, 2(1):1–18, 2009.
[73] C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability, 1956.
[74] J.C. Strelen and F. Nassaj. Analysis and generation of random vectors with copulas. In WSC '07: Proceedings of the 39th Conference on Winter Simulation, 2007.
[75] D.M. Titterington. Optimal design: Some geometrical aspects of D-optimality. Biometrika, 62:311–320, 1975.
[76] R. Tzani and A. P. Polychronakos. Correlation breakdown, copula credit models
and arbitrage. GARP Risk Review, pages 27–37, December 2008.
[77] J. Walters. The Black-Litterman model in detail. 2009.
[78] J. Williamson. Bayesian nets and causality. Oxford University Press, 2005.