Coherent Asset Allocation and Diversification in the Presence of
Stress Events
Alexander Denev
Christ Church
University of Oxford
A thesis submitted in partial fulfillment of the MSc in
Mathematical Finance
April 18, 2011
Acknowledgements
I would like to thank Dr. Riccardo Rebonato not only for his precious
advice but also for giving me the chance to work on such an intriguing topic.
Working on it has been real fun and a pleasure for me. I would also like
to thank all my friends and my mother for their continuous support and
encouragement in finishing this thesis.
Abstract
The backward-looking approach of Markowitz theory, and even of some
of its improved versions, does not reflect the necessity to consider information
present in the current world about events that might create market
dislocations. Information about such events might be absent from any
historical dataset, but not taking it into consideration could seriously
understate the risk of many asset classes. This necessity is underlined in
numerous publications by scholars, practitioners and regulators, especially
after the subprime debacle of 2007-2009. This sentiment is already incorporated
in the strategies of investors who protect their portfolios with insurance trades
in case an extreme event, or a series of such events, materializes. However,
this strategy is realized ex post, i.e. after running mathematical models
to calculate optimal asset allocations, as an afterthought. A mathematical
tool that satisfies both the need for diversification and the need for
forward-looking information about stress events, in an integrated fashion,
is still missing. This thesis tries to fill this gap by introducing a new framework
for asset allocation in the presence of stress events carried out in a co-
herent way and not ad hoc, as an afterthought. We propose a procedure
that blends standard statistical techniques such as correlations, copulae
etc. with a subjective Bayesian approach based on causality assumptions
to model stress events. The approach is highly transparent, auditable and
easy for non-specialists to understand.
Because the practical application of the procedure is of extreme importance,
we also introduce some original tools that ease its implementation
and improve its robustness and calculation speed. More specifically, we present:
i) a maximum entropy approach for the degree of confidence in the model;
ii) a fast numerical technique which avoids continuous repetition of Monte
Carlo simulations; iii) a linear programming technique which helps the
estimation of probabilities; and iv) the concept of a stressed efficient frontier.
Finally, we suggest a link with the Black-Litterman model which becomes
in this context a particular case of our approach.
Contents
1 Introduction 1
1.1 Outline of the Problem and Dataset . . . . . . . . . . . . . . . . . . . 6
2 Statistics 8
2.1 Identification of Normal Data . . . . . . . . . . . . . . . . . . . . . . 8
2.1.1 Some Remarks About the MVE, MCD and Related Methods . 14
2.2 Constructing the Marginal Distribution of the ‘Normal’ Risk Factors . 24
2.3 Constructing a Joint Distribution of ‘Normal’ Risk Factors . . . . . . 27
3 Causality 36
3.1 Treatment of Exceptional Events . . . . . . . . . . . . . . . . . . . . 36
3.2 Bayesian nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3 Constructing a Bayesian Net . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.1 Building the Marginal and Conditional Probability Tables . . 41
3.3.2 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . 42
3.3.3 From the Conditional Probability Tables to the Joint Probabil-
ities of Rare Events . . . . . . . . . . . . . . . . . . . . . . . . 45
4 Fusion 46
4.1 Mapping from the ‘Normal’ Joint Distribution to the ‘Normal’ Return
Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2 Mapping from the Exceptional Joint Distribution to the Exceptional
Return Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.3 Splicing the ‘Normal’ and ‘Exceptional’ Portion of the P&L Distribution 47
4.4 Maximization of the Utility . . . . . . . . . . . . . . . . . . . . . . . 48
4.4.1 Numerical optimization . . . . . . . . . . . . . . . . . . . . . . 50
4.5 Maximum Entropy for the choice of k . . . . . . . . . . . . . . . . . . 51
5 A Worked-Out Example 56
5.1 Construction of the Bayesian Net . . . . . . . . . . . . . . . . . . . . 56
5.2 Discussion of the Results . . . . . . . . . . . . . . . . . . . . . . . . . 60
5.3 Maximum entropy allocations . . . . . . . . . . . . . . . . . . . . . . 65
5.4 A fast algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.4.1 Gaining intuition . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.4.2 Generalizing the results . . . . . . . . . . . . . . . . . . . . . 74
6 Conclusions 78
6.1 Future research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
A Total moments 81
A.0.1 First moment . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.0.2 Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.0.3 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.0.4 Utility maximization . . . . . . . . . . . . . . . . . . . . . . . 83
B Fusion with the Black-Litterman approach 85
B.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
B.2 Investor views and implied drifts . . . . . . . . . . . . . . . . . . . . 87
B.2.1 Implied drifts in the case of knowledge contained in the market 88
B.2.2 Implied drifts in the case of specific knowledge . . . . . . . . . 90
B.2.3 Summing up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
C Efficient frontier 92
D Influence plots 98
E Truncated distributions 100
F The Johnson distribution 102
G Truncation algorithm applied to a subset 104
Bibliography 104
List of Figures
2.1 Robust Mahalanobis distance vs date of observation. The major crises
are clearly visible in terms of the concentration of large values of the
robust distance: the Russian crisis, the dot-com crisis and the recent
subprime debacle. The red line corresponds to χ²_4(0.975) = 3.34. The
parameter h was set to [0.75N] (this is also the value adopted in [63]).
The two peaks on the rightmost part of the figure correspond to the
dates immediately after Lehman’s default: 15 and 17 September 2008.
The highest peak in the middle is the reopening of the stock exchange
after 9/11. The highest peak on the left is the 31 August 1998, a few
days after Russia defaulted on its debt. . . . . . . . . . . . . . . . . . 11
2.2 Volume of the ellipsoid as a function of the number of points removed 15
2.3 Variation for Fig 2.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4 Minimum covariance determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Variation for Fig 2.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Correlation matrix determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7 Variation for Fig. 2.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Correlations as a function of the number of points removed. The sym-
bol ρ signifies correlation and the subscripts C, B, M and E denote
Credit, Bond, Mortgage and Equity, respectively, as described in the
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.9 Variation for Fig. 2.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.10 Variation in the eigenvalues of the correlation matrix as a function of
the number of points removed. The eigenvalues are ordered from the
smallest λ1 to the largest λ4. . . . . . . . . . . . . . . . . . . . . . . . 20
2.11 Body (circles) and outliers (stars) after the removal of 160 points. . . 21
2.12 Robust Mahalanobis distance vs date of observation of the ’body’. The
red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to
[0.75N ]. For this graph the outliers were not removed but replaced
by the average of the dataset in order to have historical comparability
with Figure 2.1. The extreme peaks of Figure 2.1 are no longer present. 21
2.13 Influence plot of the standardized returns of Bond and Credit. The in-
fluence contour lines show approximately by how much the correlation
coefficient varies if we remove a single point sitting on that line. The red
points represent the outliers eliminated by the truncation algorithm.
Some outliers may appear as not influential on the two dimensional
graph but we must remember that the truncation was carried out in
a 4-dimensional space. In such a higher-dimensional space they are
influential. The method used here to draw the plot is the one presented
in Appendix D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.14 Influence plot of the standardized returns of Bond and Equity. See
Fig 2.13 for an explanation. . . . . . . . . . . . . . . . . . . . . . . . 23
2.15 The fit to the S&P daily returns obtained using a Gaussian and a
t-Student distribution. . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.16 The Q-Q plot for the two fits in Figure 2.15. . . . . . . . . . . . . . . 26
2.17 Scatterplot between Bond-Mortgage for the five different subsets. In
the lower-right corner the scatterplot on the entire dataset is shown. The
correlations are respectively 88.1%, 87.1%, 93.1%, 91.3%, 81.6% and
86.5%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.18 Scatterplot between Bond-Equity for the five different subsets. In the
lower-right corner the scatterplot on the entire dataset is shown. The cor-
relations are respectively −13.1%, −43.2%, −2.9%, −29.6%, −43.5%
and −22.2%. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.19 Correlation between Bond and Equity for the entire dataset. The cor-
relation is calculated within a sliding window of 250 datapoints. . . . 34
2.20 Gaussian copula. Cumulative distributions of the four distances used
to assess the GOF of the copula for the last subset. . . . . . . . . . . 34
2.21 t copula. Cumulative distributions of the four distances used to assess
the GOF of the copula for the last subset. . . . . . . . . . . . . . . . 35
2.22 Scatterplot of the generated random numbers with the fitted copula
(and mapped inversely from [0, 1] to the real axis with the help of the
fitted marginals) for Bond and Mortgage. . . . . . . . . . . . . . . . . 35
3.1 Example of a Bayesian net . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1 The Bayesian net used for the example in this Section . . . . . . . . . 57
5.2 The asset allocation to the four sub-portfolios as a function of the total
probability mass of being in a normal state obtained with a logarithmic
utility function. The allocation on the far right is the Markowitz-like
allocation that would apply in an exclusively normal world. . . . . . . 60
5.3 Same as Fig 5.2 for a power utility function β = 0.8. . . . . . . . . . . 61
5.4 Same as Fig 5.2 for a power utility function β = 1.2. . . . . . . . . . . 61
5.5 Allocations with a two-moment simplified optimization approach for a
logarithmic utility function. . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Allocations as a function of the degree of confidence - truncated Gaussian 65
5.7 Allocations as a function of the degree of confidence - Johnson . . . . 69
5.8 Logarithmic utility function for two sets of returns as the allocations
vary. The two sets are: plane with squared markers - r = (1%,−4%,−5%);
plane without markers - r = (−1%, 2%, 3%). . . . . . . . . . . . . . . 73
5.9 Allocations as a function of the normalization factor in the case: Monte
Carlo for each k - thin line, numerically inferred - thick line . . . . . . 77
C.1 Stressed efficient frontier: k = 0 - blue, k = 0.1 - red, k = 0.15 - green,
k = 0.2 - yellow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
C.2 Standard deviation of the minimum variance portfolio as a function of k 96
C.3 Concavity of the efficient frontier as a function of k . . . . . . . . . . 97
C.4 Maximum entropy efficient frontier, frontier at k = 0 and frontier ob-
tained with the covariance matrix and drifts before truncation of the
outliers. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
F.1 Johnson distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
G.1 Volume of the ellipsoid as a function of the number of points removed 104
G.2 Variation for Fig G.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
G.3 Minimum covariance determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
G.4 Variation for Fig G.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
G.5 Correlation matrix determinant as a function of the number of points
removed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
G.6 Variation for Fig. G.5 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
G.7 Correlations as a function of the number of points removed. The sym-
bol ρ signifies correlation and the subscripts C, B, M and E denote
Credit, Bond, Mortgage and Equity, respectively, as described in the
Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
G.8 Variation for Fig. G.7 . . . . . . . . . . . . . . . . . . . . . . . . . . 109
G.9 Variation in the eigenvalues of the correlation matrix as a function of
the number of points removed. The eigenvalues are ordered from the
smallest λ1 to the largest λ4. . . . . . . . . . . . . . . . . . . . . . . . 110
Chapter 1
Introduction
In the late 50s Markowitz laid down the foundations of the modern asset allocation
approach [39][40] through a new theory which since then has become one of the main
pillars of investment decisions. Although it has several limitations, the main idea
of diversification, which is at its heart, is still deeply rooted in investors' mindsets
and dictates allocation strategies. The idea, which was not new at the time but had
been known since the times of the Venice merchants, is sometimes reworded simply
as "do not put all the eggs in the same basket". As simple as that, but cast in the
powerful language of mathematics, it gave a quantitative dimension to concepts like
risk and return and their mutual dependence and trade-off. Such notions become
more intuitive by making use of powerful representations such as the efficient frontier,
which gives a straightforward bird's-eye grasp of how these quantities are related. One
of the appealing features of the theory, and also a major drawback, is its mathematical
simplicity. By making simplistic assumptions, closed-form solutions are derived which
lend themselves to immediate understanding and ease of calculation. But even if the main
Markowitz idea did not lose its momentum, the theory in its original form has hardly
been used and has undergone several criticisms and improvements over the decades.
Let's start with some stylized facts of asset return series. In the first place, especially
for daily data, it is widely known that they are not well described by normal distributions, as
is sometimes assumed. Events of more than 6σ populate the tails of the distributions
far more frequently than expected when postulating normality1. Not yet clear in
the 1950s, this fact crystallized as an increasing quantity of high-frequency data
became available2.
1 On longer time scales, however, financial time series exhibit aggregational gaussianity, i.e. the normal approximation becomes increasingly better. See, e.g., [10] Chapter 1, Figures 1.3a to 1.3f, and the discussion in the text.
2 Mandelbrot in the 60s even advanced the hypothesis that financial return series could have infinite variance [38].
Second, volatility exhibits clustering behaviour: periods of high and low volatility
seem to alternate, in contrast with the homoscedastic assumptions of Markowitz.
Third, correlations are assumed constant in the model, while overwhelming
evidence has been given that they are prone to increase during turbulent times3.
Without listing all the stylized statistical facts of return series4, we endorse the
idea that any improvement of portfolio models should, in general, take into account
the dichotomy between turbulent and non-turbulent periods, across which statistical
properties show different patterns. In other words, we start from the fact that there are
periods in which the frequentist statistical description of financial series derived from
non-distressed conditions breaks down and the listed stylized facts become more and
more visible: correlations change dramatically, volatility increases, tail events happen
frequently, drifts change. Our view is that such regimes must be taken into account
for a robust asset allocation. This sentiment is well exemplified by the following
quote by Mohamed A. El-Erian, PIMCO's CEO [16]:
‘...[Recent] developments will serve to further highlight the danger of
[...] being overly committed to an historical policy portfolio whose rigid
backward-looking characterization no longer corresponds to the realities
of today and tomorrow....’ ‘...[W]ith its conventional (or, to be more pre-
cise, reduced-form) analytical foundation now subject to some motion, it
will become even more difficult to rely just on a traditional portfolio di-
versification as both necessary and sufficient to deliver high returns and
mitigate risks. Diversification will remain necessary, but a lot more atten-
tion will be devoted to the appropriate specification of tail scenarios and
tail hedges...’
This view is gaining currency not only among practitioners and many economists, but
even among regulators, who had been wedded until recently to a strictly frequentist
view of risk. See, for instance, the views recently expressed by the Basel Committee:
’...The financial crisis has shown that estimating ex ante the probabil-
ities of stress events is problematic. The statistical relationships used to
3 As Longin and Solnik show empirically in a seminal paper [36] by using extreme value theory, the null hypothesis of multivariate normality for the negative tail of the joint return distribution for the examined equity indices can be rejected, but not for the positive tail, i.e. markets become more correlated when they are bearish. For an empirical summary of how correlations changed during the global financial crisis see [46]. For an empirical study of the joint extremes of foreign exchange series see [25].
4A detailed summary is given in [11].
derive the probability tend to break down in stressed conditions. In this
respect, the crisis has underscored the importance of giving appropriate
weight to expert judgment in defining relevant scenarios with a forward
looking perspective...’ - The Basel Committee [47]
Another sentence is quite to the point [60]:
’...A systemic crisis is by definition rare event, and statistical models
drawing on data from ”normal” periods are inappropriate; for example,
it is well known that correlations between the returns of financial assets
increase considerably during crisis periods...’
One of the paths followed to overcome the understatement of risk by normality
assumptions is the use of heavy-tailed marginal distributions and non-Gaussian copulas.
The t-Student distribution, for example, with its power-law decaying tails, could be
used to accommodate arbitrarily large returns. Several studies indeed show that it
provides better fits for financial series, especially at high frequencies5.
Another line of research tries to model the heavy tails of the distributions
statistically through Extreme Value Theory (EVT). The approach consists of fitting
a statistical distribution of their own to the extreme values over a given threshold,
while modelling the center with another suitable one (Normal, empirical, etc.). While
in both these approaches risk seems no longer understated, the backward-looking
stance, criticized also in the above quotations, could raise some objections to their
validity.
Let's consider, for example, some of the major crises that have characterized the
last two decades: the Mexican and Russian crises, the Asian crisis, the dot-com bubble
and the global financial turmoil of 2007-2009. The cause of each of them was different,
as were the underlying transmission mechanisms, the markets affected by contagion
and the asset classes involved. Generally speaking, it is very hard to find, by
scrutinizing history, crises ignited by the same initial series of events and spreading
through the same asset classes and countries. Even if theories do exist about the
cyclicality and repeatability of crises6, the observed pattern of appearance and its
time regularity over the last century was very different. Minsky's model7, for example,
tries to give a general explanation and anatomy of a typical crisis and the underlying
5 See again [10] Chapter 10.
6 For instance, the Kitchin inventory cycle of 39 months, or the Kuznets cycle of 20 years linked to housing construction.
7 See [30] for an introduction.
economic mechanism, linked to the expansion of the credit supply, was confirmed on
many occasions8. However, since the objective of an asset allocation theory is not to
find a universal economic generating mechanism (like Minsky's theory) but to choose
an optimal allocation to asset classes, the level of detail needed for this task is more
granular: it is at the asset-class level and below. Thus "details" become important
when an investor is faced with the decision of where to allocate wealth, particularly in
regard to which asset class could be hit and its co-dependence with the other classes.
All these details could be different for each situation of market distress. For instance,
looking at home prices in the US from after the Second World War up to 2007 nothing
could suggest that this particular asset class would be vulnerable to such a crash
as that of 2007-20099. A hypothetical investor at the beginning of 2007 could have been
easily fooled by trying to make a forecast based on a time series that for the previous
60 years had shown only small deviations from an otherwise positive trend.
A so-called school of "econophysics"10 has analyzed the signatures of extreme values
([35], [72]). According to this school, the most extreme events form a statistical
class of their own, separated from the normal-times paradigm. Assuming a normal
distribution for the data would mean postulating a less than "once in the life of the
universe" probability of occurrence for events that appear much more frequently, a
fact confirmed abundantly by empirical findings. Thus, according to them, these
extremes need not be discarded as outliers but would need a model of their own to
be described11. This is also the view adopted in this thesis.
Econophysicists defend a theory according to which specific signatures in the assets'
behaviour could anticipate such extreme movements ([71] [70]). The log-precursors, as
they are called, are peculiar oscillations which are observable just before the crash and
are the result of altered serial co-dependence and amplification of daily price returns.
The underpinning mechanism is the result of the interplay between rational agents and
noisy traders. Although the approach gives nice fits for some of the historical crashes
8 "Each crisis also has its unique individual features - the nature of the shock, the object of speculation, the form of credit expansion, the ingenuity of the swindlers [...]; Details proliferate; structure abides" Kindleberger [30], in response to a criticism of the Minsky model that every crisis is unique and so not subject to generalizations.
9 Shiller [69] shows that inflation-adjusted U.S. home prices increased 0.4% per year from 1890 to 2004 and 0.7% per year from 1940 to 2004.
10 For an interesting comparison, in a neopositivist framework, of how classical economics and econophysics differ in their approach to modelling economic phenomena, see [66].
11 Indeed, Rebonato et al. [58] [57] analyze the drawdown properties of interest rates and suggest that exceptional events do belong to a different statistical class, as Sornette [71] suggests. However, they also point out that several, and not one, signatures exist, each corresponding to a different way for the market to be distressed. These studies suggest that during exceptional periods markets do behave differently, but not always in the same way.
and making some successful predictions, the universal validity of the theory is still in
question (see [17] for some objections raised to the log-precursors theory). In fact,
even close to the critical time tc12 there is still implicit randomness in the event
happening because, had it been certain, it could have been forecasted and avoided (or
hedged)13. So, as a consequence, a bubble, instead of bursting, could land smoothly.
Such provision for a random outcome makes the theory even less testable. The log-
periodic model led to two false alarms, in 1997 and 1999 respectively, in the US stock
market.
Absent a convincing proof of the econophysicists' claim, we, together with many
economists, take a rather sceptical view of the ability to provide a frequentist statistical
description of future economic events14. The approach presented here indeed
takes such an agnostic stance. With the econophysicists we also believe that extreme
events belong to a class of their own. But we do not assume that market behaviour in
situations of distress displays a universal signature. Nor do we believe that Extreme
Value Theory, with its backward-looking approach, can always take into account the
complexity and joint behaviour of assets under new exceptional circumstances. Instead
we attempt to specify the distressed co-dependencies and stand-alone properties
of the returns distribution on a case-by-case basis by making use of our (imperfect
but useful) understanding of how the world works. As we shall see, the exceptional
co-dependencies will be a direct output of the specific causal links between the drivers
of exceptional returns that the asset manager will have identified.
Of course, the idea of complementing statistical approaches with stress testing and
scenario analysis is not new. Asset managers have been aware for a long time of the
exceptional and unpredictable market behaviour in conditions of distress and have
traditionally tried to defend their portfolios by adding, after the optimization, vari-
ous types of insurance trades (e.g., out-of-the-money puts). However, these defensive
positions have typically been added to optimized portfolios as an afterthought, and
12 The critical time tc is the most probable time of occurrence of the crash.
13 In fact this is the only way of being compatible with the rational expectations hypothesis on which the model is based. For a discussion of the compatibility of the coexistence of bubbles and rational expectations see [2].
14 For instance, Rochet [60] points out, 'financial risks are not exogenous, but arise from the behavior of economic agents. Consequently, the statistical distribution of [...] financial returns is not stationary but depends critically on the economic and regulatory environment, as well as the individual incentives confronting the many economic agents who participate in different aspects of financial intermediation'. See also Davidson [50], page 18 and passim, for a discussion of the limits of the ergodic theorem, or the body of work that goes under the rubric of Imperfect Knowledge Economics for a more general criticism of the predictability of economic change (see, e.g., Frydman et al. [20] [19]).
outside a coherent probabilistic and utility-maximization framework. Such ad hoc
treatment may seem quite arbitrary and lacking the desired quantitative integration
with the statistical modelling part. The purpose of this thesis is to overcome this
drawback and to give a recipe for how this integration can be done ex ante in a coherent
way. It will make use of standard statistical techniques such as distributions, copulae,
correlations etc. together with a Bayesian approach based on causality assumptions.
In other words, we claim that 'most of the time' markets behave regularly, i.e., display
regularities (both in the marginal distributions and in the co-dependence structure)
that can be uncovered using frequentist statistical techniques. As for tail events, sub-
jective probabilities (whose assignment obviously also takes into account frequentist
information when appropriate) and assumptions about the causal structure of tail
events then provide the required complement to the frequentist information. Given
these two inputs, the approach proposed in this thesis blends them in a new kind
of portfolio optimization framework. Its subjective inputs are transparent, auditable
and challengeable by non-specialists. In this sense the approach proposed here trans-
lates and expands to an asset management context many of the ideas introduced for
stress testing in [56], [55], [54].
The Black-Litterman model (Black et al. [1]) was the first model of this kind with
a forward-looking perspective. It gives the investor the possibility to express views on
how the market will behave and to amalgamate these opinions with statistical data in a
very elegant way. We will highlight later its drawbacks and how this approach can
be integrated into the framework we propose here.
1.1 Outline of the Problem and Dataset
In this thesis we will consider the problem of how to choose, given a fixed investment
amount (the budget constraint), the optimal composition of a top portfolio, Π (i.e.,
the weights, w, to be given to several sub-portfolios). For simplicity of exposition
we assume that each sub-portfolio is invested in a different asset class and that it
can be associated with a single dominant risk factor. This assumption makes the
exposition clearer, but can be easily relaxed. So, in our example one sub-portfolio
could be a long-only US equity portfolio (and the risk factor would be an appropriate
US equity index); another sub-portfolio could be a portfolio of US investment grade
corporate bonds (and the risk factor would be an investment grade bond index, or,
say, a AA spread) etc. Given the stylized nature of the example, mutatis mutandis
we can therefore speak interchangeably of a portfolio, of a risk factor or of an asset
class.
Apart from giving a general recipe, we will illustrate the approach by analyzing
step by step a real world example of the proposed procedure.
The dataset we will use consists of four asset classes, Government Bonds, Investment-
Grade Credit Bonds, Equities and High-Grade RMBS Securities (called asset class
Bond, Credit, Equity and Mortgage, respectively, in the following). More precisely,
the following indices were used15:
• for Bond the BarCap US Treasury Index;
• for Credit the BarCap US Credit Index;
• for Equity the S&P 500;
• for Mortgage the BarCap US MBS Index.
The data set consists of 3360 x 4 daily returns16, covering the period February 1997
to June 2010.
This thesis is organized as follows. In Chapter 2 a technique to obtain ”clean” (in
a sense we will explain later) quantities like correlations and volatilities is presented.
In Chapter 3 the Bayesian nets technology, which is the foundation of the approach, is
introduced. Chapter 4 fuses the ”cleaned” quantities with the output of the Bayesian
net and shows how to perform the optimization. A real world example is presented
and discussed in Chapter 5. Finally, some of the technical details and proofs which
complement the body of the thesis are put in Appendices.
All the coding and figures were done in MATLAB (version 7.10.0.499 R2010a).
15 Bloomberg tickers: Bond - LUATTRUU, Credit - LUCRTRUU, Equity - SPX, Mortgage - LUMSTRUU.
16 36 non-contiguous missing values in the dataset were linearly interpolated.
Chapter 2
Statistics
2.1 Identification of Normal Data
In general, we assume that, for each sub-portfolio, a distribution of (arithmetic)
returns is available. The time-t_k arithmetic return for sub-portfolio i, r_i^k, is defined
as

r_i^k = \frac{x_i^{k+1}}{p_i^k} - 1

where p_i^k denotes the price level of sub-portfolio i at time t_k, and x_i^{k+1} denotes the value
at time t_{k+1} of the same sub-portfolio plus the (reinvested) dividends accrued from
time t_k to time t_{k+1}.
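As a minimal illustration of this definition (a sketch in MATLAB, the language used throughout the thesis; the matrix name P is ours and is assumed to hold total-return index levels, so that dividends are already reinvested):

```matlab
% P: T x p matrix of total-return index levels (dividends reinvested),
%    one column per sub-portfolio / asset class.
R = P(2:end, :) ./ P(1:end-1, :) - 1;   % arithmetic returns r_i^k = x_i^(k+1)/p_i^k - 1
```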
Since we believe that exceptional returns and co-dependencies cannot be modelled
statistically but need a forward-looking Bayesian approach, the first step is to
isolate them from 'normal' returns1. In general, the distribution of 'normal' returns
could display fatter tails than the Gaussian one. This is due to 'mundane' and 'regular'
phenomena such as, say, stochastic volatility or small or medium-size jumps, which
can give rise to fat tails without necessarily implying that the market is in a state of
distress. Because of this, in what follows, we will leave open the possibility of modelling
such 'normal' returns with distributions other than the Gaussian.
Several techniques can be employed to identify such ‘body’ of ’normal’ returns of
a distribution. The simplest one is to truncate all the data points over a threshold
(e.g., a given number of standard deviations) in the empirical margins and define as
‘body’ the rest of the data. This procedure is very simple but rather arbitrary. More
sophisticated and ‘objective’ methods have been developed which aim at eliminating
1 In this thesis the adjective 'normal' is often used, referred to data, to mean 'non-exceptional'. To avoid confusion, the word 'normal' is never used in the statistical sense, for which the word 'Gaussian' is invariably used. To further avoid confusion, the adjective 'normal' is often placed among quotes (as in 'normal').
outliers in the data. The earliest papers on the topic (see, e.g., [24], [18]) focused on
outliers of the Gaussian distribution. More recent studies broaden the idea to fat-
tailed distributions such as t-Student or other power law decaying distributions (see
[49] and [67]). The central statement is again that, although heavy-tailed distributions
are used by definition to accommodate ‘large’ returns, there are still values which are
so extreme that they cannot be explained by the hypothesized distribution. The
outcome of these studies is a statistically formalized test procedure.
The focus of these studies is the unidimensional case. This could hide some inter-
esting dynamics in the joint behaviour of asset classes. A useful multi-dimensional
technique is based on the volume of the minimum volume ellipsoid (MVE) that con-
tains the data2 and the minimum covariance determinant (MCD). Both MVE and
MCD belong to the class of the so called high-breakdown estimators, highly resistant
to outliers.
Let's consider a set of N data points (asset returns in our case) r = (r_1, r_2, ..., r_N),
each defined in p dimensions. For each of the points in the dataset we can define the
Mahalanobis distance:
MD(r_j) = \sqrt{(r_j - \bar{r})^T S^{-1} (r_j - \bar{r})}     (2.1)
where T denotes transposition, S is the sample covariance matrix and \bar{r} is the sample
mean of the p-dimensional vectors r. If the data come from a multivariate normal
distribution, the Mahalanobis distances follow a \chi^2_p distribution. It would be intuitive to
define the body of data as the locus of points such that:
L \equiv \{ r : MD(r) \le \sqrt{\chi^2_p(\alpha)} \}
where α represents a predetermined confidence level3. This approach, however, has
an intrinsic problem. The covariance matrix S and the mean \bar{r} in (2.1) are both
estimated from the sample. Therefore the presence of outliers might heavily inflate
these estimates, and if we make use of (2.1) we could miss some potential outliers,
which would appear as 'normal' points. This outcome is called the masking effect. The
MVE and MCD are estimators that allow us to overcome this issue. Let's first define the
MVE [62]:
2 See [61] [62] [63] for an introduction. A detailed description can also be found in [44].
3 For the use of the Mahalanobis distance to discern periods of financial turbulence see also [32], an article published during the writing of this thesis. However, the use of the 'pure' Mahalanobis distances has a severe drawback, described in what follows, which we will overcome with the help of the MVE and MCD.
Definition 1: The MVE location estimator t and scatter estimator C minimize
the determinant of C subject to the condition
\#\{ i : (r_i - t)^T C^{-1} (r_i - t) \le c^2 \} = h
where the minimization is over all t ∈ Rp and C ∈ PDS(p), the class of positive
definite symmetric matrices of size p
In this definition h is a number lying between [N/2] + 1 and N, where [·] denotes the
integer part. The symbol # stands for the number of points. Therefore, the MVE is an
estimator of the center and the scatter of the h points that are the most concentrated
in a given dataset. The constant c can be chosen according to different criteria. One
such criterion leads to choosing c so that C is a consistent estimator of the covariance
matrix for data coming from a multivariate normal distribution, i.e. c = \sqrt{\chi^2_p(\alpha)} where
α = h/N. We define the MCD as follows:
Definition 2: The MCD location estimator t is the average of h points and the
scatter estimator is the minimum-determinant covariance matrix C, subject to the
following condition:

\#\{ i : (r_i - t)^T C^{-1} (r_i - t) \le c^2 \} = h
Once these estimators are found, they can be used in (2.1) to calculate the robust
Mahalanobis distances and to label as outliers the observations whose robust distance
exceeds a given threshold. However, given h points in p-dimensional space, finding
the MVE or MCD means sweeping through

\binom{N}{h} = \frac{N!}{(N-h)!\, h!}

possible combinations, i.e. a computational complexity of O\left(\binom{N}{h}\right). Even for small datasets
this might be infeasible. Fortunately, methods exist that allow us to overcome this
computational issue. A very fast method called FAST-MCD, which finds the minimum
covariance determinant, can be found in [63]4. As an illustration, we show in Fig. 2.1
the robust Mahalanobis distances calculated with the FAST-MCD approach as a
function of the observation date in the dataset. As already explained, a possible
criterion to identify the body of the data is to cull all the points above a certain
threshold, which for 4 assets might be χ²_4(0.975) = 3.34.
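As a sketch of this thresholding step (shown here with the classical sample mean and covariance for brevity; in the robust version these would be replaced by the FAST-MCD location and scatter estimates, e.g. those returned by the robustcov function available in later MATLAB releases):

```matlab
% R: T x p matrix of daily returns (here p = 4).
[T, p] = size(R);
mu = mean(R);                           % location estimate
S  = cov(R);                            % scatter estimate
D  = bsxfun(@minus, R, mu);             % centred observations
MD = sqrt(sum((D / S) .* D, 2));        % Mahalanobis distances, Eq. (2.1)
thr      = sqrt(chi2inv(0.975, p));     % approx. 3.34 for p = 4
outliers = find(MD > thr);              % candidate outliers (subject to masking)
```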
Another method based on the MVE/MCD philosophy is proposed in [53] and [44].
We will describe it in detail because it will be the main method we will use to decide
4 In the same paper the author recommends the use of the MCD over the MVE for reasons of statistical efficiency and computation speed. However, we find in what follows that the MVE can give some additional insight and can be used as a complement.
Figure 2.1: Robust Mahalanobis distance vs date of observation. The major crises are clearly visible in terms of the concentration of large values of the robust distance: the Russian crisis, the dot-com crisis and the recent subprime debacle. The red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to [0.75N] (this is also the value adopted in [63]). The two peaks on the rightmost part of the figure correspond to the dates immediately after Lehman's default: 15 and 17 September 2008. The highest peak in the middle is the reopening of the stock exchange after 9/11. The highest peak on the left is 31 August 1998, a few days after Russia defaulted on its debt.
which points to ban as outliers. The procedure consists of three separate routines
combined together.
Routine A - Calculation of the smallest ellipsoid that contains h obser-
vations5
Step 1: Given a set of h observations define the initial weights as follows:
w_n = \frac{1}{h}, \qquad n = 1, ..., h
Step 2: Compute the location parameter t and scatter C according to

t = \frac{1}{\sum_{n=1}^{h} w_n} \sum_{n=1}^{h} w_n r_n

C = \sum_{n=1}^{h} w_n (r_n - t)(r_n - t)^T
Step 3: Calculate the squared Mahalanobis distance:
Ma_n^2 = (r_n - t)^T C^{-1} (r_n - t), \qquad n = 1, ..., h
Step 4: For every n for which Ma_n^2 > 1, change the respective weight to:

w_n \leftarrow w_n \, Ma_n^2
Step 5: If convergence is reached, stop and define the volume of the ellipsoid as

V = \gamma_p \sqrt{|C|}

where \gamma_p = \pi^{p/2} / \Gamma(p/2 + 1) is the volume of the unit sphere in p dimensions;
otherwise go to Step 2.
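A direct MATLAB transcription of Routine A might look as follows (a sketch; the tolerance used in the convergence test of Step 5 is our own choice, as it is not specified above):

```matlab
function [t, C, V] = routineA(r)
% Routine A: location t, scatter C and ellipsoid volume V for the
% h observations in r (an h x p matrix), via iterative reweighting.
[h, p] = size(r);
w = ones(h, 1) / h;                               % Step 1: initial weights
for iter = 1:500
    t   = (w' * r) / sum(w);                      % Step 2: location
    D   = bsxfun(@minus, r, t);
    C   = D' * bsxfun(@times, w, D);              % Step 2: scatter
    Ma2 = sum((D / C) .* D, 2);                   % Step 3: squared Mahalanobis distances
    wNew      = w;
    idx       = Ma2 > 1;
    wNew(idx) = w(idx) .* Ma2(idx);               % Step 4: reweight the distant points
    if max(abs(wNew - w)) < 1e-10, break; end     % Step 5: convergence reached
    w = wNew;
end
gammaP = pi^(p/2) / gamma(p/2 + 1);               % volume of the unit sphere in p dimensions
V = gammaP * sqrt(det(C));                        % Step 5: volume of the ellipsoid
end
```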
Routine B - Removal of the farthest outlier6
Step 1: Define the N × p matrix

U = \begin{pmatrix} r_1 - t \\ \vdots \\ r_N - t \end{pmatrix}
5 The method in this Routine was originally introduced in [75].
6 The method in this Routine was originally introduced in [53].
where t is the sample mean of the data.
Step 2: Denoting by λ_n the n-th element of the diagonal of the information matrix,

\lambda_n = \left( U (U^T U)^{-1} U^T \right)_{nn},
remove the observation with the highest value of λ_n. This is by definition the observation
that changes the determinant of the sample covariance matrix the most.
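Routine B admits an equally short sketch (the function name and the choice of returning an index, rather than deleting the row in place, are our own):

```matlab
function worst = routineB(r)
% Routine B: index of the farthest outlier, i.e. the observation with
% the largest diagonal element lambda_n of the information matrix.
t      = mean(r);                              % Step 1: sample mean
U      = bsxfun(@minus, r, t);                 % N x p matrix of centred rows
lambda = sum((U / (U' * U)) .* U, 2);          % Step 2: diagonal of U (U'U)^(-1) U'
[~, worst] = max(lambda);                      % observation to be removed
end
```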
Routine C - Truncation
Step 1: Start with the entire dataset.
Step 2: Calculate the sample covariance matrix C.
Step 3: Use Routine A to find the location and scatter parameters of the minimum
volume ellipsoid ε.
Step 4: Find the farthest outlier with Routine B and remove it from the data.
Step 5: If the number of data points left equals h = [0.5N ] stop the algorithm.
Otherwise go to Step 2.
The main idea is that, by plotting the volume of ε and the determinant of C as a
function of the points left in the dataset, the delimiting line between outliers and body
is where these values start to show abrupt changes. Other important quantities can
also be monitored during the truncation and their stabilization used to infer a separation
line between the body and the outliers' cloud.
An example of this procedure is shown in Figures 2.2 to 2.10. Figure 2.2 shows
the volume of the ellipsoid as a function of the number of observations removed from the
dataset. Figures 2.4 and 2.6 display the determinant of the covariance and correlation
matrices, respectively, also as a function of the number of removed observations. Figures
2.3, 2.5 and 2.7 then show the variations in the same quantities. We define the variation of a
quantity q as:
\delta q(i) = \frac{q(i+1) - q(i)}{q(i)}
where q(i) is the value of the quantity estimated on the dataset if i observations
are removed.
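Putting the pieces together, Routine C and the monitoring of the variations δq can be sketched as follows (a sketch using the routineA and routineB functions defined above, each assumed to live in its own file):

```matlab
% Routine C (sketch): remove observations one at a time down to h = [0.5N],
% recording the ellipsoid volume and the covariance determinant at each step.
data = R;                                    % full data set, N x p
N    = size(data, 1);
nCut = N - floor(0.5 * N);                   % number of points to peel off
vol  = zeros(nCut, 1);
detC = zeros(nCut, 1);
for i = 1:nCut
    [~, ~, vol(i)] = routineA(data);         % Step 3: minimum-volume ellipsoid
    detC(i)        = det(cov(data));         % Step 2: monitored covariance determinant
    worst          = routineB(data);         % Step 4: farthest outlier
    data(worst, :) = [];                     % remove it
end
deltaVol  = diff(vol)  ./ vol(1:end-1);      % variations delta_q(i), as defined above
deltaDetC = diff(detC) ./ detC(1:end-1);
```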
The FAST-MCD method can be used in the same fashion. Once the robust
distances are determined for the whole dataset, one can start by removing the point
with the largest distance, then the second largest and so on and monitor some key
quantities, such as the covariance matrix determinant, until they stabilize.
It is apparent that excluding 100 to 200 points (3% to 5% of the whole data
set) renders the covariance and correlation structure much more stable7. Figures 2.8
and 2.9 show that this is indeed the case by displaying the individual elements of
the correlation matrix among the Bond, Credit, Mortgage and Equity time series as
a function of the number of points removed. Figure 2.10 displays the stabilization
of the four eigenvalues of the correlation matrix. As we shall see, as long as these
stabilization requirements are met, the exact identification of which data points belong
to the body and the tail of the distribution is in itself less important. For the concrete
example examined here we will define the first 160 truncated points as outliers and
the rest as the ’body’. Figure 2.11 shows visually this separation for the Equity,
Bond and Credit returns. In Figure 2.12 the robust Mahalanobis distances after the
removal of the outliers are shown. Finally, influence plots can be seen in Figures
2.13 and 2.148.
2.1.1 Some Remarks About the MVE, MCD and Related Methods
It is important to make a clear distinction between the methods of the previous
section. The Minimum Volume Ellipsoid of h points out of N is defined by the h
points that are the most concentrated, i.e. those with the minimum-volume
containing ellipsoid. So, if we are given a set of 100 points and h = 50, we have to
play with all the different combinations of 50 points out of those 100 until we find
those that have the containing ellipsoid with the minimum possible volume. That
ellipsoid is the Minimum Volume Ellipsoid (MVE) for that parameter h. The original
authors (Rousseeuw et al. [61], [62], [63]) first fix h, let's say h = [0.75N], and then
find the most concentrated set of points for that value of h. Finding those points
means finding C and t in Definition 1. With this metric, obtained for the h most
concentrated points (for h = [0.75N ]), we can calculate the Mahalanobis distance for
7 The same procedure but with the FAST-MCD method was tested and it points to the same conclusions. However, we will not show the plots related to that approach because visually they are extremely close to those presented here.
8 We note in closing this section that Bouchaud and Potters [3] find that, by excluding the lowest eigenvalues from the orthogonalization of an empirical correlation matrix obtained from S&P 500 returns, the resulting efficient frontier becomes much more stable as a function of the temporal subsection of data used (e.g., first half versus second half). They also find that the purged correlation matrices become much more stable over time. They point out that, even if an investor availed herself of the knowledge of the realized returns, investing according to a Markowitz prescription using all the eigenvectors of the correlation matrix would result in much riskier ex post portfolios than if the investment had been made using a small number of stable eigenvalues. Bouchaud and Potters tend to explain this by positing that the lowest eigenvalues are simply picking up noise.
Figure 2.2: Volume of the ellipsoid as a function of the number of points removed
Figure 2.3: Variation for Fig 2.2
Figure 2.4: Minimum covariance determinant as a function of the number of pointsremoved
Figure 2.5: Variation for Fig 2.4
Figure 2.6: Correlation matrix determinant as a function of the number of pointsremoved
Figure 2.7: Variation for Fig. 2.6
Figure 2.8: Correlations as a function of the number of points removed. The symbol ρsignifies correlation and the subscripts C, B, M and E denote Credit, Bond, Mortgageand Equity, respectively, as described in the Introduction.
Figure 2.9: Variation for Fig. 2.8
Figure 2.10: Variation in the eigenvalues of the correlation matrix as a function ofthe number of points removed. The eigenvalues are ordered from the smallest λ1 tothe largest λ4.
Figure 2.11: Body (circles) and outliers (stars) after the removal of 160 points.
Figure 2.12: Robust Mahalanobis distance vs date of observation of the 'body'. The red line corresponds to χ²_4(0.975) = 3.34. The parameter h was set to [0.75N]. For this graph the outliers were not removed but replaced by the average of the dataset in order to have historical comparability with Figure 2.1. The extreme peaks of Figure 2.1 are no longer present.
Figure 2.13: Influence plot of the standardized returns of Bond and Credit. The influence contour lines show approximately by how much the correlation coefficient varies if we remove a single point sitting on that line. The red points represent the outliers eliminated by the truncation algorithm. Some outliers may appear as not influential on the two-dimensional graph, but we must remember that the truncation was carried out in a 4-dimensional space. In such a higher-dimensional space they are influential. The method used here to draw the plot is the one presented in Appendix D.
Figure 2.14: Influence plot of the standardized returns of Bond and Equity. See Fig. 2.13 for an explanation.
all the points in the dataset. The reasoning behind the MCD is the same, but we
use Definition 2 instead. The numerical search for the MCD is fast thanks to the
FAST-MCD method. The search for the MVE is slower but, nevertheless, there are
some numerical techniques which render it faster than the brute force search. Once
the Mahalanobis distances are calculated in this way for all the points either with
the MVE metric or with the MCD metric they are called robust distances. They can
then be ordered and a procedure can be employed to eliminate one by one the most
distant points and to monitor variation in some key quantities.
What Meucci [44] does is to calculate the MVE for all the N datapoints, then
to calculate it for N − 1 datapoints, then for N − 2 and so on. The calculation is
therefore carried out not just on the h = [0.75N] most concentrated points. This
raises the question of how the MVE can be found for N − 1, N − 2, etc. datapoints. A
method which, the author states, might sometimes be suboptimal, is the one presented
in Routines A, B and C above. We calculate with Routine A the MVE for N points. With
Routine B we eliminate the farthest outlier and thus get N − 1 points. The N − 1
points thus obtained are those that are contained by the MVE. In other words, if,
instead of removing the farthest outlier, we removed another point at random, we
would still have N − 1 points, but the ellipsoid that contains these points will have a
bigger volume than the one calculated on the N − 1 points obtained by removing the
farthest outlier, as defined by Routine B. By definition it would not be the minimum-
volume ellipsoid. To rephrase, if we want to find the MVE of N − 1 points out of N ,
we have two possibilities:
1. Play with all the possible N − 1 points out of N until we get to the N − 1
points with the smallest volume of the containing ellipsoid.
2. Start from N points, remove the farthest outlier, calculate the ellipsoid volume
for the remaining ones and we can be reasonably confident that this is the ellipsoid
with the minimum volume, i.e. the MVE.
Procedure 2 is, of course, faster and simpler than procedure 1 (which, if imple-
mented naively, is combinatorial), but it is not guaranteed to yield the truly minimum-
volume ellipsoid. We use the safe procedure 2 in our study, but we have checked that,
for the data we have been using, the two approaches give qualitatively very similar
results9.
2.2 Constructing the Marginal Distribution of the
‘Normal’ Risk Factors
Once the ‘normal’ portion of the data has been identified as suggested in the previous
section, a parametric distribution can be fitted to the empirical distribution of ‘nor-
mal’ returns obtained for each individual risk factor. Depending on the risk factor
and the sampling interval, a Gaussian distribution may or may not be adequate. For
daily returns, even the body of most time series is not well described by a Gaussian
distribution. We find that a Student-t distribution does an acceptable job for our
dataset.
As an example, Figures 2.15 and 2.16 present the unidimensional fit in the form
of a histogram and a Q-Q plot for the asset class Equity using the Gaussian and the
t-Student distributions10. Let’s define the empirical cumulative distribution by:
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{X_i \le x}
9 We highlight in closing that, for the application of the MVE/MCD approach we described, instead of inspecting figures and deciding visually on a cut-off which separates the body from the outliers, in a practical implementation an automatic criterion (or criteria) to terminate the truncation can be used. For example, whenever the variation of the correlation matrix determinant falls below a certain threshold T, e.g. T = f(1/N), the truncation routine could be stopped.
10 The maximum likelihood principle was used to estimate the parameters.
Figure 2.15: The fit to the S&P daily returns obtained using a Gaussian and a t-Student distribution.
where I_{X_i \le x} is the indicator function, equal to 1 if X_i \le x and equal to 0 otherwise,
and the KS distance as:
D_n = \sup_x |F_n(x) - F^*(x)|
where F ∗(x) is the hypothesized distribution with parameters estimated from the
sample. Since the parameters are not known but estimated, the KS test cannot be applied
directly, at least by using the commonly tabulated critical points. We can resort
to Monte Carlo simulation11 in order to find the distribution of D and hence the
confidence levels.
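A sketch of this parametric bootstrap for one asset class (assuming a MATLAB release whose Statistics Toolbox provides fitdist, cdf and random for the t location-scale family; for brevity the empirical CDF is evaluated only at i/n, ignoring the (i−1)/n term of the exact KS statistic):

```matlab
% x: vector of 'normal' daily returns for one asset class.
pd     = fitdist(x(:), 'tlocationscale');        % MLE fit: location, scale, d.o.f.
ksDist = @(y, pdY) max(abs((1:numel(y))'/numel(y) - cdf(pdY, sort(y(:)))));
D0     = ksDist(x, pd);                          % KS distance on the real data
nSim   = 10000;
D      = zeros(nSim, 1);
for s = 1:nSim
    xs   = random(pd, numel(x), 1);              % simulate under H0
    pdS  = fitdist(xs, 'tlocationscale');        % re-estimate the parameters
    D(s) = ksDist(xs, pdS);                      % KS distance on the simulated data
end
pValue = mean(D >= D0);                          % cf. the p-values in Table 2.1
```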
The results are reported in Tab. 2.1 for 10,000 simulations. The better fit of the
t-Student distribution is evident.
11 For the KS test in the case of unknown mean and variance for the normal distribution, see [34]. The method to infer a distribution for the distance D will be described in detail in Section 2.3 in the context of copula fitting.
Figure 2.16: The Q-Q plot for the two fits in Figure 2.15.
Asset class    P-value (t-Student)    P-value (Gaussian)
Bond           0.5251                 0.0086
Credit         0.5914                 0.0091
Equity         0.3627                 1.4496e-005
Mortgage       0.9622                 2.1527e-006

Table 2.1: KS test for the t-Student and Normal fits. P-value is the probability of error if H0 is rejected. H0 is the null hypothesis that the empirical distribution comes from the tested one.
Asset class    Mean       Standard Deviation    Degrees of Freedom
Bond           0.00027    0.00248               9.2
Credit         0.00031    0.00260               9.8
Equity         0.00031    0.00888               4.9
Mortgage       0.00025    0.00136               4.6

Table 2.2: Parameters of the fitted t-Student distributions
2.3 Constructing a Joint Distribution of ‘Normal’
Risk Factors
The procedure described in the previous section produces univariate marginal distri-
butions for each risk factor. These marginals display fatter tails than Gaussians, but
nonetheless refer to non-extreme market conditions. The next step is to conjoin them
using a copula. We are going to choose the copula that provides the best fit of the
data12.
In general, if for each risk factor the ’normal’ portion of the data can be satis-
factorily described by a Gaussian distribution and they can be joined by a Gaussian
copula, we shall see that important computational savings can be achieved (because
a simple closed-form expression can be obtained in some cases by linking the distri-
bution of portfolio returns and the weights of the individual sub-portfolios). If this
is not the case, the procedure described in the following can still be applied, but the
final numerical search becomes somewhat more burdensome. Conceptually, however,
nothing changes.
Broadly speaking, there are three methods to estimate the parameters of a copula
from data [8]:
1. Exact Likelihood Method: a one-step method that estimates jointly the pa-
rameters of the marginals and those of the copula. Statistically, it is the most
efficient method, but it is computationally very onerous.
2. Inference from Marginals Method: a two-step procedure whereby the marginals
are fitted first, and then the copula. This is the method that has been used in
this study (see the sketch after this list).
3. Canonical Maximum Likelihood Method: it estimates directly the parameters
of the copula without fitting the marginals, as the empirical marginals are used
instead.
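A minimal sketch of the second (Inference from Marginals) method for our dataset, using the copulafit function mentioned below (the variable names are ours; Rbody denotes the matrix of 'normal' returns after truncation):

```matlab
% Rbody: T x 4 matrix of 'normal' returns (Bond, Credit, Equity, Mortgage).
[T, p] = size(Rbody);
U = zeros(T, p);
for j = 1:p
    pdj     = fitdist(Rbody(:, j), 'tlocationscale');  % step 1: marginal fit (Section 2.2)
    U(:, j) = cdf(pdj, Rbody(:, j));                    % probability transform to [0,1]
end
rhoG        = copulafit('Gaussian', U);                 % step 2: Gaussian copula parameters
[rhoT, nuT] = copulafit('t', U);                        % step 2: t copula (correlation, d.o.f.)
```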
Two types of copulas were fitted: t-Student and Gaussian. A given copula could
show a higher likelihood13 but the likelihood in itself is not a measure of how good a fit
is, nor does it give p-values on the basis of which we could reject a hypothesis. We will
present some shortcut methods that allow a fast computation of the goodness-of-fit.
12 The MATLAB function copulafit was used to find the parameters of the copulae. This function by default uses maximum likelihood estimation.
13 Already Mashal et al. [41] showed that the t copula is a superior fit for equity returns on the basis of a likelihood ratio test.
Goodness-of-fit (GOF) measures for copulas are well described in [21]. The
computational complexity of these methods is much higher than that of the methods used
to assess the GOF of one-dimensional distributions. The starting point is the construction
of the empirical copula:
C_n(u) = \frac{1}{n} \sum_{i=1}^{n} 1(U_{i1} \le u_1, ..., U_{id} \le u_d)     (2.2)

u = (u_1, u_2, ..., u_d) \in [0, 1]^d
The second step is to use a kind of Kolmogorov-Smirnov or Anderson-Darling test for
the distance between this copula and a hypothesized copula. However, the empirical
copula may become very hard to estimate especially in the case studied in this thesis
consisting of 4 assets with 3200 data points14 each. Apart from computational demands,
the storage of information is another issue to solve15.
Malevergne et al. [37] propose another method, much faster, to tackle this problem
for a Gaussian copula. They show that under the hypothesis H0 that the dependence
between random variables can be described by a Gaussian copula, the variable:
z^2 = \sum_{i,j=1}^{p} \Phi^{-1}(F_i(x_i)) \, (\rho^{-1})_{ij} \, \Phi^{-1}(F_j(x_j))     (2.3)
follows a χ² distribution with p degrees of freedom. Φ^{-1}(·) is the inverse cumulative
standardized normal distribution, the F_i are the cumulative marginal distribution
functions and p is the number of assets. The matrix ρ is defined as:
\rho_{ij} = \mathrm{Cov}\left[ \Phi^{-1}(F_i(x_i)), \Phi^{-1}(F_j(x_j)) \right]
They propose the following four distance measures:
KS: \quad d_1 = \max_z |F_{z^2}(z^2) - F_{\chi^2}(z^2)|

Average KS: \quad d_2 = \int |F_{z^2}(z^2) - F_{\chi^2}(z^2)| \, dF_{\chi^2}(z^2)     (2.4)

Anderson-Darling: \quad d_3 = \max_z \frac{|F_{z^2}(z^2) - F_{\chi^2}(z^2)|}{\sqrt{F_{\chi^2}(z^2)\,[1 - F_{\chi^2}(z^2)]}}

Average Anderson-Darling: \quad d_4 = \int \frac{|F_{z^2}(z^2) - F_{\chi^2}(z^2)|}{\sqrt{F_{\chi^2}(z^2)\,[1 - F_{\chi^2}(z^2)]}} \, dF_{\chi^2}(z^2)
14 This is the number of data points left after truncation (removal of 160 observations).
15 Some methods exist regarding how to optimize the calculation time and how to store the information more efficiently by using some properties of the empirical copula. A solution is proposed in [74], which comes with an accompanying MATLAB program pwlCopula freely available on the Internet.
28
The KS distances are more sensitive to deviations in the bulk of the distributions; the AD statistics, on the contrary, are more sensitive to the tails. A departure from the statistics in common use is the presence of moduli in the numerator instead of squares. The advantage is that such distances are less sensitive to outliers; the disadvantage is that the standard statistical tests (the ω-test and the Ω-test) cannot be used. This disadvantage is, however, immaterial here, since the covariance matrix is not known but estimated from data, so the exact parameters needed in the derivation of such statistics are in any case not available. A bootstrap method can be used instead [15]. The following recipe, taken from [37], is also followed in this thesis:
1.) Given the original time series, generate the Gaussian variables y = Φ−1(Fi(xi)).
2.) Estimate the covariance matrix ρ of the Gaussian variables y, which allows the computation of the variable z2 in Eq. (2.3); then measure the distance (according to one or more of the distance measures proposed above) of its estimated distribution from the χ2-distribution.
3.) Given this covariance matrix ρ, generate numerically a time series of N (3200 in this case) Gaussian random vectors with the same covariance matrix ρ.
4.) For the synthetically generated time series of Gaussian vectors, estimate its sample covariance matrix $\tilde{\rho}$.
5.) To each of the N vectors of the synthetic Gaussian time series, associate the corresponding realization of the random variable z2, called $\tilde{z}^2$.
6.) Construct the empirical distribution of the variable $\tilde{z}^2$ and measure the distance between this distribution and the χ2-distribution.
7.) Repeating steps 3 to 6 10,000 times, obtain an accurate estimate of the cumulative distribution of distances between the distribution of the synthetic Gaussian variables and the theoretical χ2-distribution.
8.) The distance obtained at step 2 for the true variables can then be transformed into a significance level by using the synthetically determined distribution of distances between the distribution of the synthetic Gaussian variables and the theoretical χ2-distribution.
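A minimal MATLAB sketch of this recipe is given below, using the KS distance d1, a reduced number of bootstrap repetitions and placeholder pseudo-observations U (in practice U would come from the fitted marginals); it illustrates the procedure rather than reproducing the exact implementation used for the results.

    % Bootstrap GOF test for the Gaussian copula (steps 1-8, sketch).
    U    = copularnd('Gaussian', 0.5*eye(4) + 0.5, 3200);   % placeholder F_i(x_i) values in (0,1)
    p    = size(U, 2);
    y    = norminv(U);                          % step 1: map to Gaussian variables
    rho  = cov(y);                              % step 2: covariance of the Gaussians
    z2   = sum((y / rho) .* y, 2);              % z^2 of Eq. (2.3), one value per observation
    d1   = @(z) max(abs(tiedrank(z)/numel(z) - chi2cdf(z, p)));
    dObs = d1(z2);                              % observed KS distance to the chi^2_p law

    nBoot = 1000;                               % 10,000 in the thesis
    dSyn  = zeros(nBoot, 1);
    for b = 1:nBoot                             % steps 3-7: synthetic Gaussian samples
        ys      = mvnrnd(zeros(1, p), rho, size(y, 1));
        rhoS    = cov(ys);                      % step 4: re-estimated covariance
        z2s     = sum((ys / rhoS) .* ys, 2);    % step 5: synthetic z^2
        dSyn(b) = d1(z2s);                      % step 6: synthetic distance
    end
    pValue = mean(dSyn >= dObs);                % step 8: significance level of the observed distance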
The procedure can be extended also to a t-Student copula as done in [31]. Kole
et al. show that if H0 is the hypothesis that the dependence structure comes from a
t-Student copula with correlation matrix ρ and ν degrees of freedom then the variable:
\[ z^2 = \frac{1}{p}\sum_{i,j=1}^{p} \Psi^{-1}(F_i(x_i),\nu)\,(\rho^{-1})_{ij}\,\Psi^{-1}(F_j(x_j),\nu) \]
where Ψ−1(Fi(xi), ν) is the inverse of the cumulative standard Student’s t distribution
with degrees of freedom parameter ν, is distributed according to an F-distribution
with degrees of freedom p and ν. The same steps as in the case of Gaussian copula
with obvious modifications can be applied to this variable and a distribution of the
four GOF distances estimated.
We show the results in the "Entire dataset" section of Table 2.3. We see that, although the t-Student copula shows a better fit, witnessed by the higher p-values,16 neither of the copulae is a good absolute fit, as is clear from 3 of the 4 p-values, at least if we choose a threshold value of 1% or higher. To understand what "went wrong", let us split the last 10 years of the dataset into 5 equal parts and run the fit (univariate and copula) separately on each subset, after applying the truncation methods of Section 2.1.17 The results of the fit are shown in the five upper sections of Table 2.3.18 The t copula now also shows a much better absolute fit for each subset. The cumulative distributions of the 4 distances for the two copulas are shown in Figs. 2.20-2.21 for the last subset, and the corresponding p-values and distances in Table 2.4.
We present in Figs. 2.17 and 2.18 scatterplots between pairs of assets for the five subsets. Apart from being visually different, they also show different correlations over time, due to the changing underlying structure. To confirm this further, we present in Fig. 2.19 the correlation coefficient between Bond and Equity computed over a sliding window of 250 datapoints. This instability leads the GOF tests to reject both elliptical copulae even with a 1% threshold.19 It would be natural at this point to choose the last 2 years of data for the next steps, since the copula description provides better results in terms of fit and the correlation structure is more stable. Nevertheless, in
16 This is also confirmed by the log-likelihood of the t copula, which in this case is 6990, while that of the Gaussian copula is 6620.
17 We tested and ascertained that the truncation procedure of Section 2.1 has the same qualitative implications for each subset. In particular, the same behaviour is observed for the MVE/MCD/Correlation Matrix/Correlations/Eigenvalues and their variations. After a series of turbulent changes, a stabilization is observed, taken as a sign of the separation between body and outliers. On average 30 outliers were removed from each subset after application of the procedure. See Appendix G for figures related to the last subset.
18 We report the results for the copula fit only. Nevertheless, we also tested that the univariate t-Student distribution consistently provides a better fit for the asset classes' marginals.
19 The effect of changing correlations on copulae is examined in [59]. By introducing the concept of dynamic copula, Remillard et al. show that for the correlation between the Can/US exchange rate and oil prices (NYMEX oil futures) the Gaussian copula is a good fit (high p-value) if the sample is split into subsamples. This is not the case if the entire sample is taken into account. They attribute this to the change of dependence in time. Another line of research is that of Embrechts et al. [13], who try to model the change of the dependence structure by time-varying the parameters of the chosen copula. We will not pursue such a route here. Finally, in [4], a test is offered to check against the hypothesis that the copula is changing over time.
Subset            Copula     d1         d2         d3         d4
1st subset        Gaussian   0.018      0.012      0.032      0.002
                  t          0.104      0.244      0.087      0.392
2nd subset        Gaussian   0.046      0.053      0.044      0.006
                  t          0.344      0.642      0.082      0.708
3rd subset        Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          0.098      0.456      0.156      0.462
4th subset        Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          0.042      0.514      0.032      0.692
5th subset        Gaussian   0.084      0.138      0.052      0.128
                  t          0.252      0.273      0.062      0.328
Entire dataset    Gaussian   < 10^-3    < 10^-3    < 10^-3    < 10^-3
                  t          < 10^-3    0.002      0.007      0.197

Table 2.3: P-values (probability of error if H0 is rejected) for each of the four GOF distances d1-d4, for the entire dataset and the five subsets described in the text. H0 is the null hypothesis that the empirical multivariate distribution comes from the tested copula.
what follows we will use the entire dataset, because of the richness of interesting effects it will exhibit later and which we would like to show. This choice does not compromise the philosophy or the mathematics we will examine; however, for practical purposes we should bear in mind the limits shown in this Section.20
From now onwards we will implicitly assume that the t copula is used to model the 'normal' part.21 Once the marginals have been conjoined using the copula of choice, the modelling of the 'normal' portion of the data is complete. In Fig. 2.22 the scatterplot of data points generated through the fitted t copula is shown.
20 It is worth stressing again that the t copula provides a consistently better fit than the Gaussian copula over the entire dataset as well as in each subset.
21 The degrees of freedom coming from the fit are ν = 5.72.
Distance   Gaussian copula   P-value   t copula   P-value
d1         0.067             0.084     0.049      0.252
d2         0.305             0.138     0.081      0.273
d3         0.247             0.052     0.205      0.062
d4         1.166             0.128     0.311      0.328
Table 2.4: Distances and p-values representing the probability of error if H0 is rejected, for the last subset. H0 is the null hypothesis that the empirical multivariate distribution comes from the tested copula.
Figure 2.17: Scatterplot between Bond and Mortgage for the five different subsets. In the lower-right corner the scatterplot for the entire dataset is shown. The correlations are respectively 88.1%, 87.1%, 93.1%, 91.3%, 81.6% and 86.5%.

Figure 2.18: Scatterplot between Bond and Equity for the five different subsets. In the lower-right corner the scatterplot for the entire dataset is shown. The correlations are respectively −13.1%, −43.2%, −2.9%, −29.6%, −43.5% and −22.2%.

Figure 2.19: Correlation between Bond and Equity for the entire dataset. The correlation is calculated within a sliding window of 250 datapoints.

Figure 2.20: Gaussian copula. Cumulative distributions of the four distances used to assess the GOF of the copula for the last subset.

Figure 2.21: t copula. Cumulative distributions of the four distances used to assess the GOF of the copula for the last subset.

Figure 2.22: Scatterplot of the random numbers generated with the fitted copula (and mapped inversely from [0, 1] to the real axis with the help of the fitted marginals) for Bond and Mortgage.
Chapter 3
Causality
The previous Chapter presented an original combination of methods and ideas to deal with the isolation and modelling of the 'normal'-times returns. The truly innovative ideas of this thesis, however, will be presented in this and the subsequent Chapters.
3.1 Treatment of Exceptional Events
The next step of our approach is the specification of those tail events that, given the
particular macroeconomic and financial conditions of the moment, may be expected
to have a large effect on a given portfolio. These significant events could be identified
on the basis of macro analysis, or starting from the vulnerabilities of a portfolio,
as suggested in [56]1. The analysis should be limited to a handful of ‘extreme but
plausible’2 occurrences that have the potential to inflict very serious damage (or,
indeed, to generate exceptional gains) in a given portfolio.
In the approach we propose these extreme events are modelled as Boolean vari-
ables, that is, they can assume ‘true’ or ‘false’ logical values. For instance, at the
time of this writing, the possibility of the break-up of the European Monetary Union
and default of some European countries could be such events. These relationships
between the variables will be modelled through Bayesian Nets (BNs). The next section introduces this technology at a general level,3 while a concrete example will be examined in Chapter 5.
1 See the discussion in Chapter 13 in particular.
2 This is the expression used by the Bank for International Settlements [47] in a stress-testing context.
3 A list of all the properties and supporting theorems of BNs will not be given in this thesis. A deeper and more exhaustive introduction can be found in [78] and [52]. The next section is based on these two sources.
3.2 Bayesian nets
The task of determining a joint probability distribution P(E1, E2, ..., En) of n random variables is a complex one, even under the assumption that the random variables live in a dichotomous, discrete state space. A table with 2^n rows could be used to specify the different combinations of the values of the Es and the associated joint probabilities. The number of equations needed to fully determine this table is 2^n − 1, given the fundamental property that all the probabilities must sum to 1. The problem becomes complex, both from a computational and from a conceptual point of view, as the number of dimensions grows, and with it the number of relationships between the variables that must be specified.
Fortunately, sometimes there are links which greatly simplify the problem; this happens when each variable depends on a small set of other variables. Such information allows the decomposition of a large problem into small pieces, each involving only a subset of variables. In this context a graphical representation helps particularly well, as intuition fades when the number of dimensions grows. A natural graphical representation is given by Directed Acyclic Graphs (DAGs).4 DAGs consist of a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start from a vertex Ei and follow a sequence of edges that leads back to Ei. In the context of probability they became known as Bayesian networks, mainly because of (see [52]):
(1) the subjective nature of input information
(2) the reliance on Bayes conditioning to update information
(3) the distinction between causal and evidential way of reasoning.
An example of a Bayesian network is presented in Fig. 3.1. A formal definition
follows.
A Bayesian network B on a set of variables E consists of two components (see
[78]):
1.) A DAG G = (E,A) where E and A are respectively the sets of vertices and
directed edges in the graph. A family terminology is used when describing a net. For
instance, in Figure 3.1 E3 has two parents (ParE3): E1 and E2 i.e. a set of nodes
from which there is an arrow pointing to E3. E1 on the other hand has no parents
4 Undirected graphs are usually used to represent symmetrical spatial relationships. Directed graphs are convenient for causal and temporal relationships.
and two children: E4 ∈ (ChiE1) and E3 ∈ (ChiE1) i.e. a set of nodes reached by an
arrow from E1.
2.) A probability specification S. For every variable Ei ∈ E a probability ta-
ble is specified containing the probability of that variable conditional on its parents
p(Ei|ParEi) (only marginal probability for nodes without parents).
A BN graph and its probability specification are linked by an assumption called
Markov Condition. According to this assumption, each variable, given the knowledge
of its parents, is probabilistically independent of all the variables which are not its
own descendants.
We will describe a graphical criterion that allows to read the conditional indepen-
dencies directly from a DAG5 (see [52]):
Definition 1: A path p is said to be d-separated (or blocked) by a set of nodes Z
if and only if
1.) p contains a chain i→ m→ j or a fork i← m→ j such that the middle node
m is in Z, or
2.) p contains an inverted fork (or collider) i→ m← j such that the middle node
is not in Z and such that no descendant of m is in Z
A set Z is said to d-separate the set of nodes X from the set of nodes Y if and
only if Z blocks every path from a node in X to a node in Y .
The following theorem combines d-separation and conditional independence (see
[52]):
Theorem 1: If sets X and Y are d-separated by Z in a DAG G, then X is independent of Y conditional on Z in every distribution compatible with G. Conversely, if X and Y are not d-separated by Z in a DAG G, then X and Y are dependent conditional on Z in at least one distribution compatible with G.
D-separation is important because it determines all and only the independencies implied by the BN. For example, in Fig. 3.1, E1 d-separates E4 from E3 because:
1.) The path E4 ← E1 → E3 contains a fork with middle node E1, which is in Z.6
2.) It is the only path from E4 to E3.
The described properties of BNs allow one to reduce the number of probabilities to be supplied to obtain the joint probability P (for example, we see that, thanks to d-separation, P(E4|E1, E2) = P(E4|E1)).
5 This criterion becomes particularly useful in complex and large graphs.
6 It is actually the only node in Z.
By exploiting the chain rule of probability calculus:
\[ P(E_1, E_2, \dots, E_n) = \prod_{i=1}^{n} P(E_i \mid E_{i-1}, E_{i-2}, \dots, E_1) \]
we can deduce a master formula for the joint probability decomposition of a Bayesian net by combining it with the Markov condition:
\[ P(E_1, E_2, \dots, E_n) = \prod_{i=1}^{n} P(E_i \mid Par_{E_i}) = \prod_{i=1}^{n} P(E_i \mid Y^i_1, Y^i_2, \dots) \qquad (3.1) \]
where the Y^i_j are the parents of node i.
From now on we will adopt the following notation for joint probability tables:
E1 E2 E3 E4 Joint Probability
0 0 0 0 p0
0 0 0 1 p1
0 0 1 0 p2
0 1 0 0 p3
1 0 0 0 p4
0 0 1 1 p5
0 1 0 1 p6
0 1 1 0 p7
1 0 0 1 p8
1 0 1 0 p9
1 1 0 0 p10
0 1 1 1 p11
1 0 1 1 p12
1 1 0 1 p13
1 1 1 0 p14
1 1 1 1 p15
(3.2)
We will assume that the i-th column is associated with the event Ei. We will
denote with Ei = 1 the i-th event happening, Ei = 0 otherwise. In the last column
the joint probability of the particular combination of events to the left occurring or
not is shown. p0 will be always linked to none of the events occurring.
3.3 Constructing a Bayesian Net
In what follows our attention will be restricted to S2 networks, i.e. networks with at most two parents per node. This may seem reductive, but it brings considerable computational convenience and is at the same time easier to manage conceptually and to explain to senior management.

Figure 3.1: Example of a Bayesian net
The simple example of Bayesian net in Fig. 3.1 shows a situation where variable E1
exerts a causal influence on variables E4 and E3, variable E2 affects variable E3, and
variables E1 and E2 do not directly influence each other. Each node will be associated
to an event that could hit the asset classes in the portfolio under consideration.
The specification of how the stress events interact i.e. the causal structure (the
topology) of a Bayesian net is one of the most important (and most delicate) steps in
the procedure proposed here. This is where the asset manager makes use of her under-
standing of the world today. This information, imperfect as it may be, is invaluable:
we know, say, that a fall in equities will cause equity implied volatilities to rise, and
not the other way around. Yet, when we use associative measure (such as correlations
and copulae) we discard this information. We do so at a great informational loss.
Since, as discussed below, the inputs to the probability tables at each node are
arrived at using a mixture of frequentist, implied and subjective probabilities, this
imposes a ‘cognitive’ limit on the complexity of the underlying net7. This is, admit-
tedly, a limitation of the approach. However, assigning a simplified picture of ‘how
the world works’ is certainly better than providing no such information at all – at
least as long as the inputs are transparent and auditable and sensitivity analysis is
easy to carry out.
The next section shows how the marginal probabilities and conditional probability
tables can be filled in. It must be stressed at this stage, however, that assigning con-
ditional probabilities can often be simpler than specifying the marginal probabilities
for the same events: it may be very difficult to assign a probability for, say, a 1987-like
market crash or for an overnight doubling of equity implied volatilities. However, one
can safely venture that the conditional probability of the increase in equity implied
7 It must be stressed that, for any reasonable application, the limit of the proposed approach is not at all computational, as the whole procedure is not CPU-time-intensive.
volatility given the equity market crash should be well above 50%. For the purposes
of our approach, this degree of precision is more than adequate.
3.3.1 Building the Marginal and Conditional Probability Tables
The first step is to assign marginal probabilities to the nodes i.e. to the events that
could affect a portfolio. To carry out this task, market-implied information (such as
spreads of credit default swaps) can be profitably used. When this information is
not available (or deemed not reliable8) the subjective input of the asset manager is
in general required. Frequentist information is, of course, always available as a fall-
back, complement or ‘sanity check’. As far as the actual assignments are concerned,
spurious precision should be avoided. For the marginal probabilities only an order-of-magnitude assessment is required (probably nothing more precise than once-a-year, once-every-few-years, once-a-business-cycle, once-a-century statements can be made).
For conditional probabilities rarely do market-implied quantities provide useful in-
formation, and greater reliance must therefore be placed on subjective input. So, once
the marginal probabilities have been (approximately) elicited, the singly-conditioned probabilities can be assigned to 'buckets' determined by asking the asset manager whether the occurrence of event E1 decreases or increases the probability of event E2 by a little or a lot, makes no difference, or makes the occurrence of E2 almost certain.
It is well known that elicitation of conditional probabilities is a delicate task, and
that many cognitive biases make it harder than it seems. Rebonato [56] discusses
at length elicitation techniques that can make the task easier and less error-prone.
Given these cognitive difficulties in assigning conditional probabilities, it is not un-
common that the subjectively-assigned conditional probabilities may end up being
incoherent. This means that there exists no set of (non-negative, smaller-than-one)
joint probabilities from which the conditional probabilities could have been obtained.
A list of sanity checks derived from probability theory can be applied on them in
order to ensure their consistency with the probability axioms. A list of conditions is
the following [45]:
\[ P(E_i|E_j) \le 1 \qquad (3.3) \]
\[ P(E_i|E_j) \le \frac{P(E_i)}{P(E_j)} \qquad (3.4) \]
\[ P(E_i|E_j) \ge \frac{P(E_i) + P(E_j) - 1}{P(E_j)} \qquad (3.5) \]
\[ P(E_i|E_j) \ge 0 \qquad (3.6) \]
and the triplet conditions:
\[ 1 \ge \frac{P(E_j|E_i)\,P(E_i|E_k)\,P(E_k|E_j)}{P(E_k|E_i)\,P(E_j|E_k)\,P(E_i|E_j)} \ge 0 \qquad \text{for } i \ne j \ne k \qquad (3.7) \]
8 For instance, because the CDS spreads might be impacted by a systemic distress that pushes market prices away from fundamental values. Another reason might be a conflict of interest between those who provide the data on the one hand and its final users on the other (see [60]).
Provided that the probabilities in the BN have been chosen in a coherent way, these relationships will be satisfied. Otherwise, the portfolio manager must revise and update them.
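As an illustration, the pairwise checks (3.3)-(3.6) take only a few lines of code; the marginals and the singly-conditioned matrix below are purely hypothetical, and the triplet condition (3.7) can be verified analogously.

    % Pairwise coherence checks (3.3)-(3.6) on a matrix C(i,j) = P(Ei|Ej) (sketch).
    P = [0.04 0.07 0.07];                       % illustrative marginal probabilities
    C = [1.00 0.23 0.29;                        % illustrative singly-conditioned probabilities
         0.40 1.00 0.10;
         0.50 0.10 1.00];
    ok = true;
    for i = 1:numel(P)
        for j = 1:numel(P)
            if i == j, continue; end
            ok = ok && C(i,j) >= 0 && C(i,j) <= 1;          % (3.3) and (3.6)
            ok = ok && C(i,j) <= P(i)/P(j);                 % (3.4)
            ok = ok && C(i,j) >= (P(i) + P(j) - 1)/P(j);    % (3.5)
        end
    end
    fprintf('Pairwise checks satisfied: %d\n', ok);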
These sanity checks are not always sufficient to remove all the logical inconsistencies embedded in a conditional probability matrix. Other techniques, such as Linear Programming, exist to cleanse an incoherent conditional probability matrix and return the closest (in some sense) coherent conditional probability table. See, e.g., [33], [45] and [22].
Even if the asset manager is unable to provide values for the probabilities at all nodes, some useful bounds for them (as well as for the joint probabilities) can still be obtained given the constraints imposed by the probabilities assigned to the other nodes in the BN. This again can be done with the help of Linear Programming. A brief summary of this technique, together with two useful original extensions to [56] introduced in this thesis, is illustrated in the next subsection.
3.3.2 Linear Programming
Linear Programming is a technique that maximizes a given linear function f(p_0, p_1, \dots, p_{2^n-1}) (the objective function) subject to linear constraints. The constraints that we encounter in the construction of the Bayesian net can be expressed in the form:
\[ \vec{E}_i \cdot \vec{p} = p(i) \qquad i = 1 \dots n \]
\[ (\vec{E}_i * \vec{E}_j) \cdot \vec{p} = p(i,j) \qquad i = 1 \dots n-1, \; j > i \]
\[ \dots \]
\[ (\vec{E}_1 * \vec{E}_2 \dots \vec{E}_n) \cdot \vec{p} = p(1, 2 \dots n) \qquad (3.8) \]
\[ \sum_{i=0}^{2^n-1} p_i = 1 \]
\[ p_0, p_1, p_2, \dots, p_{2^n-1} > 0 \]
where the p_i, the components of the vector \vec{p}, denote the different realizations of the joint probability, i.e. the last column of (3.2). The symbol * stands for component-by-component multiplication of the event vectors \vec{E}_i corresponding to the columns in (3.2). The symbol \cdot stands for the usual scalar product between vectors. p(i) is the probability of the i-th event happening, p(i, j)9 that of the i-th and the j-th happening together, and so on. Of course, only the constraints for which the r.h.s. of the above equations is known will be used.
Using a software package such as MATLAB, lower and upper bounds for the joint probabilities can be found by minimizing/maximizing an objective function equal to one component of the vector \vec{p} at a time, given the constraints. One can start with constraints on the marginal probabilities only, i.e. the first line of (3.8). If singly-conditioned probabilities are added as constraints, the resulting bounds are tighter. In fact, by introducing more constraints, the allowed region of variability is further restricted by additional hyperplanes, thus decreasing the range of possible values for the objective function.
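The following sketch illustrates the idea with MATLAB's linprog for a three-event net whose marginal probabilities are purely hypothetical: the 'nothing happens' entry p0 is bounded from below and from above given only the marginal constraints.

    % Bounds on one joint probability via Linear Programming (sketch).
    n  = 3;
    E  = dec2bin(0:2^n-1) - '0';               % 8 x 3 matrix: rows are event configurations
    pm = [0.04 0.07 0.07];                     % hypothetical marginal probabilities

    Aeq = [E'; ones(1, 2^n)];                  % constraints Ei.p = p(i) and sum(p) = 1
    beq = [pm'; 1];
    lb  = zeros(2^n, 1);  ub = ones(2^n, 1);

    f     = zeros(2^n, 1);  f(1) = 1;          % objective: the 'nothing happens' entry p0
    pLow  = linprog( f, [], [], Aeq, beq, lb, ub);    % minimise p0
    pHigh = linprog(-f, [], [], Aeq, beq, lb, ub);    % maximise p0
    bounds = [f' * pLow, f' * pHigh];          % attainable range for p0

Adding singly-conditioned probabilities as further rows of Aeq tightens the interval, exactly as described above.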
There are additional constraints that could be implemented which will restrict
further the ranges for the joint probability. In this light, in what follows we propose
2 extensions to [54].
By definition of a Bayesian net there are relations of the type P(E4|E1, E3) = P(E4|E1), which put additional restrictions. These relations state the independence of a node from other nodes (which are not its descendants) given the knowledge of its parent(s). Let's call relations of the type P(E4|E1, E3) = P(E4|E1), P(E3|E1, E4) = P(E3|E1), etc. 1-level independence, relations like P(E4|E1, E2, E3) = P(E4|E1) 2-level independence, and so on for bigger nets. In general, an n-level independence constraint means that a node is independent of n other nodes (non-descendants) given the knowledge of its parent(s). This further restrains the boundaries for the joint probability. It is easy to introduce such constraints because they are linear. For example:
\[ P(E_4|E_1, E_3) = \frac{p_{12} + p_{15}}{p_9 + p_{12} + p_{14} + p_{15}} = P(E_4|E_1) \]
from which the linearity of the constraint is obvious.10
9 p(i, j) can be expressed as p(i, j) = p(i|j) p(j) = p(j|i) p(i).
10 The quantity P(E4|E1) is assumed known in order to use the constraint.
Let us now step back and ask the question: how can bounds be obtained not only for joint probabilities but also for other quantities, such as conditional probabilities? This can often be the case when the asset manager feels bold enough to guess some of the numbers in the matrix of singly-conditioned probabilities and wants guidance on how to choose the remaining ones. Let us suppose she does not feel confident about the value of P(E3|E1) and wants to have some idea about its possible range. The objective function to minimize/maximize is given by:
\[ P(E_3|E_1) = \frac{(\vec{E}_1 * \vec{E}_3) \cdot \vec{p}}{\vec{E}_1 \cdot \vec{p}} = \frac{p_9 + p_{12} + p_{14} + p_{15}}{p_4 + p_8 + p_9 + p_{10} + p_{12} + p_{13} + p_{14} + p_{15}} \]
which is visibly not linear. A trick to linearize such an expression can be found in [7]. A new scalar variable s is introduced:
\[ s = \frac{1}{\vec{E}_1 \cdot \vec{p}} \qquad (3.9) \]
which is the inverse of the denominator of the expression for P(E3|E1). By defining the new variables:
\[ \vec{x} = s\,\vec{p} \qquad (3.10) \]
the objective function becomes:
\[ P(E_3|E_1) = (\vec{E}_1 * \vec{E}_3) \cdot \vec{x} \qquad (3.11) \]
which is now linear in the variables. The constraints become:
\[ \vec{E}_i \cdot \vec{x} = s\,p(i) \qquad i = 1 \dots n \]
\[ (\vec{E}_i * \vec{E}_j) \cdot \vec{x} = s\,p(i,j) \qquad i = 1 \dots n-1, \; j > i \]
\[ \dots \]
\[ (\vec{E}_1 * \vec{E}_2 \dots \vec{E}_n) \cdot \vec{x} = s\,p(1, 2 \dots n) \]
\[ \sum_{i=0}^{2^n-1} x_i = s \]
\[ s, x_1, x_2, \dots, x_{2^n-1} > 0 \]
which are easily solvable with Linear Programming techniques. The approach extends directly to doubly-conditioned probabilities. A sketch of the linearized problem is given below; the technique introduced in this Section will be exploited in Chapter 5 in the context of a real-world example.
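A sketch of the linearized problem for an upper bound on P(E3|E1) follows (hypothetical marginals; the normalization constraint E1 · x = 1, which is the linear counterpart of (3.9), is added explicitly).

    % Upper bound on P(E3|E1) via the linearised LP of subsection 3.3.2 (sketch).
    n  = 3;
    E  = dec2bin(0:2^n-1) - '0';               % rows = event configurations, columns = E1..E3
    pm = [0.04 0.07 0.07];                     % hypothetical marginal probabilities
    e1 = E(:, 1);  e3 = E(:, 3);

    f   = [-(e1 .* e3); 0];                    % maximise (E1*E3).x over the vector [x; s]
    Aeq = [E', -pm';                           % Ei.x - s p(i) = 0
           ones(1, 2^n), -1;                   % sum(x) - s = 0
           e1', 0];                            % E1.x = 1, i.e. s = 1/(E1.p)
    beq = [zeros(n + 1, 1); 1];
    lb  = zeros(2^n + 1, 1);

    xs        = linprog(f, [], [], Aeq, beq, lb, []);
    upperCond = (e1 .* e3)' * xs(1:2^n);       % upper bound on P(E3|E1); flip the sign of f for the lower bound

With only marginal constraints the bound is of course very wide; it narrows as singly- and doubly-conditioned constraints are added.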
3.3.3 From the Conditional Probability Tables to the Joint Probabilities of Rare Events
After completing the previous steps the asset manager will have at her disposal a
Bayesian net which describes the assumed causal relationships among the rare events
and the associated conditional probability tables. In building this net, she has only
been asked to provide ‘easy’ (i.e., cognitively resonant and/or market-implied) prob-
abilities.
Once the topology of the net has been chosen, and supposed that all the condi-
tional probability tables have been filled in, the construction of the joint probability
is a mechanical task that can always be accomplished by (3.1).
In summary, the Bayesian-net technology provides the required bridge between the
‘easy’ probabilities the asset manager can feel confident to assign and the ‘difficult’
probabilities that are required. For the example in Fig. 3.1, for instance, only four
marginal, three singly-conditioned and one doubly-conditioned probability have to be
assigned in order to specify fully the 16 joint probabilities.
Chapter 4
Fusion
4.1 Mapping from the ‘Normal’ Joint Distribution
to the ‘Normal’ Return Distribution
At this point we have established the joint distribution for the risk factors, both in
the ‘normal’ and in the exceptional regime. For the ’normal’ part we have already
at our disposal a joint distribution f(r1, r2, ...) in the space of returns. As for the
exceptional part, our task will now be to link the derived probabilities to a set of
returns that the extreme events of the BN could cause, if they materialize.
4.2 Mapping from the Exceptional Joint Distribution to the Exceptional Return Distribution
Various return distributions can be assumed for the asset classes in case the identified stress events occur. For the sake of simplicity, we will assume that each joint event gives rise to a set of extreme returns, to which a multidimensional Dirac-δ distribution is associated, with 'area' equal to its joint probability. Each δ-function is 'centered' around the losses (negative returns) incurred by the asset classes given that the joint event happens. If we have a joint probability table with 16 joint events (one of which is the event in which none of E1, ..., En occurs, to which no δ is associated, see next section), we will also have 15 δ-functions.1 The mathematical form of the exceptional joint return distribution can be expressed as follows:
\[ F_{exceptional}(r_1, r_2, \dots, r_n) = \frac{1}{1 - p_0} \sum_{i=1}^{2^n - 1} p_i\, \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n) \]
1 The simplifying picture of using δ-distributions for the losses can easily be relaxed in favour of more complicated loss distributions, such as a Gaussian with a mean µ and uncertainty parameter σ, without changing the approach conceptually.
The sum on the right runs over all the exceptional events, each weighted by its joint probability p_i. The symbol L^i_j is the loss associated with the j-th asset class in case the exceptional event with joint probability p_i happens. There might be some configurations of events in the joint table for which the stress event related to a given asset class does not occur (E_j = 0). In this case, we will assume the associated loss to be simply L^i_j = 0.
4.3 Splicing the 'Normal' and 'Exceptional' Portions of the P&L Distribution

At this point we can blend together the 'normal' and stressed return distributions as:
\[ F(r_1, r_2, \dots, r_n) = (1-k)\, f(r_1, r_2, \dots) + \frac{k}{1 - p_0} \sum_{i=1}^{2^n - 1} p_i\, \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n) \qquad (4.1) \]
where f(r_1, r_2, \dots) is the return distribution coming from the chosen copula and marginals, and k is a normalization factor. A crucial question is how to choose this factor.
One of the joint probabilities obtained from the Bayesian net, p_0, is the probability that none of the n exceptional events E1, E2, ..., En that the asset manager has identified will occur. If the asset manager is sufficiently confident in the Bayesian-net part of the exercise, the quantity k = 1 − p_0 directly provides the required 'normalization' between the 'normal' (continuous) and exceptional (Dirac-δ) contributions. The area of the discrete portion of the distribution (that describes the exceptional events) is then simply equal to 1 − p_0. All the combinations of elementary extreme events would then have exactly the probabilities implied by the Bayesian net.
Alternatively, the asset manager can take a less pure but more pragmatic approach and treat the quantity k as a free parameter of the procedure. If this pragmatic choice is made, the normal portion of the distribution is given mass 1 − k and the masses of the Dirac-δ distributions are uniformly rescaled so that their total mass is k. As shown in the following, this allows the determination of the sensitivity of the outputs to an input that is an important part of the procedure.
As a third alternative, k can be looked at as a distribution which can be determined
via the Maximum Entropy Principle. We will describe how this can be achieved in
Section 4.5.
4.4 Maximization of the Utility
By this stage of the procedure we have obtained a composite distribution of returns
which contains information both about the ‘normal’ market relationships and about
the specific links among the asset classes that we have posited to prevail if the par-
ticular market dislocations we have identified were to materialize. This information,
of course, is not present in any data base – yet any attempt at diversification that
did not take into account, albeit in an approximate manner, this possibility would be
both misguided and dangerous.
The next step of the procedure is the choice of a utility function. The pros and
cons of various choices have been discussed in literally thousands of papers. For the
application at hand, it is useful to choose a utility function that allows a simple and
continuous ‘tuning’ of the degree of risk aversion.
In the application described below, we will make use of a power utility function,2 parametrized by the coefficient β:
\[ U_{power}(c) = \frac{1}{1-\beta}\left(c^{1-\beta} - 1\right) \qquad (4.2) \]
so as to give degrees of risk aversion greater and smaller than that of the logarithmic function (obtained in the limit as β → 1):
\[ U_{log}(c) = \lim_{\beta \to 1} \frac{1}{1-\beta}\left(c^{1-\beta} - 1\right) = \ln(c) \]
2 Concretely, we try to offer a solution to the problem of static, one-period portfolio optimization under budget constraints. As is well known (see, e.g., [64]), if the returns are independent and identically distributed (iid) through time and the investor has constant relative risk aversion, then the one-period (myopic) solution iteratively solves the dynamic problem. Given the emphasis placed in the present approach on the state-specific nature of the exceptional returns, the iid hypothesis is clearly not applicable. However, Merton [42] showed that, if the logarithmic utility function is used, then the iid assumption can be relaxed, and therefore the myopic solution also offers the optimal solution. In our study we shall use both the logarithmic and the power utility function. When the former (logarithmic) choice is applicable, we shall obtain a solution that is also valid in the dynamic setting. For the power-law utility function we shall simply deal with a static, one-period problem.
3 We adopted in this thesis absolute wealth as the objective. Other options exist, such as wealth relative to a reference portfolio, or changes in wealth as in Prospect Theory.
The elements of the vector of weights, w, are varied under the budget constraint
\[ w^T \mathbf{1}_n = 1 \]
until the expected utility of the objective function (the end-of-period final wealth3) is maximized, i.e. the final aim is to maximize over the weights the following expression at the investment horizon:
\[ E[U] = (1-k)\int U(1 + wr)\, f(r)\,dr + k \sum_{i>0} \int \delta(r_1 - L^i_1, r_2 - L^i_2, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.3) \]
\[ w^* = \mathrm{argmax}_{w \in \Theta}\, E[U] \]
where Θ is the set of constraints, such as the budget constraint, short-selling limits, maximum allocations to each asset class, etc. The symbol r is the vector of returns and wr is shorthand for the scalar product between the weight vector and the return vector, i.e. \sum_i w_i r_i.
We note in passing that the integration in (4.3) can be done easily in the case of a Gaussian distribution for f and an exponential utility function:
\[ U(1 + wr) = -\exp\left[-\frac{1 + wr}{\xi}\right] \]
In this case the integral is nothing but the characteristic function of the k-dimensional normal distribution. In fact:
\[ \int U(1 + wr) f(r)\,dr = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}} \int -e^{-\frac{1+wr}{\xi}} \exp\left(-\frac{1}{2}(r-\mu)'\Sigma^{-1}(r-\mu)\right) dr = -\exp\left[-\frac{1 + w\mu}{\xi} + \frac{1}{2}\frac{w'\Sigma w}{\xi^2}\right] \qquad (4.4) \]
where µ is the vector of expected values of r. The integral involving the δ-functions is straightforward and involves a sum of exponentials:
\[ -\sum_{i>0} \frac{p_i}{1-p_0}\, e^{-\frac{1 + wL^i}{\xi}} \]
j=1wjLij. Although not analytically
solvable, the optimization can be easily carried out numerically with a standard soft-
ware package4 and it can be very fast. In this thesis, however, we will not use
exponential utility functions for the reasons explained above.
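For illustration, the closed form (4.4) can be checked against a brute-force Monte Carlo estimate for arbitrary parameter values; the numbers below are placeholders and this check is not part of the thesis' calculations.

    % Numerical check of the closed form (4.4) for a Gaussian f and exponential utility (sketch).
    mu    = [0.02; 0.02];                                        % placeholder expected returns
    Sigma = [0.03^2, 0.5*0.03*0.08; 0.5*0.03*0.08, 0.08^2];      % placeholder covariance matrix
    w     = [0.6; 0.4];  xi = 0.5;

    analytic = -exp(-(1 + w'*mu)/xi + 0.5*(w'*Sigma*w)/xi^2);    % right-hand side of (4.4)

    r  = mvnrnd(mu', Sigma, 1e6);                                % Monte Carlo over the 'normal' part
    mc = mean(-exp(-(1 + r*w)/xi));                              % should agree closely with 'analytic'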
When the marginal distributions of sub-portfolio returns cannot be satisfactorily
modelled by a Gaussian distribution or when the portfolio contains strongly non-
linear products, then the optimization must be carried out by Monte Carlo simulation.
This is, of course, more computationally demanding, but, given the relatively small
number of sub-portfolios, does not pose a prohibitive burden on the whole procedure.
Moreover, resampling techniques can be used, and they are known to give rise to
4 Again the MATLAB function fmincon is quite suitable for this problem.
reduced sensitivity to the estimated parameters. This is a well-known problem with
the Markowitz optimization technique, as the results can be highly variable even for
small changes in input parameters5. This was one of the drivers that motivated Black
et al. [1] in their approach. Unlike the Black-Litterman model, where the sensitivity
is diminished by smoothing the inputs, in the resampling procedure the same result
is reached by averaging the outputs of several scenarios6.
The weights w∗ provide a coherent allocation to the various sub-portfolios, where
the adjective ‘coherent’ stresses that the allocation has been arrived at by taking into
account in a consistent manner the investor’s preferences over the outcomes associated
both with normal and exceptional market conditions. ’Protection trades’ are not
attached as an incoherent afterthought to an optimization carried out assuming a
stable investing universe.
The sensitivity of the outputs to the subjective inputs can be readily explored, as
is shown in the worked-out example in the following Chapter.
4.4.1 Numerical optimization
This subsection will give an overview of the numerical routine that will be followed
to find the optimal allocation. At this point we have found the univariate marginals
and the best fit copula. They were fitted on daily returns data but the investment
horizon in the future might be different from one day. We thus need to simulate the
returns over a longer period of time7. We will apply the following steps:
Step 1 : Simulate N random numbers with the MATLAB function copularnd with
the fitted copula parameters.
Step 2 : Map them through the inverse cumulative univariate marginals to returns
for each asset class.
Step 3 : Go to Step 1 for each day until the end of the investment horizon T .
Step 4 : By multiplying the returns ri = (1 + ri1)(1 + ri2)...(1 + riT )− 1, where rij is
the return of the asset class i for the j-th day, at the end of the process we will have
N points of the joint return distribution at the investment horizon.
5 See, e.g., [5]. As for the relative importance of the parameters, Ziemba et al. [9] show that estimation errors in the expected returns are, as a rule of thumb, about 10 times more important than errors in variances and covariances.
6 The approach of this thesis will be linked to the BL model in Appendix B.
7 If the investment horizon is sufficiently long, e.g. 1 year, we could exploit the aggregational Gaussianity of asset returns (see Introduction, footnote 1).
Step 5: Feed into the optimization function fmincon8 a function, described in the next two substeps, that searches for the maximum value of the utility in the space of the weights w, given the constraints on them (\sum_{i=1}^{p} w_i = 1 and w_i > 0).
Step 5a: Multiply each of the N simulated p-dimensional points by the weights w of the asset classes.9 Feed the result into the utility function U(1 + \sum_{i=1}^{p} w_i r_i), where 1 + \sum_{i=1}^{p} w_i r_i is an N x 1 dimensional vector. Take the mean value to obtain the expected value of the utility function coming from the normal part of the problem.
Step 5b: Splice the mean value by giving it weight 1− k with k times the utility
coming from the stress distribution. Since the stress returns are fixed, there is no
need for a Monte Carlo simulation for this part.
For the practical example in the next Chapter we will use N = 5, 000, 000.
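A compact sketch of Steps 1-5 for the logarithmic utility is given below. All inputs (copula parameters, marginal quantile function, stress losses, joint probabilities and the reduced N) are illustrative placeholders rather than the fitted values used in the thesis.

    % Monte Carlo simulation and utility maximization, Steps 1-5 (sketch).
    p = 4;  T = 63;  N = 1e5;                      % ~3-month horizon; reduced N for the sketch
    rhoT  = 0.3*ones(p) + 0.7*eye(p);  nuT = 6;    % placeholder t-copula parameters
    qMarg = @(u) tinv(u, 5) * 0.01;                % placeholder inverse marginal (same for all assets)

    R = ones(N, p);
    for t = 1:T                                    % Steps 1-3: daily draws compounded to the horizon
        U = copularnd('t', rhoT, nuT, N);
        R = R .* (1 + qMarg(U));
    end
    R = R - 1;                                     % Step 4: N joint returns at the horizon

    k = 0.1619;                                    % mass of the exceptional part (k = 1 - p0)
    L = -[0.05 0 0 0; 0 0.20 0 0; 0 0 0.10 0; 0 0 0 0.05];   % placeholder stress-loss scenarios
    q = [0.25 0.25 0.25 0.25]';                    % placeholder renormalised joint probabilities

    EU = @(w) -((1-k)*mean(log(1 + R*w)) + k*(q'*log(1 + L*w)));  % minus the spliced expected utility
    w0 = ones(p, 1)/p;                             % Step 5: maximise over the weights
    w  = fmincon(EU, w0, [], [], ones(1, p), 1, zeros(p, 1), ones(p, 1));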
4.5 Maximum Entropy for the choice of k
As stated in Section 4.3, the parameter k can be treated as a free parameter and
be given an arbitrary value by the asset manager according to his beliefs (or given
the value 1 − p0). Such precision might appear as spurious. It is appealing to ’blur’
its value by introducing the concept of degree of confidence that the investor has in
his inference through the Bayesian net (which could include also estimation errors in
the marginal/conditional probabilities). As an analogy, in the Black-Litterman (BL)
model, the views that the investor expresses are not precise but are distributions whose variance Ω2 reflects the degree of confidence in the expressed opinions. If she is very confident in her forecast, the variance Ω2 → 0 and the views are delta functions centered around the forecasted value. Vice versa, if the confidence is low, the distributions are much more dispersed.
What we will do in this section is to represent k as a distribution g(k) and not as
a fixed parameter. Such distribution should be the least prejudicial (least committal)
distribution compatible with the maximum prior knowledge we have about k and can
be chosen via the Principle of Maximum Entropy (PME)10. Our aim is to propose the
maximum unpresumptive distribution, the one that does not incorporate through its
moments and domain more information than we have, given a set of constraints. The
8 The interior-point algorithm was used. The termination tolerance on the function value was left at its default setting: 10−6.
9 An initial set of guess weights must be given.
10 For an introduction see [27].
objective is to maximize the entropy:
\[ H(k) = -\int g(k)\,\log g(k)\,dk \]
over its domain by assuming the following constraints for g(k):
1.) It is defined in the interval [0, 1]
2.) Its mode is at the point k = 1− p0
3.) Our degree of confidence is given by a variance Ω2
The second point reflects our wish to have a distribution peaked at the value implied by the Bayesian net, i.e. we want k = 1 − p0 to be the most probable value.
This problem is already solved in [14]. Dowson et al. show that in the closed
interval [0, 1] the most unpresumptive distribution, if we know the mean and the
variance, is:
1.) The truncated exponential if m2(E[k]) = m2te(E[k])
2.) The truncated Gaussian if m2(E[k]) < m2te(E[k])
3.) The U-shape distribution if m2(E[k]) > m2te(E[k])
where m2te(E[k]) is the second moment of the truncated exponential distribution
as a function of its first moment (the mean E[k])11. For example, if we assume certain
values for the mean and the variance, we can calculate the second moment and, if
this is smaller than the one coming from the truncated exponential with the same
mean, the maximum entropy distribution is the truncated Gaussian distribution.
In order to satisfy all the assumptions, the most natural solution for the problem
at hand is the second option i.e. the truncated Gaussian distribution12.
Let us see how the optimal weights can be calculated in this case. In the case of fixed k, we recall that the optimal values of w are found as follows:
\[ E[U] = (1-k)\int U(1 + wr)\, f(r)\,dr + k \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.5) \]
\[ w^* = \mathrm{argmax}_{w \in \Theta}\, E[U] \]
11 For practical purposes, for small values of the mode such as those in a stress-testing context, we show that the three conditions can be approximated as 1.) E[k] = Ω, 2.) Ω < E[k], 3.) Ω > E[k]. See Appendix E for the proof and a general introduction to the truncated exponential and Gaussian distributions.
12 Because of the constraint on the mode we cannot choose the truncated exponential or the U-shape distributions.
Suppose that k is not fixed but follows a distribution g(k).13 In this case, the optimization is modified as follows:
\[ E_f[U] = \int g(k)\,E[U]\,dk = \int (1-k)\,g(k) \int U(1 + wr)\, f(r)\,dr\,dk + \int k\,g(k) \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr\,dk \qquad (4.6) \]
\[ w^*_f = \mathrm{argmax}_{w \in \Theta}\, E_f[U] \]
Carrying out the integration in (4.6) with respect to k is straightforward:
\[ E_f[U] = (1-\mu)\int U(1 + wr)\, f(r)\,dr + \mu \sum_{i>0} \int \delta(r_1 - L^i_1, \dots, r_n - L^i_n)\,\frac{p_i}{1-p_0}\,U(1 + wr)\,dr \qquad (4.7) \]
where µ is the mean value of g(k):
\[ \mu = \int k\,g(k)\,dk \]
Now equation (4.7) resembles (4.5) with k replaced by µ. The conclusion is that the optimal values for (4.6) are those of (4.5) with the replacement k = µ, i.e. the solution of (4.6) is given by the optimal allocation of (4.5) evaluated at one particular point: the mean. However, the mean and the mode do not always coincide for the truncated Gaussian, as we will explain below. This particularly simple solution only comes about because of the very special (linear) dependence of the distribution (4.3) on the parameter k. If the uncertainty had been in a parameter that affects the distribution non-linearly (say, the mean or the variance), this simple result would not apply.
The choice of a truncated Gaussian may be good if we consider a small uncertainty around the mean. For a bigger uncertainty the distribution acquires features which might not seem very desirable. First of all, the mean starts to be displaced from the mode, a phenomenon that is negligible for small values of Ω, when the distribution can be considered almost symmetric. Moreover, if, for example, the mode is 0.1619 (footnote 14) and Ω = 0.1, the truncated Gaussian takes a value greater than 0 at k = 0 (footnote 15). Thus the probability of the system being in a small interval close to k = 0 may not be negligible.
13 The reasoning that follows applies to any kind of distribution.
14 This is the value that we will obtain for our practical example in Chapter 5, but any other value is just as good for this example.
15 If Ω = 0.05 the value at k = 0 is very close to 0, so this choice of Ω is a 'good' choice.
Another drawback is the fact that we cannot reach higher values of k (say k = 0.9, see footnote 16) without inflating the variance substantially. However, if we increase the variance too much, we saw that the maximum entropy solution becomes the U-shaped distribution, which is not a desired property for the problem at hand.
We search for a remedy to this by looking at some other features which go beyond
the constraints on the first two moments and have not been exploited yet. We assume
the following for g(k):
1.) It is defined in the interval [0, b] where b ≤ 1 i.e. we relax the assumption to
have a coverage of the full interval [0, 1].
2.) Its mode is at the point k = 1− p0
3.) Our degree of confidence is given by a variance Ω2
4.) g(0) = g(b) = 0 i.e. we require the distribution function to vanish at the
extrema.
This kind of problem seems rare and disregarded in previous literature and, to
the knowledge of the author, only Goodman [23] has dealt with it in the context
of Geophysics. Goodman proves that the Maximum Entropy distribution in a closed
interval having prescribed the first two moments and which additionally satisfies point
4.) is the modified log-normal distribution (Johnson SB distribution):
\[ g(k) = \frac{b}{\sqrt{2\pi}\,\sigma\,k\,(b-k)} \exp\left[-\frac{1}{2}\left(\frac{\log\left(\frac{k}{b-k}\right) - \mu}{\sigma}\right)^2\right], \qquad 0 \le k \le b \qquad (4.8) \]
For a description of this distribution see Appendix F. The Johnson distribution, like the normal and lognormal distributions, has only two independent moments, and all other moments can be expressed through them.17 The shape of the Johnson distribution can be seen in Fig. F.1. By taking the first derivative of (4.8) and equating it to 0 at the modal value we can obtain a useful relation between the parameters18 (see Appendix F), which for b = 1 reads:
\[ \sigma^2 = \frac{-\mu + \log\left(\frac{m}{1-m}\right)}{2m - 1} \]
16 The value k = 0.9 is just for this example. If 1 − p0 = 0.01, for example, values of k = 0.3 may be unreachable.
17 It is important to underline that, both for the truncated Gaussian and for the Johnson SB distribution, given two parameters of the set (mode, mean, variance) the third is uniquely determined, i.e. knowing 2 of the 3 parameters is enough to know the third without inputting extra information on, say, higher moments.
18 The parameters µ and σ do not coincide with the mean and variance. See [28] for a detailed analytical derivation of the moments of the Johnson SB distribution, which, however, do not have a simple form.
where m is the mode fixed by the assumptions of the problem. Having one free
parameter left, we can find it numerically by availing ourselves of the knowledge of
the variance Ω2 which is a given input of the problem together with the mode.
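A minimal numerical sketch of this calibration, with an illustrative mode and Ω and a simple trapezoidal quadrature for the moments of (4.8) with b = 1, is the following.

    % Calibration of the Johnson SB parameters to a given mode and variance (sketch).
    m = 0.1619;  Omega = 0.05;                                   % illustrative mode and confidence
    sig  = @(mu) sqrt((log(m/(1-m)) - mu) / (2*m - 1));          % relation obtained at the mode
    gpdf = @(k, mu) exp(-0.5*((log(k./(1-k)) - mu)/sig(mu)).^2) ...
                    ./ (sqrt(2*pi) * sig(mu) .* k .* (1-k));     % Eq. (4.8) with b = 1

    kk   = linspace(1e-6, 1 - 1e-6, 20001);                      % quadrature grid on (0, 1)
    varJ = @(mu) trapz(kk, kk.^2 .* gpdf(kk, mu)) - trapz(kk, kk .* gpdf(kk, mu))^2;

    muLo    = log(m/(1-m)) + 0.01;                               % sigma^2 > 0 requires mu above this (m < 1/2)
    muStar  = fzero(@(mu) varJ(mu) - Omega^2, [muLo, 0]);        % pin the free parameter with the variance
    sigStar = sig(muStar);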
The drawback of this kind of distribution is that it is not as immediate to grasp as the symmetric Gaussian distribution. The variance is easily pictured as the width of a bell curve, but this is more difficult for an asymmetric distribution such as the Johnson. However, we can always use reasoning based on the Chebyshev inequality: e.g. if the mean is 0.2 and Ω = 0.1, we are sure that the probability of exceeding 0.2 + 3 · 0.1 = 0.5 is at most approximately 11% (one ninth).
In the next Chapter we will give a real-world example, treating k both as a free parameter and as a distribution, and will show the allocations obtained with the two methods.
Chapter 5
A Worked-Out Example
5.1 Construction of the Bayesian Net
In order to illustrate the strengths and weaknesses of the procedure, we present in
this section a simple worked-out example in some detail. The stylized problem we
examine in this section is that of the allocation among the asset classes presented in
the Introduction.
The analysis to identify the body of the distribution was carried out as described in
Chapter 2. The cut-off point was determined using a combination of the ellipsoid and
the minimum covariance determinant methods. 160 data points were excluded by the
algorithm, corresponding to approximately 5% of the full data set. The marginals
of the truncated data set were modelled using a t-Student distribution and were
conjoined through a t-Student copula.
We assume an investment horizon of 3 months1 and in the Monte Carlo simulation
we will simulate all the daily returns of the ’normal’ part to the end of this interval.
The traditional inputs to a Markowitz optimization (i.e., the correlation matrix and the stand-alone expected returns and volatilities) are given below.2

Correlation   Bond     Credit   Equity   Mortgage
Bond          1        0.96     −0.22    0.87
Credit        0.96     1        −0.16    0.87
Equity        −0.22    −0.16    1        −0.11
Mortgage      0.87     0.87     −0.11    1

1 For an investment horizon of such length one could also resort to the aggregational Gaussianity of asset returns (see Introduction, footnote 1) to save computation time. It depends on the level of precision/time trade-off one is willing to accept.
2 In real-life applications, the expected returns are likely to be assigned using a subjective Black-Litterman approach. This extension has not been incorporated in the present analysis for the sake of brevity, but it is conceptually straightforward. The use of shrinkage estimators such as Stein's estimator [73] is another alternative.
Figure 5.1: The Bayesian net used for the example in this Section
Expected Return
Bond        1.81%
Credit      2.07%
Equity      2.05%
Mortgage    1.67%

Volatility
Bond        2.27%
Credit      2.36%
Equity      9.20%
Mortgage    1.44%
The Bayesian net in Figure 5.1 (repeated here with new labelling for ease of
reference) was assumed, with the following interpretation for the four events:
• Event Bond: Large sell-off in the Government Bond market due to unexpected
change in monetary stance, with 1994-like increase by 75 bp of the Fed target
rate.
• Event Mortgage: Large widening of Mortgage spreads due to the forced selling
of inventories prompted by new capital rules.
• Event Credit: Large widening of Credit spreads.
• Event Equity: Sell-off in the Equity market.
In this example it is assumed that both the sell-off in Government Bonds and
the widening of mortgage spreads would increase the probability of investment-grade
spreads widening significantly. It is also assumed that the sell-off in Government
Bonds and the widening in mortgage spreads do not directly affect each other (given
the specific reason for the mortgage weakening). Finally the sell-off in Government
Bonds is assumed to have a strong negative effect on the equity market. In constructing this example it was assumed that the events were identified on the basis of the vulnerabilities of each portfolio. This being the case, each stress event is simply associated with a single portfolio. This simplifies the presentation, and the example could be straightforwardly generalized.
Given the known composition of each subportfolio, x, the stress loss, L(x), that
would occur, if the associated stress event occurred with certainty, can be estimated.
These subportfolio losses are given in the vector below:

Stress Losses
L(Bond)        −5%
L(Credit)      −20%
L(Equity)      −10%
L(Mortgage)    −5%
For clarity, each entry gives the loss if the stress event associated with each asset class
materialized, and the whole portfolio had been invested in that asset class.
The marginal probabilities of occurrence, P (x), of the four stress events were
chosen as follows:

Marginal probability
P(Bond)        4%
P(Credit)      7%
P(Equity)      7%
P(Mortgage)    3%
Given the causal structure embedded in the Bayesian net above, only three singly-
conditioned probabilities, P (x|y), are required in order to specify the full joint prob-
ability matrix. For this example these were assumed to be as follows:
Conditional probability
P(Credit|Bond)         40%
P(Credit|Mortgage)     25%
P(Equity|Bond)         50%
The doubly-conditioned probability, P(Credit|Bond, Mortgage), was assumed to be 50%.3 The set of probabilities chosen in this way satisfies the sanity checks (3.3)-(3.7). The conditional probabilities were also 'cleaned' through the Linear Programming techniques in [56]. The resulting joint probabilities are shown in the table below.
3 As the subjective conditional probabilities are obtained by elicitation, it is sometimes too difficult to venture a guess for P(Credit|Bond, Mortgage). The linear programming techniques mentioned in the previous chapter and in subsection 3.3.2 were used as a guide to infer lower and upper bounds for this and the other probabilities.
Bond   Credit   Equity   Mortgage   Joint Probability
0      0        0        0          0.8381
0      0        0        1          0.0208
0      0        1        0          0.0460
0      1        0        0          0.0446
1      0        0        0          0.0117
0      0        1        1          0.0011
0      1        0        1          0.0065
0      1        1        0          0.0025
1      0        0        1          0.0003
1      0        1        0          0.0117
1      1        0        0          0.0077
0      1        1        1          0.0004
1      0        1        1          0.0003
1      1        0        1          0.0003
1      1        1        0          0.0077
1      1        1        1          0.0003
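For transparency, the table can be reproduced mechanically from (3.1). The sketch below assumes, consistently with the net of Fig. 5.1, that the Bond and Mortgage events are independent, and backs out the unstated conditional-probability-table entries from the assigned marginal and singly-conditioned probabilities; this is one consistent reconstruction used purely for illustration.

    % Joint probabilities of the worked example from the Bayesian net (sketch).
    pB = 0.04;  pM = 0.03;  pC = 0.07;  pE = 0.07;               % assigned marginals
    pC_B = 0.40;  pC_M = 0.25;  pE_B = 0.50;  pC_BM = 0.50;      % assigned conditionals

    pE_notB  = (pE - pE_B*pB) / (1 - pB);                        % backed out to match the Equity marginal
    pC_BnotM = (pC_B - pC_BM*pM) / (1 - pM);                     % P(Credit | Bond, no Mortgage)
    pC_notBM = (pC_M - pC_BM*pB) / (1 - pB);                     % P(Credit | no Bond, Mortgage)
    pC_none  = (pC - pC_BM*pB*pM - pC_BnotM*pB*(1-pM) - pC_notBM*(1-pB)*pM) / ((1-pB)*(1-pM));

    cptC  = [pC_none, pC_notBM; pC_BnotM, pC_BM];                % rows: Bond = 0/1, columns: Mortgage = 0/1
    joint = zeros(2, 2, 2, 2);                                   % indices: (Bond, Credit, Equity, Mortgage)
    for B = 0:1, for C = 0:1, for E = 0:1, for M = 0:1
        pb = pB^B * (1-pB)^(1-B);
        pm = pM^M * (1-pM)^(1-M);
        pc = cptC(B+1, M+1)^C * (1 - cptC(B+1, M+1))^(1-C);
        pe = (B*pE_B + (1-B)*pE_notB)^E * (1 - (B*pE_B + (1-B)*pE_notB))^(1-E);
        joint(B+1, C+1, E+1, M+1) = pb*pm*pc*pe;                 % factorization (3.1)
    end, end, end, end
    p0 = joint(1, 1, 1, 1);                                      % 0.8381: no stress event occurs

Enumerating the 16 entries of joint reproduces the table above (for instance p0 ≈ 0.8381).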
Note that, given the assumed causal dependence among the four events, the most
unlikely event is not the simultaneous occurrence of all the four stress events. For
instance, the joint events:
• only Bond and Mortgage happen,
• the Equity, Mortgage and Bond events happen with no widening of credit
spreads
• Credit, Mortgage and Bond happen without a sell-off in Equity
have the same probability of occurrence as all events happen simultaneously. A
glance at the posited causal structure confirms that this makes intuitive sense.
The asset manager can either believe that the probability of no stress event pro-
vided by the Bayesian net procedure is fundamentally correct (in which case the mass
of the normal distribution would be 0.8381); or, as suggested above, can treat the
normalization factor as a free parameter or even assume it is a distribution and use
the Maximum Entropy Principle (see Chapter 4).
Given this information, and for any vector of weights w, the optimal portfolio
weights can be calculated by maximizing (4.3). A logarithmic and a power utility
function (with exponents β of 1.2 and 0.8 to straddle in terms of risk aversion the
logarithmic utility) were chosen for the optimization. The results are shown below for
three different degrees of risk aversion (see Figs. 5.2 to 5.4).

Figure 5.2: The asset allocation to the four sub-portfolios as a function of the total probability mass of being in a normal state, obtained with a logarithmic utility function. The allocation on the far right is the Markowitz-like allocation that would apply in an exclusively normal world.

The asset allocation is shown as a function of the mass in the normal body of the distribution (this quantity
is 1 − k in the captions below). Therefore the asset allocation that can be read
for k = 0 is a Markowitz-like allocation: the allocation that would be chosen if the
exceptional events never materialized and correlations and variances were therefore
homoscedastic. The smooth curves that evolve to the left of these four points show
how these allocations change as the probability of the world entering a distressed state
increases. The results are discussed in detail in the following section. In Appendix C
the efficient frontier for different values of k is derived and compared.
5.2 Discussion of the Results
The first observation is that the qualitative features of the results display a reasonably
mild dependence on the degree of risk aversion. Given the well-known difficulties in
estimating this quantity in a reliable manner, this is a nice robustness feature of the
procedure.
Figure 5.3: Same as Fig 5.2 for a power utility function β = 0.8.
Figure 5.4: Same as Fig 5.2 for a power utility function β = 1.2.
Figure 5.5: Allocations with a two-moment simplified optimization approach for a logarithmic utility function.
We will introduce a useful concept at this point; it will be used to shed light on the discussion that follows. The distribution (4.1) has its own moments, which are calculated up to the second order in Appendix A. We will call them total moments, precisely because they are moments of the spliced distribution. We can use an optimization engine assuming a Gaussian distribution of returns with mean and variance given by the total moments and infer the optimal allocations. In what follows we can thus compare against this intuitive benchmark, a simple approximation that relies on the first two moments. Of course, we do not expect the results to coincide with those coming from the exact calculation engine based on the t copula and the Monte Carlo simulation of the full distribution, but we can nevertheless gain useful insight. The total moments are functions of the quantity 1 − k, and for each value of it we will have a different allocation. In Fig. 5.5 we can see the allocations obtained with this approach.4 In Tables 5.1 and 5.2 the moments used are shown.
One can then profitably begin the analysis from the central case of the logarithmic
utility function (see Figure 5.2). In the Markowitz-like case (no exceptional events)
4 The approximation based on the total moments seems to hold quite well, at least on visual inspection of the figures. The circumstances under which the approximation holds are described in Appendix A.
the allocations are divided between two of the four asset classes: Credit and Equity. In
this benchmark case, the highest allocation is for the Credit subportfolio (92%). This
is not surprising, given the high return expected from this asset class. The Credit
subportfolio, however, has been assumed to be the most vulnerable to the associated
stress event (L(Credit) = 20%). Therefore, as the probability mass in the normal
state decreases, the allocation to Credit quickly decreases. Indeed, for all degrees
of risk aversion, the allocation to the Credit rapidly goes to zero as soon as the
probability of the world not remaining in a normal state over the investment horizon
is non-negligible. This result is both intuitive and, to some extent, obvious. It is worth stressing, however, the value of such an obvious recommendation: in some situations the expected returns from some assets may appear very attractive (compared to when their risk is assessed by looking at the historical record) exactly because the market is pricing in the vulnerability to events that have not yet materialized. In the run-up to the 2007 crisis, for instance, exotic instruments such as Constant Proportion Debt Obligations (CPDOs) commanded an AAA rating, yet a yield of 200 basis points above AAA-rated government debt. Similarly, the yields on peripheral European government debt in the spring and summer of 2010 may have appeared inexplicably attractive if gauged on their historical record alone. A frequentist-based asset allocation technique would suggest very high allocations to these inexplicably cheap assets.
The non-monotonic behaviour of the allocation to the Equity and to the Bond
subportfolios is less intuitively obvious and deserves some discussion. The allocation
to Equity is low close to the normal state but as the probability of stress events
increases, i.e., moving towards the left in the three graphs, its allocation begins to
increase. This is easy to understand, because the stress loss associated with the
Equity subportfolio is lower than that for Credit (L(Equity) = 10% vs L(Credit)
= 20%), but their marginal probability of occurrence is the same: 7%. In order
to understand better the behaviour of the allocations, we have to look at the event
correlation implied by the Bayesian net assumed above. The event correlation matrix
is defined as5:
ρij =P (Ei, Ej)− P (Ei)P (Ej)√
[P (Ei)(1− P (Ei)P (Ej)(1− P (Ej))]
5For a detailed discussion of the event correlation matrix, see [56] and [76]. The event correlationmatrix clearly has nothing to do with the correlation matrix among the returns of the assets whichis given by the Total correlation matrix worked out in Appendix A and shown in Tables 5.3 - 5.6for some values of the normalization factor.
and in our specific example:

Event Correlation  Bond   Credit  Equity  Mortgage
Bond               1      0.26    0.34    0
Credit             0.26   1       0.09    0.12
Equity             0.34   0.09    1       0
Mortgage           0      0.12    0       1
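As an illustration, the event correlation matrix above can be reproduced directly from the joint probability table of the previous section. The sketch below hard-codes that table as a Python dictionary; the function names are illustrative only.

```python
import numpy as np

# Joint probabilities from the Bayesian net, indexed by the binary outcome
# (Bond, Credit, Equity, Mortgage); values taken from the table above.
joint = {
    (0,0,0,0): 0.8381, (0,0,0,1): 0.0208, (0,0,1,0): 0.0460, (0,1,0,0): 0.0446,
    (1,0,0,0): 0.0117, (0,0,1,1): 0.0011, (0,1,0,1): 0.0065, (0,1,1,0): 0.0025,
    (1,0,0,1): 0.0003, (1,0,1,0): 0.0117, (1,1,0,0): 0.0077, (0,1,1,1): 0.0004,
    (1,0,1,1): 0.0003, (1,1,0,1): 0.0003, (1,1,1,0): 0.0077, (1,1,1,1): 0.0003,
}

def event_correlation(joint, n=4):
    """Correlation of the binary stress-event indicators implied by the joint table."""
    p = np.zeros(n)            # marginal P(E_i)
    pij = np.zeros((n, n))     # pairwise P(E_i, E_j)
    for outcome, prob in joint.items():
        for i in range(n):
            if outcome[i]:
                p[i] += prob
                for j in range(n):
                    if outcome[j]:
                        pij[i, j] += prob
    denom = np.sqrt(np.outer(p * (1 - p), p * (1 - p)))
    return (pij - np.outer(p, p)) / denom

print(np.round(event_correlation(joint), 2))
```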
From the table we can see that the correlation of occurrence of the loss associated
with Equity and Credit is low: 0.09. A simple calculation of the total expected
return (i.e., the expected return including stress events, see Table 5.1) then shows
that this quantity decreases less rapidly as we move towards the left in the graphs
above for Equity than for Credit, but the two assets continue to retain a negative
total correlation (See Tables 5.3 - 5.6). Therefore, as long as the probability of being
in the stressed state is low, the diversification benefit for allocating between Equity
and Credit arising from the negative correlation in the ”normal” correlation matrix
is not completely lost. However, as we move further to the left (i.e, as the probability
of entering the excited state increases) the associated losses both to Equity and to
Credit start to be non-negligible compared to the other two assets and their total
expected returns becomes comparable to them. This behaviour is also confirmed by
the allocation engine with the total moments in Figure 5.5.
Let’s now examine closely the allocation to the Bond asset class. To understand
how it changes with respect to the allocation to Credit, we have to consider again
the total expected return. Note first that the no stress allocation to Bond is 0 for all
degrees of risk aversion. This is because of its low expected return compared to Credit
with which it is highly correlated (96%). The allocation to Bond completely replaces the allocation to Credit once k approaches 4%, since its total expected return becomes comparable to that of Credit (due to a lower loss, L(Bond) = 5%, and a lower marginal probability of occurrence, P(Bond) = 4%). Indeed, a calculation of the total expected
return for Bond and Credit shows that, for k = 4%, they become very similar (1.69%
versus 1.64%), but the total standard deviation for Bond is significantly lower (2.34%
versus 3.70%). Furthermore, Bond is more negatively correlated with Equity than
Credit (-18% versus -6% for that level of the normalization factor k) with which it
can diversify better. For these reasons Bond starts replacing Credit as a partner of
Equity.
The Bond subportfolio replaces Credit, but its allocation weight does not keep increasing: it starts to be slowly shifted to Mortgage. This happens because the stress marginal probability of occurrence of Bond is higher than that of Mortgage (P(Mortgage) = 3% vs P(Bond) = 4%), while the two carry the same stress loss: L(Mortgage) = L(Bond) = 5%. The total returns of the two assets become comparable around 1 − k = 75%. However, Mortgage starts rising before that point because it consistently enjoys a lower volatility than Bond.

Figure 5.6: Allocations as a function of the degree of confidence - truncated Gaussian
5.3 Maximum entropy allocations
The allocations obtained by choosing for g(k) a truncated Gaussian with mode =
0.1619 are plotted in Fig. 5.6 for different values of the confidence parameter Ω.
Since, in the example we are examining, the allocations are almost constant around the mode, there is no large variation in the optimal weights, even for high values of Ω. The same allocation sensitivity study was carried out for a Johnson distribution with b = 1; the results are shown in Fig. 5.7. We see that, as the variance is increased, more weight is given to the Mortgage class than in the case of the truncated Gaussian. This is understandable, since the Johnson distribution is more skewed to the right than the truncated Gaussian and its mean is by definition displaced from the mode, thus capturing larger values of the factor k, where the allocation to Mortgage dominates.
1 − k (%)  Bond     Credit    Equity    Mortgage
100        0.0181   0.0207    0.0205    0.0167
99         0.0178   0.0196    0.0199    0.0164
98         0.0175   0.0186    0.0192    0.0162
97         0.0172   0.0175    0.0186    0.0159
96         0.0169   0.0164    0.0180    0.0157
95         0.0166   0.0153    0.0173    0.0154
94         0.0163   0.0143    0.0167    0.0151
93         0.0160   0.0132    0.0160    0.0149
92         0.0157   0.0121    0.0154    0.0146
91         0.0154   0.0111    0.0148    0.0144
90         0.0151   0.0100    0.0141    0.0141
89         0.0148   0.0089    0.0135    0.0138
88         0.0144   0.0078    0.0129    0.0136
87         0.0141   0.0068    0.0122    0.0133
86         0.0138   0.0057    0.0116    0.0131
85         0.0135   0.0046    0.0109    0.0128
84         0.0132   0.0036    0.0103    0.0125
83         0.0129   0.0025    0.0097    0.0123
82         0.0126   0.0014    0.0090    0.0120
81         0.0123   0.0003    0.0084    0.0118
80         0.0120   -0.0007   0.0078    0.0115
79         0.0117   -0.0018   0.0071    0.0112
78         0.0114   -0.0029   0.0065    0.0110
77         0.0111   -0.0039   0.0058    0.0107
76         0.0108   -0.0050   0.0052    0.0105
75         0.0105   -0.0061   0.0046    0.0102
74         0.0102   -0.0072   0.0039    0.0099
73         0.0099   -0.0082   0.0033    0.0097
72         0.0096   -0.0093   0.0027    0.0094
71         0.0093   -0.0104   0.0020    0.0092
70         0.0090   -0.0115   0.0014    0.0089

Table 5.1: Total mean. See Appendix A for details of the calculation.
1 − k (%)  Bond     Credit    Equity    Mortgage
100        0.0227   0.0236    0.0920    0.0144
99         0.0229   0.0276    0.0919    0.0147
98         0.0231   0.0311    0.0918    0.0150
97         0.0233   0.0342    0.0917    0.0152
96         0.0234   0.0370    0.0915    0.0155
95         0.0236   0.0396    0.0914    0.0157
94         0.0238   0.0420    0.0913    0.0160
93         0.0239   0.0442    0.0911    0.0162
92         0.0241   0.0463    0.0910    0.0164
91         0.0242   0.0483    0.0909    0.0167
90         0.0244   0.0502    0.0907    0.0169
89         0.0245   0.0520    0.0906    0.0171
88         0.0246   0.0537    0.0904    0.0173
87         0.0248   0.0553    0.0902    0.0175
86         0.0249   0.0569    0.0901    0.0177
85         0.0250   0.0584    0.0899    0.0179
84         0.0251   0.0599    0.0897    0.0180
83         0.0253   0.0613    0.0895    0.0182
82         0.0254   0.0626    0.0893    0.0184
81         0.0255   0.0639    0.0891    0.0185
80         0.0256   0.0652    0.0889    0.0187
79         0.0257   0.0664    0.0887    0.0188
78         0.0258   0.0676    0.0885    0.0190
77         0.0258   0.0687    0.0883    0.0191
76         0.0259   0.0698    0.0881    0.0193
75         0.0260   0.0709    0.0879    0.0194
74         0.0261   0.0719    0.0877    0.0195
73         0.0262   0.0729    0.0874    0.0197
72         0.0262   0.0739    0.0872    0.0198
71         0.0263   0.0749    0.0869    0.0199
70         0.0264   0.0758    0.0867    0.0200

Table 5.2: Total volatility. See Appendix A for details of the calculation.
k = 95%    Bond    Credit   Equity   Mortgage
Bond       1       0.68     -0.16    0.81
Credit     0.68    1        -0.04    0.63
Equity     -0.16   -0.04    1        -0.05
Mortgage   0.81    0.63     -0.05    1

Table 5.3: Total correlation matrix for k = 0.95. See Appendix A for details of the calculation.
k = 90%    Bond    Credit   Equity   Mortgage
Bond       1       0.61     -0.11    0.77
Credit     0.61    1        0.01     0.57
Equity     -0.11   0.01     1        -0.01
Mortgage   0.77    0.57     -0.01    1

Table 5.4: Total correlation matrix for k = 0.90. See Appendix A for details of the calculation.
k = 85%    Bond    Credit   Equity   Mortgage
Bond       1       0.58     -0.06    0.73
Credit     0.58    1        0.04     0.53
Equity     -0.06   0.04     1        0.02
Mortgage   0.73    0.53     0.02     1

Table 5.5: Total correlation matrix for k = 0.85. See Appendix A for details of the calculation.
k = 80%    Bond    Credit   Equity   Mortgage
Bond       1       0.55     -0.02    0.70
Credit     0.55    1        0.06     0.50
Equity     -0.02   0.06     1        0.05
Mortgage   0.70    0.50     0.05     1

Table 5.6: Total correlation matrix for k = 0.80. See Appendix A for details of the calculation.
Figure 5.7: Allocations as a function of the degree of confidence - Johnson
5.4 A fast algorithm
In this Section we present a fast and efficient algorithm to infer allocations for different values of k without resorting every time to a new Monte Carlo simulation. This algorithm is another original contribution of this thesis and, for practical purposes, it can yield invaluable savings in computation time without much affecting the allocations. Finding the optimal weights amounts to maximizing the expected value of the utility of the final wealth at the investment horizon which, as we saw, means maximizing:
$$E[U] = (1-k)\int U(1+wr)\,f(r)\,dr + k\sum_{i>0}\int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,U(1+wr)\,dr \qquad (5.1)$$
over w, given the constraints. Here $\delta(r - L^i)$ is a shorthand notation for $\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n)$.
If the integration cannot be carried out easily, as in the example leading to (4.4),
and/or it is not solvable numerically, a Monte Carlo simulation becomes necessary.
Moreover, finding the optimal weights for several values of k and creating graphs such as those shown in Figure 5.6 requires a series of Monte Carlo simulations, one for each value of the parameter k, according to the granularity we choose. Calculating the allocation for different values of k might also be necessary in real-world applications, not only for presentation purposes. This is true especially if we endorse the Maximum Entropy suggestion, where we would like to see the allocation over an interval of values of the confidence parameter Ω6. A Monte Carlo simulation for every chosen parameter might be quite a burdensome procedure7. In what follows we present another solution to this problem.
Let’s first consider the problem of optimization in normal times. In this case, the
problem reduces to optimizing:
$$\bar E[U] = \int U(1+wr)\,f(r)\,dr \qquad (5.2)$$
subject to the constraint $\sum_i w_i - 1 = 0$ and, if we want to rule out short selling, also to $w_i > 0$. $\bar E$ is the normal-times expectation operator.
Let’s start by considering the problem without short selling constraints. Lagrange
multipliers can be used to find a stationary point. Defining the Lagrange function as
$$\Lambda = \bar E[U] - \lambda\Big(\sum_i w_i - 1\Big)$$
the following equations must be satisfied (necessary conditions):
$$\frac{\partial \Lambda}{\partial w_j} = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = 0$$
or
$$\int \frac{\partial U(1+\bar w r)}{\partial w_j}\,f(r)\,dr - \lambda = 0, \qquad \sum_i w_i - 1 = 0,$$
the last equation being simply the constraint. From these we can derive the very useful relation at the stationary point $\bar w$:
$$\int \frac{\partial U(1+\bar w r)}{\partial w_j}\,f(r)\,dr = \int \frac{\partial U(1+\bar w r)}{\partial w_l}\,f(r)\,dr \quad \forall j, l \qquad (5.3)$$
6 Following [44] we might also want to explore the behaviour of the allocations not only for a single k but also for a broader set K of values of k (k ∈ K), called the stress set. We might then estimate an opportunity cost of choosing among the different values of k in the stress set.
7 The total moments approximation can be good for saving computation time and can give allocations very close to those of the exact problem, but this might not always be the case, especially in markets that deviate strongly from Gaussianity.
Suppose that we already have the solution $\bar w$ of this problem and want to calculate the shifted weights $\bar w + \delta w$ that optimize the full problem (5.1). We can expand the utility function around $\bar w$:
$$U(\bar w + \delta w) = U(\bar w) + J(\bar w)\,\delta w + \tfrac{1}{2}\,\delta w^{T} H(\bar w)\,\delta w \qquad (5.4)$$
where J is the Jacobian matrix, which for the scalar-valued utility function is a vector (the gradient), and H is the square matrix of second-order partial derivatives of the utility function with respect to w (the Hessian matrix). Taking the derivative of this equation with respect to the generic weight $w_j$ we obtain:
$$\frac{\partial U}{\partial w_j} = J_j(\bar w) + \sum_i \delta w_i\, H_{ij}(\bar w)$$
Again, Lagrange multipliers can be used to find the maximum of (5.1). Defining the Lagrange function as
$$\Lambda = E[U] - \lambda\Big(\sum_i w_i - 1\Big)$$
the following equations must be satisfied at the stationary point:
$$\frac{\partial \Lambda}{\partial w_j} = 0, \qquad \frac{\partial \Lambda}{\partial \lambda} = 0$$
or
$$(1-k)\int \frac{\partial U(1+wr)}{\partial w_j}\,f(r)\,dr + k \sum_{i>0} \int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,\frac{\partial U(1+wr)}{\partial w_j}\,dr - \lambda = 0$$
$$\sum_i w_i - 1 = 0$$
Substituting the expansion of the utility function around $\bar w$ into the above equations, the first condition can be restated as:
$$\int J_j(\bar w)\Big[(1-k) f(r) + k \sum_{i>0}\delta(r - L^i)\frac{p_i}{1-p_0}\Big]\,dr + \sum_l \int H_{lj}(\bar w)\,\delta w_l \Big[(1-k) f(r) + k \sum_{i>0}\delta(r - L^i)\frac{p_i}{1-p_0}\Big]\,dr - \lambda = 0$$
Combined with the second condition, which can be rewritten as
$$-\sum_i \delta w_i = 0,$$
it becomes just a linear system of equations in the shifts δw, which can be solved with traditional linear-algebra methods. Defining the total bordered Hessian matrix as
$$H_T = \begin{pmatrix} \int H_{11}(\bar w)h(r)dr & \int H_{12}(\bar w)h(r)dr & \cdots & -1 \\ \int H_{21}(\bar w)h(r)dr & \int H_{22}(\bar w)h(r)dr & \cdots & -1 \\ \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & \cdots & 0 \end{pmatrix}$$
where $h(r) = (1-k)f(r) + k\sum_i \delta(r - L^i)\,\frac{p_i}{1-p_0}$, we can express the solution as:
$$\delta w_j = \frac{\det H_j}{\det H_T} \qquad (5.5)$$
where $H_j$ is the total bordered Hessian matrix in which the j-th column is replaced by the vector of coefficients, given in this case by the Jacobian:
$$H_j = \begin{pmatrix} \int H_{11}(\bar w)h(r)dr & \cdots & -\int J_1(\bar w)h(r)dr & \cdots & -1 \\ \int H_{21}(\bar w)h(r)dr & \cdots & -\int J_2(\bar w)h(r)dr & \cdots & -1 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -1 & -1 & 0 & \cdots & 0 \end{pmatrix}$$
It is very important to remember that these matrices are calculated at the "old" equilibrium point $\bar w$. Note that for k = 0 the column containing the Jacobians has identical elements because of (5.3); since a matrix with two proportional columns has zero determinant, the shift in weights $\delta w_j$ is then trivially zero.
5.4.1 Gaining intuition
In order to gain intuition about why (5.4) is a good approximation, we plot in Fig. 5.8 the logarithmic utility for three asset classes, i.e. the quantity $\log(1 + w_1 r_1 + w_2 r_2 + (1 - w_1 - w_2) r_3)$, for two sets of returns $r_i$, as $w_1$ and $w_2$ vary. The allocation to the third asset is given by $w_3 = 1 - w_1 - w_2$, and no short-selling constraint $w_3 > 0$ is imposed; the results with such a constraint would simply be a subset of Fig. 5.8. In both cases the utility is almost a plane, so that even drastic changes in the allocations can be captured by a second-order expansion.

Figure 5.8: Logarithmic utility function for two sets of returns as the allocations vary. The two sets are: plane with square markers - r = (1%, −4%, −5%); plane without markers - r = (−1%, 2%, 3%).
Later in this subsection we will evaluate the quality of the approximation in terms
of opportunity cost.
We can gain some additional insight from the two-dimensional case. The solution of (5.5) for $\delta w_1$ reduces to:
$$\delta w_1 = \frac{E[J_2] - E[J_1]}{2E[H_{12}] - E[H_{11}] - E[H_{22}]}$$
where the expectation is taken with respect to h(r). If k = 0, the numerator of this expression is zero by virtue of (5.3). More explicitly, in the case of the power utility function (4.2):
$$\delta w_1 = \frac{-k\sum_i (L^i_1 - L^i_2)\,\frac{p_i}{1-p_0}\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}}{\beta\Big[(1-k)\int (r_1 - r_2)^2 (1 + w_1 r_1 + w_2 r_2)^{-\beta-1} f(r)\,dr + k\sum_i (L^i_1 - L^i_2)^2\,\frac{p_i}{1-p_0}\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta-1}\Big]}$$
For k close to 0 we can neglect the second term in the denominator with respect to the first:
$$\delta w_1 = -\,\frac{k\sum_i (L^i_1 - L^i_2)\,p_i\,(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}/(1-p_0)}{\beta(1-k)\int (r_1 - r_2)^2 (1 + w_1 r_1 + w_2 r_2)^{-\beta-1} f(r)\,dr} \qquad (5.6)$$
The conclusion for small k is then straightforward: the shift is proportional to the factor k and to the probability-weighted difference in stressed returns across the scenarios. Moreover, from the additional weight $(1 + w_1 L^i_1 + w_2 L^i_2)^{-\beta}$ it is clear that the lower the values of $L^i_1$ and $L^i_2$, the greater the shift. $\delta w$ is also inversely proportional to the quantity $1 - p_0$. Finally, the problem is also sensitive to the shape of the utility function through the parameter β.
It is important to underline that we would have reached the same solution for small k had we started with an alternative method, such as regular perturbation theory, and considered the second term of (5.1) as a small perturbation. We would then have expanded the weights in a power series around the solution of the unperturbed problem, i.e. $w = \bar w + k\,w^{(1)} + k^2\,w^{(2)} + \ldots$, and equated terms of the same order after inserting this into (5.1). Neglecting second- and higher-order terms, we would have obtained exactly (5.6).
5.4.2 Generalizing the results
What was discussed in the previous section applies to a simple case. What we need, however, in most cases is a solution for more than two assets and for values of k which are not necessarily small. The multidimensional case involving three or more assets is more complicated, and the solution does not have the straightforward form of (5.6) but must be kept in the form of the matrices $H_T$ and $H_j$.

In general we must deal with stress returns $L^i$ weighted by integrals of the form $\int r_i r_j \big(1 + \sum_l r_l w_l\big)^{-\beta-1} f(r)\,dr$. Although the problem is not solvable analytically, a solution is directly computable thanks to equation (5.5), and it is very fast compared to a repeated Monte Carlo simulation for each k. In fact, what is required is to carry out a simulation only once, for k = 0. During this simulation the elements of the matrices $H_T$ and $H_j$ related to the normal part can be calculated and stored with almost no additional time and memory. Once stored, they can be reused to infer the shifts δw for any value of k.
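A minimal sketch of this recipe is given below, assuming a power utility and writing (5.5) as a bordered linear system solved directly for each k. The normal-part expectations of the gradient and Hessian are computed (and would be stored) during the single k = 0 Monte Carlo run; all function and variable names are illustrative, not part of the thesis.

```python
import numpy as np

def jac_hess_power(w, r, beta=1.2):
    """Per-draw gradient and Hessian in w of U(1+w'r) = ((1+w'r)^(1-beta)-1)/(1-beta)."""
    x = 1.0 + r @ w
    J = (x ** -beta)[:, None] * r
    H = -beta * (x ** (-beta - 1.0))[:, None, None] * r[:, :, None] * r[:, None, :]
    return J, H

def normal_moments(w_bar, normal_draws, beta=1.2):
    """Computed once, during the k = 0 Monte Carlo run, and then stored."""
    J, H = jac_hess_power(w_bar, normal_draws, beta)
    return J.mean(axis=0), H.mean(axis=0)

def weight_shift(w_bar, EJ_n, EH_n, stress_losses, stress_probs, k, beta=1.2):
    """Shift delta_w from the bordered linear system behind eq. (5.5).
    stress_probs are the renormalised probabilities p_i / (1 - p0)."""
    Js, Hs = jac_hess_power(w_bar, stress_losses, beta)
    EJ = (1 - k) * EJ_n + k * stress_probs @ Js                       # E_h[J]
    EH = (1 - k) * EH_n + k * np.einsum('i,ijk->jk', stress_probs, Hs)  # E_h[H]
    n = len(w_bar)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = EH
    A[:n, n] = A[n, :n] = -1.0      # border enforcing sum(delta_w) = 0
    b = np.concatenate([-EJ, [0.0]])
    return np.linalg.solve(A, b)[:n]
```

Calling `weight_shift` over a grid of k values reuses the stored normal-part moments and avoids any further simulation.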
If we introduce short-selling constraints in the form $w_i > 0$, the equality (5.3) is in general no longer valid. However, we can still use the expansion of U around $\bar w$ and maximize the following expression over the weights for each k:
$$\int \big(U(\bar w) + J(\bar w)\,\delta w\big)\Big[(1-k)f(r) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,dr + \frac{1}{2}\int \delta w^{T} H(\bar w)\,\delta w\,\Big[(1-k)f(r) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,dr$$
For the numerical search one can resort to techniques which handle inequality constraints, such as Kuhn-Tucker optimization. The function fmincon in MATLAB, which accepts inequality constraints, can also be used, and the above expression is easily optimized over the weights.
As a practical example, consider the graphs presented earlier in this chapter. One such graph requires approximately 1 hour to be calculated8. With the above expansion the required time is that of the first simulation for k = 0, i.e. around 2 minutes, plus 5 additional seconds to infer the rest of the graph. Fig. 5.9 compares the Monte Carlo simulation with the above approximation which, as visible from the figure, holds quite well for all values of k. In order to get an idea of the "cost" of this approximation we can search numerically for each k, in a spirit similar to [29], for a constant c that would make the investor indifferent between the two optimizations, i.e.:
$$E[U(1 + w^*_{opt} r + c)] = E[U(1 + w_{opt} r)]$$
where $w^*_{opt}$ are the optimal weights obtained with the approximated utility function, $w_{opt}$ those obtained as a solution to the full problem, and c is the opportunity cost. In Table 5.7 we can see the value of c for several values of the normalization factor9.
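As a sketch, the indifference constant c can be found with a one-dimensional root search. The snippet below assumes a logarithmic utility and reuses the hypothetical inputs (normal_draws, stress_losses, stress_probs) of the sketch given after Section 5.1; the bracketing interval is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.optimize import brentq

def opportunity_cost(w_approx, w_full, normal_draws, stress_losses, stress_probs, k):
    """Constant c with E[U(1 + w_approx'r + c)] = E[U(1 + w_full'r)] under log utility."""
    def eu(w, c=0.0):
        normal = np.log(1.0 + normal_draws @ w + c).mean()
        stress = stress_probs @ np.log(1.0 + stress_losses @ w + c)
        return (1 - k) * normal + k * stress
    target = eu(w_full)
    return brentq(lambda c: eu(w_approx, c) - target, -0.05, 0.05)
```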
We note in closing that, unlike the total-moments approach, the method of this Section relies on a different expansion, i.e. an expansion in the weights rather than in the returns (to which the total-moments approach can be compared, see Appendix A). Moreover, it does not assume any particular form (e.g. Gaussian) for the full distribution of returns but leaves it as it is. Which of the two approximations should be used, and which performs better, must be examined in the context of the problem to be solved. For example, if there are no short-selling constraints, the search algorithm can push the solution to values of w far from the equilibrium value $\bar w$, something of limited effect when the weights belong to [0, 1]; in that case the weights expansion might not hold as well as the total-moments expansion. If, on the other hand, the weights are bounded in [0, 1] and the full return distribution has large third and higher moments, so that the Gaussian approximation is poor, the weights expansion might be the better choice.

8 The machine used for the simulation has an AMD Phenom Triple-Core 2.1 GHz processor.
9 It might appear strange that for k = 0 the opportunity cost is different from 0. This is due to the limitations of the simulation used to infer the weights, e.g. a termination tolerance level of 10^-6 (see Section 4.4.1).

1 − k (%)   c × 10³
100         0.0182
99          0.0001
98          -0.0102
97          -0.0089
96          -0.0107
95          -0.0214
94          -0.1193
93          -0.0037
92          -0.0038
91          -0.0046
90          -0.1066
89          -0.1019
88          -0.1302
87          0.0001
86          0.0720
85          -0.0288
84          -0.0284
83          0.0240
82          0.0227
81          0.1006
80          -0.0007
79          -0.1004
78          -0.0051
77          -0.0994
76          0.0004
75          0.0003

Table 5.7: Opportunity cost c (multiplied by 10³) for the logarithmic utility as a function of 1 − k.
Figure 5.9: Allocations as a function of the normalization factor: Monte Carlo for each k - thin line, numerically inferred - thick line.
Chapter 6
Conclusions
A simple method has been presented to carry out a coherent programme of asset
allocation based on utility maximization in the presence of stress events with subjec-
tively assigned probabilities. The method relies on the ability of the asset manager to
specify in an approximate way the causal links, if any, among the stress events that
can most affect a given portfolio. The approach has several advantages. First of all,
it is very transparent. Its crucial ingredients (i.e., the subjective probabilities, the assumed causal links, the fraction of the probability mass in the distressed state, etc.) are clearly visible, and affect the result in an intuitively understandable manner. This helps the asset manager's intuition, as the discussion in the previous sections shows. The approach also lends itself to thorough and thoughtful sensitivity analysis, as literally every input can be understood, changed by hand and the results inspected. Thanks to its intuitive appeal, the approach therefore allows scrutiny by non-specialists, trustees and other senior investment officers. We should stress again that, even though we gave an example only for exceptional loss events, the procedure is just as valid and applicable to exceptional gains for a given asset class, i.e. we can speak of profits and losses and not only of losses.
Several add-ons give additional power to the whole approach and ease its practical
implementation.
First, the Linear Programming technique of subsection 3.3.2, extended here to include non-linear fractional constraints, gives guidance on the choice of conditional probabilities in the Bayesian net. As the structure of the net becomes more elaborate, this tool might become essential in order to arrive quickly at reliable probability values for the nodes.
Second, thanks to the numerical recipe presented in Section 5.4, no additional
computational burden is posed when calculating the optimal weights for several val-
ues of the normalization factor compared to the Monte Carlo run in the case of no
Second, thanks to the numerical recipe presented in Section 5.4, calculating the optimal weights for several values of the normalization factor poses no additional computational burden compared to the single Monte Carlo run needed for the case of no stress events. This is especially (though not exclusively) helpful when the Maximum Entropy approach of Section 4.5 is used, which might require the calculation of allocations over a broad spectrum of values of the normalization factor in order to infer allocations for different values of the confidence Ω.
Third, the concept of Total Moments allows an easy comparison with a simple allocation based on Gaussian returns, a very intuitive benchmark widely known among asset managers.

Fourth, another well-known concept, the Efficient Frontier, was introduced and extended in this thesis. The extension allows the asset manager to obtain a global picture of the modified risk/return trade-off in the presence of potential stress events by simple visual inspection.
Finally, the Black-Litterman model was integrated into the framework and made "clean" by feeding it a covariance matrix obtained with the methods of Chapter 2. In fact, the BL world can be seen as a particular case of the forward-looking world with stress events that we propose here.
6.1 Future research
Some improvements of the approach described here could go in the direction of intro-
ducing uncertainty in the estimated probabilities for each node in a BN i.e. assuming
for them a distribution instead of fixed values. This uncertainty was dealt with in
this thesis through the maximum entropy principle but on a high level, through the
parameter k. A more granular approach on a single node level could be examined.
The same uncertainty could also be incorporated in the stress losses (gains), which were assumed here to be distributed as a Dirac δ; a more complicated distribution with an uncertainty parameter σ could be assumed instead, e.g. a Gaussian or a uniform distribution.

Moreover, the BN can be complemented by additional nodes associated with events that do not give rise to losses by themselves. For example, imagine the event "Political elections in country X". By itself it does not imply a loss in any particular asset class, but it could cause a series of losses (or gains) if it is connected to events that do represent losses. For instance, this node could be a parent node with arrows pointing to other events associated with losses, e.g. "Crash of the stock market", which in itself represents a loss if we hold stocks in our portfolio.
This thesis opens the way to some additional applications that can be further
explored. One possible direction might be the application of the described methods
to credit portfolio models. In fact, some of the commercial products in this area,
e.g. CreditMetrics, use as an input a correlation matrix inferred from equity prices to
calculate the portfolio loss distribution on a given horizon. Tail events are deduced
via a backward looking simulation of the joint comovement of the underlying factors
such as industry or country indices. The cleansing of historical tail events and splicing
with forward-looking tail events based on causality assumptions, as proposed in this
thesis, could be explored. This direction inevitably suggests that the much discussed
CDO models can be also improved with the same methodology.
Appendix A
Total moments
Let us calculate the first and second moments of the returns under the complete distribution, i.e. the 'normal' part plus the exceptional one:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\,f(r_1, r_2, \ldots) + \frac{k}{1-p_0}\sum_{i>0}^{2^n - 1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n) \qquad (A.1)$$
The first part on the r.h.s. represents the normal part, and f is the outcome of the copula-fitting process. We will not be concerned with the form of f in this Appendix, so let us simply call it Normal and rewrite the expression for $F(r_1, r_2, \ldots, r_n)$ as:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\cdot\text{Normal} + \frac{k}{1-p_0}\sum_{i>0}^{2^n - 1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n) \qquad (A.2)$$
A.0.1 First moment
Let us calculate the expected value of a given return $r_j$:
$$E[r_j] = (1-k)\,\bar E[r_j] + \frac{k}{1-p_0}\int_{-\infty}^{\infty}\sum_{i>0}^{2^n-1} p_i\, r_j\,\delta(r_1 - L^i_1, \ldots, r_n - L^i_n)\,dr$$
Here $\bar E$ denotes the expectation in the normal state of the world. The sum in the second term has non-zero contributions only from the scenarios in which event j occurs, which yields immediately:
$$E[r_j] = (1-k)\,\bar E[r_j] + \frac{k}{1-p_0}\,P(E_j)\,L_j \qquad (A.3)$$
where $P(E_j)$ is the marginal probability of event j happening. The loss $L_j$ is negative, so the second term is also negative and the expected value decreases as we move away from normal states.
A.0.2 Variance
To work out the variance we also need:
$$E[r_j^2] = (1-k)\,\bar E[r_j^2] + \frac{k}{1-p_0}\,P(E_j)\,L_j^2 \qquad (A.4)$$
derived with the same logic as for the first moment. By definition the variance is:
$$Var(r_j) = E[r_j^2] - E[r_j]^2$$
and, using the above derivations:
$$Var(r_j) = (1-k)\,\bar E[r_j^2] + \frac{k}{1-p_0}P(E_j)L_j^2 - (1-k)^2\bar E[r_j]^2 - \Big(\frac{k}{1-p_0}\Big)^2 P(E_j)^2 L_j^2 - 2(1-k)\frac{k}{1-p_0}\bar E[r_j]\,P(E_j)\,L_j$$
This can be regrouped as:
$$Var(r_j) = (1-k)\,\overline{Var}[r_j] + \frac{k}{1-p_0}P(E_j)L_j^2 + k(1-k)\bar E[r_j]^2 - \Big(\frac{k}{1-p_0}\Big)^2 P(E_j)^2 L_j^2 - 2(1-k)\frac{k}{1-p_0}\bar E[r_j]\,P(E_j)\,L_j$$
$$= (1-k)\,\overline{Var}[r_j] + \frac{k}{1-p_0}\,Var_s[r_j]\,L_j^2 + k(1-k)\Big(\bar E[r_j]^2 - \frac{2}{1-p_0}\,\bar E[r_j]\,P(E_j)\,L_j\Big) + \frac{k}{1-p_0}P(E_j)^2 L_j^2\Big(1 - \frac{k}{1-p_0}\Big)$$
where $\overline{Var}$ is the variance of the normal part of the problem and
$$Var_s[r_j] = P(E_j) - P(E_j)^2.$$
The expression has the desired property of reducing to the normal variance when k = 0. Notice, however, that it does not reduce to $Var_s[r_j]\,L_j^2$ when the world is completely in a non-normal state (k = 1); it does so only in the case $p_0 = 0$.
A.0.3 Covariance
By definition of covariance:
$$cov[r_i, r_l] = E[r_i r_l] - E[r_i]\,E[r_l]$$
We use:
$$E[r_i r_l] = (1-k)\,\bar E[r_i r_l] + \frac{k}{1-p_0}\,P(E_i E_l)\,L_i L_l$$
which gives:
$$cov(r_i, r_l) = (1-k)\,\bar E[r_i r_l] + \frac{k}{1-p_0}P(E_i E_l)L_i L_l - \Big((1-k)\bar E[r_i] + \frac{k}{1-p_0}P(E_i)L_i\Big)\Big((1-k)\bar E[r_l] + \frac{k}{1-p_0}P(E_l)L_l\Big)$$
which, by expanding and regrouping, can be cast as:
$$cov(r_i, r_l) = (1-k)\,\overline{cov}[r_i, r_l] + \frac{k}{1-p_0}\,cov_s[r_i, r_l]\,L_i L_l + k(1-k)\Big(\bar E[r_i]\bar E[r_l] - \frac{1}{1-p_0}\bar E[r_i]P(E_l)L_l - \frac{1}{1-p_0}\bar E[r_l]P(E_i)L_i\Big) + \frac{k}{1-p_0}P(E_i)P(E_l)L_i L_l\Big(1 - \frac{k}{1-p_0}\Big)$$
where:
$$cov_s[r_i, r_l] = P(E_i E_l) - P(E_i)\,P(E_l)$$
The same reasoning as for the variance also applies here. If the world is in a completely normal state the covariance reduces to the normal covariance, while for k = 1 it does not reduce to $cov_s[r_i, r_l]\,L_i L_l$. The term multiplying k(1 − k) always adds a positive amount apart from the extreme points k = 0 and k = 1, where it is zero. We therefore cannot deduce for the variance/covariance the same strictly non-increasing behaviour as for the mean: the variance/covariance may well attain its maximum not at the extrema but at some other point in the open interval (0, 1).

The correlation is the expression for the covariance divided by the square root of the product of the variances of the two returns. Its form becomes cumbersome and we will not write it down explicitly, but from the previous derivations we can see that it has the same desirable property of reducing to the normal correlation at k = 0.
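A compact sketch of these formulas is given below. The marginal and pairwise event probabilities would come from the Bayesian-net joint table; the normal-times correlation and the pairwise probabilities used in the example are placeholders, not the thesis values, while the means, volatilities and marginals echo the numbers quoted in Chapter 5.

```python
import numpy as np

def total_moments(mu_n, cov_n, p, pij, p0, losses, k):
    """Total mean and covariance of the spliced distribution.
    mu_n, cov_n : normal-times mean vector and covariance matrix.
    p, pij, p0  : marginal P(E_j), pairwise P(E_i,E_j) (diagonal = marginals)
                  and no-event probability from the Bayesian net.
    losses      : per-asset stress losses L_j (negative numbers)."""
    L = np.asarray(losses)
    c = k / (1.0 - p0)
    mu_T = (1 - k) * mu_n + c * p * L                                  # eq. (A.3)
    E_rr = (1 - k) * (cov_n + np.outer(mu_n, mu_n)) + c * pij * np.outer(L, L)
    cov_T = E_rr - np.outer(mu_T, mu_T)                                # covariance above
    return mu_T, cov_T

# Illustrative inputs: normal-times moments as in Tables 5.1-5.2 at 1-k = 100%,
# marginals 4%, 7%, 7%, 3%; normal correlation and pairwise probabilities are placeholders.
mu_n  = np.array([0.0181, 0.0207, 0.0205, 0.0167])
vol_n = np.array([0.0227, 0.0236, 0.0920, 0.0144])
cov_n = np.diag(vol_n ** 2)
p     = np.array([0.04, 0.07, 0.07, 0.03])
pij   = np.diag(p)
mu_T, cov_T = total_moments(mu_n, cov_n, p, pij, 0.8381,
                            [-0.05, -0.20, -0.10, -0.05], k=0.05)
```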
A.0.4 Utility maximization
The idea is now to replace the utility maximization over the distribution (4.1) with a utility maximization over a Gaussian distribution with the moments calculated in this Appendix, i.e. instead of maximizing:
$$E[U] = (1-k)\int U(1+wr)\,f(r;\bar E[r],\Sigma,\nu)\,dr + k\sum_{i>0}\int \delta(r - L^i)\,\frac{p_i}{1-p_0}\,U(1+wr)\,dr \qquad (A.5)$$
we maximize:
$$E[U] = \int U(1+wr)\,f(r; r_T, \Sigma_T)\,dr \qquad (A.6)$$
over the weights and subject to the constraints of the problem. In (A.6) f is the Gaussian density, the subscript T stands for Total, and ν is the number of degrees of freedom of the fitted copula; the distribution parameters have been made explicit in both cases. To go one step further, let us write the form of the utility function we chose:
$$E[U] = \int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,\frac{1}{1-\beta}\Big((1+wr)^{1-\beta} - 1\Big)\,dr$$
Let us expand the utility function around $r = E[r] = r_T$ up to second order:
$$E[U] = \int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\,F(r, w, r_T, \beta)\,dr$$
where
$$F(r, w, r_T, \beta) = \frac{1}{1-\beta}\Big((1+wr_T)^{1-\beta} - 1\Big) + \frac{w(r - r_T)}{(1+wr_T)^{\beta}} - \frac{\beta}{2}\,\frac{\sum_i\sum_k w_i w_k (r_k - r_{kT})(r_i - r_{iT})}{(1+wr_T)^{\beta+1}} + o\big((r-r_T)^3\big)$$
Denoting by $F_0$, $F_1$, $F_2$ the terms of the expansion, we have by construction:
$$\int\Big[(1-k)f(r;\bar E[r],\Sigma,\nu) + k\sum_{i>0}\delta(r-L^i)\frac{p_i}{1-p_0}\Big]\sum_{l=0}^{2}F_l\,dr = \int f(r; r_T,\Sigma_T)\sum_{l=0}^{2}F_l\,dr$$
The term involving $(r - r_T)^3$ is zero in the case of the Gaussian approximation, while it is generally different from zero for the full distribution. Discrepancies of this kind will in general also be present for the higher moments, and these give rise to the different allocations produced by the two methods. As long as the higher-order terms can be neglected, the allocations will be similar. For example, if the copula/marginal part of the full problem can be considered Gaussian and as long as the value of k (and hence the probability of distress) is small enough, this approximation can give excellent results1.

1 For the example in Chapter 5, given the time horizon considered, the 'normal' part of the problem is well represented by a Gaussian copula and marginals (see Footnote 1 of that Chapter), and thus this approximation holds quite well. Finally, we note that in the paper by Jondeau et al. [29] an approach which also relies on expansions is described. However, they rely on a more complicated expansion (up to the fourth moment) of the final wealth around its expected value, without assuming any particular form for the returns distribution.
Appendix B
Fusion with the Black-Litterman approach
The approach proposed in this thesis can be integrated easily with the Black-Litterman
model [1]1.
B.1 Formulation

For the BL model in Bayesian form see e.g. [65]. Without going into deep detail, the model can be cast as (up to a proportionality constant)2:
$$h(\mu|r) \propto f(r|\mu)\,g(\mu) \qquad (B.1)$$
where g(µ) is the probability density function of the investor's beliefs about the asset returns (the prior), f(r|µ) is the likelihood, i.e. the probability density of the data returns conditional on the investor's view, and h(µ|r) is the posterior, reflecting the investor's updated view obtained from the amalgamation of the data and his prior views3. g(µ) is assumed to be distributed as N(q, Ω), where q represents the investor's view on the vector of returns4 and Ω is the confidence in these views. f(r|µ) is assumed to be N(µ, τΣ), where Σ is the covariance matrix of returns and τ is a scaling factor often set to 1. Under these assumptions the posterior can be cast as a normal distribution with a modified drift and variance of the drift $(\mu_{BL}, \sigma_{BL})$ given by:
$$\mu_{BL} = \big[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\big]^{-1}\big[(\tau\Sigma)^{-1} r + P'\Omega^{-1} q\big] \qquad (B.2)$$
$$\sigma_{BL} = \big[(\tau\Sigma)^{-1} + P'\Omega^{-1}P\big]^{-1}$$

1 In [43] an extension of the BL model to non-normal markets is proposed. However, the improvement is mainly in the direction of considering fat-tailed copulae and is therefore different from the approach proposed here. For the strengths and weaknesses of the different approaches to the BL model see [77].
2 The material in this and the following sections will also rely on the results of Appendix A.
3 Returns r can be implied through equilibrium considerations, e.g. through the market capitalization of the asset classes. In what follows, we will leave open both possibilities: equilibrium and data-driven returns.
4 It may represent not only absolute returns on an asset but also differences in returns between different assets. It is represented as Pµ, with P an l × p matrix, l being the number of views and p the number of assets.
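A minimal sketch of (B.2) follows; the view, its confidence and the covariance matrix are hypothetical numbers chosen only for illustration.

```python
import numpy as np

def black_litterman(Sigma, r, P, q, Omega, tau=1.0):
    """Posterior drift and drift covariance of eq. (B.2)."""
    iTS = np.linalg.inv(tau * Sigma)
    iOm = np.linalg.inv(Omega)
    post_cov = np.linalg.inv(iTS + P.T @ iOm @ P)        # sigma_BL
    post_mu = post_cov @ (iTS @ r + P.T @ iOm @ q)       # mu_BL
    return post_mu, post_cov

# Hypothetical example: one relative view, 'Credit outperforms Bond by 1%'.
Sigma = np.diag([0.0227, 0.0236, 0.0920, 0.0144]) ** 2
r = np.array([0.0181, 0.0207, 0.0205, 0.0167])           # data/equilibrium returns
P = np.array([[-1.0, 1.0, 0.0, 0.0]])                    # view-picking matrix (l x p)
q = np.array([0.01])
Omega = np.array([[0.0005 ** 2]])                        # confidence in the view
mu_BL, sigma_BL = black_litterman(Sigma, r, P, q, Omega, tau=1.0)
```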
In our case the probability distribution function of the portfolio returns has the form:
$$F(r) = (1-k)\,f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1} p_i\,\delta(r - L^i) \qquad (B.3)$$
with mean5:
$$r_T = (1-k)\,\bar r + \frac{k}{1-p_0}\sum_{i>0}^{2^n-1} p_i\,L^i$$
and standard deviation of the estimate of the mean:
$$\sigma_T = (1-k)\,\sigma_{\bar r} \propto 1/\sqrt{N}$$
i.e. we assume that we are absolutely confident only in the estimates of the means of the stressed part, the $L^i$ 6.
We can express our approach in a Bayesian form as a mixture of priors $g_i(\mu)$ multiplying the respective likelihoods $f_i(r|\mu)$, one for each of the events in the joint probability table and one for the 'normal'-times part:
$$h(\mu|r) = \sum_{i=0}^{2^n-1} h_i(\mu|r) \propto \sum_{i=0}^{2^n-1} f_i(r|\mu)\,g_i(\mu)$$
where the 'normal' part corresponds to i = 0 and we set $g_0(\mu)$ equal to:
$$g_0(\mu) = (1-k)\,N(q,\Omega)$$
where N(q, Ω) is a Gaussian distribution with mean given by our view on the average return (or on differences in returns, as explained in footnote 4) and confidence in the view given by Ω. For $g_i(\mu)$ we assume:
$$g_i(\mu) = \frac{k}{1-p_0}\,p_i\,\delta(\mu - L^i) \quad \text{for } i > 0$$
5 We remind the reader again that $\bar r$ is a short-hand notation for the vector of expected returns.
6 It is important to stress that this is the uncertainty in the estimate of the stressed mean, not the variance of the stressed distribution around the mean, which in our case is a Dirac δ but could be any other distribution.
What about the likelihood functions $f_i(r|\mu)$? Let us assume that the market in 'normal' times is well represented by a Gaussian. If we draw one single observation from this distribution, its likelihood will be:
$$f_0(r_j|\mu) \propto \exp\Big(-\frac{(r_j-\mu)'\Sigma^{-1}(r_j-\mu)}{2}\Big)$$
Extending the reasoning to N observations, we reach the widely known result for the sample mean $\bar r$:
$$f_0(\bar r|\mu) \propto \exp\Big(-\frac{(\bar r-\mu)'\Sigma^{-1}(\bar r-\mu)}{2/N}\Big) \qquad (B.4)$$
where $\bar r$ is the sample average of the N observations, i.e. $\bar r = \frac{1}{N}\sum_{j=1}^{N} r_j$. We will take (B.4) to be the likelihood of the 'normal' part. For the stressed parts we set $f_i(r|\mu) = 1$, i.e. we have no past market history with which to blend the hypothesized parameters $L^i$. The total corrected drift will then be:
$$\mu_{corr} = (1-k)\,\mu_{BL} + k\sum_{i>0}\frac{p_i}{1-p_0}\,L^i$$
with $\mu_{BL}$ defined as in (B.2) but with the replacement $\tau = 1/N$.
B.2 Investor views and implied drifts
The ’normal’ part, as we saw, has a likelihood which contains the historical average
r but it can be treated in a more sophisticated way if we interpret it in a BL light i.e.
we can use for it a likelihood which incorporates the market implied drifts as in the
equilibrium BL model7. Additionally, we can also imply drifts for the stressed part
and express separate opinions on them. We will show in the next subsections two
ways of getting the implied drifts from the market capitalization weights. However,
it is important to underline since now that having views both in the ’normal’ part
and the part which comes out of the Bayesian net is feasible although it might appear
at first sight as misleading. But having certain subjective views on the ’normal’
part and at the same time the market not being in distress is not a contradiction.
For example, in ’normal’ times the investor can have the view ’This year equities of
the PIIGS8 countries will underperform German equities by 3%’ which might seem
natural in the light of the recent developments. This view is not a stress event. On
the other side, the event ’Default of country X belonging to PIIGS’ is something
7The BL model relies on the assumptions of the CAPM to set the equilibrium returns.8Acronym used to label jointly Portugal, Italy, Ireland, Greece and Spain.
87
that could have a systemic impact and trigger extreme moves on all the assets and
as such is incorporated separately in the Bayesian net which is mutually exclusive
with the ’normal’ part by construction. On the implied drifts of such extreme moves
the investor could express different opinions without ’interfering’ with the views in
’normal’ times.
In summary, subjective views that do not disrupt markets could be incorporated
in the body of the distribution a la Black-Litterman while on the drifts of the extreme
scenarios, which are accommodated in the BN part of the exercise, the investor could
express separate opinions. In this way the BL model becomes a particular case of a
world with stress events i.e. with Bayesian net.
Moreover, we should also remember that the ’body’ of the ’normal’ part was
obtained by truncating extreme points from the data and we saw also that throwing
away just a few datapoints can change substantially the covariance matrix which is
central in the BL calculations. By cleaning the data we also prepare cleaner inputs to
the BL formulae for the ’normal’ part, something which was not done in the original
model.
In what follows we will show two ways of deriving the market implied drifts. The
first one makes the assumption that the information about the stress scenarios is
already incorporated in the market equilibrium. The other point of view is that the
scenarios express a really idiosyncratic opinion of the asset manager and as such are
not incorporated in the market. A real world situation will be a middle way between
the two extrema. How to extend this approach to hybrid situations can be object of
further research.
B.2.1 Implied drifts in the case of knowledge contained in the market

Let us derive the reverse optimization starting from the maximization of a quadratic utility9:
$$U = \Psi - \frac{\delta}{2}\,\Psi^2$$
where Ψ is the final wealth. The total density has the usual form:
$$F(r_1, r_2, \ldots, r_n) = (1-k)\,f(r_1, r_2, \ldots) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1} p_i\,\delta(r_1 - L^i_1, r_2 - L^i_2, \ldots, r_n - L^i_n)$$

9 The methodology of this section holds also for a generic form of the quadratic utility function U = AΨ² + BΨ + C. For directions on how to set the market risk-aversion parameter δ, see [77], [65]. Finally, the derivation we present does not include the budget constraint, as in the majority of papers on the BL model. An extension that includes such a constraint can be found in [26].
We have to maximize over the weights the following expression (with short-hand notation for the returns):
$$E[U] = \int\Big[(1-k)f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1}p_i\,\delta(r-L^i)\Big]\Big[(1+wr) - \frac{\delta}{2}(1+wr)^2\Big]\,dr$$
Taking the derivative with respect to $w_j$:
$$\frac{dE[U]}{dw_j} = \int\Big[(1-k)f(r) + \frac{k}{1-p_0}\sum_{i=1}^{2^n-1}p_i\,\delta(r-L^i)\Big]\big[r_j - \delta(1+wr)\,r_j\big]\,dr = (1-\delta)\,\mu_{Tj} + \sum_l w_l\,E[r_j r_l] = (1-\delta)\,\mu_{Tj} + \sum_l w_l\Big((1-k)\,\bar E[r_j r_l] + \frac{k}{1-p_0}P(E_j E_l)\,L_j L_l\Big)$$
where $\bar E$ is the normal-times expectation operator and $\mu_{Tj}$ is the total drift. The implied total drift of asset j is then obtained by setting the above expression equal to 0:
$$\mu_{Tj} = -\frac{1}{1-\delta}\sum_l w_l\Big((1-k)\,\bar E[r_j r_l] + \frac{k}{1-p_0}P(E_j E_l)\,L_j L_l\Big)$$
where the weights w represent the relative market capitalizations of the asset classes involved in the problem and are known parameters. This expression can now easily be split as:
$$\mu_{Tj} = \pi^{normal}_j + \mu^{stressed}_j$$
with:
$$\pi^{normal}_j = -\frac{1}{1-\delta}\sum_l w_l\,(1-k)\,\bar E[r_j r_l]$$
and
$$\mu^{stressed}_j = -\frac{1}{1-\delta}\sum_l w_l\,\frac{k}{1-p_0}\,P(E_j E_l)\,L_j L_l$$
The stressed drift thus obtained can be equated to (A.3):
$$\mu^{stressed}_j = \frac{k}{1-p_0}\,P(E_j)\,L^{implied}_j$$
$L^{implied}_j$ can easily be derived by inversion; we will call the corresponding implied stressed drift $\pi^{stressed}_j$.
We note that the BL implied drifts are usually derived from the utility $U = wr - \lambda\,w^{T}\Sigma w$. However, such a starting point would not have enabled us to perform the split between stressed and 'normal' components, because of the presence of mixed 'stressed-normal' cross products in the covariance matrix10.

10 The differences in the mean-variance approach between these two types of utility function are illustrated with a practical example by Sharpe [68].
B.2.2 Implied drifts in the case of specific knowledge

In this case, the implied drifts are given as if no information about the stress events were contained in the market:
$$\mu_{Tj} = \pi^{normal}_j = -\frac{1}{1-\delta}\sum_l w_l\,\bar E[r_j r_l]$$
B.2.3 Summing up

If we adopt the first approach for the implied drifts, we can split the likelihoods as follows:
$$f_0(r|\mu) \propto \exp\Big(-\frac{(\pi^{normal}-\mu^{normal})'\Sigma^{-1}(\pi^{normal}-\mu^{normal})}{2\tau}\Big)$$
where τ is a constant11. For the stressed part12:
$$f_i(r|\mu) \propto \exp\Big(-\frac{(\pi^{stressed}-\mu^{stressed})'\Sigma^{-1}(\pi^{stressed}-\mu^{stressed})}{2\tau}\Big)$$
If we adopt the second approach instead, we can simply write:
$$f_0(r|\mu) \propto \exp\Big(-\frac{(\pi^{normal}-\mu^{normal})'\Sigma^{-1}(\pi^{normal}-\mu^{normal})}{2\tau}\Big)$$
and:
$$f_i(r|\mu) = 1$$
But what about the priors g(µ)? In the first case, we can again use the distribution N(q, Ω) for $g_0(\mu)$, but a Dirac δ for $g_i(\mu)$ means that the implied stressed drifts will be completely shrunk to their prior. Unless we are absolutely certain about our parameter estimates, we can relax this restrictive assumption: in order to 'smooth' the effect of the priors we can use for them Gaussian distributions $N(q_i, \Omega_i)$ and amalgamate them with the respective likelihoods.

In the second case, we simply apply the implied drifts only to the 'normal' part and stick to flat priors for the stressed part13.

The total corrected drift will then be:
$$\mu_{corr} = (1-k)\,\mu^{0}_{BL} + k\sum_{i>0}\frac{p_i}{1-p_0}\,\mu^{i}_{BL}$$
with each of the $\mu^{i}_{BL}$ defined as in (B.2) but with the replacement of the respective parameters (e.g. π, Ω, q, etc.).

11 The constant τ is often set to 1 in the BL model. See [77] for an extensive discussion of the different ways of setting this parameter.
12 We make the assumption that the uncertainty in the implied drifts $\sigma^2\tau$ is the same for the stressed and 'normal' parts. In general, two different constants $\tau_1$ and $\tau_2$ can be introduced and the sensitivity of the method analyzed when varying the ratio $m = \tau_1/\tau_2$.
13 Again, we have the option of expressing the priors as $N(q_i, \Omega_i)$ if we are not absolutely certain about the parameters.
Appendix C
Efficient frontier
In a mean-variance approach we search for the minimum variance given an expected portfolio return µ* (or, vice versa, the maximum return given a variance σ*). This approach is applicable whenever the expected value of the utility function is a function only of the first two moments of the objective function1 (see [44], Chapter 6). This happens in only two cases: i) the market is elliptical; ii) the utility function is quadratic. The markets we treat are clearly not elliptical after the splicing of the stress distribution, nor did we choose a quadratic utility function, for the reasons given in Section 4.4. However, if we relax this last assumption we can adopt the mean-variance approach. Relaxing the assumption is not a very strong requirement as long as a second-order Taylor expansion of the utility function is a good approximation.
In order to carry out the constrained optimization we first write the Lagrangian2:
$$L(w, \lambda_1, \lambda_2) = \sigma^* - \lambda_2(w\mu - \mu^*) - \lambda_1(w\vec 1 - 1) = w'Cw - \lambda_2(w\mu - \mu^*) - \lambda_1(w\vec 1 - 1) \qquad (C.1)$$
where w is the vector of allocations, µ the vector with components $\mu_j = 1 + r_j$ for asset class j, and C is the covariance matrix of returns. The two constraints are on the expected portfolio return µ* and on the weights summing to 1. In this problem we are searching for the minimum variance σ*, given the constraints. Taking the derivative with respect to w:
$$\frac{\partial L}{\partial w} = 2Cw - \lambda_1\vec 1 - \lambda_2\mu = 0 \qquad (C.2)$$
and rearranging:
$$w = C^{-1}\frac{\lambda_1}{2}\vec 1 + C^{-1}\frac{\lambda_2}{2}\mu \qquad (C.3)$$
Multiplying on the left by µ:
$$\mu w = \mu^* = \frac{\lambda_1}{2}\,\mu' C^{-1}\vec 1 + \frac{\lambda_2}{2}\,\mu' C^{-1}\mu$$
and then by $\vec 1$:
$$\vec 1' w = 1 = \frac{\lambda_1}{2}\,\vec 1' C^{-1}\vec 1 + \frac{\lambda_2}{2}\,\vec 1' C^{-1}\mu$$
we can solve for the two Lagrange multipliers:
$$\lambda_1 = \frac{2(K_3 - \mu^* K_2)}{K_4}, \qquad \lambda_2 = \frac{2(\mu^* K_1 - K_2)}{K_4}$$
where:
$$K_1 = \vec 1' C^{-1}\vec 1; \quad K_2 = \vec 1' C^{-1}\mu; \quad K_3 = \mu' C^{-1}\mu; \quad K_4 = K_1 K_3 - K_2^2$$

1 The final wealth in our case.
2 The derivation of the efficient-frontier equation, which we adapt here to our problem, can be found in many books and papers, e.g. [44].
Finally, let us multiply (C.2) on the left by w':
$$0 = 2w'Cw - \lambda_1 w'\vec 1 - \lambda_2 w'\mu = 2\sigma^* - \lambda_1 - \lambda_2\mu^*$$
and, rearranging:
$$\sigma^* = (K_1\mu^{*2} - 2K_2\mu^* + K_3)/K_4$$
This is the equation of the efficient frontier. By a change of variables:
$$\sigma^* = a_1\mu^{*2} + a_2\mu^* + a_3 \qquad (C.4)$$
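A small sketch of these frontier coefficients follows; the total moments fed in are illustrative (a diagonal matrix stands in for the full total covariance).

```python
import numpy as np

def frontier_coefficients(C, mu):
    """Coefficients of the frontier sigma* = a1 mu*^2 + a2 mu* + a3 (eq. C.4)."""
    ones = np.ones(len(mu))
    Ci = np.linalg.inv(C)
    K1, K2, K3 = ones @ Ci @ ones, ones @ Ci @ mu, mu @ Ci @ mu
    K4 = K1 * K3 - K2 ** 2
    return K1 / K4, -2 * K2 / K4, K3 / K4

# Hypothetical total moments for one value of k (mu_j = 1 + r_j as in the text)
mu = 1.0 + np.array([0.0166, 0.0153, 0.0173, 0.0154])
C = np.diag([0.0236, 0.0396, 0.0914, 0.0157]) ** 2      # placeholder covariance
a1, a2, a3 = frontier_coefficients(C, mu)
mu_grid = np.linspace(mu.min(), mu.max(), 5)
sigma_star = a1 * mu_grid ** 2 + a2 * mu_grid + a3       # frontier variance
```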
A plot of the efficient frontier as a function of the normalization factor k is depicted in Fig. C.1 for the example in Chapter 5. We can see a non-monotonic behaviour. As k grows, the convexity of the frontier grows, becoming maximal in the figure for k = 10% (among the curves represented there); it then decreases for higher values of k. This means that, for a fixed return of, say, 2.5%, the maximum variance is not attained at the highest values of k. In order to better understand this behaviour we must study, for each k, the minimum (the µ* with the smallest variance) and the concavity, given by the second derivative.
Taking the first derivative with respect to µ* in order to find the minimum:
$$\frac{\partial\sigma^*}{\partial\mu^*} = 2a_1\mu^*_{min} + a_2 = 0 \quad\Rightarrow\quad \mu^*_{min} = -\frac{a_2}{2a_1}$$
with variance at this point obtained by plugging this expression into (C.4):
$$\sigma^*_{min} = -\frac{a_2^2}{4a_1} + a_3$$
and for the concavity:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = 2a_1 \qquad (C.5)$$
We can then study how the minimum and the concavity change as k changes.
In Fig. C.2 the standard deviation of the minimum variance portfolio is represented as a function of k. We can see that it grows as k grows. This is in line with the intuition that a global increase of volatility, such as the one observed during stressed markets, shifts the efficient frontier to the right. Such a global increase in volatility during stressed markets (i.e. for k > 0) is confirmed by the growing value of the total variance for all the asset classes as a function of k (see Table 5.2)3.
The concavity is shown in Fig. C.3. We see that the efficient frontier with the largest second derivative is the one for the market around k = 0.05. Intuitively, the bigger the curvature of the efficient frontier, the smaller the diversification potential4: in a frontier with bigger curvature, more extra risk must be taken for a given increase in return. In line with this logic, the market for k = 0.05 offers fewer diversification benefits. We can gain additional intuition by examining the simple case of 2 assets only. Expressing Eq. (C.5) as a summation:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = \frac{2\sum_{ij} C^{-1}_{ij}}{\sum_{ijml} C^{-1}_{ml}\,C^{-1}_{ij}\,(\mu_i\mu_j - \mu_l\mu_j)} \qquad (C.6)$$
this reduces, in the two-dimensional case, to:
$$\frac{\partial^2\sigma^*}{\partial\mu^{*2}} = 2\,\frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{(\mu_1 - \mu_2)^2} \qquad (C.7)$$
Recalling the definitions of the total moments, we see that the denominator is the square of a linear function of k. The minimum of this function occurs for equal drifts $\mu_1 = \mu_2$, and the value $k_{min}$ can be found by taking the first derivative and equating it to 0. We thus obtain:
$$k_{min} = -\,\frac{\bar E[\mu_1] - \bar E[\mu_2]}{\dfrac{P(E_1)L_1 - P(E_2)L_2}{1-p_0} - \big(\bar E[\mu_1] - \bar E[\mu_2]\big)}$$

3 With the exception of Equity.
4 The curvature as a measure of diversification potential is used, for example, in [6] to study the Asian markets before, during and after the crisis of 1997.
which, for a portfolio composed of, say, Bond and Equity only, gives kmin = 0.07. The numerator of (C.7) is also quadratic in k but always ≥ 0. For the case of Bond and Equity, the numerator is monotonically decreasing in k over the interval k ∈ (0, 1). Thus the expression (C.7) diverges around k = 0.07 and gives rise to a degenerate solution of infinite concavity.
Suppose that the parameter k that appears in σ(k) and µ(k) follows a distribution g(k). We assume that g(k) is the maximum entropy distribution chosen according to the criteria of Section 4.5. We can then integrate the minimum variance σ*(k, µ*) with respect to k, weighted by the probability of that k, g(k). We then define the Maximum Entropy efficient frontier as the frontier given by the value of σ resulting from this integration over k, which we denote $\bar\sigma^*$. The Maximum Entropy efficient frontier is depicted in Fig. C.4 for a truncated Gaussian with Ω = 0.01.
Figure C.1: Stressed efficient frontier: k = 0 - blue, k = 0.1 - red, k = 0.15 - green, k = 0.2 - yellow

Figure C.2: Standard deviation of the minimum variance portfolio as a function of k

Figure C.3: Concavity of the efficient frontier as a function of k

Figure C.4: Maximum entropy efficient frontier, frontier at k = 0 and frontier obtained with the covariance matrix and drifts before truncation of the outliers.
Appendix D
Influence plots
This Appendix follows [12] and describes a procedure for calculating the influence of a single outlier on correlations. A suitable tool for this is the sample influence function, defined as follows:
$$I^-(y_i;\theta) = (n-1)\,(\theta - \theta_{-i}) \qquad (i = 1,\ldots,n) \qquad (D.1)$$
where the y are the observations, n is their number, θ is an estimator and $\theta_{-i}$ is the same estimator with the i-th observation removed. In our case we focus on the estimator of the correlation between two variables. Given two variables with correlation ρ, the influence function for their bivariate distribution (which can be of any type) with finite second moments is shown to be:
$$I(y_1; y_2;\rho) = -\tfrac{1}{2}\,\rho\,(\tilde y_1^2 + \tilde y_2^2) + \tilde y_1\tilde y_2 \qquad (D.2)$$
where $\tilde y_1$ and $\tilde y_2$ are standardized forms of $y_1$ and $y_2$. If we now denote by $z_1$ and $z_2$ the standardized sum and difference of $y_1$ and $y_2$, and further define $u_1 = (z_1 + z_2)/\sqrt 2$ and $u_2 = (z_1 - z_2)/\sqrt 2$, then (D.2) can be written as:
$$I(y_1; y_2;\rho) = (1-\rho^2)\,u_1 u_2 \qquad (D.3)$$
Using the sample analogue of (D.3) together with (D.1), we arrive at the following approximation:
$$I(y_{i1}; y_{i2}; r) = (n-1)(r - r_{-i}) \approx (1-r^2)\,u_{i1}u_{i2} \qquad (D.4)$$
where r is the sample correlation coefficient and $u_{i1}$, $u_{i2}$ are the sample analogues of $u_1$, $u_2$, namely:
$$u_{i1} = \tfrac{1}{2}\sqrt n\,\Big[(d_{i1} + d_{i2})/\sqrt{1+r} + (d_{i1} - d_{i2})/\sqrt{1-r}\Big]$$
$$u_{i2} = \tfrac{1}{2}\sqrt n\,\Big[(d_{i1} + d_{i2})/\sqrt{1+r} - (d_{i1} - d_{i2})/\sqrt{1-r}\Big]$$
with $d_{ij} = (y_{ij} - m_j)/\sqrt{a_{jj}}$, $m_j = (\sum_i y_{ij})/n$ and $a_{jj} = \sum_i (y_{ij} - m_j)^2$. The suggestion is then to use the approximation (D.4) as a function of two variables and superimpose selected contours of this function, i.e. hyperbolae, directly on the scatter plot. As an alternative, one can also use (D.2) together with the l.h.s. of (D.4) without transforming to the variables $u_1$, $u_2$. This is exactly what is done in Figs. 2.13 and 2.14.
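A short sketch comparing the approximation (D.4) with the exact leave-one-out influence on simulated data is given below; the data and variable names are illustrative only.

```python
import numpy as np

def correlation_influence(y1, y2):
    """Approximate influence of each observation on r via (D.4): (1-r^2) u_i1 u_i2."""
    n = len(y1)
    r = np.corrcoef(y1, y2)[0, 1]
    d1 = (y1 - y1.mean()) / np.sqrt(((y1 - y1.mean()) ** 2).sum())
    d2 = (y2 - y2.mean()) / np.sqrt(((y2 - y2.mean()) ** 2).sum())
    u1 = 0.5 * np.sqrt(n) * ((d1 + d2) / np.sqrt(1 + r) + (d1 - d2) / np.sqrt(1 - r))
    u2 = 0.5 * np.sqrt(n) * ((d1 + d2) / np.sqrt(1 + r) - (d1 - d2) / np.sqrt(1 - r))
    return (1 - r ** 2) * u1 * u2

def exact_influence(y1, y2):
    """Exact (n-1)(r - r_{-i}), recomputing r with each point left out."""
    n = len(y1)
    r = np.corrcoef(y1, y2)[0, 1]
    r_minus = np.array([np.corrcoef(np.delete(y1, i), np.delete(y2, i))[0, 1]
                        for i in range(n)])
    return (n - 1) * (r - r_minus)

rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = 0.5 * x + rng.standard_normal(500)
print(np.corrcoef(correlation_influence(x, y), exact_influence(x, y))[0, 1])
```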
Appendix E
Truncated distributions
In this Appendix we report without proof the first two moments of the truncated
exponential and truncated Gaussian distributions. The proofs are articulated in [48],
Chapter ”Truncated Distributions”.
The truncated exponential distribution has the following density:
$$f(y) = \frac{e^{-y/\mu}/\mu}{1 - e^{-b/\mu}} \qquad (E.1)$$
where $y \in (0, b]$. If we express b as a multiple of µ (b = lµ), then the moments are:
$$E[Y] = \mu\,\frac{1 - (l+1)e^{-l}}{1 - e^{-l}}$$
and:
$$E[Y^2] = 2\mu^2\,\frac{1 - (l^2 + 2l + 2)\,e^{-l}/2}{1 - e^{-l}}$$
If l is large enough:
$$E[Y] \approx \mu, \qquad E[Y^2] \approx 2\mu^2$$
The variance is then:
$$Var[Y] \approx 2\mu^2 - \mu^2 = \mu^2$$
and the condition for the truncated Gaussian becomes:
$$m_2(E[Y]) = E[Y]^2 + \Omega^2 < m_2^{te}(E[Y]) = 2E[Y]^2$$
or
$$\Omega < E[Y]$$
and vice versa for the U-shaped distribution.
The truncated Gaussian density has the following form:
$$f(y) = \frac{\frac{1}{\sqrt{2\pi}\,\sigma}\exp\big(-\frac{(y-mo)^2}{2\sigma^2}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\;I_{[a,b]}(y) \qquad (E.2)$$
where Φ is the standard normal cdf, φ is the standard normal probability density function, mo is the mode and I is the indicator function:
$$I_{[a,b]}(y) = 1 \quad \text{if } a \le y \le b$$
The first moment is:
$$E[Y] = mo + \frac{\phi\big(\frac{a-mo}{\sigma}\big) - \phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\,\sigma$$
and the variance:
$$Var[Y] = \sigma^2\Bigg[1 + \frac{\frac{a-mo}{\sigma}\,\phi\big(\frac{a-mo}{\sigma}\big) - \frac{b-mo}{\sigma}\,\phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)} - \Bigg(\frac{\phi\big(\frac{a-mo}{\sigma}\big) - \phi\big(\frac{b-mo}{\sigma}\big)}{\Phi\big(\frac{b-mo}{\sigma}\big) - \Phi\big(\frac{a-mo}{\sigma}\big)}\Bigg)^2\Bigg]$$
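These two moments can be evaluated directly, for example as follows (scipy's standard normal cdf and pdf are used; the numerical values in the example are illustrative).

```python
import numpy as np
from scipy.stats import norm

def truncated_gaussian_moments(mo, sigma, a=0.0, b=1.0):
    """Mean and variance of a Gaussian with mode `mo` and scale `sigma`
    truncated to [a, b] (the formulas reported above)."""
    alpha, beta = (a - mo) / sigma, (b - mo) / sigma
    Z = norm.cdf(beta) - norm.cdf(alpha)
    mean = mo + (norm.pdf(alpha) - norm.pdf(beta)) / Z * sigma
    var = sigma ** 2 * (1 + (alpha * norm.pdf(alpha) - beta * norm.pdf(beta)) / Z
                        - ((norm.pdf(alpha) - norm.pdf(beta)) / Z) ** 2)
    return mean, var

# Example: g(k) truncated to [0, 1] with mode 0.1619 (the value used in Section 5.3)
m, v = truncated_gaussian_moments(mo=0.1619, sigma=0.05)
print(m, np.sqrt(v))
```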
Appendix F
The Johnson distribution
The Johnson distribution was first introduced in [28]. Johnson noticed that a bounded random variable k can be turned into a normal one by the transformation:
$$z = \gamma + \delta\,\log\frac{k - \xi}{\xi + \lambda - k} = h(k) \qquad (\xi < k < \xi + \lambda) \qquad (F.1)$$
where z is a standardized normal variable (i.e. N(0, 1)) and ξ, λ, γ and δ are parameters with the following domains:
$$\xi < k < \xi + \lambda, \quad -\infty < \gamma < \infty, \quad \lambda > 0, \quad \xi \ge 0, \quad \delta > 0.$$
The Johnson SB distribution, which is the distribution of k, can be deduced from1:
$$f(k) = N(h(k))\,\big|dh(k)/dk\big| \qquad (F.2)$$
or, more explicitly:
$$f(k) = \frac{\delta}{\sqrt{2\pi}}\,\frac{\lambda}{(k-\xi)(\xi+\lambda-k)}\,\exp\Big\{-\frac{1}{2}\Big[\gamma + \delta\log\Big(\frac{k-\xi}{\xi+\lambda-k}\Big)\Big]^2\Big\} \qquad (F.3)$$
The parameters have the following interpretation: ξ is a location factor, λ is a scale factor, γ and δ determine the shape. In our case we want the variable k to vary within [0, b], so we can set ξ = 0 and λ = b. Substituting δ = 1/σ and γ = −µ/σ, we recover the expression (4.8):
$$g(k) = \frac{b}{\sqrt{2\pi}\,\sigma\,k(b-k)}\,\exp\Big[-\frac{1}{2}\Big(\frac{\log\big(\frac{k}{b-k}\big) - \mu}{\sigma}\Big)^2\Big] \qquad 0 \le k \le b \qquad (F.4)$$
If we expand (F.4) we get:
$$g(k) = \frac{b\,e^{-\frac{\mu^2}{2\sigma^2}}}{\sqrt{2\pi}\,\sigma}\;k^{-\frac{1}{2\sigma^2}\log(k) - (-\frac{\mu}{\sigma^2}+1)}\;(b-k)^{-\frac{1}{2\sigma^2}\log(b-k) + (-\frac{\mu}{\sigma^2}-1)}\;e^{\log(k)\log(b-k)/\sigma^2}$$
so that:
$$\lim_{k\to 0} k^{-n}\,g(k) = \lim_{k\to b}(b-k)^{-n}\,g(k) = 0$$
for any value of n. The distribution curve of k therefore has 'high contact' at either end of its range of variation. Taking the first derivative of g(k) with respect to k and equating it to 0:
$$2k - b = \Big(\log\frac{k}{b-k} - \mu\Big)\frac{b}{\sigma^2}$$
and calling m the value of k at the mode, we obtain the following useful expression for σ with b = 1, which is used in Section 4.5:
$$\sigma^2 = \frac{-\mu + \log\big(\frac{m}{1-m}\big)}{2m - 1}$$

1 See also [51].
The shape of the Johnson distribution for different values of its variance and m =
0.1619 is shown in Fig.F.1
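The density (F.4) and the mode-matching expression for σ can be sketched as follows; the value of µ used in the example is a hypothetical choice, not a thesis parameter.

```python
import numpy as np

def johnson_sb_pdf(k, mu, sigma, b=1.0):
    """Johnson SB density g(k) of eq. (F.4) on (0, b)."""
    k = np.asarray(k, dtype=float)
    z = (np.log(k / (b - k)) - mu) / sigma
    return b / (np.sqrt(2 * np.pi) * sigma * k * (b - k)) * np.exp(-0.5 * z ** 2)

def sigma_from_mode(m, mu):
    """Scale parameter implied by a chosen mode m (the b = 1 expression above)."""
    return np.sqrt((-mu + np.log(m / (1 - m))) / (2 * m - 1))

# Example: pin the mode at 0.1619 (as in Section 5.3) for a hypothetical mu
mu = -1.0
sigma = sigma_from_mode(0.1619, mu)
grid = np.linspace(1e-4, 1 - 1e-4, 5)
print(sigma, johnson_sb_pdf(grid, mu, sigma))
```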
Appendix G

Truncation algorithm applied to a subset

In this Appendix we apply the truncation algorithm of Chapter 2 to the last 500 datapoints of the dataset described in the Introduction. In the process of truncation we observe the same qualitative behaviour of the monitored quantities as for the entire dataset. In particular, after a series of significant variations, the monitored quantities stabilize.
Figure G.1: Volume of the ellipsoid as a function of the number of points removed

Figure G.2: Variation for Fig. G.1

Figure G.3: Minimum covariance determinant as a function of the number of points removed

Figure G.4: Variation for Fig. G.3

Figure G.5: Correlation matrix determinant as a function of the number of points removed

Figure G.6: Variation for Fig. G.5

Figure G.7: Correlations as a function of the number of points removed. The symbol ρ signifies correlation and the subscripts C, B, M and E denote Credit, Bond, Mortgage and Equity, respectively, as described in the Introduction.

Figure G.8: Variation for Fig. G.7

Figure G.9: Variation in the eigenvalues of the correlation matrix as a function of the number of points removed. The eigenvalues are ordered from the smallest λ1 to the largest λ4.
References
[1] F. Black and R. Litterman. Global portfolio optimization. Financial Analysts
Journal, September:28–43, 1992.
[2] O.J. Blanchard and W.M. Watson. Bubbles, rational expectations and specula-
tive markets. In Wachtel P., editor, Crisis in Economic and Financial Structure:
Bubbles, Bursts and Shocks. Lexington Books, 1982.
[3] J-P. Bouchaud and M. Potters. Theory of Financial Risks: From Statistical Physics to Risk Management. Cambridge University Press, 2000.
[4] F. Busetti and A. Harvey. When is a Copula Constant? A Test for Changing
Relationships. Journal of Financial Econometrics, 2010.
[5] S. Ceria and R.A. Stubbs. Incorporating estimation errors into portfolio selection: Robust efficient frontiers. Technical report, Axioma Inc., 2004.
[6] R. Chakrabarti and R. Roll. East Asia and Europe during the 1997 Asian collapse: A clinical study of a financial crisis. Journal of Financial Markets, 5(1):1–30, 2002.
[7] A. Charnes and W.W. Cooper. Programming with linear fractional functionals.
Naval Research Logistics Quarterly, 9(3-4):181–186, 1962.
[8] U. Cherubini and E. Luciano. Copula methods in finance. John Wiley & Sons,
2004.
[9] V.K. Chopra and W.T. Ziemba. The effect of errors in means, variances, and co-
variances on optimal portfolio choice. Journal of Portfolio Management, 19(2):6–
11, 1993.
[10] G. Connor, L. R. Goldberg, and R. A. Korajczyk. Portfolio Risk Analysis.
Princeton University Press, 2010.
[11] R. Cont. Empirical properties of asset returns: stylized facts and statistical
issues. Quantitative Finance, 1(5):223–236, 2001.
[12] S.J. Devlin, R. Gnanadesikan, and J.R. Kettenring. Robust estimation and
outlier detection with correlation coefficients. Biometrika, 62(3):531–545, 1975.
[13] A. Dias and P. Embrechts. Dynamic copula models for multivariate high-
frequency data in finance. 2004.
[14] D.C. Dowson and A. Wragg. Maximum-entropy distributions having prescribed first and second moments. IEEE Transactions on Information Theory, 19(5):688–693, 1973.
[15] B. Efron and R. Tibshirani. Bootstrap methods for standard errors, confidence
intervals and other measures of statistical accuracy. Statistical Science, 1(1):54–
75, 1986.
[16] M. A. El-Erian. Looking ahead. Journal of Portfolio Management, 36(2):4, 2010.
[17] J. A. Feigenbaum. More on a statistical analysis of log-periodic precursors to
financial crashes. Quantitative Finance, 1(5):527–532, 2001.
[18] T.S. Ferguson. On the rejection of outliers. In Proceedings of the 4th Berkeley
Symposium on Mathematical Statistics and Probability, pages 253–287, 1961.
[19] R. Frydman and M. D. Goldberg. Imperfect Knowledge Economics: Exchange
Rates and Risk. Princeton University Press, 2007.
[20] R. Frydman and M. D. Goldberg. Financial markets and the State: Long swings,
risk and the scope of regulation. Capitalism and Society, 4(2):Article 2, 2009.
[21] C. Genest, B. Remillard, and D. Beaudoin. Goodness-of-fit tests for copulas: a
review and power study. Insurance: Mathematics and Economics, 44(2):199–213,
2009.
[22] A. Gilio. Goodness-of-fit tests for copulas: a review and power study. Lecture
Notes in Computer Science, 945:200–209, 1995.
[23] J. Goodman. A comment to the maximum entropy principle. Risk Analysis,
7(2):269–272, 1987.
[24] F.E. Grubbs. Procedures for detecting outlying observations in samples. Tech-
nometrics, 11(1):1–21, 1969.
[25] H.A. Hauksson, M. Dacorogna, T. Domenig, U. Muller, and G. Samorodnitsky.
Multivariate extremes, aggregation and risk estimation. Quantitative Finance,
1(1):79–95, 2001.
[26] U. Herold. Computing implied returns in a meaningful way. Journal of asset
management, 6(1):53–64, 2005.
[27] E.T. Jaynes. Information theory and statistical mechanics. The Physical Review, 106(4):620–630, 1957.
[28] N.L. Johnson. Systems of frequency curves generated by methods of translation.
Biometrika, 36(1/2):149–176, 1949.
[29] E. Jondeau and M. Rockinger. Optimal portfolio allocation under higher mo-
ments. European Financial Management, 12(1):29–55, 2006.
[30] C.P. Kindleberger and R. Aliber. Manias, Panics, and Crashes: A History of Financial Crises. Wiley Investment Classics, 2005.
[31] E. Kole, K. Koedijk, and M. Verbeek. Selecting copulas for risk management. Journal of Banking & Finance, 31(8):2405–2423, 2007.
[32] M. Kritzman and Y. Li. Skulls, financial turbulence and risk management. Fi-
nancial Analysts Journal, 66(5):30–41, 2010.
[33] J. Kwiatkowski and R. Rebonato. A coherent aggregation framework for stress
testing and scenario analysis. accepted for publication in Applied Mathematical
Finance, 2010.
[34] H. W. Lilliefors. On the Kolmogorov-Smirnov test for normality with mean and
variance unknown. Journal of the American Statistical Association, 62(318):399–
402, 1967.
[35] F. Lillo and R.N. Mantegna. Symmetry alteration of ensemble return distribu-
tion in crash and rally days of financial markets. European Physical Journal B,
15(4):603–606, 2000.
[36] F. Longin and B. Solnik. Is the correlation in international equity returns constant: 1960–1990? Journal of International Money and Finance, 14(1):3–26, 1995.
[37] Y. Malevergne and D. Sornette. Testing the Gaussian copula hypothesis for financial assets dependences. Finance 0111003, EconWPA, November 2001.
[38] B. Mandelbrot. The variation of certain speculative prices. The Journal of
Business, 36(4):394–420, 1963.
[39] H. Markowitz. Portfolio selection: efficient diversification of investments. Black-
well, 1959,1991.
[40] H. Markowitz. Mean-Variance Analysis in Portfolio Choice and Capital Markets.
Blackwell, 1987.
[41] R. Mashal and A. Zeevi. Beyond correlation: Extreme co-movements between financial assets, 2002.
[42] R. Merton. Optimum Consumption and Portfolio Rules in a Continuous Time
Model. Journal of Economic Theory, 3(4):373–413, 1971.
[43] A. Meucci. Beyond Black-Litterman: views on non-normal markets. RISK, pages
87–92, February 2006.
[44] A. Meucci. Risk and Asset Allocation. Springer, 2007.
[45] H. Moskowitz and R.K. Sarin. Improving the consistency of conditional prob-
ability assessment for forecasting and decision making. Management Science,
29(6):735–749, 1983.
[46] D. Nowakowski and N. Gurushina. The history of risks: Correlations, volatilities and z-scores. Technical report, Roubini Global Economics, 2010.
[47] Bank for International Settlements. Principles for sound stress testing practices and supervision. 2009.
[48] D.J. Olive. Applied Robust Statistics. 2008.
[49] J. Olmo. Extreme value theory filtering techniques for outlier detection. Tech-
nical report, City University Economics Discussion Papers No 09/09, 2009.
[50] Davidson P. Risk and uncertainty. In Robert Skidelsky and Christian Westerlind
Wingstrom, editors, The Economic Crisis and the State of Economics. Palgrave
Macmillan, 2010.
[51] B.R. Parresol. Recovering parameters of Johnson's SB distribution. Technical report, United States Department of Agriculture, Research Paper SRS-31, 2003.
[52] J. Pearl. Causality: models, reasoning, and inference. Cambridge University
Press, 2009.
[53] W. Poston, E.J. Wegman, C.E. Priebe, and J.L. Solka. A deterministic method
for robust estimation of multivariate location and shape. Journal of Computa-
tional and Graphical Statistics, 6(3):300–313, 1997.
[54] R. Rebonato. A Bayesian approach to coherent stress testing. In Rethinking Risk
Measurement. Incisive Media, 2010.
[55] R. Rebonato. A Bayesian approach to stress testing and scenario analysis. Jour-
nal of Investment Management, 8(3):1–13, 2010.
[56] R. Rebonato. Coherent Stress Testing – A Bayesian Approach to Financial
Stress. John Wiley, 2010.
[57] R. Rebonato and J. Chen. Evidence for state transition and altered serial co-dependence in US interest rates. Quantitative Finance, 9(3):259–278, 2009.
[58] R. Rebonato and V. Gaspari. Analysis of drawdowns and drawups in the US
interest-rate market. Quantitative Finance, 6(4):297–326, 2006.
[59] B. Remillard, N.A. Papageorgiou, and F. Soustra. Dynamic Copulas. SSRN
eLibrary, 2010.
[60] J-C. Rochet. The future of banking regulation. In J-C Rochet M Dewatripont
and J Tirole, editors, Balancing the Banks - Global Lessons from the Financial
Crisis. Princeton University Press, 2010.
[61] P.J. Rousseeuw and A.M. Leroy. Robust Regression and Outlier Detection. John
Wiley, 1987.
[62] P.J. Rousseeuw and S. Van Aelst. Minimum volume ellipsoid. Wiley Interdisci-
plinary Reviews: Computational Statistics, 1(1):71–82, 2010.
[63] P.J. Rousseeuw and K. Van Driessen. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41(3):212–223, 1999.
[64] P. Samuelson. Lifetime portfolio selection by dynamic stochastic programming.
The review of economics and statistics, 51(3):239–246, 1969.
[65] S. Satchell and A. Scowcroft. A demystification of the Black - Litterman model:
Managing quantitative and traditional portfolio construction. Journal of Asset
Management, 1(2):138–150, 2000.
[66] C. Schinkus. Is econophysics a new discipline? the neopositivist argument.
Physica A, 389(18):3814–3821, 2010.
[67] C. Schluter and M. Trede. Identifying multiple outliers in heavy-tailed distributions with an application to market crashes. Journal of Empirical Finance, 15(4):700–713, 2008.
[68] W.F. Sharpe. Expected utility asset allocation. Financial Analysts Journal, 63(5):18–30, 2007.
[69] R.J. Shiller. Irrational Exuberance. Princeton University Press, 2005.
[70] D. Sornette. Significance of log-periodic precursors to financial crashes. Quanti-
tative Finance, 1(4):452–471, 2001.
[71] D. Sornette. Critical Phenomena in Natural Sciences. Springer Verlag, 2004.
[72] D. Sornette. Dragon-kings, black swans and the prediction of crises. International
Journal of Terraspace Science and Engineering, 2(1):1–18, 2009.
[73] C. Stein. Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proceedings of the 3rd Berkeley Symposium on Mathematical Statistics and Probability, 1956.
[74] J.C. Strelen and F. Nassaj. Analysis and generation of random vectors with copulas. In WSC '07: Proceedings of the 39th Conference on Winter Simulation, 2007.
[75] D.M. Titterington. Optimal design: Some geometrical aspects of D-optimality. Biometrika, 62:311–320, 1975.
[76] R. Tzani and A. P. Polychronakos. Correlation breakdown, copula credit models
and arbitrage. GARP Risk Review, pages 27–37, December 2008.
[77] J. Walters. The Black-Litterman model in detail. 2009.
[78] J. Williamson. Bayesian nets and causality. Oxford University Press, 2005.