risk measures: the estimation error problem · pdf filerisk measures: the estimation error...
TRANSCRIPT
RISK MEASURES: THE ESTIMATION
ERROR PROBLEM
Imre Kondor
Parmenides Foundation, Pullach/Munich
Workshop on Systemic Risk and Regulatory Market Risk Measures
Parmenides Foundation, Pullach/Munich
June 2-3, 2014
This work has been supported by the EU Collaborative project FOC – Forecasting Financial Crises, grant No. 255987, and the INET project Correlations in Complex Heterogeneous Networks, grant ID # INO1200019
Coworkers
• András Szepessy, Raiffeisen Bank, Hungary
• Tünde Újvárosi, Raiffeisen Bank, Hungary
• Szilárd Pafka (ELTE PhD student → CIB (Intesa San Paolo) →Paycom.net, California)
• Gábor Nagy, CIB Bank, Hungary
• István Varga-Haszonits (ELTE PhD student →Morgan-Stanley →MSCI )
• Gábor Papp (ELTE)
• Stefano Ciliberti (Capital Fund Management)
• Marc Mézard (Ecole Normale Superieure)
• Susanne Still (University of Hawaii)
• Matteo Marsili (Abdus Salam ICTP, Trieste)
• Fabio Caccioli (SISSA → SFI → Oxford → Cambridge University Business School)
PRELIMINARIES
Market risk
Stemming from fluctuations of stock and bond
prices, derivatives, exchange rates, interest rates, etc.
Typically short time exposure (< 10 days)
High frequency fluctuations
There is no perfect risk measure
Risk measures are trying to characterize the situation
of an often huge, complex and heterogeneous
organization that is facing a lot of Knightian
uncertainty and that is keenly interested in
obfuscating its real predicament, even for itself.
In the simplest of cases, this task would be
tantamount to condensing the information in a high
dimensional distribution into a single number. This
entails a huge information reduction, and as such
will always be exposed to criticism.
Preliminary considerations
• Regulatory risk measures:
- diagnostic tools
- decision making tools
- constraints on investment decisions.
• Portfolio selection and risk measurement
• Out of sample prediction
• A problem in high dimensional statistics, the role of
N/T ( N is the dimension, T the sample size)
Expected properties of risk measures
• Mathematical requirements (e.g. convexity)
• Robustness vs. sensitivity
• Ease of implementation vs. difficulty of gaming
• Stability against estimation error vs. biased
estimates
• Ease of conceptualization by players
• Ease of communication
• Accord with intuition, gut feeling
A footnote on convexity
• A non-convex risk measure
- penalizes diversification
- does not allow correct risk aggregation
- cannot provide basis for rational pricing of risk
- cannot support a consistent limit system
- rewards division of portfolio, firm, etc.
• The 1996 Basel Amendment endorsed (the not
necessarily convex) VaR as the regulatory market
risk measure. Btw, the Standard Model defined in
the same Amendment introduced implicit, position
dependent risk measures, some of which were also
concave.
ESTIMATION ERROR OF
VARIOUS RISK MEASURES
Variance
• The classic risk measure: the average
quadratic deviation
• Assumes a tight underlying distribution, such
as the Gaussian, but financial time series are
often fat tailed („25 sigma events” may
occur)
• Minimizing the variance of a multivariate fat
tailed pdf may actually increase the risk at the
tails
Value at Risk (VaR)
• A high quantile: The threshold below which a given
percentage (say 1%) of the weight of the profit-loss
distribution resides = The minimal loss we incur on
the worst day out of a hundred = The best outcome of
the worst 1% of cases.
• Introduced in the wake of the US savings and loan
crisis by J.P. Morgan in the early 90’s.
• Spread over the industry and in regulation.
• It came under criticism from academics for its lack of
subadditivity already back in the 90’s.
Alternative risk measures
• Mean Absolute Deviation (MAD) – much like a
piece-wise linear version of the standard
deviation:
• Expected Shortfall (ES): the conditional average
over a high quantile
• Maximal Loss (ML): the extreme case of ES, the
optimal combination of the worst outcomes.
• ES and ML are coherent. VaR, ES, and ML are
downside risk measures.
t i
iitabs wxT
1
Portfolios
• Linear combination of returns with weights .
• Markowitz: Variance to be
minimized under budget constraint
and expected return .
• For simplicity, consider minimal risk portfolio with no limit on short selling.
• Additional constraints will be considered later.
ir
iw
1iiw
P i iiw
2
P i ij j
i j
w w
P i iir w r
Empirical covariance matrices
• Covariance matrix estimated from measurements in the market:
σ𝑖𝑗(1)
= 1
𝑇−1 𝑟𝑖𝑡𝑇𝑡=1 𝑟𝑗𝑡
• For a portfolio of N assets the covariance matrix has O(N²) elements. The time series of length T for N assets contain NT data. In order for the measurement to be precise, we need N <<T. Bank portfolios may contain hundreds of assets, and it is hardly meaningful to use time series older than 4 years, while the sampling frequency cannot be high for portfolio optimization. Therefore, N/T << 1 rarely holds in practice. As a result, there will be a lot of noise in the estimate, and the error will grow fast with N/T.
A measure of the estimation error
Assume we know the true covariance matrix σ(0)and
the noisy one σ(1) . Then a natural measure of the
estimation error is
where 𝑤(0)∗ and 𝑤(1)∗ are the optimal weights
corresponding to σ(0) and σ(1), respectively.
ij
jiji
ij
jiji
ww
ww
q)*0()0()*0(
)*1()0()*1(
2
0
Estimation error 𝑞0 for the standard deviation as
risk measure, and for iid normal distributed returns
• For iid normal variables one can easily prove that for
large N and T and N/T <1 the sample average of 𝑞0 is:
𝑞0 = 1
1−𝑁/𝑇
• The estimation error of the variance blows up at a
critical value of the ratio (𝑁/𝑇)𝑐 = 1.
• For iid normal variables the full distribution of 𝑞0 is
known; 𝑞0 „self-averages”.
The next slides show (I. Kondor, S. Pafka, and G. Nagy: Noise sensitivity of
portfolio selection under various risk measures. Journal of
Banking and Finance, 31:1545–1573, 2007.)
• simulation results of wi (portfolio weights) for
standard deviation, MAD and 95% ES, all for
iid normal input variables,
• display of q0 as function of T/N.
• The results show that the estimation noise is
very significant and that MAD (ES) require
more (much more) data, i.e. smaller N/T, than
the variance.
The estimation error (q0) as function of
T/N (for large N and T):
The phase transition picture
• The divergence of the estimation error can be
viewed as an algorithmic phase transition
• It displays universality: the exponent of the
divergence of 𝑞0 is independent of the objective
function, the character of the underlying time
series, etc.
• The critical ratio of N/T depends on the risk
measure
• Transverse fluctuations of the weight vector are
much more violent than the longitudinal ones
The feasibility problem
• In addition to the large fluctuations, for finite N and T, the portfolio optimization problem for ES and ML does not always have a solution for any value of the N/T ratio! (These risk measures can become unbounded.)
• For finite N and T, the existence of the optimum is a probabilistic issue, it depends on the sample.
• As N and T → ∞ with N/T = fixed, this probability goes to 1 resp. 0, according to whether N/T is below, or above a critical (𝑁/𝑇)𝑐 - the transition becomes sharp.
Illustration: the case of Maximal Loss
Definition of the problem (for simplicity, we are looking for the global minimum and allow unlimited short selling):
where the w’s are the portfolio weights and the x’s the returns.
N
i
itiTt
xw1
1maxmin
w
11
N
i
iw
Probability of the minimax problem having a
solution (for general N and T, symmetric
underlying distribution):
1
11
11
2
T
NkT k
Tp
In the limit N,T → ∞, with
N/T fixed, the transition
becomes sharp at N/T = ½.
The estimation error diverges
as we go to N/T= ½ from
below.
Generalization: Expected Shortfall
• ML is the limiting case of ES, when the threshold goes to 1.
• ES shows the same instability as ML, but the locus of this instability depends not only on N/T, but also on the threshold β above which the conditional average is calculated. So there will be a critical line.
This critical line or phase boundary for ES has been obtained
numerically by I. K., Sz. Pafka, G. Nagy: Noise sensitivity of portfolio
selection under various risk measures, Journal of Banking and
Finance, 31, 1545-1573 (2007) and calculated analytically in A.
Ciliberti, I. K., and M. Mézard: On the Feasibility of Portfolio
Optimization under Expected Shortfall, Quantitative Finance, 7, 389-
396 (2007)
The estimation
error diverges as
one approaches
the phase
boundary
from below
Generalization to all coherent measures I. Kondor and I. Varga-Haszonits: Instability of portfolio optimization
under coherent risk measures, Advances in Complex Systems, 13, 425-
437 (2010) • The intuitive explanation for the instability of ES and ML
is that for a given finite sample there may exist a dominant
item (or a dominant combination of items) that produces a
larger return at each time point than any of the others, even
if no such dominance relationship exist between them on
very large samples. This leads the investor to believe that if
she goes extremely long in the dominant item and
extremely short in the rest, she can achieve an arbitrarily
large return on the portfolio, at a risk that goes to minus
infinity (i.e. no risk).
• These considerations extend to all coherent measures
→ István’s talk
Downside risk measures
• As a matter of fact, this type of instability appears even beyond the set of coherent risk measures, and may appear in downside risk measures in general.
• By far the most widely used risk measure up till now has been Value at Risk (VaR). It is a downside measure. It is not convex, therefore the stability problem of its historical estimator is ill-posed.
• Parametric VaR, however, is convex, and this allows us to study the stability problem. Along with VaR, we also look into the closely related parametric estimate for ES.
• Parametric estimates are expected to be more stable than historical ones. We will then be able to compare the phase diagrams for the historical and parametric ES.
Phase diagram for parametric VaR and ES I. Varga-Haszonits and I. Kondor: The instability of downside risk
measures, J. Stat. Mech. P12007 doi: 10.1088/1742-
5468/2008/12/P12007 (2008)
• In the region above the respective phase boundaries
the optimization problem does not have a solution.
• In the region below the phase boundary there is a
solution, but for it to be a good approximation to the
true risk we must go deep into the feasible region. If
we go to the phase boundary from below, the
estimation error diverges.
• The phase boundary for ES runs above that of VaR,
so for a given confidence level α the critical ratio for
ES is larger than for VaR (we need less data in order
to have a solution). For practically important values of
α (95-99%) the difference is not significant.
Parametric vs. historical estimates
• The parametric ES curve runs above the
historical one: we need less data to have a
solution when the risk is estimated
parametrically than when we use raw historical
data. It seems as if we had some additional
information in the parametric approach.
• Where does this information come from?
• It is injected into the calculation „by hand”
when fitting the data to an independently
chosen probability distribution.
Adding linear constraints
In practice, portfolio optimization is always subject to some constraints on the allowed range of the weights, such as a ban on short selling and/or limits on various assets, industrial sectors, regions, etc. These constraints restrict the region over which the optimum is sought to a finite volume where no infinite fluctuations can appear. One might then think that under such constraints the instability discussed above disappears completely.
• This is not so. If we work in the vicinity of the phase
boundary, sample to sample fluctuations in the
weights will still be large, but the constraints will
prevent the solution from running away to infinity.
Instead, it will stick to the „walls” of the allowed
region. These constraints act as a regularizer.
• For example, for a ban on short selling (wi > 0) these
walls will be the coordinate planes, and as N/T
increases, more and more of the weights will become
zero. This phenomenon is well known in portfolio
optimization. (B. Scherer, R. D. Martin,
Introduction to Modern Portfolio Optimization with
NUOPT and S-PLUS, Springer, New York (2005))
REGULARIZATION
Regularization
• High dimensional optimization problems need to be
regularized, large deviations have to be constrained or
penalized by adding a suitably chosen regularizer to the
objective function.
• Various regularization-related methods have appeared
scattered in the literature.
• Systematic application of regularization advocated by S.
Still and I. Kondor: Regularizing portfolio optimization,
New Journal of Physics, 12, 075034 (2010) , with an
illustration on the example of ES + 𝐿2 (related to
support vector regression)
• The general case of 𝐿𝑝 is considered in F. Caccioli, I.
Kondor, M. Marsili and S. Still: 𝐿𝑝 regularized
portfolio optimization, submitted to Quantitative
Finance, arXiv: 1404.4040
• Ban on short selling is a kind of hard implementation
of 𝐿1. This regularizer tends to yield sparse solutions
(it is often used in place of 𝐿0), it sets some of the
weights to zero, thereby decreasing diversification,
but also transaction costs. If there are several nearly
equivalent items in the portfolio 𝐿1 may select some
of them randomly. With N/T increasing, a growing
number of weights are set to zero.
• An alternative, soft realization of 𝐿1 regularization is to
add the sum of the absolute values of the weights (with
a Lagrange multiplyer) to the risk measure.
• In the case of risk measures that are linear in the
weights, such as ES or coherent measures in general,
this procedure only shifts the instability, but does not
completely eliminate it.
• 𝐿2 enhances diversification. Coupled with the variance
it is called „shrinkage”.
• The combination of 𝐿1 and 𝐿2 („elastic net”) may be
useful in portfolio selection.
Regularization and market impact
• Liquidity is bounded, liquidation of a portfolio
moves prices against the investor. Taking into
account this effect constraints positions – just like
regularization.
• F. Caccioli, S. Still, M. Marsili and I. Kondor:
Optimal Liquidation Strategies Regularize
Portfolio Optimization, European Journal of
Finance, 1,1-18 (2011), and F. Caccioli, I. Kondor,
M. Marsili and S. Still: 𝐿𝑝 regularized portfolio
optimization.
• The correspondence:
Closing remarks on portfolio selection
• Given the nature of the portfolio optimization task,
one typically works in that region of parameter space
where sample fluctuations are large. Since the critical
point where these fluctuations diverge depends on the
risk measure, the confidence level, and on the method
of estimation, one must be aware of how close one’s
working point is to the critical boundary, otherwise
one will be grossly misled by the unstable algorithm.
• Regularization tames the divergent estimation error, at
the price of introducing bias, prior information or
deliberate choice.
• Downside risk measures have been introduced,
because they ignore positive fluctuations that
investors are not supposed to be afraid of.
Perhaps they should be: the downside risk measures
display the instability described here which is
basically due to a false arbitrage alert and may
induce an investor to take very large positions on the
basis of misleading information stemming from
finite samples.
• In a way, the global disaster that engulfed us in
2007-09 was a macroscopic example of such a folly.
• Several technical questions about risk measures
remain open. We hope to find answers in the future.
THANK YOU!