Travaux d’Études et de Recherches Research Project Report May 2011 Copula theory: an application to risk modeling Eric Pradier Grenoble INP - Ensimag Ecole Nationale Supérieure d’Informatique et de Mathématiques Appliquées de Grenoble


  • Travaux d’Études et de Recherches

    Research Project Report

    May 2011

    Copula theory: an application to risk modeling

    Eric Pradier

    Grenoble INP - Ensimag

    Ecole Nationale Supérieure d’Informatique et de Mathématiques Appliquées de Grenoble

  • Abstract

    This paper compiles the research and experiments I carried out during my research project, as part of my penultimate year of engineering studies at Grenoble INP Ensimag.

    In this paper, we present elements of copula theory, including dependence coefficients, in order to study the properties of several example copulas. We then focus on risk management applications of copulas, in particular the Loss Distribution Approach (LDA model). The model is introduced as originally formulated in the context of operational risk and Basel II regulation. We discuss the need for dependence modeling and comment on the choice among several copulas, based on theoretical intuitions. Finally, we put the LDA model into practice in the field of environmental risk. The experiment includes a full review of the advantages and drawbacks of the model.

    My research work was supervised by Stéphane Girard, an experienced research scientist at INRIA Rhône-Alpes. His expertise and research interests include statistics of extremes, multivariate data analysis, functional estimation and applications of statistics. I was hosted by the Mistis team, which is a joint venture between INRIA and the Jean Kuntzmann Laboratory in Grenoble. The activity of the team is focused on developing statistical methods in order to study and model complex random systems. The principal fields of application are image processing, spatial data, biomedical engineering and industry.

    I would like to warmly thank Stéphane Girard for his time, his insightful views on the subject and his guidance throughout the research project.


  • Contents

    1 Introduction
      1.1 Context
      1.2 Work plan

    2 Copulas: Elements of theory
      2.1 Definitions
      2.2 Sklar’s theorem
      2.3 Dependence properties
      2.4 Examples

    3 An application to operational risk modeling: LDA model
      3.1 Context: operational risk and financial regulation
      3.2 Loss Distribution Approach (LDA)
      3.3 Risk dependencies in LDA
      3.4 Which copula to use?

    4 Beyond operational risk: the LDA model and environmental risk
      4.1 Our problem
      4.2 LDA model
      4.3 Analysis

    5 Conclusion

  • 1 Introduction

    1.1 Context

    This research addresses the problem of dependence modeling for risk management applications. It particularly investigates the use of copulas in this field, as developed in [1].

    After the 2007-2008 financial crisis, Gaussian copula methods for CDO pricing [8] were blamed for underestimating extreme events, as analyzed in [6]. In this paper, we look into the correlation problem in the LDA model [3] used in operational risk quantification as part of the Basel II Economic Capital framework [4]. We discuss the choice among copulas in order to fairly capture the dependence between extreme events. Many alternatives to the Gaussian copula can be considered for dependence modeling [7].

    Finally, we introduce a new application for the LDA model: environmental risk [9]. We try to assess the use of LDA to model extreme rainfalls using rainfall data.

    1.2 Work plan

    In the first section of this paper, elements of copula theory are presented in order to highlight the dependence properties of different copulas.

    The second section is devoted to the LDA model in operational risk. This model is used by financial institutions to compute the Economic Capital for operational risk in the Basel II framework. The correlation problem is addressed and an alternative to the Gaussian copula model is discussed.

    In the third section, the LDA is adapted for extreme rainfall modeling. Experiments are carried out using a data set collected in the French region of Cévennes [9].

    2 Copulas: Elements of theory

    A copula is a function that joins a multivariate distribution function to its one-dimensional marginal distribution functions. Strictly speaking, a copula is a multivariate distribution whose one-dimensional margins are uniform on the interval [0, 1].

    In numerical simulation, copulas are a practical tool to generate multivariate distributions with given dependence properties regardless of the marginal distributions. They are particularly useful when dealing with phenomena that cannot be approached by normal distributions. The idea behind generating multivariate distributions from a copula is based on the simple transformation from a uniform distribution on [0, 1] to any other distribution using the inverse of the cumulative distribution function (the quantile function).
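    The transformation described above can be sketched in a few lines. The example below is a minimal illustration and not the code used in this project: it assumes a bivariate Gaussian copula (introduced in section 2.4.1) with a hypothetical correlation of 0.5 and two hypothetical lognormal marginals. All function names and parameter values are illustrative assumptions, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf and inverse cdf


def sample_gaussian_copula(rho, n, rng):
    """Draw n pairs (u, v) with uniform margins and Gaussian dependence."""
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal and has correlation rho with z1
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


def lognormal_quantile(u, mu, sigma):
    """Quantile function (inverse cdf) of a lognormal distribution."""
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))


rng = random.Random(2011)
uniforms = sample_gaussian_copula(rho=0.5, n=5000, rng=rng)
# Hypothetical marginals: lognormal(0, 1) and lognormal(1, 0.5)
sample = [(lognormal_quantile(u, 0.0, 1.0), lognormal_quantile(v, 1.0, 0.5))
          for u, v in uniforms]
```

    The dependence structure is fixed by the copula step alone; swapping `lognormal_quantile` for another quantile function changes the marginals without touching the dependence.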

    In this section, we will provide elements of copula theory and detail some families of copulas.

    2.1 Definitions

    2.1.1 n-dimensional Copula

    An n-dimensional copula (or n-copula) is a multivariate joint distribution function defined on the n-dimensional unit cube $[0,1]^n$.

    $C : [0,1]^n \to [0,1]$ is an n-copula if, for all $u_1, \dots, u_n \in [0,1]$:

    1. $C(u_1, \dots, u_{j-1}, 0, u_{j+1}, \dots, u_n) = 0$ for $1 \le j \le n$;

    2. $C(1, \dots, 1, u_j, 1, \dots, 1) = u_j$ for $1 \le j \le n$;

    3. C is n-increasing, i.e. for each hyperrectangle $B = \prod_{i=1}^{n} [x_i, y_i] \subseteq [0,1]^n$,

    $$V_C(B) := \sum_{z \in \mathrm{vert}(B)} (-1)^{N(z)} C(z) \ge 0$$

    where the sum runs over the vertices z of B and $N(z) = \mathrm{card}\{k \mid z_k = x_k\}$.

    We can interpret these properties in the following way:

    Properties 1 and 2 ensure that the marginal distributions are uniform distributions.

    Property 3 ensures that the copula is a proper cumulative distribution function.

    2.1.2 2-dimensional copula

    In the particular case of a 2-dimensional copula, the definition simplifies to: $C : [0,1]^2 \to [0,1]$ is a 2-copula if, for all $(x, y) \in [0,1]^2$:

    1. C(0, y) = 0 and C(x, 0) = 0

    2. C(1, y) = y and C(x, 1) = x

    3. C is 2-increasing, i.e. for all $x_1 \le x_2$ and $y_1 \le y_2$ in [0, 1],

    $$C(x_2, y_2) - C(x_1, y_2) - C(x_2, y_1) + C(x_1, y_1) \ge 0$$

    Graphically, the 2-increasing property means that the probability that a random draw (x, y) falls in the rectangle with corners (x1, y1), (x1, y2), (x2, y2), (x2, y1) is non-negative. The figure below illustrates this property.

    [Figure: two panels on the unit square [0, 1] × [0, 1] with axes X and Y; one panel is titled “Scatterplot”.]
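    The 2-increasing condition can also be checked numerically on a grid. The sketch below is only an illustration: the helper names are hypothetical, the check covers grid cells only (so it is a necessary test, not a proof), and the function max(u, v) is a deliberately invalid candidate chosen to show a failure.

```python
def c_volume(copula, x1, x2, y1, y2):
    """C-volume of the rectangle [x1, x2] x [y1, y2]."""
    return copula(x2, y2) - copula(x1, y2) - copula(x2, y1) + copula(x1, y1)


def is_two_increasing_on_grid(copula, steps=20, tol=1e-12):
    """Check that every grid cell has a non-negative C-volume."""
    grid = [i / steps for i in range(steps + 1)]
    for i in range(steps):
        for j in range(steps):
            volume = c_volume(copula, grid[i], grid[i + 1], grid[j], grid[j + 1])
            if volume < -tol:
                return False
    return True


def independence(u, v):
    return u * v


def comonotone(u, v):
    return min(u, v)


def invalid_candidate(u, v):
    # Not a copula: cells on the diagonal have a negative volume
    return max(u, v)


results = {
    "independence": is_two_increasing_on_grid(independence),
    "comonotone": is_two_increasing_on_grid(comonotone),
    "invalid_candidate": is_two_increasing_on_grid(invalid_candidate),
}
```

    Any rectangle aligned with the grid is a union of cells, so non-negative cell volumes imply non-negative volumes for all such rectangles.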

  • 2.2 Sklar’s theorem

    Sklar’s theorem (1959) is a central result of copula theory. It is the foundation of most, if not all, applications.

    In general terms, the theorem states that for every cumulative distribution function H with one-dimensional marginals $(H_i)_{1 \le i \le n}$, there exists a copula C that binds the marginals to give the joint distribution.

    Specifically, for the bivariate case, Sklar’s theorem is:

    For any bivariate distribution function H(x, y) with one-dimensional marginals F and G, there exists a copula C such that:

    $$H(x, y) = C(F(x), G(y))$$

    Moreover, if the marginal distribution functions F and G are continuous, C is unique.

    2.3 Dependence properties

    A copula links a multivariate cumulative distribution function (cdf) to its one-dimensional marginal cdfs and therefore carries the dependence structure between these marginals. In this section, we will present several quantitative and qualitative measures of the dependence structure of a copula.

    2.3.1 Concordance

    Let $(x_i, y_i)$ and $(x_j, y_j)$ be two observations of a pair of random variables (X, Y).

    We say that $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if $(x_i - x_j)(y_i - y_j) > 0$. That is to say that they are concordant if $x_i < x_j$ and $y_i < y_j$, or if $x_i > x_j$ and $y_i > y_j$. Conversely, $(x_i, y_i)$ and $(x_j, y_j)$ are discordant if $(x_i - x_j)(y_i - y_j) < 0$.

    Informally, concordance describes a pair of random variables in which “large values tend to be associated with large values and small values tend to be associated with small values”.

    2.3.2 Kendall’s Tau

    Kendall’s τ is a measure of association between two random variables. It is defined in terms of concordance.

    Let $(x_i, y_i)_{1 \le i \le n}$ be a sample of n observations from (X, Y), a pair of continuous random variables. Let c denote the number of concordant pairs of observations $(x_i, y_i)$ and $(x_j, y_j)$ with $1 \le i < j \le n$, and let d denote the number of discordant pairs. Kendall’s τ is defined as

    $$\tau = \frac{c - d}{c + d} = \frac{c - d}{\binom{n}{2}}$$

    where $\binom{n}{2}$ is the binomial coefficient, equal to the number of pairs. The two expressions agree because, for continuous variables, ties occur with probability zero, so every pair is either concordant or discordant.
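    The sample version can be computed directly from its definition. The following sketch counts concordant and discordant pairs in a small, made-up sample; the function name and the data are illustrative assumptions, and the quadratic loop is only suitable for small samples.

```python
from itertools import combinations


def kendall_tau(sample):
    """Sample Kendall's tau from concordant and discordant pair counts."""
    concordant = 0
    discordant = 0
    for (xi, yi), (xj, yj) in combinations(sample, 2):
        sign = (xi - xj) * (yi - yj)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    # Tied pairs (sign == 0) are ignored; they have probability zero
    # for continuous variables.
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical observations: 4 concordant pairs and 2 discordant pairs
observations = [(1, 2), (2, 3), (3, 1), (4, 4)]
tau = kendall_tau(observations)  # (4 - 2) / 6
```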

    The population version of Kendall’s τ for a vector (X, Y) of continuous random variables with joint distribution function H is defined as the probability of concordance minus the probability of discordance for two independent pairs (X1, Y1) and (X2, Y2) drawn from H:

    $$\tau = P\big((X_1 - X_2)(Y_1 - Y_2) > 0\big) - P\big((X_1 - X_2)(Y_1 - Y_2) < 0\big)$$

    It can be shown (see [2]) that for any pair of continuous random variables with copula C, Kendall’s τ can be written as follows:

    $$\tau_C = 4 \iint_{[0,1]^2} C(u, v)\, dC(u, v) - 1 = 4\, E[C(U, V)] - 1$$

    where (U, V) is a pair of random variables, uniformly distributed on [0, 1], whose joint distribution function is C. We can see that τ depends only on the copula. This expression of τ is a consequence of Sklar’s theorem.
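    As a quick check of this formula, it can be evaluated by hand for two copulas used here only as illustrative assumptions: the independence copula Π(u, v) = uv and the comonotone copula M(u, v) = min(u, v), for which V = U almost surely.

```latex
% Independence copula: dC(u, v) = du dv
\tau_\Pi = 4 \int_0^1 \!\! \int_0^1 u v \, du \, dv - 1 = 4 \cdot \tfrac{1}{4} - 1 = 0

% Comonotone copula: all the mass lies on the diagonal u = v,
% so C(U, V) = M(U, U) = U
\tau_M = 4 \, E[U] - 1 = 4 \cdot \tfrac{1}{2} - 1 = 1
```

    The two values match the intuition: no association under independence and perfect concordance under comonotonicity.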

    2.3.3 Spearman’s rho

    Spearman’s ρ is a non-parametric correlation coefficient. It is also based on concordance and discordance.

    Let (X1, Y1), (X2, Y2) and (X3, Y3) be three independent random vectors with a common joint distribution function H. The population version of Spearman’s ρ is defined to be proportional to the probability of concordance minus the probability of discordance for the two vectors (X1, Y1) and (X2, Y3), which have the same margins, but one has distribution function H while the components of the other are independent:

    $$\rho = 3 \Big( P\big((X_1 - X_2)(Y_1 - Y_3) > 0\big) - P\big((X_1 - X_2)(Y_1 - Y_3) < 0\big) \Big)$$

    Spearman’s ρ can be rewritten in terms of the copula as:

    $$\rho_C = 12 \iint_{[0,1]^2} C(u, v)\, du\, dv - 3$$
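    The copula expression of Spearman’s ρ lends itself to a simple numerical evaluation. The sketch below approximates the double integral with a midpoint rule; the function name, the grid size and the three test copulas (independence, comonotone and countermonotone) are assumptions made for illustration.

```python
def spearman_rho(copula, steps=400):
    """Approximate 12 * integral of C over the unit square, minus 3."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoint of the i-th cell along u
        for j in range(steps):
            v = (j + 0.5) * h  # midpoint of the j-th cell along v
            total += copula(u, v)
    return 12.0 * total * h * h - 3.0


rho_independence = spearman_rho(lambda u, v: u * v)
rho_comonotone = spearman_rho(lambda u, v: min(u, v))
rho_countermonotone = spearman_rho(lambda u, v: max(u + v - 1.0, 0.0))
```

    The three values should be close to 0, 1 and -1 respectively, which are the exact values of the integral for these copulas.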

    2.3.4 Positively Quadrant Dependent

    Let X and Y be random variables. X and Y are positively quadrant dependent (PQD) if for all (x, y) in $\mathbb{R}^2$,

    $$P(X \le x, Y \le y) \ge P(X \le x)\, P(Y \le y),$$

    or equivalently,

    $$P(X > x, Y > y) \ge P(X > x)\, P(Y > y).$$

    If X and Y are PQD, the probability that they are simultaneously small (respectively, large) is at least as great as it would be if they were independent. In terms of the copula, this property can be rewritten as:

    $$C(u, v) \ge uv$$

    for all $(u, v) \in [0,1]^2$.

  • 2.3.5 Tail Dependence

    Tail dependence coefficients are designed to capture the dependence between the marginals in the upper-right quadrant and in the lower-left quadrant of $[0,1]^2$.

    The upper tail dependence parameter $\lambda_U$ for a pair (X, Y) of random variables such that H(x, y) = C(F(x), G(y)) is defined as:

    $$\lambda_U = \lim_{t \to 1^-} P\big(Y > G^{-1}(t) \mid X > F^{-1}(t)\big)$$

    We can show that

    $$\lambda_U = 2 - \lim_{t \to 1^-} \frac{1 - C(t, t)}{1 - t}$$

    Similarly, we can define the lower tail dependence parameter $\lambda_L$ with $t \to 0^+$.

    If $\lambda_U = 0$, we say that C has no upper tail dependence. This means that if we go far enough into the upper tail of the joint distribution, extreme events appear to occur independently.
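    The limit in the second expression can be explored numerically by evaluating the ratio for values of t close to 1. The sketch below does so for two copulas chosen as illustrative assumptions, the independence copula uv and the comonotone copula min(u, v); the helper name is hypothetical.

```python
def upper_tail_ratio(copula, t):
    """Value of 2 - (1 - C(t, t)) / (1 - t) for a given t < 1."""
    return 2.0 - (1.0 - copula(t, t)) / (1.0 - t)


def independence(u, v):
    return u * v


def comonotone(u, v):
    return min(u, v)


levels = (0.9, 0.99, 0.999, 0.9999)
# Independence: the ratio equals 1 - t and tends to 0 (no tail dependence)
independence_path = [upper_tail_ratio(independence, t) for t in levels]
# Comonotone: the ratio equals 1 for every t (full tail dependence)
comonotone_path = [upper_tail_ratio(comonotone, t) for t in levels]
```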

    2.3.6 Max-stability and Extreme Value Copulas

    A copula is max-stable if for every real number r > 0 and all u, v in [0, 1],

    $$C(u, v) = C^r(u^{1/r}, v^{1/r})$$

    where $C^r$ denotes the r-th power of C. The limit of the sequence of copulas $\{C_{(r)}(u, v) = C^r(u^{1/r}, v^{1/r})\}$ leads to the notion of extreme value copula.

    A copula $C^*$ is an extreme value copula if there exists a copula C such that

    $$C^*(u, v) = \lim_{n \to \infty} C^n(u^{1/n}, v^{1/n})$$

    A theorem (see theorem 3.3.5 in [2]) establishes that a copula is max-stable if and only if it is an extreme value copula.

    2.4 Examples

    In this section, we will present several copulas that will be important later in this paper.

    2.4.1 Gaussian Copula

    The Gaussian copula (or normal copula) is one of the most standard copulas in finance and risk management due to its elegance and numerical convenience. It was first introduced in financial modeling by David X. Li in [8].

    The Gaussian copula distribution function is the following: for all $(u_1, \dots, u_n) \in [0,1]^n$ and any correlation matrix ρ,

    $$C_\rho(u_1, \dots, u_n) = \Phi^n_\rho\big(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_n)\big)$$

    where $\Phi^{-1}$ is the inverse of the univariate standard normal distribution function and $\Phi^n_\rho$ is the cumulative distribution function of the n-dimensional normal distribution with zero mean and ρ as covariance matrix.

    Dependence properties

    We consider a bivariate Gaussian copula with a single correlation parameter σ. The following dependence measures are easily derived:

    • Kendall’s τ: $\tau = \frac{2}{\pi} \arcsin(\sigma)$

    • Tail dependence: $\lambda_u = 0$ (for σ < 1)

    The tail dependence coefficient shows that, regardless of how high the correlation is (short of perfect correlation), if we go far enough in the tail, extreme events appear to occur independently. This is an important drawback of the Gaussian copula, especially when considering risk measures such as the Value-at-Risk (“VaR”) that typically derive from upper tail quantiles (99.5% or 99.9%), which is to say extreme events.
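    As a worked example of the first formula, take σ = 0.5, the value used in the illustrations of this section:

```latex
\tau = \frac{2}{\pi} \arcsin(0.5) = \frac{2}{\pi} \cdot \frac{\pi}{6} = \frac{1}{3} \approx 0.33,
\qquad \lambda_u = 0
```

    A correlation of 0.5 therefore corresponds to a moderate rank association, while the tail dependence coefficient remains zero.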

    Illustrations for the bivariate case with the single correlation σ = 0.5 :

    [Figure: four panels with axes X1 and X2. Panel titles as extracted: “Gaussian copula density”, “Gaussian copula with normal marginals” (twice), “Gaussian copula with lognormal marginals”.]

  • 2.4.2 Archimedean Copulas

    An Archimedean copula is defined in the following way: for all $(u_1, \dots, u_n) \in [0,1]^n$,

    $$C(u_1, \dots, u_n) = \phi^{-1}\Big[\sum_{i=1}^{n} \phi(u_i)\Big]$$

    where ϕ is the generator, i.e. a function that satisfies:

    1. ϕ(1) = 0,

    2. ϕ is strictly decreasing on [0, 1],

    3. ϕ is convex on [0, 1].

    Some well-known bivariate Archimedean copulas include:

    • Independent copula: C(u, v) = uv, where $\phi(t) = -\ln(t)$,

    • Clayton copula: $C(u, v) = \big[\max(u^{-\alpha} + v^{-\alpha} - 1,\, 0)\big]^{-1/\alpha}$, where $\phi(t) = \frac{1}{\alpha}(t^{-\alpha} - 1)$; $\alpha \in [-1, \infty) \setminus \{0\}$,

    • Frank copula: $C(u, v) = -\frac{1}{\alpha} \ln\Big(1 + \frac{(e^{-\alpha u} - 1)(e^{-\alpha v} - 1)}{e^{-\alpha} - 1}\Big)$, where $\phi(t) = -\ln\Big(\frac{e^{-\alpha t} - 1}{e^{-\alpha} - 1}\Big)$; $\alpha \in (-\infty, +\infty) \setminus \{0\}$.
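    The construction from a generator can be illustrated in code. The sketch below builds a bivariate copula from a generator and its inverse, then compares the result with the closed forms listed above for the independent and Clayton copulas. It is restricted, as an assumption of this example, to α > 0 and to arguments in (0, 1], where the ordinary inverse of the generator can be used; all function names are hypothetical.

```python
import math


def archimedean(generator, inverse):
    """Bivariate copula C(u, v) = inverse(generator(u) + generator(v))."""
    def copula(u, v):
        return inverse(generator(u) + generator(v))
    return copula


def clayton_from_generator(alpha):
    """Clayton copula built from its generator, for alpha > 0."""
    return archimedean(
        lambda t: (t ** -alpha - 1.0) / alpha,
        lambda s: (1.0 + alpha * s) ** (-1.0 / alpha),
    )


def clayton_closed_form(u, v, alpha):
    """Closed-form Clayton copula, for alpha > 0 and u, v in (0, 1]."""
    return max(u ** -alpha + v ** -alpha - 1.0, 0.0) ** (-1.0 / alpha)


# Independent copula: generator -ln(t), inverse exp(-s)
independent = archimedean(lambda t: -math.log(t), lambda s: math.exp(-s))

points = [(0.2, 0.7), (0.9, 0.4), (0.5, 0.5)]
clayton = clayton_from_generator(alpha=2.0)
max_gap = max(abs(clayton(u, v) - clayton_closed_form(u, v, 2.0)) for u, v in points)
```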

    2.4.3 Gumbel Copula

    The Gumbel copula is an asymmetric Archimedean copula with greater dependence in the upper tail than in the lower tail. It is given by:

    $$C(u, v) = \exp\Big(-\big[(-\ln u)^\alpha + (-\ln v)^\alpha\big]^{1/\alpha}\Big)$$

    Its generator is $\phi(t) = (-\ln t)^\alpha$, where $\alpha \in [1, \infty)$.

    From theorem 1.4.6 in [2], the Gumbel copula is an extreme value copula because it satisfies the max-stable property. This property is interpreted in the following manner:

    Let (X1, Y1), ..., (Xn, Yn) be independent and identically distributed pairs whose components are bound by the same copula C. Let $C_{\max}$ be the copula of the pair $(X_{(n)}, Y_{(n)}) = (\max_i X_i, \max_i Y_i)$. According to [2]:

    $$C_{\max}(u, v) = C(u^{1/n}, v^{1/n})^n = C(u, v)$$

    That is to say that, for the Gumbel copula, the pair of maxima $(X_{(n)}, Y_{(n)})$ is bound by the same copula as each pair (Xi, Yi).

    Therefore the Gumbel copula has good properties to model dependence between extreme events. In particular, the upper tail dependence coefficient is $\lambda_u = 2 - 2^{1/\alpha}$.

    The relationship between Kendall’s τ and the α parameter of the Gumbel copula is: $\tau = \frac{\alpha - 1}{\alpha}$.
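    These properties can be verified numerically. The sketch below checks the max-stable identity at one arbitrary point and compares the closed-form tail coefficient with the limit expression of section 2.3.5 evaluated close to 1. The value α = 3 matches the illustrations of this section; the evaluation point, the values of n and the function name are assumptions of the example.

```python
import math


def gumbel_copula(u, v, alpha):
    """Bivariate Gumbel copula for u, v in (0, 1] and alpha >= 1."""
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))


alpha = 3.0
u, v = 0.3, 0.7

# Max-stability: C(u^(1/n), v^(1/n))^n should equal C(u, v) for every n
reference = gumbel_copula(u, v, alpha)
stability_gaps = [
    abs(gumbel_copula(u ** (1.0 / n), v ** (1.0 / n), alpha) ** n - reference)
    for n in (2, 10, 100)
]

# Closed-form dependence measures
lambda_u = 2.0 - 2.0 ** (1.0 / alpha)
tau = (alpha - 1.0) / alpha

# Limit expression of the upper tail coefficient, evaluated near t = 1
t = 0.9999
lambda_u_numeric = 2.0 - (1.0 - gumbel_copula(t, t, alpha)) / (1.0 - t)
```

    For α = 3 the closed forms give λu ≈ 0.74 and τ = 2/3, to be compared with λu = 0 for the Gaussian copula.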

    Here are illustrations for the bivariate case with α = 3:

    [Figure: four panels with axes X1 and X2. Panel titles as extracted: “Gumbel copula density”, “Gumbel copula with normal marginals” (twice), “Gumbel copula with lognormal marginals”.]

    Compared to the figure in 2.4.1, we note the asymmetric shape of the Gumbel copula. Upper tail dependence is stronger than with a Gaussian copula. We also see that it has more dependence in the upper tail than in the lower tail.

    3 An application to operational risk modeling: LDA model

    Risk modeling has arisen as a major issue in the process of mitigating risks. Quantitative models are used in a number of fields such as finance, insurance, energy or health.

    In order to fairly assess the overall risk exposure, models have to take into account the correlation between individual risks when aggregating them. More importantly, models must capture the dependence between extreme events, thus reflecting the possibility that rare events accumulate.

    Copulas offer a robust framework for correlation modeling. Therefore it is not surprising to see them appear in actuarial and financial models.

  • 3.1 Context: operational risk and financial regulation

    In the case of finance, Basel II requires financial institutions to hold capital reserves appropriate to the risks they are exposed to. It aims to increase the stability of the international financial system. The risk categories are market risk, credit risk and operational risk.

    We will study a model to compute the economic capital for operational risk. The Basel Committee defines operational risk as: "The risk of loss resulting from inadequate or failed internal processes, people and systems or from external events".

    Thus, we can see that risk factors are various and heterogeneous. The different event types officially defined by the Basel Committee are:

    1. Internal Fraud e.g. misappropriation of assets, tax evasion, intentional mismarking of positions, bribery.

    2. External Fraud e.g. theft of information, hacking damage, third-party theft and forgery.

    3. Employment Practices and Workplace Safety e.g. discrimination, workers compensation, employee health and safety.

    4. Clients, Products, & Business Practice e.g. market manipulation, antitrust, improper trade, product defects, fiduciary breaches, account churning.

    5. Damage to Physical Assets e.g. natural disasters, terrorism, vandalism.

    6. Business Disruption & Systems Failures e.g. utility disruptions, software failures, hardware failures.

    7. Execution, Delivery, & Process Management e.g. data entry errors, accounting errors, failed mandatory reporting, negligent loss of client assets.

    3.2 Loss Distribution Approach (LDA)

    The Loss Distribution Approach (LDA) consists in modeling losses for individual risks and then aggregating them to calculate the overall loss distribution. The economic capital is computed as the Value-at-Risk, which is to say the 99.5% quantile for insurers and the 99.9% quantile for banks.

    3.2.1 Modeling loss for individual risks

    The distribution of the loss Lr for an individual risk r is built by compounding two probability distributions. These two distributions represent two characteristics of a risk:

    1. Nr represents the number of events caused by the risk r over a year. We say that Nr follows the frequency distribution.

    2. Xr describes the severity of loss, i.e. the amount of money lost each time an event occurs. We say that Xr follows the unit loss distribution.

    Then the overall loss Lr for the individual risk r is given by:

    $$L_r = \sum_{i=1}^{N_r} X_{r,i}$$

    We say that Lr follows the annual loss distribution for the individual risk r.

    The assumptions carried in the model are the following:

    1. Nr and (Xr,1, Xr,2, ...) are independent random variables,

    2. Xr,1, Xr,2, ... is a sequence of independent and identically distributed random variables.

    Usually, the frequency distribution is modeled with a Poisson distribution. The parameter λ of the Poisson distribution is set to the average number of events occurring over a year. Alternative distributions are also used: the binomial distribution makes it possible to model events that cannot occur more than a certain number of times; the negative binomial offers more flexibility than the Poisson with its two parameters.

    The unit loss distribution is represented with a heavy-tailed distribution in order to capture extreme events. A lognormal model is chosen in the standard approach, but Weibull and Pareto distributions can also be used if appropriate.

    Finally, we can see that the annual loss distribution for the individual risk r is an infinite mixture weighted by the frequency distribution (a compound Poisson distribution when Nr is Poisson):

    $$P(L_r \le x) = \sum_{n=0}^{\infty} P\Big(\sum_{i=1}^{n} X_{r,i} \le x\Big)\, P(N_r = n)$$

    with the convention that the empty sum (n = 0) is equal to zero, so that the first term accounts for years without any event.

    No closed form can be derived for P(Lr ≤ x) under these assumptions.
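    Since no closed form is available, the annual loss distribution is typically approximated by simulation. The sketch below is one possible Monte Carlo implementation under the standard assumptions above (Poisson frequency, lognormal severity); the parameter values are hypothetical, the Poisson sampler is Knuth's multiplication method (suitable for small λ only), and the quantile is a plain empirical quantile.

```python
import math
import random


def sample_poisson(lam, rng):
    """Poisson draw by Knuth's method: count uniforms until their product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_annual_losses(lam, mu, sigma, n_years, rng):
    """Simulate L_r = X_1 + ... + X_N with N ~ Poisson(lam), X ~ lognormal(mu, sigma)."""
    losses = []
    for _ in range(n_years):
        n_events = sample_poisson(lam, rng)
        losses.append(sum(rng.lognormvariate(mu, sigma) for _ in range(n_events)))
    return losses


def empirical_quantile(values, level):
    """Smallest observation with at least a fraction `level` of the sample at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[min(index, len(ordered) - 1)]


rng = random.Random(42)
# Hypothetical risk: 5 events per year on average, lognormal(0, 1) severities
annual_losses = simulate_annual_losses(lam=5.0, mu=0.0, sigma=1.0, n_years=20000, rng=rng)
mean_loss = sum(annual_losses) / len(annual_losses)
var_999 = empirical_quantile(annual_losses, 0.999)
```

    The simulated mean can be compared with the theoretical value λ·E[X] = 5·exp(0.5); the 99.9% quantile rests on only a handful of simulated years and is therefore much noisier.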

    3.2.2 Overall risk exposure and Economic Capital

    The aggregate loss for the institution is computed by summing the annual losses for each of the individual risks $(r_i)_{1 \le i \le m}$ listed:

    $$G = \sum_{i=1}^{m} L_{r_i} = \sum_{i=1}^{m} \sum_{j=1}^{N_{r_i}} X_{r_i, j}$$

    Then the Economic Capital for operational risk (EC) in Basel II is computed in the following manner:

    $$EC = \mathrm{VaR}_\alpha(G)$$

    where VaR is the Value-at-Risk, defined by the 99.5% quantile (α = 0.5%) for insurance companies and the 99.9% quantile (α = 0.1%) for banks.

    3.3 Risk dependencies in LDA

    Risk dependence is a major modeling issue in every risk model. In this section, we will present the implications of independence versus dependence in LDA, and analyze how copulas can address the dependence modeling between individual risks.

  • 3.3.1 Diversification and Value-at-Risk

    In the operational risk framework, diversification represents the reduction of the Economic Capital due to decorrelated risks. This concept is adapted from the investment world, where it states that risk is reduced by investing in different assets. A simple example of diversification is provided by the proverb “Don’t put all your eggs in one basket”.

    In practice, we can easily imagine that Internal Fraud is (at least partially) decorrelated from Damage to Physical Assets, but financial institutions are short of data and correlation estimates have to be considered with great caution.

    The LDA model as presented above assumes total independence between all types of risk. This hypothesis is not acceptable for the regulators or for a fair quantification of the Economic Capital.

    A simple alternative exists: it consists in summing the Value-at-Risk of all individual risks. Then the Economic Capital is:

    $$EC = \sum_{i=1}^{m} \mathrm{VaR}_\alpha(L_{r_i})$$

    This alternative formulation of the Economic Capital assumes that all types of risk are completely dependent. For example, this would mean that if a natural disaster caused a large economic loss for the institution, then internal fraud would also cost the firm a lot that year.

    Therefore, we can see that neither independence nor complete dependence is a satisfying assumption in terms of risk dependencies.

    Moreover, one important characteristic of the Value-at-Risk as a risk measure is that it is not subadditive: the VaR of a sum of partially correlated risks can be higher than the sum of the VaRs of the same individual risks.

    Finally, we can see that the correlation largely affects the Economic Capital. The LDA model must take correlation into account in order to fairly quantify the Economic Capital.
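    The gap between the two extreme assumptions can be illustrated by simulation. The sketch below uses two hypothetical lognormal(0, 1) annual losses as stand-ins for individual risks and compares the 99% VaR of their sum under independence with the value obtained under complete dependence, which is produced here by pairing the sorted samples. It illustrates the two extremes only, not the non-subadditive case mentioned above.

```python
import math
import random


def empirical_quantile(values, level):
    """Smallest observation with at least a fraction `level` of the sample at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[min(index, len(ordered) - 1)]


rng = random.Random(0)
n = 100000
level = 0.99

# Two hypothetical individual annual losses, simulated independently
loss_1 = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
loss_2 = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]

var_1 = empirical_quantile(loss_1, level)
var_2 = empirical_quantile(loss_2, level)

# Independence: sum the losses as simulated
var_independent = empirical_quantile([a + b for a, b in zip(loss_1, loss_2)], level)

# Complete dependence: the k-th smallest loss of one risk always occurs
# together with the k-th smallest loss of the other
var_dependent = empirical_quantile(
    [a + b for a, b in zip(sorted(loss_1), sorted(loss_2))], level
)
```

    Under complete dependence the VaR of the sum coincides with the sum of the individual VaRs, which is the alternative Economic Capital formula above; under independence it is markedly lower for these parameters.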

    3.3.2 Copulas for dependence modeling

    Previously, we saw that copulas represent a practical tool to model dependence in multivariate distributions with non-Gaussian marginals. They offer a robust framework and the possibility to capture different types of association between the marginal distributions.

    Naturally, we will adapt the Loss Distribution Approach in order to take into account correlated events. Several alternatives are available:

    1. Frequency correlation assumes that annual frequencies Nr are not independent variables. This could reflect a common dependence of frequencies on a third-party factor such as the economic cycle or gross income.

    2. Severity correlation assumes that unit losses Xr are not independent across different types of risk. However, it is conceptually difficult to assume severity independence within one risk and correlation across different risks.

    3. Individual loss correlation assumes dependence between individual risks Lr.

    Even if frequency correlation and individual loss correlation are different by nature and by meaning, we can see that frequency correlation implies individual loss correlation. Moreover, individual loss correlation is the cheapest way to add correlation to the aggregate loss. We will present the formulation of the LDA model with individual loss correlation.

    Let C be an m-dimensional copula and let $L_{r_i}$ also denote the distribution function of the individual loss for risk $r_i$. Then the joint distribution H of individual losses is:

    $$H(x_1, x_2, \dots, x_m) = C\big(L_{r_1}(x_1), L_{r_2}(x_2), \dots, L_{r_m}(x_m)\big)$$

    Let $(H_i)_{1 \le i \le m}$ be the components of a random vector with joint distribution H, so that each $H_i$ follows the corresponding marginal of H. Then the Economic Capital is given by:

    $$EC = \mathrm{VaR}_\alpha\Big(\sum_{i=1}^{m} H_i\Big)$$
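    A small simulation shows how the copula drives the Economic Capital between the two extreme cases. The sketch below assumes, purely for illustration, a bivariate Gaussian copula and two lognormal(0, 1) marginals standing in for the individual annual loss distributions. With these marginals the quantile transform of Φ(z) is simply exp(z), which keeps the code short. The same random draws are reused for every correlation level.

```python
import math
import random


def empirical_quantile(values, level):
    """Smallest observation with at least a fraction `level` of the sample at or below it."""
    ordered = sorted(values)
    index = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[min(index, len(ordered) - 1)]


def simulate_total_losses(rho, n, seed):
    """Sum of two lognormal(0, 1) losses joined by a Gaussian copula with correlation rho."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Lognormal(0, 1) quantile of Phi(z) is exp(z)
        totals.append(math.exp(z1) + math.exp(z2))
    return totals


level = 0.99
economic_capital = {
    rho: empirical_quantile(simulate_total_losses(rho, n=50000, seed=7), level)
    for rho in (0.0, 0.5, 0.95)
}
```

    In this example the estimated capital increases with the correlation parameter, from the independent case towards the completely dependent one.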

    3.4 Which copula to use?

    The most common copula used in financial risk applications is the Gaussian copula. It was introduced in the field for credit risk by David X. Li [8] and then spread to other financial modeling applications. This approach was later heavily criticized by academics and practitioners because it fails to capture dependence in the tails of distributions. Indeed, we saw in paragraph 2.4.1 that the Gaussian copula has a zero coefficient of tail dependence, which means that events occur independently far enough in the tails of the joint distribution.

    Theoretically, the use of another copula is highly recommended when considering rare events. We propose the use of the Gumbel copula introduced in paragraph 2.4.3 because it has a positive upper tail dependence coefficient. Other copulas have this kind of feature; the Gumbel copula is used here for illustrative purposes only.

    We propose a simple experiment that illustrates the lack of dependence in the upper tail of a Gaussian copula. Let H be a joint distribution with uniformly distributed marginals X and Y. We will compute the probability p = P(X ≥ 0.95, Y ≥ 0.95).

    If X and Y are independent: p = P(X ≥ 0.95) × P(Y ≥ 0.95) = 0.0025.

    If X and Y are perfectly dependent: p = P(X ≥ 0.95) = 0.05.

    Bivariate data are simulated from a Gumbel copula. A Gaussian copula is then fitted using the maximum likelihood method. We compare the theoretical probability p under the Gumbel copula with the probability p under the fitted Gaussian copula for different dependence levels. The figure below shows:

    1. the theoretical probability p with the Gumbel copula in red,

    2. the empirical probability p computed from the simulated data, plotted as points labeled “1”,

    3. the 95% confidence interval for the empirical probability p in green,

    4. the probability p estimated with the fitted Gaussian copula in blue,

    5. the independent and perfectly dependent cases in dark red (bordeaux).

    [Figure: probability p as a function of the Gumbel parameter. x-axis: Gumbel’s Alpha (2 to 10); y-axis: Probability (0.01 to 0.05); empirical estimates plotted as points labeled “1”.]

    We can see that the fitted Gaussian copula systematically underestimates the probability p = P(X ≥ 0.95, Y ≥ 0.95). This shows that the Gaussian copula fails to capture tail dependence.
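    The theoretical Gumbel curve of this experiment can be reproduced from the copula alone, since for uniform marginals P(X ≥ t, Y ≥ t) = 1 - 2t + C(t, t). The sketch below evaluates this expression for a few values of α; it does not reproduce the simulation or the Gaussian fit, and the function names are assumptions.

```python
import math


def gumbel_copula(u, v, alpha):
    """Bivariate Gumbel copula for u, v in (0, 1] and alpha >= 1."""
    s = (-math.log(u)) ** alpha + (-math.log(v)) ** alpha
    return math.exp(-(s ** (1.0 / alpha)))


def joint_exceedance(alpha, t=0.95):
    """P(X >= t, Y >= t) for uniform marginals joined by a Gumbel copula."""
    return 1.0 - 2.0 * t + gumbel_copula(t, t, alpha)


alphas = (1.0, 2.0, 4.0, 6.0, 8.0, 10.0)
probabilities = [joint_exceedance(alpha) for alpha in alphas]
# alpha = 1 is the independence copula, so the first value is 0.05 * 0.05
```

    As α grows, the values increase from the independent case (0.0025) towards the perfectly dependent case (0.05), which bounds the range of the vertical axis in the figure.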

    4 Beyond operational risk: the LDA model and environmental risk

    4.1 Our problem

    4.1.1 Context

    We can define environmental risk as the chance that human health, the economy or the environment will be affected as a result of the presence of environmental hazards. Various types of risk fall under this definition, such as natural disasters (earthquakes, typhoons, tsunamis, etc.), pollution due to human activity or the extinction of animal species. Quantitative methods make it possible to model or predict this kind of event.

    4.1.2 Problem

    Our particular example is the study of rainfall in order to size flood protection works. Rainfall is a determining factor in the design of civil engineering works such as dams, canals, bridges or dikes. Therefore, engineers are interested in assessing the risk that an extreme rainfall threatens their work. They need to determine extreme rainfalls with very low probabilities, such as once a century.

    We propose to adapt the LDA model used in operational risk to address this issue. We will model the monthly frequency and the severity of rainfalls in various locations that impact the construction work. Our objective is to determine the impact of an extreme rainfall in the area in terms of water height.

    4.1.3 Data

    The data set used for the experiment is a daily rainfall set over 51 years in 180 weather stations located in the French region of Cévennes.

    4.2 LDA model

    We will present the adaptation of the LDA model to our problem and provide an analysis of the fitting and results of the model.

    4.2.1 Frequency distributions

    The frequency distribution Nr models the monthly frequency of rainfalls recorded at weather station r, i.e. the number of rainy days in a month.

    The standard LDA model recommends modeling frequency with a Poisson distribution (the Poisson “law of small numbers”). The single parameter λ of the Poisson distribution is estimated using the maximum likelihood estimator, i.e. the mean of the empirical distribution. As the number of rainy days in a month is bounded above by the length of the month (about 30 days), we can also consider a binomial distribution.

    Nevertheless, these two distributions are not flexible enough to correctly fit our data. The negative binomial distribution is a better choice as it allows more flexibility with its two parameters.
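    The lack of flexibility of the Poisson model can be diagnosed by comparing the sample mean and variance, which a Poisson distribution forces to be equal. The sketch below uses made-up monthly counts, not the Cévennes data. For the negative binomial it uses a method-of-moments fit as a simple stand-in for whatever fitting procedure was actually used, with the parametrization mean = r(1 - p)/p and variance = r(1 - p)/p².

```python
import statistics

# Hypothetical numbers of rainy days in twelve months (illustrative data only)
rainy_days = [3, 7, 12, 5, 9, 15, 2, 8, 11, 4, 6, 14]

mean = float(statistics.mean(rainy_days))
variance = float(statistics.variance(rainy_days))  # unbiased sample variance

# Poisson: the maximum likelihood estimate of lambda is the sample mean
poisson_lambda = mean

# A variance well above the mean (overdispersion) contradicts the Poisson model
is_overdispersed = variance > mean

# Negative binomial by the method of moments (requires variance > mean)
nb_p = mean / variance
nb_r = mean * mean / (variance - mean)

# Sanity check: the fitted negative binomial reproduces the sample moments
nb_mean = nb_r * (1.0 - nb_p) / nb_p
nb_variance = nb_r * (1.0 - nb_p) / (nb_p * nb_p)
```

    The second parameter is what lets the negative binomial match both moments at once, whereas the Poisson fit can only match the mean.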

    The figure below represents the empirical frequency distribution and the different modelsfor a given weather station. The fitted Poisson distribution is plotted in blue and thenegative binomial in red. These distributions are discrete but represented as continuousfor illustrative purpose.

    [Figure: Histogram of Frequency distribution. x-axis: Rainy days (0 to 20); y-axis: Density (0.00 to 0.15).]

    Finally, we chose the negative binomial distribution to model frequency distributions. In order to validate this choice, a chi-square test can be carried out.

    4.2.2 Severity distributions

    The severity distribution Xr models the rainfall amounts recorded at weather station r, in millimeters.

    The LDA model suggests the use of the lognormal distribution. The maximum likelihood method is used to fit the data.
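For the lognormal distribution the maximum likelihood estimates have a closed form: they are the mean and the (1/n) standard deviation of the log-values. A minimal sketch on synthetic data follows; the "true" parameters and the sample are hypothetical, and the sketch assumes strictly positive rainfall amounts (dry days excluded).

```python
import math
import random

def lognormal_mle(values):
    """Closed-form MLE (mu, sigma) of a lognormal from positive observations."""
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma

rng = random.Random(0)
# Hypothetical parameters, used only to generate synthetic rainfall amounts (mm).
true_mu, true_sigma = 1.5, 1.0
rain_mm = [rng.lognormvariate(true_mu, true_sigma) for _ in range(5000)]

mu_hat, sigma_hat = lognormal_mle(rain_mm)
print(f"mu_hat = {mu_hat:.3f}, sigma_hat = {sigma_hat:.3f}")
```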

    The figure below represents the empirical severity distribution and the lognormal model for the same weather station. Rainfalls are represented horizontally in millimeters.

    [Figure: Histogram of Severity distribution, with the fitted LogNormal density. x-axis: Rainfall (0 to 150); y-axis: Density (0.00 to 0.25).]

    This figure represents a probability plot for the lognormal model:

    [Figure: QQPlot for the lognormal model. x-axis: Empirical distribution (0.0 to 1.0); y-axis: Model (0.0 to 1.0).]

    We can see that the lognormal model fits the data well. In order to confirm this graphical analysis, the Anderson-Darling test can be carried out.
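As an illustration of the Anderson-Darling statistic, the sketch below computes A² for a sample against a lognormal fitted by maximum likelihood. The data are synthetic and hypothetical. No critical values are given, because they depend on the fact that the parameters are estimated from the same sample; the sketch only shows that a well-specified model yields a much smaller statistic than a misspecified one.

```python
import math
import random

def lognormal_mle(values):
    logs = [math.log(v) for v in values]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma

def lognormal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def anderson_darling(values, cdf):
    """A^2 statistic of a sample against a continuous CDF."""
    xs = sorted(values)
    n = len(xs)
    eps = 1e-12  # keeps the logarithms finite at the extremes
    total = 0.0
    for i, x in enumerate(xs, start=1):
        f_low = min(max(cdf(x), eps), 1.0 - eps)
        # xs[n - i] is the order statistic x_(n+1-i) in 1-based notation
        f_high = min(max(cdf(xs[n - i]), eps), 1.0 - eps)
        total += (2 * i - 1) * (math.log(f_low) + math.log(1.0 - f_high))
    return -n - total / n

def a2_for_lognormal_fit(values):
    mu, sigma = lognormal_mle(values)
    return anderson_darling(values, lambda x: lognormal_cdf(x, mu, sigma))

rng = random.Random(1)
# Hypothetical samples: one truly lognormal, one uniform (a poor lognormal fit).
lognormal_sample = [rng.lognormvariate(1.5, 1.0) for _ in range(500)]
uniform_sample = [rng.uniform(0.1, 50.0) for _ in range(500)]

a2_good = a2_for_lognormal_fit(lognormal_sample)
a2_bad = a2_for_lognormal_fit(uniform_sample)
print(f"A2 (lognormal data) = {a2_good:.2f}, A2 (uniform data) = {a2_bad:.2f}")
```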

    4.2.3 Monthly rainfall distributions

    The monthly rainfall distribution Lr combines the frequency and severity distributions for weather station r. It represents the total amount of rainfall in a month at a given location. It is the equivalent of the individual risk distribution in the operational risk application.

    The mathematical formulation of this distribution is:

    \[ L_r = \sum_{i=1}^{N_r} X_{r,i} \]

    where Nr is the number of rainy days during the month (negative binomial distribution) and the Xr,i are the daily rainfall amounts, which follow the severity distribution Xr (lognormal distribution).
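Although the distribution function of Lr has no closed form, its first two moments do, which gives a quick sanity check for any simulation. The identities below assume that the daily amounts are independent, identically distributed and independent of Nr (the standard LDA assumptions); the negative binomial parameterization (k, p) and the lognormal parameters (μ, σ) are notational assumptions of this sketch.

```latex
\[
\mathbb{E}[L_r] = \mathbb{E}[N_r]\,\mathbb{E}[X_r],
\qquad
\operatorname{Var}(L_r) = \mathbb{E}[N_r]\,\operatorname{Var}(X_r)
                        + \operatorname{Var}(N_r)\,\mathbb{E}[X_r]^2 ,
\]
\[
\mathbb{E}[N_r] = \frac{k(1-p)}{p}, \quad
\operatorname{Var}(N_r) = \frac{k(1-p)}{p^2}, \quad
\mathbb{E}[X_r] = e^{\mu + \sigma^2/2}, \quad
\operatorname{Var}(X_r) = \left(e^{\sigma^2} - 1\right) e^{2\mu + \sigma^2} .
\]
```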

    In practice, this distribution is only available through Monte Carlo simulations because there is no closed form for the distribution function. As a result, the variability of the estimates increases rapidly in the tail of the distribution.

    The figure below represents a boxplot of 100 Monte Carlo estimations for the quantiles between 75% and 99% (in 1% steps) versus the empirical quantiles extracted from the data (in red). Box number 1 represents the 75% quantile estimations and box number 25 represents the 99% quantile estimations. The sample size is 1000 for each Monte Carlo simulation. The data set size is 620 samples.

    [Figure: Quantile plot. Boxplots of the Monte Carlo quantile estimations; x-axis: box number (1 to 25); y-axis: Quantile (100 to 500).]

    We can see that the model for monthly rainfall distributions fits the empirical distribution well up to the 97% quantile. Beyond the 97% quantile, we note that the model overestimates the empirical quantiles.

    The previous figure illustrates the high variability of Monte Carlo estimations for tail quantiles in the monthly rainfall distribution.
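The growth of the Monte Carlo variability in the tail can be reproduced with a short simulation. The sketch below uses hypothetical parameters (not those fitted to the Cévennes data), draws the negative binomial through its gamma-Poisson representation, and repeats 100 simulations of 1000 monthly totals, as in the experiment described above.

```python
import math
import random
import statistics

def sample_negbin(rng, size, prob):
    """Negative binomial draw via its gamma-Poisson mixture representation."""
    lam = rng.gammavariate(size, (1.0 - prob) / prob)
    # Knuth's multiplication method for a Poisson(lam) draw (adequate for small lam).
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_monthly_total(rng, size, prob, mu, sigma):
    """One draw of L = X_1 + ... + X_N with N negative binomial, X_i lognormal."""
    n_days = sample_negbin(rng, size, prob)
    return sum(rng.lognormvariate(mu, sigma) for _ in range(n_days))

def empirical_quantile(sorted_values, level):
    """Order-statistic quantile without interpolation."""
    index = min(len(sorted_values) - 1, int(level * len(sorted_values)))
    return sorted_values[index]

def quantile_spread(rng, levels, n_runs, n_samples, params):
    """Mean and standard deviation of the quantile estimates over n_runs simulations."""
    estimates = {level: [] for level in levels}
    for _ in range(n_runs):
        totals = sorted(sample_monthly_total(rng, *params) for _ in range(n_samples))
        for level in levels:
            estimates[level].append(empirical_quantile(totals, level))
    return {level: (statistics.mean(v), statistics.stdev(v))
            for level, v in estimates.items()}

rng = random.Random(42)
params = (7.0, 0.5, 1.5, 1.0)  # hypothetical (size, prob, mu, sigma)
spread = quantile_spread(rng, (0.75, 0.90, 0.99), n_runs=100, n_samples=1000, params=params)
for level, (mean_q, sd_q) in spread.items():
    print(f"{level:.2f} quantile: mean {mean_q:.1f} mm, std {sd_q:.1f} mm")
```

The standard deviation across runs plays the role of the box heights in the boxplot: it is expected to be several times larger at the 99% level than at the 75% level.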

    4.2.4 Copulas and aggregated impact distributions

    The aggregated impact distribution represents the sum of monthly rainfall at the weather stations impacting the civil engineering work. As the weather stations are located in the same geographical area, we can expect the monthly rainfalls to be dependent, but not necessarily perfectly dependent. Therefore, we will model this dependence using copulas as suggested in paragraph 3.4.

    In our experiment, we consider two weather stations r1 and r2, with monthly rainfall distributions Lr1 and Lr2. The aggregated impact distribution is then defined by:

    \[ G = \sum_{i=1}^{2} L_{r_i} = \sum_{i=1}^{2} \sum_{j=1}^{N_{r_i}} X_{r_i,j} \]

    where Lr1 and Lr2 are linked by a copula C.
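To make the role of the copula C concrete, the sketch below draws dependent pairs (u1, u2) from two candidate copulas, a Gaussian one and a Gumbel one. The parameters are hypothetical and chosen so that both copulas have the same Kendall's tau of 0.5, using the known relations tau = (2/π) arcsin(ρ) and tau = 1 - 1/θ. The Gumbel sampler relies on the Marshall-Olkin frailty construction with Kanter's representation of a positive stable variable; this is one possible algorithm, not necessarily the one used in the experiment. Each pair can then be mapped to monthly rainfalls through the (empirical) quantile functions of Lr1 and Lr2 and summed to obtain a draw of G.

```python
import math
import random

def sample_gaussian_copula(rng, rho):
    """One pair (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    std_normal_cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return std_normal_cdf(z1), std_normal_cdf(z2)

def sample_positive_stable(rng, alpha):
    """Positive stable draw (0 < alpha < 1) with Laplace transform exp(-t**alpha)."""
    v = rng.uniform(0.0, math.pi)
    w = rng.expovariate(1.0)
    return (math.sin(alpha * v) / math.sin(v) ** (1.0 / alpha)
            * (math.sin((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))

def sample_gumbel_copula(rng, theta):
    """One pair (u1, u2) from a Gumbel copula with parameter theta >= 1."""
    if theta == 1.0:
        return rng.random(), rng.random()  # independence copula
    alpha = 1.0 / theta
    frailty = sample_positive_stable(rng, alpha)
    return tuple(math.exp(-(rng.expovariate(1.0) / frailty) ** alpha) for _ in range(2))

def kendall_tau(pairs):
    """Kendall's tau by direct pair counting (quadratic cost, small samples only)."""
    n = len(pairs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            score += (s > 0) - (s < 0)
    return 2.0 * score / (n * (n - 1))

rng = random.Random(7)
theta = 2.0                    # hypothetical Gumbel parameter, tau = 0.5
rho = math.sin(math.pi / 4.0)  # hypothetical Gaussian correlation, tau = 0.5
gumbel_pairs = [sample_gumbel_copula(rng, theta) for _ in range(800)]
gauss_pairs = [sample_gaussian_copula(rng, rho) for _ in range(800)]
tau_gumbel = kendall_tau(gumbel_pairs)
tau_gauss = kendall_tau(gauss_pairs)
print(f"Kendall's tau: Gumbel {tau_gumbel:.3f}, Gaussian {tau_gauss:.3f}")
```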

    We will compare the results for a Gaussian and a Gumbel copula. The copulas have been fitted to the data using the maximum likelihood method.

    First, the two figures below show QQ-plots for the Normal and Gumbel models. Each figure represents one Monte Carlo simulation with 1000 samples. Empirical quantiles are represented horizontally and the model's quantiles are represented vertically. The unit is the millimeter.

    [Figure: QQ Plot Gumbel Model. x-axis: Empirical distribution (0 to 500); y-axis: Model (0 to 500).]

    [Figure: QQ Plot Normal Model. x-axis: Empirical distribution (0 to 500); y-axis: Model (0 to 500).]

    We note that both models overestimate the tail of the empirical distribution for quantiles beyond 90%.

    The following figures represent the quantiles from 75% to 99% (in 1% steps) versus the empirical quantiles for both models. Box plots are based on 100 Monte Carlo estimations with 1000 samples.

    [Figure: Quantile for Gumbel Model. Boxplots; x-axis: box number (1 to 25); y-axis: quantile (200 to 700).]

    [Figure: Quantile for Normal Model. Boxplots; x-axis: box number (1 to 25); y-axis: quantile (200 to 700).]

    Once again we can see that both models overestimate the tail of the distribution.

    4.3 Analysis

    In this section, we saw that the LDA model is a natural approach to many problems related to event risk. The decomposition into frequency and severity of events is very intuitive.

    In the application to rainfall data, we noticed that the frequency distributions could be modeled by other discrete distributions. The negative binomial distribution constitutes a good alternative to the Poisson distribution. The negative binomial distribution can model “overdispersed” Poisson distributions thanks to its two parameters, as seen in the figure in section 4.2.1. The adoption of this distribution in the present example improves the results of the model.

    The lognormal model fits the severity distributions well, but we could have considered other distributions, such as Weibull or (truncated) Pareto distributions, for other applications.

    The aggregated impact distributions resulting from Monte Carlo simulations fit the empirical distributions well up to the far tail quantiles. But we saw that Monte Carlo simulations cause substantial variability in the estimation of tail quantiles beyond 90%. This constitutes a major drawback of the LDA model, as we cannot derive any closed-form formula for the aggregated impact distribution, which is a mixture of the frequency and severity distributions. We also noticed that the model generated higher tail quantiles than those observed in the data.

    Finally, we were not able to compare the model with a Gaussian copula and the one with a Gumbel copula because of the cumulative effect of highly variable quantile estimations in the tail and overestimated tail quantiles. Both models generated overestimated tail quantiles compared to the data. Therefore, in the present example, we are not able to confirm our theoretical intuitions about copulas in the LDA model. Indeed, the choice of a copula highly depends on the data modeled.

    At least, the Monte Carlo computations show experimentally that the Gumbel copula model presents more dependence in the tail than the Normal copula model. The figure below illustrates this phenomenon with a QQ-plot based on one Monte Carlo simulation for each model in the previous example:

    [Figure: QQ Plot Gumbel vs Normal model. x-axis: Normal model quantiles (0 to 1500); y-axis: Gumbel model quantiles (0 to 1500).]
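This experimental observation is consistent with the upper tail dependence coefficients of the two copulas. With θ ≥ 1 denoting the Gumbel parameter and ρ the correlation of the Gaussian copula (notation assumed here, which may differ from the one used earlier in the report), the standard results are:

```latex
\[
\lambda_U = \lim_{u \to 1^-} \mathbb{P}\left(U_2 > u \mid U_1 > u\right)
          = \lim_{u \to 1^-} \frac{1 - 2u + C(u,u)}{1 - u},
\]
\[
\lambda_U^{\text{Gumbel}} = 2 - 2^{1/\theta},
\qquad
\lambda_U^{\text{Gaussian}} = 0 \quad \text{for } |\rho| < 1 .
\]
```

For example, θ = 2 gives an upper tail dependence of 2 - √2 ≈ 0.59, whereas the Gaussian copula is asymptotically independent in the tail for any correlation strictly below 1.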

    Finally, the application of the LDA model to environmental risk needs to be refined in order to produce better results. We suggest the use of goodness-of-fit tests, such as those mentioned earlier, for the selection of frequency and severity distributions. Other distributions could also be considered to model frequency and severity.

    The dependence between frequency and severity distributions needs to be investigated because the LDA model assumes that those variables are independent.

    Once a more accurate model is designed, it could be compared to other approaches to this problem, such as the extreme value approach developed by Laurent Gardes and Stéphane Girard in [9].

    5 Conclusion

    In this paper, we reviewed state-of-the-art elements from copula theory and explored an application of copulas to risk management: the LDA model. We presented the LDA model as originally formulated in the framework of operational risk and financial regulations (Basel II Economic Capital). We showed how important the dependence between risks is for a fair quantification of the Economic Capital and introduced the use of copulas to model this dependence. We formulated theoretical considerations in terms of dependence modeling (particularly tail dependence) in order to suggest the right copulas in the case of operational risk. The normal copula may best be avoided in this field.

    Then we studied an application of the Loss Distribution Approach beyond the field of operational risk: extreme rainfalls and environmental risk. We highlighted the advantages of the LDA model:

    1. Natural approach to model event risk,

    2. Flexibility for frequency and severity distributions modeling,

    3. Easy computations using Monte Carlo simulations.

    But we also noted the following drawbacks:

    1. Highly variable quantile estimations in the tail of the aggregated impact distribution,

    2. No closed-form formula for the aggregated impact distribution,

    3. Lack of data makes the fitting and result analysis hard.

    Therefore, we can see that, despite its insightful approach to risk modeling, the use of the LDA model and copulas remains very speculative and theoretical in the field of operational risk, given the lack of historical data.

    References

    [1] Eric Bouyé, Valdo Durrleman, Ashkan Nikeghbali, Gaël Riboulet & Thierry Roncalli, Copulas: an open field for risk management, 2001. http://www.thierry-roncalli.com/download/copula-rm.pdf

    [2] Roger B. Nelsen, An Introduction to Copulas, Springer Series in Statistics, Second edition, 2006.

    [3] Antoine Frachot, Thierry Roncalli & Eric Salomon, The Correlation Problem in Operational Risk, GRO Technical report, 2004. http://www.thierry-roncalli.com/download/lda-correlations.pdf

    [4] Pavel V. Shevchenko, Implementing Loss Distribution Approach for Operational Risk, CSIRO Mathematical and Information Sciences, Sydney, 2009. http://arxiv.org/abs/0904.1805

    [5] Basel Committee on Banking Supervision, Results from the 2008 Loss Data Collection Exercise for Operational Risk, 2009.

    [6] Catherine Donnelly & Paul Embrechts, The devil is in the tails: actuarial mathematics and the subprime mortgage crisis, RiskLab, ETH Zürich, 2010. http://www.math.ethz.ch/~baltes/ftp/CD_PE_devil_Jan10.pdf

    [7] Valdo Durrleman, Ashkan Nikeghbali & Thierry Roncalli, Which copula is the right one?, GRO Technical report, 2000. http://www.thierry-roncalli.com/download/copula-choice.pdf

    [8] David X. Li, On Default Correlation: A Copula Function Approach, Journal of Fixed Income, Vol. 9, No. 4, pp. 43-54, 2000.

    [9] Laurent Gardes & Stéphane Girard, Conditional extremes from heavy-tailed distributions: An application to the estimation of extreme rainfall return levels, Extremes, 13(2), 177-204, 2010.
