modelling and forecasting expected ... - john w....

24
Modelling and forecasting expected shortfall with the generalized asymmetric Student-t and asymmetric exponential power distributions Dongming Zhu School of Economics, Shanghai University of Finance and Economics John W. Galbraith* Department of Economics, McGill University Abstract Financial returns typically display heavy tails and some degree of skewness, and conditional variance models with these features often outperform more limited models. The difference in performance may be especially important in estimating quantities that depend on tail features, including risk measures such as the expected shortfall. Here, using recent generalizations of the asymmetric Student-t and exponential power distributions to allow separate parameters to control skewness and the thickness of each tail, we fit daily financial return volatility and forecast expected shortfall for the S&P 500 index and a number of individual company stocks; the generalized distributions are used for the standardized innovations in a nonlinear, asymmetric GARCH-type model. The results provide evidence for the usefulness of the general distributions in improving fit and prediction of downside market risk of financial assets. Constrained versions, corresponding with distributions used in the previous literature, are also estimated in order to provide a comparison of the performance of different conditional distributions. JEL classification codes: G10, C16 Key words: asymmetric distribution, expected shortfall, NGARCH model. * Corresponding author. Tel. +1 514 398 8964; [email protected]; 855 Sher- brooke St. West, Montreal, QC H3A 2T7, Canada. The first author gratefully ac- knowledges partial research support from the Leading Academic Discipline Program, 211 Project for Shanghai University of Finance and Economics (the 3rd phase). The support of the Social Sciences and Humanities Research Council of Canada (SSHRC) and the Fonds qu´ ebecois de la recherche sur la soci´ et´ e et la culture (FQRSC) is also gratefully acknowledged. We thank seminar participants at HEC Montr´ eal and the Computational and Financial Econometrics 2010 conference (CFE ’10) for valuable comments. 1

Upload: others

Post on 07-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

  • Modelling and forecasting expected shortfall withthe generalized asymmetric Student-t and

    asymmetric exponential power distributions

    Dongming ZhuSchool of Economics,

    Shanghai University of Finance and Economics

    John W. Galbraith*Department of Economics, McGill University

    Abstract

    Financial returns typically display heavy tails and some degree of skewness, andconditional variance models with these features often outperform more limited models.The difference in performance may be especially important in estimating quantitiesthat depend on tail features, including risk measures such as the expected shortfall.Here, using recent generalizations of the asymmetric Student-t and exponential powerdistributions to allow separate parameters to control skewness and the thickness of eachtail, we fit daily financial return volatility and forecast expected shortfall for the S&P500 index and a number of individual company stocks; the generalized distributions areused for the standardized innovations in a nonlinear, asymmetric GARCH-type model.The results provide evidence for the usefulness of the general distributions in improvingfit and prediction of downside market risk of financial assets. Constrained versions,corresponding with distributions used in the previous literature, are also estimated inorder to provide a comparison of the performance of different conditional distributions.

    JEL classification codes: G10, C16

    Key words: asymmetric distribution, expected shortfall, NGARCH model.

    * Corresponding author. Tel. +1 514 398 8964; [email protected]; 855 Sher-brooke St. West, Montreal, QC H3A 2T7, Canada. The first author gratefully ac-knowledges partial research support from the Leading Academic Discipline Program,211 Project for Shanghai University of Finance and Economics (the 3rd phase). Thesupport of the Social Sciences and Humanities Research Council of Canada (SSHRC)and the Fonds québecois de la recherche sur la société et la culture (FQRSC) is alsogratefully acknowledged. We thank seminar participants at HEC Montréal and theComputational and Financial Econometrics 2010 conference (CFE ’10) for valuablecomments.

    1

  • 1 Introduction

    Substantial progress has been made in the modelling and forecasting of financial assetreturns since the ARCH and GARCH models were first developed in Engle (1982) andBollerslev (1986). A key feature of this progress has been the incorporation of asymme-tries and thicker tails into conditional variance models, both through the structures ofthe models themselves (see for example Engle and Ng 1993, which will be used below)and through the use of thick-tailed distributions for the standardized innovations ofthe models. Bollerslev (1987) pioneered the use of the t− distribution for this purpose,yielding thicker-tailed predictive densities for asset returns than were available throughthe GARCH specification combined with Gaussian standardized innovations.1

    Subsequent literature has explored the use of conditional distributions that areskewed as well as heavy-tailed. Hansen (1994) used a skewed t− for this purpose,and other skew extensions of the t− distribution have been proposed by Fernandezand Steel (1998), Theodossiou (1998), Branco and Dey (2001), Bauwens and Laurent(2002), Jones and Faddy (2003), Sahu et al. (2003), Azzalini and Capitanio (2003), Aasand Haff (2006) and others. An alternative to the skewed t− is the skewed exponentialpower distribution; see for example Fernandez et al. (1995), Ayebo and Kozubowski(2004) and Komunjer (2007). The availability of relatively long financial data setshas allowed the estimation and exploitation of these asymmetrical and heavy-tailedmodels, and empirical investigation has found that such extensions can be useful incharacterizing the pattern of financial returns; see for example Mittnik and Paolella(2003) and Alberg et al. (2008).

    These distributions constrain the modelling of asymmetry and tails by using two pa-rameters which together control skewness and thickness of the left and right tails. Zhuand Zinde-Walsh (2009) and Zhu and Galbraith (2010) propose further generalizationswhich allow a separate parameter for each of these quantities, so that two parameterscontrol thickness of the two tails and a third allows skewness to change independentlyof the tail parameters; the former extends the skewed exponential power distributionin this way, and the latter extends the skewed Student-t. Another approach that allowsmore general characterizations is to use mixtures; Rombouts and Bouaddi (2009) suc-cessfully use mixtures of exponential power distributions to fit tails. Because there isthe potential that tail thickness differs non-negligibly between the left and right tails(for example, because market declines may involve more extreme price movements thanmarket advances), these extensions can allow better fitting of the rates of tail decay andtherefore can facilitate better estimation and forecasting of downside risk, for which theleft tail alone is relevant. With increasingly large data sets describing financial assetreturns, these less-constrained forms offer the potential for improvement in the subtletyof our fit to data, and therefore in the accuracy of forecasts of quantities related torisk.

    The present paper explores the use of these distributions for modelling the con-

    1Empirical evidence has clearly indicated that daily (for example) financial return data continue toexhibit conditional tail-fatness even after allowing for the GARCH effect; see for example Bollerslev etal. (1992).

    2

  • ditional distribution of asset returns, and in particular for forecasting downside riskthrough the expected shortfall, a measure which is sensitive to losses in the extreme tailof the distribution of returns. The asymmetric exponential power distribution (AEPD)proposed by Zhu and Zinde-Walsh (2009), and the generalized asymmetric Student-t (AST) distribution proposed by Zhu and Galbraith (2010), are used to model thestandardized innovations in the nonlinear asymmetric NGARCH model of Engle andNg (1993). We find evidence of the potential to improve both fits to financial returndata and forecasts of the expected shortfall using these distributions. As well, becausethese distributions generalize a number of those available in the empirical literature onfinancial returns, and because we treat both the skewed t− and the skewed exponentialpower distribution, our results provide an overview and comparison of the risk fore-casting performance of these conditional distribution classes and of a number of specialcases which have previously appeared.

    The next section of the paper reviews the two distributions, in density functionforms that are convenient for estimation of conditional variance models, and summa-rizes results on expected shortfall for these distributions. Section 3 describes the modelsand the data that will be used, the performance of the models in fitting volatility offinancial returns according to a number of measures, and reports statistical inferenceon the adequacy of the distribution models. Section 4 describes methods and gives em-pirical results for the problem of predicting downside risk, in the form of the expectedshortfall. A final section concludes.

    2 Asset return models and generalized distributions

    We begin by characterizing the models of financial returns and the distributions to beused for modelling the standardized innovations, in a common notation; we summarizesome results in Zhu and Zinde-Walsh (2009) and Zhu and Galbraith (2010).

    2.1 Models for return processes

    The return process r = {rt} is modelled as rt = mt+εt with εt = σtzt, where mt and σ2tare the conditional mean and variance of rt given the information set available at timet− 1 (i.e., mt = E(rt|Ωt−1) and σ2t = E((rt −mt)2|Ωt−1)), the zt are i.i.d. innovationswith zero mean and unit variance and Ωt−1 is the information set available at t−1. Thismodel allows different features of the return process only through different specificationsof mt, σ

    2t and through the distribution of zt. The combination of an ARMA specification

    for the conditional mean and a GARCH specification for the conditional variance isnow common in empirical finance; here we simply assume that the mean is mt = m,for any t. As in Stentoft (2008) and Zhu and Zinde-Walsh (2009), here σ2t is specifiedto be subject to the non-linear asymmetric GARCH (NGARCH) structure of Engleand Ng (1993), which allows for leverage; Stentoft (2008), in the context of optionpricing models, compares the NGARCH with the restriction to GARCH and finds thatthe more general NGARCH structure tends to produce both better model fits andreductions in bias and RMSE in option pricing models.

    3

  • The focus of our attention is the specification of the distribution of zt, by whichthe conditional distribution of the return process is determined, so that the condi-tional expected shortfall can be computed and predicted. Here zt is assumed to be a

    standardized AST or AEPD r.v., that is, ztd= [X − ω(β)])/δ(β), where X has one of

    the standard AST or AEPD distributions (with a parameter vector β, ω (β) = E(X)and δ (β) =

    √V ar (X)) described in the next sub-sections. The return series {rt} is

    therefore an AST-NGARCH(1,1) or AEPD-NGARCH(1,1) process,

    rt = m+ σtzt, zt ∼ i.i.d.AST (0, 1), or ∼ i.i.d.AEPD(0, 1), (1)σ2t = b0 + b1σ

    2t−1 + b2σ

    2t−1(zt−1 − c)2

    = b0 + b1σ2t−1 + b2(rt−1 −m− cσt−1)2. (2)

    The leverage effect is represented by the parameter c in (2); c > 0 implies a negativecorrelation between the innovations in the asset return and conditional volatility ofthe return. The use of AST or AEPD distributions allows us to capture better thesignificant asymmetry present in the conditional distribution of asset returns causedby the asymmetric effect of (good and bad) shocks.

    We now introduce the distributions to be used for the standardized innovations zt,in density function form.

    2.2 The AEPD density

    The standard AEPD density has the form:

    fAEP (y;β) =

    (αα∗ )KEP (p1) exp

    (− 1p1

    ∣∣ y2α∗

    ∣∣p1) , if y ≤ 0;( 1−α1−α∗ )KEP (p2) exp

    (− 1p2

    ∣∣∣ y2(1−α∗) ∣∣∣p2) , if y > 0, (3)where β = (α, p1, p2) is the parameter vector, α ∈ (0, 1) is the skewness parameter,p1 > 0 and p2 > 0 are the left and right tail parameters respectively, KEP (p) ≡1/[2p1/pΓ(1 + 1/p)] is the normalizing constant of the GED distribution, and α∗ isdefined as

    α∗ = αKEP (p1)/[αKEP (p1) + (1− α)KEP (p2)]. (4)

    Note that

    α

    α∗KEP (p1) =

    1− α1− α∗

    KEP (p2) = αKEP (p1) + (1− α)KEP (p2) ≡ BAEP . (5)

    The AEPD density function is continuous at every point and unimodal with mode aty = 0, but is not differentiable at y = 0. The parameter α∗ provides scale adjustmentsto the left and right parts of the density to ensure continuity under changes to theshape parameters (α, p1, p2). The special case in which p1 = p2 is equivalent to theskewed exponential power distributions given by Komunjer (2007), Theodossiou (1998)and Fernandez et al. (1995); for a detailed discussion see Zhu and Zinde-Walsh (2009).The general form of the AEPD density is expressed as 1σfAEP (

    y−µσ ;β), where µ and σ

    the location (mode) and scale parameters, respectively.

    4

  • Since the AEPD density is in general asymmetric, its mode (y = 0) is generallydifferent from its mean, denoted by ω(β),

    ω(β) ≡ E [YAEP ] =1

    BAEP

    [(1− α)2 p2Γ(2/p2)

    Γ2(1/p2)− α2 p1Γ(2/p1)

    Γ2(1/p1)

    ]; (6)

    its standard deviation δ(β) ≡√V ar [YAEP ] is given by

    [δ(β)]2 =1

    B2AEP

    [(1− α)3 p

    22Γ(3/p2)

    Γ3(1/p2)+ α3

    p21Γ(3/p1)

    Γ3(1/p1)

    ]− [ω(β)]2 . (7)

    When the i.i.d. innovations zt in the model (1) are assumed to be standardized AEPDrandom variables, their density is given by

    f (z;β) = δ(β)fAEP (ω(β) + δ(β)z;β) .

    Zhu and Zinde-Walsh (2009, Fig. 1) provide graphical illustrations of the AEPDdensity for various parameter values, and Komunjer (2007) for the case in which p1 =p2.

    2.3 The AST density

    The AST distribution in its standard probability density function form is defined by

    fAST (y;β) =

    αα∗K(υ1)

    [1 + 1υ1

    ( y2α∗

    )2]−(υ1+1)/2, if y ≤ 0;

    1−α1−α∗K(υ2)

    [1 + 1υ2

    (y

    2(1−α∗)

    )2]−(υ2+1)/2, if y > 0,

    (8)

    where2 β = (α, υ1, υ2), α ∈ (0, 1), υ1 > 0 and υ2 > 0, K(υ) ≡ Γ((υ+1)/2)/[√πυΓ(υ/2)],

    and α∗ is defined as α∗ = αK(υ1)/[αK(υ1) + (1− α)K(υ2)]; as well,

    α

    α∗K(υ1) =

    1− α1− α∗

    K(υ2) = αK(υ1) + (1− α)K(υ2) ≡ BAST . (9)

    These parameters have interpretations similar to those in the definition of the AEPDdensity (3). The mean and variance of the standard AST distribution are given asfollows:

    E [YAST ] = 4BAST

    [−α∗2 υ1

    υ1 − 1+ (1− α∗)2 υ2

    υ2 − 1

    ]≡ ω(β), (10)

    V ar [YAST ] = 4

    [αα∗2υ1υ1 − 2

    +(1− α)(1− α∗)2υ2

    υ2 − 2

    ]− [ω(β)]2 ≡ [δ(β)]2 . (11)

    The density of a standardized AST random variable [YAST − ω(β)])/δ(β) has the formf (z;β) = δ(β)fAST (ω(β) + δ(β)z;β), which is one of the alternatives used for thedistribution of zt in (1).

    2Γ(·) is the gamma function.

    5

  • We can write a general form of the AST density, with location and scale parametersµ and σ, as 1σfAST (

    y−µσ ;β). Again, a re-scaled version is useful for computation:

    f∗AST (y; θ) =

    [1 + 1υ1

    (y−µ

    2ασK(υ1)

    )2]−(υ1+1)/2, if y ≤ µ;

    [1 + 1υ2

    (y−µ

    2(1−α)σK(υ2)

    )2]−(υ2+1)/2, if y > µ,

    (12)

    where θ = (α, υ1, υ2, µ, σ)T .

    Figure 1 illustrates AST densities using (12) with various values for α, υ1, and υ2.In each case the location parameter µ is 0 and the scale parameter σ is 1. In left-hand-side panels, υ1 is held constant at 2 while υ2 varies; in right-hand-side panels, υ2 = 2while υ1 varies. The central panels illustrate the α = 0.5 case: the densities are ofcourse symmetric in cases for which α = 0.5, υ1 = υ2, which are among those depicted.For the restriction to υ1 = υ2 but with α unrestricted, the distribution is equivalent tothe skewed Student-t distributions of Hansen (1994) and Fernandez and Steel (1998)(which are equivalent after re-parameterization; see Zhu and Galbraith 2010, p.298).

    2.4 Downside risk measurement in the AEPD and AST

    The Expected Shortfall (ES) is a measure of risk of loss that has gained increasingprominence in recent financial literature, particularly because (unlike the Value atRisk) it is a coherent risk measure, and takes into account extreme negative returns.3

    However, since the expected shortfall is defined as a conditional mean as in (13), sothat this risk measure is very sensitive to extreme negative returns, it is particularlyimportant to model the left tail of the distribution of returns accurately.

    We will define the expected shortfall for random variable Y at a point in the supportof the distribution q as

    ES(q) ≡ E(Y | Y < q); (13)

    note however that the expression is sometimes defined with a negative sign, in whichcase the expected shortfall becomes a positive number for negative returns. Note alsothat while we are defining the ES at a point q in the return distribution rather than ata probability p = F (q), the point q is expressed as a percentage return in the empiricalstudy below; for example we consider quantities such as ES(-1%), i.e. the expectedreturn conditional on the return being less than -1%. The expression for ES in terms ofa confidence level p, denoted by ES∗(p), is given by substituting q = V aR(p) = F−1 (p)into the expression for ES(q).

    The expected shortfall is computed analytically for the AEPD and AST distribu-tions in Zhu and Zinde-Walsh (2009) and Zhu and Galbraith (2010) respectively. For

    3A risk measure is called coherent if it satisfies a set of reasonable properties proposed by Artzneret al. (1997, 1999): translation invariance, subadditivity, positive homogeneity and monotonicity (seealso Bradley and Taqqu 2003).

    6

  • the AEPD in standard form (3), ES can be expressed as

    ESAEP (q;β) =2

    FAEP (q)

    {−αα∗E (|YEP (p1)|)

    [1−G

    (h1 (q) ;

    2

    p1

    )]+(1− α)(1− α∗)E (|YEP (p2)|)G

    (h2 (q) ;

    2

    p2

    )}, (14)

    where β = (α, p1, p2), FAEP (q) is the AEPD cdf, YEP (pj) is a r.v. having the standardGED distribution with parameter pj , G(·; 2/pj) is the gamma cdf with parameter 2/pj(j = 1, 2), the expressions for FAEP (q) and E (|YEP (p)|) can be found in Zhu andZinde-Walsh (2009), E (|YEP (p)|) = p1/pΓ(2/p)/Γ(1/p), and

    h1 (q) =1

    p1

    ∣∣∣∣min{q, 0}2α∗∣∣∣∣p1 , h2 (q) = 1p2

    ∣∣∣∣max{q, 0}2(1− α∗)∣∣∣∣p2 .

    For the AST distribution in standard form (8),

    ESAST (q;β) =2

    FAST (q)

    {−αα∗E (|Yt(υ1)|) [1 + w1 (q)](1−υ1)/2 + (15)

    (1− α)(1− α∗)E (|Yt(υ2)|)(

    1− [1 + w2 (q)](1−υ2)/2)}

    ,

    where β = (α, υ1, υ2), FAST (q) is the AST cdf, Yt(υj) is a r.v. having standard Student-t distribution with parameter υj > 1, the expressions for FAST (q) and E (|Yt(υ)|) canbe found in Zhu and Galbraith (2010), E (|Yt(υ)|) =

    √υ/πΓ ((υ − 1) /2) /Γ (υ/2) , and

    w1 (q) =1

    υ1

    (min{q, 0}

    2α∗

    )2, w2 (q) =

    1

    υ2

    (max{q, 0}2(1− α∗)

    )2.

    Substituting the expressions for E (|YEP (pj)|) and E (|Yt(υj)|) respectively into (14)and (15) will lead to the results in Zhu and Zinde-Walsh (2009) and Zhu and Galbraith(2010).

    Again, taking q = V aRAEP (p) ≡ F−1AEP (p) or q = V aRAST (p) ≡ F−1AST (p) as

    appropriate, we can also express the ES as a function of a confidence level p ∈ (0, 1).Figure 2 plots the ES expressed in this way for a number of examples of AEPD

    and AST distributions. Different values of the tail and asymmetry parameters areused for rough conformity with the range of values observed empirically (as reportedbelow), and for comparability of the two distributions. The top panels treat cases inwhich α is below 12 ; the middle panels show cases in which the two tail parametersare equal, corresponding therefore with the skewed exponential power distribution andskewed Student-t respectively: α is again chosen in these cases for rough conformitywith empirical estimates reported below for these models. (Note that these graphicsdiffer in form from those used by Komunjer (2007) for plotting ES.) While the patternsare fairly similar, the AST expected shortfalls tend to show slightly greater curvature,turning up somewhat more rapidly at very low quantiles (the extreme left tail). Thelower panels illustrate the fact, visible in another form in Figure 1, that higher valuesof the asymmetry parameter α are associated with thicker lower tails, hence greaterexpected shortalls.

    7

  • 3 Data, estimation and inference

    We now consider empirical modelling and forecasting of the expected shortfall withGARCH-class models and compare forecast performance with error distributions givenby either the general AEPD or AST distributions or a special case of one of these.In recording results from the special cases, we are able to provide an overview andsummary of the forecast performance for a wide set of error distributions which nestsmany forms in the existing literature. For the AEPD, we estimate the general form andspecial cases corresponding with the skewed exponential power distribution (SEPD, inwhich p1 = p2 is imposed in the AEPD-), the generalized error distribution (GED),and the AEPD with α restricted to 12 . Correspondingly, for the AST distribution, weestimate the general form and special cases corresponding with the skewed Student-t(SST, in which υ1 = υ2 is imposed in the AST), the usual Student-t (ST), and theAST with α restricted to 12 to represent asymmetry arising purely from different leftand right tail behavior.

    We model daily returns on the S&P500 composite index and several individualcompany stocks (Adobe Systems, Bank of America, JP Morgan, Pfizer, and Starbucks)chosen to represent different industries and lengths of data history; we include twofinancial companies because of the particularly interesting effects of recent events onthis sector. The data sets end at December 31 2008; for the S&P 500 we take twosamples, one beginning on January 2, 1965 (11076 observations), and the other fora shorter period which excludes the crash of 1987 and the immediate aftermath; theshorter sample begins January 2, 1990 (4791 obs.).4

    3.1 Model fit and comparison

    For each model, maximum likelihood estimates are obtained by numerical maximizationof the corresponding likelihood function.5 Let the parameter vector of the likelihoodbe φ, where φ =(m, b0, b1, b2, c, α, p1, p2) for the AEPD and φ =(m, b0, b1, b2, c, α,υ1, υ2) for the AST. The general form of the likelihood is then

    lT (φ; r) =

    T∑t=1

    {log δ − log σt + log fY (ω + δ

    rt −mσt

    ;β)

    }, (16)

    where fY (·;β) is the AEPD density with parameters β = (α, p1, p2)′, the AST densityfunction with parameters β = (α, υ1, υ2)

    ′, or a constrained version of one of these.The parameters ω ≡ ω(β) and δ ≡ δ(β) denote the mean and standard deviation of

    4Data come from CRSP (Center for Research on Security Prices, University of Chicago). The returnrt in period t is defined as rt = 100 × (Pt − Pt−1)/Pt−1, where Pt is the security price or index levelat time t. Other sample sizes are: Adobe, 5646; Bank of America, 6760; J.P. Morgan, 10052; Pfizer,16632; Starbucks, 4161.

    5The ML estimation is implemented in Matlab 7 with the command ‘fmincon’ and initial valuesφ0 = (mean(r), b0, 0.9, 0.05, 0, 0.5, 6, 6) for the AST-NGARCH(1,1) model and φ0 = (mean(r), b0,0.9, 0.05, 0, 0.5, 2, 2) for the AEP-NGARCH(1,1) model, where b0 is given by the sample variance ofreturn data multiplied by 1− b1 − b2 = 0.05. Code for estimation is available from the authors.

    8

  • fY (·;β) respectively; their expressions are given in (6)-(7) for the AEPD and in (10)-(11) for the AST. Since we model a return series as a stationary AEPD-NGARCH(1,1)or AST-NGARCH(1,1) process, the unconditional mean and variance are in either caseE(rt) = m and V ar(rt) = b0/

    [1− b1 − b2(1 + c2)

    ]. To examine purely the effects

    of different distributional specifications of innovations we constrain the unconditionalmoments E(rt) and V ar(rt) = b0/

    [1− b1 − b2(1 + c2)

    ]in all eight models (the general

    AEPD, AST and their restricted cases), to equal the same values, the sample meanand variance of returns respectively, which are their consistent estimates.

    The ML estimates of the parameters of the AEPD, AST and the nested distributionclasses mentioned above, and their standard errors, are presented for the post-crashsample of S&P 500 Composite Index data in Table 1a/1b; for individual companystocks we report below a more limited set of results on fit and forecast performance.

    Table 1a: Parameter estimates for AEPD-NGARCH(1,1) modelsS&P 500 daily returns, 02.01.1990–31.12.2008

    b0 b1 b2 c α p1 p2AEPD .0103

    (.0017).880(.0088)

    .057(.0044)

    .9947(.0011)

    .461(.024)

    1.31(.053)

    1.71(.100)

    AEPD, α = 12 .0106(.0017)

    .878(.0089)

    .057(.0044)

    1.002(.0011)

    1.38(.034)

    1.59(.056)

    SEPD .0108(.0017)

    .876(.0091)

    .057(.0044)

    1.012(.0011)

    .536(.0085)

    1.47(.036)

    GED .0107(.0017)

    .877(.0090)

    .056(.0044)

    1.018(.0011)

    1.46(.033)

    Table 1b: Parameter estimates for AST-NGARCH(1,1) modelsS&P 500 daily returns, 02.01.1990–31.12.2008

    b0 b1 b2 c α υ1 υ2AST .0099

    (.0016).879(.0088)

    .056(.0044)

    1.013(.0011)

    .499(.0171)

    6.82(.930)

    16.7(6.68)

    AST, α = 12 .0099(.0016)

    .878(.0087)

    .056(.0043)

    1.013(.0011)

    6.82(.715)

    16.5(4.96)

    SST .0103(.0016)

    .875(.0091)

    .057(.0045)

    1.024(.0011)

    .531(.0104)

    8.94(.969)

    ST .0101(.0016)

    .876(.0092)

    .056(.0045)

    1.031(.0011)

    8.74(.878)

    Estimates of the parameters bi and c of the NGARCH model are similar across allspecifications; again, the mean m is in each case set to the sample mean of returnsrather than being estimated jointly with the other parameters. Note that, while theparameter α of the general AST is estimated to be almost exactly 0.5 on this dataset, values on the other data sets analyzed are typically well below 0.5. These values,and likelihood ratio tests of the hypothesis H0 : α =

    12 are presented in Table 2; on

    9

  • individual company data the restriction to α = 12 is invariably rejected at conventionallevels, usually very strongly. The estimated values fall around 0.45, suggesting thatvalues of α somewhat below 0.5 are an empirical regularity on equity return data, andreinforcing the suggestion that relaxing the restriction to α = 12 provides a genuineimprovement in the description of such data.

    Table 2: Likelihood ratio tests of HA : α =12 and HB : equal tail parameters

    Company or index class α̂ HA : stat. HA : p− HB : stat. HB : p−

    S&P 500, 1990– AEPD 0.461 2.08 0.15 7.14 .008S&P 500, 1990– AST 0.499 .005 0.94 6.95 .008S&P 500, 1965– AEPD 0.462 3.80 0.05 10.9 .001S&P 500, 1965– AST 0.498 .032 0.86 5.54 .019Adobe Systems AEPD 0.490 10.4 .001 5.08 .024Adobe Systems AST 0.436 21.1

  • and unit variance, their estimated cdf F̂ (x) can be expressed as F̂ (x) = FY (ω̂+ δ̂x; β̂),where FY (·;β) is the cdf of the AEPD or AST with the three parameters β (fewer inrestricted cases), β̂ is the ML estimate of β, and ω̂ and δ̂ are given by ω̂ = ω(β̂) andδ̂ = δ(β̂), respectively the estimated mean and standard deviation of FY (·;β). Sincethe AD statistic is actually the sup-norm of the normalized difference between FT (x)and F̂ (x) and thus gives appropriate weight to the tails of the distribution, it can beused to measure goodness of fit in the tails.

    Table 3 displays the four measures of goodness-of-fit for the estimated AEPD-NGARCH(1,1) and AST-NGARCH(1,1) models on the 1990– sample of S&P 500 indexdata; for other samples, Table 4a/4b below reports the model ranked best by eachmeasure.

    Table 3: Goodness-of-fit measures for all NGARCH(1,1) modelsS&P 500 daily returns, 02.01.1990–31.12.2008

    L AICC SIC ADAEPD -6210.2 12436. 12420. 15.6

    AEPD, α = 12 -6211.2 12436. 12422. 19.8SEPD -6213.7 12442. 12428. 27.2GED -6222.4 12457. 12445. 42.8

    AST -6209.5 12435. 12419. 3.53AST, α = 12 -6209.5 12433. 12419. 3.57

    SST -6213.0 12440. 12426. 6.08ST -6218.2 12448. 12436. 7.28

    As the estimate of α for the AST in Table 1b would suggest, the unrestricted ASTand AST with α = 12 are essentially indistinguishable on this data set by most criteria,although the AICC chooses the restricted version, and the AD statistic the unrestricted.Either of these is preferred by all statistics to the SST (one tail parameter) and thestandard Student-t. For the AEPD, all criteria prefer the fully general version exceptthe AICC, which is equal to the value with α = 12 to five digits.

    In comparing the two classes, AEPD and AST, the AST is preferred in each caseto the corresponding AEPD distribution, by any of the goodness-of-fit criteria.

    Statistical inference takes the form of LR tests (LR = 2(Lu − Lr), where Lu andLr are respectively unrestricted and restricted log-likelihood values) of the restrictionα = 12 and of the restriction to equal tail parameters (p1 = p2 or υ1 = υ2). On theS&P 500 data (for either sample size, as Table 2 below indicates), and for each classof distribution, the test fails to reject α = 12 , but does reject the restriction to one tailparameter rather than two. However, on the individual company data (Table 2) therestriction to α = 12 is invariably rejected, and the restriction to one tail parameter isrejected in a minority of cases (four of ten at the 10% level, and only two of ten at the5% level). Of course, the tail parameters are (being dependent on the relatively sparse

    11

  • extreme observations) relatively difficult to estimate precisely, so that distinguishingdifferent left and right tail parameters may require very large samples.

    Tables 4a/4b report a summary of the best-ranked models, by the measures de-scribed earlier, for each of the AEPD and AST classes. The criteria are generally inagreement in favoring the most general model within each class, and in favoring theAST class over the AEPD. Specifically, the AD statistic invariably chooses the mostgeneral AST specification as the preferred model; the AICC also invariably prefers theAST class, but within each class often chooses a restricted model rather than the mostgeneral; the SIC always chooses the most general member of each class, and in all butone case (Bank of America) chooses the AST class rather than the AEPD.

    Table 4a: Preferred AEPD-class model by various criteria6

    Data series AICC SIC AD MAE−1.2%,j=1

    MAE−1.2%,j=5

    S&P 500, 1990– A A A A AS&P 500, 1965– A A A A-D A-DAdobe Systems A A A A* A*

    Bank of America A A* A A D*JP Morgan C A B A* A*

    Pfizer C A D A* A*Starbucks C A D A* A*

    Table 4b: Preferred AST-class model by various criteria7

    Data series AICC SIC AD MAE−1.2%,j=1

    MAE−1.2%,j=5

    S&P 500, 1990– B* A* A* A* A*S&P 500, 1965– B* A* A* A-D* A-D*Adobe Systems C* A* A* B A

    Bank of America A* A A* D* DJP Morgan C* A* A* A D

    Pfizer A* A* A* D DStarbucks C* A* A* C C

    3.2 Inference on conditional distribution specifications

    As others have noted (e.g. Taylor 2008), direct inference on expected shortfall forecastsleads to few significant distinctions because of the small number of values in the lower

    6A: General AEPD; B: AEPD with α = 0.5; C: SEPD; D: GED. In each table, ‘*’ indicates amore favorable value than for the best element of the other model class. The notation A-D indicatesthat results are essentially indistinguishable within a model class; the ‘*’ nonetheless indicates whichmodel class provides superior results.

    7A: General AST; B: AST with α = 0.5; C: SST; D: ST

    12

  • tails. However, we can test the conditional distribution specifications directly. Theevidence in this section complements the likelihood-based inference just reported, whichcan test restrictions within the context of a model class, but cannot test the adequacyof the model class (here AEPD or AST) itself.

    To test the applicability of the AEPD and AST distributions for modeling thestandardized innovations, we apply the test of Bai (2003), which is designed preciselyfor models of the type used here. The test allows us to consider the null hypothesis H0:the conditional distribution of rt conditional on Ωt−1 is in the AEPD or AST family.

    Bai’s test statistic is Tn = sup0≤r≤1

    ∣∣∣Ŵn(r)∣∣∣ , where Ŵn(r) is defined in equation (5)of Bai (2003); under some regularity conditions, Tn

    d→ sup0≤r≤1 |W (r)| under H0,where W (r) is a standard Brownian motion. Bai (2003) also provides critical valuesof the test procedure at significance levels 10%, 5%, and 1%, which are 1.94, 2.22, and2.80, respectively. Details of the test implementation in this context are given in theAppendix.

    Table 5: Bai tests of conditional distributionCompany or index AEPD SEPD GED AST SST ST

    S&P 500, 1990– 0.93 1.32 3.10 1.70 1.46 3.11S&P 500, 1965– 1.93 1.13 2.67 1.64 1.29 2.75Adobe Systems 7.93 10.2 10.0 1.26 1.19 3.87Bank of America 4.91 4.11 4.22 2.49 2.35 3.56JP Morgan 5.40 5.51 3.83 4.12 3.99 6.13Pfizer 9.22 9.94 5.54 5.59 5.46 8.22Starbucks 4.74 4.58 3.98 1.08 0.94 3.64

    Table 5 provides test statistics for each of the data sets. On S&P 500 data, none ofthe more general specifications are rejected; in individual company data, however, thereare numerous rejections, and indeed all AEPD class specifications are rejected on eachcompany data series. More general AST specifications, by contrast, survive more tests.Nonetheless in data from Pfizer and J.P. Morgan all distributions are rejected, althoughless dramatically for the AST specifications. Results for these AST-class models aremarginally significant (at 5%, not at 1%) for Bank of America data; given that little isknown about the small-sample properties of the Bai test to outliers, we would interpretthe marginal rejections very cautiously. AST and SST specifications are the only oneswhich are not rejected on Adobe and Starbucks data.

    Overall this evidence is compatible with that of Tables 2–4, suggesting that AST-class models provide superior overall characterizations of the full conditional distribu-tion. Again, however, we note that this does not imply that purely tail-related featuresof performance will be superior.

    13

  • 4 Expected shortfall prediction

    4.1 ES forecast evaluation

    To predict the downside risk in the period t + j (j = 1, 2, 3, ...) using the informationavailable in period t, we must give a j-step-ahead forecast of the conditional distri-bution of returns rt+j , F̂t+j|t(rt+j). Given the models specified in (1), the conditionaldistribution is time-varying only due to the time-varying conditional mean and vari-ance. Therefore forecasting the conditional distribution amounts to estimating theparameters of the model using the data available at time t, and then forecasting theconditional mean (mt+j) and variance (σ

    2t+j) of rt+j .

    The forecast of the conditional ES can be divided into two steps. First of all, wegive the forecasts of the conditional mean and variance. Theoretically, the j-step-ahead

    forecast of σ2t+j , denoted by σ2t+j|t, is defined in the usual way: σ

    2t+j|t ≡ E

    (σ2t+j |Ωt

    );

    σ2t+1|t = b0 + b1σ2t + b2(rt −m− cσt)2, (17)

    σ2t+j|t = b0 +[b1 + b2(1 + c

    2)]σ2t+j−1|t, j ≥ 2. (18)

    Similarly, the j-step-ahead forecast of mt+j is expressed as mt+j|t ≡ E (mt+j |Ωt); forthe models (1) mt+j|t = m.

    8 Note here that these forecasts of the conditional mean andvariance do not depend on zt’s distribution specification.

    9 We can therefore use thetwo-step method as in Komunjer (2007) to estimate the m and parameters of NGARCHequation by using data up to time t and a Gaussian quasi-maximum likelihood estimator(QMLE), which are consistent and are expressed as (m̂t, b̂0t, b̂1t, b̂2t, ĉt). Substitutingthese estimates into the expressions for mt+j|t and σ

    2t+j|t (see (17) and (18)) yields

    m̂t+j|t and σ̂2t+j|t, the estimates of the j-step-ahead forecasts of mt+j and σ

    2t+j ,

    m̂t+j|t = m̂t,

    σ̂2t+1|t = b̂0t + b̂1tσ̂2t + b̂2t(rt − m̂t − ĉtσ̂t)2,

    σ̂2t+j|t = b̂0t +[b̂1t + b̂2t(1 + ĉ

    2t )]σ̂2t+j−1|t, j ≥ 2.

    Based on these forecasts in time t, the return rt+j can be expressed approximately asr̂t+j ≡ m̂t+j|t+ σ̂t+j|tzt+j that has an AEPD or AST-type conditional distribution withmean m̂t+j|t and standard deviation σ̂t+j|t, and thus the j-step-ahead forecast of the

    conditional ES, ÊSt+j|t(q) ≡ E(r̂t+j |r̂t+j < q,Ωt), is expressed as

    ÊSt+j|t(q) = m̂t+j|t + σ̂t+j|t

    [ESA(qt+j ;β)− ω (β)

    δ (β)

    ], (19)

    8For an ARMA specification of the conditional mean, the j-step-ahead forecast of mt+j based onthe information available at time t, mt+j|t ≡ E (mt+j |Ωt), can also be evaluated; for example, ifmt = m+ a1rt−1, then mt+j|t = m

    (1− aj1

    )/ (1− a1) + aj1rt.

    9However, for the TGARCH(1,1) specification, σ2t = b0+b1σ2t−1+σ

    2t−1z

    2t−1 [b2 + c1(zt−1 < 0)], σ

    2t+j|t

    depends not only on the specifications of conditional mean and variance but also on the specificationof zt’s distribution through the term E

    [z2t 1(zt < 0)

    ].

    14

  • where ESA (q;β) represents either ESAEP (q;β) or ESAST (q;β) given in (14) and (15),qt+j = ω (β) + δ (β) (q − m̂t+j|t)/σ̂t+j|t, ω (β) and δ (β) are defined in (6)-(7) for theAEPD and in (10)-(11) for the AST. Obviously, the conditional ES depends on all thespecifications of mt, σ

    2t and zt’s distribution.

    In the second step, we estimate the parameter vector β of the distribution ofztspecified in (1), which is necessary for evaluation of ÊSt+j|t(q) in (19). Note thatif the model (1) is correctly specified, then the ex post innovations ẑi ≡ (ri − m̂t)/σ̂i(i = 1, 2, ..., t) constructed by using the QMLE (m̂t, b̂0t, b̂1t, b̂2t, ĉt) from the first step(with σ̂2i = b̂0t + b̂1tσ̂

    2i−1 + b̂2t(ri−1− m̂t− ĉtσ̂i−1)2), should approximately have a stan-

    dardized AEPD or AST distribution. Therefore, a consistent MLE of β, denoted byβ̂t, can be obtained by fitting the standardized AEPD or AST distribution using theex post innovations {ẑi}ti=1. We evaluate ÊSt+j|t(q) by substituting β̂t into (19).

    4.2 Empirical ES forecast results

    Figure 3 illustrates the one-step-ahead forecast expected shortfalls on all six data seriesusing an initial sample for estimation of 1000, with recursive updating of parameterestimates thereafter, using the general AST distribution and a -1% return threshold.Note that we are computing these expected shortfalls at a particular loss rather thanat a particular tail probability, as is more usual. There is considerable variation acrosssecurities, as well as through time, in the typical level of expected shortfall; note thatvertical scales differ, and in particular that the two financial stocks have wide verticalscales to accommodate the high expected shortfalls observed recently.

    To check predictive performance out-of-sample, we split the data samples. We usea larger initial sample of N = 2000 points for more precise initial estimation than wasrequired for the graphical results depicted above; the remaining sample points (sub-samples of varying size across the data sets) are used for out-of-sample evaluation. We

    recursively evaluate ÊSt+j|t(q), N = 2000 ≤ t ≤ T − j, for one and five steps ahead(j = 1, 5), and we set the threshold (loss) returns q = −1.2%,−1%,−0.8%,−0.6%in order to have a substantial number of sample points at which losses exceed q. Foreach of (j, q), if the model is correctly specified we expect the average of the observed

    rt+j-values (rN+j , ..., rT ) less than q should be approximately equal to the ÊSt+j|t(q)predicted by the model. If the observed expected shortfall

    ÊSj(q) ≡1

    J

    T−j∑t=N

    rt+j1{rt+j < q}, where J =T−j∑t=N

    1{rt+j < q} (20)

    is higher10 than a model’s average predictive ES,

    ÊSM

    j (q) ≡1

    J

    T−j∑t=N

    ÊSt+j|t(q)1{rt+j < q}, (21)

    then the model tends to overestimate the risk. This is measured by the mean error,

    MEj(q) ≡ ÊSM

    j (q) − ÊSj(q); a negative value therefore represents overestimation of10i.e. less negative: recall by (13) that we do not use a sign change in the definition of ES.

    15

  • this measure of risk. Another important measure of predictive out-of-sample perfor-mance is the mean absolute error

    MAEj(q) ≡1

    J

    T−j∑t=N

    ∣∣∣ÊSt+j|t(q)− ÊSj(q)∣∣∣ 1{rt+j < q}. (22)Tables 6a/6b show the predictive performance for the expected shortfall risk on

    the S&P 500 data, 1990–; the entries in the table are the mean errors 100MEj(q) andthe mean absolute errors 100MAEj(q) of the expected shortfall predictions for oneand five steps ahead. From the mean errors MEj , we see that both the AEPD-classand AST-class models tend to overestimate the risk (have a negative mean error) athorizon j = 1 for larger loss thresholds (q = −1.2%,−1%), but show mean errors closeto zero for the smaller thresholds (q = −0.8%,−0.6%). At the longer horizon j = 5,all models underestimate risk on this data set. The AST-class tends to underestimateES more than the AEPD-class. In the mean absolute errors MAEj , the AST-class hasbetter performance than the AEPD-class for larger loss thresholds (q = −1.2%), but isin general worse for smaller loss thresholds; the more general distributions outperformthe corresponding subclasses in almost all cases. The best performance by the MAEcriterion is in boldface in each case.

    Table 6a: Predictive performance for expected shortfall risk11

    AEPD-NGARCH(1,1) models, S&P 500 daily returns, 02.01.1990–31.12.2008

    q -1.2% -1% -0.8% -0.6%j = 1 ME1,MAE1 ME1,MAE1 ME1,MAE1 ME1,MAE1

    AEPD -0.062, 0.372 -0.030, 0.374 -0.012, 0.356 0.018, 0.344AEPD, α = 12 -0.043, 0.392 -0.019, 0.391 -0.010, 0.370 0.010, 0.359

    SEPD -0.029, 0.400 -0.007, 0.399 0.001, 0.376 0.020, 0.354GED 0.005, 0.397 0.025, 0.397 0.030, 0.373 0.047, 0.355

    j = 5 ME5,MAE5 ME5,MAE5 ME5,MAE5 ME5,MAE5

    AEPD 0.031, 0.331 0.061, 0.325 0.058, 0.312 0.071, 0.299AEPD, α = 12 0.059, 0.349 0.080, 0.339 0.065, 0.320 0.066, 0.309

    SEPD 0.073, 0.360 0.093, 0.349 0.076, 0.328 0.076, 0.304GED 0.103, 0.364 0.121, 0.354 0.103, 0.330 0.101, 0.311

    11Note: The entries are the mean errors ME j (q) ≡ ÊSM

    j (q)− ÊS j (q) and the mean absoluteerrors MAE j (q) defined in (22), multiplied by 100, for the threshold losses (negative returns)q = −1.2%,−1%,−0.8%,−0.6% and j = 1, 5.

    16

  • Table 6b: Predictive performance for expected shortfall risk12

    AST-NGARCH(1,1) models, S&P 500 daily returns, 02.01.1990–31.12.2008

    q -1.2% -1% -0.8% -0.6%j = 1 ME1,MAE1 ME1,MAE1 ME1,MAE1 ME1,MAE1

    AST -0.034, 0.367 0.001, 0.378 0.018, 0.363 0.043, 0.354AST,α = 12 -0.034, 0.374 -0.003, 0.382 0.012, 0.365 0.034, 0.362

    SST -0.006, 0.391 0.020, 0.397 0.029, 0.376 0.048, 0.354ST 0.021, 0.391 0.045, 0.398 0.052, 0.377 0.068, 0.362

    j = 5 ME5,MAE5 ME5,MAE5 ME5,MAE5 ME5,MAE5

    AST 0.057, 0.328 0.092, 0.333 0.088, 0.321 0.097, 0.312AST, α = 12 0.059, 0.334 0.090, 0.335 0.083, 0.322 0.088, 0.318

    SST 0.089, 0.356 0.114, 0.353 0.101, 0.334 0.103, 0.310ST 0.113, 0.362 0.137, 0.360 0.123, 0.339 0.122, 0.321

    We do not record the information analogous to that in Tables 6a/6b for each ofthe other data sets, but instead provide a summary of these results for each of theindividual companies: the best-performing model by MAE at the smallest threshold, in1-step and 5-step forecasts, is recorded in Table 4a/4b along with the model selectioncriteria. Results are presented separately for AEPD- and AST- class models; a ‘*’indicates which of these classes provided a superior results on each criterion.

    With respect to model fit, as we have noted, each of the criteria tends clearly tofavor the AST class rather than the AEPD; within each of these classes the most generalvariant is usually selected by AD and always by SIC, but there is more variability inwithin-class results from the AICC (which nonetheless invariably prefers the AST classto AEPD). While not unambiguous, therefore, the overall evidence tends to favor themost general AST specifications. Moreover, in likelihood ratio tests of the restrictionto α = 12 , the restriction is strongly rejected on company data, although not on thecomposite index. The estimated values of α are invariably below 0.5, suggesting agenuine empirical regularity. The restriction to one tail parameter rather than twois also rejected in general. Overall, both information criteria and statistical inferencetend to suggest that the additional parameters in the most general AEPD or ASTspecifications are valuable, on the available sample sizes, in characterizing financialreturn volatility.

    In the empirical prediction of expected shortfall, the AEPD class tends to outper-form the AST on these data sets, and in most cases the model which provides the bestES predictions is the most general form of AEPD; however, there are exceptions. Thebest model for forecasting expected shortfall, whichever class it comes from, tends to

    12See note to Table 6a.

    17

  • be one of the fully-general models with α and both tail parameters allowed to vary.The empirical prediction results therefore tend to confirm the result just noted, thatthe additional parameters of the most general specifications provide genuine improve-ments in our ability to model and forecast financial return data, relative to the morerestrictive specifications available in the previous literature.

    5 Concluding remarks

    Useful measures of downside risk must reflect the potential for extreme outcomes, butmeasuring and forecasting such risk for financial assets is challenging because of theasymmetry and heavy-tailedness of return distributions. Nonetheless, a great dealof progress has recently been made in treating these features more realistically, withaccompanying improvements in forecasting power for downside risk measures.

    The present paper has attempted further progress on this research program. Usinggeneral, asymmetric exponential power and Student-t distributions with separate pa-rameters to control skewness and the thickness of each tail, we have greater flexibilityto use information in a large sample of data, and in particular to avoid constrainingthe left and right tails to have the same thickness. In doing so we can potentiallyobtain better estimates of the thickness of the left tail, with corresponding potentialimprovements in forecasting power for risk of loss, measured here by the expectedshortfall.

    The natural question to ask of any generalization of a model is whether the ad-ditional parameters provide discernible improvements, or whether the additional flex-ibility is unnecessary or is dominated at available sample sizes by the efficiency costof estimating additional parameters. Here, evidence from model fit criteria, statisticalinference and out-of-sample forecasts tends to suggest that the additional generalitydoes in many cases produce improvements in both fit and forecasting power relative tothose with more restricted specifications of the distribution of standardized innovations.Likelihood ratio tests of the restriction that sets the additional parameter α, control-ling skewness independently of the tail parameters, to its central value of 12 generallyreject the restriction strongly, suggesting that the additional flexibility offered by themore general distributions has genuine value in describing financial returns data. Theparameter is typically estimated to be less than 12 . Tail parameters are less preciselyestimated, and tests of the restriction to equal tail parameters often fail to reject onthese samples, although both goodness of fit and forecasting performance generallyproduce better results with the additional parameter.

    The balance of evidence suggests the potential for genuine improvement in risk man-agement models and forecasts relative to restricted variants of these models availablein the prior literature, although there will certainly be cases in which more restrictedversions will be appropriate. Whether improvements in fits and forecasts would alsobe visible in asset return series where asymmetry may be less important, such as someexchange rate series, remains to be seen.

    18

  • 6 Appendix

    Using the numerical method of Bai (2003, Appendix B), the test statistic can be com-puted as:

    Tn ≈ sup1≤j≤n

    ∣∣∣Ŵn(υj)∣∣∣ = max1≤j≤n

    √n

    ∣∣∣∣∣ jn − 1nj∑

    k=1

    [g′ (υk)

    ]TC−1k Dk (υk − υk−1)

    ∣∣∣∣∣ ,where Dk =

    ∑ni=k g

    ′ (υi) , Ck =∑n

    i=k g′ (υi) [g

    ′ (υi)]T (υi+1 − υi), and where υ1 < υ2 <

    ... < υn are ordered values of Ût = FAEP (ω(β) + δ(β)ẑt;β) or FAST (ω(β) + δ(β)ẑt;β),where FAEP (·) and FAST (·) are the cdf’s of standard AEPD and AST distributions.Expressions for these can be found in Zhu and Zinde-Walsh (2009) and Zhu and Gal-braith (2010), respectively. In addition, we impose υ0 = 0 and υn+1 = 1. What remainsis to derive the function g (r) so that we can compute the derivative values g′ (υi) ofg (·); by equation (8) of Bai (2003),

    g (r) = (g1, g2, g3)T =

    (r, f0

    (F−10 (r)

    ), f0(F−10 (r)

    )F−10 (r)

    )T,

    where f0 and F0 are respectively the pdf and cdf of the standardized AEPD or AST, thederivative of g (·) is given by g′1 (r) = 1, g′2 (r) = f ′0

    (F−10 (r)

    )/f0

    (F−10 (r)

    ), and g′3 (r) =

    1 + g′2 (r)F−10 (r) . For the standardized AEPD, F

    −10 (r) =

    [F−1AEP (r;β)− ω(β)

    ]/δ(β),

    g′2 (r) =

    {δ(β)

    (1

    2α∗

    )p1 ∣∣F−1AEP (r;β)∣∣p1−1 , if r < α,−δ(β)

    (1

    2(1−α∗)

    )p2 ∣∣F−1AEP (r;β)∣∣p2−1 , if r > α.For the standardized AST, F−10 (r) =

    [F−1AST (r;β)− ω(β)

    ]/δ(β), 13

    g′2 (r) =

    −δ(β)

    (υ1+12α∗υ1

    )F−1AST (r;β)

    2α∗

    [1 + 1υ1

    (F−1AST (r;β)

    2α∗

    )2]−1, if r < α,

    −δ(β)(

    υ2+12(1−α∗)υ2

    )F−1AST (r;β)2(1−α∗)

    [1 + 1υ2

    (F−1AST (r;β)2(1−α∗)

    )2]−1, if r > α.

    When β is replaced with a consistent estimate β̂, for all Ck to be nonsingular, we use

    the statistic T ∗n = [n/ (n− l)]1/2 sup1≤j≤n−l

    ∣∣∣Ŵn(υj)∣∣∣ for a small number l > 0 (seeTheorem 4 of Bai (2003)).

    References

    [1] Aas, K. and Haff, I.H. (2006). The generalized hyperbolic skew Student’s t-distribution. Journal of Financial Econometrics, 4(2), 275-309.

    13Here we need to evaluate only g′ (υi) (i = 1, 2, ..., n). In fact, for the standardized AEPD or ASTD,F−10 (υi) = ẑ(i), F

    −1AEP (υi;β) or F

    −1AST (υi;β) = ω(β) + δ(β)ẑ(i) with the corresponding ω(β) and δ(β).

    19

  • [2] Alberg, D., H. Shalit and R. Yosef (2008). Estimating stock market volatility usingasymmetric GARCH models. Applied Financial Economics 18, 1201-1208.

    [3] Anderson, T. and Darling, D. (1952). Asymptotic theory of certain “goodness offit” criteria based on stochastic process. The Annals of Mathematical Statistics 23,193-212.

    [4] Artzner, P., Delbaen, F., Eber, J.M. and Heath, D. (1997). Thinking coherently.Risk 10.

    [5] Artzner, P., Delbaen, F., Eber, J.M. and Heath, D. (1999). Coherent measures ofrisk. Mathematical Finance 9, 203-228.

    [6] Ayebo, A. and Kozubowski, T.J. (2004). An asymmetric generalization of Gaussianand Laplace laws. Journal of Probability and Statistical Science 1, 187-210.

    [7] Azzalini, A. and A. Capitanio (2003). Distributions generated by perturbationof symmetry with emphasis on a multivariate skew t distribution. Journal of theRoyal Statistical Society B 65, 367-389.

    [8] Bai, J. (2003). Testing parametric conditional distributions of dynamic models.Review of Economics and Statistics 85, 531-549.

    [9] Bauwens, L. and Laurent, S. (2002). A new class of multivariate skew densities,with application to GARCH models. Computing in Economics and Finance 5.

    [10] Bollerslev, T. (1986). Generalized autoregressive conditional hetero- skedasticity.Journal of Econometrics 31, 307-327.

    [11] Bollerslev, T. (1987). A conditional heteroskedastic time series model for specula-tive prices and rates of return. Review of Economics and Statistics 69, 542-547.

    [12] Bollerslev, T., R.Y. Chou and K.F. Kroner (1992). ARCH modeling in finance: Aselective review of the theory and empirical evidence. Journal of Econometrics 52,5-59.

    [13] Bradley, B.O. and M.S. Taqqu (2003). Financial risk and heavy tails. Handbook ofHeavy Tailed Distributions in Finance, ed. S.T. Rachev.

    [14] Branco, M.D. and Dey, D.K. (2001). A general class of multivariate skew-ellipticaldistributions. Journal of Multivariate Analysis 79, 99-113.

    [15] Engle, R.F. (1982). Autoregressive conditional heteroskedasticity with estimatesof the variance of U.K. inflation. Econometrica 50, 987-1008.

    [16] Engle, R.F. and V.K. Ng (1993). Measuring and testing the impact of news onvolatility. Journal of Finance 48, 1749-1778.

    [17] Fernandez, C. and Steel, M.F.J. (1998). On Bayesian modeling of fat tails andskewness, Journal of the American Statistical Association 93, 359-371.

    20

  • [18] Fernandez, C., Osiewalski, J. and Steel, M.F.J. (1995), Modeling and inferencewith υ−distributions, Journal of the American Statistical Association 90, 1331-1340.

    [19] Hansen, B. E. (1994). Autoregressive conditional density estimation. InternationalEconomic Review, 35 (3), 705-730.

    [20] Hurvich, C. and Tsai, C. (1989). Regression and time series model selection insmall samples. Biometrika 76, 297-307.

    [21] Jones, M.C. and Faddy, M.J. (2003). A skew extension of the t distribution, withapplications. Journal of the Royal Statistical Society, Series B, 65, 159-174.

    [22] Komunjer, I. (2007). Asymmetric power distribution: Theory and applications torisk measurement. Journal of Applied Econometrics 22, 891-921.

    [23] Mittnik, S. and Paolella, M. S. (2003). Prediction of financial downside-risk withheavy-tailed conditional distributions. Handbook of Heavy Tailed Distributions inFinance, ed. S. T. Rachev.

    [24] Rombouts, J.V.K. and M. Bouaddi (2009). Mixed exponential power asymmetricconditional heteroskedasticity. Studies in Nonlinear Dynamics and Econometrics13, issue 3, article 3.

    [25] Sahu, S.K., D.K. Dey and M.D. Branco (2003). A new class of multivariate skewdistributions with applications to Bayesian regression models. The Canadian Jour-nal of Statistics 31, 129-150.

    [26] Stentoft, L. (2008). American option pricing using GARCH models and the NormalInverse Gaussian distribution. Journal of Financial Econometrics 6, 540-582.

    [27] Taylor, J.W. (2008). Using exponentially weighted quantile regression to estimatevalue at risk and expected shortfall. Journal of Financial Econometrics 6, 382-406.

    [28] Theodossiou, P. (1998). Financial data and the skewed generalized t Distribution.Management Science, 44 (12-1), 1650-1661.

    [29] Zhu, D. and J.W. Galbraith (2010). A generalized asymmetric Student-t distri-bution with application to financial econometrics. Journal of Econometrics 157,297-305.

    [30] Zhu, D. and V. Zinde-Walsh (2009). Properties and estimation of asymmetricexponential power distribution. Journal of Econometrics 148, 86-99.

    21

  • Figure 1AST densities for various parameter values

    µ = 0, σ = 1

    α = 0.3, υ1 = 2 α = 0.3, υ2 = 2

    α = 0.5, υ1 = 2 α = 0.5, υ2 = 2

    α = 0.7, υ1 = 2 α = 0.7, υ2 = 2

    22

  • Figure 2Expected shortfall as a function of quantile, AEPD and AST

    µ = 0, σ = 1

    AEPD, α = 0.45, p2 = 1.6 AST, α = 0.45, ν2 = 11

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    p1=1.2

    p1=1.4

    p1=1.6

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    ν1=5

    ν1=8

    ν1=11

    AEPD, α = 0.55, p2 = p1 AST, α = 0.55, ν2 = ν1

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.41

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    5.5

    p1=1.2

    p1=1.4

    p1=1.6

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.41

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    5.5

    6

    ν1=5

    ν1=8

    ν1=11

    AEPD, p1 = 1.3, p2 = 4 AST, ν1 = 6, ν2 = 12

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    α=0.45α=0.50α=0.55

    0 0.05 0.1 0.15 0.2 0.25 0.3 0.35 0.40.5

    1

    1.5

    2

    2.5

    3

    3.5

    4

    4.5

    5

    5.5

    α=0.45α=0.50α=0.55

    23

  • Figure 3One-day-ahead forecast expected shortfalls from the general AST:

    E(rt|rt ≤-1%), initial sample = 1000

    S&P 500 Adobe

    1965 1970 1975 1980 1985 1990 1995 2000 2005 2010−5.5

    −5

    −4.5

    −4

    −3.5

    −3

    −2.5

    −2

    −1.5

    −1

    1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010−8

    −7

    −6

    −5

    −4

    −3

    −2

    −1

    Bank of America JP Morgan

    1985 1990 1995 2000 2005 2010−10

    −9

    −8

    −7

    −6

    −5

    −4

    −3

    −2

    −1

    1970 1975 1980 1985 1990 1995 2000 2005 2010−8

    −7

    −6

    −5

    −4

    −3

    −2

    −1

    Pfizer Starbucks

    1940 1950 1960 1970 1980 1990 2000 2010−5

    −4.5

    −4

    −3.5

    −3

    −2.5

    −2

    −1.5

    1996 1998 2000 2002 2004 2006 2008 2010−5

    −4.5

    −4

    −3.5

    −3

    −2.5

    −2

    −1.5

    Vertical axes: percentage return; horizontal axes: date.

    24