volatility modeling and forecastingfaculty.baruch.cuny.edu/smanzan/finmetrics/files/... · i...

Volatility modeling and forecasting

Sebastiano Manzan

ECO 4051 | Spring 2018

1 / 50

Introduction

I The Figure below shows the returns and absolute returns starting inJanaury 1980 at the daily frequency for:

I S&P 500 Index returns (left)I The JPY/USD exchange rate (right)

−20

−10

0

10

1980 1990 2000 2010 2020−6

−4

−2

0

2

1980 1990 2000 2010 2020

0

5

10

15

20

1980 1990 2000 2010 2020

0

2

4

1980 1990 2000 2010 2020

2 / 50

I The ACF of the returns and absolute returns up to lag 50(left: S&P; right: JPY/USD)

−0.04

−0.02

0.00

0.02

0 10 20 30 40 50

lag

ACF

−0.02

0.00

0.02

0 10 20 30 40 50

lag

ACF

0.0

0.1

0.2

0.3

0 10 20 30 40 50

lag

ACF

0.00

0.05

0.10

0 10 20 30 40 50

lag

ACF

3 / 50

VIX

I The CBOE Volatility Index (VIX) is a measure of market uncertainty that isconstructed from implied volatilities of S&P 500 Index options

I It is widely used as a proxy for market volatility

VIX <- getSymbols("^VIX", from="1990-01-01", auto.assign=FALSE) %>% Clp0 <- autoplot(VIX, ts.colour="tomato1") + theme_bw() + labs(x=NULL, y=NULL, title = "VIX")

20

40

60

80

1990 2000 2010

VIX

I How can we construct a measure of volatility (similar to VIX) for any assetwe are interested in?

4 / 50

I The AR(1) assumes that Rt+1 = β0 + β1Rt + εt+1, where:I Et(Rt+1) = β0 + β1RtI V art(Rt+1) = σ2

I The Et(·) and V art(·) represent the expected return of Rt+1 conditional onthe information available at time t

I The evidence in the previous graph indicates that this might not be arealistic assumption because returns are heteroskedastic (i.e., the varianceand the standard deviation change over time)

I In other words, we need V ar(εt) to be σ2t instead of σ2

I Why should we care about modeling the time variation of the standarddeviation σ2

t ? What is the usefulness of forecasting volatility instead of or inaddition to forecasting returns? Numerous applications in finance:

I pricing derivativesI measuring risk (Value-at-Risk)I portfolio allocation

5 / 50

A volatility model

I We assume that returns follow the model

Rt+1 = µt+1 + ηt+1

that decomposes the return in two components:I Expected return µt+1: the predictable componenent of the return; it

can be assumed equal to 0, equal to a constant µ, or following anAR(1) process µt+1 = φ0 + φ1 ∗Rt

I Shock ηt+1: the unpredictable component of the return; we canassume that it is equal to ηt+1 = σt+1εt+1 where:

I Volatility: σt+1 is the standard deviation of the error at time twhich we assume varies over time; for example, σt+1 = ω + αR2

tI Standardized shock: εt+1 represents an unpredictable error term

or a shock with mean zero and variance 1 (additionally, we canassume that it is normally distributed)

6 / 50

I Both µt+1 and σt+1 can be assumed to be functions of past values

I At the market closing in day t I can then calculate the forecast of the returnthe following day, µt+1, and the expected volatility the following day, σt+1

I How to model the volatility process σt? We consider three approaches:I Moving Average (MA)I Exponential Moving Average (EMA)I Generalized Auto-Regressive Conditional Heteroskedasticity

(GARCH) model

I At high frequencies (daily or intra-daily) there is typically very littlepredictability in the mean but significant predictability in the conditionalvariance; it is thus reasonable to assume that µt+1 = 0 (of course, we canstatistically test this hypothesis . . . )

7 / 50

Moving Average (MA)

I The MA model consists of averaging the square returns on a window of thelatest M days (where M denotes the window size)

I The estimate of the variance in day t is σ2t and it is given by

σ2t+1 =

1M

(R2t +R2

t−1 + · · ·+R2t−M+1

)=

1M

M∑j=1

R2t−j+1

I Dependence on M :I M = 1: σ2

t+1 = R2t

I M = t: σ2t+1 = σ2

I Using the squared returns in the summation relies on the assumption thatµt+1 = 0 so that Rt+1 = ηt+1

I If µt+1 6= 0 then we apply MA on ηt+1 = Rt+1 − µt+1 that is

σ2t+1 =

(∑M

j=1 η2t−j+1

)/M

8 / 50

I The MA estimate of volatility can be interpreted as a weighted average ofthe square returns

σ2t+1 =

t∑j=1

wjR2j

where the weight wj of day j takes the following form:I wj = 1/M if (t−M + 1) ≤ j ≤ t

(ex: for M = 25 wj = 1/25 = 0.04 or 4%)I wj = 0 if j < (t−M + 1)

I The MA approach can be implemented in R using the rollmean() functionin package zoo which takes as arguments:

I the window size MI if the estimate should be aligned to the left, center, or right of the

window

9 / 50

GSPC <- getSymbols("^GSPC", from="1980-01-01", auto.assign=FALSE)sp500daily <- 100 * ClCl(GSPC) %>% na.omitnames(sp500daily) <- "RET"sigma25 <- zoo::rollmean(sp500daily^2, 25, align="right")names(sigma25) <- "MA25"ggplot(merge(sigma25, sp500daily)) + geom_line(aes(time(sp500daily), abs(RET)),color="gray80") +

theme_bw() + geom_line(aes(index(sp500daily), MA25^0.5), color="orangered3") +labs(x=NULL, y=NULL)

0

5

10

15

20

1980 1990 2000 2010 2020

10 / 50

I The effect of increasing the window size M is to provide a smoother (slowlychanging) estimate of volatility σt+1 since each daily return receives asmaller weight

I Below a comparison of M = 25 and 100

sigma100 <- zoo::rollmean(sp500daily^2, 100, align="right")names(sigma100) <- "MA100"ggplot(merge(sigma25, sigma100)) +

geom_line(aes(time(sigma25), MA25^0.5),color="orangered3", size=0.3, linetype="dashed") +theme_bw() + geom_line(aes(index(sigma25), MA100^0.5), color="springgreen4", size=0.6) +labs(x=NULL, y=NULL) + ylim(c(0, 6))

0

2

4

6

1980 1990 2000 2010 2020

11 / 50

Pros/Cons of the MA approach

I The MA approach is very simple to calculate and implement which isconvenient when dealing with a large number of assets

I A drawback of the approach is that it is sensitive to large returns: when alarge return enters/exits the estimation window M the estimate σ2

t+1 has aupward/downward jump

I How to choose M? Practitioners use M = 25 (one trading month) as an adhoc value rather than being chosen optimally

12 / 50

Exponential Moving Average (EMA)

I An alternative approach is the EMA approach which solves the problem ofthe sensitivity to large returns

I EMA is calculated as follows:

σ2t+1 = λ ∗R2

t + λ(1− λ) ∗R2t−1 + λ(1− λ)2 ∗R2

t−2 + ...

= λ

∞∑j=0

(1− λ)jR2t−j

where λ is a smoothing parameter (equivalent of M for MA)

13 / 50

I Based on the formula for σ2t+1, its lagged value σ2

t is given by

σ2t = λ ∗R2

t−1 + λ(1− λ) ∗R2t−2 + ...

I This expression for σ2t−1 can be used to rewrite σ2

t as:

σ2t+1 = λ ∗R2

t + λ(1− λ) ∗R2t−1 + λ(1− λ)2 ∗R2

t−2 + · · ·= λ ∗R2

t + (1− λ) ∗(λR2

t−1 + λ(1− λ)R2t−2 + · · ·

)= λ ∗R2

t + (1− λ)σ2t

which shows that the current volatility estimate is given by a weightedaverage of the previous day estimate (σ2

t ) and the current day square return(R2t )

I The weight function wj for EMA is wj = λ ∗ (1− λ)j ; compared to the MAweight function:

I It weighs all observations (not only the M most recent)I The weighs are (smoothly) declining the farther in the past

14 / 50

Comparison of the weight functions

0.00

0.02

0.04

0.06

−200−150−100−500

Previous days

Weights on past observations

I Can you match the color to the method?I MA(25), MA(100), EMA(0.06), and EMA(0.04)I darkolivegreen4 royalblue1, tomato4, and royalblue4

15 / 50

I Practitioners have found that a value of λ of 0.06 for EMA works well for alarge set of assets

I The package TTR (part of quantmod) provides functions to calculate movingaverages, both simple and of the exponential type as in code below:

library(TTR)# SMA() function for Simple Moving Average; n = number of daysma25 <- SMA(sp500daily^2, n=25)names(ma25) <- "MA25"# EMA() function for Exponential Moving Average; ratio = lambdaema06 <- EMA(sp500daily^2, ratio=0.06)names(ema06) <- "EMA06"autoplot(merge(ma25, ema06)^0.5, ncol=2) + theme_bw()

EMA06 MA25

1980 1990 2000 2010 2020 1980 1990 2000 2010 20200

1

2

3

4

5

1

2

3

4

5

16 / 50

I Comparison of the MA and EMA volatility estimators between 2008 and2010 (gray lines are the absolute returns)

temp <- merge(sp500daily, ma25, ema06) %>% window(., start="2008-01-01", end="2009-12-31")ggplot(temp) + geom_point(aes(time(temp), abs(RET)), color="gray45", size=0.4) +

geom_line(aes(time(temp), MA25^0.5), color="indianred1", size = 0.8) +geom_line(aes(time(temp), EMA06^0.5), color="seagreen4", size = 0.8) +theme_bw() + labs(x=NULL, y=NULL)

0

3

6

9

12

2008−01 2008−07 2009−01 2009−07 2010−01

17 / 50

Forecasting with MA and EMA

I In day t we estimate volatility based on the current and past returns usingMA or EMA

I We then assume that variance next period Et(σ2t+1) = σ2

t and, moregenerally, Et(σ2

t+k) = σ2t

I This assumption implies that the variance follows a random walk model(without drift) and that volatility is non-stationary

I How do you feel assuming that volatility is not mean-reverting??

18 / 50

Auto-Regressive Conditional Heteroskedasticity (ARCH) models

I The MA and EMA are simple tools that work in many situations ofpractical relevance, in particular forecasting volatility at short horizons

I However, they require to introduce some assumptions that might bequestionable:

I ad hoc choice of the parameters (M and λ) rather than beingestimated

I volatility is non-stationary (random walk)

I An alternative approach is represented by the ARCH model

19 / 50

I Assume for now that µt+1 = 0 so that Rt+1 = ηt+1

I The ARCH model assumes that the conditional variance σ2t+1 is a function

of the current squared return, that is,

σ2t+1 = ω + α ∗R2

t

where ω and α are parameters to be estimated.I The model above is called ARCH(1) since only one lag is included, but it

can be generalized to include p lagged returns

I A more general specification is the Generalized ARCH (GARCH) modelwhich is characterized by the following Equation for the conditional variance:

σ2t+1 = ω + α ∗R2

t + β ∗ σ2t

where, the previous variance forecast σ2t is included in addition to the

current return R2t

I We typically refer to the previous model as ARCH(1) and GARCH(1,1)since we included one lag of R2

t and σ2t ; this can be generalized to more lags

20 / 50

Conditional vs Unconditional variance

I We make a distinction between:I σ2: the unconditional variance (i.e. the long-run variance)I σ2

t+1: the conditional variance (i.e. based on information available attime t)

I They measure the volatility that you expect by holding the asset for along-period (e.g., 20 years) vs holding the asset for a short period (e.g., oneday)

I The unconditional mean of the conditional variance, E(σ2t+1) = σ2, for the

GARCH(1,1) is given by:

σ2 =ω

1− (α+ β)

I The unconditional variance of the returns is finite if α+ β < 1.

I On the other hand, if α+ β = 1 the long-run variance is not finite andvariance is non-stationary (as for EMA)

21 / 50

I What does it mean that variance (or volatility) is non-stationary? Do weexpect the variance to be mean reverting?

I If volatility is mean-reverting it means that:I σ2

t+1 oscillates (more/less persistently) around its unconditional meanσ2

I Periods of high volatility (higher than σ2) will be followed by periodsof low volatility (lower than σ2)

I This is consistent with the discussion in the first slide in which we observereturns alternating between periods of high and low volatility (volatilityclustering)

22 / 50

I On the other hand, non-stationary volatility means that we expect variance(or volatility) to stay at the current level forever

I This implication seems at odds with the observation that volatilityalternates between periods of high and low volatility

I Although the forecasts assuming α+ β = 1 might be as accurate as thosefrom models assuming α+ β < 1 in the short-run, they might differsignificantly in the long-run

I When do we need accurate long-run forecast of volatility? Pricinglong-dated options, investment allocation, etc.

23 / 50

GARCH vs EMA

I If we set ω = 0, α = λ and β = 1− λ in the GARCH conditional varianceequation we obtain

σ2t+1 = λR2

t + (1− λ)σ2t

which is the conditional variance for EMA

I So EMA is a restricted version of a GARCH model in which α and β arerestricted to be equal to λ and 1− λ so that . . .

I . . . α+ β = λ+ 1− λ = 1 means that EMA volatility is non-stationary

I Empirically, this hypothesis can be tested by assuming the null hypothesisα+ β = 1 versus the one sided hypothesis that α+ β < 1.

24 / 50

Forecasting with GARCH models

I The expected variance k days from the current period t relative to thelong-run variance σ2 is given by

Et(σ2t+k − σ

2) = (α+ β)k ∗ (σ2t − σ2)

and shows that:I If α+ β is less than 1 the forecast of the conditional variance will

converge to the unconditional variance as k increasesI For α+ β = 1 we expect E(σ2

t+k − σ2) = (σ2

t − σ2) whatever k is(non-stationarity)

I EMA assumes that α+ β = 1 and thus volatility is never expected to revertback to its long run mean σ2

25 / 50

GJR-GARCH

I Another GARCH specification that has become popular was proposed byGlosten, Jagganathan and Runkle (hence, GJR-GARCH) with theconditional variance given by

σ2t+1 = ω + α1 ∗R2

t + γ1R2t ∗ I(Rt ≤ 0) + β ∗ σ2

t

where the effect of the unexpected return depends on its sign:I if positive its effect on the conditional variance is α1I if negative the effect is α1 + γ1

I Testing the hypothesis that γ1 = 0 thus provides a test of the asymmetriceffect of current and past squared returns on volatility.

26 / 50

Estimation of GARCH models

I Assume that returns are modeled as Rt+1 = µt+1(θµ) + σt+1(θσ)εt+1I Example:

I θµ = (β0, β1) (parameters of the conditional mean)I µt(θµ) = β0 + β1RtI θσ = (ω, α, β) (parameters of the conditional variance)I σ2

t (θσ) = ω + α ∗R2t + βσ2

t

I OLS cannot be used because minimizing the Residuals Sum of Squares

T∑t=1

(Rt+1 − µt(θµ))2

does not involve the θσ parameters, but only θµ

27 / 50

I We need an alternative approach to estimate the model that minimizes ameasure of “error” that is also a function of the parameters of theconditional variance (θσ)

I This alternative approach is called Maximum Likelihood (ML) and consistsof finding the parameter values that maximize the likelihood (or probability)of the sample observations

I ML requires that we make an assumption about the distribution of the εtand we typically assume that they follow a normal distribution (however, wecan also assume different distributions such as a t distribution)

28 / 50

Intuition for ML estimation

I Assume that Rt+1 follows a N(µ, σ2) and we have a sample of Tobservations (t = 1, · · · , T )

I Based on a sample of T returns, how do we estimate µ and σ2?

I The Probability Density Function (PDF) for the normal distribution is givenby

f(Rt+1|µ, σ2) =1

√2πσ2

exp(−

12

(Rt+1 − µ

σ

)2)

mu = 0sigma = 1ggfortify::ggdistribution(dnorm, seq(-6, 6, 0.1), mean = mu, sd = sigma) + theme_bw()

0%

10%

20%

30%

40%

−6 −3 0 3 6

29 / 50

I ML estimation of µ and σ2 consist of the following steps:

1. The likelihood function for observation t is given by

L(Rt+1|µ, σ2) =1

√2πσ2

exp(−

12

(Rt+1 − µ

σ

)2)

2. Take the logarithm of the likelihood:

logL(Rt+1|µ, σ2) = l(Rt+1|µ, σ2) = −12

log(2πσ2)−12

(Rt+1 − µ

σ

)2

3. Sum the log-likelihood of each observation for t = 1, · · · , T and obtain

l(R|µ, σ2) =T∑t=1

−12

log(2πσ2)−12

(Rt+1 − µ

σ

)2=

−12

[T log(2πσ2) +

T∑t=1

(Rt+1 − µ

σ

)2]

4. Find the values of µ and σ2 that maximize l(R|µ, σ2) numerically

30 / 50

I Maximizing l(R|µ, σ2) means finding the values of the parameters that aremore likely to have produced the sample

I In most cases the likelihood does not provide an analytical solution of theestimator for µ and σ2 and this is the reason for using numerical methodsinstead

I Assume that σ = 1: maximizing the likelihood function is equivalent tominimize RSS since

maxµ l(R|µ, σ2) = − 12

[T log(2π) +

∑T

t=1 (Rt+1 − µ)2]minµ RSS(µ) =

∑T

t=1 (Rt+1 − µ)2

I The next slide illustrates how ML works when assuming that µ = 0 so thatwe need to maximize only over one parameter, σ

31 / 50

Illustration of ML

I In the next slide we plot the log-likelihood as a function of σ; we follow thesesteps:

1. Generate a time series from N(0, σ2) of length T2. Create a grid of values for σ2

3. Calculate the log-likelihood function l(R|σ2) for each value of σ2 in the grid4. Find the value of σ2 that maximizes the log-likelihood function

I In the simulation we assume the true value of σ = 0.94 and T = 100

32 / 50

set.seed(1234)mysd = 0.94 ## true value and we simulate from thisT = 100# 1) simulate a random sample from the normal distributionR = rnorm(T, mean=0, sd=mysd)# 2) create a grid of value for calculating likelihoodsigma = seq(0.4,5,by=0.001)# 3) calculate the loglik function at each point in the gridloglik <- matrix(NA, length(sigma), 1)for (i in 1:length(sigma)) loglik[i] = -0.5 * (T * log(2*pi*sigma[i]^2)

+ sum(( R / sigma[i])^2))# plot of the loglikelihood function and the true value of the standard deviationqplot(sigma, loglik, geom="line") +

geom_vline(xintercept=mysd, color="steelblue3", linetype="dashed") +theme_classic()

# value that maximizes the loglik functionsigma[which.max(loglik)]

[1] 0.951

−280

−240

−200

−160

1 2 3 4 5

sigma

logl

ik

33 / 50

Numerical optimization of the likelihood function

I The grid search is one way to find the maximum of a functionI A more automatic way to find the maximum (or minimum) of a function is

to use numerical optimization algorithmsI There are several ways and functions to do this in R; in the example below I

use the function optim() that requires three inputs:I the function to maximize (i.e., log-likelihood function)I starting valuesI data

set.seed(1234)mysd = 0.94 ## true value and we simulate from thisT = 100 ## sample sizeR = rnorm(T, mean=0, sd=mysd) # simulate a random sample

# define the likelihood functionmylik <- function(data, sigma){

# NB: I drop the minus sign because the function "optim" is designed to minimize a functionloglik = 0.5 * (T * log(2*pi*sigma^2) + sum(( data / sigma)^2))return(loglik)

}# Use "optim" function to find the minimum of mylik with starting value sigma0sigma0 <- 2myopt <- optim(sigma0, mylik, data=R)myopt$par

[1] 0.95098

34 / 50

ML estimation of GARCH models

I The discussion above simplified the application of ML to the case of aconstant mean and variance, but the principle applies also when the meanand variance are functions of parameters (µt+1(θµ) and µt+1(θσ)) instead ofbeing the parameters

I The family of GARCH models is large but they can all be expressed asRt+1 = µt+1 + ηt+1 with ηt+1 = σt+1εt+1

I The only difference between specifications is the conditional variance σ2t+1;

for example:I GARCH(1,1): σ2

t+1 = ω + αη2t + βσ2

tI GJR-GARCH(1,1): σ2

t+1 = ω + α1 ∗ η2t + γ1η2

t ∗ I(ηt ≤ 0) + β ∗ σ2t

I . . .

35 / 50

I The log-likelihood of any GARCH model can be expressed in terms of theparameters θ̂µ and θ̂σ and it is given by

l(R|θµ, θσ)−12

T∑t=1

[ln(2π) + lnσ2

t (θσ) +(Rt+1 − µt(θµ)

σt(θσ)

)2]

since the first term ln(2π) does not depend on any parameter it can bedropped from the function; the estimates θ̂µ and θ̂σ are then the result ofthe maximization of

l(R|θµ, θσ) = −12

T∑t=1

[lnσ2

t (θσ) +(Rt+1 − µt(θµ)

σt(θσ)

)2]

36 / 50

GARCH in R

I The package fGarch provides functionalities to estimate a wide array ofGARCH models

I The function garchFit estimates by ML a model specified by the user:I arma(p,q) for the conditional mean (typically arma(0,0) or

arma(1,0))I garch(r,s) for the conditional variance (typically garch(1,1))

I The code below estimates a GARCH(1,1) model:

require(fGarch)fit <- garchFit(~garch(1,1), data=sp500daily, trace=FALSE)round(fit@fit$matcoef, 3)

Estimate Std. Error t value Pr(>|t|)mu 0.060 0.008 7.431 0omega 0.014 0.002 7.184 0alpha1 0.084 0.006 14.389 0beta1 0.905 0.007 135.880 0

I α+ β = 0.988 is very close to one and indicates that the EMA assumptionsis likely to hold

I However, while practitioners use λ equal to 0.06 the estimation on oursample suggests a higher value of 0.084

37 / 50

I We can add an AR(1) term to the conditional mean, while the conditionalvariance remains of the GARCH(1,1) type:

fit <- garchFit(~ arma(1,0) + garch(1,1), data=sp500daily, trace=FALSE)round(fit@fit$matcoef, 3)

Estimate Std. Error t value Pr(>|t|)mu 0.060 0.008 7.395 0.000ar1 0.003 0.011 0.244 0.807omega 0.014 0.002 7.184 0.000alpha1 0.084 0.006 14.389 0.000beta1 0.905 0.007 135.842 0.000

I the AR(1) term is not significantI The estimates of α and β are the same as in the case of just the intercept

38 / 50

Plotting σt+1

I Once we have the coefficient estimates we can calculate the estimate of theconditional variance as σ2

t+1 = ω̂ + α̂R2t + β̂σ2

tI The function calculates σt as output of the estimation step

sigma <- [email protected] # class is numericqplot(time(sp500daily), sigma, geom="line", xlab=NULL, ylab="") +

theme_bw() + labs(title="Conditional standard deviation")

2

4

6

1980 1990 2000 2010 2020

Conditional standard deviation

39 / 50

Comparison of MA, EMA, and GARCH volatilitiesI We considered three methods for modeling and forecasting volatility: MA,

EMA, and GARCHI How different are the volatility forecasts obtained from these three methods?I Correlations: cor(MA,GARCH) = 0.967 and cor(EMA, GARCH) = 0.981I Scatter plots:

p1 <- ggplot() +geom_point(aes(ma25^0.5, sigma), color="gray70", size=0.3) +geom_abline(intercept=0, slope=1, color="steelblue3", linetype="dashed", size=0.8) +theme_bw() + labs(x="MA(25)",y="AR(1)-GARCH(1,1)")

p2 <- ggplot() +geom_point(aes(ema06^0.5, sigma), color="gray70", size=0.3) +geom_abline(intercept=0, slope=1, color="steelblue3", linetype="dashed", size=0.8) +theme_bw() + labs(x="EMA(0.06)",y="AR(1)-GARCH(1,1)")

gridExtra::grid.arrange(p1,p2, ncol=2)

2

4

6

0 1 2 3 4 5

MA(25)

AR

(1)−

GA

RC

H(1

,1)

2

4

6

1 2 3 4 5

EMA(0.06)

AR

(1)−

GA

RC

H(1

,1)

40 / 50

Standardized and Unstandardized residuals

I The residuals of the GARCH model are ηt+1 = σt+1εt+1I According to our assumptions they should be:

I independent over time (i.e., no auto-correlation)I normally distributed (i.e., we made this assumption)

I Two types of residuals:1. unstandardized residuals: ηt+12. standardized residuals: εt+1 = ηt+1/σt+1

res <- fit@residuals # class is numericp1 <- qplot(time(sp500daily), res, geom="line", main="UNstandardized Residuals") +

labs(x=NULL, y=NULL) + theme_bw() + theme(plot.title = element_text(size = 8))p2 <- qplot(time(sp500daily), res/sigma, geom="line", main="Standardized Residuals") +

labs(x=NULL, y=NULL) + theme_bw() + theme(plot.title = element_text(size = 8))gridExtra::grid.arrange(p1, p2, ncol=2)

−20

−10

0

10

1980 1990 2000 2010 2020

UNstandardized Residuals

−10

−5

0

5

1980 1990 2000 2010 2020

Standardized Residuals

41 / 50

I Are the standardized residuals approximately normal?I Figure below shows the histogram of εt+1 and the standard normal

distributionI The std. residuals seem to be more peaked at the center and with longer

tails (not very visible at this scale)I But: skewness = -0.47189 and excess kurtosis = 3.32609I Conclusion:

1. normal distribution for εt+1 is inaccurate2. the GARCH(1,1) conditional variance equation is inaccurate

ggplot() +geom_histogram(aes(x=res/sigma,y = ..density..), binwidth=0.2, fill="orange", color="tomato4") +stat_function(aes(res/sigma), fun=dnorm, color="steelblue4", args=list(mean=0, sd=1), size=0.7) +theme_classic() + labs(y=NULL, title="HISTOGRAM")

0.0

0.1

0.2

0.3

0.4

0.5

−10 −5 0 5

res/sigma

HISTOGRAM

42 / 50

I We started off with the ACF of the squared returns R2t+1 showing significant

and persistent autocorrelationI We now filter the residuals with the conditional standard deviation,

ηt+1/σt+1 and we find all this autocorrelation has disappearedI This indicates that filtering the returns with the GARCH(1,1) conditional

standard deviation takes care of the dependence in the squared returns

p1 <- ggacf((res/sigma), lag=20)p2 <- ggacf((res/sigma)^2, lag=20)gridExtra::grid.arrange(p1, p2, ncol=2)

−0.02

−0.01

0.00

0.01

0.02

0 5 10 15 20

lag

ACF

−0.02

−0.01

0.00

0.01

0.02

0 5 10 15 20

lag

ACF

43 / 50

Package rugarch

I Another package that estimates many GARCH-type models is rugarchI Estimation of a GARCH model proceeds in two steps:

1. ugarchspec(): specify the model in terms of the conditional mean andvariance

2. ugarchfit(): estimate the model specified in the previous step

require(rugarch)spec = ugarchspec(variance.model=list(model="sGARCH",garchOrder=c(1,1)),

mean.model=list(armaOrder=c(1,0)))fitgarch = ugarchfit(spec = spec, data = sp500daily)round(coef(fitgarch), 3)

mu ar1 omega alpha1 beta10.060 0.003 0.014 0.084 0.905

44 / 50

GJR-GARCH

I The GJR-GARCH model can be easily estimated by simply setting themodel option to gjrGARCH in the model specification function ugarchspec()

spec = ugarchspec(variance.model=list(model="gjrGARCH",garchOrder=c(1,1)),mean.model=list(armaOrder=c(1,0)))

fitgjr = ugarchfit(spec = spec, data = sp500daily)knitr::kable(data.frame(Estimate=coef(fitgjr), SE=fitgjr@fit$tval))

Estimate SE

mu 0.03723 4.52342ar1 0.00753 0.68824omega 0.01875 8.48399alpha1 0.01946 4.46062beta1 0.90219 132.58377gamma1 0.12181 12.04780

I Is the asymmetry coefficient significant?I What do we learn about the S&P 500 volatility dynamics?

45 / 50

I Comparison of GARCH(1,1) and GJR-GARCH(1,1) volatility estimates

p1 <- autoplot(merge(GARCH = sigma(fitgarch), GJR = sigma(fitgjr)), scales="fixed") + theme_bw() +theme(strip.text = element_text(size=5), text = element_text(size=5))

p2 <- ggplot(data=merge(GARCH = sigma(fitgarch), GJR = sigma(fitgjr))) +geom_point(aes(x = GARCH, y=GJR), color="gray70", size=0.3) +geom_abline(intercept=0, slope=1, color="steelblue3", linetype="dashed", size=0.8) + theme_bw() +theme(text = element_text(size=5))

gridExtra::grid.arrange(p1, p2, ncol=2)

GJR

GARCH

1980 1990 2000 2010 2020

2

4

6

8

2

4

6

8

2

4

6

8

2 4 6

GARCH

GJR

46 / 50

GJR on FX returns?

I Is the asymmetric effect of returns on volatility also relevant for exchangerate volatility?

I Estimation of GJR-GARCH(1,1) on JPY/USD daily returns:

fitgjr.fx = ugarchfit(spec = spec, data = DEXJPUS)data.frame(Estimate=coef(fitgjr.fx), SE=fitgjr.fx@fit$tval)

Estimate SEmu -0.0042226 -0.65700ar1 0.0040482 0.36879omega 0.0078450 2.55350alpha1 0.0393570 4.08780beta1 0.9390349 56.07169gamma1 0.0077128 1.39328

I Is the asymmetry coefficient significant?I What do we learn about the FX volatility dynamics?

47 / 50

How do we compare volatility models?I The Akaike Information Criterion (AIC) discussed for AR models can also

be used to select GARCH modelsI The model with the smallest criterion value is selected

fit.ic <- cbind(infocriteria(fitgarch), infocriteria(fitgjr))colnames(fit.ic) <- c("GARCH","GJR")

GARCH GJRAkaike 2.6571 2.6339Bayes 2.6608 2.6383Shibata 2.6571 2.6339Hannan-Quinn 2.6583 2.6354

I Which model shall we select: GARCH or GJR-GARCH?

I For FX returns:fit.ic.fx <- cbind(infocriteria(fitgarch.fx), infocriteria(fitgjr.fx))

GARCH GJRAkaike 1.9383 1.9382Bayes 1.9421 1.9429Shibata 1.9383 1.9382Hannan-Quinn 1.9396 1.9398

I Which model shall we select: GARCH or GJR-GARCH?48 / 50

Forecasts from GARCH models

I The estimated σt at the end of the sample (2018-04-11) is 1.423 for GARCHand 1.478 for GJR-GARCH

I We then forecast volatility for the following 250 days as shown below:

nforecast = 250garchforecast <- ugarchforecast(fitgarch, n.ahead = nforecast)gjrforecast <- ugarchforecast(fitgjr, n.ahead = nforecast)temp <- data.frame(Date = end(sp500daily) + 1:nforecast,

GJR = as.numeric(sigma(gjrforecast)),GARCH = as.numeric(sigma(garchforecast)))

ggplot(temp) + geom_line(aes(Date, GARCH), color="tomato3") +geom_line(aes(Date, GJR), color="steelblue2") +geom_hline(yintercept = sd(sp500daily), color="seagreen4", linetype="dashed") +labs(x=NULL, y=NULL, title="Volatility forecasts",

subtitle=paste("Forecast date: ", end(sp500daily))) + theme_bw()

1.1

1.2

1.3

1.4

Apr Jul Oct

Forecast date: 2018−04−11

Volatility forecasts

49 / 50

News impact curve

I The GJR-GARCH implies an asymmetric response to shocks (εt−1) relativeto the GARCH model which is evident when plotting the news impactcurve (the response of conditional variance σ2

t to a shock εt−1 1)

newsgarch <- newsimpact(fitgarch)newsgjr <- newsimpact(fitgjr)p1 <- qplot(newsgarch$zx, newsgarch$zy, geom="line", main="GARCH") + labs(x="shock",y="volatility") +

geom_vline(xintercept = 0, linetype="dashed") + theme_bw() + theme(text = element_text(size=5))p2 <- qplot(newsgjr$zx, newsgjr$zy, geom="line", main="GJR") + labs(x="shock",y="volatility") +

geom_vline(xintercept = 0, linetype="dashed") + theme_bw() + theme(text = element_text(size=5))gridExtra::grid.arrange(p1, p2, ncol=2)

1.138

1.140

1.142

1.144

−0.2 0.0 0.2

shock

vola

tility

GARCH

0.992

0.996

1.000

−0.2 0.0 0.2

shock

vola

tility

GJR

50 / 50

volatility modeling and forecastingfaculty.baruch.cuny.edu/smanzan/finmetrics/files/... · i...

Documents